Signal-1M (available here for research purposes)
This dataset was released by Signal Media to facilitate research on news articles. It was initially used for submissions to the NewsIR'16 workshop, but is intended to serve the community for research on news retrieval in general.
Signal-1M Related Tweets (available here for research purposes)
A TREC-like data collection for evaluating approaches to the task of related-tweet retrieval for news articles.
This collection was described in a peer-reviewed paper at ECIR 2018.
Signal Media has shared sample code for uploading and processing the one-million-article collection.