A TREC-like data collection to evaluate approaches for the task of related-tweet retrieval for news articles.

https://goo.gl/forms/R9yYo3lQSQTUtnHc2

### Format

The download is a single compressed file, which you can extract with unzip. Extraction yields a folder containing two files:

• topics: a file containing all of the topics (the news articles) used as queries to retrieve tweets.
• signal1m_tweets_qrels: a TREC qrels-formatted file with the following fields:
  • TOPIC - a unique identifier for an article
  • ITERATION - unused (always 0); included to match the TREC qrels format
  • DOCUMENT - a tweet ID
  • RELEVANCY - the relevance grade of the tweet for the article:
    • 0: not relevant
    • 1: somewhat relevant
    • 2: highly relevant
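
To make the qrels layout concrete, here is a minimal Python sketch that loads the four fields into a nested dictionary. It assumes the fields are whitespace-separated, as in standard TREC qrels files; the function name `parse_qrels` and the sample rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC qrels lines into {topic: {tweet_id: relevancy}}."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, tweet_id, relevancy = parts
        qrels[topic][tweet_id] = int(relevancy)
    return dict(qrels)


# Hypothetical sample rows (TOPIC ITERATION DOCUMENT RELEVANCY),
# made up for illustration only.
sample = [
    "topic-1 0 1001 2",
    "topic-1 0 1002 0",
    "topic-2 0 1003 1",
]
qrels = parse_qrels(sample)
```

With the real file, the same function can be fed an open file handle, for example `parse_qrels(open("signal1m_tweets_qrels"))`.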

### Using the dataset

As in any TREC task, to use the dataset:
1. Use the topics file as an input to your tweet retrieval approach. In particular, your approach should return a ranked list of tweet IDs for each news article (topic) in a TREC results file format. Let's call it approach.result.
Each line in your file should conform to the following:

topic Q0 tweet-id rank score NAME

You can find the tweet collection used to build this dataset here.
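
As an illustration of the result line format above, the following Python sketch turns per-topic scored tweets into TREC result lines. The function name `format_run`, the run name `my_run`, and the sample scores are assumptions made for this example; how the scores are produced is up to your retrieval approach.

```python
def format_run(rankings, run_name="my_run"):
    """Return TREC result lines for {topic: [(tweet_id, score), ...]}."""
    lines = []
    for topic, scored in rankings.items():
        # Rank tweets by descending score; ranks start at 1.
        ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (tweet_id, score) in enumerate(ordered, start=1):
            lines.append(f"{topic} Q0 {tweet_id} {rank} {score:.4f} {run_name}")
    return lines


# Hypothetical scores for one topic, made up for illustration only.
rankings = {"topic-1": [("1002", 0.3), ("1001", 0.9)]}
run_lines = format_run(rankings)
```

Writing `"\n".join(run_lines) + "\n"` to a file named approach.result gives a file in the shape expected by the evaluation step.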

2. Use trec_eval to evaluate the effectiveness of your approach by running:
trec_eval -q signal1m_tweets_qrels approach.result
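
Before running trec_eval, it can help to sanity-check a ranking by hand. The sketch below computes precision at k for a single topic in Python. It is not a substitute for trec_eval: it assumes that any grade of 1 or higher counts as relevant and that the list is already in rank order, and the function name and sample data are hypothetical.

```python
def precision_at_k(ranked_ids, judged, k=10, min_grade=1):
    """Fraction of the top-k tweets whose grade is at least min_grade.

    Tweets missing from `judged` are treated as not relevant.
    """
    top = ranked_ids[:k]
    hits = sum(1 for tweet_id in top if judged.get(tweet_id, 0) >= min_grade)
    return hits / k


# Hypothetical judgments and ranking for one topic.
judged = {"1001": 2, "1002": 0, "1003": 1}
ranked = ["1001", "1002", "1004", "1003"]
p_at_4 = precision_at_k(ranked, judged, k=4)
```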

### Citing

This collection is described in a paper presented at ECIR 2018: A Data Collection for Evaluating the Retrieval of Related Tweets to News Articles.

@inproceedings{Signal1MRelatedTweetsRetrieval2018,
author    = {Axel Suarez and Dyaa Albakour and David Corney and Miguel Martinez and Jose Esquivel},
title     = {A Data Collection for Evaluating the Retrieval of Related Tweets to News Articles},
booktitle = {40th European Conference on Information Retrieval Research (ECIR 2018), Grenoble, France, March 2018},
year      = {2018},
pages     = {To appear},
url       = {To appear}
}