Dear All
I have the TREC CDS PubMed document corpus and 30 queries for which I want to perform document retrieval. I would like to use a learning-to-rank algorithm. Where should I start? For this kind of information retrieval, what would the training and test data be? Please advise.
Thanks
https://sourceforge.net/p/lemur/wiki/RankLib/
Thanks. I have explored this website, but I am still unclear on the following points. I don't understand where to start with LTR in RankLib.
3. What will the features in the feature file be, given that I want to retrieve documents?
Features that you think help rank a document properly, along with relevance judgments for the documents.
The format is defined in https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
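To make the format concrete, here is a minimal Python sketch that writes one line per query-document pair in the layout described on that page: `<label> qid:<id> <feature>:<value> ... # <comment>`. The labels, feature values and document names (`doc-A` etc.) are made up for illustration, and the helper `format_ranklib_line` is my own name, not part of RankLib. As far as I know, lines for the same query should be kept together in the file.

```python
def format_ranklib_line(label, qid, features, comment=""):
    """Format one query-document pair as: <label> qid:<qid> 1:<v1> 2:<v2> ... # <comment>"""
    # Feature IDs are 1-based and written in increasing order.
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{qid} {feats}"
    if comment:
        line += f" # {comment}"
    return line


# Illustrative rows only: (relevance label, query id, [feature values], document name)
rows = [
    (2, 1, [12.3, 250, 4], "doc-A"),
    (0, 1, [3.1, 980, 1], "doc-B"),
    (1, 2, [7.8, 410, 3], "doc-C"),
]

lines = [format_ranklib_line(*row) for row in rows]
print("\n".join(lines))
```

Here the three feature columns could stand for, say, a BM25 score, the document length and the number of matching query terms, but which features you use is up to you.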
You need a training data file, the ranking algorithm you want to use, and the metric you want to optimize during training. You can also specify separate validation and test data files, or just let RankLib split a single file into training, validation and test pieces. See https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/ .
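As a rough sketch of what a training run looks like, the Python below only assembles the command line and prints it. The flags (`-train`, `-ranker`, `-metric2t`, `-tvs`, `-save`) are the ones I recall from the How-to-use page, with ranker `6` being LambdaMART; please check them against that page. The jar name `RankLib.jar` and the file names `train.txt` and `model.txt` are placeholders.

```python
import shlex


def ranklib_train_command(train_file, model_file, ranker=6, metric="NDCG@10",
                          tvs=None, jar="RankLib.jar"):
    """Build (but do not run) a RankLib training command as an argument list."""
    cmd = ["java", "-jar", jar,
           "-train", train_file,
           "-ranker", str(ranker),    # 6 = LambdaMART (assumed from the wiki)
           "-metric2t", metric]       # metric to optimize on the training data
    if tvs is not None:
        # Let RankLib hold out part of the training file for validation.
        cmd += ["-tvs", str(tvs)]
    cmd += ["-save", model_file]      # where to write the learned model
    return cmd


cmd = ranklib_train_command("train.txt", "model.txt", tvs=0.8)
print(shlex.join(cmd))
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)` once the jar and the training file exist.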
That's for you to decide. Many different values from documents (and queries) have been used as features: a document's BM25 score, the document length (number of words), the number of matching query terms, some sort of tf-idf score, and so on.
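For illustration, here is a small self-contained Python sketch that computes three of those features (a BM25 score, document length, number of matching query terms) for each query-document pair. The two "documents" are toy sentences I made up, not the PubMed corpus, and the BM25 variant and parameters (`k1=1.2`, `b=0.75`) are one common choice, not something RankLib prescribes. In practice you would probably take such scores from your search engine instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of one document for one query (one common formulation)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if tf[term] == 0:
            continue  # terms absent from the document contribute nothing
        idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def extract_features(query, doc, doc_freq, n_docs, avg_len):
    """Return [BM25 score, document length in words, number of matching query terms]."""
    q, d = tokenize(query), tokenize(doc)
    return [bm25(q, d, doc_freq, n_docs, avg_len), len(d), len(set(q) & set(d))]


# Toy collection, for illustration only.
docs = [
    "Upper gastrointestinal bleeding with hypotension and tachycardia",
    "Management of chronic migraine in adults",
]
tokenized = [tokenize(d) for d in docs]
doc_freq = Counter(t for toks in tokenized for t in set(toks))  # document frequencies
avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)

query = ("Old man with coffee-ground emesis, tachycardia, hypoxia, "
         "hypotension and cool, clammy extremities")
feats = [extract_features(query, d, doc_freq, len(docs), avg_len) for d in docs]
for f in feats:
    print(f)
```

Each resulting list becomes the feature part of one line in the training file, with the relevance judgment for that query-document pair as the label.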
Note: You are learning to rank documents you retrieve. The training/validation/test data represent ranked retrieved lists. You are trying to learn a model that ranks those documents the best way possible.
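Because each query's retrieved list is the unit being ranked, if you split the data yourself rather than letting RankLib do it, it probably makes sense to split by query so that all lines of one query land in the same piece. A minimal sketch, assuming the lines are already in the RankLib file format; the 60/20/20 proportions and the helper name `split_by_query` are arbitrary choices of mine:

```python
import random
from collections import defaultdict


def split_by_query(lines, train=0.6, valid=0.2, seed=0):
    """Split feature-file lines into train/validation/test, keeping each query whole."""
    groups = defaultdict(list)
    for line in lines:
        qid = line.split()[1]          # second field is "qid:<id>"
        groups[qid].append(line)

    qids = sorted(groups)
    random.Random(seed).shuffle(qids)  # reproducible shuffle of the query ids

    n_train = round(len(qids) * train)
    n_valid = round(len(qids) * valid)
    parts = (qids[:n_train],
             qids[n_train:n_train + n_valid],
             qids[n_train + n_valid:])
    # Lines of the same query stay contiguous within each piece.
    return [[line for q in part for line in groups[q]] for part in parts]


# Illustrative input: 30 queries with 2 made-up documents each.
example_lines = [f"{d % 2} qid:{q} 1:{d}" for q in range(1, 31) for d in range(2)]
train_set, valid_set, test_set = split_by_query(example_lines)
print(len(train_set), len(valid_set), len(test_set))
```

With only 30 queries the test piece is small, so cross-validation over queries may be worth considering as well.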
Thanks for your detailed reply.
I understand the training format now.
My queries are in the following format:
“Old man with coffee-ground emesis, tachycardia, hypoxia, hypotension and cool, clammy extremities”
Did you understand how to build a feature file? Do you have an example you can share for the correct format?