How to use RankLib

Forum: RankLib

Creator: SMM

Created: 2018-04-21

Updated: 2018-05-29

SMM - 2018-04-21

Dear All

I have the TREC CDS PubMed document corpus and 30 queries for which I want to perform document retrieval. I would like to use a learning-to-rank algorithm. Where should I start? In this kind of information retrieval task, what would the training and testing data be? Please guide me.

Thanks

Lemur Project - 2018-04-22

https://sourceforge.net/p/lemur/wiki/RankLib/

SMM - 2018-04-26

Thanks. I have explored this website, but I am still not clear on the following points. I cannot work out where to start with LTR in RankLib.

1. What should the training file include?
2. How do I create the training file?
3. What do I have to provide to the model during training?
4. What should the features in the feature file be, given that I want to retrieve documents?

Lemur Project - 2018-04-27

Features that you think help rank a document properly, along with relevance judgments for the documents.

Format is defined in https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
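
As a rough illustration of that format (not taken from the RankLib pages), the sketch below writes lines of the form `<label> qid:<id> <feature>:<value> ... # <comment>`. The helper name `format_ranklib_line`, the document ids, and all feature values are made up for the example; lines for the same query should be kept together in the file.

```python
def format_ranklib_line(label, qid, features, comment=""):
    """Build one line: '<label> qid:<qid> 1:<v1> 2:<v2> ... # <comment>'."""
    parts = [str(label), f"qid:{qid}"]
    # Feature ids are 1-based and written in increasing order.
    parts += [f"{i}:{value:g}" for i, value in enumerate(features, start=1)]
    line = " ".join(parts)
    if comment:
        # The trailing comment is free text, often the document id.
        line += f" # {comment}"
    return line


# Hypothetical rows: (relevance label, query id, feature values, document id).
rows = [
    (2, 1, [12.5, 310, 4], "doc-A"),
    (0, 1, [3.1, 95, 1], "doc-B"),
    (1, 2, [7.8, 210, 2], "doc-C"),
]

lines = [format_ranklib_line(*row) for row in rows]
print("\n".join(lines))
```

Here each document has three features (say a BM25 score, a length, and a matching-term count), and the label is the relevance judgment for that query-document pair.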

A training data file, the ranking algorithm you want to use, and the metric you want to optimize during training. You can also specify additional validation and test data files, or just allow RankLib to split the single file into training, validation and test pieces. See https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/.
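
To make those three inputs concrete, here is a small sketch that assembles a RankLib training command. The flags (`-train`, `-ranker`, `-metric2t`, `-tvs`, `-save`) are the ones I recall from the how-to page, so please check them against your version; the jar name, file names, and the helper `ranklib_train_command` are placeholders for illustration.

```python
import shlex


def ranklib_train_command(train_file, model_file, ranker=6,
                          metric="NDCG@10", validation_split=None,
                          jar="RankLib.jar"):
    """Assemble the argument list for a RankLib training run."""
    cmd = [
        "java", "-jar", jar,      # jar name is a placeholder
        "-train", train_file,     # training data in the RankLib file format
        "-ranker", str(ranker),   # 6 selects LambdaMART in the ranker list
        "-metric2t", metric,      # metric to optimize on the training data
        "-save", model_file,      # where to write the learned model
    ]
    if validation_split is not None:
        # Let RankLib hold out part of the training file for validation.
        cmd += ["-tvs", str(validation_split)]
    return cmd


cmd = ranklib_train_command("train.txt", "model.txt", validation_split=0.8)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Separate validation and test files can be passed instead of a split, as described on the how-to page.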

That's for you to decide. Many different values from documents (and queries) have been used as features: a document's BM25 score, its length (number of words), the number of matching query terms, some sort of tf-idf score, and so on.
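
For example, three of the features mentioned above could be computed for one query-document pair roughly like this. It is only a sketch: the tokenizer, the BM25 parameters (`k1=1.2`, `b=0.75`), the toy corpus, and the function names are my own assumptions, and in practice you would normally take scores such as BM25 from your retrieval engine rather than recompute them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of one document for one query."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if tf[term] == 0:
            continue
        df = doc_freq[term]
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def extract_features(query, doc, corpus):
    """Return [BM25 score, document length, number of matching query terms]."""
    # Corpus statistics are recomputed on every call to keep the sketch short.
    tokenized = [tokenize(d) for d in corpus]
    doc_freq = Counter(t for terms in tokenized for t in set(terms))
    avg_len = sum(len(terms) for terms in tokenized) / len(tokenized)
    q, d = tokenize(query), tokenize(doc)
    matching = len(set(q) & set(d))
    return [bm25(q, d, doc_freq, len(corpus), avg_len), len(d), matching]


# Toy corpus and query, invented for the example.
corpus = [
    "Upper gastrointestinal bleeding with hypotension and tachycardia",
    "Management of chronic migraine in adults",
    "Hypotension in elderly patients",
]
query = "hypotension tachycardia"
features = [extract_features(query, doc, corpus) for doc in corpus]
for doc, feats in zip(corpus, features):
    print(feats, doc)
```

Each feature list then becomes the `<feature>:<value>` part of one line in the training file, paired with the relevance judgment for that query and document.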

Note: You are learning to rank documents you retrieve. The training/validation/test data represent ranked retrieved lists. You are trying to learn a model that ranks those documents the best way possible.

SMM - 2018-04-28

Thanks for your detailed reply.
I have understood the training format.

Could you please explain how to generate the feature values for each query?

My queries are in the following format:
“Old man with coffee-ground emesis, tachycardia, hypoxia, hypotension and cool, clammy extremities”

Yoni Weidenfeld - 2018-05-29

Did you understand how to build a feature file? Do you have an example you can share for the correct format?
