The Lemur Project / Discussion / RankLib: Using RankLib

Search engine and data mining applications and ClueWeb datasets.

Using RankLib

Forum: RankLib

Creator: yonatan

Created: 2016-08-21

Updated: 2016-08-23

yonatan - 2016-08-21

Hello,

I would like to use RankLib to apply the LambdaMART algorithm to my query results.
I am trying to find out what the current version is, and where I can see a list of released binaries.

I need to use RankLib not as a standalone Java application, but more like an SDK.
I need to train the model once, load the model file once at application startup, and then ask for the score of each document every time a query runs.
To do so, I need direct access to the model-loading function and the scoring function.
Do you have a documented API for that?

How do you suggest updating the model with results from new queries?

How can I assign a document ID to each document in a query?

Thanks!
Yonatan

Lemur Project - 2016-08-23

If you go to https://sourceforge.net/projects/lemur/files/lemur, you will get a listing of folders containing the officially released components of the Lemur Project (RankLib, Indri and Galago). The latest release is currently RankLib-2.7.

If you're going to be modifying RankLib, you'll need the sources, which are obtained through an SVN checkout from SourceForge:

svn checkout svn://svn.code.sf.net/p/lemur/code/RankLib/trunk RankLib

which will get you the latest snapshot (of version 2.8).

RankLib doesn't really have an SDK API. It is built around a central application (Evaluator.java) that runs the different LTR algorithms. Each algorithm is a subclass of Ranker, with init() and learn() methods that do the processing. Data takes the form of DataPoints (sparse and dense) that hold documents in rank order along with the features you have specified for each document-query pair.

You can use the loadFromString() method to load the text of a saved model. I'm not sure how one would merge different model instances, though. The only approach I can suggest is to add the newly retrieved documents to your training/test/validation data as separate ranked lists, but that means relearning/training to produce a new model.
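As a rough illustration of the load-once, score-per-query pattern asked about in this thread, here is a minimal Java sketch. It deliberately does not use any RankLib classes: the ToDoubleFunction is a hypothetical stand-in for whatever scoring call a loaded model exposes, and Reranker and Candidate are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Keeps one scoring function for the application's lifetime and reranks each query's results. */
final class Reranker {
    /** A retrieved document: its external ID plus the feature vector for the current query. */
    static final class Candidate {
        final String docId;
        final float[] features;

        Candidate(String docId, float[] features) {
            this.docId = docId;
            this.features = features;
        }
    }

    // Hypothetical stand-in for a model loaded once at startup; not a RankLib type.
    private final ToDoubleFunction<float[]> model;

    Reranker(ToDoubleFunction<float[]> model) {
        this.model = model;
    }

    /** Scores every candidate exactly once and returns document IDs, best score first. */
    List<String> rerank(List<Candidate> candidates) {
        final double[] scores = new double[candidates.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            scores[i] = model.applyAsDouble(candidates.get(i).features);
            order.add(i);
        }
        // List.sort is stable, so tied scores keep their original retrieval order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));

        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(candidates.get(i).docId);
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Toy linear "model" used only to exercise the sketch.
        Reranker reranker = new Reranker(f -> 2.0 * f[0] + f[1]);
        List<Candidate> results = new ArrayList<>();
        results.add(new Candidate("doc-a", new float[] {0.1f, 0.0f}));
        results.add(new Candidate("doc-b", new float[] {0.5f, 0.2f}));
        System.out.println(reranker.rerank(results));
    }
}
```

In a real application the Reranker would be constructed once at startup, after the saved model has been loaded, and rerank() would be called for every incoming query. Updating the model would then amount to retraining offline and swapping in a new Reranker instance.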

The input data does not explicitly hold document IDs; it just stores the rank value of the document as a sort of pseudo-ID. Typically, the external document ID is stored as a comment at the end of the input data line (after the # character). This information is kept in the description field of the DataPoint structure, which can be accessed.
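To make the trailing-comment convention concrete, here is a small self-contained Java sketch that splits one input line into its parts. It assumes the usual LETOR-style layout, "<label> qid:<id> <feature>:<value> ... # <description>"; the InputLine class and its field names are made up for this example and are not part of RankLib.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** One parsed input line, assumed to look like "<label> qid:<id> <feature>:<value> ... # <description>". */
final class InputLine {
    final float label;                   // first column of the line
    final String queryId;                // text after "qid:"
    final Map<Integer, Float> features;  // feature number -> feature value
    final String description;            // text after '#', e.g. an external document ID ("" if absent)

    private InputLine(float label, String queryId, Map<Integer, Float> features, String description) {
        this.label = label;
        this.queryId = queryId;
        this.features = features;
        this.description = description;
    }

    static InputLine parse(String line) {
        // Split off the trailing comment first so '#' never reaches the feature parser.
        int hash = line.indexOf('#');
        String body = hash >= 0 ? line.substring(0, hash) : line;
        String description = hash >= 0 ? line.substring(hash + 1).trim() : "";

        String[] tokens = body.trim().split("\\s+");
        if (tokens.length < 2 || !tokens[1].startsWith("qid:")) {
            throw new IllegalArgumentException("Expected '<label> qid:<id> ...': " + line);
        }
        float label = Float.parseFloat(tokens[0]);
        String queryId = tokens[1].substring("qid:".length());

        Map<Integer, Float> features = new LinkedHashMap<>();
        for (int i = 2; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected '<feature>:<value>': " + tokens[i]);
            }
            features.put(Integer.parseInt(tokens[i].substring(0, colon)),
                         Float.parseFloat(tokens[i].substring(colon + 1)));
        }
        return new InputLine(label, queryId, features, description);
    }

    public static void main(String[] args) {
        InputLine p = parse("3 qid:7 1:0.5 2:0.25 # doc-42");
        System.out.println("query " + p.queryId + ", document " + p.description);
    }
}
```

Writing your own document ID after the # when you generate the feature file is therefore enough to carry it through to wherever the description is read back.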

There is an unlisted command-line argument that uses the external document ID as part of the ranking output: the indri option. Look for it in the code in Evaluator.java.
