
Training LambdaMART model incrementally

RankLib
Luffy
2018-04-19
2018-04-27
  • Luffy

    Luffy - 2018-04-19

    Hi,
    I am training a LambdaMART model for a ranking problem. However, I get new data at a fixed interval. Is it possible to incrementally train an already-trained model, without completely retraining it?

    Thanks
  • Lemur Project

    Lemur Project - 2018-04-19

    RankLib does not support incremental training.

    Van Dang, the implementer of RankLib, states that it is possible to modify the code to support such training though.

    See
    https://sourceforge.net/p/lemur/discussion/ranklib/thread/b7a34b89/ for his suggestions on such code modification.

     
  • Luffy

    Luffy - 2018-04-19

    Hey Stephen, thanks for the reply.

    I have looked into Van's suggestion, but I am a bit confused. Van suggested using the existing load function to read the partially trained model; from it we can initialize "Ensemble" and "features". That Ensemble can then be used in the learn() method, instead of creating a new Ensemble object. Are we supposed to load values other than the ensemble and features from the saved model? As far as I have understood, the init() method should be called unmodified, and we create a new method partialTrain(), similar to learn(), where the Ensemble is loaded from the saved model instead of being a new object. Is this correct?
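    To check my own understanding, I sketched the idea as a toy program. To be clear, none of these names are RankLib's actual API: IncrementalEnsemble, boostOneRound(), and partialTrain() are all made up, and the "trees" are just constant steps standing in for the regression trees that real LambdaMART fits to the lambda gradients. It only illustrates seeding the boosting loop with a loaded ensemble instead of an empty one.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy stand-in for the suggested modification; not RankLib code.
    public class IncrementalEnsemble {
        // Each "tree" is a constant step, standing in for a regression tree
        // fit to the lambda gradients in real LambdaMART.
        final List<Double> trees = new ArrayList<>();
        final double learningRate;

        IncrementalEnsemble(double learningRate) { this.learningRate = learningRate; }

        // Sum the outputs of all weak learners in the ensemble.
        double eval(double x) {
            double s = 0.0;
            for (double t : trees) s += t;
            return s;
        }

        // One boosting round: fit the mean residual and append it as a new "tree".
        void boostOneRound(double[] xs, double[] ys) {
            double resid = 0.0;
            for (int i = 0; i < xs.length; i++) resid += ys[i] - eval(xs[i]);
            trees.add(learningRate * resid / xs.length);
        }

        // Hypothetical partialTrain(): same loop as learn(), except the ensemble
        // passed in (loaded from a saved model) keeps its existing trees, and new
        // rounds continue from where the saved model stopped.
        static IncrementalEnsemble partialTrain(IncrementalEnsemble loaded,
                                                double[] xs, double[] ys, int rounds) {
            for (int r = 0; r < rounds; r++) loaded.boostOneRound(xs, ys);
            return loaded;
        }

        public static void main(String[] args) {
            double[] xs = {1, 2, 3};
            double[] ys = {2, 2, 2};
            IncrementalEnsemble m = new IncrementalEnsemble(0.5);
            partialTrain(m, xs, ys, 5);         // "first" training run
            partialTrain(m, xs, ys, 5);         // later, resume on the loaded model
            System.out.println(m.trees.size()); // 10 trees total
            System.out.println(m.eval(1.0));    // close to the target 2.0
        }
    }
    ```

    The point of the sketch is the last argument of partialTrain(): the resumed run appends to the loaded tree list rather than replacing it, which matches what I understood Van to mean.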

     

    Last edit: Luffy 2018-04-20
  • Lemur Project

    Lemur Project - 2018-04-20

    I believe that is correct, but I'll have to look over the LambdaMART process since I am not sure.

    As you mention, my assumption was that the loaded model simply becomes the previous best ensemble model (the starting point) for learning with the new feature data. However, I'm not certain which data structures are assumed to be instantiated at the start for the new feature data to build upon.

    Van seemed to indicate this was all fairly easy to do. I'll have to look. Might be a good feature addition.

     
  • Luffy

    Luffy - 2018-04-26

    Hey Stephen, I tried implementing Van's suggestions. I also parameterized the code so that "partial train" can be called from the command line. However, when I re-train on a loaded ensemble, I get wrong outputs, since it just keeps adding new regression trees to the ensemble. Can we make it learn the new data while preserving the older results?

     
  • Lemur Project

    Lemur Project - 2018-04-27

    I was wondering how one would control the number of trees, since each incremental addition to the model adds to the tree count.

    As far as preserving previous model results, I'm not sure one can, since the incrementally built model is a different model from the one it originated from. Even models trained on the same training data multiple times can result in models that will differ slightly since there can be randomization differences.

    I would hope that the incremental model would be at least a little better than the original model it was built from, unless an issue of over-training comes into play.
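    One option for the tree-count question would be to treat the tree parameter as a total budget, so an incrementally re-trained model never ends up with more trees than a from-scratch run would. This helper is hypothetical, not part of RankLib; TreeBudget and remainingRounds() are names invented for illustration.

    ```java
    // Hypothetical helper, not part of RankLib: treat the tree parameter as a
    // total budget and only run the rounds the loaded model has left.
    public class TreeBudget {
        static int remainingRounds(int totalTreeBudget, int treesAlreadyLoaded) {
            // Never negative: a model already at or over budget gets no new rounds.
            return Math.max(0, totalTreeBudget - treesAlreadyLoaded);
        }

        public static void main(String[] args) {
            // e.g. a budget of 1000 trees when the loaded model already has 700
            System.out.println(remainingRounds(1000, 700)); // prints 300
        }
    }
    ```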

    We'd be happy to apply any of your code additions to the RankLib LambdaMART ranker if you end up with something you are satisfied with. So far, you're only the second person in a few years who has mentioned the need for an incremental build. It sounds like a good thing though.

     

