
Training LambdaMART model incrementally

RankLib
Luffy
2018-04-19
2018-04-27
  • Luffy

    Luffy - 2018-04-19

    Hi,
    I am training a LambdaMART model for a ranking problem. However, I get new data at a fixed interval. Is it possible to incrementally train an already-trained model, without completely retraining it?

    Thanks
  • Lemur Project

    Lemur Project - 2018-04-19

    RankLib does not support incremental training.

    Van Dang, the implementer of RankLib, states that it is possible to modify the code to support such training though.

    See
    https://sourceforge.net/p/lemur/discussion/ranklib/thread/b7a34b89/ for his suggestions on such code modification.

     
  • Luffy

    Luffy - 2018-04-19

    Hey Stephen, thanks for the reply.

    I have looked into Van's suggestion, but I am a bit confused. Van suggested using the existing load function to read the partially trained model; from it we can initialize "Ensemble" and "features". That Ensemble can then be used in the learn() method, instead of creating a new Ensemble object. Are we supposed to load values other than the ensemble and features from the saved model? As far as I have understood, the init() method should be called unmodified, and we create a new method partialTrain(), similar to learn(), where the Ensemble is loaded from the saved model instead of being a new object. Is this correct?
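    To check my own understanding, I sketched the idea as a toy program. To be clear, none of these names are RankLib's actual API: IncrementalEnsemble, boostOneRound(), and partialTrain() are all made up, and the "trees" are just constant steps standing in for the regression trees that real LambdaMART fits to the lambda gradients. It only illustrates seeding the boosting loop with a loaded ensemble instead of an empty one.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy stand-in for the suggested modification; not RankLib code.
    public class IncrementalEnsemble {
        // Each "tree" is a constant step, standing in for a regression tree
        // fit to the lambda gradients in real LambdaMART.
        final List<Double> trees = new ArrayList<>();
        final double learningRate;

        IncrementalEnsemble(double learningRate) { this.learningRate = learningRate; }

        // Sum the outputs of all weak learners in the ensemble.
        double eval(double x) {
            double s = 0.0;
            for (double t : trees) s += t;
            return s;
        }

        // One boosting round: fit the mean residual and append it as a new "tree".
        void boostOneRound(double[] xs, double[] ys) {
            double resid = 0.0;
            for (int i = 0; i < xs.length; i++) resid += ys[i] - eval(xs[i]);
            trees.add(learningRate * resid / xs.length);
        }

        // Hypothetical partialTrain(): same loop as learn(), except the ensemble
        // passed in (loaded from a saved model) keeps its existing trees, and new
        // rounds continue from where the saved model stopped.
        static IncrementalEnsemble partialTrain(IncrementalEnsemble loaded,
                                                double[] xs, double[] ys, int rounds) {
            for (int r = 0; r < rounds; r++) loaded.boostOneRound(xs, ys);
            return loaded;
        }

        public static void main(String[] args) {
            double[] xs = {1, 2, 3};
            double[] ys = {2, 2, 2};
            IncrementalEnsemble m = new IncrementalEnsemble(0.5);
            partialTrain(m, xs, ys, 5);         // "first" training run
            partialTrain(m, xs, ys, 5);         // later, resume on the loaded model
            System.out.println(m.trees.size()); // 10 trees total
            System.out.println(m.eval(1.0));    // close to the target 2.0
        }
    }
    ```

    The point of the sketch is the last argument of partialTrain(): the resumed run appends to the loaded tree list rather than replacing it, which matches what I understood Van to mean.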

     

    Last edit: Luffy 2018-04-20
  • Lemur Project

    Lemur Project - 2018-04-20

    I believe that is correct, but I'll have to look over the LambdaMART process since I am not sure.

    As you mention, my assumption was that the loaded model simply becomes the previous best ensemble model (the starting point) for learning with the new feature data. However, I'm not certain which data structures are assumed to be instantiated at the start for the new feature data to build upon.

    Van seemed to indicate this was all fairly easy to do. I'll have to look. Might be a good feature addition.

     
  • Luffy

    Luffy - 2018-04-26

    Hey Stephen, I tried implementing Van's suggestions. I also parameterized the code so that "partial train" can be called from the command line. However, when I re-train on a loaded ensemble, I get wrong outputs, since it just keeps adding new regression trees to the ensemble. Can we make it learn the new data while preserving the older results?

     
  • Lemur Project

    Lemur Project - 2018-04-27

    I was wondering how one would control the number of trees, since each incremental addition to the model adds to the tree count.

    As far as preserving previous model results, I'm not sure one can, since the incrementally built model is a different model from the one it originated from. Even models trained on the same training data multiple times can result in models that will differ slightly since there can be randomization differences.

    I would hope that the incremental model would be at least a little better than the original model it was built from, unless an issue of over-training comes into play.
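    One option for the tree-count question would be to treat the tree parameter as a total budget, so an incrementally re-trained model never ends up with more trees than a from-scratch run would. This helper is hypothetical, not part of RankLib; TreeBudget and remainingRounds() are names invented for illustration.

    ```java
    // Hypothetical helper, not part of RankLib: treat the tree parameter as a
    // total budget and only run the rounds the loaded model has left.
    public class TreeBudget {
        static int remainingRounds(int totalTreeBudget, int treesAlreadyLoaded) {
            // Never negative: a model already at or over budget gets no new rounds.
            return Math.max(0, totalTreeBudget - treesAlreadyLoaded);
        }

        public static void main(String[] args) {
            // e.g. a budget of 1000 trees when the loaded model already has 700
            System.out.println(remainingRounds(1000, 700)); // prints 300
        }
    }
    ```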

    We'd be happy to apply any of your code additions to the RankLib LambdaMART ranker if you end up with something you are satisfied with. So far, you're only the second person in a few years who has mentioned the need for an incremental build. It sounds like a good thing though.

     

