
How to deal with tied ranks in the training phase

RankLib
2016-01-29
2016-02-03
  • Shahab Jalalvand

    I trained a model on training data full of tied ranks and got bad results. Then I broke the ties in the training set using some additional information, and the results improved.
    Now I wonder whether tied training samples are discarded during training. What is the reason for this improvement?

    Note that my tied instances are not necessarily similar. Their feature vectors might differ.
    I'm using Random Forest with the pairwise strategy option.

    Any idea?

     
  • Lemur Project

    Lemur Project - 2016-02-03

    Which bagging ranker (rtype parameter) did you use? The default is MART (ranker 0), but you might use LambdaMART if you're doing pairwise comparisons.

    The algorithms look over all the features individually (and in random sets), keeping track of the max, min, and unique values, along with the variance and deviation of the samples for each feature.

    Furthermore, ensemble weights are based on label values of samples, skipping over pairs that have matching labels. This might be a problem if you have a large number of identical labels in the training set (more difficult to generalize over the features).
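
    To see why matching labels hurt, here is a hypothetical sketch (not RankLib's actual code) that counts how many pairwise training examples survive in a single query's result list when tied-label pairs are skipped. The function name and data are illustrative assumptions.

    ```python
    # Hypothetical sketch: pairs whose labels match carry no
    # preference signal, so a pairwise learner skips them.
    from itertools import combinations

    def usable_pairs(labels):
        """Return (i, j) index pairs whose labels differ."""
        return [(i, j) for i, j in combinations(range(len(labels)), 2)
                if labels[i] != labels[j]]

    many_ties = [1, 1, 1, 1, 0]          # heavily tied labels
    few_ties  = [3, 2, 2, 1, 0]          # mostly distinct labels
    print(len(usable_pairs(many_ties)))  # 4 of 10 pairs survive
    print(len(usable_pairs(few_ties)))   # 9 of 10 pairs survive
    ```

    With heavy ties most candidate pairs are discarded, leaving far less signal to learn from, which is consistent with the improvement you saw after breaking ties.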

    If you have a feature that takes the same or very nearly the same values across many samples, then it's not especially useful for learning. The artificially low variance/deviation of that feature may contribute to a model that underfits the data.
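
    A quick way to check for this is to compute per-feature variance over your training matrix and flag near-constant columns. This is a hypothetical diagnostic (the function name, threshold, and row-major layout are assumptions, not part of RankLib):

    ```python
    # Hypothetical diagnostic: flag features whose variance across
    # samples is near zero, since they contribute little to tree splits.
    import statistics

    def low_variance_features(rows, threshold=1e-6):
        """rows: list of feature vectors; return indices of near-constant features."""
        flagged = []
        for j in range(len(rows[0])):
            column = [row[j] for row in rows]
            if statistics.pvariance(column) < threshold:
                flagged.append(j)
        return flagged

    data = [[0.5, 1.0, 3.2],
            [0.5, 2.0, 1.1],
            [0.5, 3.0, 0.7]]
    print(low_variance_features(data))  # [0] -- feature 0 is constant
    ```

    Dropping or replacing the flagged features before training is usually cheaper than letting the learner discover they are uninformative.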

    If your model evaluated OK against training data but poorly against test/validation data, your model likely overfit the data; if poorly against both training and test/validation data, underfitting is a possibility and you might consider adding more or better features.

     

