I trained a model on training data full of tied ranks and got bad results. Then I broke the ties in the training set according to some extra information, and I got better results.
Now I wonder: are the tied training samples discarded during training, or not? What is the reason for this improvement?
Note that my tied instances are not necessarily similar. Their feature vectors might differ.
I'm using Random Forest with the pairwise strategy option.
Any idea?
Which bagging ranker (the rtype parameter) did you use? The default is MART (ranker 0), but you might use LambdaMART if you're doing pairwise comparisons.
The algorithms look over all the features individually (and in random sets), keeping track of the max, min, and unique values, along with the variance and deviation of the samples for each feature.
Furthermore, ensemble weights are based on the label values of samples, skipping over pairs that have matching labels. This can be a problem if you have a large number of identical labels in the training set, since it becomes more difficult to generalize over the features.
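To make the "skipping over pairs" point concrete, here is a minimal sketch (not RankLib's actual implementation) of how a pairwise ranker typically enumerates training pairs from label values; the function name and structure are my own for illustration. Tied labels produce no usable pair, so a training set dominated by ties yields very few pairs to learn from:

```python
from itertools import combinations

def pairwise_training_pairs(labels):
    """Generate index pairs (i, j) where labels[i] > labels[j].

    A pair with equal labels carries no ordering information for a
    pairwise objective, so it is skipped entirely.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
        # labels[i] == labels[j]: tied, contributes nothing
    return pairs

# With heavy ties, only the pairs involving the distinct label survive:
print(pairwise_training_pairs([1, 1, 1, 2]))
```

This is why breaking ties can help: each tie you break converts a discarded pair into a usable training signal.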
If a feature has the same, or very nearly the same, values across many samples, it's not especially useful for learning. The artificially low variance/deviation for that feature may help produce a model that underfits the data.
If your model evaluated OK against the training data but poorly against the test/validation data, it likely overfit; if it did poorly against both training and test/validation data, underfitting is a possibility and you might consider adding more or better features.
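If you want to check your own training set for this, a quick variance screen is easy to write. This is just an illustrative sketch (plain Python, hypothetical threshold value), not anything built into the library:

```python
def low_variance_features(X, threshold=1e-8):
    """Return column indices whose variance falls below threshold.

    X: list of feature vectors (rows = samples).  Near-constant
    columns give a tree learner almost nothing to split on.
    """
    n = len(X)
    flagged = []
    for col in range(len(X[0])):
        values = [row[col] for row in X]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        if var < threshold:
            flagged.append(col)
    return flagged

# Column 0 is constant across samples, so it gets flagged:
print(low_variance_features([[1.0, 5.0], [1.0, 6.0], [1.0, 7.0]]))
```

Dropping or replacing flagged features before training is often cheaper than hoping the ensemble routes around them.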