Hey guys,
I'm using AdaRank to train a model in k-fold cross-validation mode (10 folds). Almost every fold's computation ends with a malformed log file like this one:
The last weak ranker added is "NaN". I checked the training data, and it looks absolutely normal to me.
Any known bugs, ideas or anything?
Thanks in advance!
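In case it helps anyone checking their own data: RankLib reads the LETOR/SVM-light format, `<label> qid:<id> <fid>:<val> ... # comment`. Here is a minimal sanity-check sketch (the function name and the sample line are mine, not from this thread) that flags non-finite feature values, which are one way NaNs can creep into training:

```python
import math

def check_letor_line(line):
    """Parse one '<label> qid:<id> <fid>:<val> ... [# comment]' line.
    Returns (label, qid, features) or raises ValueError on bad input."""
    line = line.split("#", 1)[0].strip()      # drop trailing comment
    tokens = line.split()
    label = int(tokens[0])
    if not tokens[1].startswith("qid:"):
        raise ValueError("second token must be qid:<id>")
    qid = tokens[1][4:]
    features = {}
    for tok in tokens[2:]:
        fid, val = tok.split(":")
        val = float(val)
        if not math.isfinite(val):            # NaN/inf features poison training
            raise ValueError(f"non-finite value for feature {fid}")
        features[int(fid)] = val
    return label, qid, features

print(check_letor_line("20 qid:7 1:0.5 2:0.0 14:1.0 # doc-123"))
```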
Last edit: Patrick L 2016-11-27
I don't see anything specifically wrong with your data. How many ranked lists are in your full training set? Do your relevance labels range from 0 to 20? Use the -gmax argument if your labels span more than 5 values (i.e., beyond 0-4).
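For reference, a run with k-fold CV and an explicit label range might look like this (jar path, file names, and metric are placeholders, not from this thread; in RankLib, -ranker 3 selects AdaRank):

```shell
# Hypothetical RankLib invocation:
#   -ranker 3   AdaRank
#   -kcv 10     10-fold cross-validation
#   -gmax 30    highest judged relevance label in the data
java -jar RankLib.jar -train train.txt -ranker 3 -kcv 10 -gmax 30 \
     -metric2t NDCG@10 -save model.txt
```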
7094 ranked lists, 1535410 entries read.
I retried with -gmax, since my labels go up to 30, but the results are still the same.
It appears the problem is data-driven. Something about your data is breaking the k-fold cross-validation (kcv) processing. I have obtained similar results with different ranking algorithms, but it doesn't occur with other sample data I have on hand.
Failures always occur at the final fold, with feature 14 of your data being selected consecutively for the weak ranker; at that point the trainer can no longer detect a performance difference between the current and previous weak rankers.
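The NaN in the log is consistent with the weak-ranker weight diverging when one feature keeps winning. A minimal sketch of the weight formula from the original AdaRank paper (Xu & Li 2007), not RankLib's actual code, with made-up numbers:

```python
import math

def adarank_alpha(weights, scores):
    """AdaRank weak-ranker weight:
    alpha = 0.5 * ln( sum w*(1+e) / sum w*(1-e) ),
    where e is the per-query evaluation score in [-1, 1]."""
    num = sum(w * (1 + e) for w, e in zip(weights, scores))
    den = sum(w * (1 - e) for w, e in zip(weights, scores))
    if den <= 0:              # every query already "perfect" for this feature
        return float("inf")
    return 0.5 * math.log(num / den)

# Normal case: mixed per-query scores give a finite weight.
print(adarank_alpha([0.5, 0.5], [0.2, -0.1]))

# Degenerate case: the same feature keeps winning with e == 1 everywhere,
# so the denominator is 0 and the weight diverges; inf - inf in later
# arithmetic then yields the NaN seen in the log.
a = adarank_alpha([0.5, 0.5], [1.0, 1.0])
print(a)        # inf
print(a - a)    # nan
```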
Your data may have brought out a bug in the kcv processing, or it may be related to a possible bug in some metric calculations (Bug #291), or something else altogether.
This has been added as a possible bug ( https://sourceforge.net/p/lemur/bugs/292/ ) and will be looked into further.
Okay, I switched to LambdaMART, which seems to work fine for my problem right now. If you need more sample data, just let me know. Thanks for your time!
Maybe a couple of queries' worth of data, if not too large, would be useful. The little snippet of example data you provided is really too small to reproduce the failure definitively.