Hey guys,
I'm using AdaRank to train a model in k-fold cross-validation mode (10 folds). Almost every fold's computation ends with a malformed log file like this one:
The last weak ranker added is "NaN". I checked the training data, and it looks absolutely normal to me.
Any known bugs, ideas or anything?
Thanks in advance!
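In case it helps anyone checking their own data: RankLib reads the LETOR/SVM-light format, `<label> qid:<id> <fid>:<val> ... # comment`. Here is a minimal sanity-check sketch (the function name and the sample line are mine, not from this thread) that flags non-finite feature values, which are one way NaNs can creep into training:

```python
import math

def check_letor_line(line):
    """Parse one '<label> qid:<id> <fid>:<val> ... [# comment]' line.
    Returns (label, qid, features) or raises ValueError on bad input."""
    line = line.split("#", 1)[0].strip()      # drop trailing comment
    tokens = line.split()
    label = int(tokens[0])
    if not tokens[1].startswith("qid:"):
        raise ValueError("second token must be qid:<id>")
    qid = tokens[1][4:]
    features = {}
    for tok in tokens[2:]:
        fid, val = tok.split(":")
        val = float(val)
        if not math.isfinite(val):            # NaN/inf features poison training
            raise ValueError(f"non-finite value for feature {fid}")
        features[int(fid)] = val
    return label, qid, features

print(check_letor_line("20 qid:7 1:0.5 2:0.0 14:1.0 # doc-123"))
```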
Last edit: Patrick L 2016-11-27
I don't see anything specifically wrong with your data. How many ranked lists are in your full training set? Do your relevance labels range from 0 to 20? Use the -gmax argument if your labels span more than 5 values (i.e., beyond 0-4).
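For reference, a run with k-fold CV and an explicit label range might look like this (jar path, file names, and metric are placeholders, not from this thread; in RankLib, -ranker 3 selects AdaRank):

```shell
# Hypothetical RankLib invocation:
#   -ranker 3   AdaRank
#   -kcv 10     10-fold cross-validation
#   -gmax 30    highest judged relevance label in the data
java -jar RankLib.jar -train train.txt -ranker 3 -kcv 10 -gmax 30 \
     -metric2t NDCG@10 -save model.txt
```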
7094 ranked lists, 1535410 entries read.
I retried with -gmax, since my labels go up to 30, but the results are still the same.
It appears the problem is data-driven. Something about your data is breaking the k-fold cross-validation (kcv) processing. I have obtained similar results with different ranking algorithms, but it doesn't occur with other sample data I have on hand.
Failures always occur at the final fold, with feature 14 of your data being selected consecutively for the weak ranker; at that point the trainer can no longer detect a performance difference between the current and previous weak rankers.
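The NaN in the log is consistent with the weak-ranker weight diverging when one feature keeps winning. A minimal sketch of the weight formula from the original AdaRank paper (Xu & Li 2007), not RankLib's actual code, with made-up numbers:

```python
import math

def adarank_alpha(weights, scores):
    """AdaRank weak-ranker weight:
    alpha = 0.5 * ln( sum w*(1+e) / sum w*(1-e) ),
    where e is the per-query evaluation score in [-1, 1]."""
    num = sum(w * (1 + e) for w, e in zip(weights, scores))
    den = sum(w * (1 - e) for w, e in zip(weights, scores))
    if den <= 0:              # every query already "perfect" for this feature
        return float("inf")
    return 0.5 * math.log(num / den)

# Normal case: mixed per-query scores give a finite weight.
print(adarank_alpha([0.5, 0.5], [0.2, -0.1]))

# Degenerate case: the same feature keeps winning with e == 1 everywhere,
# so the denominator is 0 and the weight diverges; inf - inf in later
# arithmetic then yields the NaN seen in the log.
a = adarank_alpha([0.5, 0.5], [1.0, 1.0])
print(a)        # inf
print(a - a)    # nan
```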
Your data may have brought out a bug in the kcv processing, or it may be related to a possible bug in some metric calculations (Bug #291), or something else altogether.
This has been added as a possible bug ( https://sourceforge.net/p/lemur/bugs/292/ ) and will be looked into further.
Okay, I switched to LambdaMART, which seems to work fine for my problem right now. If you need more sample data, just let me know. Thanks for your time!
Maybe a couple of queries' worth of data, if not too large, would be useful. The little snippet of example data you provided is really too small to reproduce the failure definitively.