Hi everyone, I know that pointwise and pairwise algorithms have their own loss functions, for example one defined on the number of misclassified pairs, and as I expected the RankLib tutorial clearly states that the -metric2t parameter doesn't affect the results for them. But I'm actually observing the opposite: if I run MART several times with different training metrics as the parameter, I get different results. How is this possible?
Thank you very much
What do you mean by different results? Different metric improvements?
Not sure which metrics you are using, but to be fully valid, the MAP and NDCG metrics need a query relevance file covering all relevant documents, not just those inside a cutoff point.
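To illustrate why the cutoff matters: the ideal DCG in the NDCG denominator has to be computed from all relevant documents for the query, not only those the system returned within the cutoff. A minimal sketch (my own illustration, not RankLib's implementation):

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k results."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, all_relevant_gains, k):
    """NDCG@k: the ideal DCG is computed from ALL relevant documents
    for the query, not just those returned within the cutoff."""
    ideal = dcg_at_k(sorted(all_relevant_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# A query with three relevant documents (gains 3, 2, 1), where the
# system returned only the two least relevant ones in its top 2.
returned = [1, 2]          # gains in ranked order
all_relevant = [3, 2, 1]   # full qrels for the query
print(ndcg_at_k(returned, all_relevant, 2))  # ≈ 0.531
```

If the ideal DCG were computed only from the returned documents, the same ranking would score noticeably higher, which is the distortion an incomplete qrels file introduces.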
Sure. I mean that I get different scores for my documents when running the same algorithm (e.g. MART) several times with different -metric2t values (e.g. NDCG, ERR, and MAP), even though this parameter shouldn't affect the results given the nature of GBRT. Or am I wrong?
Different metrics to optimize will produce at least slightly different models, and thus different scores. There is no normalization across the optimizations produced by different metrics that I am aware of.
The produced models are text files, so you can actually look at the thresholds and weights of the trees in each model. You will see that the models differ depending on which metric was used to train them.
It would therefore be normal to end up with different scores.
Important Note: -metric2t (e.g. NDCG, ERR, etc.) only applies to list-wise algorithms (AdaRank, Coordinate Ascent, and LambdaMART). Point-wise and pair-wise techniques (MART, RankNet, RankBoost), due to their nature, always use their internal RMSE / pair-wise loss as the optimization criterion. Thus, -metric2t has no effect on them.
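As an illustration of the pair-wise criterion mentioned in the note, here is a sketch of the general idea (not RankLib's actual implementation): the loss counts document pairs whose predicted score order contradicts their relevance order, and no retrieval metric like NDCG or ERR appears anywhere in it.

```python
from itertools import combinations

def misclassified_pairs(scores, labels):
    """Count pairs (i, j) where the more relevant document receives
    the lower model score -- the quantity a pair-wise ranker such as
    RankNet or RankBoost tries to reduce."""
    bad = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] != labels[j]:
            more, less = (i, j) if labels[i] > labels[j] else (j, i)
            if scores[more] < scores[less]:
                bad += 1
    return bad

# Scores from a hypothetical model vs. graded relevance labels:
# only the (0.2, 0.5) pair is ordered against its labels.
print(misclassified_pairs([0.9, 0.2, 0.5], [2, 1, 0]))  # → 1
```

Because this objective never consults -metric2t, two runs of a pure pair-wise ranker should not change their training behavior based on that flag, which is what the note claims.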
Tomorrow I'll look at the source to better understand what's happening. Thank you :)
Hi everyone,
I tried to run the RankNet [1] algorithm, and if I don't specify the training metric (via the -metric2t option), the default seems to be ERR@10, while if I do specify a training metric, the algorithm seems to use it.
So, I don't understand the note either.
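For reference, a sketch of the ERR@k measure that shows up as the default here, following the standard expected reciprocal rank definition (parameter names are my own, and this is an illustration, not RankLib's code — a tool can report such a metric during training while still optimizing a different internal loss):

```python
def err_at_k(gains, k, max_grade=4):
    """Expected reciprocal rank truncated at rank k.
    stop = (2^g - 1) / 2^max_grade is the probability the user is
    satisfied (stops) at a result with relevance grade g."""
    p_continue = 1.0  # probability the user reaches the current rank
    err = 0.0
    for rank, g in enumerate(gains[:k], start=1):
        stop = (2 ** g - 1) / (2 ** max_grade)
        err += p_continue * stop / rank
        p_continue *= 1 - stop
    return err

# A perfect (grade-4) result first, then a non-relevant and a fair one.
print(err_at_k([4, 0, 2], 3))
```

The metric only depends on the ranking a model produces; whether the training loop actually uses it as the optimization target is exactly the question raised in this thread.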
After calling this command:
with models trained with MART using different -metric2t values, I get different scores.
Then I don't understand this note.