The Lemur Project / Discussion / RankLib: RankLib error with normalization and saved models

Let me start by saying thanks for creating this toolkit and sharing it with the world.

I think I have hit a bug, however. With the downloadable binary, z-score normalization seems to break a saved model: the "MAP on test data" reported after training differs from the one reported after saving and reloading the model (this is all done on the MQ2008 data, Fold1):

$ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt

[+] General Parameters:
Training data:  train.txt
Test data:  test.txt
Feature vector representation: Dense.
Ranking method: RankBoost
Feature description file:   Unspecified. All features will be used.
Train metric:   MAP
Test metric:    MAP
Feature normalization: zscore
Model file: model.txt

[+] RankBoost's Parameters:

Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

Model saved to: model.txt

$ ranklib_jar -load model.txt -test test.txt -metric2T MAP

[+] General Parameters:
Model file: model.txt
Feature normalization: No
Test metric:    MAP
Model:      RankBoost
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.3902

Without normalization it works fine:

$ ranklib_jar -train train.txt -test test.txt -ranker 6 -silent -metric2t MAP -save model.txt

[+] General Parameters:
Training data:  train.txt
Test data:  test.txt
Feature vector representation: Dense.
Ranking method: LambdaMART
Feature description file:   Unspecified. All features will be used.
Train metric:   MAP
Test metric:    MAP
Feature normalization: No
Model file: model.txt

[+] LambdaMART's Parameters:

Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.4279

Model saved to: model.txt

$ ranklib_jar -load model.txt -test test.txt -metric2T MAP

[+] General Parameters:
Model file: model.txt
Feature normalization: No
Test metric:    MAP
Model:      LambdaMART
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.4279

Note that when I try to do the same normalization during testing, I get an error:

$ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP

[+] General Parameters:
Model file: model.txt
Feature normalization: zscore
Test metric:    MAP
Model:      LambdaMART
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
Exception in thread "main" java.lang.NullPointerException
    at ciir.umass.edu.features.ZScoreNormalizor.normalize(Unknown Source)
    at ciir.umass.edu.eval.Evaluator.normalize(Unknown Source)
    at ciir.umass.edu.eval.Evaluator.test(Unknown Source)
    at ciir.umass.edu.eval.Evaluator.main(Unknown Source)

This error is fixed in the SVN version. However, compiling the trunk and using that still yields different scores:

$ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt
[+] General Parameters:
Training data:  train.txt
Test data:  test.txt
Feature vector representation: Dense.
Ranking method: RankBoost
Feature description file:   Unspecified. All features will be used.
Train metric:   MAP
Test metric:    MAP
Feature normalization: zscore
Model file: model.txt

[+] RankBoost's Parameters:

Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

Model saved to: model.txt

$ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP

[+] General Parameters:
Model file: model.txt
Feature normalization: zscore
Test metric:    MAP
Model:      RankBoost
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.3937

Could it be that the zscore normalization is applied to the combination of training and testing instances?
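
To make the question concrete, here is a minimal sketch in plain Python (not RankLib's code) of how the z-scores of the same test values change depending on which set the mean and standard deviation are taken from. The feature values and the `zscore` helper are made up for illustration; nothing here claims to reflect how RankLib actually computes its statistics.

```python
from statistics import mean, pstdev

def zscore(values, reference):
    """Normalize `values` with the mean and std computed over `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        # Constant feature: nothing to scale, map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical values of a single feature (assumption, not MQ2008 data).
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0, 14.0]

test_only = zscore(test, test)        # statistics from the test data alone
pooled = zscore(test, train + test)   # statistics from train and test combined

print(test_only)
print(pooled)
```

With statistics taken from the test data alone, the values are centred on zero. With pooled statistics, every test value lands above the pooled mean, so a model would see quite different inputs in the two cases.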

David Fisher - 2013-11-14

Added as a bug (https://sourceforge.net/p/lemur/bugs/221/). Thanks for reporting it.

Van Dang - 2013-11-19

This issue has been fixed. Please do an update from trunk and let us know if it's really gone.

Edgar Meij - 2013-11-21

Yep, this is fixed now.

Van Dang - 2013-11-21

Sounds good. Thanks for confirming.

Mohammad A - 2016-09-16

I'm still experiencing the same issue using this version: 2.1-patched-2

Lemur Project - 2016-09-19

Please use a more recent version.

Current release version is RankLib-2.7.

yonatan - 2016-11-07

I am having the same problem with version 2.7, taken from here: https://sourceforge.net/projects/lemur/files/lemur/RankLib-2.7/

Lemur Project - 2016-11-09

I cannot reproduce the difference in scores between a model trained and saved with normalization and the same model loaded and tested with normalization. I am also definitely not getting the NullPointerException reported in the original problem from 2013.

I would not expect scores to match when normalization is applied on only one side, for example when a model saved without normalization is loaded and run on normalized data.
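
As a rough illustration of why a mismatch in normalization changes scores, here is a toy sketch in Python. It is not RankBoost or any RankLib code: `threshold_score` is a hypothetical weak ranker that fires when a feature exceeds a threshold, the threshold of 0.0 stands in for a value learned on z-scored data, and the feature values are invented.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score a list of values using its own mean and std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]

def threshold_score(x, threshold=0.0, weight=1.0):
    """Toy weak ranker: contributes `weight` when the feature exceeds `threshold`."""
    return weight if x > threshold else 0.0

# Hypothetical raw values of one feature for the documents of one query.
raw = [3.0, 7.0, 5.0, 9.0]

scores_normed = [threshold_score(x) for x in zscore(raw)]  # same scale as training
scores_raw = [threshold_score(x) for x in raw]             # normalization skipped

print(scores_normed)  # [0.0, 1.0, 0.0, 1.0]
print(scores_raw)     # [1.0, 1.0, 1.0, 1.0]
```

On the normalized input the threshold separates the documents. On the raw input every document clears it, all scores tie, and the ranking information is lost, which is the kind of effect that would show up as a different MAP.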

I have run save/load tests using RankLib-2.3, the version in which the original problem was reportedly fixed, the RankLib-2.7 jar from the downloads page, and a current jar built from RankLib-2.8-SNAPSHOT sources.

Have I misunderstood your reported problem, or inadequately attempted to reproduce the problem?

Below is a listing of the testing I did based on what was done in the original error report as well as the Bug 221 report ( https://sourceforge.net/p/lemur/bugs/221/ ). I used MQ2008/Fold1 data for the runs.

// Downloaded RankLib-2.7
// Create the normed model
$ java -jar RankLib-2.7-download.jar -train train.txt -test test.txt -ranker 2 -norm zscore \
-silent -metric2t MAP -save download-2.7-save.txt

[+] General Parameters:
Training data: train.txt
Test data: test.txt
Feature vector representation: Dense.
Ranking method: RankBoost
Feature description file: Unspecified. All features will be used.
Train metric: MAP
Test metric: MAP
Feature normalization: zscore
Model file: download-2.7-save.txt

[+] RankBoost's Parameters:

Reading feature file [train.txt]: 0...
Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

Model saved to: download-2.7-save.txt

// Run loaded normed model
$ java -jar RankLib-2.7-download.jar -test test.txt -norm zscore \
-silent -metric2T MAP -load download-2.7-save.txt

[+] General Parameters:
Model file: download-2.7-save.txt
Feature normalization: zscore
Test metric: MAP
Model: RankBoost

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

// Current RankLib-2.8-SNAPSHOT
// Create normed model
$ java -jar RankLib-2.8-SNAPSHOT.jar -train train.txt -test test.txt -ranker 2 \
-norm zscore -silent -metric2t MAP -save RL2.8-save.txt

[+] General Parameters:
Training data: train.txt
Test data: test.txt
Feature vector representation: Dense.
Ranking method: RankBoost
Feature description file: Unspecified. All features will be used.
Train metric: MAP
Test metric: MAP
Feature normalization: zscore
Model file: RL2.8-save.txt

[+] RankBoost's Parameters:

Reading feature file [train.txt]: 0...
Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

Model saved to: RL2.8-save.txt

// Run loaded normed model
$ java -jar RankLib-2.8-SNAPSHOT.jar -test test.txt -norm zscore \
-silent -metric2T MAP -load RL2.8-save.txt

[+] General Parameters:
Model file: RL2.8-save.txt
Feature normalization: zscore
Test metric: MAP
Model: RankBoost

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

// RankLib-2.3 in which Bug 221 was reported fixed.
// Create normed model
$ java -jar RankLib-2.3.jar -train train.txt -test test.txt -ranker 2 -norm zscore \
-silent -metric2t MAP -save RL2.3-save.txt

[+] General Parameters:
Training data: train.txt
Test data: test.txt
Feature vector representation: Dense.
Ranking method: RankBoost
Feature description file: Unspecified. All features will be used.
Train metric: MAP
Test metric: MAP
Feature normalization: zscore
Model file: RL2.3-save.txt

[+] RankBoost's Parameters:

Reading feature file [train.txt]: 0...
Reading feature file [train.txt]... [Done.]
(471 ranked lists, 9630 entries read)

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453

Model saved to: RL2.3-save.txt

// Run loaded normed model
$ java -jar RankLib-2.3.jar -test test.txt -norm zscore \
-silent -metric2T MAP -load RL2.3-save.txt

[+] General Parameters:
Model file: RL2.3-save.txt
Feature normalization: zscore
Test metric: MAP
Model: RankBoost

Reading feature file [test.txt]: 0...
Reading feature file [test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
MAP on test data: 0.453
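
For anyone repeating these save/load checks, comparing the two runs by eye can be replaced with a small script. The following is a minimal sketch, assuming the console output has been captured as text and contains a line of the form "MAP on test data: <value>" as in the listings above. The helper names `metric_on_test` and `same_score` and the tolerance are hypothetical, not part of RankLib.

```python
import re

def metric_on_test(output, metric="MAP"):
    """Return the value from a '<metric> on test data: <value>' line, or None."""
    pattern = re.escape(metric) + r" on test data:\s*([0-9]*\.?[0-9]+)"
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None

def same_score(train_output, load_output, metric="MAP", tol=1e-4):
    """True when the train-time and load-time test scores agree within `tol`."""
    a = metric_on_test(train_output, metric)
    b = metric_on_test(load_output, metric)
    if a is None or b is None:
        raise ValueError("no '%s on test data' line found" % metric)
    return abs(a - b) <= tol

# Output fragments copied from this thread.
train_log = "(156 ranked lists, 2874 entries read)\nMAP on test data: 0.453\n"
load_log = "(156 ranked lists, 2874 entries read)\nMAP on test data: 0.3902\n"

print(same_score(train_log, train_log))  # True
print(same_score(train_log, load_log))   # False
```

The first comparison corresponds to the matching runs listed above, the second to the mismatch in the original report.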
