Possible bug in RankLib LambdaMART algorithm implementation.
LambdaMART generates OutOfMemoryError exceptions in some processing threads when run on large training data; in particular, with a data set in which one query contains 500,000 documents.
Debugging checks included the following:
o Plenty of RAM on the host (768 GB); requested 500 GB via the Java -Xmx argument
o No system memory limits on processes
o Setting the Java -Xms value to match -Xmx still failed
o Running LambdaMART with NDCG instead of ERR still failed
o No problem using a different algorithm (MART with ERR or NDCG)
o No problem using a reduced data set with ERR or NDCG
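For reference, the kind of invocation described above can be sketched as below. The -train/-ranker/-metric2t/-save flags are the standard RankLib CLI; the file names are placeholders and the exact heap sizes are taken from the report.

```shell
# Ranker 6 = LambdaMART (ranker 0 = MART, which did not fail).
# -Xms set equal to -Xmx, as in the debugging checks above.
java -Xmx500g -Xms500g -jar RankLib.jar \
    -train train.txt \
    -ranker 6 \
    -metric2t NDCG@10 \
    -save model.txt
```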
So the problem appears to lie in the LambdaMART implementation, since it shows up only with a data set containing a very high number of documents (almost 500k) for a specific query.
Note: it is possible this is not actually a bug but an implementation trade-off in which speed was favored over memory usage. The algorithm simply might not be scalable.
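A back-of-envelope calculation is consistent with this scalability hypothesis: LambdaMART's lambdas are defined over pairs of documents within a query, so any structure materialized per pair grows quadratically with query size. The sketch below is hypothetical — it assumes roughly one 8-byte value stored per within-query document pair, which has not been verified against the RankLib source — but it shows why 500,000 documents in one query could exhaust even a 500 GB heap while 45,661 documents fit easily.

```java
// Back-of-envelope estimate of per-query pairwise memory in a
// LambdaMART-style implementation. Assumption (unverified): the
// implementation stores one 8-byte value per document pair.
public class PairMemoryEstimate {

    // Number of unordered document pairs within a single query.
    static long pairCount(long docsPerQuery) {
        return docsPerQuery * (docsPerQuery - 1) / 2;
    }

    // Estimated bytes if one 8-byte value is stored per pair.
    static long pairBytes(long docsPerQuery) {
        return pairCount(docsPerQuery) * 8L;
    }

    public static void main(String[] args) {
        long small = 45_661L;   // query size reported as working
        long large = 500_000L;  // query size reported as failing
        System.out.printf("%,d docs -> %,d pairs (~%d GiB)%n",
                small, pairCount(small), pairBytes(small) >> 30);
        System.out.printf("%,d docs -> %,d pairs (~%d GiB)%n",
                large, pairCount(large), pairBytes(large) >> 30);
    }
}
```

Under this assumption, 45,661 documents give about 1 billion pairs (~8 GiB), while 500,000 documents give about 125 billion pairs (~930 GiB) — well beyond a 500 GB heap.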
The bug also manifests itself with a data set containing 68,491 documents for one query, but it does not appear with 45,661 documents.
Thanks, Diego. So if the number of documents for each query is less than 45,661, the bug is not triggered even if there are thousands of queries? And is this fixed in newer versions of RankLib?
Last edit: Surabhi Amit Chembra 2020-09-08
Ehm... I have no memory of the problem other than the mail I sent.
On Tue, 8 Sep 2020 at 20:51, Surabhi Amit Chembra schembra@users.sourceforge.net wrote:
Related
Bugs: #294
Thanks Diego. Appreciate the response.
Last edit: Surabhi Amit Chembra 2020-09-09