First of all, thanks for creating and maintaining this library. There are not many open-source tools that offer a collection of ranking algorithms.
I took a look at the sparse dataset support in RankLib and thought it could be made more efficient. I have attached a patch file that adds no new functionality; instead, it refactors the current code to make sparse datasets more efficient to load and access. I think this refactoring was on your roadmap anyway, but I thought I could lend a hand.
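For readers unfamiliar with the idea, here is a minimal sketch of sparse storage in general. It is not the code in the patch or in RankLib. It assumes input lines in the SVMlight/LETOR-style `label qid:Q id:value ... # comment` format, with feature ids in ascending order. The class name `SparseFeatureVector` and its methods are hypothetical. Only non-zero features are kept, in two parallel arrays sorted by feature id, and a lookup is a binary search. Memory therefore grows with the number of non-zero features rather than with the total feature count.

```java
import java.util.Arrays;

/** Hypothetical sparse feature vector (illustration only, not RankLib's class). */
final class SparseFeatureVector {
    private final int[] ids;      // non-zero feature ids, sorted ascending
    private final float[] values; // values aligned with ids

    private SparseFeatureVector(int[] ids, float[] values) {
        this.ids = ids;
        this.values = values;
    }

    /** Parses a line such as "2 qid:10 3:0.5 7:1.25 # doc1", keeping only non-zero features. */
    static SparseFeatureVector parse(String line) {
        int hash = line.indexOf('#');
        if (hash >= 0) line = line.substring(0, hash); // drop the trailing comment
        String[] tokens = line.trim().split("\\s+");
        int[] ids = new int[tokens.length];
        float[] values = new float[tokens.length];
        int n = 0;
        // tokens[0] is the relevance label; the qid token is skipped as well
        for (int i = 1; i < tokens.length; i++) {
            String tok = tokens[i];
            if (tok.startsWith("qid:")) continue;
            int colon = tok.indexOf(':');
            int fid = Integer.parseInt(tok.substring(0, colon));
            float v = Float.parseFloat(tok.substring(colon + 1));
            if (v == 0f) continue; // zero-valued features are not stored
            // assumption: ids are listed in ascending order, as the format expects
            if (n > 0 && fid <= ids[n - 1]) {
                throw new IllegalArgumentException("feature ids must be ascending: " + tok);
            }
            ids[n] = fid;
            values[n] = v;
            n++;
        }
        return new SparseFeatureVector(Arrays.copyOf(ids, n), Arrays.copyOf(values, n));
    }

    /** Returns the value of feature fid, or 0 if it was not stored. */
    float get(int fid) {
        int pos = Arrays.binarySearch(ids, fid);
        return pos >= 0 ? values[pos] : 0f;
    }

    /** Number of features actually stored. */
    int nonZeroCount() {
        return ids.length;
    }
}
```

For example, `SparseFeatureVector.parse("2 qid:10 3:0.5 7:1.25 # doc1").get(7)` returns `1.25`, and any feature id not listed on the line reads back as `0`.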
Thank you very much for your help. Could you give me the URL for your patch, along with instructions for applying it? I'll merge it into the main code base as soon as I find some free time.
Strangely, the patch file did not attach to the first post; hopefully it will attach this time. To apply it to your current SVN revision:
1. Copy the file to the RankLib/trunk folder.
2. In a Linux or Mac OS terminal, cd to RankLib/trunk.
3. Execute 'patch -p0 -i sparse_data.patch' without the quotes.
Thanks a lot, Siddhartha. I'll check it out as soon as I find some time.
Correction: the patch has not been added yet, because it conflicts with other changes in the current head.
The patch is now in. However, I had to change the access pattern to RANDOM because SEQUENTIAL isn't thread-safe: in some cases, different threads process different subsets of features at the same time, and this breaks the current SEQUENTIAL pattern. As a result, the speed-up for sparse vectors isn't clear at the moment. Still, this patch gives me a very good basis to build on. Thanks again :-)
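To make the thread-safety point concrete, here is a minimal sketch that contrasts two lookup strategies over a sorted sparse vector. It assumes that "SEQUENTIAL" means each lookup resumes from wherever the previous lookup stopped, and that "RANDOM" means each lookup is an independent binary search. This is one reading of the trade-off, not RankLib's actual implementation, and the class `SparseLookup` and its method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of SEQUENTIAL-style vs RANDOM-style lookups (not RankLib code). */
final class SparseLookup {
    private final int[] ids;      // sorted feature ids (private copy, never written again)
    private final float[] values; // values aligned with ids (private copy)
    private int cursor = 0;       // mutable state shared by every caller of getSequential

    SparseLookup(int[] ids, float[] values) {
        this.ids = ids.clone();
        this.values = values.clone();
    }

    /**
     * SEQUENTIAL-style lookup: resumes scanning from the last position, which is
     * cheap when features are requested in increasing id order. The shared cursor
     * makes it NOT thread-safe: two threads reading different feature subsets can
     * move the cursor under each other and read a wrong value or index out of bounds.
     */
    float getSequential(int fid) {
        // restart from the beginning if the cursor is already past fid
        if (cursor >= ids.length || ids[cursor] > fid) cursor = 0;
        while (cursor < ids.length && ids[cursor] < fid) cursor++;
        return (cursor < ids.length && ids[cursor] == fid) ? values[cursor] : 0f;
    }

    /**
     * RANDOM-style lookup: a stateless binary search. It only reads the private
     * arrays, which are never modified after construction, so concurrent calls are safe.
     */
    float getRandom(int fid) {
        int pos = Arrays.binarySearch(ids, fid);
        return pos >= 0 ? values[pos] : 0f;
    }
}
```

Single-threaded, both methods return the same values. The sequential version saves the logarithmic search cost when callers walk features in order, and that saving is exactly what the shared cursor buys. Keeping a separate cursor per thread (for example with `ThreadLocal`) would be one possible way to recover it, but that is only a design option and not what the thread describes.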