Sparse datasets support

2013-11-02
2013-12-14
  • Siddhartha Bagaria

    First of all, thanks for creating and maintaining this library. There are not many open source tools which offer a collection of ranking algorithms.

    I took a look at the sparse dataset support in RankLib and thought it could be made more efficient. I have attached a patch file that does not add any new functionality but refactors the current code to make loading and accessing sparse datasets more efficient. I think you had this refactoring on your roadmap anyway, but I thought I could lend a hand.

     
    • Van Dang

      Van Dang - 2013-11-02

      Siddhartha,

      Thank you very much for your help. Could you give me the URL for your patch as well as the patching instructions? I'll merge it into the main code as soon as I find some free time.

       
  • Siddhartha Bagaria

    Hi!

    Strange that the patch file did not attach to the first post. Hopefully, it will attach this time. To patch your current SVN revision:
    1. Copy the file to the RankLib/trunk folder.
    2. In a Linux or Mac OS terminal, cd to RankLib/trunk.
    3. Execute 'patch -p0 -i sparse_data.patch' without the quotes.

    That's it.

     
  • Van Dang

    Van Dang - 2013-11-05

    Thanks a lot, Siddhartha. I'll check it out as soon as I find some time.

     
  • David Fisher

    David Fisher - 2013-11-19

    Added.

    Correction: not added. The patch conflicts with other changes in the current head.

     
    Last edit: David Fisher 2013-11-19
  • Van Dang

    Van Dang - 2013-12-14

    The patch is now in. However, I had to change the access pattern to RANDOM because SEQUENTIAL isn't thread-safe: in some cases, different threads process different subsets of features at the same time, which breaks the current SEQUENTIAL pattern. So the speedup for sparse vectors isn't clear at the moment. Still, this patch gives me a very good basis to build on. Thanks again :-)
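The thread-safety problem with a SEQUENTIAL access pattern can be illustrated with a small sketch. This is not RankLib's code or the patch's implementation: the class `SparseVectorSketch` and its methods are hypothetical, and it assumes a sparse vector stored as parallel arrays of sorted feature ids and values. "Random" access does a stateless binary search, while "sequential" access advances a shared cursor and assumes callers request feature ids in increasing order. Real threads would race nondeterministically, so the interleaving of two workers is simulated in one thread to keep the output reproducible.

```java
import java.util.Arrays;

/** Hypothetical sparse vector (not RankLib's class): only non-zero features are stored. */
public class SparseVectorSketch {
    private final int[] fids;      // feature ids with non-zero values, sorted ascending
    private final float[] values;  // values aligned with fids
    private int cursor = 0;        // mutable state shared by every caller of getSequential

    public SparseVectorSketch(int[] fids, float[] values) {
        this.fids = fids.clone();
        this.values = values.clone();
    }

    /** Random-style access: binary search with no mutable state, so concurrent reads are safe. */
    public float getRandom(int fid) {
        int pos = Arrays.binarySearch(fids, fid);
        return pos >= 0 ? values[pos] : 0f;
    }

    /**
     * Sequential-style access: moves the cursor forward and never backward, assuming
     * ids are requested in increasing order. A request for an id the cursor has
     * already passed wrongly returns 0.
     */
    public float getSequential(int fid) {
        while (cursor < fids.length && fids[cursor] < fid) {
            cursor++;
        }
        return (cursor < fids.length && fids[cursor] == fid) ? values[cursor] : 0f;
    }

    /** Rewinds the cursor before a new in-order pass over the features. */
    public void resetCursor() {
        cursor = 0;
    }

    public static void main(String[] args) {
        SparseVectorSketch v = new SparseVectorSketch(
                new int[] {1, 3, 7, 10}, new float[] {0.5f, 2.0f, 1.5f, 4.0f});

        // A single caller scanning features in increasing order gets correct values.
        v.resetCursor();
        System.out.println(v.getSequential(3) + " " + v.getSequential(7)); // 2.0 1.5

        // Simulated interleaving of two workers sharing the same cursor.
        v.resetCursor();
        float a = v.getSequential(10); // worker A jumps ahead to feature 10
        float b = v.getSequential(3);  // worker B asks for feature 3, which the cursor has passed
        System.out.println(a + " " + b + " vs random " + v.getRandom(3)); // 4.0 0.0 vs random 2.0
    }
}
```

The sketch shows why switching to random access removes the hazard: `getRandom` only reads arrays that are never modified after construction, whereas `getSequential` depends on a cursor that any caller can move. The price is a binary search per lookup instead of an amortized constant-time cursor step, which is consistent with the speedup for sparse vectors becoming less clear.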

     
