Indexing the ClueWeb12 CatA

feige
2014-07-29
2014-07-29
  • feige

    feige - 2014-07-29

    Hi, I want to index the ClueWeb12 CatA data set using Indri. However, my disk capacity is limited, so the disk cannot hold the data set once it is decompressed. Can I build the index without decompressing the original ClueWeb12 data?

     
  • David Fisher

    David Fisher - 2014-07-29

    See: http://www.lemurproject.org/clueweb12/indri-howto.php

    Indri can process the compressed files directly. They do not need to be decompressed.

    See also the storeDocs parameter (described on the page above), which can be used to reduce the total space required.
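
As a rough illustration of the advice above, here is a minimal sketch that generates an IndriBuildIndex parameter file pointing at the compressed WARC files and turning off document storage. The helper name `build_indri_params`, the placeholder paths, and the `2G` memory value are assumptions made for this example; the element names (`index`, `memory`, `storeDocs`, `corpus`, `path`, `class`) and the `warc` class are written from memory of the Indri documentation, so check them against the how-to page linked above before relying on them.

```python
import xml.etree.ElementTree as ET


def build_indri_params(index_dir, corpus_dir, memory="2G", store_docs=False):
    """Return the text of an IndriBuildIndex parameter file (hypothetical helper)."""
    params = ET.Element("parameters")
    # Where the index will be written (placeholder path supplied by the caller).
    ET.SubElement(params, "index").text = index_dir
    # Memory budget for the indexer; "2G" is an assumed value, tune to your machine.
    ET.SubElement(params, "memory").text = memory
    # storeDocs=false asks Indri not to keep a copy of each document in the index,
    # which reduces the total space required.
    ET.SubElement(params, "storeDocs").text = "true" if store_docs else "false"
    corpus = ET.SubElement(params, "corpus")
    # Directory holding the original compressed files, left as distributed.
    ET.SubElement(corpus, "path").text = corpus_dir
    # Assumed file class name for WARC input.
    ET.SubElement(corpus, "class").text = "warc"
    return ET.tostring(params, encoding="unicode")


if __name__ == "__main__":
    # Write the parameter file; the paths here are placeholders.
    with open("params.xml", "w", encoding="utf-8") as fh:
        fh.write(build_indri_params("/path/to/index", "/path/to/ClueWeb12"))
```

The resulting file would then be passed to the indexer, for example `IndriBuildIndex params.xml`. With `storeDocs` set to false the index cannot return the original document text, so this only suits setups that do not need it.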

     
    • feige

      feige - 2014-07-29

      Thanks for your timely reply, I will try your suggestions.

       
