Hi, I want to index the ClueWeb12 CatA data set using Indri. However, my disk capacity is limited, so the disk cannot hold the data set once it has been decompressed. My question is: can I build the index without decompressing the original ClueWeb12 data?
Indri can process the compressed files directly. They do not need to be decompressed.
See also the storeDocs parameter (in the above page), which can be used to reduce the total space required.
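As a rough illustration of the two points above, a parameter file for IndriBuildIndex might look like the sketch below. The paths and the memory value are placeholders, not values from this thread, and the `warc` class is an assumption about how the ClueWeb12 files are declared; check the parameter documentation for your Indri version before relying on it.

```xml
<parameters>
  <!-- Hypothetical output location for the index -->
  <index>/path/to/index</index>
  <!-- Hypothetical memory budget for the indexer -->
  <memory>2G</memory>
  <corpus>
    <!-- Hypothetical location of the compressed ClueWeb12 files, left as distributed -->
    <path>/path/to/ClueWeb12</path>
    <!-- Assumed file class for the WARC-format files -->
    <class>warc</class>
  </corpus>
  <!-- Do not keep a copy of the document text inside the index, to save space -->
  <storeDocs>false</storeDocs>
</parameters>
```

Saved as, say, build_param.xml, this would be passed to the indexer as `IndriBuildIndex build_param.xml`. With storeDocs set to false the original document text cannot be retrieved from the index afterwards, so this only fits if you do not need it.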
Thanks for your timely reply. I will try your suggestions.