More robust version of getting the doc ID for a WARC record.
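For illustration only, here is a minimal Python sketch of one way a doc ID lookup for a WARC record could be made more tolerant. It is not the code from this change. The fallback order (WARC-TREC-ID, then WARC-Record-ID, then WARC-Target-URI) and the function names are assumptions made for the example.

```python
from typing import Dict, Optional

# Header names tried in order. WARC-TREC-ID is the ClueWeb-style document ID;
# the two fallbacks are an assumption for this sketch, not taken from the source.
ID_HEADERS = ("warc-trec-id", "warc-record-id", "warc-target-uri")


def parse_warc_headers(header_block: str) -> Dict[str, str]:
    """Parse the 'Name: value' lines of one WARC record header block."""
    headers: Dict[str, str] = {}
    for line in header_block.splitlines():
        if ":" not in line:
            continue  # skips the version line (e.g. 'WARC/1.0') and blank lines
        name, _, value = line.partition(":")
        # Header names are compared case-insensitively.
        headers[name.strip().lower()] = value.strip()
    return headers


def doc_id(header_block: str) -> Optional[str]:
    """Return the first non-empty ID header, or None if none is present."""
    headers = parse_warc_headers(header_block)
    for name in ID_HEADERS:
        value = headers.get(name)
        if value:
            # WARC-Record-ID values are wrapped in angle brackets.
            return value.strip("<>")
    return None
```

Returning None instead of raising lets the caller decide whether to skip the record or stop.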
What happens if you set <trecformat>true</trecformat>? What version of Indri are you using? The "doc" tags should be all caps.
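If the documents were written with lowercase tags, something like the following Python sketch could rewrite them before indexing. It is only a rough example: the function name is made up, and including DOCNO alongside DOC is an assumption based on the usual TREC layout, not something stated in the reply above.

```python
import re

# Tag names to normalise. DOC comes from the reply above; DOCNO is included
# as an assumption because TREC-formatted documents normally carry it too.
TAGS = ("doc", "docno")

# Matches <doc>, </doc>, <docno>, </docno> in any letter case.
_TAG_RE = re.compile(r"<(/?)(%s)>" % "|".join(TAGS), re.IGNORECASE)


def uppercase_trec_tags(text: str) -> str:
    """Return text with the listed tags rewritten in upper case."""
    return _TAG_RE.sub(
        lambda m: "<%s%s>" % (m.group(1), m.group(2).upper()), text
    )
```

Tags that only start with "doc", such as <document>, are left alone because the pattern requires the closing bracket right after the name.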
What happens if you set <trecformat>true</trecformat>?
RankLib
You should pass the "-output" path from harvestlinks in the <inlinks>...</inlinks> parameter when you build the index.
What parameters did you pass to harvestlinks?
This happened because the initial run used the parameter --filetype=trecweb and specified a galagoJobDir. That run failed because the documents were not in trecweb format. On later runs, even with the filetype set to trectext, Galago kept using the initial filetype saved in the galagoJobDir. Once the galagoJobDir folder was changed or removed, Galago used the new filetype and built the index.
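As a workaround until this is fixed, the stale job directory can be cleared before re-running the build with a different filetype. A minimal Python sketch, assuming the job directory is an ordinary local folder; the helper name is hypothetical and not part of Galago:

```python
import shutil
from pathlib import Path


def reset_job_dir(job_dir: str) -> bool:
    """Delete a leftover job directory; return True if something was removed.

    job_dir is whatever folder was passed as the galagoJobDir on the
    failed run (a hypothetical path supplied by the caller).
    """
    path = Path(job_dir)
    if path.is_dir():
        # Removes the folder and everything saved in it from the earlier run.
        shutil.rmtree(path)
        return True
    return False
```

Call it on the job directory from the failed run, then start the build again with the corrected filetype. Pointing the build at a fresh, empty job directory has the same effect.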
Galago filetype not handled correctly