I ran into a strange problem with Indri when using indexes built with
an older version. I'm not sure exactly when the change occurred, but
if I do a default BOW run of queries 701 -- 850 on the GOV2
collection, there is a big difference in the results between v5.0
and v5.5.
Indri 5.5 - default config
map all 0.2326
P_10 all 0.4886
ndcg all 0.4836
ndcg_cut_10 all 0.3995
Indri 5.0 - default config
map all 0.2805
P_10 all 0.5678
ndcg all 0.5578
ndcg_cut_10 all 0.4646
Notes: I did not use the HTML parser in Indri. Instead, I used
Boilerpipe to generate plaintext and then wrapped the documents in the
old trectext style. I then checked the manifest file and saw that my
index had been built with Indri 5.0 a while ago.
Next, I reindexed the text collection using the new v5.5 indexer. Now
when I run IndriRunQuery v5.5, I get the following scores:
map all 0.2816
P_10 all 0.5577
ndcg all 0.5570
ndcg_cut_10 all 0.4579
It is also a little surprising that the scores change slightly
between 5.0 and 5.5, but maybe that is known. I expected them to
be rank equivalent.
I think, but have not fully tested, that if you build an index using
v5.2 of Indri, then using IndriRunQuery v5.5 returns the same results.
Just thought I should let you know. Maybe version 5.5 or 5.6 should
print a warning to stderr if the index version is too old? I have to
assume there was some sort of index format change that happened
after 5.0.
To ship in 12/2013 release.
Throw an Exception when the major version numbers differ, or when the major version number is 5 and the minor is less than 3 (special case), or when the minor versions differ by more than 4 (about 2-1/2 years of releases).
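
The rule above could be sketched roughly as follows. This is only an illustration in Python, not Indri's actual code: the names IndexVersionError, check_index_version and is_compatible are hypothetical, versions are passed as (major, minor) tuples, and it assumes the "minor less than 3" special case refers to the version that built the index.

```python
class IndexVersionError(Exception):
    """Hypothetical exception for an index judged incompatible."""


def check_index_version(index_version, runtime_version):
    """Raise IndexVersionError if the index should be rejected.

    Both arguments are (major, minor) tuples, e.g. (5, 0) for v5.0.
    """
    index_major, index_minor = index_version
    runtime_major, runtime_minor = runtime_version

    # Different major versions are never accepted.
    if index_major != runtime_major:
        raise IndexVersionError("major version mismatch")

    # Special case (assumed to apply to the index): 5.x indexes older than 5.3.
    if index_major == 5 and index_minor < 3:
        raise IndexVersionError("index built before v5.3")

    # More than 4 minor versions apart.
    if abs(runtime_minor - index_minor) > 4:
        raise IndexVersionError("minor versions differ by more than 4")


def is_compatible(index_version, runtime_version):
    """Boolean wrapper around check_index_version, for convenience."""
    try:
        check_index_version(index_version, runtime_version)
    except IndexVersionError:
        return False
    return True
```

Under these assumptions, the case from the report (an index built with v5.0, queried with v5.5) would be rejected by the special case, while an index built with v5.3 and queried with v5.5 would be accepted.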