Is there an option to specify stopword list as a file, instead of specifying them in the parameter file?
No.
Question 1:
In my parameter file, I have these stopwords specified:
<stopper>
<word>stopword1</word>
<word>stopword2</word>
</stopper>
Can I save stopword1 and stopword2 in one stopword file and specify this file in my parameter file as follows:
<stopper>
<word>stopword</word>
</stopper>
or
<stopper>stopword</stopper>
Question 2:
How can I confirm that my stopwords were removed during indexing? Can I check this in the manifest file after Indri builds the index?
Thanks.
Create a stopword parameter file with this format:
<parameters>
<stopper>
<word>stopword1</word>
<word>stopword2</word>
</stopper>
</parameters>
Then, when you build your index, you can use both your parameter file and your stopword parameter file with IndriBuildIndex.
IndriBuildIndex <parameterfile> <stopwordsfile>
The manifest file will show the stopwords that were used with the indexing.
For a more detailed description, see the example here under the heading "Indexing and Parameters": http://lemurproject.org/clueweb09/indri-howto.php
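If the stopwords already live in a plain-text list, the stopword parameter file described above could be generated rather than typed by hand. The following is only a minimal sketch: the helper names (`load_stopwords`, `write_stopword_parameters`, `read_stopper_words`) and the one-word-per-line input format are assumptions made for illustration and are not part of Indri. The reader function simply collects `<word>` entries under any `<stopper>` element; it might also work on the manifest if the manifest uses the same layout, which this thread does not confirm.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_stopwords(text_path):
    """Read one stopword per line (assumed format), skipping blank lines."""
    with open(text_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_stopword_parameters(words, xml_path):
    """Write <parameters><stopper><word>...</word></stopper></parameters>."""
    root = ET.Element("parameters")
    stopper = ET.SubElement(root, "stopper")
    for word in words:
        ET.SubElement(stopper, "word").text = word
    # No XML declaration, to match the parameter-file examples in this thread.
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=False)


def read_stopper_words(xml_path):
    """Return the text of every <word> directly under any <stopper> element."""
    root = ET.parse(xml_path).getroot()
    return [w.text for s in root.iter("stopper") for w in s.findall("word")]


if __name__ == "__main__":
    # Hypothetical file names, created in a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        list_path = os.path.join(workdir, "stopwords.txt")
        xml_path = os.path.join(workdir, "stopwords.xml")
        with open(list_path, "w", encoding="utf-8") as handle:
            handle.write("stopword1\nstopword2\n")
        write_stopword_parameters(load_stopwords(list_path), xml_path)
        print(read_stopper_words(xml_path))
```

The generated file can then be passed to IndriBuildIndex after the main parameter file, as in the command shown above.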
Last edit: David Pane 2014-03-16
Great, Thanks!
Instead of using a cluster of servers to build a single index with IndriBuildIndex, is it possible to merge two already-built indices into a single index?
You can create a parameter file containing the paths to the indexes:
<parameters>
<index>/path/to/first/index</index>
<index>/path/to/second/index</index>
</parameters>
You can then pass the parameter file as an argument when using IndriRunQuery. All indexes specified in the parameter file will be used.
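When there are many index shards, the parameter file listing them could be produced by a short script. This is a rough sketch under stated assumptions: the function name `write_index_parameters` and the example paths are hypothetical, and the script only writes the `<parameters>`/`<index>` layout shown above; it does not call Indri itself.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_index_parameters(index_paths, xml_path):
    """Write a parameter file with one <index> element per index path."""
    root = ET.Element("parameters")
    for path in index_paths:
        ET.SubElement(root, "index").text = path
    # No XML declaration, to match the parameter-file examples in this thread.
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=False)


def read_index_paths(xml_path):
    """Read the <index> entries back, e.g. to double-check the file."""
    root = ET.parse(xml_path).getroot()
    return [element.text for element in root.findall("index")]


if __name__ == "__main__":
    # Placeholder paths taken from the example above.
    shards = ["/path/to/first/index", "/path/to/second/index"]
    with tempfile.TemporaryDirectory() as workdir:
        xml_path = os.path.join(workdir, "indexes.xml")
        write_index_parameters(shards, xml_path)
        print(read_index_paths(xml_path))
```

The resulting file is then passed to IndriRunQuery as described above, alongside whatever parameter file holds the queries.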
Thanks.
Do you mean that, when ranking indexed documents, the ranking score is exactly the same whether I use a single index or a group of individual indices? In other words, is the term frequency for the same collection the same regardless of whether a single index or a group of indices is used?
Yes.