
Can we specify stop words as a file?

vinnie
2013-04-20
2014-03-21
  • vinnie

    vinnie - 2013-04-20

    Is there an option to specify the stopword list as a separate file, instead of listing the words in the parameter file?

     
  • David Fisher

    David Fisher - 2013-04-21

    No.

     
  • rosemary

    rosemary - 2014-03-16

    Question 1:

    In my parameter file, I have specified these stopwords:
    <stopper>
    <word>stopword1</word>
    <word>stopword2</word>
    </stopper>

    Can I save stopword1 and stopword2 in one stopword file and specify this file in my parameter file as follows:
    <stopper>
    <word>stopword</word>
    </stopper>

    or

    <stopper>stopword</stopper>

    Question 2:
    How can I verify that my stopwords were removed during indexing? Can I check this in the manifest file after Indri builds the index?

    Thanks.

     
  • David Pane

    David Pane - 2014-03-16

    Create a stopword parameter file with this format:

    <parameters>
    <stopper>
    <word>stopword1</word>
    <word>stopword2</word>
    </stopper>
    </parameters>

    Then, when you build your index, you can use both your parameter file and your stopword parameter file with IndriBuildIndex.

    IndriBuildIndex <parameterfile> <stopwordsfile>

    The manifest file will show the stopwords that were used during indexing.

    For a more detailed description, see the example here under the heading "Indexing and Parameters": http://lemurproject.org/clueweb09/indri-howto.php
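
For anyone who keeps stopwords in a plain text file (one word per line), the stopword parameter file described above can be generated rather than written by hand. The following is a minimal sketch, not part of Indri itself: the function names and the file names `stopwords.txt` and `stopwords.xml` are hypothetical, and the only format assumed is the `<parameters>`/`<stopper>`/`<word>` layout shown in this post.

```python
from xml.sax.saxutils import escape


def read_stopwords(path):
    """Read one stopword per line, skipping blank lines and duplicates."""
    seen = set()
    words = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word and word not in seen:
                seen.add(word)
                words.append(word)
    return words


def stopwords_to_parameters(words):
    """Return the text of a stopword parameter file for the given words."""
    lines = ["<parameters>", "<stopper>"]
    # escape() protects characters such as & and < inside a word.
    lines += [f"<word>{escape(word)}</word>" for word in words]
    lines += ["</stopper>", "</parameters>"]
    return "\n".join(lines) + "\n"


def convert(list_path, output_path):
    """Convert a plain stopword list (hypothetical file) to a parameter file."""
    text = stopwords_to_parameters(read_stopwords(list_path))
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Calling `convert("stopwords.txt", "stopwords.xml")` would produce a file that can then be passed to IndriBuildIndex alongside the main parameter file, as in the command above.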

     

    Last edit: David Pane 2014-03-16
  • rosemary

    rosemary - 2014-03-18

    Great, Thanks!

    Instead of using a cluster of servers to build a single index with IndriBuildIndex, is it possible to merge two existing indices into a single index?

     
  • David Pane

    David Pane - 2014-03-19

    You can create a parameter file containing the paths to the indexes:

    <parameters>
    <index>/path/to/first/index</index>
    <index>/path/to/second/index</index>
    </parameters>

    You can then pass the parameter file as an argument when using IndriRunQuery. All indexes specified in the parameter file will be used.
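
When there are many separately built indexes, the parameter file above can be generated from a list of paths. This is a small sketch under the assumption that only the `<index>` elements shown in this post are needed; the function name `multi_index_parameters` is hypothetical and the paths are the placeholders from the example.

```python
from xml.sax.saxutils import escape


def multi_index_parameters(index_paths):
    """Return the text of a parameter file listing one <index> per path."""
    lines = ["<parameters>"]
    # Each index path becomes its own <index> element, in the order given.
    lines += [f"<index>{escape(path)}</index>" for path in index_paths]
    lines.append("</parameters>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(multi_index_parameters([
        "/path/to/first/index",
        "/path/to/second/index",
    ]), end="")
```

The printed text can be saved to a file and passed to IndriRunQuery as described above.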

     
  • rosemary

    rosemary - 2014-03-21

    Thanks.

    Do you mean that, when ranking indexed documents, the ranking score is exactly the same whether a single index or a group of individual indices is used? In other words, is the term frequency for the same collection the same regardless of whether one index or several are used?

     
  • David Pane

    David Pane - 2014-03-21

    Yes.
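
To illustrate why the scores can match, here is a small arithmetic sketch. It assumes that collection statistics (term counts and total collection length) are pooled across all listed indexes before scoring, which is what the answer above implies, and it uses Dirichlet-smoothed query likelihood with mu = 2500 purely for illustration. All numbers and names are hypothetical.

```python
import math


def dirichlet_log_score(tf, doc_len, cf, collection_len, mu=2500.0):
    """Log of the Dirichlet-smoothed probability of a term in a document."""
    return math.log((tf + mu * cf / collection_len) / (doc_len + mu))


def pooled(stats_list):
    """Sum term counts and collection lengths across several indexes."""
    return {
        "cf": sum(s["cf"] for s in stats_list),
        "collection_len": sum(s["collection_len"] for s in stats_list),
    }


# Hypothetical statistics for one term in two separately built indexes.
index_a = {"cf": 120, "collection_len": 1_000_000}
index_b = {"cf": 80, "collection_len": 500_000}

# Statistics for one index built over the same documents.
single = {"cf": 200, "collection_len": 1_500_000}

combined = pooled([index_a, index_b])

# Same document (term frequency 3, length 250) scored both ways.
score_split = dirichlet_log_score(3, 250, **combined)
score_single = dirichlet_log_score(3, 250, **single)
score_a_only = dirichlet_log_score(3, 250, **index_a)
```

With pooled statistics, `score_split` equals `score_single`; scoring against only one index's statistics (`score_a_only`) gives a different value, which is why the pooling assumption matters.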

     
