
Combining the results of searching with and without a stemmer

Galago
Jiho Noh
2018-03-26
2018-04-13
  • Jiho Noh

    Jiho Noh - 2018-03-26

    Hi, I have a question about using a stemmer.
    In some cases you want to use a stemmer, and in others you don't.
    I wonder if there's a way to combine the two sets of results in a single query statement.
    I was able to get the two distinct result sets by running the search twice, once with
    the stemmer and once without, but I want to know if there's a practical way to get combined results.

    Thanks.

     
  • Lemur Project

    Lemur Project - 2018-03-26

    You don't mention what you are using for your indexing/query software.

    Indri builds indexes with one stemmer at a time. Querying that index uses the stemmer (or lack thereof) defined during the build.

    Galago allows you to build indexes with multiple stemmed parts. However, you must specify which stemmer (or no stemming) you are using to process queries. The stemmer defined is applied to all the queries.

    There are ways to mix unstemmed and stemmed parts in very low-level Galago queries, but the work required to fill in the query smoothing parameters is excessive and not really worthwhile.

    Not sure how meaningful combined results would be. The scoring of terms will be different depending on the query terms and stemmers used, making the result scores/rankings somewhat confusing to interpret.
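
    Because the raw scores from a stemmed run and an unstemmed run are not directly comparable, one way to sidestep the problem is to merge on rank position instead of score. The sketch below uses reciprocal rank fusion, which is a swapped-in technique and not something Galago or Indri is stated to provide here. The function name, the constant k=60, and the document IDs are all illustrative assumptions; the two input lists stand for rankings you have already retrieved by running the search twice.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs using rank positions only.

    Raw retrieval scores are ignored, so lists produced under different
    scoring conditions (e.g. stemmed vs. unstemmed) can be combined.
    k is a damping constant; 60 is a commonly used default, assumed here.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it contains.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical rankings from two separate runs over the same collection.
stemmed_run = ["D3", "D1", "D7", "D2"]
unstemmed_run = ["D1", "D9", "D3"]

merged = reciprocal_rank_fusion([stemmed_run, unstemmed_run])
print(merged)
```

    Each document ID appears once in the merged list, and documents retrieved by both runs are promoted over documents retrieved by only one. Whether that ordering is meaningful for a given collection still has to be checked against relevance judgments.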

     
  • Lemur Project

    Lemur Project - 2018-03-26

    Furthermore, I am not certain how Galago handles duplicate document IDs that have different scores in a ranking. I don't think it is possible for the same document ID to appear multiple times in a ranking with different scores.

     
  • Jiho Noh

    Jiho Noh - 2018-04-13

    Thanks for the response.
    In my domain, searching with a stemmer generally works better. However, some of the terms in queries, usually proper nouns, need to be searched as is. I can search multiple times with different options, but the scoring schemes are different, as you already mentioned, so merging those scores by averaging becomes impractical. One other approach I can think of is to decide whether to use the stemmer by preprocessing the query, but that seems like too much feature engineering to me.
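
    As a rough illustration of the preprocessing idea, the sketch below splits a query into terms that can be stemmed and terms that should be searched as is. It is only a heuristic under stated assumptions: a token is kept literal if it is in a user-supplied list or if it starts with a capital letter and is not the first word of the query. The function name, the keep_as_is parameter, and the sample query are hypothetical, and no Galago or Indri API is involved; how each group is then sent to the search engine is left open.

```python
import re


def split_query_terms(query, keep_as_is=frozenset()):
    """Partition query tokens into (to_stem, literal).

    A token is treated as literal when its lowercase form is listed in
    keep_as_is, or when it is capitalized and is not the first token
    (a crude stand-in for proper-noun detection).
    """
    to_stem, literal = [], []
    tokens = re.findall(r"[A-Za-z0-9-]+", query)
    for position, token in enumerate(tokens):
        listed = token.lower() in keep_as_is
        looks_proper = position > 0 and token[0].isupper()
        if listed or looks_proper:
            literal.append(token)
        else:
            to_stem.append(token)
    return to_stem, literal


# Hypothetical query; "Parkinson" and "L-DOPA" are caught by capitalization.
to_stem, literal = split_query_terms("treatment of Parkinson disease with L-DOPA")
print(to_stem)
print(literal)
```

    Capitalization misses lowercase proper nouns and wrongly flags capitalized common words, so the keep_as_is list is the more dependable of the two rules when a domain vocabulary is available.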

     
