Need to understand the scoring method.

  • Kajari GhoshDastidar

    I am new to text mining, and of course new to Indri. I saw that when I run a
    query on a set of indexed documents, a negative score is associated with each
    doc. Can you tell me in a line or two how the documents are scored? What does
    the score mean?
    Is it based on how many times a term is present in the doc? Does the score
    change with the size of the document? Are there other parameters involved?

    Here is my scenario. I have a few hundred blogs, each saved in a separate
    document. I want to find out from the blogs which author is a Java expert.
    So, first I indexed the documents. Then I ran the query on them. In the
    query.xml file I entered query words like "java", "j2ee", etc., all
    Java-related words. Now, I want to score the docs which contain some or all
    of these words. So, I used #or when writing the query words.

    Now, I want to interpret the scores. Does the highest-scored doc contain the
    most search words? Will the score depend on the size of the file as well?
    Are there other parameters considered in the scoring that I should be aware
    of?

    I read somewhere that Indri follows OKAPI scoring. I searched for it but
    could not follow the literature. So, I would appreciate it if you could
    explain the scoring to me in a few lines.

    Thanks a lot for your time,

    Thanks a lot for the prompt reply! The archive helped!


