User Activity

  • Posted a comment on discussion Retrieval on The Lemur Project

    Please review https://sourceforge.net/p/lemur/wiki/Indri%20Retrieval%20Model/ where the formulae for both are given. The value of cf passed in to the JelinekMercerTermScoreFunction is the computed estimate of P(t|C), that is, the collection term frequency divided by |C|.

  • Posted a comment on discussion Retrieval on The Lemur Project

    I expect that the problem relates more to your conversion of the WARC files, as you have multiple queries experiencing the issue. You can modify the dumpindex code to iterate over your internal document ids to identify the ones that have an empty string for their docno element. You can then use dumpindex to retrieve the ParsedDocument for each of those ids and see which ones have the problem.

  • Posted a comment on discussion Indexing and Parsing on The Lemur Project

    The opening and closing DOC tags must each appear on a separate line, by themselves, e.g.:

    <DOC>
    ....
    </DOC>

  • Posted a comment on discussion Retrieval on The Lemur Project

    Because you processed the WARC files into your own document set and have changed to a subset of the collection, I am unable to replicate your issue. The exact query you show is number 102, but your output from the original unedited post indicated query number 157.

  • Posted a comment on discussion Retrieval on The Lemur Project

    I can't say for certain where your problem lies. Please provide me with the indexing parameters that you used (include a copy of the index manifest files) and the exact form of the offending query (#157) from your run, so that I can try to replicate the issue.

  • Posted a comment on discussion Retrieval on The Lemur Project

    The missing document id is indicative of a duplicate document entry or some string hashing bug. What operating system did you build this on? What version of Indri? Did you compile the code yourself? If so, what are your configuration options? I believe that there is a document in the TREC-B set from Wikipedia which contains a TRECWEB example document embedded in it, which could be incorrectly parsed by some versions of Indri. Retrieving that embedded document could also cause the document id to be...

  • Posted a comment on discussion Retrieval on The Lemur Project

    Not really, no. It would require modification of the code to change that behavior.

  • Posted a comment on discussion Retrieval on The Lemur Project

    #combinep scores all p entries for any document containing one of the query terms.
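
    The first comment above says that cf is the estimate P(t|C) = collection term frequency / |C|. As a rough illustration, here is a minimal sketch of the textbook Jelinek-Mercer and Dirichlet smoothing formulas using that estimate. It is not Indri's actual code: the function names, parameter names, and the default values of lam and mu are assumptions made for this example, and the wiki page linked above remains the reference for what Indri computes.

```python
import math


def collection_probability(collection_term_count, collection_length):
    """Estimate P(t|C) as collection term frequency divided by |C|."""
    return collection_term_count / collection_length


def jelinek_mercer_score(tf, doc_length, p_t_c, lam=0.4):
    """Log of the linear interpolation (1 - lam) * tf/|D| + lam * P(t|C).

    lam is the collection weight; 0.4 is only an illustrative default.
    """
    p_t_d = tf / doc_length if doc_length > 0 else 0.0
    return math.log((1.0 - lam) * p_t_d + lam * p_t_c)


def dirichlet_score(tf, doc_length, p_t_c, mu=2500.0):
    """Log of (tf + mu * P(t|C)) / (|D| + mu); mu is an illustrative default."""
    return math.log((tf + mu * p_t_c) / (doc_length + mu))


if __name__ == "__main__":
    # Hypothetical counts: the term occurs 50 times in a 1,000,000-token collection
    # and twice in a 100-token document.
    p_t_c = collection_probability(50, 1_000_000)
    print(jelinek_mercer_score(2, 100, p_t_c))
    print(dirichlet_score(2, 100, p_t_c))
```

    Note that the collection estimate is computed once per term and passed in, mirroring the comment's point that the score function receives P(t|C) rather than a raw count.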

Personal Data

Username:
david_fisher
Joined:
2006-02-28 14:45:12
Location:
Palmer / United States / EDT
Gender:
Male
Web Site:
  1. http://ciir.cs.umass.edu/~dfisher

Projects

Skills

  • C
  • Topic
  • C++
  • Java
