Relevance Model 3

Retrieval
Eyal K
2011-10-16
2012-09-27
  • Eyal K

    Eyal K - 2011-10-16

    Hi,

    I would like to use RM3 to retrieve documents from an index built with Indri
    (ClueWeb09).
    In addition, I would like to set the weights of the documents used to
    construct the relevance model (p(d|q)) from a parameter file, rather than
    using the weights assigned by RM3.
    In Lemur, this was easy: RelFBEval did exactly this.
    However, I cannot find an equivalent in Indri.
    When I tried using RelFBEval (version 4.12) on a small index constructed with
    Indri, it produced only RM1 results (zero weight on the original query),
    regardless of the value assigned to <feedbackMixtureNoise>.
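For reference, the relevance-model (RM1) estimate being asked about weights each feedback document's language model by p(d|q). The sketch below takes those weights from an externally supplied mapping instead of from retrieval scores. The function name, the dict inputs, and the unsmoothed maximum-likelihood p(w|d) are assumptions for illustration; this is not how RelFBEval or Indri implement it.

```python
from collections import Counter


def relevance_model(doc_weights, docs):
    """RM1 sketch: p(w|R) = sum over d of p(d|q) * p(w|d).

    doc_weights: {docno: p(d|q)}, e.g. read from a hypothetical parameter file;
                 the weights are normalized here so that they sum to 1.
    docs:        {docno: list of terms}; p(w|d) is the unsmoothed estimate tf / |d|.
    """
    total = sum(doc_weights.values())
    model = Counter()
    for docno, weight in doc_weights.items():
        terms = docs[docno]
        for term, tf in Counter(terms).items():
            model[term] += (weight / total) * tf / len(terms)
    return dict(model)


# Toy documents matching the D1/D2 example used later in this thread.
docs = {"D1": ["a", "b", "c"], "D2": ["a"]}
rm = relevance_model({"D1": 0.25, "D2": 0.75}, docs)
# rm["a"] is about 0.833; rm["b"] and rm["c"] are about 0.083 each
```

Normalizing the supplied weights inside the function means the external file can hold raw scores; whether that is the desired behavior is a design choice left to the caller.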

    1. Is there an Indri application that does this?
      If the answer is YES, the following questions can be ignored.

    2. Why does RelFBEval produce RM1 results and not RM3 results (even when <feedbackMixtureNoise> is set to values smaller than 1)? (you can see the parameter file below)

    3. Will RelFBEval even work on a large collection such as ClueWeb?

    Thank you!

    Parameter file for RelFBEval:

    <parameters>
    <index>tmpIndex/tmpIndex.key</index>
    <retModel>2</retModel>
    <feedbackDocCount>100</feedbackDocCount>
    <feedbackTermCount>100</feedbackTermCount>
    <feedbackCoefficient>0.8</feedbackCoefficient>
    <textQuery>query.DELME</textQuery>
    <smoothMethod>dir</smoothMethod>
    <DirichletPrior>1000</DirichletPrior>
    <smoothStrategy>interpolate</smoothStrategy>
    <queryUpdateMethod>3</queryUpdateMethod>
    <resultFile>res.DELME</resultFile>
    <resultFormat>1</resultFormat>
    <resultCount>100</resultCount>
    <feedbackMixtureNoise>0.9</feedbackMixtureNoise>
    <feedbackDocuments>FBdocs.DELME</feedbackDocuments>
    <adjustedScoreMethod>ql</adjustedScoreMethod>
    </parameters>

     
  • David Fisher

    David Fisher - 2011-10-17

    1) RelFBEval cannot work with collections the size of ClueWeb (nor can any of
    the Lemur RetrievalMethod APIs or applications).

    2) RelFBEval offers RM1 or RM2 when using the SimpleKLRetrievalMethod, neither
    of which uses the original query. The feedbackMixtureNoise parameter is used by
    some of the other expansion methods (divmin, Markov chain, mixture).
    <queryUpdateMethod>3</queryUpdateMethod> requests RM1.

    3) IndriRunQuery allows specifying feedback documents via the feedbackDocno
    parameter. It does not permit specifying a score for those documents; you
    could modify IndriRunQuery to do so.
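RM3 differs from RM1 only in mixing the original query back in. A minimal term-level sketch follows: orig_weight plays the role that fbOrigWeight plays in IndriRunQuery, and setting it to 0 gives back RM1, which matches the point above that the RM1 path ignores the original query. The function and argument names are hypothetical. Indri expresses the same mixture at the query level with #weight (see the expanded queries printed later in the thread); for queries made of plain terms, scores are weighted sums of log-probabilities, so the two forms should rank documents the same way.

```python
from collections import Counter


def rm3(query_terms, rm1, orig_weight):
    """RM3 sketch: p'(w|q) = orig_weight * p_ML(w|q) + (1 - orig_weight) * p(w|R).

    query_terms: the original query as a list of terms.
    rm1:         {term: p(w|R)}, an RM1 model that sums to 1.
    orig_weight: weight on the original query (cf. fbOrigWeight); 0 yields RM1.
    """
    counts = Counter(query_terms)
    vocab = set(counts) | set(rm1)
    return {
        w: orig_weight * counts[w] / len(query_terms) + (1 - orig_weight) * rm1.get(w, 0.0)
        for w in vocab
    }


expanded = rm3(["a"], {"a": 0.5, "b": 0.25, "c": 0.25}, orig_weight=0.1)
# expanded["a"] = 0.1 * 1 + 0.9 * 0.5 = 0.55; expanded["b"] = 0.9 * 0.25 = 0.225
```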

     
  • Eyal K

    Eyal K - 2011-10-17

    Thank you David for your reply!

    If I understand 3) correctly, setting feedbackDocno to a certain document
    amounts to using that document, and that document alone, for feedback.

    I tried doing that, but no matter which document this parameter points to,
    the result stays the same.

    Here is a minimal working example that demonstrates this issue:

    -bash-3.2$ cat text/myDoc
    <DOC>
    <DOCNO> D1 </DOCNO>
    <TEXT>
    a
    b
    c
    </TEXT>
    </DOC>
    <DOC>
    <DOCNO> D2 </DOCNO>
    <TEXT>
    a
    </TEXT>
    </DOC>

    -bash-3.2$ cat IndriBuildIndex.par
    <parameters>
    <memory>1G</memory>
    <storeDocs>false</storeDocs>
    <index>tmpIndex</index>
    <corpus>
    text
    <class>trectext</class>
    </corpus>
    <stemmer><name>porter</name></stemmer>
    </parameters>

    -bash-3.2$ ./IndriBuildIndex_5.1 IndriBuildIndex.par
    0:00: Created repository tmpIndex
    0:00: Opened text/myDoc
    0:00: Documents parsed: 2 Documents indexed: 2
    0:00: Closed text/myDoc
    0:00: Closing index
    0:00: Finished

    -bash-3.2$ cat query.indri
    <parameters>
    <query> <number>1</number> #combine( a )</query>
    </parameters>

    -bash-3.2$ head -20 IndriRunQuery_fb?.par
    ==> IndriRunQuery_fb1.par <==
    <parameters>
    <index>tmpIndex</index>
    <rule>method:dirichlet,mu:10</rule>
    <trecFormat>true</trecFormat>
    <count>2</count>
    <fbDocs>1</fbDocs>
    <fbTerms>3</fbTerms>
    <fbMu>20</fbMu>
    <fbOrigWeight>0.1</fbOrigWeight>
    <feedbackDocno>D1</feedbackDocno>
    </parameters>

    ==> IndriRunQuery_fb2.par <==
    <parameters>
    <index>tmpIndex</index>
    <rule>method:dirichlet,mu:10</rule>
    <trecFormat>true</trecFormat>
    <count>2</count>
    <fbDocs>1</fbDocs>
    <fbTerms>3</fbTerms>
    <fbMu>20</fbMu>
    <fbOrigWeight>0.1</fbOrigWeight>
    <feedbackDocno>D2</feedbackDocno>
    </parameters>

    -bash-3.2$ ./IndriRunQuery_5.1 IndriRunQuery_fb1.par query.indri
    1 Q0 D2 1 -0.606136 indri
    1 Q0 D1 2 -0.77319 indri
    -bash-3.2$ ./IndriRunQuery_5.1 IndriRunQuery_fb2.par query.indri
    1 Q0 D2 1 -0.606136 indri
    1 Q0 D1 2 -0.77319 indri

    Am I doing something wrong?

    Thank you!

     
  • David Fisher

    David Fisher - 2011-10-17

    Please review the parameters for IndriRunQuery
    (http://lemur.sourceforge.net/indri/IndriRunQuery.html).
    feedbackDocno is a per-query parameter; it has to be specified
    within the query element. Multiple feedbackDocno elements may appear in a
    query element:

    harvey:~/Development/indri/test/test-fb$ cat fb.p query1.indri query2.indri
    <parameters>
      <index>tmpIndex</index>
      <rule>method:dirichlet,mu:10</rule>
      <trecFormat>true</trecFormat>
      <count>2</count>
      <fbDocs>1</fbDocs>
      <fbTerms>3</fbTerms>
      <fbMu>20</fbMu>
      <fbOrigWeight>0.1</fbOrigWeight>
    </parameters>
    <parameters>
      <query> <number>1</number> <text> #combine( a )</text>
        <feedbackDocno>D1</feedbackDocno>
      </query>
    </parameters>
    <parameters>
      <query> <number>1</number> <text> #combine( a )</text>
        <feedbackDocno>D2</feedbackDocno>
      </query>
    </parameters>
    
    harvey:~/Development/indri/test/test-fb$ ../../runquery/IndriRunQuery fb.p -printQuery=true query1.indri
    # query:  #combine( a )
    # expanded: #weight( 0.10000000000000000555111512312578 #combine(  #combine( a ) ) 0.90000000000000002220446049250313 #weight(  0.47826086956521740578551771250204 "a"  0.26086956521739129710724114374898 "b"  0.26086956521739129710724114374898 "c"  ) ) 
    1 Q0 D2 1 -1.01723 indri
    1 Q0 D1 2 -1.02628 indri
    harvey:~/Development/indri/test/test-fb$ ../../runquery/IndriRunQuery fb.p -printQuery=true query2.indri
    # query:  #combine( a )
    # expanded: #weight( 0.10000000000000000555111512312578 #combine(  #combine( a ) ) 0.90000000000000002220446049250313 #weight(  0.52380952380952383595769106250373 "a"  ) ) 
    1 Q0 D2 1 -0.606136 indri
    1 Q0 D1 2 -0.77319 indri
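The scores in this transcript can be reproduced by hand, which also shows why query2.indri returns exactly the unexpanded scores: the only expansion term drawn from D2 is "a", so the expanded query says the same thing as #combine( a ). This is a minimal sketch that assumes (a) Dirichlet smoothing with the collection prior, (b) that with fbDocs=1 the relevance model is simply the feedback document's fbMu-smoothed term distribution over the terms it contains, and (c) that #weight normalizes its weights before summing log-scores. These assumptions are inferred from the printed numbers, not taken from Indri's source, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy collection from this thread: D1 = "a b c", D2 = "a".
DOCS = {"D1": ["a", "b", "c"], "D2": ["a"]}
COLL = Counter(t for terms in DOCS.values() for t in terms)
COLL_LEN = sum(COLL.values())


def p_dir(term, docno, mu):
    """Dirichlet-smoothed p(term | D) using the collection prior p(term | C)."""
    terms = DOCS[docno]
    return (terms.count(term) + mu * COLL[term] / COLL_LEN) / (len(terms) + mu)


def indri_weight(pairs):
    """Assumed #weight semantics: normalize weights to sum to 1, then sum weight * log-score."""
    total = sum(w for w, _ in pairs)
    return sum(w / total * s for w, s in pairs)


def expanded_score(docno, fb_docno, mu=10, fb_mu=20, orig_weight=0.1):
    """Score of #weight( orig_weight #combine(a)  (1 - orig_weight) #weight(RM terms) )."""
    original = math.log(p_dir("a", docno, mu))  # the original query #combine( a )
    # Assumed RM with fbDocs=1: fbMu-smoothed p(w | feedback doc), terms of that doc only.
    rm = {t: p_dir(t, fb_docno, fb_mu) for t in sorted(set(DOCS[fb_docno]))}
    expansion = indri_weight([(w, math.log(p_dir(t, docno, mu))) for t, w in rm.items()])
    return indri_weight([(orig_weight, original), (1 - orig_weight, expansion)])


for fb in ("D1", "D2"):
    print(fb, {d: round(expanded_score(d, fb), 5) for d in ("D2", "D1")})
```

Run as is, it should print scores matching the four result lines above to five decimal places (for example, about -1.01723 for D2 with D1 as the feedback document).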
    
     
  • Eyal K

    Eyal K - 2011-10-17

    It works, thank you!
    One last thing (hopefully): the fbMu parameter is supposed to control the
    smoothing over the terms within the constructed relevance model (according to
    http://ciir.cs.umass.edu/~metzler/indriretmodel.html#prf).
    However, setting this parameter to many different values yields the same
    result. (I'm using version 5.1.)
    Thanks again!

     
  • David Fisher

    David Fisher - 2011-10-17

    I don't know; it works just fine for me:

    harvey:~/Development/indri/test/test-fb$ cat fb1.p fb2.p
    <parameters>
      <index>tmpIndex</index>
      <rule>method:dirichlet,mu:10</rule>
      <trecFormat>true</trecFormat>
      <count>2</count>
      <fbDocs>1</fbDocs>
      <fbTerms>3</fbTerms>
      <fbMu>20</fbMu>
      <fbOrigWeight>0.1</fbOrigWeight>
    </parameters>
    <parameters>
      <index>tmpIndex</index>
      <rule>method:dirichlet,mu:10</rule>
      <trecFormat>true</trecFormat>
      <count>2</count>
      <fbDocs>1</fbDocs>
      <fbTerms>3</fbTerms>
      <fbMu>2000</fbMu>
      <fbOrigWeight>0.1</fbOrigWeight>
    </parameters>
    harvey:~/Development/indri/test/test-fb$ ../../runquery/IndriRunQuery fb1.p -printQuery=true query1.indri
    # query:  #combine( a )
    # expanded: #weight( 0.10000000000000000555111512312578 #combine(  #combine( a ) ) 0.90000000000000002220446049250313 #weight(  0.47826086956521740578551771250204 "a"  0.26086956521739129710724114374898 "b"  0.26086956521739129710724114374898 "c"  ) ) 
    1 Q0 D2 1 -1.01723 indri
    1 Q0 D1 2 -1.02628 indri
    harvey:~/Development/indri/test/test-fb$ ../../runquery/IndriRunQuery fb2.p -printQuery=true query1.indri
    # query:  #combine( a )
    # expanded: #weight( 0.10000000000000000555111512312578 #combine(  #combine( a ) ) 0.90000000000000002220446049250313 #weight(  0.49975037443834247063989550952101 "a"  0.25012481278082876468005224523949 "b"  0.25012481278082876468005224523949 "c"  ) ) 
    1 Q0 D2 1 -1.00029 indri
    1 Q0 D1 2 -1.01586 indri
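Why fbMu appeared to do nothing can be sketched the same way. Raising fbMu from 20 to 2000 pulls D1's expansion weights toward the collection distribution (for "a", from about 0.478 to about 0.49975 in the printouts above), but a feedback document with a single distinct term yields a single expansion term, and once #weight normalizes its weights that term always gets relative weight 1. The sketch assumes the same Dirichlet smoothing and #weight normalization that the printed numbers suggest; the function names are hypothetical.

```python
from collections import Counter

# Toy collection from this thread: D1 = "a b c", D2 = "a".
DOCS = {"D1": ["a", "b", "c"], "D2": ["a"]}
COLL = Counter(t for terms in DOCS.values() for t in terms)
COLL_LEN = sum(COLL.values())


def rm_weights(fb_docno, fb_mu):
    """Assumed expansion weights from one feedback document: p(w | D) smoothed with mu = fbMu."""
    terms = DOCS[fb_docno]
    return {
        t: (terms.count(t) + fb_mu * COLL[t] / COLL_LEN) / (len(terms) + fb_mu)
        for t in sorted(set(terms))
    }


def normalized(weights):
    """Relative weights as #weight is assumed to use them (scaled to sum to 1)."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


for fb_mu in (20, 2000):
    # D1's weights move with fbMu; D2's single term always normalizes to 1.0.
    print(fb_mu, rm_weights("D1", fb_mu), normalized(rm_weights("D2", fb_mu)))
```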
    
     
  • Eyal K

    Eyal K - 2011-10-17

    Yes, my mistake: when using D2 (which has a single term), fbMu has no effect.
    Thanks for your help, David!

     
