For retrieval, use the IndriRunQuery application. The basic command-line usage is:
` $ ./IndriRunQuery <parameter_file>`
See the IndriRunQuery documentation for the full set of parameters it accepts.
For IndriRunQuery, the input queries are specified in the parameters file:
For example, the following query has id 503 and evaluates "#combine(prime factor)" against only the three listed documents.
`<parameters> <query> <number>503</number> <text>#combine(prime factor)</text> <workingSetDocno>clueweb09-en0000-00-00004</workingSetDocno> <workingSetDocno>clueweb09-en0000-00-00005</workingSetDocno> <workingSetDocno>clueweb09-en0000-00-00006</workingSetDocno> </query> </parameters>`
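When there are many queries or long working sets, it can be easier to generate the parameter file than to write it by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `build_query_parameters` is hypothetical and not part of Indri, and the element names are simply those shown in the example above.

```python
import xml.etree.ElementTree as ET


def build_query_parameters(number, text, working_set=()):
    """Return a <parameters> XML string holding a single <query>.

    Hypothetical helper: it only mirrors the element names used in the
    example above (number, text, workingSetDocno).
    """
    params = ET.Element("parameters")
    query = ET.SubElement(params, "query")
    ET.SubElement(query, "number").text = str(number)
    ET.SubElement(query, "text").text = text
    # One <workingSetDocno> element per document the query is restricted to.
    for docno in working_set:
        ET.SubElement(query, "workingSetDocno").text = docno
    return ET.tostring(params, encoding="unicode")


xml_text = build_query_parameters(
    503,
    "#combine(prime factor)",
    [
        "clueweb09-en0000-00-00004",
        "clueweb09-en0000-00-00005",
        "clueweb09-en0000-00-00006",
    ],
)
print(xml_text)
```

The printed string can be saved to a file and passed to IndriRunQuery as the parameter file.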
You can query multiple indexes by specifying them in a parameter file:
`<parameters> <index>/path/to/index1</index> <index>/path/to/index2</index> </parameters>`
Optionally, you can also specify the smoothing rule to use. For example, any one of the following:
` <rule>method:linear,collectionLambda:0.4,documentLambda:0.2</rule> <rule>method:dirichlet,mu:1000</rule> <rule>method:twostage,mu:1500,lambda:0.4</rule>`
You can also specify different smoothing rules for different types of fields.
The following set of rules uses two-level Dirichlet smoothing, and smooths
sentence fields differently from the default. The default smooths a document
with the collection by Dirichlet smoothing with mu=50, and then smooths any
field (that is not a sentence) with the smoothed document model by Dirichlet with mu=5:
`<parameters> <rule>method:d,mu:50,documentMu:5</rule> <rule>method:d,mu:1200,documentMu:150,field:sentence</rule> </parameters>`
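A rule is a comma-separated list of `name:value` pairs, so it is straightforward to assemble one programmatically when sweeping over smoothing parameters. The sketch below reproduces the two rules above; the helper `smoothing_rule` is a hypothetical name, and it does not validate that a parameter is meaningful for the chosen method.

```python
def smoothing_rule(method, field=None, **params):
    """Format a <rule> element from a method name and its parameters.

    Hypothetical helper: it only joins name:value pairs in the order
    given and appends the optional field restriction last.
    """
    parts = [f"method:{method}"]
    parts += [f"{name}:{value}" for name, value in params.items()]
    if field is not None:
        parts.append(f"field:{field}")
    return "<rule>" + ",".join(parts) + "</rule>"


default_rule = smoothing_rule("d", mu=50, documentMu=5)
sentence_rule = smoothing_rule("d", field="sentence", mu=1200, documentMu=150)
print(default_rule)
print(sentence_rule)
```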
If you do not specify smoothing rules, the default is Dirichlet smoothing with mu=2500,
which may not be the best parameter for your collection and set of queries.
Tables 7 and 8 of Fang et al. 2004 list optimal mu and lambda values for different collections and queries.
Additionally, Zhao and Callan 2008 and Zhao and Callan 2009 include field smoothing setup guidelines.
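To get a feel for why the choice of mu matters, the standard Dirichlet-prior estimate is p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). The sketch below evaluates that textbook formula for a made-up term; the counts and the collection probability are illustrative assumptions, and the code is not taken from Indri's implementation.

```python
def dirichlet_smoothed(term_count, doc_length, collection_prob, mu):
    """Textbook Dirichlet-prior smoothed probability of a term in a document."""
    return (term_count + mu * collection_prob) / (doc_length + mu)


# Illustrative numbers only: a term occurring 3 times in a 100-token
# document, with collection probability 0.001.
for mu in (50, 1000, 2500):
    p = dirichlet_smoothed(term_count=3, doc_length=100, collection_prob=0.001, mu=mu)
    print(f"mu={mu}: p(w|d)={p:.6f}")
```

With a larger mu the estimate is pulled further toward the collection probability, which matters most for short documents or fields.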
For formatting results in TREC format, you can also use the following parameters:
* runID: a string specifying the id for a query run, used in TREC scorable output.
* trecFormat: true to produce TREC scorable output; otherwise false (default).
You can also format results for INEX processing:
* participant-id: specifies the participant-id attribute used in submissions.
* task: specifies the task attribute (default CO.Thorough).
* query: specifies the query attribute (default automatic).
* topic-part: specifies the topic-part attribute (default T).
* description: specifies the contents of the description tag.
By default, IndriRunQuery prints a list of results, one result per line, with four columns:
* score: the score of the returned document. An Indri query will always return a negative value for a result.
* docID: the document ID
* extent_begin: the starting token number of the extent that was retrieved
* extent_end: the ending token number of the extent that was retrieved
As an example:
`-4.83646 AP890101-0001 0 485`
`-7.06236 AP890101-0015 0 385`
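Because the default output is whitespace-separated with a fixed column order, it is easy to parse for further processing. A minimal sketch follows, assuming exactly the four columns described above; the `Result` type and `parse_default_line` function are hypothetical names.

```python
from typing import NamedTuple


class Result(NamedTuple):
    score: float
    doc_id: str
    extent_begin: int
    extent_end: int


def parse_default_line(line):
    """Parse one line of default output: score, docID, extent_begin, extent_end."""
    score, doc_id, begin, end = line.split()
    return Result(float(score), doc_id, int(begin), int(end))


sample = """-4.83646 AP890101-0001 0 485
-7.06236 AP890101-0015 0 385"""

# Skip blank lines so trailing newlines in captured output are harmless.
results = [parse_default_line(line) for line in sample.splitlines() if line.strip()]
for r in results:
    print(r.doc_id, r.score, r.extent_end - r.extent_begin)
```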
If TREC formatting is enabled as described above, each output line has the format:
` <queryID> Q0 <DocID> <rank> <score> <runID>`
As an example:
`150 Q0 AP890101-0001 1 -4.83646 runName`
`150 Q0 AP890101-0015 2 -7.06236 runName`
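If you already have scored results in hand, for example parsed from the default output, the TREC layout can be produced with a few lines of code. This is a sketch under the assumption that rank is assigned by descending score starting at 1, as in the example above; `to_trec_lines` is a hypothetical helper, not an Indri function.

```python
def to_trec_lines(query_id, scored_docs, run_id):
    """Format (score, doc_id) pairs as '<queryID> Q0 <DocID> <rank> <score> <runID>'.

    Pairs are ranked by descending score; rank numbering starts at 1.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[0], reverse=True)
    return [
        f"{query_id} Q0 {doc_id} {rank} {score} {run_id}"
        for rank, (score, doc_id) in enumerate(ranked, start=1)
    ]


lines = to_trec_lines(
    150,
    [(-7.06236, "AP890101-0015"), (-4.83646, "AP890101-0001")],
    "runName",
)
print("\n".join(lines))
```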