Parameter files for IndriBuildIndex are well-formed XML documents whose contents must be wrapped in <parameters> </parameters>
tags. To specify the use of a parameter file on the command line, use:
$ IndriBuildIndex <parameter_file> [<parameter_file_2> ... <parameter_file_n>]
Note that you can specify more than one parameter file (say, if you have a standard set of stopwords you wish to use for all the indexes you build).
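As a rough sketch of the multi-file setup mentioned above, a shared stopword list could live in its own parameter file. The file name stopwords.xml and the three words are illustrative assumptions, not something shipped with Indri; the <stopper> and <word> elements are the ones described on this page.

```xml
<!-- stopwords.xml (hypothetical file name): a stopword list shared by several indexes -->
<parameters>
  <stopper>
    <word>a</word>
    <word>an</word>
    <word>the</word>
  </stopper>
</parameters>
```

Such a file would then be listed on the command line next to an index-specific parameter file, for example IndriBuildIndex build.xml stopwords.xml, where build.xml is likewise a placeholder name.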
Alternatively, you can specify various parameter values directly on the command line as specified below.
<corpus><path>/path/to/file_or_directory</path></corpus> in the parameter file and as -corpus.path=/path/to/file_or_directory on the command line.
<corpus><class>trecweb</class></corpus> in the parameter file and as -corpus.class=trecweb on the command line. For a list of default known classes, see the [Indexer File Formats].
<corpus><annotations>/path/to/file</annotations></corpus> in the parameter file and as -corpus.annotations=/path/to/file on the command line. For a full description of how to use offset annotations, see [Inline and Offset Annotations].
<corpus><metadata>/path/to/file</metadata></corpus> in the parameter file and as -corpus.metadata=/path/to/file on the command line.
<index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line.
<memory>100M</memory> in the parameter file and as -memory=100M on the command line.
<stopper><word>stopword</word></stopper> in the parameter file and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.
<stemmer><name>stemmername</name></stemmer> in the parameter file and as -stemmer.name=stemmername on the command line. This is an optional parameter with the default of no stemming.
<metadata><field>fieldname</field></metadata> in the parameter file and as -metadata.field=fieldname on the command line.
<metadata><forward>fieldname</forward></metadata> in the parameter file and as -metadata.forward=fieldname on the command line. The external document id field "docno" is automatically added as a forward metadata field.
<metadata><backward>fieldname</backward></metadata> in the parameter file and as -metadata.backward=fieldname on the command line. The external document id field "docno" is automatically added as a backward metadata field.
<field><name>fieldname</name></field> in the parameter file and as -field.name=fieldname on the command line.
<field><numeric>true</numeric></field> in the parameter file and as -field.numeric=true on the command line. This is an optional parameter, defaulting to false. Note that 0 can be used for false and 1 can be used for true.

The example parameter file below will create (or add to) an index at /home/lemur/testindex. The indexer will use a soft limit of 1GB of RAM before flushing its internal indexing buffers to disk. The source data for the example comes from two different corpora, one at /home/lemur/testdata/firstCorpus and the other at /home/lemur/testdata/secondCorpus. Note that the classes of the two corpora are different. The parameter file also specifies that stemming is to be performed using the Krovetz method, and that one field (the HTML paragraph tag "p") should be made available for searching.
<parameters>
  <index>/home/lemur/testindex</index>
  <memory>1G</memory>
  <corpus>
    <path>/home/lemur/testdata/firstCorpus</path>
    <class>trectext</class>
  </corpus>
  <corpus>
    <path>/home/lemur/testdata/secondCorpus</path>
    <class>trecweb</class>
  </corpus>
  <stemmer><name>krovetz</name></stemmer>
  <field>
    <name>p</name>
  </field>
</parameters>
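As a further sketch, the example above could be extended with some of the optional parameters listed earlier. The stopwords and the field names title and year are placeholders chosen for illustration, and only one corpus is kept for brevity. Numeric fields may need more than the flag shown here; see Numeric and Date Fields in Indri.

```xml
<parameters>
  <index>/home/lemur/testindex</index>
  <memory>1G</memory>
  <corpus>
    <path>/home/lemur/testdata/firstCorpus</path>
    <class>trectext</class>
  </corpus>
  <stemmer><name>krovetz</name></stemmer>
  <!-- optional: placeholder stopwords to drop while indexing -->
  <stopper>
    <word>the</word>
    <word>of</word>
  </stopper>
  <!-- "title" and "year" are hypothetical field names -->
  <field>
    <name>title</name>
  </field>
  <field>
    <name>year</name>
    <numeric>true</numeric>
  </field>
</parameters>
```

The same stopper and field settings could instead be kept in a separate parameter file and passed alongside the index-specific one.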
Wiki: Indexer File Formats
Wiki: Inline and Offset Annotations
Wiki: Numeric and Date Fields in Indri
Wiki: Quick Start