luca - 2016-01-27

Beginners with the Indri platform on Windows often run into a basic problem when running IndriMakeIndex.exe for the first time: the error message "couldn't create a repository" ... "parent directory not found".

Unfortunately, an extensive search of the documentation and the mailing lists was not very fruitful.
Here is an example that works well on Windows, without Cygwin.

The corpus to be indexed, consisting of txt files, is in the folder C:\temp\mycorpus\
The indexes should be stored in the folder C:\temp\mycorpus-index\
I manually created the folder C:\temp\mycorpus-index\
The Indri suite is installed under C:\temp\Indri\
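The index folder is created by hand in this setup. As a minimal sketch, the same step could be scripted in Python. The helper name ensure_index_dir is hypothetical (it is not part of Indri), and whether a missing folder alone triggers the error above is not confirmed here.

```python
from pathlib import Path


def ensure_index_dir(index_dir):
    """Create the index folder (and any missing parent folders) if needed.

    Hypothetical helper, not part of Indri. Returns the folder as a Path.
    """
    path = Path(index_dir)
    # parents=True creates missing parent folders as well;
    # exist_ok=True makes the call harmless if the folder is already there.
    path.mkdir(parents=True, exist_ok=True)
    return path


# Example call on Windows, using the folder name from this post:
# ensure_index_dir("C:\\temp\\mycorpus-index")
```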

The file containing the indexing directives is stored in
C:\temp\indri_indexing.parameter

Please note carefully how each value is formatted. In my experience, the slightest change to these values produces the above error message.

<parameters>
<index>C:/temp/mycorpus-index/</index>
<memory>2G</memory>
<corpus>
<path>C:\temp\mycorpus</path>
<class>txt</class>
</corpus>
<stemmer><name>krovetz</name></stemmer>
</parameters>

Several things need to be noted:
1. no spaces between the <index> and the </index> tags
2. all on one line
3. forward slashes in the <index> directive
4. backslashes in the <path> directive
5. a final forward slash in the <index> directive
6. no final slash in the <path> directive
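The formatting rules above can also be applied by a small script rather than by hand. The following Python sketch rebuilds the parameter file from this post. The function name build_parameter_text and its arguments are hypothetical, and the handling of points 1 and 2 (value directly between the tags, on one line) is my reading of the list.

```python
from pathlib import PureWindowsPath


def build_parameter_text(index_dir, corpus_dir, memory="2G",
                         corpus_class="txt", stemmer="krovetz"):
    """Return parameter file text formatted like the example in this post.

    Hypothetical helper, not part of Indri.
    """
    # <index>: forward slashes, with a final forward slash (points 3 and 5).
    index_value = PureWindowsPath(index_dir).as_posix() + "/"
    # <path>: backslashes, no final slash (points 4 and 6).
    # PureWindowsPath drops a trailing slash and prints backslashes.
    corpus_value = str(PureWindowsPath(corpus_dir))
    lines = [
        "<parameters>",
        # Value directly between the tags, on one line (points 1 and 2).
        "<index>" + index_value + "</index>",
        "<memory>" + memory + "</memory>",
        "<corpus>",
        "<path>" + corpus_value + "</path>",
        "<class>" + corpus_class + "</class>",
        "</corpus>",
        "<stemmer><name>" + stemmer + "</name></stemmer>",
        "</parameters>",
    ]
    return "\n".join(lines) + "\n"


# Example, using the folders from this post:
# text = build_parameter_text("C:\\temp\\mycorpus-index\\", "C:\\temp\\mycorpus\\")
# with open("C:\\temp\\indri_indexing.parameter", "w") as handle:
#     handle.write(text)
```

The resulting file is then passed to IndriMakeIndex.exe as its argument. Because PureWindowsPath is used, the sketch accepts either slash style as input and normalizes it to the form that worked for me.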

It was not trivial to figure out these details, and I hope this will also be of help to other Windows users.

 

Last edit: luca 2016-01-27