The Lemur Toolkit

IndriBuildIndex DOCNO indexed

  1. 2009-08-01 13:04:28 UTC
    When I index files with IndriBuildIndex, the resulting index stores the document numbers (DOCNO) as if they were part of the text.

    For instance, a single file contains:
    <DOC>
    <DOCNO>D1</DOCNO>
    <TEXT>
    test
    </TEXT>
    </DOC>

    The build parameter file:
    <parameters>
    <index>/home/usr/tmpIndex</index>
    <corpus>
    <path>/home/usr/dox</path>
    <class>trectext</class>
    </corpus>
    <stemmer><name>porter</name></stemmer>
    </parameters>


    The manifest of the resulting index (note that total-terms is 2, although the text contains only the single term "test"):
    <parameters>
    <code-build-date>Jun 16 2009</code-build-date>
    <corpus>
    <document-base>1</document-base>
    <frequent-terms>0</frequent-terms>
    <maximum-document>2</maximum-document>
    <total-documents>1</total-documents>
    <total-terms>2</total-terms>
    <unique-terms>2</unique-terms>
    </corpus>
    <fields>
    <field>
    <byte-offset>0</byte-offset>
    <isNumeric>false</isNumeric>
    <isOrdinal>true</isOrdinal>
    <isParental>true</isParental>
    <name>document</name>
    <total-documents>1</total-documents>
    <total-terms>2</total-terms>
    </field>
    </fields>
    <indri-distribution>Indri development release 2.9</indri-distribution>
    <type>DiskIndex</type>
    </parameters>

    If I run the query "d1", it matches the document, as if "d1" were part of the text.
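
To make the count discrepancy concrete, here is a minimal Python sketch of the two behaviours. It is not Indri code and does not use any Indri API: the regex tokenizer and the names `tokens` and `count_terms` are hypothetical stand-ins, and the only assumption taken from the post is that the DOCNO value is tokenized along with the body text.

```python
import re

# The sample document from the post.
DOC = """<DOC>
<DOCNO>D1</DOCNO>
<TEXT>
test
</TEXT>
</DOC>"""

def tokens(text):
    # Lowercase and split on non-alphanumerics; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def count_terms(doc, include_docno):
    # Optionally drop the whole DOCNO element before stripping the other tags.
    if not include_docno:
        doc = re.sub(r"<DOCNO>.*?</DOCNO>", " ", doc, flags=re.S)
    body = re.sub(r"<[^>]+>", " ", doc)
    return tokens(body)

with_docno = count_terms(DOC, include_docno=True)      # DOCNO treated as text
without_docno = count_terms(DOC, include_docno=False)  # DOCNO kept out of the text

print(len(with_docno), with_docno)
print(len(without_docno), without_docno)
```

With the DOCNO included, the sketch yields two terms, which matches the total-terms value in the manifest; with it excluded, only "test" remains.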

    I know that with Lemur's BuildIndex (a key index, for example) the index does not treat the document numbers as part of the text.
    Is there any way to make IndriBuildIndex ignore the DOCNO, or at least not store it as if it were part of the text? As it is now, it corrupts the index statistics.

    Thank you very much