Indri creates a void (empty) index

2011-12-30
2012-09-27
  • michaelprem123
    2011-12-30

    Hello!
    I am trying to create an index from a directory of files in the trectext
    format.
    I'm using version 5.1 of the Lemur toolkit on a Debian Linux machine.

    This is a tiny index divided into about 800 files (6500 documents
    altogether).
    Indri's console output looks normal (adding fields, opening and closing
    files), with no error messages. The repository is created normally, but the
    index/ directory is empty, and <indexCount> in the manifest file has the
    value 0.

    Also, when I try to dump the text of an arbitrary document, I get an error
    message: CompressedCollection.cpp (705): Unable to find document 10613 in
    collection.

    Is there a way of making Indri a bit more verbose, so that I can get a hint of
    what's going on?

    Thanks

    Michael

     
  • michaelprem123
    2011-12-30

    BTW: Here is a copy of the parameter file

    <parameters>
    <index>/data/michaelp-inex/indexes/confBySpecificity_nonstemmed_onlyQreled</index>
    <memory>2G</memory>
    <corpus>
    /data/michaelp-inex/repositories/real_50Kfiles/prove_by_conf/outconf01_nonstemmed_onlyQreled

    <class>trectext</class>
    </corpus>
    <field><name>TEXT</name></field>
    <field><name>CONF0_0</name></field>
    <field><name>CONF0_1</name></field>
    <field><name>CONF0_2</name></field>
    <field><name>CONF0_3</name></field>
    <field><name>CONF0_4</name></field>
    <field><name>CONF0_5</name></field>
    <field><name>CONF0_6</name></field>
    <field><name>CONF0_7</name></field>
    <field><name>CONF0_8</name></field>
    <field><name>CONF0_9</name></field>

    <field><name>CONF1_0</name></field>
    <field><name>CONF1_1</name></field>
    <field><name>CONF1_2</name></field>
    <field><name>CONF1_3</name></field>
    <field><name>CONF1_4</name></field>
    <field><name>CONF1_5</name></field>
    <field><name>CONF1_6</name></field>
    <field><name>CONF1_7</name></field>
    <field><name>CONF1_8</name></field>
    <field><name>CONF1_9</name></field>

    <field><name>CONF2_0</name></field>
    <field><name>CONF2_10</name></field>
    <field><name>CONF2_15</name></field>
    <field><name>CONF2_20</name></field>
    <field><name>CONF2_25</name></field>
    <field><name>CONF2_30</name></field>
    <field><name>CONF2_35</name></field>
    <field><name>CONF2_40</name></field>
    <field><name>CONF2_45</name></field>
    <field><name>CONF2_50</name></field>
    <field><name>CONF2_55</name></field>
    <field><name>CONF2_60</name></field>
    <field><name>CONF2_65</name></field>
    <field><name>CONF2_70</name></field>
    <field><name>CONF2_75</name></field>
    <field><name>CONF2_80</name></field>
    <field><name>CONF2_85</name></field>
    <field><name>CONF2_90</name></field>
    <field><name>CONF2_95</name></field>

    <stopper>
    <word>a</word>
    <word>about</word>
    <word>above</word>
    <word>according</word>
    <word>across</word>
    <word>after</word>
    <word>afterwards</word>
    <word>again</word>
    <word>against</word>
    <word>albeit</word>
    <word>all</word>
    <word>almost</word>
    <word>alone</word>
    <word>along</word>
    <word>already</word>
    <word>also</word>
    <word>although</word>
    <word>always</word>
    <word>am</word>
    <word>among</word>
    <word>amongst</word>
    <word>an</word>
    <word>and</word>
    <word>another</word>
    <word>any</word>
    <word>anybody</word>
    <word>anyhow</word>
    <word>anyone</word>
    <word>anything</word>
    <word>anyway</word>
    <word>anywhere</word>
    <word>apart</word>
    <word>are</word>
    <word>around</word>
    <word>as</word>
    <word>at</word>
    <word>av</word>
    <word>be</word>
    <word>became</word>
    <word>because</word>
    <word>become</word>
    <word>becomes</word>
    <word>becoming</word>
    <word>been</word>
    <word>before</word>
    <word>beforehand</word>
    <word>behind</word>
    <word>being</word>
    <word>below</word>
    <word>beside</word>
    <word>besides</word>
    <word>between</word>
    <word>beyond</word>
    <word>both</word>
    <word>but</word>
    <word>by</word>
    <word>can</word>
    <word>cannot</word>
    <word>canst</word>
    <word>certain</word>
    <word>cf</word>
    <word>choose</word>
    <word>contrariwise</word>
    <word>cos</word>
    <word>could</word>
    <word>cu</word>
    <word>day</word>
    <word>do</word>
    <word>does</word>
    <word>doesn't</word>
    <word>doing</word>
    <word>dost</word>
    <word>doth</word>
    <word>double</word>
    <word>down</word>
    <word>dual</word>
    <word>during</word>
    <word>each</word>
    <word>either</word>
    <word>else</word>
    <word>elsewhere</word>
    <word>enough</word>
    <word>et</word>
    <word>etc</word>
    <word>even</word>
    <word>ever</word>
    <word>every</word>
    <word>everybody</word>
    <word>everyone</word>
    <word>everything</word>
    <word>everywhere</word>
    <word>except</word>
    <word>excepted</word>
    <word>excepting</word>
    <word>exception</word>
    <word>exclude</word>
    <word>excluding</word>
    <word>exclusive</word>
    <word>far</word>
    <word>farther</word>
    <word>farthest</word>
    <word>few</word>
    <word>ff</word>
    <word>first</word>
    <word>for</word>
    <word>formerly</word>
    <word>forth</word>
    <word>forward</word>
    <word>from</word>
    <word>front</word>
    <word>further</word>
    <word>furthermore</word>
    <word>furthest</word>
    <word>get</word>
    <word>go</word>
    <word>had</word>
    <word>halves</word>
    <word>hardly</word>
    <word>has</word>
    <word>hast</word>
    <word>hath</word>
    <word>have</word>
    <word>he</word>
    <word>hence</word>
    <word>henceforth</word>
    <word>her</word>
    <word>here</word>
    <word>hereabouts</word>
    <word>hereafter</word>
    <word>hereby</word>
    <word>herein</word>
    <word>hereto</word>
    <word>hereupon</word>
    <word>hers</word>
    <word>herself</word>
    <word>him</word>
    <word>himself</word>
    <word>hindmost</word>
    <word>his</word>
    <word>hither</word>
    <word>hitherto</word>
    <word>how</word>
    <word>however</word>
    <word>howsoever</word>
    <word>i</word>
    <word>ie</word>
    <word>if</word>
    <word>in</word>
    <word>inasmuch</word>
    <word>inc</word>
    <word>include</word>
    <word>included</word>
    <word>including</word>
    <word>indeed</word>
    <word>indoors</word>
    <word>inside</word>
    <word>insomuch</word>
    <word>instead</word>
    <word>into</word>
    <word>inward</word>
    <word>inwards</word>
    <word>is</word>
    <word>it</word>
    <word>its</word>
    <word>itself</word>
    <word>just</word>
    <word>kind</word>
    <word>kg</word>
    <word>km</word>
    <word>last</word>
    <word>latter</word>
    <word>latterly</word>
    <word>less</word>
    <word>lest</word>
    <word>let</word>
    <word>like</word>
    <word>little</word>
    <word>ltd</word>
    <word>many</word>
    <word>may</word>
    <word>maybe</word>
    <word>me</word>
    <word>meantime</word>
    <word>meanwhile</word>
    <word>might</word>
    <word>moreover</word>
    <word>most</word>
    <word>mostly</word>
    <word>more</word>
    <word>mr</word>
    <word>mrs</word>
    <word>ms</word>
    <word>much</word>
    <word>must</word>
    <word>my</word>
    <word>myself</word>
    <word>namely</word>
    <word>need</word>
    <word>neither</word>
    <word>never</word>
    <word>nevertheless</word>
    <word>next</word>
    <word>no</word>
    <word>nobody</word>
    <word>none</word>
    <word>nonetheless</word>
    <word>noone</word>
    <word>nope</word>
    <word>nor</word>
    <word>not</word>
    <word>nothing</word>
    <word>notwithstanding</word>
    <word>now</word>
    <word>nowadays</word>
    <word>nowhere</word>
    <word>of</word>
    <word>off</word>
    <word>often</word>
    <word>ok</word>
    <word>on</word>
    <word>once</word>
    <word>one</word>
    <word>only</word>
    <word>onto</word>
    <word>or</word>
    <word>other</word>
    <word>others</word>
    <word>otherwise</word>
    <word>ought</word>
    <word>our</word>
    <word>ours</word>
    <word>ourselves</word>
    <word>out</word>
    <word>outside</word>
    <word>over</word>
    <word>own</word>
    <word>per</word>
    <word>perhaps</word>
    <word>plenty</word>
    <word>provide</word>
    <word>quite</word>
    <word>rather</word>
    <word>really</word>
    <word>round</word>
    <word>said</word>
    <word>sake</word>
    <word>same</word>
    <word>sang</word>
    <word>save</word>
    <word>saw</word>
    <word>see</word>
    <word>seeing</word>
    <word>seem</word>
    <word>seemed</word>
    <word>seeming</word>
    <word>seems</word>
    <word>seen</word>
    <word>seldom</word>
    <word>selves</word>
    <word>sent</word>
    <word>several</word>
    <word>shalt</word>
    <word>she</word>
    <word>should</word>
    <word>shown</word>
    <word>sideways</word>
    <word>since</word>
    <word>slept</word>
    <word>slew</word>
    <word>slung</word>
    <word>slunk</word>
    <word>smote</word>
    <word>so</word>
    <word>some</word>
    <word>somebody</word>
    <word>somehow</word>
    <word>someone</word>
    <word>something</word>
    <word>sometime</word>
    <word>sometimes</word>
    <word>somewhat</word>
    <word>somewhere</word>
    <word>spake</word>
    <word>spat</word>
    <word>spoke</word>
    <word>spoken</word>
    <word>sprang</word>
    <word>sprung</word>
    <word>stave</word>
    <word>staves</word>
    <word>still</word>
    <word>such</word>
    <word>supposing</word>
    <word>than</word>
    <word>that</word>
    <word>the</word>
    <word>thee</word>
    <word>their</word>
    <word>them</word>
    <word>themselves</word>
    <word>then</word>
    <word>thence</word>
    <word>thenceforth</word>
    <word>there</word>
    <word>thereabout</word>
    <word>thereabouts</word>
    <word>thereafter</word>
    <word>thereby</word>
    <word>therefore</word>
    <word>therein</word>
    <word>thereof</word>
    <word>thereon</word>
    <word>thereto</word>
    <word>thereupon</word>
    <word>these</word>
    <word>they</word>
    <word>this</word>
    <word>those</word>
    <word>thou</word>
    <word>though</word>
    <word>thrice</word>
    <word>through</word>
    <word>throughout</word>
    <word>thru</word>
    <word>thus</word>
    <word>thy</word>
    <word>thyself</word>
    <word>till</word>
    <word>to</word>
    <word>together</word>
    <word>too</word>
    <word>toward</word>
    <word>towards</word>
    <word>ugh</word>
    <word>unable</word>
    <word>under</word>
    <word>underneath</word>
    <word>unless</word>
    <word>unlike</word>
    <word>until</word>
    <word>up</word>
    <word>upon</word>
    <word>upward</word>
    <word>upwards</word>
    <word>us</word>
    <word>use</word>
    <word>used</word>
    <word>using</word>
    <word>very</word>
    <word>via</word>
    <word>vs</word>
    <word>want</word>
    <word>was</word>
    <word>we</word>
    <word>week</word>
    <word>well</word>
    <word>were</word>
    <word>what</word>
    <word>whatever</word>
    <word>whatsoever</word>
    <word>when</word>
    <word>whence</word>
    <word>whenever</word>
    <word>whensoever</word>
    <word>where</word>
    <word>whereabouts</word>
    <word>whereafter</word>
    <word>whereas</word>
    <word>whereat</word>
    <word>whereby</word>
    <word>wherefore</word>
    <word>wherefrom</word>
    <word>wherein</word>
    <word>whereinto</word>
    <word>whereof</word>
    <word>whereon</word>
    <word>wheresoever</word>
    <word>whereto</word>
    <word>whereunto</word>
    <word>whereupon</word>
    <word>wherever</word>
    <word>wherewith</word>
    <word>whether</word>
    <word>whew</word>
    <word>which</word>
    <word>whichever</word>
    <word>whichsoever</word>
    <word>while</word>
    <word>whilst</word>
    <word>whither</word>
    <word>who</word>
    <word>whoa</word>
    <word>whoever</word>
    <word>whole</word>
    <word>whom</word>
    <word>whomever</word>
    <word>whomsoever</word>
    <word>whose</word>
    <word>whosoever</word>
    <word>why</word>
    <word>will</word>
    <word>wilt</word>
    <word>with</word>
    <word>within</word>
    <word>without</word>
    <word>worse</word>
    <word>worst</word>
    <word>would</word>
    <word>wow</word>
    <word>ye</word>
    <word>yet</word>
    <word>year</word>
    <word>yippee</word>
    <word>you</word>
    <word>your</word>
    <word>yours</word>
    <word>yourself</word>
    <word>yourselves</word>
    </stopper>
    </parameters>

     
  • David Fisher
    2012-01-03

    Most likely the documents are not in trectext format. See
    https://sourceforge.net/apps/trac/lemur/wiki/Indexer%20File%20Formats

    Note that the <DOC> and </DOC> tags must each appear on a separate line, with
    no other characters.

    DOS-format files, such as those created by Notepad.exe, use \r\n line
    separators and will not be processed correctly.

    The <DOCNO>...</DOCNO> element must appear on a single line with no other
    characters outside of the element.
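
    For illustration, a minimal trectext file that satisfies the rules above
    might look like the sketch below. The document ID and the body text are
    made-up placeholders, and the file is assumed to be saved with Unix (\n)
    line endings.

    ```xml
    <DOC>
    <DOCNO>example-doc-0001</DOCNO>
    <TEXT>
    Body text of the first document goes here.
    </TEXT>
    </DOC>
    ```

    Each further document in the same file would repeat this block, with its
    own <DOC> and </DOC> lines.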

    Separately, you said:

    "This is a tiny index divided into about 800 files (6500 documents
    altogether).
    Indri's console output looks normal (adding fields, opening and closing
    files), with no error messages. The repository is created normally, but the
    index/ directory is empty, and <indexCount> in the manifest file has the
    value 0."

    And then said:

    "Also, when I try to dump the text of an arbitrary document, I get an error
    message: CompressedCollection.cpp (705): Unable to find document 10613 in
    collection."

    Trying to get a document with an id 4,000 higher than the number of documents
    in the collection will not be successful.

    The output of IndriBuildIndex includes the number of documents parsed and the
    number indexed, e.g.:

    0:00: Opened corpora/ClueWeb09/ClueWeb09_English_1/enwp00/00.warc.gz
    0:57: Documents parsed: 21575 Documents indexed: 21575
    0:57: Closed corpora/ClueWeb09/ClueWeb09_English_1/enwp00/00.warc.gz
    0:57: Closing index
    1:01: Finished
    
     
  • mehri
    2012-01-05

    Hi, I downloaded the Indri source code from this site, but I can't compile
    the solution file named "indri.sln".
    How do I compile this project, and where can I find the C++ libraries it
    needs?
    Thanks.

     
  • varvara tzika
    2012-05-08

    Hey,

    I built the index with IndriBuildIndex, and when I run a query like
    "#combine(a b)" in the following way:

    QueryRequest queryRequest = new QueryRequest();
    queryRequest.query = myQuery;
    queryRequest.resultsRequested = numberOfResults;
    queryRequest.options = QueryRequest.TextSnippet;
    QueryResults qResults = querEnv.runQuery(queryRequest);
    // QueryResults.results holds an array of QueryResult objects.
    QueryResult[] queryResults = qResults.results;
    return queryResults;

    It gives me the error: ../src/CompressedCollection.cpp(705): Unable to find
    document 10 in the collection

    Also, when I run: parsedDocs = querEnv.documents(scoredResults);

    It gives me the same error.

    I don't know what is going wrong.

    Thanks in advance

     
  • David Fisher
    2012-05-09

    What were your indexing parameters? Did you set the parameter storeDocs to
    false? If so, you cannot use the QueryRequest API.
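
    As a rough sketch, a parameter file that keeps the document text in the
    index could set storeDocs explicitly, as below. The index and corpus paths
    are hypothetical placeholders. As far as I know storeDocs defaults to true
    when omitted, so spelling it out mainly guards against a leftover false
    setting.

    ```xml
    <parameters>
    <!-- hypothetical paths, for illustration only -->
    <index>/path/to/index</index>
    <corpus>
    <path>/path/to/collection</path>
    <class>trectext</class>
    </corpus>
    <!-- keep the document text so that snippets and documents() can read it -->
    <storeDocs>true</storeDocs>
    </parameters>
    ```

    An index built with storeDocs set to false has to be rebuilt for the
    change to take effect.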

     
  • varvara tzika
    2012-05-10

    Thank you very much.
    Everything runs perfectly after I changed it.