IndriRunQuery quietly fails to deliver results

Retrieval
luca
2016-02-04
2017-03-05
(Page 1 of 2)
  • luca

    luca - 2016-02-04

    Dear All,
    I am experiencing very strange behaviour from IndriRunQuery.exe: when running queries against a specific index, it fails to deliver any output for some queries.
    When I resubmit the queries that failed, I do get a result for some of them, but not for most of the others.
    The same set of queries run fine against another index.
    The indexing didn't deliver any result.

    the index\0\manifest of the failing index

    <parameters>
    <code-build-date>Dec 22 2015</code-build-date>
    <corpus>
    <document-base>1</document-base>
    <frequent-terms>257</frequent-terms>
    <maximum-document>79457</maximum-document>
    <total-documents>79456</total-documents>
    <total-terms>2408792</total-terms>
    <unique-terms>14889</unique-terms>
    </corpus>
    <fields></fields>
    <indri-distribution>Indri development release 5.8</indri-distribution>
    <type>DiskIndex</type>
    </parameters>

    the same file for the index that properly works

    <parameters>
    <code-build-date>Dec 22 2015</code-build-date>
    <corpus>
    <document-base>1</document-base>
    <frequent-terms>4328</frequent-terms>
    <maximum-document>66175</maximum-document>
    <total-documents>66174</total-documents>
    <total-terms>48902567</total-terms>
    <unique-terms>662912</unique-terms>
    </corpus>
    <fields></fields>
    <indri-distribution>Indri development release 5.8</indri-distribution>
    <type>DiskIndex</type>
    </parameters>

    I notice that the failing index has more documents (79456) than the one that works (66174).
    Could that be the reason?

    Is there an upper limit for IndriRunIndex 5.8 on 32-bit Windows machines?
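
The two manifests can also be compared on average document length rather than on document count alone. The following is a minimal Python sketch that uses only the values quoted above; the helper name `corpus_stats` is made up for illustration and is not part of Indri.

```python
import xml.etree.ElementTree as ET

# Corpus sections copied from the two manifests quoted above.
FAILING = """<parameters><corpus>
<total-documents>79456</total-documents>
<total-terms>2408792</total-terms>
<unique-terms>14889</unique-terms>
</corpus></parameters>"""

WORKING = """<parameters><corpus>
<total-documents>66174</total-documents>
<total-terms>48902567</total-terms>
<unique-terms>662912</unique-terms>
</corpus></parameters>"""


def corpus_stats(manifest_xml):
    """Return (documents, terms, unique terms, average terms per document)."""
    corpus = ET.fromstring(manifest_xml).find("corpus")
    docs = int(corpus.findtext("total-documents"))
    terms = int(corpus.findtext("total-terms"))
    unique = int(corpus.findtext("unique-terms"))
    return docs, terms, unique, terms / docs


for name, manifest in (("failing", FAILING), ("working", WORKING)):
    docs, terms, unique, avg = corpus_stats(manifest)
    print(f"{name}: {docs} docs, {unique} unique terms, {avg:.1f} terms/doc")
```

The failing index holds roughly 30 terms per document against roughly 739 for the working one, and a much smaller vocabulary, so the two indexes differ in far more than document count.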

     
  • Lemur Project

    Lemur Project - 2016-02-04

    IndriRunIndex isn't going to be limited to processing indexes built by the same version as the run.

    Clearly you have two different indexes as indicated by the doc counts, so I would have to say build parameters were not the same for both indexes.

    These appear to be very small indexes, so just rebuild them and query again.

     
    • luca

      luca - 2016-02-05

      Thank you. However, in my opinion there is a bug in IndriRunQuery, since it does not deliver any result for some specific queries.

       
  • David Fisher

    David Fisher - 2016-02-05

    There might be, and you should submit a bug report in the tickets section. You need to include all of the relevant details, such as operating system version, Indri version, whether you compiled it yourself or are using the binary distribution, Visual Studio version if you compiled it yourself, a full description of the conditions necessary to replicate the behavior, etc. Note that the same details are required for the index build.

    Your description above is unclear: do you mean you have two indexes that are both supposed to contain the same data? Or do you mean you just have two different indexes and only see the behavior on one of them?

    In either case, if you experience an odd behavior at retrieval time, the first test is to rebuild your index and compare to determine if perhaps the misbehaving index is corrupted in some fashion.

     
  • Lemur Project

    Lemur Project - 2016-02-05

    Are you able to dump the indexes (dumpindex)?

    If you can't [fully] dump the contents of the failing index, then it probably is corrupted in some way.

    If you can, look for your query terms in the listings to confirm that at least some documents should have been returned for the query.

     
  • luca

    luca - 2016-02-08

    Dear David and Stephan, thank you for your support.
    I have now re-run the indexing on the same corpus and stored the indexes in a different folder.
    I then reran the queries and compared the results: they are identical, and still incomplete.
    This means that for some queries IndriRunQuery does not return any result.

    Let me explain the goal: I would like to query the individual phrases of English Wikipedia and retrieve the scores of the best matching queries. This is in the context of the Kaggle competition that will complete this week: www.kaggle.com/c/the-allen-ai-science-challenge.

    There are thousands of questions, and for each question 4 possible answers. With my team ("LTSB", currently achieving a 33.6% score; a random result is 25%) we have implemented a strategy that took second place in a similar competition (http://ceur-ws.org/Vol-1178/CLEF2012wn-QA4MRE-BhattacharyaEt2012.pdf) and consists of the following steps:

    a) generate pseudo sentences combining each question with each of the possible answers
    b) POS-tag and filter the pseudo sentences, keeping only terms with specific POS tags
    c) Process a large text corpus of relevant content by
    c-1) splitting each sentence into an individual txt file
    c-2) running IndriBuildIndex with Krovetz stemming
    d) IndriRunQuery using okapi, count=1, trecFormat
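
Steps a) and d) above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the team's actual code: the function names, the sample question, the query numbers, and the index path `path-of-repository` are placeholders, and the parameter names (`index`, `baseline`, `count`, `trecFormat`, `query` with `number` and `text`) are assumed to follow the IndriRunQuery documentation. Step b) (POS filtering) is left out.

```python
import re
from xml.sax.saxutils import escape


def pseudo_sentences(question, answers):
    """Step a): combine the question with each candidate answer."""
    return [f"{question} {answer}" for answer in answers]


def plain_terms(text):
    """Keep only lowercase letters and digits, so the query is a bag of words."""
    return " ".join(re.findall(r"[A-Za-z0-9]+", text.lower()))


def build_parameters(index_path, queries):
    """Step d): build an IndriRunQuery parameter file for okapi, count=1."""
    lines = [
        "<parameters>",
        f"  <index>{escape(index_path)}</index>",
        "  <baseline>okapi</baseline>",
        "  <count>1</count>",
        "  <trecFormat>true</trecFormat>",
    ]
    for number, text in queries:
        lines.append(
            f"  <query><number>{escape(number)}</number>"
            f"<text>{escape(plain_terms(text))}</text></query>"
        )
    lines.append("</parameters>")
    return "\n".join(lines)


# Hypothetical question with four candidate answers (A to D).
question = "Which gas do plants absorb from the air?"
answers = ["oxygen", "carbon dioxide", "nitrogen", "helium"]
queries = [
    (f"q1-{label}", sentence)
    for label, sentence in zip("ABCD", pseudo_sentences(question, answers))
]
print(build_parameters("path-of-repository", queries))
```

Each question yields four queries, one per candidate answer, so the top score per query can be compared across the four answers.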

    I downloaded enwikibook in XML format (http://dumps.wikimedia.org/enwiki/20160113/) and then extracted the content of each of its pages in txt format. Each "book" is now a file, and I processed those files with Indri, obtaining the above results. That resulted in 66174 documents.

    I then selected the set of books with the maximum score for each query (2092), with the intention of finding the sentences that best match (and hopefully achieve a higher score), and generated one text file per sentence for each of them. That resulted in 79458 sentences.

    I then indexed the 79458 sentences with IndriBuildIndex and ran IndriRunQuery against them. However, only a small portion of the queries delivers any result; I would expect instead that a score would be returned even for a poor match.
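
One way to see exactly which queries came back empty is to compare the submitted query numbers with the query numbers present in the output. A minimal Python sketch, assuming the usual six-column TREC run format (`query-number Q0 document rank score run-tag`) and document names without spaces; the sample lines and document names are invented for illustration.

```python
def queries_without_results(submitted, trec_output):
    """Return the submitted query numbers that have no line in the run."""
    seen = set()
    for line in trec_output.splitlines():
        fields = line.split()
        # A result line has six columns and "Q0" in the second one.
        if len(fields) == 6 and fields[1] == "Q0":
            seen.add(fields[0])
    return [number for number in submitted if number not in seen]


# Invented run output: queries 102 and 104 produced nothing.
sample_run = """\
101 Q0 sentence-00017.txt 1 12.4 indri
103 Q0 sentence-04211.txt 1 9.8 indri
"""
missing = queries_without_results(["101", "102", "103", "104"], sample_run)
print(missing)
```

The resulting list is a natural starting point for inspecting the terms of the failing queries one by one.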

     
  • luca

    luca - 2016-02-08

    dumpindex does not deliver any error message with any of the indices

     
  • Lemur Project

    Lemur Project - 2016-02-08

    And the documents you expect to be retrieved do indeed have the query term(s) represented in them according to the index dump? Use "dumpindex t" to get inverted lists for query terms.

    This is just to confirm that the creation process for the second index did not result in documents that contained some of your search terms being excluded from the second index sources, or perhaps formatting of the sources resulted in parts of documents not being indexed.

     
  • luca

    luca - 2016-02-09

    I processed a few of the queries that did not deliver any results with IndriRunQuery, this time using the Java user interface:

    set PATH=%PATH%;C:\tool-ir-indri\bin\
    java -jar lib\RetUI.jar

    and then queried against the enwikibooks-index and against the enwikibooksphrases-index.

    This confirmed the results of IndriRunQuery, namely meaningful results against the enwikibooks-index and no results against the enwikibooksphrases-index.

    I then decomposed those queries into their individual terms, and then ran

    dumpindex enwikibooks-index t <term>
    dumpindex enwikibooksphrases-index t <term>

    and out of 10 individual terms, none had any result in the enwikibooksphrases-index, but they all matched in the enwikibooks-index.

    As a positive control of the integrity of the enwikibooksphrases-index I then ran

    dumpindex enwikibooksphrases-index v | more

    which delivered many lines. I then took the first result (food) and tried:

    dumpindex enwikibooksphrases-index t food | more

    which delivered many rows.

    Conclusion: the indexes are built properly; however, for some reason the sentences produced by my generation process do not contain any of the query terms!

    So, no bugs in your system, but perhaps in the system I used to generate the sentence corpus.
    Thank you for your support.
    With this exercise I've learned a lot, including how to use Indri from Java.

    That could turn out to be useful, for example to integrate Indri in the KNIME platform :-)
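
The term-by-term check described in this post can be automated against a saved vocabulary dump. A minimal Python sketch, assuming that `dumpindex <index> v` prints one `term term-count document-count` line per term after a `TOTAL` summary line, and that the query terms have already been stemmed the same way as the index (Krovetz in this thread); the sample vocabulary and counts are invented.

```python
def load_vocabulary(dump_text):
    """Map each term in a vocabulary dump to its (term count, doc count)."""
    vocabulary = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "TOTAL":
            continue  # skip the summary line and anything malformed
        term, term_count, doc_count = fields
        vocabulary[term] = (int(term_count), int(doc_count))
    return vocabulary


def missing_terms(query, vocabulary):
    """Return the query terms that never occur in the index."""
    return [term for term in query.lower().split() if term not in vocabulary]


# Invented excerpt of a vocabulary dump.
sample_dump = """\
TOTAL 2408792 79456
food 1520 1317
water 980 902
"""
vocabulary = load_vocabulary(sample_dump)
print(missing_terms("food chain energy", vocabulary))
```

A query whose terms are all missing from the vocabulary cannot match any document, which is the situation found above.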

     

    Last edit: luca 2016-02-09
  • Sameh souma

    Sameh souma - 2016-11-08

    Please help me to correct this error:
    Exception in thread "main" java.lang.UnsatisfiedLinkError: no indri_jni in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at lemurproject.indri.indriJNI.<clinit>(indriJNI.java:109)
    at lemurproject.indri.QueryEnvironment.<init>(QueryEnvironment.java:39)
    at exemples.TestIndri.main(TestIndri.java:15)
    Thank you in advance.

     
  • Sameh souma

    Sameh souma - 2016-11-09

    I resolved the problem by defining the path in NetBeans.

    To define the java.library.path property in NetBeans, complete the following steps:
    1/ Right-click your project in the Projects area.
    2/ Select Properties and then move to the Run tab.
    3/ In the VM Options field, add the following option, based on your library’s path:
       -Djava.library.path=<path_to_dll>
       where path_to_dll is the path to the folder that contains the file indri_jni.dll
    4/ Click OK to close the window.
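
Before setting the option, it can help to confirm that the folder really contains the native library, since a wrong folder produces the same UnsatisfiedLinkError. A small Python sketch; `vm_option_for` is a hypothetical helper, the file name `indri_jni.dll` is the one mentioned above, and a temporary folder stands in for the real one.

```python
import tempfile
from pathlib import Path


def vm_option_for(library_dir, library_name="indri_jni.dll"):
    """Return the VM option for library_dir, or raise if the DLL is absent."""
    folder = Path(library_dir)
    if not (folder / library_name).is_file():
        raise FileNotFoundError(f"{library_name} not found in {folder}")
    return f"-Djava.library.path={folder}"


# Demonstration with a temporary folder standing in for the real one.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "indri_jni.dll").touch()
    print(vm_option_for(demo_dir))
```

A path that contains spaces would additionally need quoting when pasted into the VM Options field.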

     
  • Sameh souma

    Sameh souma - 2016-11-12

    Please can you help me find the Java function to change the retrieval method, like setBaseline("tfidf") in C++? I think these two lines are the answer, but nothing changes when I run the system.
    String[] rules = {"method:d,mu:1200,documentMu:150,field:text"};
    env.setScoringRules(rules);

    I also used the basic command line '$ ./IndriRunQuery <parameter_file>', but nothing changes. I always get the same search results with
    <rule>method:linear,collectionLambda:0.4,documentLambda:0.2</rule>
    and
    <rule>method:twostage,mu:1500,lambda:0.4</rule>

    Please help me to find a solution.

    thank you in advance

     

    Last edit: Sameh souma 2016-11-13
  • David Fisher

    David Fisher - 2016-11-14

    That API call is only available in C++; it cannot be used with the Java wrappers.

     
    • Sameh souma

      Sameh souma - 2016-11-19

      So I can't change the retrieval method, either in Java or on the command line?
      Thank you.

       
    • Sameh souma

      Sameh souma - 2016-12-01

      Thank you, Dr., for your time and effort.
      So I can't change the retrieval method, either in Java or on the command line?
      Is that right?

       
  • Sameh souma

    Sameh souma - 2017-03-02

    Dear All,
    Please, how can I change the retrieval method used to run a query?
    I tried setting <retModel>0</retModel> in the parameter file to run the TFIDF method, but nothing changes in the results (they are the same as with the default method).
    The parameter file:
    <parameters>
    <retModel>0</retModel>
    <query>
    <number> query-number </number>
    text-of-query
    <type> indri </type>

    </query>
    <index> path-of-repository </index>
    <trecFormat>true</trecFormat>
    </parameters>

     
  • Sameh souma

    Sameh souma - 2017-03-02

    Greetings.

    I tried to execute a query with IndriRunQuery using this parameter file, but I get an error message.
    Parameter file:
    <parameters>
    <baseline>tfidf,k1:1.0,b:0.3</baseline>
    <query>
    <number>2009040</number>
    #1(steam engine)
    <type>indri</type>
    </query>
    <index>D:\INEX2009Index</index>
    <trecFormat>true</trecFormat>
    </parameters>

    Error message:
    # EXCEPTION in query 2009040: IndriRunQuery.cpp(645): Can't run baseline on this query: #1(steam engine)
    indri query language operators are not allowed.

    Please, what is the problem?
    Thank you in advance.

     

    Last edit: Sameh souma 2017-03-02
  • David Fisher

    David Fisher - 2017-03-02

    You can't use Indri query language queries with the baseline methods, as they do not support a structured query language. Your query must be terms only, no operators, just as the error message reported.
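
As an illustration of "terms only", a structured query can be reduced to its bare terms before it is sent to a baseline method. A rough Python sketch, not part of Indri: it assumes operators always look like `#name(...)`, and it does not handle the weights inside `#weight` or field restrictions. Note that the phrase constraint is lost in the process: `#1(steam engine)` becomes two independent terms.

```python
import re


def has_operators(query):
    """True if the query contains Indri query language syntax."""
    return bool(re.search(r"[#()]", query))


def to_bag_of_words(query):
    """Drop operator names and parentheses, keeping only the terms."""
    without_names = re.sub(r"#\w+", " ", query)
    without_parens = re.sub(r"[()]", " ", without_names)
    return " ".join(without_parens.split())


query = "#1(steam engine)"
if has_operators(query):
    query = to_bag_of_words(query)
print(query)  # steam engine
```

The resulting plain-term query is the form that the `baseline` parameter accepts.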

     
  • Sameh souma

    Sameh souma - 2017-03-02

    So I can't search for a phrase using the baseline?

     
  • David Fisher

    David Fisher - 2017-03-02

    No. The baseline models do not support phrases, or any other structured query operator. The baseline models are strictly bag of words.

     
  • Shoeb

    Shoeb - 2017-03-02

    Are there any other models that support phrases, apart from the default retrieval model that Indri implements?

     
  • David Fisher

    David Fisher - 2017-03-03

    No.

     
  • Sameh souma

    Sameh souma - 2017-03-03

    Sorry, Dr. David. Can I use RetEval to search for a phrase using the TF/IDF model?
    Thank you in advance.

     
  • David Fisher

    David Fisher - 2017-03-03

    No. tf.idf does not support phrases, nor any other structured query language operator.

     