Dear All,
I am experiencing very strange behaviour from IndriRunQuery.exe: when running queries against a specific index, it fails to deliver any output for some queries.
When I resubmit the queries that failed, I get a result for some of them, but for most of the others I still don't.
The same set of queries runs fine against another index.
The indexing didn't deliver any result.
The index\0\manifest of the failing index:
<parameters>
<code-build-date>Dec 22 2015</code-build-date>
<corpus>
<document-base>1</document-base>
<frequent-terms>257</frequent-terms>
<maximum-document>79457</maximum-document>
<total-documents>79456</total-documents>
<total-terms>2408792</total-terms>
<unique-terms>14889</unique-terms>
</corpus>
<fields></fields>
<indri-distribution>Indri development release 5.8</indri-distribution>
<type>DiskIndex</type>
</parameters>
The same file for the index that works properly:
<parameters>
<code-build-date>Dec 22 2015</code-build-date>
<corpus>
<document-base>1</document-base>
<frequent-terms>4328</frequent-terms>
<maximum-document>66175</maximum-document>
<total-documents>66174</total-documents>
<total-terms>48902567</total-terms>
<unique-terms>662912</unique-terms>
</corpus>
<fields></fields>
<indri-distribution>Indri development release 5.8</indri-distribution>
<type>DiskIndex</type>
</parameters>
I notice that the failing index has more documents (79456) than the one that did not fail (66174).
Could that be the reason?
Is there an upper limit for IndriRunIndex 5.8 on 32-bit Windows machines?
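One way to compare the two manifests above is to look at the average number of terms per document rather than at the document counts alone. A minimal sketch follows; the class and method names are hypothetical (they are not part of Indri), and the regex-based extraction is an assumption that only suits a flat file like this manifest, where a real XML parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Indri): reads numeric fields from the text of an index manifest.
final class ManifestStats {

    // Returns the numeric content of <tag>...</tag>, or throws if the tag is missing.
    static long readLong(String manifestXml, String tag) {
        String quoted = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + quoted + ">\\s*(\\d+)\\s*</" + quoted + ">");
        Matcher m = p.matcher(manifestXml);
        if (!m.find()) {
            throw new IllegalArgumentException("Tag not found: " + tag);
        }
        return Long.parseLong(m.group(1));
    }

    // Average number of indexed terms per document.
    static double termsPerDocument(String manifestXml) {
        long docs = readLong(manifestXml, "total-documents");
        long terms = readLong(manifestXml, "total-terms");
        return docs == 0 ? 0.0 : (double) terms / docs;
    }

    public static void main(String[] args) {
        // Values copied from the two manifests shown above.
        String failing = "<corpus><total-documents>79456</total-documents>"
                + "<total-terms>2408792</total-terms></corpus>";
        String working = "<corpus><total-documents>66174</total-documents>"
                + "<total-terms>48902567</total-terms></corpus>";
        System.out.printf("failing index: %.1f terms/doc%n", termsPerDocument(failing));
        System.out.printf("working index: %.1f terms/doc%n", termsPerDocument(working));
    }
}
```

With the figures above this comes to roughly 30 terms per document for the failing index and roughly 739 for the working one, so the two indexes hold very different kinds of documents.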
IndriRunIndex isn't going to be limited to processing indexes built by the same version as the run.
Clearly you have two different indexes as indicated by the doc counts, so I would have to say build parameters were not the same for both indexes.
These appear to be very small indexes, so just rebuild them and query again.
Thank you; however, in my opinion there is a bug in IndriRunQuery, since it does not deliver any result for some specific queries.
There might be, and you should submit a bug report in the tickets section. You need to include all of the relevant details, such as operating system version, Indri version, whether you compiled it yourself or are using the binary distribution, Visual Studio version if you compiled it yourself, a full description of the conditions necessary to replicate the behavior, etc. Note that the same details are required for the index build.
Your description above is unclear. Do you mean you have two indexes that are both supposed to contain the same data? Or do you mean you just have two different indexes and only see the behavior on one of them?
In either case, if you experience an odd behavior at retrieval time, the first test is to rebuild your index and compare to determine if perhaps the misbehaving index is corrupted in some fashion.
Are you able to dump the indexes (dumpindex)?
If you can't [fully] dump the contents of the failing index, then it probably is corrupted in some way.
If you can, look for your query terms in the listings to confirm that at least some documents should have been returned for the query.
Dear David and Stephan, thank you for your support.
I have now re-run the indexing on the same corpus and stored the indexes in a different folder.
I then reran the queries and compared the results: identical, and still incomplete.
This means that for some queries IndriRunQuery does not return any result.
Let me explain the goal: I would like to query the individual phrases of English Wikipedia and retrieve the scores of the best matching queries. This is in the context of the Kaggle competition that ends this week, www.kaggle.com/c/the-allen-ai-science-challenge.
There are thousands of questions, each with 4 possible answers. With my team ("LTSB", currently achieving a 33.6% score; a random result is 25%) we have implemented a strategy that took second place in a similar competition (http://ceur-ws.org/Vol-1178/CLEF2012wn-QA4MRE-BhattacharyaEt2012.pdf) and consisted of the following steps:
a) Generate pseudo sentences by combining each question with each of the possible answers
b) POS-tag and filter the pseudo sentences, keeping only terms with specific POS tags
c) Process a large text corpus of relevant content by:
c-1) splitting each sentence into an individual txt file
c-2) running IndriBuildIndex with Krovetz stemming
d) Run IndriRunQuery using okapi, count=1, trecFormat
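Step (a) above could be sketched as follows. The class name, the method name, and the sample question are hypothetical, and plain concatenation is an assumption about how question and answer are combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of step (a): one pseudo sentence per (question, answer) pair.
final class PseudoSentences {

    // Joins the question with each candidate answer; the order of the answers is preserved.
    static List<String> combine(String question, List<String> answers) {
        List<String> result = new ArrayList<>();
        for (String answer : answers) {
            result.add((question.trim() + " " + answer.trim()).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up example data, for illustration only.
        List<String> answers = Arrays.asList("evaporation", "condensation", "freezing", "melting");
        for (String s : combine("Which process turns liquid water into vapor?", answers)) {
            System.out.println(s);
        }
    }
}
```

Each resulting pseudo sentence would then go through the POS filtering of step (b) before being submitted as a query.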
I downloaded enwikibook in XML format (http://dumps.wikimedia.org/enwiki/20160113/) and extracted the content of each of its pages in txt format. Each "book" is now a file, and I processed these files with Indri, obtaining the above results. That resulted in 66174 documents.
I then selected the set of books with the max score for each query (2092), with the intention of finding the sentences that best match (and hopefully achieve a higher score), and split each of them into one text file per sentence. That resulted in 79458 sentences.
I then indexed the 79458 sentences with IndriBuildIndex and ran IndriRunQuery against them. However, only a small portion of the queries delivers any result; I would expect instead that a score would be returned even for a bad match.
dumpindex does not deliver any error message with any of the indices.
And the documents you expect to be retrieved do indeed have the query term(s) represented in them according to the index dump? Use "dumpindex t" to get inverted lists for query terms.
This is just to confirm that the creation process for the second index did not result in documents that contained some of your search terms being excluded from the second index sources, or perhaps formatting of the sources resulted in parts of documents not being indexed.
I processed a few of the queries that did not deliver any results with IndriRunQuery, this time using the Java user interface:
set PATH= %PATH%;C:\tool-ir-indri\bin\
java -jar lib\RetUI.jar
and then queried against the enwikibooks-index and against the enwikibooksphrases-index.
This confirmed the results of IndriRunQuery, namely meaningful results against the enwikibooks-index and no results against the enwikibooksphrases-index.
I then decomposed those queries into their individual terms and ran:
dumpindex enwikibooks-index t <term>
dumpindex enwikibooksphrases-index t <term>
Out of 10 individual terms, none had any result in the enwikibooksphrases-index, but they all matched in the enwikibooks-index.
As a positive control of the integrity of the enwikibooksphrases-index, I then ran:
dumpindex enwikibooksphrases-index v | more
that delivered many lines. I then took the first result (food) and tried:
dumpindex enwikibooksphrases-index t food | more
that delivered many rows.
Conclusion: the indexes are built properly; however, for some reason the process that generated the sentences produced a corpus that does not contain the matching terms!
So, no bugs in your system, but perhaps in the system I used to generate the sentence corpus.
Thank you for your support.
With this exercise I've learned a lot, including how to use Indri from Java.
That could turn out to be useful, for example to integrate Indri in the KNIME platform :-)
Last edit: luca 2016-02-09
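The check described in this post (do the query terms occur in the sentence corpus at all?) can also be run on the source sentences before indexing. The following is a minimal sketch with hypothetical names. It lowercases the text and splits on non-alphanumeric characters, and it does not apply Krovetz stemming, so it only approximates what the index would contain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-indexing sanity check (not part of Indri).
final class MissingTerms {

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the query terms that occur in none of the sentences (no stemming applied).
    static Set<String> missing(List<String> queryTerms, List<String> sentences) {
        Set<String> vocabulary = new HashSet<>();
        for (String sentence : sentences) {
            vocabulary.addAll(tokenize(sentence));
        }
        Set<String> result = new LinkedHashSet<>();
        for (String term : queryTerms) {
            String normalized = term.toLowerCase(Locale.ROOT);
            if (!vocabulary.contains(normalized)) {
                result.add(normalized);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up example data, for illustration only.
        List<String> sentences = Arrays.asList("Food is stored in cells.", "Plants make food.");
        System.out.println(missing(Arrays.asList("food", "engine"), sentences));
    }
}
```

If most query terms come back as missing, the problem lies in the corpus generation step rather than in the index or in IndriRunQuery.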
Please help me to correct this error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: no indri_jni in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at lemurproject.indri.indriJNI.<clinit>(indriJNI.java:109)
at lemurproject.indri.QueryEnvironment.<init>(QueryEnvironment.java:39)
at exemples.TestIndri.main(TestIndri.java:15)
Thank you in advance.
I resolved the problem by defining the path in NetBeans.
Setting java.library.path using NetBeans: to define the java.library.path property in NetBeans, complete the following steps:
1/ Select your project in the Projects area and right-click on it.
2/ Select Properties and then move to the Run tab.
3/ In the VM Options field, add the following option, based on your library's path:
-Djava.library.path=<path_to_dll>
where path_to_dll is the path to the folder containing the file indri_jni.dll.
4/ Click OK to close the window.
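To verify that the setting took effect before loading Indri, one could check whether any directory on java.library.path actually contains the native library. This is a minimal sketch; the class and method names are hypothetical, and it only looks for the file, it does not try to load it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper (not part of Indri).
final class LibraryPathCheck {

    // Splits a path list on the platform separator (';' on Windows, ':' elsewhere).
    static List<String> entries(String pathList) {
        List<String> result = new ArrayList<>();
        for (String entry : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.trim().isEmpty()) {
                result.add(entry.trim());
            }
        }
        return result;
    }

    // Returns the first directory in pathList that contains fileName, or null if none does.
    static String findDirectory(String pathList, String fileName) {
        for (String dir : entries(pathList)) {
            if (new File(dir, fileName).isFile()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // mapLibraryName turns "indri_jni" into the platform file name, e.g. indri_jni.dll on Windows.
        String fileName = System.mapLibraryName("indri_jni");
        String dir = findDirectory(System.getProperty("java.library.path", ""), fileName);
        System.out.println(dir == null
                ? fileName + " not found on java.library.path"
                : fileName + " found in " + dir);
    }
}
```

Running it with the same VM options as the project shows whether the UnsatisfiedLinkError is due to a wrong path or to something else, such as a 32-bit/64-bit mismatch.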
Please, can you help me find the Java function for changing the retrieval method, like setBaseline("tfidf") in C++? I think the following two lines are the answer, but nothing changes when I run the system.
String[] rules = {"method:d,mu:1200,documentMu:150,field:text"};
env.setScoringRules(rules);
I also used the basic command line '$ ./IndriRunQuery <parameter_file>' with the following rules, but nothing changes: I always get the same search results.
<rule>method:linear,collectionLambda:0.4,documentLambda:0.2</rule>
and
<rule>method:twostage,mu:1500,lambda:0.4</rule>
Please help me find a solution.
Thank you in advance.
Last edit: Sameh souma 2016-11-13
That API call is only available in C++; it cannot be used with the Java wrappers.
So I can't change the retrieval method either in Java or on the command line?
Thank you.
Thank you, Dr., for your time and effort.
So I can't change the retrieval method either in Java or on the command line?
Is that right?
Dear All,
Please, how can I change the retrieval method used when running a query?
I tried changing the retrieval method in the parameter file with <retModel>0</retModel> to use the TFIDF method, but nothing changes in the results (I get the same results as with the default method).
The parameter file:
<parameters>
<retModel>0</retModel>
<query>
<number> query-number </number> text-of-query
<type> indri </type>
</query>
<index> path-of-repository </index>
<trecFormat>true</trecFormat>
</parameters>
The parameters you have shown are for the Lemur Toolkit program RetEval. They are not the parameters for IndriRunQuery.
Please review the documentation at https://sourceforge.net/p/lemur/wiki/IndriRunQuery/ and the full listing of parameters accepted by IndriRunQuery at https://lemur.sourceforge.io/indri/IndriRunQuery.html
Greet.
I tried to execute a query with IndriRunQuery using this parameter file, but I get an error message.
Parameter file:
<parameters>
<baseline>tfidf,k1:1.0,b:0.3</baseline>
<query>
<number>2009040</number> #1(steam engine)
<type>indri</type>
</query>
<index>D:\INEX2009Index</index>
<trecFormat>true</trecFormat>
</parameters>
Error message:
# EXCEPTION in query 2009040: IndriRunQuery.cpp(645): Can't run baseline on this query: #1(steam engine)
indri query language operators are not allowed.
Please, what is the problem?
Thank you in advance.
Last edit: Sameh souma 2017-03-02
You can't use Indri query language queries with the baseline methods, as they do not support a structured query language. Your query must be terms only, no operators, just as the error message reported.
So I can't search for a phrase using the baseline?
No. The baseline models do not support phrases, or any other structured query operator. The baseline models are strictly bag of words.
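Since the baseline models accept terms only, a structured query has to be reduced to its bare terms before it can be run with <baseline>. A minimal sketch of such a reduction follows; the class and method names are hypothetical, and the regular expressions are an assumption that covers simple operators such as #1(...) or #combine(...) but not weights inside #weight(...) or field restrictions. The phrase constraint is lost in the process, in line with the answer above.

```java
// Hypothetical helper (not part of Indri): reduces a structured query to plain terms.
final class BaselineQuery {

    // Removes operator names (e.g. #1, #combine, #uw8) and parentheses, then normalizes whitespace.
    static String stripOperators(String indriQuery) {
        return indriQuery
                .replaceAll("#[A-Za-z0-9]+", " ")
                .replaceAll("[()]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // The query from the parameter file above, reduced to a bag of words.
        System.out.println(stripOperators("#1(steam engine)"));
    }
}
```

The resulting string can then be placed in the query element of a parameter file that uses <baseline>.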
Are there any other models that support phrases besides the default retrieval model that Indri implements?
No.
Sorry, Dr. David. Can I use RetEval to search for a phrase using the TF/IDF model?
Thank you in advance.
No. tf.idf does not support phrases, nor any other structured query language operator.