I have successfully indexed the FIRE 2010 data for Hindi. http://www.isical.ac.
The data is in a format similar to TREC.
I am getting results for the test query. The result file looks like this:
$/IndriRunQuery -query="गुज्जरों और मीणा समुदाय के बीच संघर्ष" -index=/home/nitin/lemur_26/share/lemur/Hindi_index -count=100 -trecFormat=true -runID=1 -queryOffset=76
76 Q0 range32_save328_d00001_f01898 1 -8.52375 1
76 Q0 range30_save306_d00000_f00632 2 -8.54346 1
76 Q0 range31_save311_d00000_f00329 3 -8.57554 1
76 Q0 range32_save321_d00006_f00240 4 -8.62158 1
76 Q0 range32_save324_d00002_f01563 5 -8.64949 1
76 Q0 range31_save311_d00001_f01193 6 -8.65276 1
76 Q0 range30_save308_d00002_f00640 7 -8.65376 1
76 Q0 range31_save318_d00000_f00332 8 -8.68456 1
76 Q0 range32_save324_d00002_f01397 9 -8.70746 1
76 Q0 range31_save311_d00001_f01245 10 -8.71038 1
76 Q0 range29_save294_d00001_f01594 11 -8.71069 1
76 Q0 range30_save308_d00002_f00695 12 -8.71132 1
76 Q0 range32_save321_d00006_f00247 13 -8.7572 1
76 Q0 range27_save277_d00005_f01786 14 -8.84036 1
76 Q0 range27_save276_d00005_f00213 15 -8.84494 1
76 Q0 range27_save275_d00004_f01756 16 -8.95075 1
76 Q0 range27_save278_d00007_f00202 17 -8.95203 1
I redirected these results to a file: > trec_top_file
I have a relevance judgement file for this data, which looks similar to TREC:
76 Q0 fullnews_id_3847178_date_8_1_2006.utf8 0
76 Q0 fullnews_id_3857491_date_12_1_2006.utf8 0
76 Q0 fullnews_id_3864431_date_15_1_2006.utf8 0
76 Q0 fullnews_id_3890683_date_20_1_2006.utf8 0
76 Q0 fullnews_id_3907961_date_28_1_2006.utf8 0
76 Q0 fullnews_id_3913070_date_28_1_2006.utf8 1
76 Q0 fullnews_id_3913072_date_28_1_2006.utf8 0
76 Q0 fullnews_id_3913074_date_28_1_2006.utf8 0
76 Q0 fullnews_id_3913111_date_30_1_2006.utf8 0
76 Q0 fullnews_id_3915762_date_31_1_2006.utf8 0
76 Q0 fullnews_id_3931361_date_6_2_2006.utf8 0
76 Q0 fullnews_id_3946787_date_11_2_2006.utf8 0
76 Q0 fullnews_id_3951569_date_12_2_2006.utf8 0
I appended these to a file: >> trec_rel_file
Now when I run the command like this:
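One thing worth checking before evaluation: the document identifiers in the result listing above (range32_save328_...) look different from those in the judgement listing (fullnews_id_...), and trec_eval matches documents by identifier string. A minimal sketch, assuming both files are whitespace-separated with the query id in the first column and the document id in the third, counts how many retrieved documents are actually judged. The function names are hypothetical and not part of any toolkit.

```python
def docno_pairs(lines, docno_column=2):
    """Collect (query id, document id) pairs from TREC-style lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) > docno_column:  # skip blank or truncated lines
            pairs.add((fields[0], fields[docno_column]))
    return pairs


def judged_overlap(run_lines, qrels_lines):
    """Return the (query id, document id) pairs present in both files."""
    return docno_pairs(run_lines) & docno_pairs(qrels_lines)


if __name__ == "__main__":
    # One line from each listing in this thread.
    run = ["76 Q0 range32_save328_d00001_f01898 1 -8.52375 1"]
    qrels = ["76 Q0 fullnews_id_3913070_date_28_1_2006.utf8 1"]
    print(len(judged_overlap(run, qrels)), "retrieved documents are judged")
```

If the overlap is empty for the full files, the identifiers stored in the index probably differ from those used in the judgement file, and every measure would come out as zero.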
$sh trec_eval /home/nitin/research/experiment\
trec_eval: 1: Syntax error: "(" unexpected
What could be the problem?
I have a query file, hi.topics, that contains
How can we run these files as TREC topics? Can you give some guidelines for that?
This is an OS issue.
Perhaps http://nwn.bioware.com/forums/viewtopic.html?topic=555240&forum=72 is a similar problem.
When I run the command like this:
nitin@nitin-desktop:~/lemur_26/trec_eval.8.1$ sh trec_eval
I still get the error.
If I convert this file to a Unix file using
$ perl -pe 's/\r\n|\n|\r/\n/g' trec_eval>trec_eval
and then run
$sh trec_eval -q -a hi_unix fire_top_res_q
I do not get any output.
I am getting the following problem when I use trec_eval:
trec_eval.form_res_qrels: duplicate docs AP890109-0219
trec_eval: Can't calculate measure 'num_ret'
Also, every document in the retrieved results is repeated, with the same
score but a different rank. I do not know why there is such a repetition.
Is the error caused by that? What can I do to fix it?
Please help, I am blocked on this.
Sounds like you may have added the same documents twice to your index. Remember,
if you run BuildIndex several times in a row, the index is updated, not overwritten.
Try deleting your index and re-indexing your documents, then re-run.
Thanks a lot for your reply.
I indexed everything again to make sure that the problem is not due to
indexing multiple times. But every document is still present twice in the
results, with the same score and a different rank!
Please help!
Hey, resolved! Thank you so much :-) I had specified both server and index in
the query parameter file, which caused the duplication.
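For anyone who hits the same "duplicate docs" message, a small sketch like the following can confirm whether a result file lists the same document more than once for a query. It assumes the six-column layout shown earlier in this thread (query id, Q0, document id, rank, score, run id); the function name is hypothetical, and the example scores are made up for illustration.

```python
from collections import Counter


def duplicate_docs(run_lines):
    """Return sorted (query id, document id) pairs that occur more than once."""
    counts = Counter()
    for line in run_lines:
        fields = line.split()
        if len(fields) >= 3:  # need at least query id, Q0, document id
            counts[(fields[0], fields[2])] += 1
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Illustrative lines only: same document, same score, different rank.
    run = [
        "51 Q0 AP890109-0219 1 -5.0 indri",
        "51 Q0 AP890109-0219 2 -5.0 indri",
        "51 Q0 AP890110-0001 3 -5.5 indri",
    ]
    for qid, docno in duplicate_docs(run):
        print(qid, docno)
```

An empty list means each document appears at most once per query, so the duplication is not coming from the result file.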
I got the following mean average precision value from trec_eval
map all 0.0816
for running the titles of TREC topics 51-100 as queries.
I used the relevance judgements of TREC 51-100 ad hoc for this. I read
somewhere that IndriRunQuery uses the query likelihood model by default. So is
something wrong with my approach that results in this MAP value?
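As background on what the map figure measures, here is a minimal sketch of (mean) average precision. It is not trec_eval's code: the function names are hypothetical, and this version averages over every topic that has at least one relevant document, which may differ from trec_eval's default handling of topics missing from the results.

```python
def average_precision(ranked_docnos, relevant):
    """Average precision for one topic: the mean, over all relevant documents,
    of the precision at each rank where a relevant document is retrieved."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    # Dividing by all relevant documents penalises the ones never retrieved.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: {topic: ranked list of docnos}; qrels: {topic: set of relevant docnos}."""
    topics = [topic for topic, rel in qrels.items() if rel]
    if not topics:
        return 0.0
    return sum(average_precision(runs.get(t, []), qrels[t]) for t in topics) / len(topics)
```

Because the denominator counts every relevant document in the judgements, relevant documents that are not in the index at all pull the value down.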
Also, I want to understand the query model and query expansion in Lemur.
I would be grateful if you could point me to some good sources,
because I am totally new to information retrieval and I want to
get the basics right.
I indexed only the first two disks of TREC data and also added the missing <name>
tag inside the stemmer tag in the build parameter file for IndriBuildIndex, and the
value came up to 0.2026.
Please let me know if something else could have been incorrect.
I was implementing tfidf with IndriRunQuery, and
I did not use any Indri query language operators because they can't be used
with baseline retrieval.
But what if I have a query like:
"Alternative/renewable Energy Plant & Equipment Installation" OR
"Military Coups D'etat"
Then the following exception is thrown:
../src/QueryEnvironment.cpp(874): Couldn't understand this query: NoViableAlt
Do I need to use some encoding to avoid this exception?
Please review the query language grammar, https://sourceforge.net/apps/trac/l
Queries may not contain '/', '&', or ''' (the single quote character).
Separately, using apparent boolean operator words, such as OR, will result in
your querying on the literal word "or", which is unlikely to be your intent.
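A minimal sketch of cleaning topic titles before passing them to IndriRunQuery, assuming only the three characters named above need to go (other characters may also be rejected by the grammar). The function name is hypothetical. Each forbidden character is replaced with a space, so "D'etat" becomes two tokens; deleting the quote instead is an equally reasonable choice.

```python
import re

# The three characters named in the reply above: '/', '&' and the single quote.
_FORBIDDEN = re.compile(r"[/&']")


def sanitize_query(text):
    """Replace forbidden characters with spaces and collapse repeated whitespace."""
    cleaned = _FORBIDDEN.sub(" ", text)
    return " ".join(cleaned.split())


if __name__ == "__main__":
    for title in (
        "Alternative/renewable Energy Plant & Equipment Installation",
        "Military Coups D'etat",
    ):
        print(sanitize_query(title))
```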
Thanks for your reply !
Now I would like to ask whether Indri supports the KL divergence retrieval
method, because I need to compare boolean retrieval and KLD-based query
expansion. (Since boolean retrieval is very easy in Indri, I want to
stick to Indri if possible.) I came to know from http://www.lemurproject.org/l
that RetEval allows this, but since it is from the Lemur toolkit, what should I do
if I want to do something like that with Indri?
Do I need to build the module in Indri? Also, please point me to resources that
discuss how to proceed with RetEval.
Also, I learned from these forums that some Lemur APIs are deprecated.
What about RetEval?
Please help !