Help needed with trec_eval

  • nitin hardeniy

    nitin hardeniy - 2011-02-24

    I have successfully indexed the FIRE 2010 Hindi data, which is in a format
    similar to TREC. I am getting results for the test query. The command and
    the result file look like this:

    $/IndriRunQuery -query="गुज्जरों और मीणा समुदाय के बीच संघर्ष" -index=/home/nitin/lemur_26/share/lemur/Hindi_index -count=100 -trecFormat=true -runID=1 -queryOffset=76

    76 Q0 range32_save328_d00001_f01898 1 -8.52375 1
    76 Q0 range30_save306_d00000_f00632 2 -8.54346 1
    76 Q0 range31_save311_d00000_f00329 3 -8.57554 1
    76 Q0 range32_save321_d00006_f00240 4 -8.62158 1
    76 Q0 range32_save324_d00002_f01563 5 -8.64949 1
    76 Q0 range31_save311_d00001_f01193 6 -8.65276 1
    76 Q0 range30_save308_d00002_f00640 7 -8.65376 1
    76 Q0 range31_save318_d00000_f00332 8 -8.68456 1
    76 Q0 range32_save324_d00002_f01397 9 -8.70746 1
    76 Q0 range31_save311_d00001_f01245 10 -8.71038 1
    76 Q0 range29_save294_d00001_f01594 11 -8.71069 1
    76 Q0 range30_save308_d00002_f00695 12 -8.71132 1
    76 Q0 range32_save321_d00006_f00247 13 -8.7572 1
    76 Q0 range27_save277_d00005_f01786 14 -8.84036 1
    76 Q0 range27_save276_d00005_f00213 15 -8.84494 1
    76 Q0 range27_save275_d00004_f01756 16 -8.95075 1
    76 Q0 range27_save278_d00007_f00202 17 -8.95203 1

    I redirected these results into trec_top_file.

    I have a relevance judgment file for this data, which looks similar to the
    TREC qrels format:

    76 Q0 fullnews_id_3847178_date_8_1_2006.utf8 0
    76 Q0 fullnews_id_3857491_date_12_1_2006.utf8 0
    76 Q0 fullnews_id_3864431_date_15_1_2006.utf8 0
    76 Q0 fullnews_id_3890683_date_20_1_2006.utf8 0
    76 Q0 fullnews_id_3907961_date_28_1_2006.utf8 0
    76 Q0 fullnews_id_3913070_date_28_1_2006.utf8 1
    76 Q0 fullnews_id_3913072_date_28_1_2006.utf8 0
    76 Q0 fullnews_id_3913074_date_28_1_2006.utf8 0
    76 Q0 fullnews_id_3913111_date_30_1_2006.utf8 0
    76 Q0 fullnews_id_3915762_date_31_1_2006.utf8 0
    76 Q0 fullnews_id_3931361_date_6_2_2006.utf8 0
    76 Q0 fullnews_id_3946787_date_11_2_2006.utf8 0
    76 Q0 fullnews_id_3951569_date_12_2_2006.utf8 0
    I appended these to trec_rel_file.
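
One thing that may be worth checking against the two samples above: the document IDs in the run (range32_save328_...) do not look like the document IDs in the judgments (fullnews_id_...). trec_eval matches documents by their exact ID string, so if the two files never share an ID, every retrieved document is counted as non-relevant. The sketch below is a minimal Python check of that overlap, using lines copied from this post; the function names are illustrative and not part of any toolkit.

```python
def run_docnos(lines):
    """Collect document IDs from run lines: qid Q0 docno rank score run_id."""
    return {parts[2] for parts in (line.split() for line in lines) if len(parts) == 6}


def qrels_docnos(lines):
    """Collect document IDs from qrels lines: qid iter docno relevance."""
    return {parts[2] for parts in (line.split() for line in lines) if len(parts) == 4}


# Sample lines copied from the post above.
run_sample = [
    "76 Q0 range32_save328_d00001_f01898 1 -8.52375 1",
    "76 Q0 range30_save306_d00000_f00632 2 -8.54346 1",
]
qrels_sample = [
    "76 Q0 fullnews_id_3847178_date_8_1_2006.utf8 0",
    "76 Q0 fullnews_id_3913070_date_28_1_2006.utf8 1",
]

# IDs that appear in both files; an empty set means nothing can be scored as relevant.
shared = run_docnos(run_sample) & qrels_docnos(qrels_sample)
print(len(shared))
```

If the overlap on the full files is also empty, the index was probably built with a different document ID field than the one the judgments refer to.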

    Now, when I run the command like this:

    $sh trec_eval /home/nitin/research/experiment\ data/FIRE/hi.qrels.76-125.2010.txt fire_top_res_q1

    trec_eval: 1: Syntax error: "(" unexpected

    What could be the problem?
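
The message `Syntax error: "(" unexpected` is printed by `sh`, not by trec_eval. It is what a shell typically reports when it is asked to interpret a compiled program as if it were a script. If trec_eval was built with make, it is presumably a native executable and would be started directly, for example as `./trec_eval qrels_file results_file`, rather than through `sh`. The sketch below is one way to test that assumption in Python by looking for the ELF magic bytes at the start of the file; the helper name and the two throwaway files are made up for the demonstration.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of a Linux executable


def looks_like_elf(path):
    """Return True if the file starts with the ELF magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(4) == ELF_MAGIC


# Demonstration on two throwaway files: a fake binary header and a shell script.
with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "fake_binary")
    script_path = os.path.join(tmp, "script.sh")
    with open(binary_path, "wb") as handle:
        handle.write(ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9)
    with open(script_path, "wb") as handle:
        handle.write(b"#!/bin/sh\necho hello\n")
    binary_is_elf = looks_like_elf(binary_path)
    script_is_elf = looks_like_elf(script_path)

print(binary_is_elf, script_is_elf)
```

Calling `looks_like_elf` on the real trec_eval file would show whether `sh` is the wrong way to launch it.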

    I have a query file, hi.topics, that contains:

    <top lang="hi">
    गुज्जरों और मीणा समुदाय के बीच संघर्ष
    गुज्जरों को अनुसूचित जनजाति में वर्गीकृत करने के लिये मीणा नेताओं का
    प्रतिवाद प्रकट करना

    <narr>गुज्जर समुदाय अपने को अनुसूचित जनजाति में वर्गीकृत कराने के लिये आन्दोलन
    कर रहे हैं৷ इसके विरुद्ध मीणा समुदाय के नेताओं का प्रतिवाद करना৷मीणाओं के
    आपत्ति करने के प्रधान कारण क्या हैं? प्रासंगिक प्रलेख में इन दोंनो समुदायों के
    संघर्ष के मुख्य कारणों का उल्लेख रहना चाहिए</narr>

    <top lang="hi">
    हिजबुल्लाह गुरिल्लाओं के हमले
    हिजबुल्लाह गुरिल्लाओं का इस्राइली सैनिकों और भारतीय सैनिकों पर

    <narr>इस्राइल में शाँति स्थापित करने गये हुए भारतीय एवं इस्राइली सैनिकों पर
    हिजबुल्लाह गुरिल्लाओं का हमला৷ प्रासंगिक प्रलेख में इस हमले से सम्बन्धित
    सूचनाएँ होनी चाहिये</narr>

    <top lang="hi">
    राम मंदिर को लेकर आडवाणी-सिंघल विवाद
    राम मंदिर को लेकर भाजपा नेता एल.के आडवाणी और विश्व हिन्दू परिषद् के
    अध्यक्ष अशोक सिंघल के मध्य विवाद

    <narr>प्रासंगिक प्रलेख में भाजपा नेता एल.के आडवाणी और विश्व हिन्दू परिषद् के
    अध्यक्ष अशोक सिंघल के बीच वाद-विवाद से सम्बन्धित सूचनाएँ यहाँ होनी चाहिये। इन
    नेताओं का अन्य किसी विषय से संबंधित वितर्क अथवा भाजपा या विहिप के अन्य नेताओं
    के मध्य चलने वाला विवाद या अंतरद्वन्द्व यहाँ अप्रासंगिक हैं</narr>

    How can we run these files as TREC topics? Can you give some guidelines for that?
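
One possible route is to turn the topic titles into a query parameter file and pass that file to IndriRunQuery instead of typing each query on the command line. The sketch below builds such a file in Python. It assumes a version of IndriRunQuery that accepts `<number>` and `<text>` inside each `<query>` element; older releases may instead need plain `<query>` elements together with `queryOffset`. The function name and the output file name are illustrative, and the index path and topic are the ones quoted in this post.

```python
import xml.etree.ElementTree as ET


def build_query_parameters(topics, index_path, count=100):
    """Build an IndriRunQuery parameter file from (number, text) pairs."""
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_path
    ET.SubElement(root, "count").text = str(count)
    ET.SubElement(root, "trecFormat").text = "true"
    for number, text in topics:
        query = ET.SubElement(root, "query")
        # Assumed layout: one <number> and one <text> per query.
        ET.SubElement(query, "number").text = str(number)
        ET.SubElement(query, "text").text = text
    return ET.tostring(root, encoding="unicode")


# Topic 76 and the index path are taken from the post; more pairs can be appended.
topics = [(76, "गुज्जरों और मीणा समुदाय के बीच संघर्ष")]
parameters_xml = build_query_parameters(
    topics, "/home/nitin/lemur_26/share/lemur/Hindi_index"
)
print(parameters_xml)
```

Saving the string to a file such as queries.xml (UTF-8) and running `IndriRunQuery queries.xml > fire_top_res` would then produce one ranked list per topic number, which is the shape trec_eval expects.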

  • nitin hardeniy

    nitin hardeniy - 2011-02-24

    When I run the command like this:
    nitin@nitin-desktop:~/lemur_26/trec_eval.8.1$ sh trec_eval

    trec_eval: 1: Syntax error: "(" unexpected

    I still get the error.

    If I convert this file to Unix line endings using

    $ perl -pe 's/\r\n|\n|\r/\n/g' trec_eval>trec_eval

    and then run

    $sh trec_eval -q -a hi_unix fire_top_res_q

    I do not get any output.
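
A likely reason for the empty output: with `trec_eval>trec_eval` the shell truncates the output file before perl gets to read it, so the file ends up empty, and `sh` on an empty file prints nothing. Converting line endings is also only meaningful for text files such as the qrels or the run, not for a compiled program. The sketch below shows one way to normalize a text file in Python without that truncation problem, by writing to a temporary file and then replacing the original; the function name and the sample lines (doc_a, doc_b) are made up.

```python
import os
import tempfile


def normalize_newlines(path):
    """Rewrite a text file with Unix line endings without truncating it first."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # Swap the converted copy in only after it has been written completely.
    os.replace(tmp_path, path)


# Demonstration on a throwaway qrels-like file with Windows line endings.
with tempfile.TemporaryDirectory() as tmp:
    qrels_path = os.path.join(tmp, "qrels.txt")
    with open(qrels_path, "wb") as handle:
        handle.write(b"76 Q0 doc_a 0\r\n76 Q0 doc_b 1\r\n")
    normalize_newlines(qrels_path)
    with open(qrels_path, "rb") as handle:
        converted = handle.read()

print(converted)
```

Since the trec_eval file was overwritten by the redirect, it would most likely need to be rebuilt before it can be used again.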

  • mouni

    mouni - 2012-02-16


    I am getting the following problem when I use trec_eval:

    trec_eval.form_res_qrels: duplicate docs AP890109-0219
    trec_eval: Can't calculate measure 'num_ret'

    Also, every document is repeated in the retrieved results, with the same
    score but a different rank. I do not know why this repetition occurs. Is it
    the cause of the error? What should I do to fix it?

    Please help, I am blocked on this.


  • Bevan Koopman

    Bevan Koopman - 2012-02-16

    It sounds like you may have added the same document to your index twice.
    Remember that if you run BuildIndex several times in a row, the index is
    updated, not overwritten.

    Try deleting your index and re-indexing your documents, then re-run.

    Good luck

  • mouni

    mouni - 2012-02-16


    Thanks a lot for your reply.

    I indexed everything again to make sure that the problem is not due to
    indexing multiple times. But every document is still present twice in the
    results, with the same score and a different rank!

    Please help!

  • mouni

    mouni - 2012-02-16

    Hey, resolved! Thank you so much :-) I had specified both a server and an
    index in the query parameter file, which was the cause of the duplication.
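
The error quoted earlier in the thread says that trec_eval refuses a run in which the same document appears more than once for a query. The sketch below is a small Python check that can be run on a results file before handing it to trec_eval. The helper name is illustrative, and the sample lines are invented apart from the document ID taken from the error message.

```python
from collections import Counter


def duplicate_docs(run_lines):
    """Return (query_id, docno) pairs that occur more than once in a run."""
    counts = Counter()
    for line in run_lines:
        parts = line.split()
        if len(parts) != 6:  # expected: qid Q0 docno rank score run_id
            continue
        counts[(parts[0], parts[2])] += 1
    return sorted(key for key, seen in counts.items() if seen > 1)


# Invented sample: the first document is listed twice with the same score.
sample_run = [
    "51 Q0 AP890109-0219 1 -5.10 indri",
    "51 Q0 AP890109-0219 2 -5.10 indri",
    "51 Q0 AP890110-0001 3 -5.42 indri",
]
duplicates = duplicate_docs(sample_run)
print(duplicates)
```

An empty list means the run has no repeated documents per query, which is what should be seen once only one of server or index is given in the parameter file.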

  • mouni

    mouni - 2012-02-17


    Using trec_eval, I got a mean average precision value of
    map all 0.0816
    when running the titles of TREC topics 51-100 as queries.

    I used the TREC 51-100 ad hoc relevance judgments for this. I read
    somewhere that IndriRunQuery uses the query likelihood model by default. Is
    something wrong with my approach that would explain this MAP value?

    I also want to understand the query model and query expansion in Lemur.
    I would be grateful if you could point me to some good sources, because I
    am completely new to information retrieval and I want to get my basics
    right.

    Thanks !
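
For readers who are new to the measure, the sketch below shows in Python how mean average precision is usually defined. It is a textbook-style illustration with invented document IDs, not trec_eval's own code, and it averages over every query in the judgments, which may differ from how trec_eval treats queries that are missing from the run.

```python
def average_precision(ranked_docs, relevant_docs):
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant_docs:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_docs)


def mean_average_precision(runs, qrels):
    """Mean of average precision over all queries that have judgments."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Invented example: two queries with two relevant documents each.
runs = {"51": ["d1", "d2", "d3", "d4"], "52": ["d5", "d6"]}
qrels = {"51": {"d1", "d3"}, "52": {"d6", "d9"}}
map_score = mean_average_precision(runs, qrels)
print(round(map_score, 4))
```

Because unretrieved relevant documents count as zero, anything that keeps relevant documents out of the ranking (a partial index, a missing stemmer, mismatched document IDs) pulls the value down.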

  • mouni

    mouni - 2012-02-18


    I indexed only the first two disks of the TREC data and also added the
    missing <name> tag inside the stemmer tag in the build parameter file for
    IndriBuildIndex, and the value went up to 0.2026.

    Please let me know if something else could have been incorrect.

    Thank you.

  • mouni

    mouni - 2012-02-18

    I was running tf-idf retrieval with IndriRunQuery, and I did not use any
    Indri query language operators because they can't be used with baseline
    retrieval.
    But what if I have a query like
    "Alternative/renewable Energy Plant & Equipment Installation" or
    "Military Coups D'etat"?
    In that case the following exception is thrown:

    EXCEPTION in query 84: IndriRunQuery.cpp(410): QueryThread::_runQuery

    ../src/QueryEnvironment.cpp(874): Couldn't understand this query: NoViableAlt

    Do I need to use some encoding to avoid this exception?
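
NoViableAlt is a parser error, so the trigger is presumably the punctuation in these titles (`/`, `&`, `'`) rather than the character encoding. A common workaround is to strip punctuation from the topic text before it goes into the query file. The sketch below does that in Python; the function name is illustrative, and the exact set of characters that Indri's parser rejects is not confirmed here, so the sketch simply removes everything that is not a letter, digit, or underscore.

```python
import re


def sanitize_query(text):
    """Replace every run of non-word characters with a single space."""
    # Intended for English topic titles; scripts that rely on combining
    # marks would need a different character class.
    cleaned = re.sub(r"[^\w]+", " ", text)
    return " ".join(cleaned.split())


# The two titles quoted in the post above.
first = sanitize_query("Alternative/renewable Energy Plant & Equipment Installation")
second = sanitize_query("Military Coups D'etat")
print(first)
print(second)
```

The cleaned strings contain only words separated by spaces, so they can be used as plain term queries.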

  • mouni

    mouni - 2012-03-01


    Thanks for your reply !

    Now I would like to ask whether Indri supports the KL-divergence retrieval
    method. I need to compare Boolean retrieval with KLD-based query
    expansion. (Since Boolean retrieval is very easy in Indri, I want to
    stick with Indri if possible.) I came to know from

    that RetEval allows this, but since it is from the Lemur toolkit, what
    should I do if I want to do something like that with Indri?

    Do I need to build the module in Indri myself? Please also point me to any
    material that discusses how to proceed with RetEval.

    I also learned from these forums that some Lemur APIs are deprecated.
    What about RetEval?

    Please help !
    Thanks !


