Raul Banderas

What's happening?

  • Eliminating javascript from text snippets

    Hi, is there any way in the Lemur API, or does anyone have ideas on how, to eliminate JavaScript code from text snippets in retrieval? Thanks in advance.

    2009-11-24 19:52:30 UTC in The Lemur Toolkit
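
One possible approach, sketched here outside the Lemur API, is to strip script content from the HTML before it is indexed or before snippets are built. This is a minimal sketch using only Python's standard `html.parser`; the names `ScriptStripper` and `strip_javascript` are hypothetical, and it assumes the raw HTML of each document is available as a string.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Collects visible text, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join fragments and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_javascript(html):
    """Return the text of `html` with script and style content removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<p>Hello</p><script>var x = 1 < 2;</script><p>world</p>"
    print(strip_javascript(page))
```

Inline handlers such as `onclick` are attributes rather than text, so they never reach the collected output. Whether this cleaning fits better before indexing or at snippet time depends on the setup.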

  • Followup: RE: A few simple questions

    Hi, David. About your comment here that it is not recommended to use the trecweb format when building an Indri index: all my input documents are in trecweb, in separate files, and all in UTF-8. Does this eliminate the possible encoding mismatch you mention, or should I do something else to avoid that mismatch? Thanks.

    2009-11-04 02:38:35 UTC in The Lemur Toolkit

  • Followup: RE: How to generate a query file

    Hi, here is a link with information on how to specify retrieval parameters and how to format your queries: [IndriRunQuery](http://sourceforge.net/apps/trac/lemur/wiki/RetEval%20and%20IndriRunQuery). Regards, Raul.

    2009-09-24 17:31:14 UTC in The Lemur Toolkit

  • Followup: RE: getting <meta data=...

    I forgot that I posted this before. Does anybody know how to do this?

    2009-09-05 19:45:23 UTC in The Lemur Toolkit

  • repeated domains in ranked search results

    Hi, I'm testing retrieval on an Indri index of documents ranked with the pagerank tool, and I'm comparing the results with the same index without the priors added to it. The first difference I notice is that in the search results of the ranked index there are a lot of documents from the same domain together, which doesn't happen with the unranked index. Does anybody know how to...

    2009-09-05 03:16:53 UTC in The Lemur Toolkit

  • harvestlinks on indri index

    Hi, is it possible to use the harvestlinks tool with an Indri index as the source instead of the corpus path with the documents to index? I'm using pagerank with an Indri index as the source and I want to know if I can do the same with the harvestlinks app. Thanks, Raul.

    2009-09-04 03:35:50 UTC in The Lemur Toolkit

  • Followup: RE: harvestlinks problem

    You're completely right! There was a problem with the trecweb format: it seems it always needs a line break after each tag (<DOC>, <DOCNO>, etc.). Thanks.

    2009-08-14 00:46:48 UTC in The Lemur Toolkit
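
A fix along those lines could be scripted rather than done by hand. The following is a minimal sketch, not part of the Lemur tools: the function name `break_after_tags` and the default tag list are assumptions, so adjust `tags` to whichever tags turn out to need the line break in a given collection.

```python
import re

# Assumption: tags after which a line break is forced. The post only says
# "<DOC>, <DOCNO>, etc.", so this list is illustrative, not authoritative.
DEFAULT_TAGS = ("<DOC>", "</DOCNO>", "<DOCHDR>", "</DOCHDR>", "</DOC>")


def break_after_tags(text, tags=DEFAULT_TAGS):
    """Ensure each listed tag is followed by exactly one newline."""
    alternatives = "|".join(re.escape(tag) for tag in tags)
    # Match the tag, any trailing spaces or tabs, and an optional existing
    # line break, then rewrite all of that as the tag plus a single "\n".
    pattern = re.compile(r"(" + alternatives + r")[ \t]*(?:\r?\n)?")
    return pattern.sub(r"\1\n", text)


if __name__ == "__main__":
    sample = "<DOC><DOCNO>d1</DOCNO><DOCHDR>http://x</DOCHDR>body</DOC>"
    print(break_after_tags(sample), end="")
```

The matching is case-sensitive and the rewrite is idempotent, so running it again over an already normalized file leaves the file unchanged.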

  • Followup: RE: harvestlinks problem

    Hello, any hints?

    2009-08-12 17:00:00 UTC in The Lemur Toolkit

  • harvestlinks problem

    Hi, I'm trying to use the harvestlinks app with a trecweb collection of 1580 separate files. The command line I'm using is: harvestlinks -corpus=./docs_test -output=./hrv_test but I get this error: 0:06 Phase 4: Combining harvested links to final output... Error opening sorted link file './hrv_test/harvest/linkFile.sorted'...

    2009-08-07 18:51:00 UTC in The Lemur Toolkit

  • getting <meta data=...

    Hi, does anybody know if it's possible to get the HTML metadata (i.e. <meta var="content">) from the API? I know you can index metadata embedded in the TREC format by specifying it in the parameters XML file like this: <metadata><field>fieldname</field></metadata> and you can get it back using the API, but this only works when the...

    2009-07-14 18:50:19 UTC in The Lemur Toolkit

About Me

  • 2009-03-18 (10 months ago)
  • 2443735
  • rbanderas (My Site)
  • Raul Banderas
