MIREX / News: Recent posts

MIREX 0.3 for ClueWeb12

We released a new version, 0.3, for TREC Web track participants who work on the new ClueWeb12 dataset.

Posted by Djoerd Hiemstra 2013-07-01

MIREX for Hadoop 1.0

We rewrote the code so that it uses the new Hadoop API. This code will be released later, but for people who cannot wait, it is already available from the SVN repository.

Posted by Djoerd Hiemstra 2012-02-02

MIREX at TREC 2010

This draft report presents preliminary results for the TREC 2010 ad-hoc web search task. We ran our MIREX system on 0.5 billion web documents from the ClueWeb09 crawl. On average, the system retrieves at least 3 relevant documents on the first result page of 10 results, using a simple index of anchor texts and page titles, combined with spam removal.

http://www.cs.utwente.nl/~hiemstra/papers/trec19-draft.pdf

Posted by Djoerd Hiemstra 2010-10-28

MIREX presented at CLEF 2010

MIREX was presented at the CLEF 2010 Conference on Multilingual and Multimodal Information Access Evaluation, which took place on 20-23 September 2010 in Padua, Italy. See: http://www.clef2010.org/

Djoerd Hiemstra and Claudia Hauff. MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data". In: Multilingual and Multimodal Information Access Evaluation. Lecture Notes in Computer Science 6360. Springer Verlag. pages 64-69, September 2010.
http://eprints.eemcs.utwente.nl/18469

Posted by Djoerd Hiemstra 2010-09-28

New release: MIREX 0.2

We released version 0.2, which supports several standard information retrieval models, such as language models with linear interpolation smoothing, language models with Dirichlet smoothing, and Okapi's BM25.
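
For readers unfamiliar with the three models named above, the following is a minimal sketch of their textbook per-term scoring formulas. It is not taken from the MIREX source code: the function names, the argument names, and the default parameter values (`lam`, `mu`, `k1`, `b`) are assumptions chosen for illustration, and MIREX may use different variants or settings.

```python
import math


def lm_linear_interpolation(tf, doc_len, cf, coll_len, lam=0.15):
    """Log-probability of one query term with linear interpolation smoothing.

    tf/doc_len is the document model, cf/coll_len the collection model.
    lam is the weight on the document model (assumed default, not from MIREX).
    Requires cf > 0 so that the logarithm is defined.
    """
    p_doc = tf / doc_len if doc_len > 0 else 0.0
    p_coll = cf / coll_len
    return math.log(lam * p_doc + (1.0 - lam) * p_coll)


def lm_dirichlet(tf, doc_len, cf, coll_len, mu=2500.0):
    """Log-probability of one query term with Dirichlet smoothing.

    mu is the Dirichlet prior (assumed default, not from MIREX).
    """
    p_coll = cf / coll_len
    return math.log((tf + mu * p_coll) / (doc_len + mu))


def bm25(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term.

    Uses the classic idf form, which becomes negative when the term
    occurs in more than half of the documents.
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)
```

In all three cases a document's score for a query is the sum of these per-term values over the query terms.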

Posted by Djoerd Hiemstra 2010-06-23

Anchor text for ClueWeb09 Category A

We’ve put anchor text for the English Category A documents of the TREC ClueWeb09 collection online at:

* http://pathfinder.cs.utwente.nl/cgi-bin/opensearch/mirex-anchors.txt.gz

The file contains anchor text for about 87% of the pages in Category A. To keep the file manageable, the anchor text for a page is cut off once more than 10 MB of anchors have been collected for it. The file is about 21 GB (gzipped). It is a tab-separated text file with the fields (TREC-ID, URL, ANCHOR TEXT). The anchor text extraction is described in (please cite the report if you use the data in your research):...
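
Since the file is far too large to load into memory, the tab-separated layout described above is best read as a stream. The sketch below shows one way to do that. It assumes one record per line, UTF-8 text, and that any extra tabs belong to the anchor text field; none of these details are confirmed by the post. The function name `read_anchors` is hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def read_anchors(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (trec_id, url, anchor_text) tuples from a gzipped TSV file.

    Assumptions (not confirmed by the source): one record per line,
    UTF-8 encoding, and at least three tab-separated fields per record.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Split on the first two tabs only, so tabs inside the
            # anchor text stay part of the third field.
            fields = line.rstrip("\n").split("\t", 2)
            if len(fields) != 3:
                continue  # skip lines that do not match the assumed layout
            trec_id, url, anchor_text = fields
            yield trec_id, url, anchor_text
```

Because `read_anchors` is a generator, a loop such as `for trec_id, url, text in read_anchors("mirex-anchors.txt.gz")` decompresses and processes one line at a time.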

Posted by Djoerd Hiemstra 2010-04-28

MIREX project created

We created a MIREX project on SourceForge (yes, that's here). Watch us for a first official release of the MIREX software.

Posted by Djoerd Hiemstra 2010-04-14