Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.txt | 2012-12-21 | 3.1 kB | |
galago-3.3.tar.gz | 2012-12-21 | 1.0 MB | |
galago-3.3-bin.tar.gz | 2012-12-21 | 12.9 MB | |
Totals: 3 Items | | 13.9 MB | 0 |

This Lemur project release brings Indri 5.4, Galago 3.3 updates, and the first release of RankLib (see https://sourceforge.net/p/lemur/wiki/RankLib/), a learning to rank system.

Applications compiled with the Indri API require the following libraries on Linux: z, iberty, pthread, and m. Applications built in Visual Studio require the additional library wsock32.lib.

The Java jar files were built with Java 6 (jdk 1.6.0). The Java UIs require Java 6.

We have tested using GCC 4.1.2 (CentOS 5.3 Linux), 4.3.3 (Ubuntu 10.04 Linux), 4.2.1 (OS/X), and Visual Studio 2008 (Windows Vista, WIN32 and x86_64).

Galago additions include:

1. Learning module

A learning module has been introduced into Galago 3.3. It allows automated parameter tuning for a wide variety of retrieval models. Currently, it supports coordinate ascent and a brute-force grid search. The module is extensible and can perform cross-fold learning.

2. Direct phrase indexes

Galago 3.3 allows the direct indexing of phrase features. These indexes can greatly improve retrieval efficiency for retrieval models that require such features. Two mechanisms are included to reduce the space requirements of this type of index. Frequent indexing retains information about phrase features that occur more than a threshold number of times in the collection. Sketch indexes use a set of functions to compress the vocabulary space of large phrase indexes.

3. Cache

A caching system is built into the retrieval module. This cache is able to store intersected posting lists. This feature is most useful when performing parameter optimization. In this setting, a small number of queries are executed many times, so a cache of intersected posting lists can dramatically reduce repeated computation.

4. Processing Model Support

This new abstraction allows different forms of query evaluation to be tested for retrieval-time efficiency. Implemented models include the MaxScore and WAND algorithms, as well as two-stage phrase evaluation.

5. Inline positional skips

Position lists of length greater than 2 can be skipped without decompressing the block, allowing for faster seeking to a particular position list in a given posting list.

Bugs Fixed (see https://sourceforge.net/p/lemur/bugs/ for the complete tickets):

BUG#
198  collection length overflow
197  DataPoint parsing text error in its constructor
175  Parameters - Null pointer exceptions
183  Out of Memory Errors in parsing do not result in failure
185  Searcher not thread safe
190  Null exceptions when building an empty index
191  dump functions fail for empty index files
192  ExtentRestrictionNode leaks
193  handleSearch: java.lang.NullPointerException
194  running in drmaa mode -> ExecutionException
196  QueryEnvironment::~QueryEnvironment calls close() producing double delete on memory

Feature Requests (see https://sourceforge.net/p/lemur/feature-requests/ for the complete tickets):

FR#
42  WarcRecord.java should allow access to WARC-Date et al
48  Parameter tuning
49  Cached Retrieval
51  Lengths Iterators
54  Extent Attribute Indexes
63  Universal Parser
64  parallel app execution