Name Modified Size
README.txt 2012-12-21 3.1 kB
galago-3.3.tar.gz 2012-12-21 1.0 MB
galago-3.3-bin.tar.gz 2012-12-21 12.9 MB
This Lemur Project release brings Indri 5.4 and Galago 3.3 updates, and the first release of RankLib, a learning-to-rank system (see https://sourceforge.net/p/lemur/wiki/RankLib/).

Applications compiled with the Indri API require the following
libraries on Linux: z, iberty, pthread, and m. Applications built in
Visual Studio additionally require wsock32.lib. The Java jar
files were built with Java 6 (JDK 1.6.0), and the Java UIs require
Java 6. We have tested using GCC 4.1.2 (CentOS 5.3 Linux), 4.3.3
(Ubuntu 10.04 Linux), 4.2.1 (OS X), and Visual Studio 2008 (Windows
Vista, WIN32 and x86_64).

Galago additions include:


1. Learning module
A learning module has been introduced in Galago 3.3. It supports automated parameter tuning for a wide variety of retrieval models. Currently, it provides coordinate ascent and a brute-force grid search. The module is extensible and can perform cross-fold learning.
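
As an illustration of coordinate ascent in general (not of Galago's learner API), the sketch below tunes one parameter at a time against an objective function and halves the step size when a full sweep brings no improvement. The class name, method signature, step schedule, and toy objective are all assumptions made for this example; in practice the objective would be a retrieval metric computed over a set of training queries.

```java
import java.util.function.ToDoubleFunction;

/** Illustrative coordinate ascent; the names here are hypothetical, not Galago's API. */
final class CoordinateAscentSketch {

    /**
     * Maximizes the objective by changing one parameter at a time.
     * The step is halved whenever a full sweep yields no improvement.
     */
    static double[] tune(ToDoubleFunction<double[]> objective, double[] start,
                         double initialStep, double minStep, int maxSweeps) {
        double[] params = start.clone();
        double best = objective.applyAsDouble(params);
        double step = initialStep;
        for (int sweep = 0; sweep < maxSweeps && step >= minStep; sweep++) {
            boolean improved = false;
            for (int i = 0; i < params.length; i++) {
                for (double delta : new double[] {step, -step}) {
                    double old = params[i];
                    params[i] = old + delta;
                    double score = objective.applyAsDouble(params);
                    if (score > best) {
                        best = score;      // keep the move
                        improved = true;
                    } else {
                        params[i] = old;   // undo the move
                    }
                }
            }
            if (!improved) {
                step /= 2.0;
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Toy objective standing in for a retrieval metric; its peak is at (0.3, 0.7).
        ToDoubleFunction<double[]> metric =
            p -> -(p[0] - 0.3) * (p[0] - 0.3) - (p[1] - 0.7) * (p[1] - 0.7);
        double[] tuned = tune(metric, new double[] {0.0, 0.0}, 0.5, 1e-4, 1000);
        System.out.printf("%.3f %.3f%n", tuned[0], tuned[1]);
    }
}
```

A grid search would instead evaluate the objective at every point of a fixed lattice, which is simpler but grows exponentially with the number of parameters.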

2. Direct phrase indexes

Galago 3.3 allows the direct indexing of phrase features. Direct phrase indexes can greatly improve efficiency for retrieval models that require phrase features. Two mechanisms are included to reduce the space requirements of this type of index. Frequent indexing retains information only about phrase features that occur more than a threshold number of times in the collection. Sketch indexes use a set of functions to compress the vocabulary space of large phrase indexes.
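
The frequent-indexing idea can be sketched as a simple counting filter. The example below is a minimal in-memory illustration that treats adjacent word pairs as the phrase features and keeps only those seen more than a threshold number of times. The class and method names are hypothetical, and a real index builder would not hold all counts in memory like this.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative frequency filter for phrase features; not Galago's index builder. */
final class FrequentPhraseSketch {

    /** Counts adjacent word pairs and keeps those seen more than threshold times. */
    static Map<String, Integer> frequentBigrams(List<String[]> documents, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] doc : documents) {
            for (int i = 0; i + 1 < doc.length; i++) {
                counts.merge(doc[i] + " " + doc[i + 1], 1, Integer::sum);
            }
        }
        // Drop every phrase whose collection count does not exceed the threshold.
        counts.values().removeIf(c -> c <= threshold);
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[] {"new", "york", "city"},
            new String[] {"new", "york", "times"},
            new String[] {"york", "city"});
        System.out.println(frequentBigrams(docs, 1).keySet());
    }
}
```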

3. Cache

A caching system is built into the retrieval module. This cache is able to store intersected posting lists. This feature is most useful when performing parameter optimization. In this setting, a small number of queries are executed many times, so a cache of intersected posting lists can dramatically reduce repeated computation.
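
To show why caching intersections pays off when the same queries are run repeatedly, here is a minimal sketch of an LRU cache keyed by a term pair. It assumes posting lists are sorted arrays of document ids and that terms contain no tab characters. The class, its fields, and the eviction policy are assumptions for illustration and do not describe Galago's cache implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU cache of intersected posting lists; not Galago's cache API. */
final class IntersectionCacheSketch {
    private final Map<String, int[]> cache;
    int misses = 0; // number of intersections actually computed

    IntersectionCacheSketch(final int maxEntries) {
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, int[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, int[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the doc ids present in both lists, reusing a cached result if any. */
    int[] intersect(String termA, int[] docsA, String termB, int[] docsB) {
        String key = termA + "\t" + termB;
        int[] hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        misses++;
        int[] result = merge(docsA, docsB);
        cache.put(key, result);
        return result;
    }

    /** Two-pointer intersection of two sorted arrays. */
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        IntersectionCacheSketch cache = new IntersectionCacheSketch(2);
        int[] a = {1, 3, 5, 7};
        int[] b = {3, 4, 7, 9};
        for (int run = 0; run < 100; run++) {
            cache.intersect("white", a, "house", b); // only the first run computes
        }
        System.out.println("intersections computed: " + cache.misses);
    }
}
```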

4. Processing Model Support
This new abstraction allows different forms of query evaluation to be plugged in and compared for retrieval-time efficiency. Implemented models include the MaxScore and WAND algorithms, as well as two-stage phrase evaluation.
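
For readers unfamiliar with MaxScore, the following is a simplified in-memory sketch of the general technique. It is not Galago's processing-model code, and the class and method names are hypothetical. It assumes each term contributes a non-negative score, that a missing term contributes zero, and that k is at least 1. Terms whose combined upper bounds cannot beat the current k-th best score stop generating candidate documents and are only consulted for documents found through the remaining terms.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Illustrative MaxScore-style top-k evaluation; not Galago's processing-model API. */
final class MaxScoreSketch {

    /** One term's postings: sorted doc ids with a non-negative score per doc. */
    static final class Postings {
        final int[] docs;
        final double[] scores;
        final double maxScore; // upper bound on this term's contribution
        int pos = 0;           // cursor; a Postings object is single-use

        Postings(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
            double max = 0.0;
            for (double s : scores) max = Math.max(max, s);
            this.maxScore = max;
        }

        int current() { return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; }

        void advanceTo(int doc) { while (pos < docs.length && docs[pos] < doc) pos++; }
    }

    /** Returns the k highest document scores in descending order (k >= 1). */
    static double[] topK(Postings[] lists, int k) {
        Postings[] sorted = lists.clone();
        Arrays.sort(sorted, (a, b) -> Double.compare(a.maxScore, b.maxScore));
        // prefix[i] = sum of upper bounds of the i lowest-bound terms
        double[] prefix = new double[sorted.length + 1];
        for (int i = 0; i < sorted.length; i++) prefix[i + 1] = prefix[i] + sorted[i].maxScore;

        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap of the best k scores
        int firstEssential = 0; // terms before this index are non-essential
        while (true) {
            double threshold = heap.size() < k ? Double.NEGATIVE_INFINITY : heap.peek();
            while (firstEssential < sorted.length && prefix[firstEssential + 1] <= threshold) {
                firstEssential++;
            }
            // Candidates come only from essential terms.
            int doc = Integer.MAX_VALUE;
            for (int i = firstEssential; i < sorted.length; i++) {
                doc = Math.min(doc, sorted[i].current());
            }
            if (doc == Integer.MAX_VALUE) break;

            double score = 0.0;
            for (int i = firstEssential; i < sorted.length; i++) {
                if (sorted[i].current() == doc) {
                    score += sorted[i].scores[sorted[i].pos];
                    sorted[i].pos++;
                }
            }
            // Add non-essential terms, highest bound first, stopping early
            // once the document can no longer exceed the threshold.
            for (int i = firstEssential - 1; i >= 0; i--) {
                if (score + prefix[i + 1] <= threshold) break;
                sorted[i].advanceTo(doc);
                if (sorted[i].current() == doc) score += sorted[i].scores[sorted[i].pos];
            }
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();
                heap.add(score);
            }
        }
        double[] out = new double[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = heap.poll();
        return out;
    }
}
```

The returned scores match those of an exhaustive evaluation; the saving comes from skipping documents that appear only in low-bound posting lists. WAND applies the same upper-bound idea but selects a pivot document on every step instead of maintaining a fixed essential set.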

5. Inline positional skips
Position lists with more than two entries can be skipped without decompressing the block, allowing faster seeking to a particular position list within a posting list.
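
One way to picture an inline skip is a byte length stored in front of each longer position list, so that a reader can jump over the list without decoding it. The sketch below uses a variable-byte encoding and a layout invented for this example (count, then a byte length if the count exceeds two, then delta-encoded positions). It should not be read as Galago's on-disk format.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative inline skips over position lists; the layout is hypothetical. */
final class PositionSkipSketch {

    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation flag set
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes each position list as: count, [byte length if count > 2], deltas. */
    static byte[] encode(int[][] positionLists) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int[] positions : positionLists) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            int previous = 0;
            for (int p : positions) {
                writeVarInt(body, p - previous);
                previous = p;
            }
            writeVarInt(out, positions.length);
            if (positions.length > 2) {
                writeVarInt(out, body.size()); // the inline skip
            }
            out.write(body.toByteArray(), 0, body.size());
        }
        return out.toByteArray();
    }

    static final class Reader {
        private final byte[] data;
        private int offset = 0;

        Reader(byte[] data) { this.data = data; }

        int readVarInt() {
            int value = 0;
            int shift = 0;
            while (true) {
                int b = data[offset++] & 0xFF;
                value |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return value;
                shift += 7;
            }
        }

        /** Moves past the next position list, jumping over long ones. */
        void skipList() {
            int count = readVarInt();
            if (count > 2) {
                offset += readVarInt(); // jump without decoding any position
            } else {
                for (int i = 0; i < count; i++) readVarInt();
            }
        }

        /** Decodes the next position list. */
        int[] readList() {
            int count = readVarInt();
            if (count > 2) readVarInt(); // byte length is not needed when decoding
            int[] positions = new int[count];
            int previous = 0;
            for (int i = 0; i < count; i++) {
                previous += readVarInt();
                positions[i] = previous;
            }
            return positions;
        }
    }
}
```

Lists of one or two positions carry no length field in this layout because decoding them costs about as much as reading the skip itself.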

Bugs Fixed (see https://sourceforge.net/p/lemur/bugs/ for the complete tickets):

BUG# 

198	collection length overflow
197	DataPoint parsing text error in its constructor
175	Parameters - Null pointer exceptions
183	Out of Memory Errors in parsing do not result in failure
185	Searcher not thread safe
190	Null exceptions when building an empty index 
191	dump functions fail for empty index files	 
192	ExtentRestrictionNode leaks	 
193	handleSearch: java.lang.NullPointerException
194	running in drmaa mode -> ExecutionException 
196	QueryEnvironment::~QueryEnvironment calls close() producing double delete on memory

Feature Requests (see https://sourceforge.net/p/lemur/feature-requests/ for the complete tickets):

FR# 
42	WarcRecord.java should allow access to WARC-Date et al
48	Parameter tuning	 
49	Cached Retrieval 
51	Lengths Iterators 
54	Extent Attribute Indexes	 
63	Universal Parser 
64	parallel app execution
Source: README.txt, updated 2012-12-21