Home / lemur / galago-3.4
Name Modified Size InfoDownloads / Week
Parent folder
README.txt 2013-06-21 2.8 kB
galago-3.4.tar.gz 2013-06-21 1.2 MB
galago-3.4-bin.tar.gz 2013-06-21 13.4 MB
Totals: 3 Items   14.6 MB 1
This Lemur project release brings Indri 5.5, Galago 3.4  and
RankLib 2.1 updates as well as the standalone Krovetz stemmer package,
implemented in both Java and C++ (numbered 3.4).

Applications compiled with the Indri API require the following
libraries: z, iberty, pthread, and m on linux. Applications built in
Visual Studio require the additional library wsock32.lib.  The java jar
files were built with Java 6 (jdk 1.6.0). The java UIs require Java
6. We have tested using GCC 4.1.2 (CentOS 5.3 linux), 4.3.3 (Ubuntu
10.04 linux), 4.2.1 (OS/X), and Visual Studio 2008 (Windows Vista, WIN32
and x86_64).

Indri additions include:

The ability to read gzip compressed anchor text files, as generated by
galago's harvestlinks.

Extend the pagerank application to read galago's pagerank output and
convert it to log probabilities suitable for use with makeprior.

Updated to index ClueWeb12 data.

Galago additions include:

Anchor text and link extraction, capable of writing indri readable
output for indri inlink indexing and pagerank computation.

PageRank generation from harvestlinks output. Can be converted to indri
log probability with use of indri's pagerank application.

Updated to index ClueWeb12 data.

RankLib additions include:

Support cross-validation, add linear regression, add linear normalizer,
add sparse representation of feature vector.

Bugs Fixed (see https://sourceforge.net/p/lemur/bugs/ for the complete
tickets): 

BUG# 

166 -- galago Tokenizing fields is not consistent

176 -- galago Incomplete Index warnings/exceptions

181 -- Galago UI hangs during indexing

199 -- galago Numeric fields with double values don't store correctly

200 -- galago #greater( field double) fails to evaluate

202 -- indri #wsyn operator can produce scores > 0

204 -- Indri indexes HTTP headers of empty documents (ClueWeb12)

205 -- Indri indexes <?xml tags in html documents

206 -- indri indexes content in html comments

207 -- Indri Retrieval UI fails to render html with meta http-equiv
attribute

208 -- galago Eval function fails for query ids < 100.

209 -- galago Passage retrieval doesn't identify windows correctly -
only windows occurring in the first passage of a document are identified
at retrieval time.

210 --  Indri truncates documents that contain ascii NUL characters

211 -- indri NestedListBeliefNode::hasMatch does not set smallestEnd in
all cases.

213 -- indri  #band(term stopword term) fails to match

Feature Requests (see https://sourceforge.net/p/lemur/feature-requests/
for the complete tickets): 

FR# 

66 -- Synchronize galago and indri Krovetz stemmers

67 -- indri Parameterize the injection of URL tokens into the document
text

68 -- indri Increase default memory parameter in IndriBuildIndex

69 -- disk lengths backwards compatibility removed.
Source: README.txt, updated 2013-06-21