Name | Modified | Size | Downloads / Week |
---|---|---|---|
Parent folder | |||
README.txt | 2013-06-21 | 2.8 kB | |
galago-3.4.tar.gz | 2013-06-21 | 1.2 MB | |
galago-3.4-bin.tar.gz | 2013-06-21 | 13.4 MB | |
Totals: 3 Items | 14.6 MB | 1 |
This Lemur project release brings Indri 5.5, Galago 3.4 and RankLib 2.1 updates as well as the standalone Krovetz stemmer package, implemented in both Java and C++ (numbered 3.4). Applications compiled with the Indri API require the following libraries: z, iberty, pthread, and m on linux. Applications built in Visual Studio require the additional library wsock32.lib. The java jar files were built with Java 6 (jdk 1.6.0). The java UIs require Java 6. We have tested using GCC 4.1.2 (CentOS 5.3 linux), 4.3.3 (Ubuntu 10.04 linux), 4.2.1 (OS/X), and Visual Studio 2008 (Windows Vista, WIN32 and x86_64). Indri additions include: The ability to read gzip compressed anchor text files, as generated by galago's harvestlinks. Extend the pagerank application to read galago's pagerank output and convert it to log probabilities suitable for use with makeprior. Updated to index ClueWeb12 data. Galago additions include: Anchor text and link extraction, capable of writing indri readable output for indri inlink indexing and pagerank computation. PageRank generation from harvestlinks output. Can be converted to indri log probability with use of indri's pagerank application. Updated to index ClueWeb12 data. RankLib additions include: Support cross-validation, add linear regression, add linear normalizer, add sparse representation of feature vector. Bugs Fixed (see https://sourceforge.net/p/lemur/bugs/ for the complete tickets): BUG# 166 -- galago Tokenizing fields is not consistent 176 -- galago Incomplete Index warnings/exceptions 181 -- Galago UI hangs during indexing 199 -- galago Numeric fields with double values don't store correctly 200 -- galago #greater( field double) fails to evaluate 202 -- indri #wsyn operator can produce scores > 0 204 -- Indri indexes HTTP headers of empty documents (ClueWeb12) 205 -- Indri indexes <?xml tags in html documents 206 -- indri indexes content in html comments 207 -- Indri Retrieval UI fails to render html with meta http-equiv attribute 208 -- galago Eval function fails for query ids < 100. 209 -- galago Passage retrieval doesn't identify windows correctly - only windows occurring in the first passage of a document are identified at retrieval time. 210 -- Indri truncates documents that contain ascii NUL characters 211 -- indri NestedListBeliefNode::hasMatch does not set smallestEnd in all cases. 213 -- indri #band(term stopword term) fails to match Feature Requests (see https://sourceforge.net/p/lemur/feature-requests/ for the complete tickets): FR# 66 -- Synchronize galago and indri Krovetz stemmers 67 -- indri Parameterize the injection of URL tokens into the document text 68 -- indri Increase default memory parameter in IndriBuildIndex 69 -- disk lengths backwards compatibility removed.