Thanks for the new NutchWAX 0.8.0 release.
I haven't studied it in depth yet; I have only test-indexed one
collection. I had a problem with PDF files because a script,
'parse-pdf.sh', is missing. I can't find it in nutchwax-0.8.0/bin.
Yes, I have xpdf installed and on my PATH, but I guess this script
is needed to launch it?
Quote from the logs:
'External command /bin/bash ./bin/parse-pdf.sh failed with error:
/bin/bash: ./bin/parse-pdf.sh: No such file or directory..'
Otherwise, it's very useful to now have incremental indexing
and multiple collections in a single index.
Best,
Kaisa
---------- Forwarded message ----------
Date: Tue, 12 Dec 2006 17:45:20 -0800
From: Michael Stack <st...@ar...>
To: arc...@li...
Subject: [Archive-access-discuss] [ANN] nutchwax-0.8.0 released
This note is to announce the release of NutchWAX 0.8.0. It is available for
download from sourceforge at
http://sourceforge.net/project/showfiles.php?group_id=118427&package_id=128933&release_id=470852.
NutchWAX 0.8.0 is built against Nutch 0.8.1, released 09/24/2006. A
version of this software was recently used to make an index of greater
than 400 million documents. See the Release Notes
[http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html]
for significant changes and fixes since NutchWAX 0.6.0. The site
documentation has also been significantly revised.
Yours,
Internet Archive Webteam