From: Michael S. <st...@ar...> - 2006-12-20 20:01:47
Hey Kaisa.

The script is built into the nutchwax.jar, but it's a bug that it's not found when you run in standalone mode (I'm guessing this is what you're doing, since if you run it distributed, or even pseudo-distributed, the parse-pdf.sh script is found).

As a workaround, you can download the script from here:

http://archive-access.cvs.sourceforge.net/*checkout*/archive-access/archive-access/projects/nutch/src/plugin/parse-waxext/bin/parse-pdf.sh?content-type=text%2Fplain

(or unjar the jar and get it from there) and put it where the indexing job can find it, such as under a 'bin' directory in your current working directory (that is, wherever you launched the indexing from). Or you can try running in pseudo-distributed mode.

I should fix this issue, but let's have 0.8 stew for a bit and see if any other issues show up first before I spend time on a new release.

Thanks Kaisa.
St.Ack

Kaisa Kaunonen wrote:
> Thanks for the new nutchwax release 0.8.0
>
> I haven't yet studied it deeper, only test-indexed one
> collection. I had a problem with pdf files because a script
> 'parse-pdf' is missing. I can't find it in nutchwax-0.8.0/bin
> Yes, I have xpdf installed in path but I guess this script
> is needed to launch it?
>
> Quote from logs =>
> 'External command /bin/bash ./bin/parse-pdf.sh failed with error:
> /bin/bash: ./bin/parse-pdf.sh: No such file or directory..'
>
> Otherwise, it's very useful to now have incremental indexing
> and multiple collections in a single index.
>
> Best,
> Kaisa
>
>
> ---------- Forwarded message ----------
> Date: Tue, 12 Dec 2006 17:45:20 -0800
> From: Michael Stack <st...@ar...>
> To: arc...@li...
> Subject: [Archive-access-discuss] [ANN] nutchwax-0.8.0 released
>
> This note is to announce release of NutchWAX 0.8.0. It's available for
> download from sourceforge at
> http://sourceforge.net/project/showfiles.php?group_id=118427&package_id=128933&release_id=470852.
> NutchWAX 0.8.0 is built against Nutch 0.8.1, released 09/24/2006. A
> version of this software was recently used to make an index of greater
> than 400 million documents. See Release Notes
> [http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html]
> for significant changes and fixes since NutchWAX 0.6.0. The site
> documentation has also been significantly revised.
>
> Yours,
> Internet Archive Webteam
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
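
For anyone hitting the same error, the workaround in the reply above could be sketched roughly as follows. This is only a sketch: it assumes you already have a copy of parse-pdf.sh on disk (downloaded from the CVS link, or pulled out of the jar), and the function name install_parse_pdf is made up for illustration; it is not part of NutchWAX.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with NutchWAX): place a copy of
# parse-pdf.sh under ./bin of the directory the indexing job is launched
# from, since the log shows it being invoked as ./bin/parse-pdf.sh.
#
# To locate the script inside the jar first, something like
#   unzip -l nutchwax.jar | grep parse-pdf.sh
# should list it; the path inside the jar is not stated in this thread.
install_parse_pdf() {
  local src="$1"              # path to your copy of parse-pdf.sh
  local workdir="${2:-$PWD}"  # directory you launch the indexing from

  mkdir -p "$workdir/bin" || return 1
  cp "$src" "$workdir/bin/parse-pdf.sh" || return 1
  # The job runs the script via /bin/bash, so the execute bit is probably
  # not required; setting it is harmless.
  chmod +x "$workdir/bin/parse-pdf.sh"
}
```

Usage would be along the lines of `install_parse_pdf ~/Downloads/parse-pdf.sh` run from the directory where the indexing is started; the download location here is just an example.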