From: <jl...@ex...> - 2006-06-28 17:29:18
Hi,
I've been trying to get the nutchwax + wera pair up and running for a
while now, but have had some problems.
Using nutchwax 0.4.3 and wera (0.4.2RC1 & 0.5.0) I managed to
get it running, but some of the related files (images)
aren't displayed. Requests for those get:
<retrievermessage>
<head>
<errorcode>4</errorcode>
<errormessage>Unable to parse Archive Identifier</errormessage>
</head>
</retrievermessage>
Using wera debug I found
"[archiveidentifier] => 2770/IAH-20060619172903-00000-webarchive1"
for a specific search I made.
(Starting tomcat from the nutchwax indexed data)
Using wayback I don't have the same problems (I don't use nutchwax with
wayback).
I've tried to get nutchwax 0.6.1 and wera running, but the opensearch
servlet for the rss from nutchwax throws an exception.
So I tried nutchwax 0.7.0 (with the latest hadoop, standalone), but now
the arcretriever throws an exception when trying to get the document.
Using wera debug I found
"[archiveidentifier] => 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc"
for the same search.
(Starting tomcat from anywhere)
The exception:
7 Bad function argument Cause: java.io.FileNotFoundException:
/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
does not exist. Stack trace:
org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
....
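To make the difference between the two identifiers concrete, here is a rough sketch of how I would expect such an identifier to be split. It is only an illustration, not the actual wera or nutchwax code: the class name ArchiveIdSketch and its fields are made up, and I am assuming that the number before the first slash is a record offset and that the rest is the ARC name, possibly carrying a "filedesc://" prefix as in the 0.7.0 output.

```java
// Hypothetical helper for illustration only; not part of wera or nutchwax.
final class ArchiveIdSketch {
    // Prefix seen in the identifier produced with nutchwax 0.7.0.
    static final String FILEDESC_PREFIX = "filedesc://";

    final long offset;    // assumed meaning: offset of the record in the ARC
    final String arcName; // ARC name with any "filedesc://" prefix removed

    ArchiveIdSketch(long offset, String arcName) {
        this.offset = offset;
        this.arcName = arcName;
    }

    // Splits "<number>/<name>" at the first slash and strips the prefix.
    static ArchiveIdSketch parse(String identifier) {
        int slash = identifier.indexOf('/');
        if (slash <= 0 || slash == identifier.length() - 1) {
            throw new IllegalArgumentException(
                "Unable to parse Archive Identifier: " + identifier);
        }
        long offset = Long.parseLong(identifier.substring(0, slash));
        String name = identifier.substring(slash + 1);
        if (name.startsWith(FILEDESC_PREFIX)) {
            name = name.substring(FILEDESC_PREFIX.length());
        }
        return new ArchiveIdSketch(offset, name);
    }

    public static void main(String[] args) {
        String[] ids = {
            "2770/IAH-20060619172903-00000-webarchive1",
            "2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc"
        };
        for (String id : ids) {
            ArchiveIdSketch parsed = parse(id);
            System.out.println(parsed.offset + " " + parsed.arcName);
        }
    }
}
```

With the prefix stripped, the second identifier would resolve to a plain file name under the jobs directory instead of the "filedesc:/..." path shown in the exception.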
Also using,
JDK 1.5.0_05
Tomcat 5.5.16
Heritrix 1.6.0
I have tried to figure it out but I'm not having any luck. I'm a
newbie with these tools, so I'd appreciate any help in
getting the latest nutchwax+wera setup going.
Thanks in advance,
João Luzio