Thread: [Archive-access-discuss] Nutchwax+Wera problems


Hi,
    I've been trying to get the pair up and running for a while now but
have had some problems.
    Using nutchwax 0.4.3 and wera (0.4.2RC1 & 0.5.0) I managed to
get it running, but some of the related files (images)
aren't displayed. Requests for those return:
<retrievermessage>
    <head>
        <errorcode>4</errorcode>
        <errormessage>Unable to parse Archive Identifier</errormessage>
    </head>
</retrievermessage>
    Using wera debug I found
"[archiveidentifier] => 2770/IAH-20060619172903-00000-webarchive1"
for a specific search I made.
(Starting tomcat from the nutchwax indexed data)

    Using wayback I don't have the same problems (I don't use nutchwax
with wayback).

    I've tried to get nutchwax 0.6.1 and wera running, but the opensearch
servlet for the rss from nutchwax throws an exception.
    So I tried nutchwax 0.7.0 (with the latest hadoop, standalone), but now
the arcretriever throws an exception when trying to get the document.
    Using wera debug I found
"[archiveidentifier] => 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc"
for the same search I made.
(Starting tomcat from anywhere)

The exception:
7  Bad function argument Cause: java.io.FileNotFoundException:
/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
does not exist. Stack trace:
org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
....
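
To make the difference between the two identifiers concrete, here is a minimal sketch. It assumes the identifier has the form <offset>/<ARC name> and that the "filedesc://" prefix in the 0.7.0 value is what ends up in the file path shown in the exception. The class and method names (ArchiveIdSketch, split, stripFiledesc) are hypothetical and are not taken from the wera or arcretriever sources.

```java
public class ArchiveIdSketch {
    // Hypothetical prefix seen in the nutchwax 0.7.0 identifier above.
    private static final String FILEDESC_PREFIX = "filedesc://";

    /** Splits "<offset>/<name>" at the first slash into {offset, name}. */
    public static String[] split(String archiveIdentifier) {
        int slash = archiveIdentifier.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No '/' in identifier: " + archiveIdentifier);
        }
        return new String[] {
            archiveIdentifier.substring(0, slash),
            archiveIdentifier.substring(slash + 1)
        };
    }

    /** Removes a leading "filedesc://" from the name part, if present. */
    public static String stripFiledesc(String name) {
        return name.startsWith(FILEDESC_PREFIX)
            ? name.substring(FILEDESC_PREFIX.length())
            : name;
    }

    public static void main(String[] args) {
        String[] ids = {
            "2770/IAH-20060619172903-00000-webarchive1",                       // nutchwax 0.4.3
            "2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc"      // nutchwax 0.7.0
        };
        for (String id : ids) {
            String[] parts = split(id);
            System.out.println("offset=" + parts[0]
                + " name=" + parts[1]
                + " stripped=" + stripFiledesc(parts[1]));
        }
    }
}
```

With the 0.7.0 identifier the name part still starts with "filedesc://", which would explain why the retriever looks for a file under .../filedesc:/ in the job directory. Whether the prefix should be stripped in wera or avoided at indexing time in nutchwax is something I can't tell from here.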

Also using,
JDK 1.5.0_05
Tomcat 5.5.16
Heritrix 1.6.0

    I have tried to figure it out but I'm not having any luck. I'm a
newbie with these tools, so I appreciate all the help I can get in
getting the latest nutchwax+wera setup going.

Thanks in advance,
    João Luzio
