From: <jl...@ex...> - 2006-06-29 10:51:11
Oops, I forgot to say that the ARCs were in the
/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
directory, but with the .arc.gz extension instead of .arc.

João Cláudio Luzio wrote:
> Hi,
>
> I've been trying to get the pair up and running for a while now but
> have had some problems.
>
> Using nutchwax 0.4.3 and wera (0.4.2RC1 & 0.5.0) I managed to
> get it running, but some of the related files (images)
> aren't displayed. Those get:
>
> <retrievermessage>
> <head>
> <errorcode>4</errorcode>
> <errormessage>Unable to parse Archive Identifier</errormessage>
> </head>
> </retrievermessage>
>
> Using wera debug I found "[archiveidentifier] =>
> 2770/IAH-20060619172903-00000-webarchive1" for a specific search I made.
> (Starting tomcat from the nutchwax indexed data)
>
> Using wayback I don't have the same problems (I don't use nutchwax
> with wayback).
>
> I've tried to get nutchwax 0.6.1 and wera running, but the opensearch
> servlet for the RSS from nutchwax gives an exception.
> So I tried nutchwax 0.7.0 (with latest hadoop, standalone), but now
> the arcretriever gives an exception when trying to get the document.
> Using wera debug I found "[archiveidentifier] =>
> 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
> same search.
> (Starting tomcat from anywhere)
>
> The exception:
> 7 Bad function argument Cause: java.io.FileNotFoundException:
> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
> does not exist.
> Stack trace:
> org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
> org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
> no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
> no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
> ....
>
> Also using:
> JDK 1.5.0_05
> Tomcat 5.5.16
> Heritrix 1.6.0
>
> I have tried to figure it out but I'm not having any luck. I'm a
> newbie with these tools, so I appreciate all the help I can get in
> getting the latest nutchwax+wera setup going.
>
> Thanks in advance,
> João Luzio
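
For anyone comparing the two identifiers above: both appear to have the shape "<number>/<ARC name>", but the nutchwax 0.7.0 one carries a "filedesc://" prefix and a ".arc" suffix, while the files on disk end in ".arc.gz". The path in the exception still contains "filedesc:/", which suggests the name was appended to the job directory unchanged. Below is a minimal Python sketch of a diagnostic check, not how wera or ARCRetriever actually resolve files. The function names are hypothetical, and the identifier layout is an assumption drawn only from the two debug values quoted above.

```python
import os

# Hypothetical helpers for illustration only; none of these names come
# from wera, nutchwax or Heritrix.

def split_identifier(identifier):
    """Split '<number>/<name>' at the first slash (assumed layout)."""
    number, sep, name = identifier.partition("/")
    if not sep or not number.isdigit():
        raise ValueError("unexpected archive identifier: %r" % identifier)
    return int(number), name

def arc_basename(name):
    """Drop a leading 'filedesc://' and a trailing '.arc.gz' or '.arc'."""
    prefix = "filedesc://"
    if name.startswith(prefix):
        name = name[len(prefix):]
    for suffix in (".arc.gz", ".arc"):
        if name.endswith(suffix):
            name = name[:-len(suffix)]
            break
    return name

def candidate_paths(arc_dir, identifier):
    """List the on-disk names worth checking, compressed form first."""
    _, name = split_identifier(identifier)
    base = arc_basename(name)
    return [os.path.join(arc_dir, base + ext) for ext in (".arc.gz", ".arc")]

def find_arc(arc_dir, identifier):
    """Return the first candidate that exists as a file, else None."""
    for path in candidate_paths(arc_dir, identifier):
        if os.path.isfile(path):
            return path
    return None
```

Calling find_arc() with the job directory and either identifier from the debug output would show whether the mismatch is only the prefix and the extension, or whether something else is wrong with the configured ARC location.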