From: Natalia T. <nt...@ce...> - 2006-07-05 12:10:28
I have the same problem as JCL when searching for any document (gif, html...) with these versions of Wera and Nutchwax, while Wayback works fine. I tried to change the arc path in documentDispatcher, but it doesn't work.

Natalia

arc...@li... wrote:
> Message: 1
> Date: Mon, 03 Jul 2006 16:03:27 -0700
> From: Michael Stack <st...@ar...>
> Subject: Re: [Archive-access-discuss] Nutchwax+Wera problems
> To: João Cláudio Luzio <jl...@ex...>
> Cc: arc...@li...
> Message-ID: <44A...@ar...>
>
> João Cláudio Luzio wrote:
>
>> Oops.. forgot to say that the arcs were in the
>> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
>> directory but with .arc.gz instead of .arc.
>
> This should be fine.
>
>> João Cláudio Luzio wrote:
>>
>>> Hi,
>>> I've been trying to get the pair up and running for a while now but
>>> had some problems.
>>> Using nutchwax 0.4.3 and wera (0.4.2RC1 & 0.5.0) I managed to
>>> get it running, but some of the related files (images)
>>> aren't displayed.
>>> Those get:
>>>
>>> <retrievermessage>
>>>   <head>
>>>     <errorcode>4</errorcode>
>>>     <errormessage>Unable to parse Archive Identifier</errormessage>
>>>   </head>
>>> </retrievermessage>
>>>
>>> Using wera debug I found that the "[archiveidentifier] =>
>>> 2770/IAH-20060619172903-00000-webarchive1" for a specific search I made.
>>> (Starting tomcat from the nutchwax indexed data)
>
> So, it generally works but some of the images don't show sometimes?
>
>>> Using wayback I don't have the same problems (I don't use nutchwax with
>>> wayback..).
>>>
>>> I've tried to get nutchwax 0.6.1 and wera running but the opensearch
>>> servlet for the rss from nutchwax gives an exception..
>
> Do you still have the exception?
>
>>> So I tried nutchwax 0.7.0 (with latest hadoop - standalone), but now
>>> the arcretriever gives an exception when trying to get the document.
>>> Using wera debug I found that the "[archiveidentifier] =>
>>> 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
>>> same search I made.
>>> (Starting tomcat from anywhere)
>>>
>>> The exception:
>>> 7 Bad function argument Cause: java.io.FileNotFoundException:
>>> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
>>> does not exist. Stack trace:
>>> org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
>>> org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
>>> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
>>> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
>>> no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
>>> no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
>
> Looks like we shouldn't be putting the 'filedesc:' on front of ARC name?
> Does ARCRetriever work if you make a request with
> IAH-20060619172903-00000-webarchive1.arc instead of
> filedesc:/IAH-20060619172903-00000-webarchive1.arc?
>
> St.Ack
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
>
> End of Archive-access-discuss Digest, Vol 2, Issue 2

-- 
Natalia Torres
Dept. de Sistemes
Centre de Supercomputació de Catalunya
Gran Capità, 2-4 (Edifici Nexus) • 08034 Barcelona
T. 93 205 6464 • F. 93 205 6979 • nt...@ce...
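To make Stack's suggestion concrete, here is a minimal Java sketch, assuming the archive identifier has the form `<offset>/<arc name>` as in the two debug values quoted in the thread. It strips a leading `filedesc:` and the slashes after it before resolving the name against the ARC directory, so the path no longer contains the `filedesc:/` segment seen in the FileNotFoundException. The class `ArchiveIdentifier` and its methods are hypothetical illustrations; they are not part of ARCRetriever, Wera or NutchWAX, and this is not how those tools actually resolve ARC files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of ARCRetriever, Wera or NutchWAX).
 * Assumes identifiers look like "<offset>/<arc name>" and removes a
 * leading "filedesc:" prefix from the ARC name.
 */
public final class ArchiveIdentifier {
    private static final String FILEDESC_PREFIX = "filedesc:";

    public final long offset;
    public final String arcName;

    private ArchiveIdentifier(long offset, String arcName) {
        this.offset = offset;
        this.arcName = arcName;
    }

    public static ArchiveIdentifier parse(String id) {
        int slash = id.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Unable to parse archive identifier: " + id);
        }
        long offset;
        try {
            offset = Long.parseLong(id.substring(0, slash));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad offset in archive identifier: " + id, e);
        }
        String name = id.substring(slash + 1);
        // Drop the "filedesc:" prefix that ended up in the file path.
        if (name.startsWith(FILEDESC_PREFIX)) {
            name = name.substring(FILEDESC_PREFIX.length());
        }
        // Drop the slashes that followed it ("//IAH-..." -> "IAH-...").
        while (name.startsWith("/")) {
            name = name.substring(1);
        }
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Missing ARC name in identifier: " + id);
        }
        return new ArchiveIdentifier(offset, name);
    }

    /** Resolves the bare ARC name inside the directory holding the ARC files. */
    public Path resolveIn(Path arcDir) {
        return arcDir.resolve(arcName);
    }

    public static void main(String[] args) {
        ArchiveIdentifier id = parse(
            "2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc");
        System.out.println(id.offset + " " + id.arcName);
        System.out.println(id.resolveIn(Paths.get(
            "/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727")));
    }
}
```

Running `main` prints the offset and the bare ARC name, then a path directly under the job directory with no `filedesc:` component. Whether the real retriever should also map `.arc` to the `.arc.gz` files on disk is left open here, as it is in the thread.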