I have the same problem as JCL when searching for any document (gif, html...)
using these versions of Wera and Nutchwax, while Wayback works fine.
I tried changing the ARC path in documentDispatcher, but it doesn't work.
Natalia
arc...@li... wrote:
> Message: 1
> Date: Mon, 03 Jul 2006 16:03:27 -0700
> From: Michael Stack <st...@ar...>
> Subject: Re: [Archive-access-discuss] Nutchwax+Wera problems
> To: João Cláudio Luzio <jl...@ex...>
> Cc: arc...@li...
> Message-ID: <44A...@ar...>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> João Cláudio Luzio wrote:
>
>>Oops.. forgot to say that the arcs were in the
>>/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
>>directory but with .arc.gz instead of .arc.
>>
>
> This should be fine.
>
>
>>João Cláudio Luzio wrote:
>>
>>>Hi,
>>> I've been trying to get the pair up and running for a while now but
>>>had some problems..
>>> Using nutchwax 0.4.3 and the wera (0.4.2RC1 & 0.5.0) I managed to
>>>get it running but some of the related files (images)
>>>aren't displayed. Those get:
>>><retrievermessage>
>>> <head>
>>> <errorcode>4</errorcode>
>>> <errormessage>Unable to parse Archive Identifier</errormessage>
>>> </head>
>>></retrievermessage>
>>> Using wera debug I found that the "[archiveidentifier] =>
>>>2770/IAH-20060619172903-00000-webarchive1" for a specific search i made.
>>>(Starting tomcat from the nutchwax indexed data)
>
>
> So, it generally works but some of the images don't show sometimes?
>
>
>>> Using wayback I dont have the same problems (I dont use nutchwax with
>>>wayback..).
>>>
>>> I've tried to get nutchwax 0.6.1 and wera running but the opensearch
>>>servlet for the rss from nutchwax gives an exception..
>
>
> Do you still have the exception?
>
>
>
>>> So i tried nutchwax 0.7.0 (with latest hadoop - standalone), but now
>>>the arcretriever gives an exception when trying to get the document.
>>> Using wera debug I found that the "[archiveidentifier] =>
>>>2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
>>>same search i made.
>>>(Starting tomcat from anywhere)
>>>
>>>The exception:
>>>7 Bad function argument Cause: java.io.FileNotFoundException:
>>>/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
>>>does not exist. Stack trace:
>>>org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
>>>org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
>>>org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
>>>org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
>>>no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
>>>no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
>
>
> Looks like we shouldn't be putting the 'filedesc:' on the front of the ARC
> name? Does ARCRetriever work if you make a request with
> IAH-20060619172903-00000-webarchive1.arc instead of
> filedesc:/IAH-20060619172903-00000-webarchive1.arc?
>
> St.Ack
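
The suggestion above amounts to stripping the 'filedesc:' prefix from the ARC name before it is joined to the ARC directory. Here is a minimal sketch of that normalization in Java. It is not WERA or NutchWAX code: the class ArcIdentifier and its methods are hypothetical, and the "<offset>/<ARC name>" layout is only inferred from the two archiveidentifier values quoted in this thread. It also lists both the .arc and .arc.gz names, since the files on disk here end in .arc.gz.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of WERA or NutchWAX. */
final class ArcIdentifier {
    final long offset;
    final String arcName;

    private ArcIdentifier(long offset, String arcName) {
        this.offset = offset;
        this.arcName = arcName;
    }

    /**
     * Parses "<offset>/<ARC name>" (format assumed from the thread) and
     * drops a leading "filedesc:" scheme plus any slashes that follow it.
     */
    static ArcIdentifier parse(String identifier) {
        int slash = identifier.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException(
                "Expected <offset>/<ARC name>: " + identifier);
        }
        long offset = Long.parseLong(identifier.substring(0, slash));
        String name = identifier.substring(slash + 1);
        if (name.startsWith("filedesc:")) {
            name = name.substring("filedesc:".length());
        }
        // Skip the slashes left over from "filedesc://" or "filedesc:/".
        int start = 0;
        while (start < name.length() && name.charAt(start) == '/') {
            start++;
        }
        name = name.substring(start);
        if (name.isEmpty()) {
            throw new IllegalArgumentException(
                "Missing ARC name: " + identifier);
        }
        return new ArcIdentifier(offset, name);
    }

    /** File names worth trying in the ARC directory: plain and gzipped. */
    List<String> candidateFileNames() {
        String base = arcName;
        if (base.endsWith(".gz")) {
            base = base.substring(0, base.length() - ".gz".length());
        }
        if (!base.endsWith(".arc")) {
            base = base + ".arc";
        }
        List<String> names = new ArrayList<>();
        names.add(base);
        names.add(base + ".gz");
        return names;
    }
}
```

With this, ArcIdentifier.parse("2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc") yields the offset 2234331 and the bare name IAH-20060619172903-00000-webarchive1.arc, which can then be looked up under the jobs directory. Whether the retriever itself accepts the bare name is exactly the open question asked above.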
-- 
......................................................................
__
/ / Natalia Torres
C E / S / C A Dept. de Sistemes
/_/ Centre de Supercomputació de Catalunya
Gran Capità, 2-4 (Edifici Nexus) • 08034 Barcelona
T. 93 205 6464 • F. 93 205 6979 • nt...@ce...
......................................................................