On Friday 03 November 2006 15:33, Shay Lawless wrote:
> Hi,
>
> I am using nutchWax to index a series of ARC files created in a webcrawl
> using the Heritrix crawler.
Which version of NutchWax are you using?
>
> My problem occurs when I perform a query on nutchWax and attempt to view
> the results, nutch attempts to send me to the URL in question rather than
> the archived content item. As a result I am getting an error as the URL is
> not being correctly formed.
>
> Has anyone any experience with displaying content from an ARC content
> archive rather than directly from the URL? Do I require an ARC-access
> redisplay tool such as 'Wayback Machine' to achieve this? If so, can anyone
> give advice on this or other similar tools for ARC redisplay?
arcretriever, part of WERA (formerly NWA), lets you retrieve an ARCRecord
by its offset and ARC file name (arcname).
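
To illustrate what lookup by offset and arcname amounts to, here is a
minimal Python sketch. It is not arcretriever's code. It assumes the ARC
file is gzip-compressed with one gzip member per record (as Heritrix
writes .arc.gz files) and that each record starts with a version-1 style
header line whose last space-separated field is the body length. The
function name read_arc_record is hypothetical.

```python
import zlib


def read_arc_record(arc_path, offset):
    """Return (header_fields, body_bytes) for the record at `offset`.

    Assumptions (not taken from arcretriever): the file is a .arc.gz with
    one gzip member per record, and the record header is a single line
    whose last space-separated field is the body length in bytes.
    """
    with open(arc_path, "rb") as f:
        f.seek(offset)
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip wrapper and
        # stop at the end of this one gzip member.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks = []
        while not decomp.eof:
            block = f.read(64 * 1024)
            if not block:
                break
            chunks.append(decomp.decompress(block))

    data = b"".join(chunks).lstrip(b"\r\n")
    header_line, _, rest = data.partition(b"\n")
    fields = header_line.decode("utf-8", "replace").split(" ")
    length = int(fields[-1])
    return fields, rest[:length]
```

Given the arcname and offset stored with a search hit, something like
read_arc_record(arcname, offset) would return the archived bytes, which a
redisplay tool could then serve in place of a link to the live URL.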
>
> Any help would be greatly appreciated, thanks in advance
>
> Seamus
Lukas