From: Shay L. <sea...@gm...> - 2006-11-03 14:33:16
Hi,

I am using NutchWAX to index a series of ARC files created during a web crawl with the Heritrix crawler. My problem occurs when I run a query in NutchWAX and try to view the results: Nutch tries to send me to the original URL rather than to the archived content item. As a result, I get an error because the URL is not being formed correctly.

Does anyone have experience displaying content from an ARC archive rather than directly from the live URL? Do I need an ARC-access redisplay tool such as the Wayback Machine to achieve this? If so, can anyone give advice on it, or on other similar tools for ARC redisplay?

Any help would be greatly appreciated. Thanks in advance,
Seamus