From: Lukas M. <lma...@gm...> - 2006-11-05 19:51:16
On Friday, 3 November 2006 at 15:33, Shay Lawless wrote:
> Hi,
>
> I am using nutchWax to index a series of ARC files created in a webcrawl
> using the Heritrix crawler.

Which version of NutchWax do you use?

>
> My problem occurs when I perform a query on nutchWax and attempt to view
> the results, nutch attempts to send me to the URL in question rather than
> the archived content item. As a result I am getting an error as the URL is
> not being correctly formed.
>
> Has anyone any experience with displaying content from an ARC content
> archive rather than directly from the URL. Do I require an ARC-access
> redisplay tool such as 'Wayback Machine' to achieve this. If so, can anyone
> give advice on this or other similar tools for ARC redisplay?

arcretriever, part of WERA (formerly NWA), allows retrieving an ARCRecord
by its offset and ARC file name.

>
> Any help would be greatly appreciated, thanks in advance
>
> Seamus

Lukas
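
To illustrate what "retrieving a record through offset and arcname" means in practice, here is a minimal Python sketch. It is not arcretriever's code and not part of WERA or NutchWAX; the function name read_arc_record is hypothetical. It assumes a compressed .arc.gz file in which every record is stored as its own gzip member (so a record can be decompressed starting at its byte offset), and an ARC v1 record header of the form "URL IP-address archive-date content-type length" on a single line.

```python
import gzip
import io
import zlib


def read_arc_record(arc_file, offset):
    """Return (header_fields, body) for the record starting at `offset`.

    `arc_file` is a binary file object for a .arc.gz file. Assumption:
    each record is a separate gzip member beginning exactly at `offset`.
    """
    arc_file.seek(offset)
    # wbits=31 makes zlib expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=31)
    chunks = []
    while not decomp.eof:
        block = arc_file.read(64 * 1024)
        if not block:
            break  # truncated file: return what was decoded so far
        chunks.append(decomp.decompress(block))
    record = b"".join(chunks)
    # Assumed ARC v1 header line: URL IP date content-type length
    header_line, _, rest = record.partition(b"\n")
    fields = header_line.decode("latin-1").split(" ")
    length = int(fields[-1])
    return fields, rest[:length]


# Tiny in-memory demo with two made-up records; fetch the second by offset.
first = gzip.compress(
    b"http://a.example/ 127.0.0.1 20061103153300 text/plain 5\nhello\n")
second = gzip.compress(
    b"http://b.example/ 127.0.0.1 20061103153300 text/plain 7\narchive\n")
demo = io.BytesIO(first + second)
fields, body = read_arc_record(demo, len(first))
print(fields[0], body)
```

With a real archive you would open the file named by the search result (for example open(arcname, "rb")) and pass the offset stored alongside it in the index; how those two values are exposed depends on your NutchWAX version.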