From: Michael S. <st...@ar...> - 2006-11-03 15:24:20
Shay Lawless wrote:
> Hi,
>
> I am using nutchWax to index a series of ARC files created in a
> webcrawl using the Heritrix crawler.
>
> My problem occurs when I perform a query on nutchWax and attempt to
> view the results, nutch attempts to send me to the URL in question
> rather than the archived content item. As a result I am getting an
> error as the URL is not being correctly formed.

That's right. You need something to serve up the archived content. Nutchwax has traditionally been paired with WERA: http://archive-access.sourceforge.net/projects/wera/. Check it out.

We also need to make Nutchwax work with the open source Wayback Machine. It's been reported recently that the bridge between the two is currently broken. It needs to be fixed.

Yours,
St.Ack

> Has anyone any experience with displaying content from an ARC content
> archive rather than directly from the URL. Do I require an ARC-access
> redisplay tool such as 'Wayback Machine' to achieve this. If so, can
> anyone give advice on this or other similar tools for ARC redisplay?
>
> Any help would be greatly appreciated, thanks in advance
>
> Seamus
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss