From: Pope, J. <Jac...@bl...> - 2007-11-15 09:21:47
Hiya Brad,

Erik's fix has partially solved the problem. Now when I click on a search result in NutchWax, it appears correctly in Wayback, and when I enter the archival URL directly (with either a specific time or '*') it works too. However, entering the same URL in the Wayback search box and clicking 'Take Me Back' generates the following URL:

http://194.66.226.116:8080/wayback/query?type=urlquery&url=http%3A%2F%2Fwhenthebelfastchildsinsagain.blogspot.com%2Fsearch%2Flabel%2Fbelfast&date=&Submit=Take+Me+Back

and returns "Resource not found in archive".

Also, the number of instances (both in the timeline pane and on the search results page in Wayback) is 99, and yet only one is listed. 99 is the value I set ArchivalUrlRequestParser.maxRecords to, as part of Erik's fix.

I've extracted my Nutch index from Hadoop and copied it to an NFS share; both NutchWax and Wayback are pointing to the same index. I've also attached my wayback.xml.

Incidentally, I changed the name of the property for the NutchResourceIndex from remotenutchindex (as shown on the documentation webpage) to resourceIndex, to fix a crash when loading Tomcat.

Cheers,

Jack

Jackson Pope
Technical Lead
Web Archiving Team
The British Library
+44 (0)1937 54 6942

-----Original Message-----
From: Brad Tofel [mailto:br...@ar...]
Sent: 13 November 2007 20:10
To: Pope, Jackson
Subject: Re: [Archive-access-discuss] Wayback 1.0.1 and NutchWax

Hi Jackson,

Are you trying to use the Nutch index for the Wayback, or have you built a separate index for the Wayback?

Functionality for using the remote Nutch resource Index may not be working at the moment -- we found too many performance issues with this, and moved to having both.

If you are using a separate wayback-specific Index, does the wayback function independently of Nutch?

Can you send on your wayback.xml file, and the link for some search results from NutchWax?
Brad

Pope, Jackson wrote:
> Hiya All,
>
> I'm trying to get NutchWax working with Wayback 1.0.1. I've installed
> wayback as ROOT, and have a single collection (8080:wayback). I've
> created the index (which NutchWax is searching ok) and made the arc
> files available. However when I click on a search result in NutchWax or
> enter something in the search box in Wayback it fails. The returned URL
> looks right, but I get the following error message:
>
> Bad Query Exception
>
> The request is missing information, or is not understood by this server.
> {0}
>
> Has anyone experienced this? Any ideas what the cause might be?
>
> Cheers,
>
> Jack
>
> Jackson Pope
> Technical Lead
> Web Archiving Team
> The British Library
> +44 (0)1937 54 6942
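When reproducing the 'Take Me Back' failure, it can help to decode the generated query URL and check exactly which parameters Wayback received. The minimal Python sketch below uses only the standard library's `urllib.parse`; the URL is the one quoted in Jack's reply, and the helper name `decode_query` is hypothetical. It only makes the request visible (a `type=urlquery` query, the target URL, and an empty `date`); it does not establish which of these, if any, causes the "Resource not found in archive" result.

```python
from urllib.parse import urlsplit, parse_qs

# Query URL generated by Wayback's 'Take Me Back' search box, as quoted in the thread.
QUERY_URL = (
    "http://194.66.226.116:8080/wayback/query?type=urlquery"
    "&url=http%3A%2F%2Fwhenthebelfastchildsinsagain.blogspot.com"
    "%2Fsearch%2Flabel%2Fbelfast&date=&Submit=Take+Me+Back"
)


def decode_query(url):
    """Hypothetical helper: return the URL's query parameters as single values.

    parse_qs percent-decodes values and turns '+' into spaces.
    keep_blank_values=True keeps the empty 'date' parameter, which
    parse_qs would otherwise drop silently.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in params.items()}


if __name__ == "__main__":
    for name, value in decode_query(QUERY_URL).items():
        print(f"{name} = {value!r}")
```

Running it prints one line per parameter. The line `date = ''` shows that the search box submits an empty date rather than omitting the parameter, which is worth comparing against the archival URLs that do work.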