From: stack <st...@du...> - 2008-02-08 16:51:13
Pope, Jackson wrote:
> Hiya all,
>
> I’ve created a lot of nutchwax indices, deployed the segments and index
> for each to the search directory, and got nutchwax/wayback to search
> these successfully.
>
> However, when I try to add more than 40 I hit the ‘too many open
> files’ problem I mentioned before. Several people have suggested
> upping the ‘ulimit’ to 32768, but I’ve already got it set to 1024, so
> upping it to 32768 would theoretically allow me to create 30 x 40
> indices, still an order of magnitude smaller than I need.

Regarding 1024, do the math. Each index is made of, say, 20 files (do a
listing of an index to find out for sure). 40 * 20 = 800, not counting the
other files the application needs to open (jar files, configuration files,
etc.). As you can see, 1024 probably ain't enough.

Searching many indices is slower than searching a single index. That's
another reason to do merging.

> Next step I’ve tried is index merging.
>
> I’ve run the IndexMerger over some of my indices successfully, but
> when I replace the indexes directory (which contains the individual
> indices) with the new index, nutchwax stops working. It tells me that
> it’s found some hits for my search term, but it doesn’t list them, and
> wayback claims the index is unavailable. What else do I need to do to
> deploy a merged index?

Any exceptions in the tomcat log? Or, looking at the logging, is it looking
in the right place for the index? You might need to add an empty index.done
to the merged index if it's not there already (see the end of this FAQ:
http://archive-access.sourceforge.net/projects/nutchwax/faq.html#incremental)
-- but I'm fuzzy on this stuff so that might not be it.
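The file-handle arithmetic earlier in this reply (40 indices * roughly 20 files each, plus whatever else the application opens) can be checked against a real deployment with a short script. What follows is only a rough sketch in Python. It assumes each index is a subdirectory of one "indexes" directory, that every file in an index is held open while searching, and that about 200 descriptors go to jars, configuration files, and the like. The function names and the 200 figure are made up for illustration; they do not come from nutchwax.

```python
import os
import resource  # Unix-only: exposes the per-process limits that ulimit sets


def count_index_files(indexes_dir):
    """Return {index_name: file_count} for each subdirectory of indexes_dir."""
    counts = {}
    for name in sorted(os.listdir(indexes_dir)):
        path = os.path.join(indexes_dir, name)
        if os.path.isdir(path):
            counts[name] = sum(
                1
                for entry in os.listdir(path)
                if os.path.isfile(os.path.join(path, entry))
            )
    return counts


def descriptor_budget(counts, overhead=200):
    """Compare an estimate of open files with the soft open-file limit.

    overhead is an assumed allowance for jars, config files, sockets, etc.
    Returns (needed, soft_limit, fits).
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = sum(counts.values()) + overhead
    if soft_limit == resource.RLIM_INFINITY:
        return needed, soft_limit, True
    return needed, soft_limit, needed <= soft_limit
```

Calling count_index_files() on the deployed indexes directory and passing the result to descriptor_budget() gives the estimated number of descriptors, the soft limit of the process running the script, and whether the estimate fits. The limit reported is that of the shell the script runs in, so it only matches tomcat's limit if tomcat is started from an environment with the same ulimit.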
St.Ack

> Cheers,
>
> Jack
>
> Jackson Pope
> Technical Lead
> Web Archiving Team
> The British Library
> +44 (0)1937 54 6942