From: John H. L. <jl...@ar...> - 2008-02-05 20:29:34
Hi Miguel.

To use distributed search, you need to plan ahead a bit and generate multiple indices. I don't know of a way to partition an existing large index into smaller chunks.

For example, if you're indexing 100,000 ARCs and want to deploy on 10 machines, you should split your list of ARCs into 10 chunks of 10,000, invoke ImportArcs for each chunk, and invoke NutchwaxIndexer for each chunk. This will produce 10 segment/index pairs, each of which could be deployed on one of your 10 machines.

For large jobs, I usually split the ARCs into groups of 1000. This produces segment/index pairs that are small enough to be manageable and flexible when it comes to deployment layout.

Hope this helps.

-J

On Feb 5, 2008, at 5:12 AM, Miguel Costa wrote:

> Hi to all,
>
> After reading the nutchwax + nutch documentation I can index ARC
> files and search them using the nutchwax + wayback machine.
> However, I would like to perform a distributed search but I don't
> find any documentation on how to partition the index in n
> parts/segments for n machines.
> On the other hand there is information explaining how to distribute
> search using the search-servers.txt file, but I need to partition
> the index first.
> Can anyone explain me or give me a clue on how to partition an index
> for n machines?
>
> Regards,
>
> Miguel Costa
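The chunking step described in the reply (split the ARC list into fixed-size groups before running ImportArcs and NutchwaxIndexer on each group) could be scripted along these lines. This is only a sketch in Python: the function names, the `arcs-NNNN.txt` file naming, and the one-path-per-line list format are assumptions for illustration, not part of NutchWAX. It only writes the per-chunk lists; how each list is then handed to ImportArcs and NutchwaxIndexer depends on your installation and is not shown.

```python
from pathlib import Path


def split_into_chunks(items, chunk_size):
    """Return consecutive slices of items, each holding at most chunk_size entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def write_chunk_lists(arc_paths, chunk_size, out_dir):
    """Write one text file per chunk (one ARC path per line); return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, chunk in enumerate(split_into_chunks(arc_paths, chunk_size)):
        # Hypothetical naming scheme: arcs-0000.txt, arcs-0001.txt, ...
        target = out_dir / f"arcs-{n:04d}.txt"
        target.write_text("\n".join(chunk) + "\n")
        written.append(target)
    return written
```

With 100,000 ARC paths, a `chunk_size` of 10,000 yields 10 list files (one per machine in the example above), while a `chunk_size` of 1000 yields 100 smaller lists that can be grouped onto machines however the deployment requires.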