From: <st...@ar...> - 2005-10-25 00:09:54
>From: Lukas Matejka <mat...@ce...>
>...
>
>I've just tested Nutchwax on single machine.
>there are some parametres..
>documents: 2 222 660
>dups:477 234
>begin:13:37:13 CEST 13.10.2005
>end:03:25:12 CEST 16.10.2005
>
>

What's that, Lukas? Almost 3 days to do two and a quarter million documents? That looks way too slow. You're using the default nutchwax config (With indexer.maxMergeDocs set to indexer.maxMergeDocs?). Are you doing the NedlibToArc conversion at the same time? What kind of machine is it?

>
>i fixed new version of NedlibToArc2.0 based on arc-1.5.1-200508191341.jar with
>little changes.
>
>http://cvs.sourceforge.net/viewcvs.py/arcwayback/NedlibToArc2.0/
>
>

Good stuff. Did you take the original dk stuff and modify it? Do you want to move your nedlibtoarc here onto archive-access? Seems like a good place for it. I can make you a subproject and make you admin.

St.Ack