From: Michael S. <sta...@us...> - 2005-09-27 23:39:46
Update of /cvsroot/archive-access/archive-access/projects/nutch/conf
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28370/conf

Modified Files:
	nutch-site.xml
Log Message:
* conf/nutch-site.xml
  Up the maxMergeDocs from 50 to a billion on Doug's recommendation
  (apparently responsible for slow indexing).

Index: nutch-site.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/conf/nutch-site.xml,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** nutch-site.xml	17 Aug 2005 21:47:24 -0000	1.25
--- nutch-site.xml	27 Sep 2005 23:39:38 -0000	1.26
***************
*** 52,55 ****
--- 52,57 ----
  </property>
  
+ 
+ 
  <!-- For lucene indexes, normally. The default is 128.
  Write every 1024 entries rather than every 128, the default.
***************
*** 65,68 ****
--- 67,86 ----
  </property>
  
+ <property>
+   <name>indexer.maxMergeDocs</name>
+   <value>1000000000</value>
+   <description>This number determines the maximum number of Lucene
+   Documents to be merged into a new Lucene segment. Larger values
+   increase indexing speed and reduce the number of Lucene segments,
+   which reduces the number of open file handles; however, this also
+   increases RAM usage during indexing.
+ 
+   Doug says: "There was a bogus value for indexer.maxMergeDocs in
+   nutch-default.xml which made indexing really slow. The correct
+   value is something really big (like Integer.MAX_VALUE)."
+   </description>
+ </property>
+ 
+ 
  <!-- make summaries a little longer than the default -->
  <property>
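A rough illustration of why a small indexer.maxMergeDocs hurts: assuming, as a simplification, that no merged segment ever holds more than maxMergeDocs documents, the index can never shrink below ceil(docs / maxMergeDocs) segments, each with its own set of files. The sketch below computes that lower bound for the old value (50) and the new one (1000000000). The class and method names and the one-million-document index are hypothetical; this is not Nutch or Lucene code.

```java
// Back-of-the-envelope sketch (hypothetical, not Nutch or Lucene code):
// a lower bound on the number of Lucene segments left after indexing,
// ASSUMING no merged segment ever holds more than maxMergeDocs documents.
public class SegmentEstimate {

    // Smallest segment count consistent with the assumption: ceil(docs / maxMergeDocs).
    static long minSegments(long docs, long maxMergeDocs) {
        if (docs <= 0) {
            return 0;
        }
        // Integer ceiling division; long arithmetic avoids int overflow.
        return (docs + maxMergeDocs - 1) / maxMergeDocs;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L; // hypothetical index size

        // Old value: segments are capped at 50 docs, so many small segments pile up.
        System.out.println("maxMergeDocs=50         -> at least "
                + minSegments(docs, 50L) + " segments");

        // New value: the cap never binds, so merging is free to collapse the index.
        System.out.println("maxMergeDocs=1000000000 -> at least "
                + minSegments(docs, 1_000_000_000L) + " segments");
    }
}
```

For a million documents the bound is 20,000 segments with the old value and 1 with the new one, which matches the log message's point that the small cap, not the indexer itself, was the bottleneck.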