Update of /cvsroot/archive-access/archive-access/projects/nutch/conf
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28370/conf
Modified Files:
nutch-site.xml
Log Message:
* conf/nutch-site.xml
Raise indexer.maxMergeDocs from 50 to one billion on Doug's recommendation
(the old value was apparently responsible for slow indexing).
Index: nutch-site.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/conf/nutch-site.xml,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** nutch-site.xml 17 Aug 2005 21:47:24 -0000 1.25
--- nutch-site.xml 27 Sep 2005 23:39:38 -0000 1.26
***************
*** 52,55 ****
--- 52,57 ----
</property>
+
+
<!-- For lucene indexes, normally. The default is 128.
Write every 1024 entries rather than every 128, the default.
***************
*** 65,68 ****
--- 67,86 ----
</property>
+ <property>
+ <name>indexer.maxMergeDocs</name>
+ <value>1000000000</value>
+ <description>This number determines the maximum number of Lucene
+ Documents to be merged into a new Lucene segment. Larger values
+ increase indexing speed and reduce the number of Lucene segments,
+ which reduces the number of open file handles; however, this also
+ increases RAM usage during indexing.
+
+ Doug says: "There was a bogus value for indexer.maxMergeDocs in
+ nutch-default.xml which made indexing really slow. The correct
+ value is something really big (like Integer.MAX_VALUE)."
+ </description>
+ </property>
+
+
<!-- make summaries a little longer than the default -->
<property>
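
For reference, the ceiling Doug mentions can also be written out literally. The fragment below is only a sketch and is not part of this commit: it assumes the indexer reads indexer.maxMergeDocs as a 32-bit Java int, in which case Integer.MAX_VALUE is 2147483647. The committed value of 1000000000 is already far larger than the old 50, so both settings should behave the same for any realistic index.

```xml
<!-- Sketch only, not the committed change: same property as above,
     with Java's Integer.MAX_VALUE (2147483647) written out.
     Assumes the value is parsed as a 32-bit int. -->
<property>
  <name>indexer.maxMergeDocs</name>
  <value>2147483647</value>
</property>
```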