From: Ignacio G. <igc...@gm...> - 2007-09-27 17:47:30
Hello,

I've been doing some testing with NutchWAX and have never had any major problems. Right now, however, I am trying to index a collection that is over 100 GB, and the indexing crashes while it is populating the crawldb. The job runs fine at the beginning, importing the information from the ARCs and creating the segments. The error I get is an OutOfMemoryError while the system is processing each of the part.xx files in the segments created earlier.

I tried increasing mapred.child.java.opts in the hadoop-default.xml config file to 1 GB, but the job still failed at the same point.

Is there any way to reduce the amount of memory used by NutchWAX/Hadoop so the process is more efficient and I can index a collection of this size?

Thank you.
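P.S. In case the exact setting matters, the change to hadoop-default.xml was roughly this property entry (shown here with the 1 GB heap expressed as -Xmx1024m):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
    </property>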