From: Michael S. <st...@du...> - 2007-09-27 22:47:42
What John says, and then:

The OOME exception stack trace might tell us something. Is the OOME always in the same place, processing the same record? If so, take a look at that record in the ARC.

St.Ack

John H. Lee wrote:
> Hi Ignacio.
>
> It would be helpful if you posted the following information:
> - Are you using standalone or mapreduce?
> - If mapreduce, what are your mapred.map.tasks and mapred.reduce.tasks properties set to?
> - If mapreduce, how many slaves do you have and how much memory do they have?
> - How many ARCs are you trying to index?
> - Did the map reach 100% completion before the failure occurred?
>
> Some things you may want to try:
> - Set both -Xms and -Xmx to the maximum available on your systems
> - Increase one or both of mapred.map.tasks and mapred.reduce.tasks, depending on where the failure occurred
> - Break your job up into smaller chunks of, say, 1000 or 5000 ARCs
>
> -J
>
> On Sep 27, 2007, at 10:47 AM, Ignacio Garcia wrote:
>
>> Hello,
>>
>> I've been doing some testing with nutchwax and I have never had any major problems.
>> However, right now I am trying to index a collection that is over 100 GB, and for some reason the indexing crashes while it tries to populate 'crawldb'.
>>
>> The job runs fine at the beginning, importing the information from the ARCs and creating the "segments" section.
>>
>> The error I get is an OutOfMemory error when the system is processing each of the part.xx files in the segments previously created.
>>
>> I tried increasing the mapred.child.java.opts setting in the hadoop-default.xml config file to 1 GB, but it still failed at the same point.
>>
>> Is there any way to reduce the amount of memory used by nutchwax/hadoop to make the process more efficient and be able to index such a collection?
>>
>> Thank you.
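
For reference, a minimal sketch of the overrides John suggests above, as they might look in hadoop-site.xml (the property names are the stock Hadoop ones mentioned in this thread; the values are only illustrative and should be sized to your slaves' actual memory):

  <property>
    <name>mapred.child.java.opts</name>
    <!-- fixed heap for each child task: set -Xms and -Xmx to the same value -->
    <value>-Xms1024m -Xmx1024m</value>
  </property>

  <property>
    <name>mapred.map.tasks</name>
    <!-- illustrative: more, smaller map tasks means fewer ARCs per task -->
    <value>40</value>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <!-- illustrative: scale with the number of slaves -->
    <value>8</value>
  </property>

Putting the overrides in hadoop-site.xml rather than editing hadoop-default.xml keeps your changes separate from the shipped defaults.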