From: Ignacio G. <igc...@gm...> - 2007-09-28 12:32:52
Michael, I do not know if it failed on the same record... the first time it failed I assumed that increasing the -Xmx parameter would solve it, since the OOME has happened before when indexing with Wayback. I will try to narrow it down as much as I can if it fails again.

On 9/27/07, Michael Stack <st...@du...> wrote:
>
> What John says, and then:
>
> + The OOME exception stack trace might tell us something.
> + Is the OOME always in the same place, processing the same record? If so,
>   take a look at it in the ARC.
>
> St.Ack
>
> John H. Lee wrote:
> > Hi Ignacio.
> >
> > It would be helpful if you posted the following information:
> > - Are you using standalone or mapreduce?
> > - If mapreduce, what are your mapred.map.tasks and mapred.reduce.tasks
> >   properties set to?
> > - If mapreduce, how many slaves do you have and how much memory do
> >   they have?
> > - How many ARCs are you trying to index?
> > - Did the map reach 100% completion before the failure occurred?
> >
> > Some things you may want to try:
> > - Set both -Xms and -Xmx to the maximum available on your systems
> > - Increase one or both of mapred.map.tasks and mapred.reduce.tasks,
> >   depending on where the failure occurred
> > - Break your job up into smaller chunks of, say, 1000 or 5000 ARCs
> >
> > -J
> >
> > On Sep 27, 2007, at 10:47 AM, Ignacio Garcia wrote:
> >
> >> Hello,
> >>
> >> I've been doing some testing with NutchWAX and I have never had any
> >> major problems. However, right now I am trying to index a collection
> >> that is over 100 GB, and for some reason the indexing crashes while
> >> it tries to populate the 'crawldb'.
> >>
> >> The job runs fine at the beginning, importing the information from
> >> the ARCs and creating the "segments" section.
> >>
> >> The error I get is an OutOfMemory error while the system is
> >> processing each of the part.xx files in the segments previously
> >> created.
> >>
> >> I tried increasing the mapred.child.java.opts setting in the
> >> hadoop-default.xml config file to 1 GB, but it still failed at the
> >> same point.
> >>
> >> Is there any way to reduce the amount of memory used by
> >> NutchWAX/Hadoop to make the process more efficient and be able to
> >> index such a collection?
> >>
> >> Thank you.
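
For reference, the properties discussed above are usually set as overrides in hadoop-site.xml, which takes precedence over hadoop-default.xml. A minimal sketch, with illustrative values only (tune them to your own hardware and job size):

    <!-- hadoop-site.xml: site-specific overrides of hadoop-default.xml.
         The values below are placeholders, not recommendations. -->
    <configuration>
      <property>
        <name>mapred.child.java.opts</name>
        <!-- Heap given to each map/reduce child JVM; raise this if tasks OOME. -->
        <value>-Xmx1024m</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <!-- More tasks means less input, and so less memory, per task. -->
        <value>40</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>8</value>
      </property>
    </configuration>

Note that mapred.child.java.opts applies to the task child JVMs only; it is separate from any -Xmx given to the JVM that launches the job, so depending on where the OOME occurs both may need raising.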