From: Thomas S H. <tha...@gm...> - 2011-01-04 19:06:18
Wow, I am sorry, I forgot to get back to the list about this one.

OK, the devs have informed me that this has been fixed in the upstream code for the next release, but it was really my fault. I had set the chunkserver reconnect value too low in the mfschunkserver.cfg files. This made the chunkservers come back too quickly, which caused the mfsmaster to take up 100% CPU spinning its wheels.

So don't set MASTER_TIMEOUT to anything below 10, and ten is cutting it close. I would stay above 30 unless your install is very small (fewer than 5 chunkservers).

On Fri, Dec 31, 2010 at 12:13 AM, Laurent Wandrebeck <lw...@hy...> wrote:

> On Thu, 30 Dec 2010 13:46:47 -0700
> Thomas S Hatch <tha...@gm...> wrote:
>
> > We are getting these kernel errors:
> <snip>
> page allocation failure looks like some malloc failed.
> Check mfsmaster ram consumption. how much ram does your master box
> have ? Are you in 32 or 64 bits mode ?
> HTH,
> --
> Laurent Wandrebeck
> HYGEOS, Earth Observation Department / Observation de la Terre
> Euratechnologies
> 165 Avenue de Bretagne
> 59000 Lille, France
> tel: +33 3 20 08 24 98
> http://www.hygeos.com
> GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C
> D17C F64C
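
For anyone who wants to check their chunkserver configs against the thresholds mentioned above, here is a minimal sketch in Python. It assumes the config file uses plain `KEY = VALUE` lines with `#` comments. The function names `parse_cfg` and `check_master_timeout` are made up for this example, and the cutoffs (below 10 is too low, 10 to 30 is risky) come only from the advice in this message, not from the MooseFS documentation.

```python
def parse_cfg(text):
    """Parse 'KEY = VALUE' lines, skipping blanks and '#' comments.

    Assumption: the config format is simple KEY = VALUE pairs.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_master_timeout(settings, hard_min=10, soft_min=30):
    """Classify MASTER_TIMEOUT using the thresholds from this thread.

    hard_min and soft_min are illustrative defaults, not official limits.
    """
    raw = settings.get("MASTER_TIMEOUT")
    if raw is None:
        return "unset"      # option commented out or missing
    timeout = float(raw)
    if timeout < hard_min:
        return "too low"    # below 10
    if timeout <= soft_min:
        return "risky"      # 10 to 30: cutting it close
    return "ok"             # above 30


if __name__ == "__main__":
    sample = "# example config\nMASTER_TIMEOUT = 5\n"
    print(check_master_timeout(parse_cfg(sample)))
```

To check a real file, read its contents and pass the text to `parse_cfg`; the path to mfschunkserver.cfg depends on how MooseFS was installed.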