From: Steve T. <sm...@cb...> - 2012-03-01 12:35:23
On Tue, 28 Feb 2012, Ricardo J. Barberis wrote:

> This happened to me once and I also took down every server of the cluster, 9
> chunkservers and one dedicated metalogger (previously, I unmounted all the
> clients, about 250).
>
> Bad idea: when the master came on-line again and I started one chunkserver,
> the master went "crazy" trying to recreate empty chunks for later deletion.
>
> My "solution" was to start all the chunkservers at the same time, so the
> master saw all the chunks almost simultaneously and didn't try to create
> empty chunks.

It turns out that the master was very slow because a RAID-5 reconstruction
was in progress on the box. I/O performance dropped to 5% of its normal
value, in spite of the RAID controller throttle being set to a 30% maximum
reconstruction rate (it's a Dell PE2900 server with a Perc 5 controller).
Once the reconstruction finished, I restarted everything (all chunk servers
at the same time) and it came up fine within a few seconds.

Steve
-- 
----------------------------------------------------------------------------
Steve Thompson, Cornell School of Chemical and Biomolecular Engineering
smt AT cbe DOT cornell DOT edu
"186,282 miles per second: it's not just a good idea, it's the law"
----------------------------------------------------------------------------