From: Wang J. <jia...@re...> - 2012-03-01 17:08:07
On 2012/3/1 20:35, Steve Thompson wrote:
> On Tue, 28 Feb 2012, Ricardo J. Barberis wrote:
>
>> This happened to me once and I also took down every server of the cluster, 9
>> chunkservers and one dedicated metalogger (previously, I unmounted all the
>> clients, about 250).
>>
>> Bad idea: when the master came on-line again and I started one chunkserver,
>> the master went "crazy" trying to recreate empty chunks for later deletion.
>>
>> My "solution" was to start all the chunkservers at the same time, so the
>> master saw all the chunks almost simultaneously and didn't try to create
>> empty chunks.
> It turns out that the master was very very slow because a RAID-5
> reconstruction was in progress on the box. The I/O performance dropped to
> 5% of its normal value, in spite of the RAID controller throttle being
> set to 30% maximum reconstruction rate (it's a Dell PE2900 server with a
> Perc 5 controller). Once the reconstruction finished, I restarted
> everything (all chunk servers at the same time) and it came up fine within
> a few seconds.
>
> Steve

Use SSD and RAID10 for "meta" servers whenever possible nowadays. I/O load
during recovery should always be taken into consideration.
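
For anyone wanting to script the "start all the chunkservers at the same time" step described in the quoted messages, here is a minimal sketch. It assumes passwordless ssh to each chunkserver and that `mfschunkserver start` is the right start command on your install. The host names and the `start_command` / `start_all` helpers are made up for illustration and are not part of MooseFS.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; replace with your real chunkserver list.
CHUNKSERVERS = ["chunk01", "chunk02", "chunk03"]


def start_command(host):
    """Build the ssh command that starts the chunkserver daemon on one host.

    Assumption: 'mfschunkserver start' is how the daemon is started there;
    adjust if your distribution uses an init script instead.
    """
    return ["ssh", host, "mfschunkserver", "start"]


def start_all(hosts, runner=subprocess.run):
    """Run the start command on every host concurrently.

    One worker thread per host, so all commands are issued at nearly the
    same moment rather than one after another.
    Returns a dict mapping host -> exit code of its start command.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {host: pool.submit(runner, start_command(host)) for host in hosts}
        return {host: future.result().returncode for host, future in futures.items()}
```

Calling `start_all(CHUNKSERVERS)` once the master is back up returns the exit code per host, so a non-zero entry shows which chunkserver did not start. The `runner` parameter is only there so the function can be tried with a stub before pointing it at real machines.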