From: WK <wk...@bn...> - 2011-07-19 00:52:45
On 6/28/2011 4:14 AM, Ólafur Ósvaldsson wrote:
> When reading the man page for mfsmaster.cfg I see the comments for
> CHUNKS_LOOP_TIME and CHUNKS_DEL_LIMIT, and my understanding is that
> with the default values the maximum number of chunks to delete in one
> loop (300 sec) is 100. It does not say if that is per chunkserver or
> for the whole system, but each server here was doing around or over
> 5000 chunk deletions per minute, and with 10 servers that's over 50k
> chunk deletions per minute for the whole system.
>
> CHUNKS_LOOP_TIME
>     Chunks loop frequency in seconds (default is 300)
>
> CHUNKS_DEL_LIMIT
>     Maximum number of chunks to delete in one loop (default
>     is 100)
>

We just got hit by this.

We had a small 7 million file cluster whose chunkservers had only 1GB
of RAM, and all of a sudden it started doing 10K+ deletions a minute.
That bogged down the entire cluster, making it unusable, and even sent
two chunkservers into swap.

We were able to get the chunkservers shut down and restarted the
MFSMaster with CHUNKS_DEL_LIMIT set to 50. After a settling-down
period the deletions started again, but this time at half the rate
(around 5K deletions a minute), which was still excessive given what
we have.

So I can verify that changing CHUNKS_DEL_LIMIT does have an effect (if
you restart the master), but the default is way too high unless you
are careful. In our case, we weren't paying attention and didn't
realize the number of files was increasing past a reasonable point for
those resources.

We have since set it to 20 and have increased the RAM in the
chunkservers.

BTW, the sizing data in the FAQ for chunkservers should be more
explicit. It should say that each chunkserver needs about 150MB of RAM
for every million chunks stored on that chunkserver (which is about
what we are seeing), so the more chunkservers you have, the less RAM
each one needs.

Maybe the CGI could be used to query resources and warn about
potential RAM resource issues.

-bill
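
For anyone wanting to plug their own numbers into the figures above, here is a rough sketch of the arithmetic. It is only an illustration: the 150MB-per-million-chunks figure is the rule of thumb observed in this post, not an official MooseFS number, and the linear relation between CHUNKS_DEL_LIMIT and deletion rate is an assumption extrapolated from the two data points reported here (roughly 10K/min before the change, presumably at the default of 100, and roughly 5K/min at 50). The function names and the example cluster sizes are hypothetical.

```python
# Rule of thumb from the post: ~150 MB of chunkserver RAM per million
# chunks stored on that chunkserver. Not an official MooseFS figure.
MB_PER_MILLION_CHUNKS = 150


def estimate_chunkserver_ram_mb(total_chunk_copies, num_chunkservers):
    """Estimate RAM (MB) per chunkserver, assuming chunk copies are
    spread evenly across all chunkservers (an assumption)."""
    chunks_per_server = total_chunk_copies / num_chunkservers
    return chunks_per_server / 1_000_000 * MB_PER_MILLION_CHUNKS


def estimate_deletions_per_minute(del_limit,
                                  observed_limit=100,
                                  observed_rate=10_000):
    """Scale an observed deletion rate linearly with CHUNKS_DEL_LIMIT.
    Linearity is assumed from only two observations in the post."""
    return observed_rate * del_limit / observed_limit


if __name__ == "__main__":
    # Hypothetical cluster: 14 million chunk copies on 7 chunkservers.
    print(estimate_chunkserver_ram_mb(14_000_000, 7))
    # Projected rate for the limit of 20 mentioned above.
    print(estimate_deletions_per_minute(20))
```

Treat the projected rate as a starting guess only; the actual rate should be checked on the running cluster after restarting the master.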