From: <wk...@bn...> - 2011-09-24 20:22:23
On 9/24/11 6:04 AM, Michał Borychowski wrote:
> Hi!
>
> We discussed a similar problem with Ólafur in July, as far as I remember.
> Yes, we know it is not optimal: if there are many files to be deleted, the
> system tries to increase the limit so that they get deleted, but
> unfortunately, with a really huge number of files to be deleted, the system
> gets stuck... We have some ideas for improvement, e.g. first doing a
> truncate (setting the file to 0 bytes) and only later doing the real
> deletion. It would also be possible to change this limit on the fly with
> mastertools.

What creates the deletion problem is this code in chunks.c:

    if (delnotdone > deldone && delnotdone > prevdelnotdone) {
        TmpMaxDelFrac *= 1.3;
        TmpMaxDel = TmpMaxDelFrac;
        syslog(LOG_NOTICE,"DEL_LIMIT temporary increased to: %u/s",TmpMaxDel);
    }

This lets the deletion rate increase very quickly (30% every 5 minutes) with no hard limit, so the rate keeps going up until the server is overwhelmed and the deletions are consuming all the resources.

Unless you are running out of space on the server, deletion from the trash is a very, very low-priority process and should only happen AFTER normal reads/writes and replications. So it is not necessary to ramp up the deletion rate in most cases.

As I previously indicated, we have tested (and are now running in production on 3 clusters) the following replacement code:

    // Local version 09-23-2011
    if (delnotdone > deldone && delnotdone > prevdelnotdone) {
        if (TmpMaxDelFrac < (MaxDel*2)) {
            TmpMaxDelFrac *= 1.1;
            TmpMaxDel = TmpMaxDelFrac;
            syslog(LOG_NOTICE,"DEL_LIMIT temporary increased to: %u/s",TmpMaxDel);
        }
    }

This minor change limits the deletion rate to a HARD LIMIT of 2x CHUNK_DEL_LIMIT and only increases it by 10% every 5 minutes while it is in the ramp-up phase. This is working very well for us.
We are no longer terrified of deleting large folders, and we don't care if it takes 6-8 hours to clear the post-trashtime deletion queue instead of the cluster being unusable for 1-2 hours. If we found that the number of post-trashtime files was growing to an unreasonably large level, we would raise the rate for a limited time (probably in the evening, when nothing else is going on) and take the performance hit (or add chunkservers/better equipment).

So we would like to see a HARD_DEL_LIMIT in mfsmaster.cfg (instead of just assuming 2x DEL_LIMIT as in our example) and the ability to change those settings on the fly, as you mentioned (ideally via a cron job, so we could automatically speed things up a bit in the evenings).

Further down our todo list would be some logic that makes the deletion rate depend on other activity: if the cluster is otherwise not busy with reads/writes and replications (as in our night-time scenario), it could go ahead and speed things up, and conversely, if the server is really busy, it could postpone the post-trashtime deletions completely.

The truncate-to-0 idea sounds interesting if there is an actual performance gain and it doesn't introduce complications, but if mfsmaster were more intelligent about 'when' and 'how quickly' it deletes files from the trash queue, it's not really high on our wishlist.

Finally, I'd like to thank the maintainers for MFS. Now that we have the deletion issue solved and have learned not to let the mfsmaster process exceed 50% of RAM, MFS is a huge improvement over our NFS/DRBD setups in terms of administration. It even lets us use somewhat older servers in the cluster, so we can save the state-of-the-art kit for databases and VMs. We've even had a few incidents where equipment failed or we did something stupid, and we were able to recover cleanly. The process was well documented, easy to follow and 'just worked'.

-bill