Re: [Moosefs-users] Why is it increasing my DEL_LIMIT when I don't want it to!


On 9/24/11 6:04 AM, Michał Borychowski wrote:
> Hi!
>
> We discussed a similar problem with Ólafur in July, as far as I remember.
> Yes, we know it is not optimal: if there are many files to be deleted, the
> system tries to increase the limit so that they get deleted, but
> unfortunately, with a really huge number of files to be deleted, the system
> gets stuck... We have some ideas for improvement, e.g. first doing a truncate
> (setting the file to 0 bytes) and doing the real deletion later. It would
> also be possible to change this limit on the fly with mastertools.

The deletion problem is caused by this code in chunks.c:

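// If more deletions were left undone than done in this loop, and the
// backlog is larger than in the previous loop, raise the temporary limit
// by 30%. Nothing here caps how high TmpMaxDelFrac can go.
if (delnotdone > deldone && delnotdone > prevdelnotdone) {
     TmpMaxDelFrac *= 1.3;
     TmpMaxDel = TmpMaxDelFrac;
     syslog(LOG_NOTICE,"DEL_LIMIT temporary increased to: %u/s",TmpMaxDel);
}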

This lets the deletion rate increase very quickly (30% every 5 minutes),
and because there is no hard limit, the rate keeps going up until the
server is overwhelmed and the deletions are consuming all of its
resources.

Unless you are running out of space on the server, deletion from the
trash is a very, very low-priority process and should only happen AFTER
normal reads/writes and replications. So in most cases it is not
necessary to ramp up the deletion rate at all.

As I previously indicated, we have tested (and now are running in 
production on 3 clusters) the following replacement code:

// Local version 09-23-2011
if (delnotdone > deldone && delnotdone > prevdelnotdone) {
     // Ramp up only while the temporary limit is below 2x the configured limit
     if (TmpMaxDelFrac < (MaxDel*2)) {
          TmpMaxDelFrac *= 1.1;   // 10% per loop instead of 30%
          TmpMaxDel = TmpMaxDelFrac;
          syslog(LOG_NOTICE,"DEL_LIMIT temporary increased to: %u/s",TmpMaxDel);
     }
}

This minor change caps the deletion rate at a HARD LIMIT of 2x the
CHUNK_DEL_LIMIT and only increases it by 10% every 5 minutes while it is
in the ramp-up phase.

This is working very well for us. We are no longer terrified of
deleting large folders, and we don't care if it takes 6-8 hours to clear
the post-trashtime deletion queue instead of the cluster being unusable
for 1-2 hours.

If we were to find that the number of post-trashtime files were growing 
to an unreasonably large level, then we would raise the rate for a 
limited time (probably in the evening when nothing else is going on) and 
take the performance hit (or add chunkservers/better equipment).

So we would like to see a HARD_DEL_LIMIT in mfsmaster.cfg (instead of
just assuming 2x DEL_LIMIT as in our example), plus the ability to change
those settings on the fly as you mentioned (ideally via a cron job, so
we could automatically speed things up a bit in the evenings).

Further down our to-do list would be some logic that makes the deletion
rate depend on other activity: if the cluster is otherwise not busy with
reads/writes and replications (as in our nighttime scenario), it could
go ahead and speed deletions up, and conversely, if the server is really
busy, it could postpone the post-trashtime deletions completely.

The truncate-to-0 idea sounds interesting if there is an actual
performance gain and it doesn't introduce complications, but if
mfsmaster were more intelligent about 'when' and 'how quickly' it deletes
files from the trash queue, it's not really high on our wishlist.

Finally, I'd like to thank the maintainers for MFS.

Now that we have the deletion issue solved and have learned not to let
the mfsmaster process exceed 50% of RAM, MFS is a huge improvement over
our NFS/DRBD setups in regard to administration, and it even lets us use
somewhat older servers in the cluster, saving the state-of-the-art kit
for databases and VMs.

We've even had a few incidents where equipment failed or we did 
something stupid and we were able to recover cleanly. The process was 
well documented, easy to follow and 'just worked'.

-bill
