From: Robert S. <rsa...@ne...> - 2011-07-05 12:47:37
Hi Michal,

I need this to see if there is a way I can optimize the system to open more files per minute. At this stage our systems can open a few hundred files in parallel; I am not yet at the point where I can do thousands.

What I think I am seeing is that the writes are starved because there are too many pending opens, and most of them are for reading files. There seems to be a limit of around 2,400 opens per minute on the hardware I have, and I am looking at what needs to be done to improve that.

Based on your answer it sounds like the network traffic from the machine running mfsmount to the master may be the biggest delay? Short of converting to 10 Gb/s networking or trying to get all the servers on the same switch, I don't know if there is much to be done about it?

Robert

On 7/5/11 3:15 AM, Michal Borychowski wrote:
> Hi Robert!
>
> Ad. 1. There is no limit in mfsmount itself, but there are some limits in the operating system. Generally speaking, it is wise not to open more than several thousand files in parallel.
>
> Ad. 2. fopen() invokes open(), and open() invokes (through the kernel and FUSE) the functions mfs_lookup and mfs_open. mfs_lookup resolves consecutive path elements into an i-node number, while mfs_open performs the actual opening of the target file. It sends a packet to the master to find out whether the file's data may be kept in the client cache, and it also marks the file as open in the master - if the file is deleted while open, it is retained until the moment it is closed.
>
> BTW, why do you need this?
>
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> Gemius S.A.
> ul. Wołoska 7, 02-672 Warszawa
> Budynek MARS, klatka D
> Tel.: +4822 874-41-00
> Fax : +4822 874-41-01
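A minimal sketch of the two callbacks Michal describes, using the libfuse low-level API that mfsmount is built on (the handler bodies, i-node numbers and timeouts here are illustrative placeholders, not MooseFS source):

    /*
     * Illustrative sketch only - not MooseFS code. Shows the two libfuse
     * low-level callbacks behind one fopen("/a/b/c", "r") from an
     * application. Build fragment: gcc -Wall -c sketch.c \
     *   $(pkg-config fuse --cflags)
     */
    #define FUSE_USE_VERSION 26
    #include <fuse_lowlevel.h>
    #include <sys/stat.h>

    /* One lookup per path component: for /a/b/c the kernel calls this
     * three times, and each uncached call costs a round trip to the
     * master to translate a name into an i-node number. */
    static void sketch_lookup(fuse_req_t req, fuse_ino_t parent,
                              const char *name)
    {
        struct fuse_entry_param e = {0};
        e.ino = 2;                        /* placeholder i-node number */
        e.attr.st_ino = e.ino;
        e.attr.st_mode = S_IFREG | 0444;  /* placeholder attributes */
        e.entry_timeout = 1.0;  /* kernel may cache the name->ino mapping */
        e.attr_timeout = 1.0;   /* kernel may cache the attributes */
        fuse_reply_entry(req, &e);
    }

    /* One more master round trip: ask whether the file's data may be
     * cached, and register the file as open so that a concurrent delete
     * keeps the data alive until the last close. */
    static void sketch_open(fuse_req_t req, fuse_ino_t ino,
                            struct fuse_file_info *fi)
    {
        fi->keep_cache = 1;  /* only if the master said caching is safe */
        fuse_reply_open(req, fi);
    }

    static struct fuse_lowlevel_ops sketch_ops = {
        .lookup = sketch_lookup,
        .open   = sketch_open,
    };

A real filesystem would hand sketch_ops to fuse_lowlevel_new() and run a session loop; the point of the sketch is that every uncached path component costs one master round trip and every open costs one more, which is why opens per minute track network latency so closely.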
> -----Original Message-----
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Saturday, July 02, 2011 2:54 AM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Write starvation
>
> Based on some tests I think the limit in this case is the number of
> opens per minute. I think I need to understand what happens during an
> open before I can make guesses about what can be done to raise that
> number.
>
> But that still does not quite explain the write starvation, unless the
> number of pending reads is just so much higher than the number of
> pending writes that the writes appear starved. Maybe this will resolve
> itself as I add more chunk servers.
>
> Some questions:
>
> 1. Is there a limit to the number of handles that client applications
> can open per mount, per chunk server, per disk?
> 2. What happens when an application does fopen() on a mount? Can
> somebody give a quick overview, or do I have to read some code?
>
> Robert
>
> On 6/30/11 11:32 AM, Ricardo J. Barberis wrote:
>> On Wednesday, 29 June 2011, Robert wrote:
>>> Yes, we use CentOS, but installing and using the ktune package generally
>>> resolves most of the performance issues and differences I have seen with
>>> Ubuntu/Debian.
>> Nice to know about ktune, and thank you for bringing it up; I'll take a
>> look at it.
>>
>>> I don't understand the comment on hitting metadata a lot? What is a lot?
>> A lot = reading / (re)writing / ls -l'ing / stat'ing too often.
>>
>> If the client can't cache the metadata but uses it often, that means it
>> has to query the master every time.
>>
>> Network latencies might also play a role in the performance degradation.
>>
>>> Why would it make a difference? All the metadata is in RAM anyway? The
>>> biggest limit to speed seems to be the number of IOPS that you can get
>>> out of the disks you have available. Looking up the metadata from RAM
>>> should be several orders of magnitude faster than that.
>> Yep, and you have plenty of RAM, so that shouldn't be an issue in your
>> case.
>>
>>> The activity reported through the CGI interface on the master is around
>>> 2,400 opens per minute on average. Reads and writes are also around
>>> 2,400 per minute, alternating with each other. mknod has some peaks
>>> around 2,800 per minute but is generally much lower. Lookups are around
>>> 8,000 per minute and getattr is around 700 per minute. Chunk replication
>>> and deletion are around 50 per minute. The other numbers are generally
>>> very low.
>> Mmm, maybe 2 chunkservers are just too little to handle that activity,
>> but I would also check the network latencies.
>>
>> I'm also not really confident about having the master and a chunkserver
>> on the same server, but I don't have any hard evidence to support my
>> feelings ;)
>>
>>> Is there a guide/hints specific to MooseFS on what IO/Net/Process
>>> parameters would be good to investigate for mfsmaster?
>> I'd like to know that too!
>>
>> Cheers,
>
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
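To put a client-side number on the open-rate limit discussed above, a rough probe along these lines can help (illustrative, not a MooseFS tool; /mnt/mfs/testfile is a placeholder, and reopening one cached file flatters the result - opening many distinct files is a fairer test):

    /* Illustrative open-rate probe, not a MooseFS utility. Each
     * iteration costs at least one lookup+open exchange with the master
     * when the entry is not cached.
     * Build: gcc -std=c99 -Wall probe.c -lrt -o probe */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Placeholder path - point it at a file on the mfsmount. */
        const char *path = (argc > 1) ? argv[1] : "/mnt/mfs/testfile";
        const int iterations = 1000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++) {
            int fd = open(path, O_RDONLY);
            if (fd < 0) {
                perror("open");
                return 1;
            }
            close(fd);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d opens in %.2f s = %.0f opens/minute\n",
               iterations, secs, iterations / secs * 60.0);
        return 0;
    }

Run once against a local disk and once against the mount, the difference isolates the FUSE and network cost; a mount figure near 2,400 opens/minute works out to roughly 25 ms per open, which would point at round-trip latency to the master rather than at disk IOPS.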