Re: [Moosefs-users] mfsmaster performance and hardware

These logs were from a machine that is only running mfsmount and Apache. 
Load is generally 10+ with I/O wait in the 40-90% range. It has 4 cores 
and 8 GB of RAM. It is in a DNS round-robin pool with 4 other similar 
machines. MooseFS is mounted through the following fstab entry:

mfsmount    /srv/mfs    fuse    mfsmaster=mfsmaster,mfsioretries=300,mfsattrcacheto=60,mfsdirentrycacheto=60,mfsentrycacheto=30,_netdev    0 0

Apache has sendfile disabled. The total amount of data transferred 
through the 5 mfsmounts is slightly more than 1 TB per day. It sounds 
impressive but it really is only around 13 MB/s. It is extremely rare 
for the same file to be downloaded twice in a day. Caching folders and 
their attributes is potentially useful. Caching files is not.
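As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. The function name is only for illustration, and the result depends on whether "1 TB" is read as 10^12 bytes or 2^40 bytes, so treat it as a rough estimate of the average and not a measurement:

```python
# Rough average throughput for a given daily transfer volume.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds


def avg_mb_per_s(bytes_per_day: float) -> float:
    """Average rate in MB/s (10^6 bytes per second) over one day."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


# "1 TB" read as decimal (10^12 bytes) and as binary (2^40 bytes).
print(round(avg_mb_per_s(1e12), 1))   # 11.6
print(round(avg_mb_per_s(2**40), 1))  # 12.7
```

Either reading lands in the 12-13 MB/s range once "slightly more than 1 TB" is taken into account, and that total is spread over all 5 mounts.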

mfsmaster runs on the first chunkserver. The second chunkserver is a 
dedicated chunkserver. The third chunkserver also runs mfsmetalogger. 
The second chunkserver only has 2.5 million of the 96 million chunks so 
it is not contributing much yet.

On the master:

The metadata is written on a SATA RAID1 volume. The chunks are stored on 
a storage array that is connected via SAS. The only activity on the SATA 
volume is the OS, metadata and local syslog logging. There is a second 
SAS array that is used to stage files for deduplication. Part of the 
deduplication process also moves the files to the MooseFS volume. The server is 
a dual quad-core 2 GHz Xeon and the average load is generally less than 
5. The deduplication uses a local mfsmount but is the only user of the 
mount.

Here are the matching logs from the master:

Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:39 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
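In case it helps anyone line these up against the hourly metadata dump, here is a small sketch for tallying the disconnects per second and per client. It assumes each syslog entry has been joined back onto one line and follows exactly the format shown above. The function name and the regular expression are my own and not part of MooseFS:

```python
import re
from collections import Counter

# Matches the master-side "closed by peer" entries in the format quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+mfsmaster\[[^\]]*\]:\s+"
    r"connection with client\(ip:(?P<ip>[^)]+)\) has been closed by peer"
)


def count_disconnects(lines):
    """Return (per_second, per_client) counters of 'closed by peer' events."""
    per_second = Counter()
    per_client = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            per_second[m.group("ts")] += 1
            per_client[m.group("ip")] += 1
    return per_second, per_client


# Three entries taken from the log excerpt above.
sample = [
    "Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with "
    "client(ip:xxx.xxx.xxx.65) has been closed by peer",
    "Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with "
    "client(ip:xxx.xxx.xxx.102) has been closed by peer",
    "Aug 10 22:03:39 mfsmaster mfsmaster[xxxxx]: connection with "
    "client(ip:xxx.xxx.xxx.14) has been closed by peer",
]
per_second, per_client = count_disconnects(sample)
for ts, n in sorted(per_second.items()):
    print(ts, n)
```

Feeding it the whole syslog file (for example `count_disconnects(open(path))`, with `path` pointing at wherever your syslog lives) should show whether the disconnects cluster around the same few seconds each hour.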

Robert

On 8/10/11 11:56 AM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands<rsa...@ne...>  wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to
>> ensure that memory available is always double the memory needed by
>> mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does
>> not resolve its sensitivity to load on the server. I am still seeing
>> timeouts on the chunkservers and mounts on the hour due to the high CPU and
>> I/O load when the meta data is dumped to disk. It did however decrease
>> significantly.
>>
>> An example from the logs:
>>
>> Aug  9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
>> Aug  9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read header: ETIMEDOUT (Operation timed out))
>> Aug  9 04:03:41 http-lb-1 mfsmount[13288]: registered to master
> Are you using this server as a combination mfsmaster/chunkserver/mfsclient?
>
> If so, is the metadata being written to a spindle(s) that are separate
> from what the chunkserver is using?
>
> How is this box laid out?
>
> Elliot
