From: Robert S. <rsa...@ne...> - 2011-08-11 03:11:57
These logs are from a machine that runs only mfsmount and Apache. Its load average is generally 10 or higher, with I/O wait in the 40-90% range. It has 4 cores and 8 GB of RAM and is in a DNS round-robin pool with 4 other similar machines.

MooseFS is mounted from fstab with the following entry:

mfsmount /srv/mfs fuse mfsmaster=mfsmaster,mfsioretries=300,mfsattrcacheto=60,mfsdirentrycacheto=60,mfsentrycacheto=30,_netdev 0 0

Apache has sendfile disabled.

The total amount of data transferred through the 5 mfsmounts is slightly more than 1 TB per day. That sounds impressive, but it averages out to only around 13 MB/s. It is extremely rare for the same file to be downloaded twice in a day, so caching directories and their attributes is potentially useful, but caching file data is not.

mfsmaster runs on the first chunkserver. The second chunkserver is a dedicated chunkserver, and the third also runs mfsmetalogger. The second chunkserver holds only 2.5 million of the 96 million chunks, so it is not contributing much yet.

On the master:

The metadata is written to a SATA RAID1 volume. The chunks are stored on a storage array connected via SAS. The only activity on the SATA volume is the OS, the metadata and local syslog logging. A second SAS array is used to stage files for deduplication; part of the deduplication process moves them onto the MooseFS volume. The server is a dual quad-core 2 GHz Xeon, and its load average is generally below 5. The deduplication process uses a local mfsmount and is the only user of that mount.
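As a quick check of the throughput figure: a day has 86,400 seconds, so 1 TB/day is about 11.6 MB/s in decimal units, or about 12.1 MiB/s (roughly 12.7 MB/s) if the terabyte is a binary one (TiB). "Slightly more than 1 TB" therefore lands close to the 13 MB/s quoted above. The sketch below does that arithmetic in Python; the name average_rate is hypothetical, and the per-mount split assumes DNS round-robin spreads traffic evenly over the 5 mfsmounts, which the post does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(bytes_per_day: float, unit_bytes: float) -> float:
    """Average throughput, in units of `unit_bytes` per second, for a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / unit_bytes


# Decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
total_mb_s = average_rate(10**12, 10**6)
# Binary units: 1 TiB = 2**40 bytes, 1 MiB = 2**20 bytes.
total_mib_s = average_rate(2**40, 2**20)

# Assumption: the DNS round-robin pool splits the load evenly across the mounts.
MOUNTS = 5

if __name__ == "__main__":
    print(f"1 TB/day  = {total_mb_s:.1f} MB/s total, "
          f"{total_mb_s / MOUNTS:.1f} MB/s per mount")
    print(f"1 TiB/day = {total_mib_s:.1f} MiB/s total, "
          f"{total_mib_s / MOUNTS:.1f} MiB/s per mount")
```

Either way each web server streams only a couple of MB/s on average, consistent with the point that the raw bandwidth itself is modest.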
Here are the matching logs from the master:

Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:39 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer

Robert

On 8/10/11 11:56 AM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands <rsa...@ne...> wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to
>> ensure that memory available is always double the memory needed by
>> mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does
>> not resolve its sensitivity to load on the server. I am still seeing
>> timeouts on the chunkservers and mounts on the hour due to the high CPU and
>> I/O load when the metadata is dumped to disk. It did, however, decrease
>> significantly.
>>
>> An example from the logs:
>>
>> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
>> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read header: ETIMEDOUT (Operation timed out))
>> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master
> Are you using this server as a combination mfsmaster/chunkserver/mfsclient?
>
> If so, is the metadata being written to a spindle(s) that are separate
> from what the chunkserver is using?
>
> How is this box laid out?
>
> Elliot
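The master-side excerpt in this message shows the disconnects arriving in two bursts, at 22:03:30 and again at 22:03:39 to 22:03:41, with each of the three listed clients dropped several times. To check whether such bursts line up with the hourly metadata dump over a longer log, a small tally like the following may help. It is a sketch that assumes the exact "connection with client(ip:...) has been closed by peer" wording shown above (other MooseFS versions may log differently); LINE_RE, count_disconnects and SAMPLE are hypothetical names, and SAMPLE simply reproduces the 15 master log lines quoted above.

```python
import re
from collections import Counter

# Assumed format, copied from the mfsmaster syslog lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ mfsmaster\[[^\]]*\]: "
    r"connection with client\(ip:(?P<ip>[^)]+)\) has been closed by peer$"
)


def count_disconnects(lines):
    """Tally 'closed by peer' events per client IP and per one-second timestamp.

    Lines that do not match LINE_RE are ignored.
    """
    per_client = Counter()
    per_second = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            per_client[match.group("ip")] += 1
            per_second[match.group("ts")] += 1
    return per_client, per_second


# The 15 master log lines from the message above, as (time, last IP octet).
SAMPLE = [
    f"Aug 10 {hms} mfsmaster mfsmaster[xxxxx]: "
    f"connection with client(ip:xxx.xxx.xxx.{octet}) has been closed by peer"
    for hms, octet in [
        ("22:03:30", 65), ("22:03:30", 102), ("22:03:30", 14),
        ("22:03:30", 14), ("22:03:30", 102), ("22:03:30", 65),
        ("22:03:39", 14),
        ("22:03:41", 102), ("22:03:41", 65), ("22:03:41", 14),
        ("22:03:41", 102), ("22:03:41", 65), ("22:03:41", 14),
        ("22:03:41", 102), ("22:03:41", 65),
    ]
]

if __name__ == "__main__":
    per_client, per_second = count_disconnects(SAMPLE)
    for ip, n in per_client.most_common():
        print(f"{ip}: {n} disconnects")
    # Lexical order is chronological within a single day's timestamps.
    for ts in sorted(per_second):
        print(f"{ts}: {per_second[ts]} disconnects")
```

To run it against a real log, pass an open file object, e.g. count_disconnects(open(path)), where path is wherever syslog writes the mfsmaster messages on your system.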