From: Robert S. <rsa...@ne...> - 2011-08-08 23:25:10
When I run strace on mfsmaster on the hour I get the following:

    rename("changelog.1.mfs", "changelog.2.mfs") = 0
    rename("changelog.0.mfs", "changelog.1.mfs") = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b571b910b80) = -1 ENOMEM (Cannot allocate memory)
    rename("metadata.mfs.back", "metadata.mfs.back.tmp") = 0
    open("metadata.mfs.back", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11

This indicates fork() is failing with an out-of-memory error. The system
has 11 GB cached and 300 MB free. It only has 6 GB of swap.

This suggests that clone() (the system call behind fork()) may be testing
whether the whole process would fit into memory when cloned, which implies
that the memory requirement is actually double what is commonly believed.

I can probably increase swap to make it happy, but that has its own set of
issues and is unlikely to solve much, as the situation will be similar if
mfsmaster starts swapping. In theory, though, mfsmaster should not start
swapping, as only a very low percentage of the forked process will actually
differ from the original one.

I am testing my theory ;-)

Robert

On 8/8/11 6:36 PM, Robert Sandilands wrote:
> Or I can log into the system on the hour and see if two processes
> named mfsmaster exist. In my case they do not, which may indicate that
> fork() is failing.
>
> Running strace on the single instance of mfsmaster also indicates it
> is busy writing to a file, and I can see the following files:
>
> -rw-r----- 1 daemon daemon 11G Aug 8 18:02 metadata.mfs.back
> -rw-r----- 1 daemon daemon 11G Aug 8 17:02 metadata.mfs.back.tmp
>
> metadata.mfs.back.tmp was deleted several seconds later.
>
> iostat -x also indicates 100% utilization on the volume where the
> metadata is stored, with a very high number of writes.
>
> This leaves me with:
>
> 1. Get a faster disk for doing the metadata backups on (SSD?)
> 2. Figure out why fork() is failing
>
> mfsmaster is the only process using more than 5 GB of RAM on the
> machine (it uses 32.6 GB). mfschunkserver uses 4.8 GB. No process seems
> to be locking any significant amount of memory. Fewer than one process
> is created per second. The machine has 64 GB of RAM.
>
> Robert
>
> On 8/8/11 3:46 PM, Elliot Finley wrote:
>> On Mon, Aug 8, 2011 at 1:33 PM, Elliot
>> Finley<efi...@gm...> wrote:
>>> Attached is a patch for filesystem.c that will indicate in your log
>>> file whether or not the fork was successful. I'd be curious to see
>>> the results.
>> Sorry, that last patch has a small problem, attached is the correct one.
>>
>> Elliot
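
As a rough sanity check on the theory at the top of this message, the reported numbers can be plugged into a simplified model of the kernel's memory check at fork time. This is only a sketch: it assumes the kernel refuses the clone() when the process's writable memory exceeds roughly free memory plus page cache plus free swap, which is an assumption about the heuristic overcommit mode and not something confirmed here. The function name fork_would_fit is hypothetical, and all of swap is assumed to be free.

```python
def fork_would_fit(process_gb, free_gb, cached_gb, swap_free_gb):
    """Simplified model (an assumption, not the kernel's exact rule):
    a fork is accepted only if the parent's writable memory fits into
    free RAM + page cache + free swap."""
    available_gb = free_gb + cached_gb + swap_free_gb
    return process_gb <= available_gb, available_gb


# Figures reported in this thread: mfsmaster uses 32.6 GB, the system has
# 300 MB free, 11 GB cached and 6 GB of swap (assumed to be entirely free).
fits, available = fork_would_fit(
    process_gb=32.6, free_gb=0.3, cached_gb=11.0, swap_free_gb=6.0
)
print("available: %.1f GB, fork fits: %s" % (available, fits))
```

Under that assumed model the fork does not fit, which would be consistent with the ENOMEM seen in the strace output. It does not prove that this is the check the kernel actually applied on this machine.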