From: Jakub Kruszona-Z. <jak...@ge...> - 2015-09-17 06:28:03
Do you have an entry like this in your syslog?

"fork error (store data in foreground - it will block master for a while)"

If so, this is the source of the problem with your master. Linux can use several different algorithms to estimate how much memory a process needs when it is created. One of these algorithms assumes that a forked process will need exactly as much memory as its parent. With a process taking 24GB of memory and a total of 40GB available (32GB physical plus 8GB virtual), forking would always fail under this algorithm, because parent and child together would be counted as 48GB.

In reality, fork does not copy the entire memory; only the modified fragments are copied as needed. Since the child process in the MFS master only reads this memory and dumps it into a file, it is safe to assume that little of the memory content will change. Such a "careful" estimating algorithm is therefore not needed.

The solution is to switch the estimating algorithm the system uses. This can be done one-time with a command run as root:

echo 1 > /proc/sys/vm/overcommit_memory

To make the change permanent, so that it survives a system restart, put the following line into your "/etc/sysctl.conf" file:

vm.overcommit_memory=1

On 16 Sep, 2015, at 17:16, bil...@16... wrote:
>
> Hi,
>
> I have MFS with a master server with 64GB RAM and 20 chunkservers, about 750TB of space.
> But there are almost 180,000,000 files in MFS, and the RAM usage is about 80%.
>
> Every hour, the master server CPU usage goes up to 99%, and all the clients can't connect to MFS.
> I also get the messages below in /var/log/message
>
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060956C4_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (0000000006095862_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (0000000006095880_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060958BB_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (0000000006095B97_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (0000000006095BA9_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060BFBB0_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060BFF7E_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060C00B4_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060CB78A_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060CBB4A_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060CC748_00000001), so create it for future deletion
> Sep 16 19:08:59 mfsmaster01 mfsmaster[37325]: chunkserver has nonexistent chunk (00000000060CC8A3_00000001), so create it for future deletion

This looks like a result of the forking problem; it shouldn't be dangerous. Just to be sure that this is not the result of some unknown bug, we will try to reproduce it in our testing environment.

>
> How can I solve this problem?
>
> bil...@16...
> _________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users

-- 
Regards,
Jakub Kruszona-Zawadzki
- - - - - - - - - - - - - - - -
Segmentation fault (core dumped)
Phone: +48 602 212 039
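
PS. A minimal sketch for checking the current setting before changing it, and for redoing the arithmetic from the example above (24GB process, 40GB total). The helper names describe_mode and fork_fits are made up for illustration and are not part of MooseFS; the mode names follow the kernel documentation for vm.overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting). fork_fits only models the pessimistic "child needs as much as the parent" estimate, not the kernel's actual accounting.

```shell
#!/bin/sh
# Illustrative helpers only; names are hypothetical, not MooseFS tools.

# Map a vm.overcommit_memory value to a short description.
describe_mode() {
    case "$1" in
        0) echo "heuristic" ;;
        1) echo "always" ;;
        2) echo "strict" ;;
        *) echo "unknown" ;;
    esac
}

# fork_fits PARENT_GB TOTAL_GB
# Succeeds only if the parent plus a same-sized child fit into TOTAL_GB,
# i.e. the pessimistic estimate described in the mail above.
fork_fits() {
    [ $(( $1 * 2 )) -le "$2" ]
}

# Show the current setting (readable without root on Linux).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "vm.overcommit_memory=$mode ($(describe_mode "$mode"))"
fi

# The example from the mail: 24GB process, 40GB total.
if fork_fits 24 40; then
    echo "fork fits under the pessimistic estimate"
else
    echo "fork does not fit under the pessimistic estimate"
fi
```

If the first line of output already shows mode 1 and the "fork error" message still appears in syslog, the cause is likely something else.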