From: Gandalf C. <gan...@gm...> - 2018-05-31 11:09:36
|
May 31 13:00:07 cs01 mfsmaster[21698]: write error
May 31 13:00:07 cs01 mfsmaster[21698]: can't write metadata
May 31 13:00:15 cs01 mfsmaster[21698]: write error
May 31 13:00:15 cs01 mfsmaster[21698]: write error
May 31 13:00:15 cs01 mfsmaster[21698]: write error
May 31 13:00:15 cs01 mfsmaster[21698]: write error
May 31 13:00:15 cs01 mfsmaster[20898]: child finished
May 31 13:00:15 cs01 mfsmaster[20898]: store process has finished - store time: 15.774
May 31 13:00:15 cs01 mfsmaster[20898]: metadata not stored !!! (child exited) - exiting
May 31 13:00:15 cs01 mfsmaster[20898]: internal terminate request
May 31 13:00:15 cs01 mfsmaster[20898]: state: transition FOLLOWER -> DUMMY ; changed 52033 seconds ago ; leaderip: 10.200.1.13
May 31 13:00:15 cs01 mfsmaster[20898]: state: DUMMY ; changed 0 seconds ago ; leaderip: 10.200.1.13
May 31 13:00:16 cs01 mfsmaster[20898]: exited from main loop
May 31 13:00:16 cs01 mfsmaster[20898]: exititng ...
May 31 13:00:16 cs01 mfsmaster[20898]: main master server module: closing *:9421
May 31 13:00:16 cs01 mfsmaster[20898]: master <-> chunkservers module: closing *:9420
May 31 13:00:16 cs01 mfsmaster[20898]: master control module: closing *:9419
May 31 13:00:23 cs01 mfsmaster[20898]: cleaning metadata ...
May 31 13:00:24 cs01 mfsmaster[20898]: metadata have been cleaned
May 31 13:00:24 cs01 mfsmaster[20898]: process exited successfully (status:0)
May 31 13:00:25 cs01 mfsmaster[24058]: set gid to 111
May 31 13:00:25 cs01 mfsmaster[24058]: set uid to 107
May 31 13:00:25 cs01 mfsmaster[24058]: can't find process to terminate

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.9G     0  7.9G   0% /dev
tmpfs           1.6G  9.1M  1.6G   1% /run
/dev/sdd1        12G  8.2G  3.9G  69% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
chunk0          1.8T  239G  1.6T  14% /mnt/chunks/chunk0
chunk1          1.8T  243G  1.6T  14% /mnt/chunks/chunk1
tmpfs           1.6G     0  1.6G   0% /run/user/1000

Maybe I'm running out of space, but I think the master process shouldn't crash.

# ls -la /var/lib/mfs/
total 7352160
drwxr-xr-x  2 mfs  mfs         311 May 31 13:00 .
drwxr-xr-x 25 root root       4096 May 31 00:13 ..
-rw-r-----  1 mfs  mfs           5 May 30 21:39 .mfschunkserver.lock
-rw-r-----  1 mfs  mfs  4870620149 May 31 13:00 changelog.1.mfs
-rw-r-----  1 mfs  mfs          90 May 30 18:30 changelog.12.mfs
-rw-r-----  1 mfs  mfs    16509535 May 30 22:20 changelog.2.mfs
-rw-r-----  1 mfs  mfs        1164 May 30 21:32 changelog.4.mfs
-rw-r-----  1 mfs  mfs          10 May 30 21:17 chunkserverid.mfs
-rw-r-----  1 mfs  mfs     4066348 May 31 13:00 csstats.mfs
-rw-r-----  1 mfs  mfs         144 May 31 13:00 metadata.crc
-rw-r-----  1 mfs  mfs   910498282 May 31 13:00 metadata.mfs
-rw-r-----  1 mfs  mfs   852231792 May 31 12:00 metadata.mfs.back.1
-rw-r-----  1 mfs  mfs   870973440 May 31 13:00 metadata.mfs.emergency
-rw-r--r--  1 root root          8 May 17 09:28 metadata.mfs.empty
-rw-r-----  1 mfs  mfs     3672832 May 31 13:00 stats.mfs
|
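To see where the space in /var/lib/mfs is going, the byte sizes from the `ls -la /var/lib/mfs/` listing above can simply be added up. The following is a minimal sketch in Python that uses only the numbers from that listing; the helper names (`total_bytes`, `largest`, `share`) are made up for this illustration and are not part of any MooseFS tooling.

```python
# Sizes in bytes, copied from the `ls -la /var/lib/mfs/` listing in this message.
SIZES = {
    ".mfschunkserver.lock": 5,
    "changelog.1.mfs": 4870620149,
    "changelog.12.mfs": 90,
    "changelog.2.mfs": 16509535,
    "changelog.4.mfs": 1164,
    "chunkserverid.mfs": 10,
    "csstats.mfs": 4066348,
    "metadata.crc": 144,
    "metadata.mfs": 910498282,
    "metadata.mfs.back.1": 852231792,
    "metadata.mfs.emergency": 870973440,
    "metadata.mfs.empty": 8,
    "stats.mfs": 3672832,
}


def total_bytes(sizes):
    """Sum of all file sizes in the listing."""
    return sum(sizes.values())


def largest(sizes, n=3):
    """The n largest files as (name, bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


def share(sizes, name):
    """Fraction of the directory total taken by one file."""
    return sizes[name] / total_bytes(sizes)


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"total: {total_bytes(SIZES) / gib:.2f} GiB")
    for name, size in largest(SIZES):
        print(f"{name:24s} {size / gib:6.2f} GiB  ({share(SIZES, name):.0%})")
```

Read this way, the single file changelog.1.mfs (4870620149 bytes) is by far the largest entry in the directory, larger than the three metadata images combined, on a 12G root filesystem with 3.9G available.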
From: R.C. <mil...@gm...> - 2018-05-31 14:55:42
|
You expect that a production master is going to have less available space on disk than RAM?

Original message
From: Gandalf Corvotempesta
Sent: Thursday, 31 May 2018 13:09
To: moo...@li...
Subject: [MooseFS-Users] [v4] master process crashed on metadata dump
|
From: Gandalf C. <gan...@gm...> - 2018-05-31 15:06:42
|
On Thu, 31 May 2018 at 16:56, R.C. <mil...@gm...> wrote:
> You expect that a production master is going to have less available space on disk than RAM?

This is not an answer, for at least 5 reasons:
1) It's a test cluster.
2) Our metadata server is using 2.1GB.
3) I have 4GB free on disk.
4) Free disk space is the same on every other server; only one crashed.
5) I expect that, on a multimaster cluster, a bad metadata dump doesn't kill the instance, for 2 reasons:
   a) it's multimaster, so the metadata dump could be done on another server;
   b) a server failure (for whatever reason) could move the current leader to a different node, even one without free space (metadata are kept in RAM).

So why, with tons of free RAM available and a multi-master cluster, does running out of space during a dump kill the master process on a follower? I don't see any advantage in doing this.
|
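For illustration only, the behaviour being asked for (skip a dump that cannot fit instead of failing halfway through it) could be approximated by a free-space check before the dump starts. This is a hypothetical sketch in Python, not MooseFS code and not a description of how mfsmaster works: the function names, the 10% safety margin, and the idea of estimating the dump size from the existing metadata.mfs file are all assumptions.

```python
import os
import shutil


def estimate_dump_size(data_dir, name="metadata.mfs"):
    """Assumption: the next dump is about as large as the current metadata file.

    Returns 0 if the file does not exist yet.
    """
    path = os.path.join(data_dir, name)
    return os.path.getsize(path) if os.path.exists(path) else 0


def can_store_dump(data_dir, expected_bytes, margin=1.1):
    """True if the filesystem holding data_dir has room for the dump.

    margin is a hypothetical safety factor (10% headroom by default).
    """
    free = shutil.disk_usage(data_dir).free
    return free >= expected_bytes * margin


if __name__ == "__main__":
    data_dir = "/var/lib/mfs"  # path taken from the listing earlier in the thread
    if os.path.isdir(data_dir):
        needed = estimate_dump_size(data_dir)
        if can_store_dump(data_dir, needed):
            print("enough free space for a metadata dump")
        else:
            print("not enough free space: skip the dump and keep serving from RAM")
```

A check like this only narrows the window; the disk can still fill up while the dump is being written, so it complements rather than replaces handling of the write error itself.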