From: Allen L. <lan...@gm...> - 2011-11-04 14:13:24
|
So I didn't plan ahead well and ended up with /var filling up on my master overnight, causing the master to crash.

mfsmetarestore refused to recover the system, I think because it didn't get a chance to write out the metadata file. It seems there's something wrong with the way it's doing writes. After the crash both metadata.mfs and metadata.mfs.back were 0 bytes, and mfsmetarestore (obviously) refused to read from them.

Some but not all of the changelog files were 0 bytes as well. Same story on the backup (metalogger) server.

Just a heads up, I think a little more checking would be in order here to make sure there is space available for the metadata, and at least to prevent the master from crashing when/if it can't write the metadata. If it had stayed up with all the metadata in memory I could've seen the disk issue and brought up another metalogger with more disk space to catch up and take over. |
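The kind of check asked for above can be approximated from outside the master with a small monitoring script. The following is only a sketch, not anything MooseFS ships: the function names are made up for illustration, the free-space threshold is arbitrary, and the directory holding the metadata files is an assumption (the thread only says it lives under /var).

```python
import os
import shutil

# File names are taken from the thread; the directory that holds them
# is NOT stated there and must be supplied by the caller.
METADATA_FILES = ("metadata.mfs", "metadata.mfs.back")


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem containing `path` has at least
    `min_free_bytes` bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def zero_byte_files(directory, names=METADATA_FILES):
    """Return full paths of the files in `names` that exist in
    `directory` but are 0 bytes long."""
    empty = []
    for name in names:
        full = os.path.join(directory, name)
        if os.path.isfile(full) and os.path.getsize(full) == 0:
            empty.append(full)
    return empty
```

Run from cron against the master's data directory, something like `has_free_space(data_dir, 1024**3)` would flag the disk before it fills, and `zero_byte_files(data_dir)` would flag the truncated files described above. Both `data_dir` and the 1 GiB threshold are placeholders to adjust for the actual installation.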
From: Steve <st...@bo...> - 2011-11-04 14:53:06
|
This happened to me too, on an mfsmaster running on a small silicon drive. Lost the lot. A power cut caused an mfschunkserver failure, which created extra logging.

Looks like a repeatable problem that needs addressing. |
From: Allen L. <lan...@gm...> - 2011-11-04 15:37:55
|
My chunkservers are on different machines; I actually have a setup that's supposed to be pretty resilient: 4 chunkservers, 1 master, 1 metalogger, and 1 app server running mfsmount, all separate machines (VMs).

I too lost all the data. The chunks survived, but since the metadata was corrupt, all the files were lost and the chunks were marked for recycling (goal 0).

I should have paid more attention to the default drive the master was using for metadata, but a full-on crash just due to disk space exhaustion on the master seems excessive. |
From: Michał B. <mic...@ge...> - 2011-11-05 07:03:34
|
Hi!

Please check these locations for the metadata file:

/metadata.mfs.emergency
/tmp/metadata.mfs.emergency
/var/metadata.mfs.emergency
/usr/metadata.mfs.emergency
/usr/share/metadata.mfs.emergency
/usr/local/metadata.mfs.emergency
/usr/local/var/metadata.mfs.emergency
/usr/local/share/metadata.mfs.emergency

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01 |
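Checking the eight locations listed above by hand is easy to get wrong, so here is a minimal sketch that walks the same list. The directories and the file name come straight from the message; the function name and the choice to report file sizes are illustrative assumptions, and the script makes no claim about how the master itself picks a location.

```python
import os

# Directories and file name exactly as listed in the message above.
EMERGENCY_DIRS = [
    "/",
    "/tmp",
    "/var",
    "/usr",
    "/usr/share",
    "/usr/local",
    "/usr/local/var",
    "/usr/local/share",
]
EMERGENCY_NAME = "metadata.mfs.emergency"


def find_emergency_metadata(dirs=EMERGENCY_DIRS, name=EMERGENCY_NAME):
    """Return a list of (path, size_in_bytes) for every candidate
    location where the emergency metadata file exists."""
    found = []
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            found.append((candidate, os.path.getsize(candidate)))
    return found
```

Calling `find_emergency_metadata()` on the master returns an empty list if no emergency copy was written. Any entry with a size of 0 is likely to be as unusable as the 0-byte metadata.mfs described earlier in the thread.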