From: Thomas S H. <tha...@gm...> - 2011-03-21 14:51:31
I have been hammering away at mfs failover for quite some time and I am familiar with your problem.

What happens is that the mfsmetaloggers continue to stream updates from the mfsmaster even after a failover, but the mfsmetarestore command executed on the metadata on the new mfsmaster ends up creating a different "last change point" than what the other metaloggers see. This means that the mfsmetaloggers that did not become the new master have a bad set of metadata after your initial failover.

Since I wanted to have a completely clean and automated failover in my MooseFS deployment, I created a wrapper daemon that manages the mfsmetalogger. This daemon should be run on all metaloggers and the mfsmaster. It detects when a failover occurs and ensures that the mfsmetalogger is running on the right nodes and that the metadata being used is the correct metadata.

If you do want to use my mfsmetalogger manager, it is available here:
https://github.com/thatch45/mfs-failover/blob/master/daemon/metaman.py

It is written in Python 3 (my deployments default to Python 3), but let me know if you are interested in running it on Python 2 and I will make a Python 2 version.

I also have some ucarp scripts in that GitHub project that can be used for managing failover automatically in conjunction with metaman, but I have not had the time and resources to finish packaging them up.

Let me know if you have any questions!

-Thomas S Hatch

On Mon, Mar 21, 2011 at 5:10 AM, Boyko Yordanov <b.y...@ex...> wrote:
> Hi list,
>
> I'm wondering how are you guys handling mfs master failover?
>
> In my tests mfsmetalogger seems quite unreliable - 2 days of testing showed
> a few cases when mfsmetarestore is unable to restore the metadata.mfs
> datafile - getting different errors like Data mismatch, version mismatch,
> hole in change files (add more files) etc.
>
> Running 3 different metadata backup loggers, master and chunk servers all
> running mfs-1.6.20-2 on centos 5.5 x86_64, filesystem type is ext3.
>
> I'm aware that some of you are running huge clusters with terabytes of data
> - I'm wondering how do you trust your mfsmaster and am I the only one
> concerned with eventual data loss on mfsmaster failover, when mfsmetarestore
> does not properly restore the metadata.mfs file from changelogs?
>
> Boyko
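
For anyone who wants the idea without reading the daemon, here is a minimal sketch of the per-node decision a wrapper of this kind might make. It is not taken from metaman.py. The names `Action` and `plan` are hypothetical, and it assumes that the node holding the master role does not also run a metalogger and that a metalogger's local metadata is treated as stale once the master has changed. How a node detects its role or a failover, and how it actually stops, resets, or starts the mfsmetalogger, is deliberately left out.

```python
from enum import Enum


class Action(Enum):
    """What the (hypothetical) wrapper should do on this node."""
    NOTHING = "nothing"
    START_LOGGER = "start mfsmetalogger"
    STOP_LOGGER = "stop mfsmetalogger"
    RESET_AND_START = "discard stale metadata, then start mfsmetalogger"


def plan(is_master: bool, logger_running: bool, master_changed: bool) -> Action:
    """Pick an action for one node from three observed facts.

    is_master      -- this node currently holds the master role
    logger_running -- an mfsmetalogger process is running on this node
    master_changed -- a failover happened since the local metadata was synced
    """
    if is_master:
        # Assumption: the master node should not run a metalogger.
        return Action.STOP_LOGGER if logger_running else Action.NOTHING
    if master_changed:
        # Metadata streamed before the failover no longer matches the
        # "last change point" on the new master, so start from a clean copy.
        return Action.RESET_AND_START
    # Ordinary metalogger node with no failover: just keep the logger up.
    return Action.NOTHING if logger_running else Action.START_LOGGER


if __name__ == "__main__":
    # A metalogger that kept running through a failover needs a reset.
    print(plan(is_master=False, logger_running=True, master_changed=True).value)
```

Keeping the decision in a pure function like this makes it easy to check every combination of inputs before wiring it to real process management.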