From: Thomas S H. <tha...@gm...> - 2011-03-21 15:27:53
Hi Pedro!

The problem I am running into is time and resources. I am in the middle of a
number of other projects and my test environment is currently in a state of
"flux". I agree that this would be a great thing for MooseFS to come packaged
with, but it should be a complete package, with the ucarp failover scripts
wrapped up into a simple cluster management daemon. As I have mentioned
before, I hope to have more time for this in a few weeks, but it keeps
getting pushed back and I might not be able to get to it for over a month. I
know that many people are interested in my MooseFS failover work, and it is a
high priority. Any contributions would be appreciated; all the code is in
place, I mostly just need to package it up.

P.S. Since this is a list of system admins, some of you might be interested
in the project that has been requiring most of my time lately. It is called
Salt:

https://github.com/thatch45/salt

Salt is a remote execution platform. I am using it to replace func, but it
allows for very fast communication with servers and beats the heck out of
using ssh for loops. I think it would also be very useful for people
deploying MooseFS, since you often want to gather information from and
execute commands on many of your systems at once. I also have a blog post
about it here:

http://red45.wordpress.com/2011/03/19/salt-0-6-0-released/

On Mon, Mar 21, 2011 at 9:08 AM, Pedro Naranjo <pe...@st...> wrote:
> Dear Thomas,
>
> Your contribution is very valuable. May I suggest that the MooseFS
> developers include it in the general download of the system? I have also
> become very concerned about losing data. We spent 3 days moving 3TB+ of
> data only to lose it all after simulating a power failure. Granted, we had
> not deployed the metaloggers yet, but nevertheless whatever we can use to
> make the system as stable as possible is very important.
>
> Sincerely,
>
> Pedro Naranjo / STL Technologies / Solutions Architect / 888.556.0774
>
>
> On 3/21/2011 7:51 AM, Thomas S Hatch wrote:
>
> I have been hammering away at MFS failover for quite some time and I am
> familiar with your problem.
>
> What happens is that the mfsmetaloggers continue to stream updates from
> the mfsmaster even after a failover, but the mfsmetarestore command
> executed on the metadata on the new mfsmaster ends up creating a different
> "last change point" than what the other metaloggers see.
>
> This means that the mfsmetaloggers that did not become the new master
> have a bad set of metadata after your initial failover.
>
> Since I wanted to have a completely clean and automated failover in my
> MooseFS deployment, I created a wrapper daemon that manages the
> mfsmetalogger. This daemon should be run on all metaloggers and the
> mfsmaster; it detects when a failover occurs and ensures that the
> mfsmetalogger is running on the right nodes and that the metadata being
> used is the correct metadata.
>
> If you do want to use my mfsmetalogger manager, it is available here:
>
> https://github.com/thatch45/mfs-failover/blob/master/daemon/metaman.py
>
> It is written in Python 3 (my deployments default to Python 3), but let
> me know if you are interested in running it on Python 2 and I will make a
> Python 2 version.
>
> I also have some ucarp scripts in that github project that can be used
> for managing failover automatically in conjunction with metaman, but I
> have not had the time and resources to finish packaging them up.
>
> Let me know if you have any questions!
>
> -Thomas S Hatch
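For anyone who just wants to see the general shape of such a wrapper before
pulling the repository, below is a minimal illustrative sketch. It is not the
metaman.py linked above: the virtual IP address, the /var/lib/mfs paths, the
use of "ip -o addr" to check for the ucarp-managed address, and the plain
start/stop service commands are all assumptions made for the example, and a
real deployment would more likely hook ucarp's up/down scripts instead of
polling.

#!/usr/bin/env python3
# Minimal illustrative sketch -- not the metaman.py linked above.
# Assumptions made for the example: a ucarp-managed virtual IP marks the
# active master, metadata lives in /var/lib/mfs, and the daemons are
# controlled with "mfsmaster start" / "mfsmetalogger start|stop".
import subprocess
import time

VIP = "192.168.0.100"      # assumed ucarp-managed address
MFS_DATA = "/var/lib/mfs"  # assumed metadata/changelog directory


def holds_vip():
    """Return True if this node currently owns the virtual IP."""
    out = subprocess.run(["ip", "-o", "addr"], capture_output=True, text=True)
    return VIP in out.stdout


def promote_to_master():
    """Rebuild metadata from the local backup + changelogs, then start mfsmaster."""
    subprocess.run(["mfsmetalogger", "stop"], check=False)
    # On a metalogger being promoted the files carry the _ml names, so use
    # the explicit mfsmetarestore form rather than "mfsmetarestore -a".
    cmd = ("mfsmetarestore"
           " -m {d}/metadata_ml.mfs.back"
           " -o {d}/metadata.mfs"
           " {d}/changelog_ml.*.mfs").format(d=MFS_DATA)
    subprocess.run(cmd, shell=True, check=True)
    subprocess.run(["mfsmaster", "start"], check=True)


def demote_to_metalogger():
    """Resync against the new master: park stale changelogs, restart the logger."""
    subprocess.run(["mfsmetalogger", "stop"], check=False)
    # The old changelog_ml.*.mfs files no longer line up with the new
    # master's "last change point", so move them aside before reconnecting.
    subprocess.run(
        "mkdir -p {d}/stale && mv {d}/changelog_ml*.mfs {d}/stale/"
        " 2>/dev/null".format(d=MFS_DATA),
        shell=True,
    )
    subprocess.run(["mfsmetalogger", "start"], check=True)


def main():
    was_master = holds_vip()
    while True:
        is_master = holds_vip()
        if is_master and not was_master:
            promote_to_master()
        elif was_master and not is_master:
            demote_to_metalogger()
        was_master = is_master
        time.sleep(5)


if __name__ == "__main__":
    main()

The important part is the demotion path: a metalogger that kept streaming
across a failover ends up with changelog_ml.*.mfs files that no longer line
up with the new master's "last change point", so they are moved aside before
the logger reconnects.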
> On Mon, Mar 21, 2011 at 5:10 AM, Boyko Yordanov <b.y...@ex...> wrote:
>
>> Hi list,
>>
>> I'm wondering how you guys are handling mfs master failover?
>>
>> In my tests mfsmetalogger seems quite unreliable - 2 days of testing
>> showed a few cases when mfsmetarestore is unable to restore the
>> metadata.mfs datafile, giving different errors like "Data mismatch",
>> "version mismatch", "hole in change files (add more files)", etc.
>>
>> I am running 3 different metadata backup loggers, and the metaloggers,
>> master and chunk servers are all running mfs-1.6.20-2 on CentOS 5.5
>> x86_64; the filesystem type is ext3.
>>
>> I'm aware that some of you are running huge clusters with terabytes of
>> data - I'm wondering how you trust your mfsmaster, and whether I am the
>> only one concerned with eventual data loss on mfsmaster failover, when
>> mfsmetarestore does not properly restore the metadata.mfs file from the
>> changelogs?
>>
>> Boyko
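For reference, the two usual forms of the mfsmetarestore invocation behind
the errors described above look roughly like this; the paths and file names
assume the default /var/lib/mfs data directory and the 1.6.x naming
(metadata_ml.mfs.back / changelog_ml.*.mfs on a metalogger), so adjust them
to your layout:

# On the master host: automatic restore from metadata.mfs.back plus the
# changelog.*.mfs files found in the data directory.
mfsmetarestore -a -d /var/lib/mfs

# On a metalogger host: explicit restore from the last downloaded backup
# plus the metalogger's own changelogs.
mfsmetarestore -m /var/lib/mfs/metadata_ml.mfs.back \
               -o /var/lib/mfs/metadata.mfs \
               /var/lib/mfs/changelog_ml.*.mfs

Errors such as "version mismatch" or "hole in change files (add more files)"
generally indicate that the set of changelogs handed to mfsmetarestore does
not line up contiguously with the backup it is applied to, which is exactly
the divergence after an uncoordinated failover that the wrapper daemon
discussed above is meant to avoid.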