From: Thomas S H. <tha...@gm...> - 2011-03-21 15:27:53
Hi Pedro!

The problem I am running into is time and resources. I am in the middle of a
number of other projects and my test environment is currently in a state of
"flux". I agree that this would be a great thing for MooseFS to come packaged
with, but it should be a complete package, with the ucarp failover scripts
wrapped up into a simple cluster management daemon. As I have mentioned
before, I hope to have more time for this in a few weeks, but it keeps
getting pushed back and I might not be able to get to it for over a month. I
know that many people are interested in my MooseFS failover work, and it is a
high priority. Any contributions would be appreciated; all the code is in
place, I mostly just need to package it up.

P.S. Since this is a list of system admins, some of you might be interested
in the project that has been requiring most of my time lately. It is called
Salt:

https://github.com/thatch45/salt

Salt is a remote execution platform. I am using it to replace func, but it
allows for very fast communication with servers and beats the heck out of
using ssh for loops. I think it would also be very useful for people
deploying MooseFS, since you often want to gather information from and
execute commands on many of your systems at once. I also have a blog post
about it here:

http://red45.wordpress.com/2011/03/19/salt-0-6-0-released/

On Mon, Mar 21, 2011 at 9:08 AM, Pedro Naranjo <pe...@st...> wrote:
> Dear Thomas,
>
> Your contribution is very valuable. May I suggest that the MooseFS
> developers include it in the general download of the system? I have also
> become very concerned about losing data. We spent 3 days moving 3TB+ of
> data only to lose it all after simulating a power failure. Granted, we had
> not deployed the metaloggers yet, but nevertheless whatever we can use to
> make the system as stable as possible is very important.
>
> Sincerely,
>
> Pedro Naranjo / STL Technologies / Solutions Architect / 888.556.0774
>
>
> On 3/21/2011 7:51 AM, Thomas S Hatch wrote:
>
> I have been hammering away at MFS failover for quite some time and I am
> familiar with your problem.
>
> What happens is that the mfsmetaloggers continue to stream updates from
> the mfsmaster even after a failover, but the mfsmetarestore command
> executed on the metadata on the new mfsmaster ends up creating a different
> "last change point" than what the other metaloggers see.
>
> This means that the mfsmetaloggers that did not become the new master
> have a bad set of metadata after your initial failover.
>
> Since I wanted to have a completely clean and automated failover in my
> MooseFS deployment, I created a wrapper daemon that manages the
> mfsmetalogger. This daemon should be run on all metaloggers and the
> mfsmaster; it detects when a failover occurs and ensures that the
> mfsmetalogger is running on the right nodes and that the metadata being
> used is the correct metadata.
>
> If you do want to use my mfsmetalogger manager, it is available here:
>
> https://github.com/thatch45/mfs-failover/blob/master/daemon/metaman.py
>
> It is written in Python 3 (my deployments default to Python 3), but let
> me know if you are interested in running it on Python 2 and I will make a
> Python 2 version.
>
> I also have some ucarp scripts in that github project that can be used
> for managing failover automatically in conjunction with metaman, but I
> have not had the time and resources to finish packaging them up.
>
> Let me know if you have any questions!
>
> -Thomas S Hatch
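For anyone who just wants to see the general shape of such a wrapper before
pulling the repository, below is a minimal illustrative sketch. It is not the
metaman.py linked above: the virtual IP address, the /var/lib/mfs paths, the
use of "ip -o addr" to check for the ucarp-managed address, and the plain
start/stop service commands are all assumptions made for the example, and a
real deployment would more likely hook ucarp's up/down scripts instead of
polling.

#!/usr/bin/env python3
# Minimal illustrative sketch -- not the metaman.py linked above.
# Assumptions made for the example: a ucarp-managed virtual IP marks the
# active master, metadata lives in /var/lib/mfs, and the daemons are
# controlled with "mfsmaster start" / "mfsmetalogger start|stop".
import subprocess
import time

VIP = "192.168.0.100"      # assumed ucarp-managed address
MFS_DATA = "/var/lib/mfs"  # assumed metadata/changelog directory


def holds_vip():
    """Return True if this node currently owns the virtual IP."""
    out = subprocess.run(["ip", "-o", "addr"], capture_output=True, text=True)
    return VIP in out.stdout


def promote_to_master():
    """Rebuild metadata from the local backup + changelogs, then start mfsmaster."""
    subprocess.run(["mfsmetalogger", "stop"], check=False)
    # On a metalogger being promoted the files carry the _ml names, so use
    # the explicit mfsmetarestore form rather than "mfsmetarestore -a".
    cmd = ("mfsmetarestore"
           " -m {d}/metadata_ml.mfs.back"
           " -o {d}/metadata.mfs"
           " {d}/changelog_ml.*.mfs").format(d=MFS_DATA)
    subprocess.run(cmd, shell=True, check=True)
    subprocess.run(["mfsmaster", "start"], check=True)


def demote_to_metalogger():
    """Resync against the new master: park stale changelogs, restart the logger."""
    subprocess.run(["mfsmetalogger", "stop"], check=False)
    # The old changelog_ml.*.mfs files no longer line up with the new
    # master's "last change point", so move them aside before reconnecting.
    subprocess.run(
        "mkdir -p {d}/stale && mv {d}/changelog_ml*.mfs {d}/stale/"
        " 2>/dev/null".format(d=MFS_DATA),
        shell=True,
    )
    subprocess.run(["mfsmetalogger", "start"], check=True)


def main():
    was_master = holds_vip()
    while True:
        is_master = holds_vip()
        if is_master and not was_master:
            promote_to_master()
        elif was_master and not is_master:
            demote_to_metalogger()
        was_master = is_master
        time.sleep(5)


if __name__ == "__main__":
    main()

The important part is the demotion path: a metalogger that kept streaming
across a failover ends up with changelog_ml.*.mfs files that no longer line
up with the new master's "last change point", so they are moved aside before
the logger reconnects.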
> On Mon, Mar 21, 2011 at 5:10 AM, Boyko Yordanov <b.y...@ex...> wrote:
>
>> Hi list,
>>
>> I'm wondering how you guys are handling mfs master failover?
>>
>> In my tests mfsmetalogger seems quite unreliable - 2 days of testing
>> showed a few cases when mfsmetarestore is unable to restore the
>> metadata.mfs datafile, giving different errors like "Data mismatch",
>> "version mismatch", "hole in change files (add more files)", etc.
>>
>> I am running 3 different metadata backup loggers, and the metaloggers,
>> master and chunk servers are all running mfs-1.6.20-2 on CentOS 5.5
>> x86_64; the filesystem type is ext3.
>>
>> I'm aware that some of you are running huge clusters with terabytes of
>> data - I'm wondering how you trust your mfsmaster, and whether I am the
>> only one concerned with eventual data loss on mfsmaster failover, when
>> mfsmetarestore does not properly restore the metadata.mfs file from the
>> changelogs?
>>
>> Boyko
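For reference, the two usual forms of the mfsmetarestore invocation behind
the errors described above look roughly like this; the paths and file names
assume the default /var/lib/mfs data directory and the 1.6.x naming
(metadata_ml.mfs.back / changelog_ml.*.mfs on a metalogger), so adjust them
to your layout:

# On the master host: automatic restore from metadata.mfs.back plus the
# changelog.*.mfs files found in the data directory.
mfsmetarestore -a -d /var/lib/mfs

# On a metalogger host: explicit restore from the last downloaded backup
# plus the metalogger's own changelogs.
mfsmetarestore -m /var/lib/mfs/metadata_ml.mfs.back \
               -o /var/lib/mfs/metadata.mfs \
               /var/lib/mfs/changelog_ml.*.mfs

Errors such as "version mismatch" or "hole in change files (add more files)"
generally indicate that the set of changelogs handed to mfsmetarestore does
not line up contiguously with the backup it is applied to, which is exactly
the divergence after an uncoordinated failover that the wrapper daemon
discussed above is meant to avoid.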