From: Piotr R. K. <pio...@mo...> - 2015-08-13 12:53:18
Hello Casper,

there are some aspects I would like to draw your attention to:

We neither recommend nor support the ucarp failover solution, officially
or unofficially. It may lead to split-brain and/or roll the metadata back
to a version which is not the latest one, and in consequence to data
loss. We have seen such a scenario in one of our clients' environments.

Please be aware that the 3.0.x version you want to upgrade to is not a
stable version yet. The 3.0 branch is not recommended for production
environments yet.

We support an upgrade only from version 1.6.27-5, not 1.6.25, as
mentioned in our upgrade document ("MooseFS Upgrade Guide", which you can
find here: https://moosefs.com/documentation/moosefs-2-0.html).

I would like to talk/chat with you, e.g. via Skype, about the whole
upgrade scenario, because here I have mentioned only the most important
aspects. Would that be possible? Could you please send me your Skype ID
in a private message?

Best regards,

--
Piotr Robert Konopelko
MooseFS Technical Support Engineer | moosefs.com

> On 12 Aug 2015, at 10:28 pm, Casper Langemeijer <ca...@la...> wrote:
>
> Hi All,
>
> I'm planning for the upgrade of our MooseFS system. I've waited much too
> long, as we're still at 1.6.25.
>
> One of the main reasons we've waited so long is that we do not like
> downtime. To minimize downtime I've come up with the 9-step plan below,
> and if you have the time I would really appreciate it if you could have
> a look at it. I want to know if this makes any sense. Some of you might
> even learn from some of the clever tricks I pull. Instead of an in-place
> upgrade of the servers I plan to create a new master, copy the master
> data to it and do a ucarp failover.
>
> The other reason I like this plan is that right after the failover I can
> test the system, and if I see something I do not like I can roll back to
> the previous master. Only step 6 is the point of no return.
>
> I have two assumptions that I'm unsure about:
>
> 1. mfsmaster version 3.0.39 can read the metadata.mfs written by
>    1.6.25's mfsmetarestore -a.
> 2. 1.6.25 metaloggers, chunkservers and clients automatically reconnect
>    correctly to a 'failover' mfsmaster running the new version 3.0.39,
>    without problems.
>
> If these steps all work, the filesystem stays almost fully available for
> reading during the upgrade. The exceptions are a very small (subsecond)
> window for the remounts in steps 2 and 6, and a bigger window in which
> all the clients and chunkservers need to detect the master failure and
> reconnect to the new, upgraded master. I expect this to take a few
> seconds.
>
> The filesystem is closed for writes from step 2 to step 6. On my system
> I roughly estimate this (steps 3 and 4) to take about 5-10 minutes. RAM
> used by my 1.6.25 mfsmaster is 11 GiB; I run on commodity server
> hardware.
>
> What do you think of this plan?
>
> Step 1:
> Install a new MooseFS master server, latest version (3.0.39 at the time
> of writing), with the CGI server too! Use the same ucarp config as on
> the current primary master. (The ucarp failover will make this new
> server the new mfsmaster when the current master is shut down.)
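For reference, a minimal sketch of the kind of ucarp setup step 1 assumes
(the flags are standard ucarp options, but the interface name, source IP,
vhid and password below are hypothetical placeholders; 10.1.7.1 is the
master address the clients mount in step 2, used here as the shared
virtual IP; note that this is exactly the mechanism Piotr warns against
above):

# On both the old and the new master: advertise the same virtual IP.
# Whichever host wins the CARP election runs the up-script and takes
# over 10.1.7.1; when it goes down, the other host takes over.
ucarp --interface=eth0 --srcip=10.1.7.11 --vhid=42 --pass=secret \
      --addr=10.1.7.1 \
      --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh

# /etc/ucarp/vip-up.sh -- ucarp invokes it as: vip-up.sh <interface> <vip>
#!/bin/sh
/sbin/ip addr add "$2"/24 dev "$1"

# /etc/ucarp/vip-down.sh -- removes the virtual IP on demotion
#!/bin/sh
/sbin/ip addr del "$2"/24 dev "$1"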
> Step 2:
> On all clients, switch mfsmount into read-only mode.
>
> This is done by doing a lazy umount, immediately followed by a read-only
> mount. This means that any open files keep using the old read-write
> mount, while all new files are opened through the read-only mount.
> When all open files on the old read-write mount have been closed, the
> read-write mfsmount process terminates. We monitor it with 'ps x | grep
> rw,mfsmaste[r]' and wait until it has terminated.
>
> /usr/bin/lsof /mnt/mfs; /bin/umount -l /mnt/mfs; /usr/bin/mfsmount /mnt/mfs -o ro,mfsmaster=10.1.7.1,mfsport=9421,dev,suid
> echo -n "Waiting for rw mfsmount to stop: "; while ps auxf | grep rw,mfsmaste[r] > /dev/null; do echo -n "."; sleep 1; done; echo " done"
>
> If this takes too long we could kill the processes that still had files
> open (that is why I ran lsof /mnt/mfs before the umount).
>
> Step 3:
> On a metalogger server, metarestore the backed-up changelogs into a new
> metadata.mfs file, and copy that file to the new MooseFS master server.
> mfsmetarestore is part of the mfsmaster package, so it's best to do this
> on the secondary (or standby) master server.
>
> /usr/sbin/mfsmetalogger stop
> /usr/sbin/mfsmetarestore -a
> /usr/bin/scp /var/lib/mfs/metadata.mfs root@10.1.8.100:/var/lib/mfs/
>
> Step 4:
> On the new MooseFS master server, start mfsmaster with the metadata.mfs
> created in step 3.
>
> /usr/sbin/mfsmaster start
>
> Step 5:
> Do an IP failover by shutting down the network connections of the first
> MooseFS server (ucarp should handle the rest).
>
> We should now have a functioning MooseFS cluster again, with a 3.0.39
> master and 1.6.25 metaloggers, chunkservers and clients.
>
> Step 6:
> On all clients, switch mfsmount from read-only mode back to read-write.
> Lazy again, which makes it faster.
>
> /bin/umount -l /mnt/mfs; /bin/mount /mnt/mfs
>
> Step 7:
> Upgrade the metaloggers to 3.0.39.
>
> Step 8:
> Upgrade the chunkservers to 3.0.39.
>
> Step 9:
> Upgrade the clients to 3.0.39.
>
> Thank you for taking the time to read this.
>
> Greetings, Casper
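The read-only flip in step 2 is the core trick of this plan, so here it
is collected into a single client-side script. This is only a restatement
of the commands quoted above, assuming the same mount point (/mnt/mfs)
and master address (10.1.7.1):

#!/bin/sh
# Flip a MooseFS mount to read-only without interrupting current readers.
MNT=/mnt/mfs
MASTER=10.1.7.1

/usr/bin/lsof "$MNT"      # record which processes still hold files open
/bin/umount -l "$MNT"     # lazy umount: already-open files keep working
/usr/bin/mfsmount "$MNT" -o ro,mfsmaster=$MASTER,mfsport=9421,dev,suid

# The old read-write mfsmount exits once its last open file is closed;
# poll the process list until it is gone (the [r] stops grep from
# matching its own command line).
echo -n "Waiting for rw mfsmount to stop: "
while ps x | grep "rw,mfsmaste[r]" > /dev/null; do
    echo -n "."
    sleep 1
done
echo " done"

If the wait loop runs too long, the lsof output from the first command
identifies the processes to kill, as Casper notes. Step 6 then reverses
the flip with another lazy umount followed by a normal read-write mount.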