From: Casper L. <ca...@la...> - 2015-08-12 20:47:13
Hi all,

I'm planning the upgrade of our MooseFS system. I've waited much too long, as we're still at 1.6.25; one of the main reasons we've waited so long is that we do not like downtime. To minimize downtime I've come up with the nine-step plan below, and if you have the time I would really appreciate it if you could have a look at it. I want to know if this makes any sense. Some of you might even learn from some of the clever tricks I pull.

Instead of an in-place upgrade of the servers, I plan to create a new master, copy the master data to it, and do a ucarp failover. The other reason I like this plan is that right after the failover I can test the system, and if I see something I do not like I can roll back to the previous master. Only step 6 is the point of no return.

I have two assumptions that I'm unsure about:

1. mfsmaster version 3.0.39 can read the metadata.mfs written by a 1.6.25 'mfsmetarestore -a'.
2. 1.6.25 metaloggers, chunkservers and clients automatically reconnect correctly, without problems, to a 'failed over' mfsmaster running the new version 3.0.39.

If these steps all work, the filesystem stays almost fully available for reading during the upgrade. The exceptions are a very small window (subsecond) to remount in steps 2 and 6, and a bigger window in which all the clients and chunkservers need to detect master failure and reconnect to the new, upgraded master; I expect this to take a few seconds. The filesystem is closed for writes from step 2 to step 6. On my system I roughly estimate this (steps 3 and 4) to take about 5-10 minutes; RAM used by my 1.6.25 mfsmaster is 11GiB, and I run on commodity server hardware.

What do you think of this plan?

Step 1: Install a new MooseFS master server running the latest version (3.0.39 at the time of writing), with the CGI server too. Use the same ucarp config as the current primary master (the ucarp failover will make this new server the mfsmaster when the current master is shut down).

Step 2: On all clients, switch the mfsmount into read-only mode.
This is done with a lazy umount, immediately followed by a read-only mount. Any files that are already open keep using the old read-write mount, while all new files are opened through the read-only mount. When all open files on the old read-write mount have been closed, the read-write mfsmount process terminates; we monitor it with 'ps x | grep rw,mfsmaste[r]' and wait until it has terminated.

  /usr/bin/lsof /mnt/mfs
  /bin/umount -l /mnt/mfs
  /usr/bin/mfsmount /mnt/mfs -o ro,mfsmaster=10.1.7.1,mfsport=9421,dev,suid
  echo -n "Waiting for rw mfsmount to stop: "; while ps auxf | grep rw,mfsmaste[r] > /dev/null; do echo -n "."; sleep 1; done; echo " done"

If this takes too long we can kill the processes that still have open files (that is why we ran lsof on /mnt/mfs before unmounting).

Step 3: On a metalogger server, metarestore the backups into a new metadata.mfs file and copy that file to the new MooseFS master server. mfsmetarestore is part of the mfsmaster package, so it's best to do this on the secondary (or standby) master server.

  /usr/sbin/mfsmetalogger stop
  /usr/sbin/mfsmetarestore -a
  /usr/bin/scp /var/lib/mfs/metadata.mfs root@10.1.8.100:/var/lib/mfs/

Step 4: On the new MooseFS master server, start mfsmaster with the metadata.mfs created in step 3.

  /usr/sbin/mfsmaster start

Step 5: Do an IP failover by shutting down the network connections on the first MooseFS master server (ucarp should handle the rest). We should now have a functioning MooseFS cluster again, with a 3.0.39 master and 1.6.25 metaloggers, chunkservers and clients.

Step 6: On all clients, switch the mfsmount from read-only mode back to read-write. Lazy again, which makes it faster.

  /bin/umount -l /mnt/mfs
  /bin/mount /mnt/mfs

Step 7: Upgrade the metaloggers to 3.0.39.

Step 8: Upgrade the chunkservers to 3.0.39.

Step 9: Upgrade the clients to 3.0.39.

Thank you for taking the time to read this.

Greetings,
Casper
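P.S. For step 1, the ucarp side on the new master might look something like the sketch below. The interface, VHID, password and up/down script paths are placeholders (assumptions of mine); they would have to match whatever the current primary master already uses. 10.1.7.1 is the virtual master address the clients mount from in step 2, and 10.1.8.100 is the new master's real address from step 3.

```shell
# Hypothetical ucarp invocation for the new master server (step 1).
# --vhid, --pass, the interface name and the scripts are placeholders;
# they must be identical to the current primary's ucarp configuration.
/usr/sbin/ucarp --interface=eth0 --srcip=10.1.8.100 \
    --vhid=42 --pass=secret --addr=10.1.7.1 \
    --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh
```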
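P.P.S. The wait loop from step 2 can be wrapped in a small reusable function, shown below as a sketch. The [r]-style bracket in the pattern is the usual trick to keep grep from matching its own command line; the function name is my own invention.

```shell
#!/bin/sh
# wait_for_exit PATTERN
# Poll ps once per second until no process matches PATTERN.
# Pass the pattern with a bracketed letter (e.g. 'rw,mfsmaste[r]')
# so that grep does not match itself or this script's command line.
wait_for_exit() {
    pattern="$1"
    printf 'Waiting for %s to stop: ' "$pattern"
    while ps auxww | grep "$pattern" > /dev/null; do
        printf '.'
        sleep 1
    done
    echo ' done'
}

# In step 2 we would call it as:
#   wait_for_exit 'rw,mfsmaste[r]'
```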