From: Quenten G. <QG...@on...> - 2012-04-03 21:36:55
Hi All,

How large are your metadata & logs at this stage? I'm just trying to mitigate this exact issue myself.

I was planning to create hourly snapshots (as I understand their implementation, they don't affect performance, unlike a VMware snapshot; please correct me if I'm wrong) and copy these offsite to another MFS cluster using rsync, keeping snapshots on the other site with maybe a goal of 2 at most, and using a goal of 3 on site. I guess the big issue here is storing our data 5 times in total vs. tapes. However, it would be "quicker" to recover from a "failure" with a running cluster on site B than from a tape backup, and, dare I say it, (possibly) more reliable than a single tape and tape library.

I've also been tossing up the idea of using ZFS for storage. The reason I say this is that I know MFS has built-in checksumming, like ZFS, and all that good stuff; however, having to store our data 3 times + 2 times is expensive. Maybe storing it 2 + 1 instead would work out at scale, using the likes of ZFS for reliability and MFS purely for availability, instead of for both reliability & availability...

It would be great if there were a way to use some kind of rack awareness to say: at all times keep a goal of 1 or 2 copies of the data offsite on our 2nd MFS cluster. When I was speaking to one of the staff of the MFS support team, they mentioned something like this was being developed for another customer, so we may see some kind of solution?

Quenten

-----Original Message-----
From: Allen, Benjamin S [mailto:bs...@la...]
Sent: Wednesday, 4 April 2012 7:17 AM
To: moo...@li...
Subject: Re: [Moosefs-users] Backup strategies

Similar plan here. I have a dedicated server for MFS backup purposes. We're using IBM's Tivoli to push to a large GPFS archive system backed with a SpectraLogic tape library. I have the standard Linux Tivoli client running on this host. One key with Tivoli is to use DiskCacheMethod, and to put the disk cache somewhere on local disk instead of at the root of the MFS mount.

I also back up mfsmaster's files every hour and retain at least a week of these backups. The horror stories we've heard on this mailing list have all involved corrupt metadata files from mfsmaster, so it's a really good idea to limit your exposure to this. For good measure I also back up metalogger's files every night.

One dream for backup of MFS is to somehow utilize the metadata files dumped by mfsmaster or metalogger to do a metadata "diff". The goal of this process would be to produce a list of all objects in the filesystem that have changed between two metadata.mfs.back files. You could then feed your backup client a list of files without the client needing to inspect the filesystem itself. The idea is inspired by ZFS's diff functionality, where ZFS can show the changes between a snapshot and the live filesystem.

Ben
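For what it's worth, a half-baked sketch of that metadata "diff" idea, assuming your build ships the mfsmetadump tool to turn a metadata.mfs.back file into text (the dump format varies between versions, so this deliberately treats each line as an opaque record rather than parsing fields):

    #!/usr/bin/env python
    # Sketch: report records that differ between two mfsmetadump outputs,
    # produced beforehand with something like:
    #   mfsmetadump metadata.mfs.back.old > old.txt
    #   mfsmetadump metadata.mfs.back.new > new.txt
    import sys

    def load(path):
        with open(path) as f:
            return set(line.rstrip("\n") for line in f)

    old, new = load(sys.argv[1]), load(sys.argv[2])
    for rec in sorted(new - old):
        print("+ " + rec)   # records only in the newer dump
    for rec in sorted(old - new):
        print("- " + rec)   # records only in the older dump

Turning the changed records back into full paths would still take real work (directory entries reference parent inodes), but even a crude set difference like this would shrink the list of candidates a backup client has to walk.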
On Apr 3, 2012, at 2:18 PM, Atom Powers wrote:

> I've been thinking about this for a while and I think Occam's razor (the
> simplest idea is the best) might provide some guidance.
>
> MooseFS is fault-tolerant, so you can mitigate "hardware failure".
> MooseFS provides a trash space, so you can mitigate "accidental
> deletion" events.
> MooseFS provides snapshots, so you can mitigate "corruption" events.
>
> The remaining scenario, "somebody stashes a nuclear warhead in the
> locker room", requires off-site backup. If "rack awareness" were able to
> guarantee chunks in multiple locations, that would mitigate this event.
> Since it can't, I'm going to be sending data off-site using a large LTO5
> tape library managed by Bacula, on a server that also runs an mfsmount
> of the entire system.
>
> On 04/03/2012 12:56 PM, Steve Thompson wrote:
>> OK, so now you have a nice and shiny and absolutely massive MooseFS file
>> system. How do you back it up?
>>
>> I am using Bacula and divide the MFS file system into separate areas (e.g.
>> directories beginning with a, those beginning with b, and so on), using
>> several different chunkservers to run the backup jobs, on the theory that
>> at least some of the data is local to the backup process. But this still
>> leaves the vast majority of data to travel the network twice (a planned
>> dedicated storage network has not yet been implemented). This results in
>> pretty bad backup performance and high network load. Any clever ideas?
>>
>> Steve
>
> --
> Perfection is just a word I use occasionally with mustard.
> --Atom Powers--
> Director of IT
> DigiPen Institute of Technology
> +1 (425) 895-4443
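Coming back to Steve's question, one rough way to mechanize the split-by-prefix scheme is to generate a file list per backup host from the top level of the mount and hand each list to its own Bacula job. A minimal sketch; the mount point and host names here are placeholders:

    #!/usr/bin/env python
    # Sketch: round-robin the top-level entries of an MFS mount into one
    # file list per backup host, so each chunkserver backs up a slice.
    import os

    MOUNT = "/mnt/mfs"                         # hypothetical mfsmount point
    HOSTS = ["chunk01", "chunk02", "chunk03"]  # hypothetical backup hosts

    entries = sorted(os.listdir(MOUNT))
    for i, host in enumerate(HOSTS):
        with open("fileset-%s.txt" % host, "w") as out:
            for name in entries[i::len(HOSTS)]:
                out.write(os.path.join(MOUNT, name) + "\n")

A size-aware split (summing du per directory) would balance the jobs better than round-robin, and, if I remember the syntax right, each list can be pulled into a Bacula FileSet with its file-list include form (File = "</path/to/fileset-chunk01.txt").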