From: Allen, B. S <bs...@la...> - 2012-04-03 23:13:15
Quenten, I'm using MFS with ZFS. I use ZFS for RAIDZ2 (RAID6) and hot sparing on each chunkserver, and then only set a goal of 2 in MFS. I also have a "scratch" directory within MFS that is set to goal 1 and not backed up to tape. I try to get my users to split their data between their data directory and scratch to minimize goal overhead for data that doesn't require it.

The overhead of my particular ZFS setup is ~15% lost to parity and hot spares. I was a bit bold with my RAIDZ2 configuration, though, trading quite long rebuild times for that lower overhead. This was done with the knowledge that RAIDZ2 can withstand two drive failures and that MFS would have another copy of the data on another chunkserver. I have not, however, tested how well MFS handles a ZFS pool degraded with data loss. My guess is I would take the chunkserver daemon offline, get the ZFS pool into a rebuilding state, and restart the CS; the CS would then see missing chunks, mark them undergoal, and re-replicate them. A more cautious RAID set would be closer to 30% overhead, and of course with goal 2 you lose another 50% on top of that.

A side benefit of using ZFS is on-the-fly compression and de-dup on your chunkservers, plus an L2ARC SSD read cache (although it turns out most of my cache hits come from the L1ARC, i.e. memory), and to speed up writes you can add a pair of ZIL SSDs.

For disaster recovery you always need to be extra careful when relying on a single system for both your live and DR sites. In this case you're asking MFS to push data to another site, so you'd be relying on a single piece of software that could corrupt your live site and your DR site equally.

Ben

On Apr 3, 2012, at 3:36 PM, Quenten Grasso wrote:

> Hi All,
>
> How large are your metadata & logs at this stage? Just trying to mitigate this exact issue myself.
>
> I was planning to create hourly snapshots (as I understand it, the way they are implemented they don't affect performance, unlike a VMware snapshot; please correct me if I'm wrong) and copy these offsite to another MFS cluster using rsync, with snapshots on the other site at maybe a goal of 2 at most, and a goal of 3 on site.
>
> I guess the big issue here is storing our data 5 times in total vs. tapes; however, I guess it would be "quicker" to recover from a "failure" with a running cluster on site B vs. a tape backup, and, dare I say it, (possibly) more reliable than a single tape and tape library.
>
> Also, I've been tossing up the idea of using ZFS for storage. The reason I say this is that I know MFS has built-in check-summing (like ZFS) and all that good stuff; however, having to store our data 3 times + 2 times is expensive. Maybe storing it 2+1 instead would work out at scale, using the likes of ZFS for reliability and MFS purely for availability, instead of for reliability & availability as well...
>
> It would be great if there were a way to use some kind of rack awareness to say "at all times keep a goal of 1 or 2 of the data offsite on our 2nd MFS cluster". When I was speaking to one of the staff of the MFS support team they mentioned something like this was being developed for another customer, so we may see some kind of solution?
>
> Quenten
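As a rough sketch of the hourly snapshot-and-ship approach Quenten describes, assuming a MooseFS mount at /mnt/mfs and a hypothetical DR host (both placeholders), something along these lines could run from cron on any client with the mount:

#!/usr/bin/env python
"""Hourly MooseFS snapshot plus off-site rsync (sketch; all paths and hosts are placeholders)."""
import os
import subprocess
import time

MFS_DATA = "/mnt/mfs/data"                    # directory to protect (placeholder)
SNAP_DIR = "/mnt/mfs/snapshots"               # must live on the same MooseFS instance
REMOTE = "backup-b.example.com:/srv/mfs-dr/"  # hypothetical off-site target

def snapshot_and_ship():
    if not os.path.isdir(SNAP_DIR):
        os.makedirs(SNAP_DIR)
    snap = os.path.join(SNAP_DIR, "data-" + time.strftime("%Y%m%d-%H00"))

    # Lazy copy: files in the snapshot share chunks with the originals
    # until either side is modified, so this step is cheap and quick.
    subprocess.check_call(["mfsmakesnapshot", MFS_DATA, snap])

    # Ship the frozen tree off-site; --link-dest against the previous
    # snapshot would keep the remote copy space-efficient (omitted here).
    subprocess.check_call(["rsync", "-a", "--delete", snap + "/", REMOTE])

if __name__ == "__main__":
    snapshot_and_ship()

mfsmakesnapshot only makes a lazy copy within the same MooseFS instance, so the rsync step is what actually gets the data off-site, and old snapshots still need their own retention policy.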
> -----Original Message-----
> From: Allen, Benjamin S [mailto:bs...@la...]
> Sent: Wednesday, 4 April 2012 7:17 AM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Backup strategies
>
> Similar plan here.
>
> I have a dedicated server for MFS backup purposes. We're using IBM's Tivoli to push to a large GPFS archive system backed with a SpectraLogic tape library, and I have the standard Linux Tivoli client running on this host. One key with Tivoli is to use DiskCacheMethod and to set the disk cache somewhere on local disk instead of at the root of the MFS mount.
>
> I also back up mfsmaster's files every hour and retain at least a week of these backups. The horror stories we've heard on this mailing list have all involved corrupt metadata files from mfsmaster, so it's a really good idea to limit your exposure to this.
>
> For good measure I also back up metalogger's files every night.
>
> One dream for backing up MFS is to somehow use the metadata files dumped by mfsmaster or metalogger to do a metadata "diff". The goal would be to produce a list of all objects in the filesystem that have changed between two metadata.mfs.back files; you could then feed your backup client a list of changed files without the client having to inspect the filesystem itself. The idea is inspired by ZFS's diff functionality, where ZFS can show the changes between a snapshot and the live filesystem.
>
> Ben
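For the hourly mfsmaster backup with a week of retention described in the quoted message above, a minimal sketch, assuming the master keeps its files in /var/lib/mfs (adjust for your DATA_PATH) and that /srv/mfs-meta-backups is a hypothetical local archive directory:

#!/usr/bin/env python
"""Hourly copy of mfsmaster metadata with roughly one week of retention (sketch)."""
import glob
import os
import shutil
import time

DATA_PATH = "/var/lib/mfs"             # mfsmaster's data directory (adjust to your DATA_PATH)
BACKUP_ROOT = "/srv/mfs-meta-backups"  # hypothetical local archive location
RETENTION = 7 * 24 * 3600              # keep about one week of hourly copies

def backup_metadata():
    dest = os.path.join(BACKUP_ROOT, time.strftime("%Y%m%d-%H%M"))
    os.makedirs(dest)
    # Grab the last dumped metadata image plus the changelogs written since.
    for pattern in ("metadata.mfs.back", "changelog.*.mfs", "sessions.mfs"):
        for path in glob.glob(os.path.join(DATA_PATH, pattern)):
            shutil.copy2(path, dest)

def prune_old_backups():
    cutoff = time.time() - RETENTION
    for entry in os.listdir(BACKUP_ROOT):
        full = os.path.join(BACKUP_ROOT, entry)
        if os.path.isdir(full) and os.path.getmtime(full) < cutoff:
            shutil.rmtree(full)

if __name__ == "__main__":
    backup_metadata()
    prune_old_backups()

Restoring from one of these copies would still mean replaying the changelogs against metadata.mfs.back (mfsmetarestore), but the exposure to a corrupt metadata file is limited to about an hour.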
> On Apr 3, 2012, at 2:18 PM, Atom Powers wrote:
>
>> I've been thinking about this for a while and I think Occam's razor (the simplest idea is the best) might provide some guidance.
>>
>> MooseFS is fault-tolerant, so you can mitigate "hardware failure".
>> MooseFS provides a trash space, so you can mitigate "accidental deletion" events.
>> MooseFS provides snapshots, so you can mitigate "corruption" events.
>>
>> The remaining scenario, "somebody stashes a nuclear warhead in the locker room", requires off-site backup. If "rack awareness" were able to guarantee chunks in multiple locations, that would mitigate this event. Since it can't, I'm going to be sending data off-site using a large LTO5 tape library managed by Bacula, on a server that also runs mfsmount of the entire system.
>>
>> On 04/03/2012 12:56 PM, Steve Thompson wrote:
>>> OK, so now you have a nice and shiny and absolutely massive MooseFS file system. How do you back it up?
>>>
>>> I am using Bacula and divide the MFS file system into separate areas (e.g. directories beginning with a, those beginning with b, and so on) and use several different chunkservers to run the backup jobs, on the theory that at least some of the data is local to the backup process. But this still leaves the vast majority of data travelling the network twice (a planned dedicated storage network has not yet been implemented). This results in pretty bad backup performance and high network load. Any clever ideas?
>>>
>>> Steve
>>
>> --
>> Perfection is just a word I use occasionally with mustard.
>> --Atom Powers--
>> Director of IT
>> DigiPen Institute of Technology
>> +1 (425) 895-4443
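On Steve's split-the-namespace approach, a minimal sketch of generating the per-host directory lists, assuming the filesystem is mounted at /mnt/mfs and three hypothetical chunkservers run the Bacula jobs; it spreads top-level directories across hosts by leading character, which matches the a/b/c split but does nothing to balance by size:

#!/usr/bin/env python
"""Spread top-level MFS directories across backup hosts (sketch; names and paths are placeholders)."""
import os

MFS_ROOT = "/mnt/mfs"
BACKUP_HOSTS = ["cs1", "cs2", "cs3"]  # hypothetical chunkservers running the Bacula jobs

def assign_buckets(root=MFS_ROOT, hosts=BACKUP_HOSTS):
    """Map each top-level directory to a host by its leading character."""
    buckets = dict((host, []) for host in hosts)
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            host = hosts[ord(name[0].lower()) % len(hosts)]
            buckets[host].append(path)
    return buckets

if __name__ == "__main__":
    buckets = assign_buckets()
    for host in BACKUP_HOSTS:
        print(host)
        for path in buckets[host]:
            # These lines can be pasted into that host's Bacula FileSet Include block.
            print("  File = %s" % path)

This only decides which host reads which subtree; the data still crosses the network from the other chunkservers, so a dedicated storage network (or running a backup agent on each chunkserver so reads prefer local chunk copies) is what actually cuts the double hop.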