From: Ken <ken...@gm...> - 2012-04-04 07:21:27
more detail here: http://sourceforge.net/mailarchive/message.php?msg_id=28664530

-Ken

On Wed, Apr 4, 2012 at 1:24 PM, Wang Jian <jia...@re...> wrote:

> For disasters such as earthquakes, fires, and floods, off-site backup is a
> must-have, and any RAID-level solution is sheer futility.
>
> As Atom Powers said, MooseFS should provide an off-site backup mechanism.
>
> Months ago, my colleague Ken Shao sent in some patches to provide a
> "class"-based goal mechanism, which enables us to define different "classes"
> to differentiate physical locations and back up data in other physical
> locations (i.e. 500 km to 1000 km away).
>
> The design principles are:
>
> 1. We can afford to lose some data between the backup point and the
> disaster point. In this case, old data or old versions of data are intact;
> new data or new versions of data are lost.
> 2. Because cluster-to-cluster backup has many drawbacks (performance,
> consistency, etc.), the duplication from one location to another should
> happen within a single cluster.
> 3. Location-to-location duplication should not happen at write time, or
> performance/latency is hurt badly. So the goal recovery mechanism can be
> and should be used (CS-to-CS duplication). And to improve bandwidth
> efficiency and avoid peak load times, duplication can be controlled on a
> schedule, and a dirty/delta algorithm should be used.
> 4. Metadata should be logged to the backup site. When disaster strikes,
> the backup site can be promoted to master site.
>
> The current rack awareness implementation is not quite the thing we are
> looking for.
>
> Seriously speaking, as 10 Gb Ethernet connections get cheaper and cheaper,
> traditional rack awareness is rendered useless.
>
>
> On 2012/4/4 7:13, Allen, Benjamin S wrote:
>
>> Quenten,
>>
>> I'm using MFS with ZFS. I use ZFS for RAIDZ2 (RAID6) and hot sparing on
>> each chunkserver. I then only set a goal of 2 in MFS. I also have a
>> "scratch" directory within MFS that is set to goal 1 and not backed up to
>> tape. I try to get my users to organize their data between their data
>> directory and scratch, to minimize goal overhead for data that doesn't
>> require it.
>>
>> Overhead of my particular ZFS setup is ~15% lost to parity and hot
>> spares, although I was a bit bold with my RAIDZ2 configuration, which will
>> make rebuild times quite long as a trade-off for lower overhead. This was
>> done with the knowledge that RAIDZ2 can withstand two drive failures, and
>> MFS would have another copy of the data on another chunkserver. I have
>> not, however, tested how well MFS handles a ZFS pool degraded with data
>> loss. I'm guessing I would take the chunkserver daemon offline, get the
>> ZFS pool into a rebuilding state, and restart the CS. I'm guessing the CS
>> will see missing chunks, mark them undergoal, and re-replicate them.
>>
>> A more cautious RAID set would be closer to 30% overhead.
>>
>> Then of course with goal 2 you lose another 50%.
>>
>> A side benefit of using ZFS is on-the-fly compression and dedup on your
>> chunkservers, an L2ARC SSD read cache (although it turns out most of my
>> cache hits are from the L1ARC, i.e. memory), and, to speed up writes, you
>> can add a pair of ZIL SSDs.
>>
>> For disaster recovery you always need to be extra careful when relying on
>> a single system to do your live and DR sites. In this case you're asking
>> MFS to push data to another site. You'd then be relying on a single piece
>> of software that could equally corrupt your live site and your DR site.
>>
>> Ben
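As an aside, the capacity math Ben describes stacks multiplicatively: RAID overhead first, then the MooseFS goal. A minimal Python sketch of that arithmetic, using hypothetical disk counts rather than Ben's actual vdev layout (the thread doesn't state it):

# Back-of-the-envelope capacity estimate for RAIDZ2 chunkservers under a MooseFS goal.
# Disk counts below are illustrative assumptions, not a real cluster's layout.

def usable_fraction(data_disks, parity_disks, spare_disks, goal):
    """Fraction of raw disk space left for unique file data."""
    total = data_disks + parity_disks + spare_disks
    raid_fraction = data_disks / total   # what survives parity and hot spares
    return raid_fraction / goal          # each chunk is stored 'goal' times

# A wide ("bold") RAIDZ2 vdev with ~15% lost to parity/spares, then goal 2:
print(f"{usable_fraction(17, 2, 1, goal=2):.0%} of raw space holds unique data")  # ~42%

# A more cautious ~30%-overhead layout, again with goal 2:
print(f"{usable_fraction(6, 2, 1, goal=2):.0%} of raw space holds unique data")   # ~33%

With those hypothetical numbers, the ~15%-overhead layout leaves roughly 42% of raw disk for unique data, and the cautious layout about 33%.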
>>
>> On Apr 3, 2012, at 3:36 PM, Quenten Grasso wrote:
>>
>>> Hi All,
>>>
>>> How large are your metadata & logs at this stage? Just trying to
>>> mitigate this exact issue myself.
>>>
>>> I was planning to create hourly snapshots (as I understand it, the way
>>> they are implemented means they don't affect performance, unlike a VMware
>>> snapshot; please correct me if I'm wrong) and copy these off-site to
>>> another MFS cluster using rsync, with snapshots on the other site, with
>>> maybe a goal of 2 at most there and a goal of 3 on site.
>>>
>>> I guess the big issue here is storing our data 5 times in total vs.
>>> tapes; however, I guess it would be "quicker" to recover from a "failure"
>>> having a running cluster on site B vs. a tape backup, and, dare I say it,
>>> (possibly) more reliable than a single tape and tape library.
>>>
>>> Also, I've been tossing up the idea of using ZFS for storage. The reason
>>> I say this is that I know MFS has built-in checksumming, like ZFS, and
>>> all that good stuff; however, having to store our data 3 times + 2 times
>>> is expensive. Maybe storing it 2+1 instead would work out at scale, using
>>> the likes of ZFS for reliability and then using MFS purely for
>>> availability, instead of for reliability & availability as well...
>>>
>>> It would be great if there were a way to use some kind of rack awareness
>>> to say: at all times keep a goal of 1 or 2 of the data off-site on our
>>> 2nd MFS cluster. When I was speaking to one of the staff of the MFS
>>> support team, they mentioned this was kind of being developed for another
>>> customer, so we may see some kind of solution?
>>>
>>> Quenten
>>>
>>> -----Original Message-----
>>> From: Allen, Benjamin S [mailto:bs...@la...]
>>> Sent: Wednesday, 4 April 2012 7:17 AM
>>> To: moosefs-users@lists.sourceforge.net
>>> Subject: Re: [Moosefs-users] Backup strategies
>>>
>>> Similar plan here.
>>>
>>> I have a dedicated server for MFS backup purposes. We're using IBM's
>>> Tivoli to push to a large GPFS archive system backed by a SpectraLogic
>>> tape library. I have the standard Linux Tivoli client running on this
>>> host. One key with Tivoli is to use DiskCacheMethod, and to set the disk
>>> cache to be somewhere on local disk instead of the root of the MFS mount.
>>>
>>> I also back up mfsmaster's files every hour and retain at least a week of
>>> these backups. Of the various horror stories we've heard on this mailing
>>> list, all have been about corrupt metadata files from mfsmaster. It's a
>>> really good idea to limit your exposure to this.
>>>
>>> For good measure I also back up the metalogger's files every night.
>>>
>>> One dream for backup of MFS is to somehow utilize the metadata files
>>> dumped by mfsmaster or the metalogger to do a metadata "diff". The goal
>>> of this process would be to produce a list of all objects in the
>>> filesystem that have changed between two metadata.mfs.back files. Thus
>>> you could feed your backup client a list of files, without the client
>>> having to inspect the filesystem itself. This idea is inspired by ZFS's
>>> diff functionality, where ZFS can show the changes between a snapshot and
>>> the live filesystem.
>>>
>>> Ben
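That metadata "diff" idea could probably be prototyped on top of mfsmetadump, which dumps a metadata file as text. A rough Python sketch follows; it treats the dump as an opaque line-per-record listing and simply reports the records unique to the newer file, so mapping those records back to pathnames for a backup client is deliberately left out, and the script name and arguments are only illustrative:

#!/usr/bin/env python3
# meta_diff.py (hypothetical name): report metadata records that appear only in the
# newer of two MooseFS metadata files, using mfsmetadump's text output.
# Assumption: mfsmetadump emits one record per line; the format is not parsed here.
import subprocess
import sys

def dump_records(metadata_path):
    """Run mfsmetadump on one metadata.mfs(.back) file and return its lines as a set."""
    result = subprocess.run(["mfsmetadump", metadata_path],
                            check=True, capture_output=True, text=True)
    return set(result.stdout.splitlines())

if __name__ == "__main__":
    old_path, new_path = sys.argv[1], sys.argv[2]
    changed = dump_records(new_path) - dump_records(old_path)
    for record in sorted(changed):
        print(record)

Usage would be something like "python3 meta_diff.py metadata_yesterday.mfs.back metadata.mfs.back", with the resulting record list post-processed into whatever file list the backup client consumes.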
>>>
>>> On Apr 3, 2012, at 2:18 PM, Atom Powers wrote:
>>>
>>>> I've been thinking about this for a while, and I think Occam's razor
>>>> (the simplest idea is the best) might provide some guidance.
>>>>
>>>> MooseFS is fault-tolerant, so you can mitigate "hardware failure" events.
>>>> MooseFS provides a trash space, so you can mitigate "accidental
>>>> deletion" events.
>>>> MooseFS provides snapshots, so you can mitigate "corruption" events.
>>>>
>>>> The remaining scenario, "somebody stashes a nuclear warhead in the
>>>> locker room", requires off-site backup. If "rack awareness" were able to
>>>> guarantee chunks in multiple locations, then that would mitigate this
>>>> event. Since it can't, I'm going to be sending data off-site using a
>>>> large LTO5 tape library managed by Bacula, on a server that also runs an
>>>> mfsmount of the entire filesystem.
>>>>
>>>> On 04/03/2012 12:56 PM, Steve Thompson wrote:
>>>>
>>>>> OK, so now you have a nice and shiny and absolutely massive MooseFS
>>>>> file system. How do you back it up?
>>>>>
>>>>> I am using Bacula and divide the MFS file system into separate areas
>>>>> (e.g. directories beginning with a, those beginning with b, and so on)
>>>>> and use several different chunkservers to run the backup jobs, on the
>>>>> theory that at least some of the data is local to the backup process.
>>>>> But this still leaves the vast majority of data to travel the network
>>>>> twice (a planned dedicated storage network has not yet been
>>>>> implemented). This results in pretty bad backup performance and high
>>>>> network load. Any clever ideas?
>>>>>
>>>>> Steve
>>>>
>>>> --
>>>> Perfection is just a word I use occasionally with mustard.
>>>> --Atom Powers--
>>>> Director of IT
>>>> DigiPen Institute of Technology
>>>> +1 (425) 895-4443
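For what it's worth, Steve's split-by-directory scheme is easy to automate: partition the top-level directories of the mount into one list per chunkserver that runs a backup job. A small Python sketch, where the mount point and the number of backup hosts are assumptions, and the output lists would be wired into the Bacula jobs in whatever way the local configuration expects:

#!/usr/bin/env python3
# Partition the top-level directories of an MFS mount into N roughly equal groups,
# one per chunkserver running a backup job. Round-robin only; balancing by size
# would need a du-style scan first.
import os

MFS_MOUNT = "/mnt/mfs"   # assumed mount point
BACKUP_HOSTS = 4         # assumed number of chunkservers running backup jobs

def partition(mount, groups):
    entries = sorted(e.path for e in os.scandir(mount) if e.is_dir())
    buckets = [[] for _ in range(groups)]
    for i, path in enumerate(entries):
        buckets[i % groups].append(path)
    return buckets

if __name__ == "__main__":
    for n, bucket in enumerate(partition(MFS_MOUNT, BACKUP_HOSTS)):
        with open("backup-area-%d.txt" % n, "w") as out:
            out.write("\n".join(bucket) + "\n")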