From: Mark C. <mca...@em...> - 2013-02-28 21:58:21
|
So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD RAID1 array to an external fireproof drive (with plans to also sync to a remote server at our colo). I found the script BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples I've seen online suggest it isn't quite what I'm looking for, since it appears to output the pool in a different layout. I initially tried the rsync method with -H, but my server would end up choking at around 350GB. Any suggestions on how to do this? Thanks, Mark |
From: Les M. <les...@gm...> - 2013-02-28 23:36:56
|
On Thu, Feb 28, 2013 at 3:10 PM, Mark Campbell <mca...@em...> wrote: > So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD > RAID1 array to an external Fireproof drive (with plans to also sync to a > remote server at our collo). I found the script BackupPC_CopyPcPool.pl by > Jeffrey, but the syntax and the few examples I've seen online have indicated > to me that this isn't quite what I'm looking for, since it appears to output > it to a different layout. I initially tried the rsync method with -H, but > my server would end up choking at 350GB. Any suggestions on how to do this? I'm not sure anyone has come up with a really good way to do this. One approach is to use a 3-member raid1 where you periodically remove a drive and resync a new one. If you have reasonable remote bandwidth and enough of a backup window, it is much easier to just run another instance of backuppc hitting the same targets independently. -- Les Mikesell les...@gm... |
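For reference, the 3-member RAID1 rotation Les describes maps onto mdadm roughly as follows; this is only a sketch, and the device and array names are assumptions rather than anything from the thread:

  # 3-way mirror with a write-intent bitmap so rotated members resync quickly
  mdadm --create /dev/md0 --level=1 --raid-devices=3 --bitmap=internal \
        /dev/sda1 /dev/sdb1 /dev/sdc1
  mkfs.ext4 /dev/md0 && mount /dev/md0 /var/lib/backuppc
  # periodically rotate one member out to take offsite...
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
  # ...and resync a fresh (or returning) disk in its place
  mdadm /dev/md0 --add /dev/sdd1
  cat /proc/mdstat          # watch the resync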
From: <bac...@ko...> - 2013-03-01 02:44:34
|
Mark Campbell wrote at about 14:10:13 -0700 on Thursday, February 28, 2013:
> So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD RAID1 array to an external Fireproof drive (with plans to also sync to a remote server at our collo). I found the script BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples I've seen online have indicated to me that this isn't quite what I'm looking for, since it appears to output it to a different layout. I initially tried the rsync method with -H, but my server would end up choking at 350GB. Any suggestions on how to do this?

The bottom line is that other than doing a block-level file system copy, there is no "free lunch" that gets around the hard problem of copying over densely hard-linked files. As many like yourself have noted, rsync bogs down using the -H (hard links) flag, in part because rsync knows nothing of the special structure of the pool & pc trees, so it has to keep full track of all possible hard links. One solution is BackupPC_tarPCCopy, which uses a tar-like perl script to track and copy over the structure.

My script BackupPC_copyPcPool tries to combine the best of both worlds. It allows you to use rsync or even "cp -r" to copy over the pool, disregarding any hard links. The pc tree with its links to the pool is re-created by creating a flat file listing all the links, directories, and zero-size files that comprise the pc tree. This is done with the help of a hash that caches the inode number of each pool entry. The pc tree is then recreated by sequentially (re)creating directories, zero-size files, and links to the pool.

I have substantially re-written my original script to make it orders of magnitude faster by substituting a packed in-memory hash for the file-system inode-tree I used in the previous version. Several other improvements have been added, including the ability to record full-file md5sums and to fix broken/missing links.

I was able to copy over a BackupPC tree consisting of 1.3 million pool files (180 GB) and 24 million pc tree entries (4 million directories, 20 million links, 300 thousand zero-length files) in the following time:
~4 hours to copy over the pool
~5 hours to create the flat file mapping out the pc tree directories, hard links & zero length files
~7 hours to convert the flat file into a new pc tree on the target filesystem

These numbers are approximate since I didn't really time it. But it was all done on a low-end AMD dual-core laptop with a single USB3 drive. For this case, the flat file of links/directories/zero-length files is 660 MB compressed (about 3.5 GB uncompressed). The inode caching requires about 250MB of RAM (mostly due to perl overhead) for the 1.3 million pool files.

Note, before I release the revised script, I also hope to add a feature that allows the copying of one or more backups from the pc tree on one machine to the pc tree on another machine (with a different pool). This feature is not available in any other backup scheme... and effectively will allow "incremental-like" backups. I also plan to allow the option to more tightly pack the inode caching to save memory at the expense of some speed. I should be able to fit 10 million pool nodes in a 300MB cache.

I would like to benchmark my revised routine against BackupPC_tarPCCopy in terms of speed, memory requirement, and generated file size... |
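To make the idea concrete, here is a minimal shell sketch of the same principle. This is not Jeffrey's script; the paths, file names and map format below are assumptions made up for illustration. The point is that the pool is copied without any hard-link bookkeeping, and the pc tree is reduced to a replayable list of directories, zero-length files and inode-to-pool-file links:

  TOP=/var/lib/backuppc        # assumed pool location
  DEST=/mnt/copy/backuppc      # assumed target

  # 1. copy the (c)pool itself -- no -H needed, the pool has no links worth preserving
  rsync -a "$TOP/cpool/" "$DEST/cpool/"

  # 2. map inode number -> pool path (the role of the in-memory inode hash)
  find "$TOP/cpool" -type f -printf '%i %P\n' | sort -n > pool.inodes

  # 3. flat file describing the pc tree: D=directory, Z=zero-length file, L=link-by-inode
  find "$TOP/pc" -mindepth 1 \
       \( -type d -printf 'D %P\n' \) -o \
       \( -type f -empty -printf 'Z %P\n' \) -o \
       \( -type f -printf 'L %i %P\n' \) > pctree.map

  # replaying pctree.map on the target is then just mkdir/touch/ln against pool.inodes;
  # unpooled files (per-host logs etc.) still need a plain copy.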
From: Les M. <les...@gm...> - 2013-03-01 03:15:02
|
On Thu, Feb 28, 2013 at 8:43 PM, <bac...@ko...> wrote: > > Note, before I release the revised script, I also hope to add a feature that > allows the copying of one or more backups from the pc tree on one > machine to the pc tree on another machine (with a different > pool). This feature is not available on any other backup scheme... and > effectively will allow "incremental-like" backups. That could also be extremely handy when migrating to a new server - I always have a lot of machines where I don't care about old history mixed in with the few where I do. -- Les Mikesell les...@gm... |
From: Arnold K. <ar...@ar...> - 2013-03-01 03:15:30
|
On Thu, 28 Feb 2013 14:10:13 -0700 Mark Campbell <mca...@em...> wrote: > So I'm trying to get a BackupPC pool synced on a daily basis from a > 1TB MD RAID1 array to an external Fireproof drive (with plans to also > sync to a remote server at our collo). I found the script > BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few > examples I've seen online have indicated to me that this isn't quite > what I'm looking for, since it appears to output it to a different > layout. I initially tried the rsync method with -H, but my server > would end up choking at 350GB. Any suggestions on how to do this? Create a snapshot from the underlying lvm-volume and then copy / zip that snapshot directly. Or use BackupPC's 'archive' method to write full tar.gz of your hosts to your external disk. We are using that to write tgz to a directory where amanda then writes these to tape... Have fun, Arnold |
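A sketch of the LVM snapshot approach; the volume group, logical volume and device names below are assumptions. Because the copy happens at the block level, all of the pool's hard links come along for free:

  lvcreate --snapshot --size 20G --name backuppc-snap /dev/vg0/backuppc   # frozen view of the pool LV
  # block-copy (or compress) the snapshot to the external drive
  dd if=/dev/vg0/backuppc-snap of=/dev/sdX bs=4M
  # ...or: gzip < /dev/vg0/backuppc-snap > /mnt/external/backuppc-$(date +%F).img.gz
  lvremove -f /dev/vg0/backuppc-snap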
From: gregrwm <bac...@wh...> - 2013-03-01 19:37:06
|
i'm using a simple procedure i cooked up to maintain a "third copy" at a third physical location using as little bandwidth as possible. it simply looks at each pc/*/backups, selects the most recent full and most recent incremental (plus any partial or /new), and copies them across the wire, together with the most recently copied full&incremental set (plus any incompletely copied sets), using rsync with its hardlink copying feature. thus my third location has a copy of the most recent (already compressed) pc/ tree data, using rsync to avoid copying stuff over the wire that's already there (and not bothering with the cpool). for me this is a happily sized set of hardlinks that rsync can actually manage (ymmv). i have successfully used this together with a script to recreate the cpool if/when necessary. if it's of interest i could share it. |
From: John R. <rou...@re...> - 2013-03-01 20:25:35
|
On Fri, Mar 01, 2013 at 04:24:10AM -0600, gregrwm wrote: > i'm using a simple procedure i cooked up to maintain a "third copy" at a > third physical location using as little bandwidth as possible. it simply > looks at each pc/*/backups, selects the most recent full and most recent > incremental (plus any partial or /new), and copies them across the wire, > together with the most recently copied full&incremental set > [...] > that's already there (and not bothering with the cpool), which, for me, is > a happily sized set of hardlinks that rsync can actually manage (ymmv). > [...] if it's of interest i could share it. I think this fills a useful use case, so yeah I would say send it to the mailing list. -- -- rouilj John Rouillard System Administrator Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111 |
From: gregrwm <bac...@wh...> - 2013-03-03 01:18:54
|
> > i'm using a simple procedure i cooked up to maintain a "third copy" at a
> > third physical location using as little bandwidth as possible. [...]
> > [...] if it's of interest i could share it.
>
> I think this fills a useful use case, so yeah I would say send it to
> the mailing list.

#currently running as "pull", tho can run as "push" with minor mods:
#(root bash)
rbh=remote.backuppc.host
b=/var/lib/BackupPC/pc
cd /local/third/copy
sb='sudo -ubackuppc'
ssp="$sb ssh -p2222"      #nonstandard ssh port
ssb="$ssp $rbh cd $b"
from_to=("$rbh:$b/*" .)
fob=$ssb
df=($(df -m .))
prun=                     #prun=--delete-excluded to prune local/third/copy down to most recent backups only
echo df=${df[10]} prun=$prun   #show current filespace and prun setting
[ ! -s iMRFIN ]&&{ touch iMRFIN ||exit $?;}   #most recent finished set
[ ! -s iMRUNF ]&&{ touch iLRUNF ||exit $?;}||{ cat iMRUNF>>iLRUNF||exit $?;}   #most recent and less recent unfinished sets
$fob 'echo " --include=*/new"   #any unfinished backups
for m in */backups;do unset f i   #look at all pc/*/backups files
  while read -r r;do r=($r)
    [[ ${r[1]} = full ]]&&fu[f++]=$r
    [[ ${r[1]} = incr ]]&&in[i++]=$r
    [[ ${r[1]} = partial ]]&&echo " --include=${m%backups}$r"   #any incomplete backups
  done < $m
  [[ $f -gt 0 ]]&&echo " --include=${m%backups}${fu[f-1]}"   #most recent full
  [[ $i -gt 0 ]]&&echo " --include=${m%backups}${in[i-1]}"   #most recent incremental
done'>| iMRUNF ||echo badexit;head -99 i*   #show backup sets included for transfer
rc=255;while [[ $rc = 255 ]];do date   #reconnect if 255 (connection dropped)
  #note some special custom excludes are on a separate line
  rsync -qPHSae"$ssp" --rsync-path="sudo rsync" $(cat iMRFIN iLRUNF iMRUNF) $prun --exclude="/*/*/" \
    --exclude=fNAVupdate --exclude=fDownloads --exclude=\*Personal --exclude="*COPY of C*" \
    "${from_to[@]}"
  rc=$?;echo rc=$rc;if [ $rc = 0 ];then mv iMRUNF iMRFIN;rm iLRUNF;fi
done;df -m . |
From: Lars T. Skjong-B. <li...@sn...> - 2013-03-01 21:24:39
|
Hi, On 3/1/13 12:34 AM, Les Mikesell wrote:
> On Thu, Feb 28, 2013 at 3:10 PM, Mark Campbell <mca...@em...> wrote:
>> So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD
>> RAID1 array to an external Fireproof drive (with plans to also sync to a
>> remote server at our collo).
>
> I'm not sure anyone has come up with a really good way to do this.
> One approach is to use a 3-member raid1 where you periodically remove
> a drive and resync a new one. If you have reasonable remote
> bandwidth and enough of a backup window, it is much easier to just run
> another instance of backuppc hitting the same targets independently.

I have come up with what is, IMHO, a good way to do this using ZFS (ZFSonLinux).

Description:
* uses 3 disks.
* at all times, keep 1 mirrored disk in a fire safe.
* periodically swap the safe disk with the mirror in the server.

1. create a zpool with three mirrored members.
2. create a filesystem on it and mount at /var/lib/backuppc.
3. do some backups.
4. detach one disk and put in safe.
5. do more backups.
6. detach one disk and swap with the other disk in the safe.
7. attach and online the disk from the safe.
8. watch it sync up.

I am currently using 2TB disks and a swap period of 1 month. Because of ZFS it doesn't need to sync all the blocks, only the blocks changed in the last month. For example, with 10GB changed it will sync in less than 25 minutes (approx. 7 MB/s). That's a lot faster than anything I got with mdraid, which syncs every block.

ZFS also comes with the benefits of checksumming and error correction of file content and file metadata. BackupPC also supports error correction through par2, and this gives an extra layer of data protection.

Backing up large numbers of files can take a very long time because of hard disk seeking. This can be alleviated by using an SSD cache drive for ZFS. Read (ZFS L2ARC) and write (ZFS ZIL) caching on a small SSD (30 GB) cuts incremental time down to half for some shares.

As for remote sync, you can use "zfs send" on the backup server and "zfs receive" on the offsite server. This will only send the differences since the last sync (like rsync), and will probably be significantly faster than rsync, which in addition has to resolve all the hard links. -- Best regards, Lars Tobias |
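As a rough sketch, the steps above correspond to commands along these lines; the pool and device names are assumptions, and depending on the ZFS version it is the offline/online pair (rather than detach/attach) that keeps the resilver limited to changed blocks:

  zpool create tank mirror /dev/sda /dev/sdb /dev/sdc   # 3-way mirror
  zfs create -o mountpoint=/var/lib/backuppc tank/backuppc
  # ...run backups for a while...
  zpool offline tank /dev/sdc   # pull this disk and put it in the fire safe
  # ...a month later, swap it with the other safe disk and reinsert...
  zpool online tank /dev/sdc    # resilvers only blocks written while it was out
  zpool status tank             # watch it sync up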
From: Mark C. <mca...@em...> - 2013-03-01 21:40:09
|
Lars, Thanks for the interesting idea! I confess I haven't played with ZFS much (though I've been wanting to for some time), maybe this is the excuse I need ;). Question, taking your model here, and applying it to my situation, how well would this work:

BackupPC server, with a RAID1 zpool, with the third member being my external fireproof drive. Rather than the rotation you described, just leave it as is as it does its daily routine. Then, should the day come where I need to grab the drive and go, plugging the drive into a system with ZFSonLinux & BackupPC installed, could I mount this drive by itself?

I really like your idea of zfs send/receive for the remote copy. Do you have any tips/pointers/docs on the best way to run it in this scenario?

Thanks,

--Mark

-----Original Message-----
From: Lars Tobias Skjong-Børsting [mailto:li...@sn...]
Sent: Friday, March 01, 2013 4:18 AM
To: bac...@li...
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
From: Mark C. <mca...@em...> - 2013-03-01 21:43:20
|
I find myself rather surprised that this is a major issue in what is otherwise a really good enterprise-level backup tool. Synchronizing backups just seems to be a basic element of the idea of backups in a corporate environment. Should the building that my backup server resides in burn down, get hit by a tornado, etc., there should be a process whereby you can have a synchronized backup elsewhere. Also by extension, what happens when you want to have a "cluster" of BackupPC? The idea that you just run two BackupPC servers each running their own backups may work in some cases, but you are talking about double the transfers on the machines being backed up, and that can be unacceptable in some cases. For example, one of my machines being backed up is a linux server acting as a network drive. Backups of this can take a long time; BackupPC tells me 514 minutes for its last full backup (naturally, this occurs after business hours). Once it's been backed up, it's been deduped & compressed. It would ideally be better, even on a LAN, to transfer this compressed & deduped pool than it would be to back it up twice on the same day. In the case of my network drive, worst case it gets bogged down 8 hrs a day for backup. I have a small space of time that is considered "off hours" for it. My backup server, on the other hand, can be bogged down 24 hrs/day for all I care; no one else is using its services but me.

Jeffrey, what is your latest version of your script? I have 0.1.3, circa Sept '11. Given how your script generally works, could it be made to simply recreate the pool structure on an external drive on the same system, rather than compressing it to a tarball? My end goal here is to be able to simply grab the external drive at a moment's notice, plug it into a new linux machine, and, using a tarball of the BackupPC config files, stand it up long enough to restore everyone's PCs & appropriate servers.

Greg, I would definitely have an interest in seeing the script; anything that will help me achieve a tertiary remote backup...

Thanks,

--Mark

-----Original Message-----
From: bac...@ko... [mailto:bac...@ko...]
Sent: Thursday, February 28, 2013 9:43 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
From: John R. <rou...@re...> - 2013-03-01 22:14:43
|
On Fri, Mar 01, 2013 at 02:26:35PM -0700, Mark Campbell wrote: > I find myself rather surprised that this is a major issue in what is > otherwise a really good enterprise-level backup tool. Syncronizing > backups just seems to be a basic element to the idea of backups in a > corporate environment. Should the building that my backup server > resides in burns down, gets hit by a tornado, etc, there should be a > process whereby you can have a syncronized backup elsewhere. Also by > extension, what happens when you want to have a "cluster" of BackupPC? Handling this at the device/block level with zfs send/receive, DRBD etc. is another way to handle the sync. I had some luck running DRBD across a simulated laggy WAN (using WANEM to simulate the wan) with a subset of 50 or so hosts being backed up. The compress cycle after each backup did bog things down a bit though. -- -- rouilj John Rouillard System Administrator Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111 |
From: Adam G. <mai...@we...> - 2013-03-01 23:49:53
|
On 01/03/13 08:10, Mark Campbell wrote: > > So I'm trying to get a BackupPC pool synced on a daily basis from a > 1TB MD RAID1 array to an external Fireproof drive (with plans to also > sync to a remote server at our collo). I found the script > BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples > I've seen online have indicated to me that this isn't quite what I'm > looking for, since it appears to output it to a different layout. I > initially tried the rsync method with -H, but my server would end up > choking at 350GB. Any suggestions on how to do this? > > The best option I've found, if using an external drive of equal size to the pool, is to use Linux md RAID1 with the --write-mostly flag on the external drive. Make sure you enable bitmaps on the RAID1 array, and after you rotate drives you may not need to resync the entire content. For offsite, you can use something like Linux md RAID1 over the top of NBD, ENBD (or whatever it is called) or DRBD, etc... However, this really depends on the speed and reliability of your remote connection, and will most likely degrade performance significantly. There have been plenty of discussions on this topic on the list over the years; try to find them, as there are lots of options which work for different people, and plenty of pros/cons for each method which have already been discussed. Regards, Adam -- Adam Goryachev Website Managers www.websitemanagers.com.au |
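A sketch of the --write-mostly/bitmap setup Adam mentions; device names are assumptions. --write-mostly keeps reads off the slow external member, and the internal bitmap is what lets a rotated drive be re-added with only a partial resync:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
        /dev/sda1 --write-mostly /dev/sdc1      # sdc1 = external drive
  # rotate the external drive out...
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
  # ...and back in later; with the bitmap this only resyncs changed regions
  mdadm /dev/md0 --re-add /dev/sdc1
  cat /proc/mdstat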
From: <bac...@ko...> - 2013-03-02 15:38:52
|
Mark Campbell wrote at about 14:26:35 -0700 on Friday, March 1, 2013: > Jeffrey, what is your latest version of your script? I have 0.1.3, circa Sept '11. Given how your script generally works, could it be made to simply recreate the pool structure on an external drive on the same system, rather than compressing it to a tarball? My end goal here is to be able to simply grab the external drive at a moment's notice, plug it into a new linux machine, and using a tarball of the BackupPC config files, and stand it up long enough to restore everyone's PCs & appropriate servers. Well, there is no need for a tarball. You rsync or cp the pool (without paying any attention to hard links)... so this can even be done incrementally or over an ssh (or netcat) pipe. My program then (quickly) crawls the pool to create an in-memory inode hash of the pool (which can be saved to a file too for reuse). Then the program crawls all (or some) of the pc tree to create a flat file specifying the directories, zeros, and links. Then you run the same program in restore mode on the new machine to re-create the directory tree, zero files, and links. The only storage required is for the links file -- which for me took about 600MB compressed to store 4M directories, 20M links and 300K zero-length files. This plus the unpooled log files at the root of each machine (small!) plus the (incremental) rsync of the pool is all that needs to be transferred between machines to do a full BackupPC tree copy. |
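The transfer step described above could look roughly like this; the host name and paths are assumptions, and the script's own command-line options are deliberately not shown since they belong with the script's documentation:

  cd /var/lib/backuppc
  # pool copy: incremental, restartable, and no -H needed
  rsync -aR --partial cpool/ offsite:/var/lib/backuppc/
  # small unpooled per-host files (logs, the backups index) keep their paths with -R
  rsync -aR pc/*/LOG* pc/*/backups offsite:/var/lib/backuppc/
  # the links file produced by the script is then shipped and replayed on the target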
From: <bac...@ko...> - 2013-03-02 15:38:59
|
gregrwm wrote at about 04:24:10 -0600 on Friday, March 1, 2013: > i'm using a simple procedure i cooked up to maintain a "third copy" at a > third physical location using as little bandwidth as possible. it simply > looks at each pc/*/backups, selects the most recent full and most recent > incremental (plus any partial or /new), and copies them across the wire, > together with the most recently copied full&incremental set (plus any > incompletely copied sets), using rsync, with it's hardlink copying > feature. thus my third location has a copy of the most recent (already > compressed) pc/ tree data, using rsync to avoid copying stuff over the wire > that's already there (and not bothering with the cpool), which, for me, is > a happily sized set of hardlinks that rsync can actually manage (ymmv). i > have successfully used this together with a script to recreate the cpool > if/when necessary. if it's of interest i could share it. One caution: If one is managing multiple pc's with redundant files across them (e.g., OS, apps), then you will waste a lot of bandwidth (and time) copying them since you will lose the pooling. Alternatively, if you use rsync with the -H flag, then you are back to the problem of rsync choking on hardlinks. |
From: Les M. <les...@gm...> - 2013-03-02 16:21:07
|
On Fri, Mar 1, 2013 at 3:32 PM, <bac...@ko...> wrote: > gregrwm wrote at about 04:24:10 -0600 on Friday, March 1, 2013: > > i'm using a simple procedure i cooked up to maintain a "third copy" at a > > third physical location using as little bandwidth as possible. [...] > > One caution: If one is managing multiple pc's with redundant files across them > (e.g., OS, apps), then you will waste a lot of bandwidth (and time) > copying them since you will lose the pooling. Alternatively, if you > use rsync with the -H flag, then you are back to the problem of rsync > choking on hardlinks. But you'd split the difference of these problems if you 'rsync -H' a single pc tree, or just the recent runs, at a time. Then the inode table to track the links would be smaller and less likely to cause trouble - and most of the hard links are to previous runs of the same file anyway. And even backuppc itself won't identify the duplicates before the transfer in each new location. -- Les Mikesell les...@gm... |
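Splitting it per host might look like the loop below (paths and the destination host are assumptions). Each rsync run then only has to track the hard links inside one host's tree, at the cost of losing the cross-host pooling Jeffrey warns about:

  cd /var/lib/backuppc/pc
  for host in *; do
      rsync -aH --delete "$host"/ offsite:/var/lib/backuppc/pc/"$host"/
  done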
From: Lars T. Skjong-B. <li...@sn...> - 2013-03-02 19:28:23
|
Hi Mark, On 3/1/13 10:37 PM, Mark Campbell wrote: > Question, taking your model here, and applying it to my situation, > how well would this work: > > BackupPC server, with a RAID1 zpool, with the third member being my > external fireproof drive. Rather than the rotation you described, > just leave it as is as it does its daily routine. Then, should the > day come where I need to grab the drive and go, plugging the drive > into a system with ZFSonLinux & BackupPC installed, could I mount > this drive by itself? Yes, this is no different, really. It would work very well. Just keep it in sync, and all should be fine. You can just pull out any drive at will, without causing any filesystem corruption. The fireproof drive can be inserted in a different computer with ZFS support, you can run "zpool import", and then you can mount the filesystem. You shouldn't use USB for your external drive, though. eSATA, FireWire or Thunderbolt is fine. > I really like your idea of zfs send/receive for the remote copy. Do > you have any tips/pointers/docs on the best way to run it in this > scenario? I don't mean to say RTFM, but the top results of a Google search are as good a starting point as any: https://www.google.com/search?q=zfs+send+receive+backup I think this article is quite good: http://cuddletech.com/blog/pivot/entry.php?id=984 If you have any further questions, don't hesitate to ask. :) -- Best regards, Lars Tobias |
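An incremental send/receive cycle looks roughly like this (the pool, dataset and host names are assumptions); after the initial seed, each run only ships the blocks that changed between the two snapshots:

  # one-time seed of the offsite copy
  zfs snapshot tank/backuppc@2013-03-01
  zfs send tank/backuppc@2013-03-01 | ssh offsite zfs receive -F tank/backuppc
  # subsequent runs: snapshot again and send only the delta
  zfs snapshot tank/backuppc@2013-03-02
  zfs send -i tank/backuppc@2013-03-01 tank/backuppc@2013-03-02 | ssh offsite zfs receive tank/backuppc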
From: Mark C. <mca...@em...> - 2013-03-04 15:59:19
|
Thanks Lars, I think that this is going to be the way I'm going to go. I'm going to migrate the existing pool from its current location on a 1TB linux MD RAID 1 to a newly created 2TB ZFS RAID 1 using 3x drives (the third being the fireproof external). I do believe that this is where BackupPC_copyPcPool.pl will come in handy, am I correct Jeffrey? When we're ready to put in place the offsite backup, could I temporarily sync a 4th drive to the ZFS RAID array so that I can then transport the drive to our colo and import it there? Also, would I be correct in assuming that the ZFS resilvering process is like other RAID systems, in that I wouldn't have to shut down BackupPC during its resilvering process (that it would just update changes as it went along automatically)?

Thanks,

--Mark

-----Original Message-----
From: Lars Tobias Skjong-Børsting [mailto:li...@sn...]
Sent: Saturday, March 02, 2013 2:28 PM
To: bac...@li...
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
From: Holger P. <wb...@pa...> - 2013-03-07 14:16:00
|
Hi, Les Mikesell wrote on 2013-03-06 13:42:17 -0600 [Re: [BackupPC-users] BackupPC Pool synchronization?]: > On Wed, Mar 6, 2013 at 12:16 PM, Mark Campbell > <mca...@em...> wrote: > > Interesting. Well then I guess the answer is to not muck with pooling (as redundant as it is, at least it theoretically shouldn't hurt anything), disable compression, and enable dedup & compression on ZFS. > > Yes, I'd do that and try out the mirroring and send/receive features. > If you are sure everything else is good you can probably find the part > in the code that makes the links and remove it. It's a bit more than one "part in the code". *New pool entries* are created by BackupPC_link, which would then be essentially unnecessary. That part is simple enough to turn off. But there's really a rather complex strategy to link to *existing pool entries*. In fact, without pooling there is not much point in using the Perl rsync implementation, for instance (well, maybe the attrib files, but then again, maybe we could get rid of them as well, if we don't use pooling). It really sounds like a major redesign of BackupPC if you want to gain all the benefits you can. Sort of like halfway to 4.0 :). Basically, you end up with just the BackupPC scheduler, rsync (or tar or just about anything you can put into a command line) for transport, and ZFS for storage. Personally, I'd probably get rid of the attrib files (leaving plain file system snapshots easily accessible with all known tools and subject to kernel permission checking) and the whole web interface ;-). Most others will want to be able to browse backups through the web interface, which probably entails keeping attrib files (and having all files be owned by the backuppc user, just like the current situation). Then again, 'fakeroot' emulates root-type file system semantics through a preloaded library. Maybe this idea could be adapted for BackupPC to use stock tools for transport and get attrib files (and backuppc file ownership) just the same. ZFS is an interesting topic these days. It's probably best to gain some BackupPC community experience with ZFS first, before contemplating changing BackupPC to take the most advantage. Even with BackupPC pooling in place, significant gains seem possible. Regards, Holger |
From: Les M. <les...@gm...> - 2013-03-07 16:04:08
|
On Thu, Mar 7, 2013 at 8:15 AM, Holger Parplies <wb...@pa...> wrote: > > It's a bit more than one "part in the code". *New pool entries* are created > by BackupPC_link, which would then be essentially unnecessary. That part is > simple enough to turn off. But there's really a rather complex strategy to > link to *existing pool entries*. In fact, without pooling there is not much > point in using the Perl rsync implementation, for instance (well, maybe the > attrib files, but then again, maybe we could get rid of them as well, if we > don't use pooling). The perl rsync understands the local compression - which may also be better handled by the file system. Clearly the snapshots of a growing logfile could be stored more efficiently with a block level scheme - but backuppc's checksum caching might be a win for non-changing files in terms of processing efficiency. > It really sounds like a major redesign of BackupPC if you > want to gain all the benefits you can. Sort of like halfway to 4.0 :). > Basically, you end up with just the BackupPC scheduler, rsync (or tar or just > about anything you can put into a command line) for transport, and ZFS for > storage. Personally, I'd probably get rid of the attrib files (leaving plain > file system snapshots easily accessible with all known tools and subject to > kernel permission checking) and the whole web interface ;-). If anyone is designing for the future, I think it makes sense to split out all of the dedup and compression operations, since odds are good that future filesystems will handle this well and your backup system won't be a special case. Keeping 'real' filesystem attributes is more of a problem, since the system hosting the backups may not have the same user base as the targets, the filesystem may not be capable of holding the same attributes, and even if those were not a problem it would mean the backup system would have to run as root to have full access. > Most others will > want to be able to browse backups through the web interface, which probably > entails keeping attrib files (and having all files be owned by the backuppc > user, just like the current situation). Then again, 'fakeroot' emulates > root-type file system semantics through a preloaded library. That's interesting - it would be nice to have a user-level abstraction where a non-admin web user could access things with approximately the permissions he would have on the source host. > Maybe this idea > could be adapted for BackupPC to use stock tools for transport and get attrib > files (and backuppc file ownership) just the same. > > ZFS is an interesting topic these days. It's probably best to gain some > BackupPC community experience with ZFS first, before contemplating changing > BackupPC to take the most advantage. Even with BackupPC pooling in place, > significant gains seem possible. Hmmm, maybe something even more extreme for the future would be to work out a way to have snapshots of virtual-machine images updated with block-level file pooling. Then, assuming appropriate network connectivity, you'd have the option of firing up the VM as an instant replacement instead of rebuilding/restoring a failed host. -- Les Mikesell les...@gm... |
From: Mark C. <mca...@em...> - 2013-03-07 14:34:52
|
Holger, My thinking at this point is that I'll leave the pooling be--it may require some extra CPU cycles & RAM from time to time, but my understanding of the ZFS dedup & compress features is that they should be transparent to BackupPC, so while pooling in BackupPC won't avail much, it probably wouldn't hurt anything either.

Thanks,

--Mark

-----Original Message-----
From: Holger Parplies [mailto:wb...@pa...]
Sent: Thursday, March 07, 2013 9:16 AM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
From: Tyler J. W. <ty...@to...> - 2013-03-07 15:14:25
|
On 2013-03-07 14:34, Mark Campbell wrote: > My thinking at this point is that I'll leave the pooling be--it may > require some extra CPU cycles & RAM from time to time, but my > understanding of the zfs dedup & compress features are that they should > be transparent to BackupPC, so while pooling in BackupPC won't avail > much, it probably wouldn't hurt anything either. Except that it's the pooling (hardlinking) that makes pool synchronization suck so badly. Although perhaps ZFS mirror might make that better, I'd much rather disable pooling entirely (disable the linking process), and then just use rsync to sync the backuppc/pc tree between primary and secondary hosts. Regards, Tyler -- "... I've never seen the Icarus story as a lesson about the limitations of humans. I see it as a lesson about the limitations of wax as an adhesive." -- Randall Munroe, "XKCD What IF?: Interplanetary Cessna" |
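If the linking step really were disabled and ZFS (or similar) handled the deduplication, the offsite sync would collapse to something as simple as this sketch (paths and host are assumptions), since the pc tree would then hold ordinary files with no hard links for rsync to track:

  rsync -a --delete /var/lib/backuppc/pc/ offsite:/var/lib/backuppc/pc/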
From: Mark C. <mca...@em...> - 2013-03-07 16:33:33
|
I do agree that it would be better ideally to disable BackupPC's pooling mechanism in the case of ZFS, but it sounds as though we don't really have that capability (at least not without some serious hacking). Maybe when the ethereal 4.0 arrives, it'll be a different story. ;) As I've come to understand the ZFS syncing abilities, it sounds like an analogous way to describe it is "rsync meets dd", so the aspects of the filesystem that hang up rsync become irrelevant to zfs send/receive.

Thanks,

--Mark

-----Original Message-----
From: Tyler J. Wagner [mailto:ty...@to...]
Sent: Thursday, March 07, 2013 10:14 AM
To: General list for user discussion, questions and support
Cc: Mark Campbell
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
From: Mark C. <mca...@em...> - 2013-08-06 12:42:12
|
I thought I would give an update to this old thread. To those unfamiliar with it: basically, I was looking for a way to do BackupPC data synchronization, and we brainstormed a way using the ZFS file system (as it has an rsync-like ability (sending only changes), but at the filesystem/block level (like dd) rather than the file level of rsync, so the problems that plague trying to rsync a BackupPC pool shouldn't affect ZFS send/receive). Then that topic expanded into using ZFS with deduplication (which is block based, to complement the dedup of BackupPC's pools, which is file based) to further cut down on disk usage. It is on that subject that I wanted to report my findings.

A spare Supermicro server was recently acquired that had an Opteron quad core, 16GB of RAM, and a RAID-5 array with 4 250GB drives, giving it a 750GB array (25% less than my production system, but good for an experimental setup). So I decided to try and use it as an experimental ZFS/BackupPC box. I started by loading CentOS 6.4 on it, installed ZFSOnLinux (I know, experimental, but that's exactly what this is), and installed BackupPC. I created a single-disk ZFS pool from a partition of the array in the same location relative to / as it is on my production BackupPC box & enabled both dedup & compression. I then copied over /etc/BackupPC from production to my test box, and modified the server config to not do any compression ($Conf{CompressLevel}=0)--not doing this completely negated dedup's abilities. Once I created the basic BackupPC pool structure on the ZFS pool, I started the BackupPC service, and let her go for several days (but monitored her), accumulating the same backups that my production box does.

It should be noted that this box needed ALL 16GB of RAM for the dedup feature, but it never crashed, or even used swap significantly, and performance remained reasonable. Over the course of 7 days, this box has been extremely successful in its dedup & compression features. At 25% smaller total disk space, I'm currently at a 2.17x dedup ratio (really good!), and it is storing nearly as many backups as production is, with more space free! I have not yet had a second box to play with to do ZFS transfers, but when I do, I will report on that too.

Thanks,

--Mark

-----Original Message-----
From: Tyler J. Wagner [mailto:ty...@to...]
Sent: Thursday, March 07, 2013 10:14 AM
To: General list for user discussion, questions and support
Cc: Mark Campbell
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
[...] |
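For anyone wanting to reproduce the test, the setup described above boils down to something like the following (the pool, dataset and partition names are assumptions):

  zpool create tank /dev/sda4
  zfs create -o dedup=on -o compression=on -o mountpoint=/var/lib/backuppc tank/backuppc
  # in BackupPC's config.pl, let ZFS do the compressing/deduping instead:
  #   $Conf{CompressLevel} = 0;
  # check the savings later:
  zpool get dedupratio tank
  zfs get compressratio tank/backuppc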
From: Trey D. <tre...@gm...> - 2013-03-03 07:02:54
|
On Mar 1, 2013 3:42 PM, "Mark Campbell" <mca...@em...> wrote:
> Lars, Thanks for the interesting idea! I confess I haven't played with ZFS much (though I've been wanting to for some time), maybe this is the excuse I need ;).
> [...]

+1 for ZFS as a means to replicate the pool without lots of rsyncing. However, the checksumming in ZFS only takes place on RAIDZ sets. ZFS mirroring (RAID 1) does not do checksum verification. You would have to use RAIDZ1 (RAID 5), RAIDZ2 (RAID 6) or RAIDZ3 (triple parity) to benefit from checksum verification. |