From: Andreas P. <and...@gm...> - 2012-05-12 10:57:15
Hi Les,

I already thought about that, and I agree that handling large image files is problematic in general. I need to make images of the Windows-based virtual machines so that I can get them running again when a disaster happens. If I move away from BackupPC for transferring these images, I don't see any benefit (maybe only because I don't know of an imaging solution that solves my problem better). Since I already use BackupPC to back up the data partitions (all Linux-based), I don't want my backups to become more complex than necessary. I can live with the amount of hard disk space the compressed images will consume, and the I/O while merging the files is acceptable for me, too.

I can tell the imaging software (partimage) to cut the image into 2 GB volumes, but I doubt that this enables effective pooling, since the system volume I take the image from contains temporary files, profiles, databases and so on. If every volume file has changes (even if only a few megabytes were altered), I expect the rsync algorithm to be less effective than when comparing one large file, where a long unchanged stretch is more likely and is not interrupted by the artificial boundaries introduced by the 2 GB volume splitting.

I hope I made my situation clear. If anyone has experience with handling large image files that I could benefit from, please let me know!

Thank you very much,

Andreas Piening

On 12.05.2012 at 06:04, Les Mikesell wrote:

> On Fri, May 11, 2012 at 4:01 PM, Andreas Piening
> <and...@gm...> wrote:
>> Hello BackupPC users,
>>
>> I am stuck trying to identify suitable rsync parameters for handling
>> large image file backups with BackupPC.
>>
>> The scenario: I use partimage to take LVM-snapshot-based full images of
>> the block devices of my virtual (Windows) machines running under KVM. I
>> want to copy these images from the virtualization server to my backup
>> machine running BackupPC. The images are between 40 and 60 GB
>> uncompressed each. The backup window has to stay outside working hours
>> and is not long enough to transfer the complete images over the line
>> every night. I read about rsync's ability to transfer only the changed
>> parts of a file, using a clever checksum algorithm to minimize network
>> traffic. That is exactly what I want.
>>
>> I tested it by creating an initial backup of one image, then created a
>> new image with only a few megabytes of changed data and triggered a new
>> backup run. But I noticed that the whole file was re-transferred. I
>> waited until the end to be sure about that, and admittedly it was not
>> the best idea to test this with a compressed 18 GB image file, but this
>> was my real working data image and I expected it to behave as
>> described. Searching for reasons for the complete re-transmission, I
>> ended up in a discussion thread about rsync backups of large compressed
>> files. The explanation made sense to me: because of back-references in
>> the compressed stream, the compression algorithm can produce a
>> completely different archive file even if only a few megabytes at the
>> beginning of the input were altered.
>> So I decided to store my image uncompressed, which makes it about 46 GB
>> now. I also found out that I need to add the "-C" parameter, since data
>> compression is not enabled by default. Anyway: in the second backup run
>> the whole file was re-created again instead of only the changed parts
>> being transferred.
>>
>> My BackupPC option "RsyncClientCmd" is set to
>> "$sshPath -C -q -x -l root $host $rsyncPath $argList+", which is
>> BackupPC's default apart from the "-C".
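As a side note, a rough way to check whether rsync's delta transfer works
at all, independent of BackupPC, is a plain rsync run with --stats; the
paths and the host name below are only placeholders, not my real setup:

  # First run: full copy of the uncompressed image to the backup host.
  rsync -av --stats /var/lib/images/vm1.img backuppc-host:/srv/rsync-test/

  # Change a few MB inside the VM, re-create the uncompressed image and run
  # the same command again. In the --stats output, "Literal data" is what
  # was actually sent and "Matched data" is what was reused from the copy
  # already on the receiver; a working delta transfer shows mostly matched
  # data on the second run.
  rsync -av --stats /var/lib/images/vm1.img backuppc-host:/srv/rsync-test/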
>>
>> Honestly, I don't understand the exact reason for this. There are a few
>> possible causes:
>>
>> -> partimage does not create a linear backup image file, even when it is
>>    uncompressed
>> -> there is another rsync parameter I missed which enables differential
>>    transfers of file changes
>> -> rsync examines the file but decides not to use differential updates
>>    for this one because of its size, or simply because its creation
>>    timestamp differs from the previous one
>>
>> Please give me a hint if you've successfully made differential backups
>> of large image files.
>
> I'm not sure there is a good way to handle very large files in
> BackupPC. Even if rsync identifies and transfers only the changes,
> the server is going to copy and merge the unchanged parts from the
> previous file, which may take just as long anyway, and it will not be
> able to pool the copies. Maybe you could split the target into many
> small files before the backup. Then any chunk that is unchanged
> between runs would be skipped quickly and the contents could be
> pooled.
>
> --
> Les Mikesell
> les...@gm...
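If I do end up trying the splitting approach Les suggests, a minimal
sketch with plain coreutils "split" (the chunk size, paths and file names
below are just examples) would be to cut the raw image into fixed-size
pieces on the virtualization server before the backup runs:

  # Cut the uncompressed image into 1 GB pieces with numeric suffixes;
  # -a 3 allows up to 1000 pieces for larger images.
  split -b 1G -d -a 3 /var/lib/images/vm1.img /srv/staging/vm1.img.part.

Pieces whose content did not change between two runs stay byte-identical,
so rsync can skip them quickly and BackupPC can pool them; only the pieces
that actually changed would be transferred and stored again.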