From: Andreas P. <and...@gm...> - 2012-05-12 10:57:15
Hi Les,

I already thought about that, and I agree that handling large image files is problematic in general. I need to make images of the Windows-based virtual machines so that I can get them running again when a disaster happens. If I move away from BackupPC for transferring these images, I don't see any benefit (maybe only because I don't know of an imaging solution that solves my problem better). Since I already use BackupPC to back up the data partitions (all Linux-based), I don't want my backups to become more complex than necessary. I can live with the amount of hard disk space the compressed images will consume, and the I/O while merging the files is acceptable for me, too.

I can tell the imaging software (partimage) to cut the image into 2 GB volumes, but I doubt that this enables effective pooling, since the system volume I take the image from contains temporary files, profiles, databases and so on. If every volume file has changes (even if only a few megabytes were altered), I expect the rsync algorithm to be less effective than when comparing one large file, where a long unchanged stretch is more likely and is not interrupted by the artificial boundaries introduced by the 2 GB volume splitting.

I hope I made my situation clear. If anyone has experience with handling large image files that I could benefit from, please let me know!

Thank you very much,

Andreas Piening

On 12.05.2012 at 06:04, Les Mikesell wrote:

> On Fri, May 11, 2012 at 4:01 PM, Andreas Piening
> <and...@gm...> wrote:
>> Hello BackupPC users,
>>
>> I am stuck trying to identify suitable rsync parameters for handling
>> large image file backups with BackupPC.
>>
>> The scenario: I use partimage to take LVM-snapshot-based full images of
>> the block devices of my virtual (Windows) machines running under KVM. I
>> want to copy these images from the virtualization server to my backup
>> machine running BackupPC. The images are between 40 and 60 GB
>> uncompressed each. The backup window has to stay outside working hours
>> and is not long enough to transfer the complete images over the line
>> every night. I read about rsync's ability to transfer only the changed
>> parts of a file, using a clever checksum algorithm to minimize network
>> traffic. That is exactly what I want.
>>
>> I tested it by creating an initial backup of one image, then created a
>> new image with only a few megabytes of changed data and triggered a new
>> backup run. But I noticed that the whole file was re-transferred. I
>> waited until the end to be sure about that, and admittedly it was not
>> the best idea to test this with a compressed 18 GB image file, but this
>> was my real working data image and I expected it to behave as
>> described. Searching for reasons for the complete re-transmission, I
>> ended up in a discussion thread about rsync backups of large compressed
>> files. The explanation made sense to me: because of back-references in
>> the compressed stream, the compression algorithm can produce a
>> completely different archive file even if only a few megabytes at the
>> beginning of the input were altered.
>> So I decided to store my image uncompressed, which makes it about 46 GB
>> now. I also found out that I need to add the "-C" parameter, since data
>> compression is not enabled by default. Anyway: in the second backup run
>> the whole file was re-created again instead of only the changed parts
>> being transferred.
>>
>> My BackupPC option "RsyncClientCmd" is set to
>> "$sshPath -C -q -x -l root $host $rsyncPath $argList+", which is
>> BackupPC's default apart from the "-C".
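As a side note, a rough way to check whether rsync's delta transfer works
at all, independent of BackupPC, is a plain rsync run with --stats; the
paths and the host name below are only placeholders, not my real setup:

  # First run: full copy of the uncompressed image to the backup host.
  rsync -av --stats /var/lib/images/vm1.img backuppc-host:/srv/rsync-test/

  # Change a few MB inside the VM, re-create the uncompressed image and run
  # the same command again. In the --stats output, "Literal data" is what
  # was actually sent and "Matched data" is what was reused from the copy
  # already on the receiver; a working delta transfer shows mostly matched
  # data on the second run.
  rsync -av --stats /var/lib/images/vm1.img backuppc-host:/srv/rsync-test/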
>>
>> Honestly, I don't understand the exact reason for this. There are a few
>> possible causes:
>>
>> -> partimage does not create a linear backup image file, even when it is
>>    uncompressed
>> -> there is another rsync parameter I missed which enables differential
>>    transfers of file changes
>> -> rsync examines the file but decides not to use differential updates
>>    for this one because of its size, or simply because its creation
>>    timestamp differs from the previous one
>>
>> Please give me a hint if you've successfully made differential backups
>> of large image files.
>
> I'm not sure there is a good way to handle very large files in
> BackupPC. Even if rsync identifies and transfers only the changes,
> the server is going to copy and merge the unchanged parts from the
> previous file, which may take just as long anyway, and it will not be
> able to pool the copies. Maybe you could split the target into many
> small files before the backup. Then any chunk that is unchanged
> between runs would be skipped quickly and the contents could be
> pooled.
>
> --
> Les Mikesell
> les...@gm...
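If I do end up trying the splitting approach Les suggests, a minimal
sketch with plain coreutils "split" (the chunk size, paths and file names
below are just examples) would be to cut the raw image into fixed-size
pieces on the virtualization server before the backup runs:

  # Cut the uncompressed image into 1 GB pieces with numeric suffixes;
  # -a 3 allows up to 1000 pieces for larger images.
  split -b 1G -d -a 3 /var/lib/images/vm1.img /srv/staging/vm1.img.part.

Pieces whose content did not change between two runs stay byte-identical,
so rsync can skip them quickly and BackupPC can pool them; only the pieces
that actually changed would be transferred and stored again.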