Thread: Having rsync issues when the modified part of a file isn't at the end
From: Guillaume F. <gui...@fr...> - 2015-02-03 08:53:12
Hello everyone!

I am having an issue with the backup of a few files here, which take more space than needed on my ZFS dataset. After some digging, I found the issue is primarily caused by both gzip and rsyncrypto. Here, I will only discuss the rsyncrypto part, which keeps rsync from backing up efficiently:

Suppose you make 2 files of 450MB that differ by only 50MB, in the middle of the file (no deleted or added data, not even moved). To create a test case, here is what I did:

    dd if=/dev/urandom of=begin.iso bs=1M count=100
    dd if=/dev/urandom of=end.iso bs=1M count=300
    dd if=/dev/urandom of=middle1.iso bs=1M count=50
    dd if=/dev/urandom of=middle2.iso bs=1M count=50

Let's build our two files:

    cat begin.iso middle1.iso end.iso >file1.iso
    cat begin.iso middle2.iso end.iso >file2.iso

So we end up with two files of identical size, but with a 50MB difference somewhere inside:

    -rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file1.iso
    -rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file2.iso

I now encrypt them with rsyncrypto:

    rsyncrypto --gzip=nullgzip file1.iso{,.enc} backup.{keys,crt}
    rsyncrypto --gzip=nullgzip file2.iso{,.enc} backup.{keys,crt}

The first noticeable thing is that they are not the same size once encrypted:

    -rw-r--r-- 1 root root 472063316  2 févr. 14:55 file1.iso.enc
    -rw-r--r-- 1 root root 472062484  2 févr. 14:55 file2.iso.enc

Now if I copy the original files using rsync, I get interesting I/O work:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso test/file.iso
    sending incremental file list
    >f+++++++++ file1.iso
        471,859,200 100%  208.71MB/s    0:00:02 (xfr#1, to-chk=0/1)

    sent 471,974,500 bytes  received 35 bytes  188,789,814.00 bytes/sec
    total size is 471,859,200  speedup is 1.00

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso test/file.iso
    sending incremental file list
    >f..t...... file2.iso
        471,859,200 100%  135.90MB/s    0:00:03 (xfr#1, to-chk=0/1)

    sent 52,543,948 bytes  received 152,118 bytes  8,107,087.08 bytes/sec
    total size is 471,859,200  speedup is 8.95

Now I copy the encrypted files:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso.enc test/file.iso.enc
    sending incremental file list
    >f+++++++++ file1.iso.enc
        472,063,316 100%  180.86MB/s    0:00:02 (xfr#1, to-chk=0/1)

    sent 472,178,659 bytes  received 35 bytes  134,908,198.29 bytes/sec
    total size is 472,063,316  speedup is 1.00

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso.enc test/file.iso.enc
    sending incremental file list
    >f.st...... file2.iso.enc
        472,062,484 100%  111.87MB/s    0:00:04 (xfr#1, to-chk=0/1)

    sent 52,608,319 bytes  received 152,188 bytes  9,592,819.45 bytes/sec
    total size is 472,062,484  speedup is 8.95

So, it worked perfectly on this test, but sometimes it fails to do a proper diff, so let's make another test file:

    dd if=/dev/urandom of=middle3.iso bs=1M count=50
    cat begin.iso middle3.iso end.iso >file3.iso
    rsyncrypto --gzip=nullgzip file3.iso{,.enc} backup.{keys,crt}

Let's look at the files:

    -rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file1.iso
    -rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file2.iso
    -rw-r--r-- 1 kuri users 471859200  3 févr. 09:07 file3.iso
    -rw-r--r-- 1 root root  472063316  2 févr. 14:55 file1.iso.enc
    -rw-r--r-- 1 root root  472062484  2 févr. 14:55 file2.iso.enc
    -rw-r--r-- 1 root root  472062932  3 févr. 09:07 file3.iso.enc

Let's rsync the third file:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso.enc test/file.iso.enc
    sending incremental file list
    >f.st...... file3.iso.enc
        472,062,932 100%   53.22MB/s    0:00:08 (xfr#1, to-chk=0/1)

    sent 367,307,827 bytes  received 152,188 bytes  34,996,191.90 bytes/sec
    total size is 472,062,932  speedup is 1.28

So, it copied 350MB of a 450MB file that only had 50MB changed. Let's see with the unencrypted files:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso test/file.iso
    sending incremental file list
    >f..t...... file3.iso
        471,859,200 100%  135.29MB/s    0:00:03 (xfr#1, to-chk=0/1)

    sent 52,543,947 bytes  received 152,118 bytes  9,581,102.73 bytes/sec
    total size is 471,859,200  speedup is 8.95

So it works properly if the files are not encrypted.

Is it possible that the rsync algorithm fails because the file sizes differ? Do you have any hints? The only thing I can see is that between file1.iso.enc and file2.iso.enc the file size dropped a little, and between file2.iso.enc and file3.iso.enc it is higher, but I have no idea whether this is related...

Looking at the data of each encrypted file, I can see that the last 300MB are exactly the same:

    [kuri:~/tmp/random] $ tail -c 314572800 file1.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc  -
    [kuri:~/tmp/random] $ tail -c 314572800 file2.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc  -
    [kuri:~/tmp/random] $ tail -c 314572800 file3.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc  -

But the first 100MB are not:

    [kuri:~/tmp/random] $ head -c 104857600 file1.iso.enc | sha1sum
    d86fa953b25e1a01a53409f567cc845535525dc1  -
    [kuri:~/tmp/random] $ head -c 104857600 file2.iso.enc | sha1sum
    0c10309cf8fe0bb349b05081c782469e4c2fb0e2  -
    [kuri:~/tmp/random] $ head -c 104857600 file3.iso.enc | sha1sum
    338ba6c1a58dde8c334092986e5ce20e3b8114df  -

Any help would be greatly appreciated. I would like to back up even bigger files (some GBs), where over 90% of the file gets transferred if encrypted with rsyncrypto, while only 2-4MB would be transferred otherwise.
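The "speedup" figures in the rsync summaries above can be checked by hand. The sketch below assumes rsync reports speedup as the total file size divided by the sum of bytes sent and received; the helper name `speedup` is made up for this illustration, and the numbers are copied from the two encrypted transfers in the message.

```shell
# speedup TOTAL SENT RECEIVED
# Prints total size divided by bytes on the wire, rounded to two decimals.
# Assumption: this mirrors how rsync derives its "speedup is" figure.
speedup() {
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 472062484 52608319 152188    # file2.iso.enc over file1.iso.enc -> 8.95
speedup 472062932 367307827 152188   # file3.iso.enc over file2.iso.enc -> 1.28
```

Read this way, the third encrypted transfer put roughly 78% of the file on the wire, against roughly 11% for the second one.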
From: Shachar S. <sh...@sh...> - 2015-02-03 20:10:08
rsyncrypto compresses as part of the encryption. You obviously did not notice this, as you were using /dev/urandom as your source, and hence producing incompressible files. This is also the reason (at least part of it) that the encrypted files were not the same size.

On 03/02/15 10:34, Guillaume Friloux wrote:
> The only thing i can see is that between file1.iso.enc and
> file2.iso.enc,
> the filesize dropped a little, and between file2.iso.enc and
> file3.iso.enc it is higher,
> but i have no idea if this can be related...

You are using rsync with --inplace. In that mode, rsync cannot reuse blocks of the old destination file that have already been overwritten with new data. When the new file is bigger than the old one, you are overwriting the data you would reuse while transferring, severely limiting rsync's ability to optimize your transfer. If you remove --inplace, you will see that rsync has no problem optimizing your encrypted files, no matter the size changes.

You have not asked your gzip question, but I am guessing it is either the same issue there, or you forgot to pass it the --rsyncable flag.

Shachar
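To relate this explanation to the numbers in the first message, the size changes between the encrypted files can be computed directly. This is only a worked arithmetic sketch; `size_delta` is a made-up helper name, and the sizes are taken from the file listing in the first message.

```shell
# size_delta OLD_SIZE NEW_SIZE
# Prints how many bytes the new file grew by (negative means it shrank).
size_delta() {
    echo $(( $2 - $1 ))
}

size_delta 472063316 472062484   # file1.iso.enc -> file2.iso.enc: -832
size_delta 472062484 472062932   # file2.iso.enc -> file3.iso.enc: 448
```

Since the last 300MB of all three encrypted files are byte-identical, that shared tail starts 832 bytes earlier in file2.iso.enc than in file1.iso.enc, and 448 bytes later in file3.iso.enc than in file2.iso.enc. The transfer that went badly is the one where the file grew, which is consistent with the explanation above, although the thread does not verify it beyond these three files.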
From: Guillaume F. <gui...@fr...> - 2015-02-03 20:55:56
Hello, thanks for answering.

I have to use --inplace to limit writes on the ZFS dataset; otherwise each snapshot will use the total file size instead of only the diff. In my environment, I don't do local copies; I send over SSH to a BSD host.

I will redo all the tests without --inplace to see if rsync does better (but that won't be a real solution for my ZFS volume).

I am using /dev/urandom only because I wanted a simple test case, but the problem occurs with real files, like Outlook PST files, PPT files and so on.

I intentionally did not use gzip because gzip itself also causes some problems here with the files, and I do use --rsyncable, or tell rsyncrypto to use gzip (I encounter the problem with both methods).

You're saying rsyncrypto uses gzip, but I gave it --gzip=nullgzip, which is a bash script calling cat, so no compression should be done. Or is rsyncrypto adding compression on top of what gzip did?
From: Guillaume F. <gui...@fr...> - 2015-02-04 09:59:31
OK, I did the tests without --inplace: it saves the bandwidth, but completely nullifies the benefits of ZFS snapshots. This is disturbing.
From: Shachar S. <sh...@sh...> - 2015-02-04 19:05:02
On 04/02/15 11:58, Guillaume Friloux wrote:
> Ok, did the tests without --inplace, it saves the BW, but completely
> nullify the benefits of ZFS snapshots.

I will point out that this discussion has moved beyond the realm of rsyncrypto and into rsync turf. However:

Using rsync, you can compare with files in one directory, but write the results to another directory. If you do that, you can create a secondary directory of just the files that changed. Use "cat" to copy them back to the original location, and hopefully that would salvage your ZFS usage. As I said, however, this is more rsync than rsyncrypto related.

As a side note, please don't disable gzip in rsyncrypto. That feature was meant solely for the tests. Certain entropy assumptions behind the cryptanalysis of rsyncrypto do not hold when the entropy of the file is low. In other words, when rsyncrypto does not compress, it is less secure as an encryption.

Shachar
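The two-directory idea above could be sketched roughly as follows. The message does not name an rsync option; `--compare-dest=DIR` is one reading of "compare with files in one directory, but write the results to another", and that is an assumption. The helper `copy_back` and all paths are hypothetical. Note that `cat ... > file` rewrites the whole target file, so whether this actually reduces ZFS snapshot usage is not established anywhere in this thread.

```shell
# copy_back STAGING LIVE
# Hypothetical helper: copies every regular file found under STAGING over the
# file with the same relative path under LIVE, using cat as suggested above.
copy_back() {
    staging=$1
    live=$2
    ( cd "$staging" && find . -type f ) | while IFS= read -r rel; do
        mkdir -p "$live/$(dirname "$rel")"
        cat "$staging/$rel" > "$live/$rel"
    done
}

# Possible usage on the machine holding the backup (placeholder paths):
#   rsync -av --no-whole-file --compare-dest=/backup/live src/ /backup/staging/
#   copy_back /backup/staging /backup/live
```

With `--compare-dest`, only files that differ from the copy in the comparison directory should end up in the staging directory, so `copy_back` touches only the files that changed.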