Having rsync issues when modified part of file isn't at end
From: Guillaume F. <gui...@fr...> - 2015-02-03 08:53:12
Hello everyone!

I am having an issue with the backup of a few files here, which take more space than needed on my ZFS dataset. After some digging, I found that the issue is primarily caused by both gzip and rsyncrypto. Here I will only discuss the rsyncrypto part, which keeps rsync from backing up efficiently.

Suppose you have two files of 450MB that differ only in 50MB in the middle of the file (no deleted or added data, nothing even moved). To create a test case, here is what I did:

    dd if=/dev/urandom of=begin.iso bs=1M count=100
    dd if=/dev/urandom of=end.iso bs=1M count=300
    dd if=/dev/urandom of=middle1.iso bs=1M count=50
    dd if=/dev/urandom of=middle2.iso bs=1M count=50

Let's build our two files:

    cat begin.iso middle1.iso end.iso >file1.iso
    cat begin.iso middle2.iso end.iso >file2.iso

So we end up with two files of identical size, with a 50MB difference somewhere inside:

    -rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file1.iso
    -rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file2.iso

I now encrypt them with rsyncrypto:

    rsyncrypto --gzip=nullgzip file1.iso{,.enc} backup.{keys,crt}
    rsyncrypto --gzip=nullgzip file2.iso{,.enc} backup.{keys,crt}

The first noticeable thing is that they are not the same size once encrypted:

    -rw-r--r-- 1 root root 472063316 2 févr. 14:55 file1.iso.enc
    -rw-r--r-- 1 root root 472062484 2 févr. 14:55 file2.iso.enc

Now if I copy the original files using rsync, I get interesting I/O work:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso test/file.iso
    sending incremental file list
    > f+++++++++ file1.iso
    471,859,200 100% 208.71MB/s 0:00:02 (xfr#1, to-chk=0/1)
    sent 471,974,500 bytes received 35 bytes 188,789,814.00 bytes/sec
    total size is 471,859,200 speedup is 1.00

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso test/file.iso
    sending incremental file list
    > f..t...... file2.iso
    471,859,200 100% 135.90MB/s 0:00:03 (xfr#1, to-chk=0/1)
    sent 52,543,948 bytes received 152,118 bytes 8,107,087.08 bytes/sec
    total size is 471,859,200 speedup is 8.95

Now I copy the encrypted files:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso.enc test/file.iso.enc
    sending incremental file list
    > f+++++++++ file1.iso.enc
    472,063,316 100% 180.86MB/s 0:00:02 (xfr#1, to-chk=0/1)
    sent 472,178,659 bytes received 35 bytes 134,908,198.29 bytes/sec
    total size is 472,063,316 speedup is 1.00

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso.enc test/file.iso.enc
    sending incremental file list
    > f.st...... file2.iso.enc
    472,062,484 100% 111.87MB/s 0:00:04 (xfr#1, to-chk=0/1)
    sent 52,608,319 bytes received 152,188 bytes 9,592,819.45 bytes/sec
    total size is 472,062,484 speedup is 8.95

So it worked perfectly in this test, but sometimes it fails to produce a proper diff, so let's make another test file:

    dd if=/dev/urandom of=middle3.iso bs=1M count=50
    cat begin.iso middle3.iso end.iso >file3.iso
    rsyncrypto --gzip=nullgzip file3.iso{,.enc} backup.{keys,crt}

Let's look at the files:

    -rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file1.iso
    -rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file2.iso
    -rw-r--r-- 1 kuri users 471859200 3 févr. 09:07 file3.iso
    -rw-r--r-- 1 root root 472063316 2 févr. 14:55 file1.iso.enc
    -rw-r--r-- 1 root root 472062484 2 févr. 14:55 file2.iso.enc
    -rw-r--r-- 1 root root 472062932 3 févr. 09:07 file3.iso.enc

Let's rsync the third file:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso.enc test/file.iso.enc
    sending incremental file list
    > f.st...... file3.iso.enc
    472,062,932 100% 53.22MB/s 0:00:08 (xfr#1, to-chk=0/1)
    sent 367,307,827 bytes received 152,188 bytes 34,996,191.90 bytes/sec
    total size is 472,062,932 speedup is 1.28

So it copied 350MB of a 450MB file that only had 50MB changed.
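
As a rough check of those figures, the share of each encrypted file that rsync actually sent can be computed from the "sent" and "total size" values in the summaries above. This is only a minimal sketch; the `ratio` helper is a hypothetical name introduced here, not part of rsync or rsyncrypto.

```shell
# ratio SENT TOTAL: print SENT as a percentage of TOTAL, with one decimal.
# LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
ratio() {
    LC_ALL=C awk -v sent="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * sent / total }'
}

# file2.iso.enc sent over a copy of file1.iso.enc (the run that worked)
ratio 52608319 472062484     # prints 11.1

# file3.iso.enc sent over a copy of file2.iso.enc (the run that did not)
ratio 367307827 472062932    # prints 77.8
```

The first value is close to the 50MB out of 450MB that really changed; the second means roughly three quarters of the file went over the wire.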
Let's see with the unencrypted files:

    [kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso test/file.iso
    sending incremental file list
    > f..t...... file3.iso
    471,859,200 100% 135.29MB/s 0:00:03 (xfr#1, to-chk=0/1)
    sent 52,543,947 bytes received 152,118 bytes 9,581,102.73 bytes/sec
    total size is 471,859,200 speedup is 8.95

So it works properly when the files are not encrypted. Is it possible that the rsync algorithm fails because the file sizes differ? Do you have any hints?

The only thing I can see is that from file1.iso.enc to file2.iso.enc the file size dropped a little, while from file2.iso.enc to file3.iso.enc it increased, but I have no idea whether this is related...

Looking at the data of each encrypted file, I can see that the last 300MB are exactly the same:

    [kuri:~/tmp/random] $ tail -c 314572800 file1.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc -
    [kuri:~/tmp/random] $ tail -c 314572800 file2.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc -
    [kuri:~/tmp/random] $ tail -c 314572800 file3.iso.enc | sha1sum
    ee0c8bb19a620f7cdd44705b1293df461af389bc -

But the first 100MB are not:

    [kuri:~/tmp/random] $ head -c 104857600 file1.iso.enc | sha1sum
    d86fa953b25e1a01a53409f567cc845535525dc1 -
    [kuri:~/tmp/random] $ head -c 104857600 file2.iso.enc | sha1sum
    0c10309cf8fe0bb349b05081c782469e4c2fb0e2 -
    [kuri:~/tmp/random] $ head -c 104857600 file3.iso.enc | sha1sum
    338ba6c1a58dde8c334092986e5ce20e3b8114df -

Any help would be greatly appreciated. I would like to back up even bigger files (some GBs), where over 90% of the file gets transferred when encrypted with rsyncrypto, while only 2-4MB would be transferred otherwise.
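
The tail check above uses a fixed 300MB window. To narrow down where two encrypted files stop being identical, the same idea can be wrapped in a binary search over the tail length. This is only a sketch, assuming a POSIX shell with `tail`, `wc` and `sha1sum` available; `same_tail` and `common_tail` are hypothetical helper names, not part of rsync or rsyncrypto.

```shell
# same_tail A B N: succeed if the last N bytes of files A and B hash the same.
same_tail() {
    [ "$(tail -c "$3" "$1" | sha1sum)" = "$(tail -c "$3" "$2" | sha1sum)" ]
}

# common_tail A B: print the length in bytes of the longest identical tail.
# Binary search is valid because if the last N bytes match, so do the last
# M bytes for every M < N. Variables are global (POSIX sh has no "local").
common_tail() {
    size_a=$(( $(wc -c < "$1") ))
    size_b=$(( $(wc -c < "$2") ))
    lo=0                        # a tail of length lo is known to match
    hi=$size_a                  # no tail longer than hi can match
    if [ "$size_b" -lt "$hi" ]; then
        hi=$size_b
    fi
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if same_tail "$1" "$2" "$mid"; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}
```

For example, `common_tail file2.iso.enc file3.iso.enc` would print the number of trailing bytes those two files share. Each probe rereads up to the whole of both files, so on 450MB inputs this takes a while.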