Add --rsyncable option for image compression

lucatrv
2014-01-18
2015-06-20
  • lucatrv
    2014-01-18

    I use Clonezilla as a backup tool. I create disk images and then transfer them to a remote server using rsync.
    For this reason I cannot create compressed images: standard compression algorithms are known not to be rsync-friendly, because a very small change in the original files can force rsync to re-transmit the whole compressed file instead of just the changed portion.
    To address this, some compression tools (in particular, at present: gzip, pigz, pbzip2) have the --rsyncable option. With this option, the algorithm resets periodically during compression, which limits how much of the compressed file changes when some portion of the original files changes, and so allows rsync to work on the compressed file. The trade-off is a slightly worse compression ratio.
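    The reset idea can be illustrated with a minimal Python sketch. It is only an approximation: it resets the compressor at fixed block boundaries using zlib's full flush, which is an assumption made for simplicity and not how gzip or pigz choose their reset points. The function names and the 64 KiB block size are hypothetical.

```python
import zlib


def compress_with_resets(data: bytes, block_size: int = 64 * 1024) -> list[bytes]:
    """Gzip-compress data, resetting the compressor after every block.

    Returns the compressed pieces in order; joined together they form
    one valid gzip stream.
    """
    comp = zlib.compressobj(level=6, wbits=31)  # wbits=31 selects the gzip container
    pieces = []
    for start in range(0, len(data), block_size):
        piece = comp.compress(data[start:start + block_size])
        # Z_FULL_FLUSH byte-aligns the output and discards the match history,
        # so the next block is encoded independently of everything before it.
        piece += comp.flush(zlib.Z_FULL_FLUSH)
        pieces.append(piece)
    pieces.append(comp.flush(zlib.Z_FINISH))  # end-of-stream marker, CRC and length
    return pieces


def changed_pieces(a: list[bytes], b: list[bytes]) -> list[int]:
    """Indices of the pieces that differ between two runs of equal length."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

    If one byte is changed inside the third block of a five-block input, only the third compressed piece and the final trailer (which carries the checksum) should differ; the other pieces stay byte-identical, which is what lets rsync skip them.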
    It would be very useful if Clonezilla allowed specifying the --rsyncable option, which would then be passed to the compression tool when supported.
    Thanks

     
  • Steven Shiau
    2014-01-19

    Sure. I did a test with and without "-R" for pigz; the image sizes differ by about 5 MB out of 700 MB (roughly 0.7%). That is acceptable, and the option is very helpful when rsyncing the files.
    We will add this in the next release for gzip/pigz. Thanks for requesting this.
    However, I do not see any similar option for other compression programs, e.g. pbzip2, lbzip2, lzop, xz, or pixz... Are you sure it's available for pbzip2?

    Steven.

     
  • lucatrv
    2014-01-19

    Yes, you are right: only gzip and pigz support the "--rsyncable" option. Please note that only pigz has the short option "-R", while gzip has only "--rsyncable".
    Other compression programs may support this in the future once their algorithm implementations stabilize; see "man xz".
    Thanks a lot for your feedback.

     
  • jsteel
    2014-10-22

    From what I can see, "gzip --rsyncable" is not an accepted upstream option/patch. It seems some distributions choose to include this patch but others do not. Would it be possible to keep this setting (as you have done) on the live CDs you provide, which support it, but leave the option out of the config file shipped in the tarball?

     
  • Steven Shiau
    2014-11-03

    @jsteel,
    Thanks. Yes, you are right. This is a patch, not a native option of gzip.
    BTW, which version of GNU/Linux do you use? I'd like to know whether it is an OS we support.
    Thanks.

    Steven.

     
  • Steven Shiau
    2014-11-05

    This is improved in clonezilla 3.12.3-drbl1. Clonezilla now uses "--rsyncable" for gzip or pigz only when the option is available. If you test it, please let us know the results.
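    For readers who want to do a similar check in their own scripts, one possible approach is sketched below in Python. It is not Clonezilla's actual implementation (which is not shown in this thread): it simply assumes that a tool without the option exits with a non-zero status when asked to compress empty input with it. The helper names are hypothetical.

```python
import shutil
import subprocess


def supports_flag(tool: str, flag: str = "--rsyncable") -> bool:
    """Return True if `tool` compresses empty stdin successfully with `flag`."""
    if shutil.which(tool) is None:
        return False  # the tool is not installed at all
    try:
        result = subprocess.run(
            [tool, flag, "-c"],  # -c: write the compressed data to stdout
            input=b"",
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: an unrecognized option makes the tool exit with an error.
    return result.returncode == 0


def compression_command(tool: str) -> list[str]:
    """Build a command line, adding --rsyncable only when it is accepted."""
    cmd = [tool, "-c"]
    if supports_flag(tool):
        cmd.append("--rsyncable")
    return cmd
```

    Calling compression_command("gzip") on a distribution without the patch would then yield a plain "gzip -c" command instead of failing at backup time.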

    Steven.

     
  • lucatrv
    2015-03-21

    Sorry for not getting back to this earlier; I found these posts only now.

    Regarding the "--rsyncable" option, I confirm it is not present in the upstream GNU gzip code, while it is added through a patch in Debian/Ubuntu, Fedora, and other distributions.

    See for instance the following discussions (Mark Adler is one of the authors of gzip and the original author of pigz, Paul Eggert is the maintainer of GNU gzip):
    http://lists.gnu.org/archive/html/bug-gzip/2012-06/msg00005.html
    http://lists.gnu.org/archive/html/bug-gzip/2012-06/msg00023.html
    http://lists.gnu.org/archive/html/bug-gzip/2013-06/msg00008.html
    http://lists.gnu.org/archive/html/bug-gzip/2013-06/msg00022.html

    The "--rsyncable" option is instead available in upstream pigz:
    http://zlib.net/pigz/pigz.pdf

    However, I have carried out several tests and found that, unfortunately, rsync and similar deduplicating algorithms do not work as expected with Clonezilla backups. This could be due to how Clonezilla stores data; I will try to explain below.

    To exclude the influence of compression algorithms, I worked with plain uncompressed Clonezilla backups and tested the following software: rsync, xdelta3, zbackup, duplicati2. I backed up a partition, changed a few kB of data in the partition, and then backed up the partition again. I expected to obtain a small diff file between the two backup files when using any of the tools above. Instead, the diff file is very large, about the same size as the original backup. This is strange. Initially I thought it could be due to the rsync algorithm, but software like zbackup and duplicati2 analyses the full file before computing the differences, so now I think it could be due to how Clonezilla stores data.
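    A quick way to sanity-check how similar two image files really are is sketched below in Python. It is a crude stand-in for the tools above, not their actual algorithms: it hashes fixed, aligned blocks only, so data that has merely shifted by a few bytes will look completely different. The function names and the 4096-byte block size are hypothetical choices.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[bytes]:
    """SHA-256 digest of every aligned block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_ratio(old: bytes, new: bytes, block_size: int = 4096) -> float:
    """Fraction of blocks in `new` that also occur somewhere in `old`."""
    known = set(block_digests(old, block_size))
    new_blocks = block_digests(new, block_size)
    if not new_blocks:
        return 1.0  # nothing new to transfer
    hits = sum(1 for digest in new_blocks if digest in known)
    return hits / len(new_blocks)
```

    A ratio close to 1.0 for two consecutive backups would suggest the images are mostly identical block for block; a ratio near 0.0 would be consistent with the large diffs described above. For real images the files would have to be read in blocks rather than loaded into memory at once.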

    If this issue were fixed, it would be very convenient to use Clonezilla together with software like zbackup. Backups could then be run incrementally, saving a lot of time and disk space. Moreover, data storage and compression could be left to zbackup, simplifying the Clonezilla code. For now zbackup supports LZMA and LZO, but I am sure DEFLATE (gzip/pigz) could also be easily added. Could this feature be considered for Clonezilla v3?

    Thanks

     
  • Steven Shiau
    2015-03-24

    @lucatrv,
    Thanks for raising this topic. zbackup looks very interesting. We will keep it in mind and see whether it can be used for incremental backup in Clonezilla in the future.

    Steven.

     
  • lucatrv
    2015-06-12

    Hi Steven, I noticed that zbackup has indeed been included in the Clonezilla Live CD. Can it be used with Clonezilla right now, or was it just added for possible future use?
    Thanks

     
  • Steven Shiau
    2015-06-20

    Hi Lucatrv,
    No, it cannot be used with Clonezilla. We just added it for possible future use.

    Steven.