Read Me
Rdiff-image Tools
=================
The rdiff-image tools are a backup solution aimed at hosts
running on a virtual machine somewhere in the cloud. They
leverage the backup to provide a couple of other features
useful for open-source-style maintenance of the host.
Documentation consists of this file and the man pages.
All documentation is readable online at the home page:
http://rdiff-image.sourceforge.net/
Dependencies
------------
- Python >= 3.0, http://www.python.org/
- rdiff, http://librsync.sourceforge.net/
- Bash shell, http://tiswww.case.edu/php/chet/bash/bashtop.html
- cpio, http://www.gnu.org/software/cpio/
- gnupg, http://www.gnupg.org

If you plan to use S3, you will also need:

- python-boto, http://code.google.com/p/boto/

If you plan to use rdiff-image-snap:

- rsync, http://rsync.samba.org/

Optionally, if you are planning on using S3, Firefox plus this
extension is very handy:

- s3fox, http://www.s3fox.net/
Building and Installing
-----------------------
Packages are available for Debian and RedHat style
distributions at the home page. If you install using one
of these you can skip this section.
The build dependencies are:
- Python3 development system, http://www.python.org
- A POSIX system (make, unix shell, sed, etc).
Only interpreted languages are used, so no building is
required.
To install, run this in the directory containing this file:

    make install
Quick Start
-----------
a. Copy the example shared/rdiff-image/examples/rdiff-image.conf
to /etc/rdiff-image/rdiff-image.conf.
b. Read rdiff-image-cron(1) to get an overview of how this all
works, and keep rdiff-image.conf(5) handy.
c. Customise /etc/rdiff-image/rdiff-image.conf.
d. Run "rdiff-image-cron /etc/rdiff-image/rdiff-image.conf"
to check your customised rdiff-image.conf works.
e. Put the above invocation into cron to have backups happen
regularly.
f. Use rdiff-image-get(1) to download and unpack your backups,
and rdiff-image-boot(1) to see if they run.
g. Use the diff command of rdiff-image-tarutil(1) to compare
successive incremental backups to see why they are growing
quickly and thus causing Amazon S3 to charge you more than
you might like.
Notes
-----
a. The rdiff-image tools were designed to take a complete backup
of a Linux virtual machine running in the cloud, and then make it
easy to restore and run it somewhere else. This is where the
"image" in rdiff-image comes from: in some sense it takes an
image of a VM. The notes here assume this is what you are
trying to do, too.
b. To save you time if you are looking for a "partial restore"
option: there isn't one. Rdiff-image-get always unpacks the
backup to a directory containing the complete file system
of your backed-up machine. If you want to do something else,
use the "--tar" option of rdiff-image-get to create a tar file
and restore from that.
c. Since the restore is a complete copy of your VM, it is usually
possible to literally "run it" within a chroot on a machine
running the same OS (e.g. a Linux kernel). For example, you can
download the image onto your laptop using rdiff-image-get(1),
boot it using rdiff-image-boot(1), and see the same services
supplied by your VM running on your laptop. This includes
things like www, ssh, planet, mail and so on. Getting this
working requires:
- Ensure the "mounts" option in rdiff-image.conf is correct.
The default is usually OK.
- Set the "init.d" option in rdiff-image.conf appropriately.
It is a list of services from /etc/init.d your VM needs to
start running on boot up. It is a subset, because some
of the work done on a normal boot up (like setting up
network interfaces, for instance) has already been done by
the host.
- Shut down services on your host machine that use conflicting
TCP/UDP ports.
- Alter the configuration of services like apache to remove
assumptions about specific host names or IP addresses. In the
case of host names, use wildcards.
- Having done the preparation, try booting your image using
rdiff-image-boot(1). It won't work. Fix the problems by
changing files in the local backup. When it works, send the
fixed files to the real VM.
d. Notice that getting the VM to boot under a chroot was done by
working on a local copy of the VM, and once done, sending the
changes back to the real machine. This has obvious
advantages, and thus you get two paybacks for making a
restored VM bootable: not only does it make it easy to
recover after a disaster, it also makes the VM easier to
maintain. In turn, if people download the backup to maintain
the VM, your backup is being tested regularly.
e. Your backup will fail for reasons you haven't thought of
yet. Nothing beats the sense of betrayal you get from finding
the backup you were relying on hasn't worked for the last six
months. At the very least, set the "email" option in
rdiff-image.conf, and verify it works by temporarily setting
the "enable" option to "no" and running rdiff-image-cron
manually. Ideally, also monitor it using Nagios or
something similar.
f. Optimising your backups for S3 is an important, if tedious,
step. The backups consist of the occasional full backup
(.tar.gz) followed by a series of differences to that full
backup (.rdiff.gz). Pretty much regardless of the size of the
full backup, if the rate at which the differences grow is
small, the S3 costs will be small. With this dollar incentive
most people notice the differences grow surprisingly quickly
given the amount of activity they think happens on their
machine. The key to finding out why, and fixing it, is the
rdiff-image-tarutil(1) program, and in particular the "diff"
option of that command. It compares two backups (which are tar
files), telling you what the differences are. If you use it to
compare one backup to the next, it will tell you why the
differences grew. Often you will decide there is no point in
backing up the file causing the problem. When doing this it
is helpful to know that the "rdiff" program used to calculate
the differences operates in 2K chunks, so even a single-byte
change will cause a 2K change in the output.
g. If you want to start the chroot'ed VM when the host boots, put
a symbolic link to the chroot's copy of rdiff-image-boot(1) into
the host's /etc/init.d and the appropriate /etc/rc?.d
directories.
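Note b above suggests restoring from the tar file that rdiff-image-get's "--tar" option produces when you only want part of the image back. As a minimal sketch, assuming the archive stores paths relative to the image root (for example etc/apache2/..., possibly with a leading "./"), Python's standard tarfile module can pull out just the paths you need. The helper names restore_paths and _normalise are hypothetical and not part of rdiff-image.

```python
import tarfile
from pathlib import Path


def _normalise(name):
    """Strip a leading "./" or "/" so archive names and requested paths compare equally."""
    if name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def restore_paths(tar_path, wanted, dest):
    """Hypothetical helper: extract only members at or under the `wanted` paths.

    Assumes `tar_path` is a tar (optionally compressed) of the whole image,
    such as one made with rdiff-image-get's --tar option. Returns the names
    of the members that were extracted.
    """
    prefixes = [_normalise(p).rstrip("/") for p in wanted]
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:  # default "r" mode detects gzip/bz2/xz
        selected = [
            m for m in tar.getmembers()
            if any(_normalise(m.name) == p or _normalise(m.name).startswith(p + "/")
                   for p in prefixes)
        ]
        # The archive is your own backup, so trust it fully. Python versions with
        # extraction filters (3.12+ and some backports) otherwise warn or apply
        # the stricter "data" filter, which rejects things a system image contains.
        extra = {"filter": "fully_trusted"} if hasattr(tarfile, "fully_trusted_filter") else {}
        tar.extractall(path=dest, members=selected, **extra)
    return [m.name for m in selected]
```

For a one-off, GNU tar does the same when given member names on its command line (for example `tar -xf image.tar etc/apache2`, with image.tar a hypothetical name), provided the names are spelt exactly as stored in the archive.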
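Note e recommends monitoring the backup with Nagios or something similar. Below is a minimal sketch of such a check, assuming the newest backup (or a local copy of it) appears as a file in some directory. It follows the usual Nagios plugin convention of exit status 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. The directory, the function names and the 26 and 50 hour thresholds are hypothetical; if your backups only live in S3 you would need to list the bucket instead.

```python
import time
from pathlib import Path

# Exit statuses used by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backup_age(backup_dir, warn_hours=26.0, crit_hours=50.0, now=None):
    """Return (status, message) based on the newest file in `backup_dir`.

    Hypothetical check: it only looks at modification times, so it tells
    you a backup was written recently, not that it can be restored.
    """
    now = time.time() if now is None else now
    try:
        files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {backup_dir}: {exc}"
    if not files:
        return CRITICAL, f"CRITICAL - no backups in {backup_dir}"
    newest = max(files, key=lambda p: p.stat().st_mtime)
    age = (now - newest.stat().st_mtime) / 3600.0
    if age >= crit_hours:
        status, label = CRITICAL, "CRITICAL"
    elif age >= warn_hours:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"{label} - newest backup {newest.name} is {age:.1f} hours old"


def main(argv):
    """Print the one-line status Nagios shows and return the exit status."""
    # Hypothetical default location; pass the real directory as argv[1].
    backup_dir = argv[1] if len(argv) > 1 else "/var/backups/rdiff-image"
    status, message = check_backup_age(backup_dir)
    print(message)
    return status
```

A script wrapping this would end with `sys.exit(main(sys.argv))`. The 26 hour warning threshold gives a daily cron job a couple of hours of slack before it is reported as late.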
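Note f's remark about 2K chunks is why small, scattered edits are expensive. One rough way to estimate the cost for a file you suspect is to compare its old and new versions block by block and charge a whole block for every block that differs. This is a simplification built on the 2K figure quoted in note f, not how rdiff actually computes a delta: rdiff's rolling checksum also finds data that has merely moved, so this block-aligned comparison over-estimates for insertions and deletions, and it ignores the gzip compression of the .rdiff.gz files. The function names are hypothetical.

```python
BLOCK_SIZE = 2048  # the 2K chunk size quoted in note f


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count the block-aligned chunks of `new` that differ from `old`."""
    return sum(
        1
        for start in range(0, len(new), block_size)
        if new[start:start + block_size] != old[start:start + block_size]
    )


def estimated_delta_bytes(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Rough estimate of the literal data a delta would carry: one block per change."""
    return changed_blocks(old, new, block_size) * block_size


if __name__ == "__main__":
    old = bytes(1_000_000)                # a 1 MB file of zeros
    new = bytearray(old)
    for offset in range(0, len(new), 100_000):
        new[offset] ^= 1                  # flip one byte every 100 KB
    print(changed_blocks(old, bytes(new)), estimated_delta_bytes(old, bytes(new)))
    # prints: 10 20480
```

Ten changed bytes cost about 20 KB, which is why a log or database file that rewrites a counter in many places can dominate the daily differences.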
rdiff-image-snap
----------------
The rdiff-image tools include rdiff-image-snap. It doesn't belong with
the rest of the tools, but it currently has no other home to go
to. I run it daily using cron to create a backup of my laptop's
hard drive. The net effect is that an invocation like this:

    rdiff-image-snap -x -m :external-disk-drive-label / /laptop-backup

is like running this:

    fsck -a /dev/external-disk-drive
    mount -L external-disk-drive-label /mnt/external-disk-drive
    cp -ax / /mnt/external-disk-drive/laptop-backup/backup-YYYYMMDD-HHMMSS
    umount /mnt/external-disk-drive
The main difference is that files that haven't changed since the
previous backup are hard linked to the old copy and thus don't
occupy extra space. This means I get to keep about 3 months'
worth of daily images on a single drive. There are other tools
that do this (e.g. rsnapshot), but if all you want to do is back up
your machine to a local (or networked) file system,
rdiff-image-snap does the job just as well, and is drop-dead
simple to use.
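The hard-linking described above can be sketched in a few lines of Python. This is not rdiff-image-snap's implementation; it is a minimal illustration, assuming only regular files and directories matter, that compares each file's contents and permission bits with the previous snapshot and hard links it when they match. It ignores ownership, symlinked directories, special files and the "-x" (stay on one file system) behaviour. The function name snapshot is hypothetical; rsync's --link-dest option applies the same idea in a production tool.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, target, previous=None):
    """Copy `source` into `target`, hard linking files unchanged since `previous`.

    Returns (linked, copied) counts. Sketch only: see the caveats above.
    """
    source, target = Path(source), Path(target)
    previous = Path(previous) if previous is not None else None
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = target / rel / name
            old = previous / rel / name if previous is not None else None
            unchanged = (
                old is not None
                and not src.is_symlink()
                and not old.is_symlink()
                and old.is_file()
                and src.stat().st_mode == old.stat().st_mode
                and filecmp.cmp(src, old, shallow=False)  # compare actual contents
            )
            if unchanged:
                os.link(old, dst)  # share the previous snapshot's copy: no extra space
                linked += 1
            else:
                # copy2 keeps timestamps and permission bits; symlinks stay symlinks
                shutil.copy2(src, dst, follow_symlinks=False)
                copied += 1
    return linked, copied
```

Comparing contents is slower than the size-and-modification-time check rsync uses by default, but it keeps the sketch simple and never links a file whose data actually changed.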
License
-------
Copyright (c) 2009-2014,2015,2016,2017,2018,2021 Russell Stuart.
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
The copyright holders grant you an additional permission under Section 7
of the GNU Affero General Public License, version 3, exempting you from
the requirement in Section 6 of the GNU General Public License, version 3,
to accompany Corresponding Source with Installation Information for the
Program or any work based on the Program. You are still required to
comply with all other Section 6 requirements to provide Corresponding
Source.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
--
Russell Stuart
2014-May-30