From: Scott C. <sc...@bn...> - 2004-02-02 23:07:38
Here is a more detailed description of my backup system, which uses
flexbackup and is 'tape-less'. (My apologies in advance if the mail
system hopelessly mangles the tab-formatted stuff below. Viewing with
a fixed-width font may help fix some of the display problems.)

I back up six Linux computers and three MS Windows computers. Each of
the six Linux computers runs flexbackup once per day, writing the
backed-up files to a compressed afio archive file, which is saved on a
separate disk drive in the computer. These drives store all of the
backup images, with older images being deleted manually as drive space
dictates.

One of the six Linux computers is also a secondary storage location
for each of the five main Linux computers. It also has the C drives of
the three MS Windows computers mounted in its filesystem, so these
three computers are backed up as well. This gives me a 'tape-less'
backup system (I don't have much trust in tape drives) with all of the
backup images on-line. Each of the five main Linux computers also has
its backup images "backed up" to this central secondary Linux system,
again with the images on disk. This system is in a separate building
from the other computers, and is more secure.

Configuration of the five main Linux computers:

* Each of these computers has a primary disk and a "backup" disk. The
  primary disks hold the system and home filesystems, and are 20Gb
  each. The backup disks hold the flexbackup configuration and state
  files plus the compressed afio images. These are 40Gb drives (except
  for one of the systems, which has a 40Gb home drive and a 120Gb
  backup drive).

* The backup drive is mounted on '/backups'.
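(For what it is worth, the mount for such a backup drive would come
from an /etc/fstab line along these lines. The device name and
filesystem type here are assumptions for illustration, not my actual
setup:

```
/dev/hdb1   /backups   ext3   defaults   1 2
```

Adjust the device and filesystem to match your own hardware.)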
A typical directory listing of the '/backups' directory looks like
this:

zimtok5.root# l
total 24
drwxr-xr-x   2 root  root  16384 Jun  7  2003 lost+found
drwxrwxrwt   2 root  root   4096 Jun  7  2003 tmp
drwxr-x---   7 dad   dad    4096 Jan 12 15:10 zimtok5
zimtok5.root#

On each of the five computers there is a directory in /backups which
is named for that computer (zimtok5 in the example above); the reason
for this will be explained below. This directory is owned by 'dad' (my
local user name, short for "distribution administrator"; you can use
your own user name as needed) and is not accessible to the general
public (for file security reasons).

* A directory listing of the directory 'zimtok5' looks like this:

zimtok5.root# l
total 32
drwxr-xr-x   2 dad  dad  8192 Feb  1 23:13 archive
-rw-r--r--   1 dad  dad  5937 Jun  7  2003 flexbackup.conf
drwxr-xr-x   2 dad  dad  4096 Feb  2 01:39 images
drwxr-xr-x   2 dad  dad  4096 Feb  2 01:41 log
drwxr-xr-x   2 dad  dad  4096 Jan 27 01:39 state
drwxr-xr-x   2 dad  dad  4096 Feb  2 01:41 tmp
zimtok5.root#

  This directory contains the flexbackup.conf file, the 'images'
  directory (where the most current afio images are kept), an
  'archive' directory (where older afio images are moved to), plus the
  'log' and 'state' directories used by flexbackup. This directory and
  its subdirectories are the same on all six Linux computers.

* The root user has a cron entry which starts the backup process:

# backups
39 1 * * * /home/dad/bin/runflex

  This starts a Perl script called 'runflex' which is in the bin
  directory of the user 'dad'.

* Each Linux computer has a user called 'dad' which I use for system
  maintenance tasks that I don't want root privileges attached to. It
  has log-in disabled, so it can only be gotten to as root
  ('su - dad').
The home directory looks like this:

zimtok5.root# l -R
.:
total 16
drwxr-xr-x   2 dad  dad  4096 Apr 21  2003 bin
drwxr-xr-x   2 dad  dad  4096 Jun  7  2003 etc
drwxr-xr-x   2 dad  dad  4096 Feb  2 13:59 incoming
drwxr-xr-x   2 dad  dad  4096 Aug 11 11:55 tmp

./bin:
total 8
-rwxr--r--   1 dad  dad  1975 Jan 12 14:32 purge-images
-rwxr--r--   1 dad  dad  2183 Apr 21  2003 runflex

./etc:
total 24
-rw-r--r--   1 dad  dad   424 Jan 12 14:27 cron.dad
-rw-r--r--   1 dad  dad   337 Jan 12 14:23 cron.root
-rwxr-xr-x   1 dad  dad  1664 Mar 29  2003 newuser.pl
-rw-r--r--   1 dad  dad  9167 Jan 30 15:03 user-list

./incoming:
total 0

./tmp:
total 0
zimtok5.root#

The 'bin' directory contains two Perl scripts: 'runflex', which does
the backups, and 'purge-images' (which is a bit mis-named), which
moves older afio images from the 'images' directory to the 'archive'
directory. The 'incoming' directory is used for other purposes (as a
destination for updated system files which get sent out from my
central computer via 'scp').

* Here is a copy of the 'runflex' Perl script:

#!/usr/bin/perl
# runflex
#
# Run a flexbackup job for each of the directories in the /backups
# directory. Each flexbackup job has its own configuration file.
#
# Scott Coburn, April 2002

use strict;

#
# Define a day offset into the month, unique to each job, so that,
# say, level 0 dumps will not be done on each job on the same night.
# Then the network file transfers to the secondary storage computer
# (zartron9) will not clog the network with level 0 dumps from all of
# the individual computers. (The level 0 dump transfers will be
# staggered...)
#
my %offset = (
    "zartron9" => 0,
    "gorzarg5" => 1,
    "x22a"     => 2,
    "x22b"     => 3,
    "x22c"     => 4,
    "solids"   => 5,
    "ahnooie4" => 6,
    "albh"     => 7,
    "fred2"    => 8,
    "zimtok5"  => 9,
);

# Thirty-one day cycle for backup dump levels 0 through 9.
my @sched = ( 1, 3, 2, 5, 4, 7, 6, 9, 8,
              1, 3, 2, 5, 4, 7, 6, 9, 8,
              1, 3, 2, 5, 4, 7, 6, 9, 8,
              0, 3, 2, 5 );

my $budir  = "/backups";
my $config = "flexbackup.conf";

my $cfile;    # constructed configuration file name
my $dname;    # directory name
my $job;      # job name
my $jobdir;   # job directory
my $host;     # host name (ie x22a, not x22a.nsls.bnl.gov)
my $dom;      # today's 'day of the month' (1-31)
my $level;    # job's dump level for today
my $retstat;  # system call return status

$dom = (localtime)[3];

opendir BUDIR, $budir or die "Backup directory $budir open error: $!\n";
while ($job = readdir BUDIR) {
    next if $job =~ /^\./;          # skip over .files
    next if $job =~ /lost\+found/;  # skip over lost+found directory
    next if $job =~ /tmp/;          # skip over tmp directory
    next if $job =~ /restore/;      # skip over restore directory
    $jobdir = "$budir/$job";
    $cfile  = "$jobdir/$config";
    if (opendir IMDIR, $jobdir) {
        if (-f $cfile and -r $cfile) {
            $level = $sched[(($dom + $offset{$job} - 1) % 31)];
            print "$job: flexbackup -fs all -level $level -c $cfile\n\n";
            $retstat = system "flexbackup -fs all -level $level -c $cfile";
        } else {
            print "Error reading configuration file $cfile: $!\n";
        }
        closedir IMDIR;
    } else {
        print "Skipping job directory $jobdir. Open error: $!\n";
    }
}
closedir BUDIR;

This script starts by getting the day-of-the-month (1-31) to use as an
index into the backup schedule array 'sched'. It then goes into the
directory '/backups' and loops through all of the directory entries,
skipping over the 'lost+found', 'tmp', and 'restore' directories, plus
any dot files that may be there. When the script runs on the 'zimtok5'
computer, it will find the 'zimtok5' directory. It then constructs
'/backups/zimtok5' and '/backups/zimtok5/flexbackup.conf', goes into
'/backups/zimtok5', and checks that the configuration file exists and
is readable. It then calculates the backup level given the
day-of-the-month and this computer's 'offset' (9 for zimtok5).
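As a concrete check of that calculation, here is the same lookup
replayed as a small shell sketch. The schedule and the offset value
are copied from the script above; the day-of-the-month is just a
sample value:

```shell
#!/bin/bash
# Replay runflex's dump-level lookup for one host on one day.
sched=(1 3 2 5 4 7 6 9 8 1 3 2 5 4 7 6 9 8 1 3 2 5 4 7 6 9 8 0 3 2 5)
dom=2      # sample day of the month
offset=9   # zimtok5's entry in %offset
idx=$(( (dom + offset - 1) % 31 ))   # (2 + 9 - 1) % 31 = 10
echo "level ${sched[$idx]}"
```

Run on the 2nd of the month, this prints "level 3", which is the level
runflex would pass to flexbackup on zimtok5 that night.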
It then passes this level and the configuration file path and name to
the actual flexbackup job. Since this script is run as a cron job, the
output of all of this is emailed to the root user when the backup
completes. Root's email is forwarded to 'dad', which is forwarded to
'dad' on one of my other servers, which I check each morning to see if
all went well.

* The user 'dad' also has a cron job which runs once per day:

# purge flexbackup image files
13 23 * * * /home/dad/bin/purge-images

  This runs a Perl script called 'purge-images'. This script is not
  really needed, but I run it because I have not yet had the time to
  unwind it from my systems. At one time it did some other useful
  things, but it has been deprecated slowly. For what it is worth,
  here it is:

#!/usr/bin/perl
# Scott Coburn, March 2002
# purge-images
#
# Purge old level [0-9] afio image files from the backup images
# directory. Purged files are moved to the archive directory. Old
# files should be deleted from the archive directory manually as
# needed to regain disk space.
#
# This program should be run on each host after flexbackup has
# finished.

use strict;

my $budir      = "/backups";
my $imagedir   = "images";
my $archivedir = "archive";
my $imtail     = "afio-gz";

my $dname;    # directory name
my $fname;    # file name
my $dir;      # for constructed directory name
my @files;    # image file names
my $nfiles;   # number of file names to process
my $i;        # loop counter
my $thisfile;
my $nextfile;
my $retstat;  # system call return status

opendir BUDIR, $budir or die "Backup directory $budir open error: $!\n";
while ($dname = readdir BUDIR) {
    next if $dname =~ /^\./;          # skip over .files
    next if $dname =~ /lost\+found/;  # skip over lost+found directory
    next if $dname =~ /tmp/;          # skip over tmp directory
    next if $dname =~ /restore/;      # skip over restore directory
    $dir = "$budir/$dname/$imagedir";
    if (opendir IMDIR, $dir) {
        &move_files;
        closedir IMDIR;
    } else {
        print "Skipping image directory $dir. Open error: $!\n";
    }
}
closedir BUDIR;

sub move_files {
    @files = ();
    while ($fname = readdir IMDIR) {
        next if $fname =~ /^\./;  # skip over .files
        if ($fname =~ /^.*\.[0-9]\.\d*\.($imtail)$/) {
            unshift @files, $fname;
        }
    }
    @files = sort @files;
    $nfiles = @files - 1;  # don't need to worry about the last file on the list...
    $files[0] =~ /^(.*\.[1-9])/;
    $thisfile = $1;
    for ($i = 0; $i < $nfiles; $i++) {
        $files[$i+1] =~ /^(.*\.[1-9])/;
        $nextfile = $1;
        if ($thisfile eq $nextfile) {
            # print "/bin/mv $dir/$files[$i] $budir/$dname/$archivedir\n";
            $retstat = system "/bin/mv $dir/$files[$i] $budir/$dname/$archivedir";
        }
        $thisfile = $nextfile;
    }
}
# end purge-images

This script loops through the /backups directory looking for
sub-directories not named 'lost+found', 'tmp', or 'restore'. When it
finds one it goes into that directory's 'images' subdirectory, makes a
list of all of the files which end in 'afio-gz', sorts them, and then
moves into the 'archive' directory each entry whose next neighbor on
the sorted list has the same beginning string (such an entry is an
image of a backup of the same filesystem, but with an older backup
date string in its name). Again, this is not really needed; you can
just leave all of the images in the 'images' directory. I am going to
remove all of this stuff when I get the time.

So, to summarize: each of the five main computers runs a backup job
each night, and the images are saved in the /backups tree; root runs
the backups via the 'runflex' script; the /backups tree is owned by
the user 'dad', who is accessible only by root; 'dad' keeps all of the
scripts in its home directory.

The sixth Linux system is my central administration computer. This is
the computer where 'dad' lives. This computer keeps the 'original
copies' of all of the system configuration files for my Linux systems.
Updated rpm files go out to each of the computers from here. It is
also the secondary storage system for the backup image files.
It is also a central location for system logs from the other Linux
systems. It is located in a separate building and has only three
accounts: root, 'dad', and my own. No log-ins are allowed for the user
'dad', so I have to get to it through root. The user 'dad' has ssh
keys installed on the other five Linux computers so that files can be
securely copied back and forth between the central system and the
others via 'scp'.

The directory structure on this central computer (it is called
zartron9) is very similar to the one on the other five Linux
computers. Here are the differences:

* The '/backups' directory is a directory of symbolic links to
  directories on the drives which store the image files. So, the
  '/backups' directory looks like this:

zartron9.root# l
total 4
lrwxrwxrwx   1 dad  dad  18 Dec  9 16:45 ahnooie4 -> /backups0/ahnooie4
lrwxrwxrwx   1 dad  dad  14 Dec  9 16:45 albh -> /backups0/albh
lrwxrwxrwx   1 dad  dad  18 Feb 13  2003 gorzarg5 -> /backups0/gorzarg5
lrwxrwxrwx   1 dad  dad  15 Dec  9 16:46 fred2 -> /backups0/fred2
lrwxrwxrwx   1 dad  dad  16 Dec  9 17:05 solids -> /backups1/solids
lrwxrwxrwx   1 dad  dad  14 Feb 13  2003 x22a -> /backups0/x22a
lrwxrwxrwx   1 dad  dad  14 Feb 13  2003 x22b -> /backups0/x22b
lrwxrwxrwx   1 dad  dad  14 Feb 13  2003 x22c -> /backups0/x22c
lrwxrwxrwx   1 dad  dad  18 Feb 13  2003 zartron9 -> /backups0/zartron9
lrwxrwxrwx   1 dad  dad  17 Jun 23  2003 zimtok5 -> /backups0/zimtok5
zartron9.root#

  This directory contains a link for each of the computers which is
  backed up in my system. A 250Gb drive, mounted on '/backups0',
  stores the backup images. I have been through a couple of drive
  configurations, and having this directory full of links has worked
  very well in adapting to the changes. The link 'solids' points to a
  directory on another 250Gb drive, which is mounted on '/backups1'.
  This is for our server computer, which I have not yet gotten onto
  this backup system.
* There is a directory called '/desktops' which has the three MS
  Windows computers' C drives mounted in it. It looks like this:

zartron9.root# l
total 24
drwx------   1 root  root  4096 Jan 30 18:01 ahnooie4.driveC
drwx------   1 root  root  4096 Jan  9 10:42 albh.driveC
drwx------   1 root  root  4096 Jan 22 08:31 fred2.driveC
-rw-r--r--   1 root  root  1010 Feb 21  2003 shares
-rwxr-xr-x   1 root  root  2328 Feb 14  2003 smount
-rwxr-xr-x   1 root  root  1867 Feb 14  2003 sumount
zartron9.root#

  'shares', 'smount', and 'sumount' are files related to mounting and
  unmounting these Samba filesystems. So, these three disks are part
  of zartron9's filesystem, making them available for backups.

* Just as in the case of the five main Linux computers, each of the
  links in '/backups' points to a directory with 'images', 'archive',
  etc., and a flexbackup.conf file. However, the directories for the
  five main Linux computers (gorzarg5, x22a, x22b, x22c, and zimtok5)
  have their flexbackup.conf files renamed to, say,
  flexbackup.conf.zimtok5. This prevents 'runflex' from trying to do a
  real backup from the configuration file (zartron9 is not the
  computer to do that backup on) and also provides a backup copy of
  the conf file from the computer it does run on. The
  /backups/zartron9 link points to zartron9's own backup directory,
  with its flexbackup.conf. The other three (ahnooie4, albh, and
  fred2) have flexbackup.conf files which specify backups of the MS
  Windows filesystems mounted in /desktops.

  So, when 'runflex' runs on zartron9, the Perl script loops through
  the '/backups' directory, looking for 'flexbackup.conf' files and
  running a flexbackup job for each one it finds. It will find one
  each for zartron9 and the three MS Windows mounts. This is why the
  structure of the '/backups' directory is the way it is: I only need
  one version of the 'runflex' script for all of my systems. It just
  loops through the '/backups' directory and runs a flexbackup job for
  each 'flexbackup.conf' file it finds.
  On the main Linux computers it finds only one conf file, but on
  zartron9 it finds four.

* The computer zartron9 also runs a Perl script called 'get-images' as
  a cron job once per day. It copies the previous day's backup images
  from each of the five main Linux computers. Here it is:

#!/usr/bin/perl
# Scott Coburn, June 2003
# get-images
#
# Retrieve the latest afio backup images from each host. Filenames of
# images to retrieve are in the flexbackup log on each host.
#

use strict;

my $budir    = "/backups";
my $flogdir  = "log";
my $flogname = "all.latest";
my $imagedir = "images";

my $remote_flogname;
my $local_flogname;
my $fqhost;   # fully-qualified host name
my $host;     # host name
my $retstat;  # system call return status

my %hosts = (
    "gorzarg5" => "gorzarg5.phy.bnl.gov",
    "x22a"     => "x22a.nsls.bnl.gov",
    "x22b"     => "x22b.nsls.bnl.gov",
    "x22c"     => "x22c.nsls.bnl.gov",
    "zimtok5"  => "zimtok5.phy.bnl.gov",
);

while (($host, $fqhost) = each %hosts) {
    $retstat = system "/bin/ping -q -c 2 -w 5 $fqhost > /dev/null";
    if ($retstat) {
        print "Host $fqhost could not be reached. Skipping it.\n";
    } else {
        print "Host $fqhost seems to be alive today.\n";
        $local_flogname  = "$budir/$host/$flogname";
        $remote_flogname = "$budir/$host/$flogdir/$flogname";
        $retstat = system "/usr/bin/scp $fqhost:$remote_flogname $local_flogname > /dev/null";
        &process_flog unless $retstat;
    }
}

sub process_flog {
    $retstat = open FLEXLOG, "<$local_flogname";
    if ($retstat) {
        while (<FLEXLOG>) {
            chomp;
            if (/of=(.*-gz)/) {
                print "Retrieving $host:$1\n";
                system "/usr/bin/scp $fqhost:$1 $budir/$host/$imagedir > /dev/null";
            }
        }
        close FLEXLOG;
    } else {
        print "Cannot open flexbackup log file $local_flogname: $!\n";
    }
}

This script copies the backup image files for each of the hosts listed
in the 'hosts' hash. It pings each host to see if it is reachable. If
so, it constructs some filenames with paths, retrieves the previous
day's 'all.latest' log file, and calls the 'process_flog' procedure.
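The 'of=' lines that process_flog scans for can be checked with a
quick shell one-liner. The log line below is a made-up example in the
shape the regex expects; real paths and filenames will differ:

```shell
# Extract the image path from a (hypothetical) flexbackup log line,
# the same way process_flog's /of=(.*-gz)/ match does.
echo 'of=/backups/zimtok5/images/root.3.200402020145.afio-gz' |
    sed -n 's/.*of=\(.*-gz\).*/\1/p'
```

This prints '/backups/zimtok5/images/root.3.200402020145.afio-gz',
which is exactly the path the script then hands to 'scp'.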
This procedure simply retrieves, using 'scp', each file listed in the
log file. Each retrieved image file is put into its host's
/backups/'host'/images directory, just where it appears on the host.

I plan to make some changes to this system when I have some time:

* Update flexbackup and the Perl scripts to flexbackup version 1.?
  (whatever the latest version is).

* Configure one of my other Linux systems to be the secondary backup
  system for storing the backup images created by zartron9. As the
  system stands now, the backup images created by zartron9 are only in
  one place (on zartron9). This is a single point of failure for this
  computer and the three MS Windows computers.

* Weed out the 'purge-images' scripts.

I have not included any of my 'flexbackup.conf' files here. They are
pretty much standard, except that I have changed the lists of file
types and directories to back up and to skip. I have also changed the
filenames and paths for the 'state' and 'log' directories, to point
into the /backups/'host' directory.

As always, comments and criticisms are welcome. Questions answered
eventually.

Scott

--
* Scott Coburn * Brookhaven National Laboratory * sc...@bn...
* 631.344.7110 * This message brought to you by Linux.