ABBS (Agent Based Backup System)
6th August 2005
Tony di Nucci
tony[at]ovule[dot]co[dot]uk
------------------------------
CONTENTS
------------------------------
1: INTRODUCTION
2: SERVER REQUIREMENTS
3: CLIENT/HOST REQUIREMENTS
4: SERVER SETUP
5: RUNNING SERVER
6: CLIENT/HOST SETUP
7: LINEAR FLOW OF EXECUTION : SERVER
8: FLOW OF EXECUTION : HOST
9: FLOW OF EXECUTION : AGENT
------------------------------
1: INTRODUCTION
------------------------------
Before I get started I would like to apologise for the currently poor documentation for this project. ABBS is in the alpha stage and better documentation will follow soon. If you would like to get involved and help with the documentation, or with other aspects of ABBS, please email me: tony[at]ovule[dot]co[dot]uk
ABBS aims to be a complete backup solution for networks of all types and sizes.
ABBS backs up only those files that need to be backed up. It is a pointless exercise to back up the same files every day when they have not changed.
ABBS can take the load off file servers. There is little point in having a file server full of files used only by their creator, purely because it is easier to back up the file server than all client machines. ABBS allows clients to use their own drives, while all important data is still backed up as regularly as is necessary.
The reason for making this system agent based is two-fold:
1: Reduce workload of the server.
A similar system could exist where the server scans all network drives itself and looks for files to back up. However, this uses the server's resources, which are probably already stretched. Why not move this task to the clients? This can be achieved by way of an agent which executes on clients.
2: Make future ABBS updates trivial.
As stated in point 1 (above), it is a good thing to move processing to the client. This means there must be some form of program running on clients. On a network of hundreds of computers it would be a time-consuming task to update every client to run the latest version of the software. By using agents, only the server has to be updated, since agents live on the server and are sent over the network to hosts requesting them. The host does not know anything about how the agent does its job. The host is simply an environment the agent can live in while doing its work.
------------------------------
2: SERVER REQUIREMENTS
------------------------------
Unix-like OS on server. (for the time being, still working on making the server platform independent)
All network drives (that you wish to back up) are mounted on server.
J2SE 1.4+ SDK (to build from source)
J2SE 1.4+ JRE (if you downloaded the precompiled version)
HTTP server.
Bzip2.
------------------------------
3: CLIENT/HOST REQUIREMENTS
------------------------------
Any OS (in theory).
J2SE 1.4+ SDK (to build from source)
J2SE 1.4+ JRE (if you downloaded the precompiled version)
------------------------------
4: SERVER SETUP
------------------------------
1-----------
All network drives must be mounted at their top level. For instance, if there are Windows clients on the network and you want to be able to back up content on their C: drives, you must mount this drive from C:. Drives with a Unix-like directory hierarchy must be mounted from /.
If you have a Windows PC on the network called "HUGO" with two drives, C: and D:, then the mount path on the server must include the name (HUGO) and see these drives in their entirety. An example would be to create two directories, one called /mnt/HUGO/C and the other /mnt/HUGO/D. You would now mount HUGO's C: drive to /mnt/HUGO/C and HUGO's D: drive to /mnt/HUGO/D. It does not matter what the full path to HUGO's mounted drive is, so long as it contains HUGO/C for its C: drive and HUGO/D for its D: drive. An equally legal path is /some-dir/some-other-dir/etc/HUGO/C.
If you are mounting drives with a Unix-like directory hierarchy then these must be mounted from /. For example, a Linux box called "victor" must be mounted from / to some directory like /mnt/victor/
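The naming rule above can be sketched in Java as a small helper that turns a host name and a path as seen on the host into the path the server would read from. This is only an illustration of the rule, not code taken from ABBS: the class name MountPathSketch, the method toServerPath and the /mnt mount root are all assumptions.

```java
// Hypothetical helper, not part of ABBS: maps a path as seen on a host
// to the matching path under the server's mount directory.
class MountPathSketch {

    // mountRoot is wherever the drives are mounted on the server, e.g. "/mnt".
    static String toServerPath(String mountRoot, String hostName, String hostPath) {
        String p = hostPath.replace('\\', '/');
        StringBuilder sb = new StringBuilder(mountRoot);
        if (!mountRoot.endsWith("/")) {
            sb.append('/');
        }
        sb.append(hostName);
        // Windows style path such as C:/Docs/a.txt becomes <host>/C/Docs/a.txt
        if (p.length() >= 2 && Character.isLetter(p.charAt(0)) && p.charAt(1) == ':') {
            sb.append('/').append(Character.toUpperCase(p.charAt(0)));
            p = p.substring(2);
        }
        if (!p.startsWith("/")) {
            sb.append('/');
        }
        sb.append(p);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints /mnt/HUGO/C/Docs/report.txt
        System.out.println(toServerPath("/mnt", "HUGO", "C:\\Docs\\report.txt"));
        // prints /mnt/victor/home/tony/notes.txt
        System.out.println(toServerPath("/mnt", "victor", "/home/tony/notes.txt"));
    }
}
```

Because only the HUGO/C part of the path matters, changing the mountRoot argument to /some-dir/some-other-dir/etc would work just as well.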
2-----------
You must decide on a location to store backups to. This will typically be a dedicated HDD. You will be asked when installing the server to state this backup path.
3-----------
You must have a functioning HTTP server. ABBS will run within it to allow objects to move through the network.
4-----------
Unzip abbsServer.zip to the desired installation directory. This directory must be accessible by your web server, e.g. /var/www/localhost/htdocs/
Move to directory created.
Execute build file: # sh build
Answer all questions printed to terminal.
------------------------------
5: RUNNING SERVER
------------------------------
Move to installation directory.
As root execute run file: # sh run
------------------------------
6: CLIENT/HOST SETUP
------------------------------
Unzip abbsHost-<version>.zip to desired installation directory.
Move to directory created.
Execute build.bat. Windows users: either double click it or type build.bat at a command prompt. Unix-like OS users: # sh build.bat
Answer all questions printed to console.
------------------------------
7: LINEAR FLOW OF EXECUTION : SERVER
------------------------------
The server is multithreaded; once step 1 is complete it may be performing any number of the following operations at the same time.
1: Server starts, looks for configuration file and initialises itself.
2: Server waits, listening for hosts to request an agent.
3: Server accepts connection from host, spawns an agent and sends this to the requesting host.
4: Server accepts an agent returned from a host.
5: Server collects list of files that agent determined need to be backed up on host.
6: Server creates a file containing the list of all files that need to be backed up on host. This allows the server to get rid of the agent and perform the backup at some later time. It is also a precaution in case the server cannot locate the host's drive.
7: Server copies all files from host which are in the list created in step 6.
8: Server writes all successfully backed up file names to a log file and all file names that couldn't be backed up to another log. In the event some files could not be backed up, these file names are kept in the file mentioned in step 6 so that the backup of these files can be attempted again in the future.
9: Server archives and compresses all files just backed up. The archive is called <host-name>_<current-time-in-milliseconds>.tar.bz2
10: If step 9 completed successfully all temporary files copied to server from host are removed.
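Steps 7 to 9 can be sketched in Java as follows. This is a rough illustration under my own assumptions, not the actual server code: the class ServerBackupSketch, the Copier interface and the use of in-memory lists in place of the log files and the step 6 file are all made up for the example. Only the archive name format comes from step 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of server steps 7 to 9, not the real ABBS server.
class ServerBackupSketch {

    // Stands in for whatever actually copies one file from the mounted host drive.
    interface Copier {
        boolean copy(String fileName);
    }

    // Steps 7 and 8: try every pending file, log the outcome, and return the
    // names that failed so they can stay in the step 6 file for a later retry.
    static List<String> backUp(List<String> pending, Copier copier,
                               List<String> successLog, List<String> failureLog) {
        List<String> stillPending = new ArrayList<String>();
        for (String name : pending) {
            if (copier.copy(name)) {
                successLog.add(name);
            } else {
                failureLog.add(name);
                stillPending.add(name);
            }
        }
        return stillPending;
    }

    // Step 9: <host-name>_<current-time-in-milliseconds>.tar.bz2
    static String archiveName(String hostName, long timeMillis) {
        return hostName + "_" + timeMillis + ".tar.bz2";
    }

    public static void main(String[] args) {
        List<String> ok = new ArrayList<String>();
        List<String> failed = new ArrayList<String>();
        // Pretend that only a.txt can be copied.
        Copier onlyA = new Copier() {
            public boolean copy(String fileName) {
                return fileName.equals("a.txt");
            }
        };
        List<String> retry = backUp(Arrays.asList("a.txt", "b.txt"), onlyA, ok, failed);
        System.out.println("retry later: " + retry);
        System.out.println(archiveName("HUGO", System.currentTimeMillis()));
    }
}
```

Keeping the failed names in their own list mirrors the idea in step 8 that a failed file is not forgotten, only postponed.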
------------------------------
8: FLOW OF EXECUTION : HOST
------------------------------
1: Host starts, looks for configuration file and initialises itself.
2: Host makes a request for an agent from the server.
3: Host accepts agent and tells it its name, when an agent last visited it and which directories it wants the agent to look through (for files modified since its last visit).
4: Once the agent is complete the host sends it back to the server.
5: Once the server has received the agent the host exits.
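The round trip described above (agent arrives, is given the host's details, and is sent back) can be illustrated with plain Java object serialization. Please treat this as an assumption made for the sake of the example: this document does not say how ABBS moves agents over the network, and the class AgentTransferSketch, the Agent fields and the pack/unpack methods are all hypothetical. The byte arrays stand in for the network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an agent that keeps its state while moving between machines.
class AgentTransferSketch {

    static class Agent implements Serializable {
        private static final long serialVersionUID = 1L;
        String hostName;                 // told to the agent by the host (step 3)
        long lastVisitMillis;            // when an agent last visited this host
        List<String> directories = new ArrayList<String>();
        List<String> filesToBackUp = new ArrayList<String>();
    }

    // Turns the agent into bytes, as would happen before sending it.
    static byte[] pack(Agent agent) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(agent);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds the agent from bytes, as would happen on the receiving side.
    static Agent unpack(byte[] data) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            try {
                return (Agent) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Agent onHost = unpack(pack(new Agent()));      // server to host
        onHost.hostName = "HUGO";
        onHost.lastVisitMillis = 1123286400000L;       // example timestamp
        onHost.directories.add("C:/Documents");
        onHost.filesToBackUp.add("C:/Documents/report.txt"); // stands in for the scan
        Agent backOnServer = unpack(pack(onHost));     // host to server
        System.out.println(backOnServer.hostName + " " + backOnServer.filesToBackUp);
    }
}
```

The point of the sketch is that the host only fills in a few fields and hands the agent back; it never needs to know how the agent does its work.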
------------------------------
9: FLOW OF EXECUTION : AGENT
------------------------------
1: Agent is born on server.
2: Agent is sent to host.
3: Agent is given the host's name, last backup time and list of directories to search for modified files. Agent is told to begin by host.
4: Agent recursively scans directories on host for files modified since the time of last backup. Any newer files are added to the agent's list.
5: Agent is sent back to server.
6: Agent supplies server with list of file names it collected on host.
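The recursive scan in step 4 could look something like the Java sketch below. It is a minimal illustration, assuming the agent compares each file's last-modified time against the last backup time; the class ModifiedFileScanSketch and its methods are hypothetical names, and the touch helper exists only to set up the demonstration. The sketch does not guard against symbolic link loops.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of agent step 4, not the real ABBS agent.
class ModifiedFileScanSketch {

    // Walks dir recursively and collects files modified after lastBackupMillis.
    static void scan(File dir, long lastBackupMillis, List<String> found) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                scan(entry, lastBackupMillis, found);
            } else if (entry.lastModified() > lastBackupMillis) {
                found.add(entry.getPath());
            }
        }
    }

    // Demo helper: creates a file and stamps it with the given modification time.
    static File touch(File dir, String name, long modifiedMillis) {
        File f = new File(dir, name);
        try {
            f.getParentFile().mkdirs();
            f.createNewFile();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        f.setLastModified(modifiedMillis);
        return f;
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"),
                "abbs-sketch-" + System.nanoTime());
        long lastBackup = 1000000000000L; // example "last visit" time
        touch(root, "old.txt", lastBackup - 60000L);     // older, should be skipped
        touch(root, "sub/new.txt", lastBackup + 60000L); // newer, should be listed
        List<String> found = new ArrayList<String>();
        scan(root, lastBackup, found);
        System.out.println(found);
    }
}
```

Running main should list only the newer file, since old.txt carries a modification time from before the example backup time.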