From: Craig B. <cr...@ar...> - 2001-12-28 23:45:57
|
> Are there plans to implement a proprietary client to minimise network usage
> by implementing true incremental/delta backups? a la rsync algorithm or
> equivalent.

I am planning on adding tar over ssh and eventually rsync as additional transport layers for unix/linux clients. Rsync might need a few additional features to make it work with BackupPC, but I still need to see if I can make it work with an unmodified rsync.

I haven't contemplated doing a custom client, mainly because I don't know a thing about win32 development and I generally like the idea of not needing to install or maintain client-side software. For unix/linux, rsync (or a slight addition to it) should work well. For win32, in addition to reducing network traffic, a custom client might also be able to work around open-file and file locking issues, and also provide faithful saving of file attributes.

> By splitting file data into deltas (compared across and referenced by
> the complete data set) I suspect online disk requirements can be reduced
> much much more.

Yes!! I have been thinking about using something like xdelta1 (see http://sourceforge.net/projects/xdelta) to do binary file deltas, but that would only be against the previous file of the same name from the same machine. Better yet (as you suggest), I have also been thinking that there should be a way to change the pool to be a pool of snippets or deltas, rather than complete files. This could improve the overall storage efficiency from the current 6-8x to perhaps 20x or more, which would be really great. The problem is I haven't figured out how to do this. Suggestions and discussion are welcome!

> Lastly, should consider the structure of files on disk to support
> integration with a HSM backend (for tape storage) inline with data
> retention. As opposed to writing your own.

These issues are very important. The current hardlink structure is not friendly to many tape backup systems. I'm a novice in the area of tape backup systems, so I'd be very happy to learn more and figure out how to better glue BackupPC into other backend systems (both open source and commercial).

> They're my thoughts, happy to code and contribute, however looking for
> direction and an understanding of what is already in the works.

If you want to help design and code that would be great!! Here's my current roadmap for v1. Dramatic improvements (like making the pool an rsync-like database of snippets and adding client tools) would probably form the basis of v2:

1. Adding tar over ssh as an alternative to smbclient for unix/linux clients. This will allow symlinks and other special files to be backed up correctly. Later I will add rsync as a third transport layer (in addition to smb and tar/ssh).

2. Providing a utility to copy the pool and all data files. Useful for efficiently migrating the BackupPC data to a new (bigger) filesystem while preserving all the hardlinks. (Perhaps gnu "cp -dR" will be fast enough instead; I'll benchmark both.)

3. Saving attributes (unix owner, group, permissions, mtime, atime etc). I don't know enough about Windows ACLs to know how to save them too (this would need updates to smbclient).

4. Generating zip or tar archives for restore. Currently files can only be restored one at a time, unless you mount the server's disk and drag files from the client. Generating a zip or tar archive will allow the original attributes to be merged and restored too.

5. Ability to split the pool across several file systems. Since I use hardlinks to efficiently store repeated files, everything must currently be on a single file system. New 120GB drives are just becoming available, so you could run BackupPC on two or three 120GB file systems without worrying about merging the physical drives into one big file system. Tools will allow you to resplit the pool if you need to add more drives later.

6. Update the CGI script BackupPC_Admin so it will optionally run under mod_perl. This would require a dedicated httpd, running as the BackupPC user. The advantages are that it would eliminate the current setuid, and it would be much faster (it would have persistent connections to the BackupPC server, and also cache the hosts file).

7. Implementing binary file diffs so that files with small changes are stored in a binary diff form (maybe using xdelta1).

I'm planning on implementing 1-3 for the next version (v1.04), hopefully by mid or late January, and maybe the next two or three for the version after that. If you want to pitch in with, say, #4 or #6, or instead look at longer term goals (pool structure, optional client side software, or backend tape integration), then that would be great!

Regards,
Craig |
From: Craig B. <cr...@ar...> - 2001-12-31 07:32:26
|
Peter & Peter, Thanks for the excellent discussion of rsync, comparisons to commercial software, backend integration and other issues. I'm a little behind on replying to the last few emails, so let me take a crack at a few issues. I agree the rsync algorithm is an excellent way to reduce network traffic. A WinXX client could be developed. The existing rsync client would work fine for unix/linux. I need to look at rsync more to see how little can be changed on the server side (perhaps nothing, or very little). A separate issue is whether binary differencing (using rsync itself, or rsync-like algorithms in xdelta or similar), or something a lot smarter (changing the pool to be fragments or deltas, and using a RDMS for putting the pieces together. We should treat this as a separate issue from whether the deltas arrive directly from the client, or whether the server gets everything and generates the deltas locally. My plans for v1 are currently to support rsync as a transport for unix/linux, and (perhaps) using rsync/xdelta or similar for doing binary differencing on the server. It might seem pedantic (and redundant) to separate these steps, but I want all transport methods (whether smb, tar, rsync, etc) to have equal storage efficiency, peterb> Hmm.... this might be a dumb question, but why not start with peterb> just backing up Unix/Linux clients via an NFS export? The NFS peterb> mount could be brought up at backup time and scanned with peterb> ordinary filesystem tools (find, etc.,) or their equivalent peterb> Perl functions. Disk usage is minimized too since you only peterb> need copy files once you have identified them as non-redundant. peterb> (Note...this approach, while probably simpler, is inferior from peterb> a security perspective to a tar/ssh solution. It is, however, peterb> closer to the smbclient method that BackupPC presently uses for peterb> backup.) This is a good summary of the issues. Tar/ssh won't be hugely different from smbclient, since it already delivers the data in tar format. The main concern with NFS is security: I encourage people to run BackupPC as a user with low privileges. It is likely that not all the data on the NFS mounted file system will be visible to BackupPC. Also, normal users might be able to see the file system too (unless the mount point was below a secure directory). When I get around to doing tar/ssh I'll set it up so that, as an option, you can do tar without ssh on a local file system. You could then use an auto-mounter to mount the file system. So both options will be available. peterm> Dont know if your aware, there is a similar commericial peterm> product that does what BackupPC does but alot more. Its peterm> called Veritas Netbackup Proffesional. Have a look at peterm> http://www.veritas.com/products/category/ProductDetail.jhtml?- peterm> productId=nbupro you may get some ideas. Wow, pretty similar, but much better marketing. Yes, they do have a smart client and binary deltas. But their server doesn't run on linux, and it probably costs more too :). peterb> I think BackupPC is a great step in the right direction peterb> towards filling a gap between the capabilities of open peterb> source backup applications and the commercial applications peterb> presently available. If we put our heads together, I'm peterb> sure we can bring it up to a par with the vendor offerings peterb> already on the table. Strong agreement here. Given some time, we should be able to meet or exceed most of the features. craig> 2. 
Providing a utility to copy the pool and all data files. craig> Useful for efficiently migrating the BackupPC data to a new craig> (bigger) filesystem while preserving all the hardlinks. craig> (Perhaps gnu "cp -dR" will be fast enough instead; I'll craig> benchmark both.) peterb> How about dump / restore? Those are guaranteed to restore peterb> filesystem structure, including preserving hardlinks. (These peterb> are not available for all filesystem types, however.) That would work for local file systems. GNU "cp -a" works very well: my initial benchmarks are that it is about 30% faster than my perl copy code (which uses 4 children). So I'll probably ditch my code and recommend people either use GNU "cp -a" or dump/restore. craig> 5. Ability to split the pool across several file systems. peterb> This is a tricky one to achieve if you wish to retain the peterb> hardlink structure since hardlinks cannot cross filesystems. peterb> You'd have to switch to symbolic links which might have peterb> unpredictable side-effects. That's right. All the reference counting would still be by hardlinks on the main file system. Some fraction of the pool would be symlinks to the other file system. peterb> Perhaps a better approach would be to leave disk-spanning to a peterb> software raid layer on the server operating system? Most Unix peterb> versions today, as well as Linux, ship with basic software raid peterb> that allows the creation of virtual volumes upon which normal peterb> filesystems can be created. Many even allow dynamic increasing peterb> of such volumes (with the appropriate raid level) as disks are peterb> added to a system peterm> I agree with Peter's previous comments in that this issue can and peterm> probably should be delt with outside of the application. That is, peterm> some form of Logical Volume Manager. I agree with both of you. I agree that commercial systems like solaris and network appliance have excellent solutions. Our experience with rh-linux about 6-9 months ago was pretty poor. We were using reiserfs to create a big raid set (400GB) and found the reliability and performance to be poor (ok, we were using IBM 76GB drives too, which themselves were unreliable). Our experience during this period made me think about adding a split pool feature to BackupPC. I know that reiser and other choices are improving steadily, so a split pool is probably of diminshing utility. I'll move it to the bottom of the v1 list. peterm> I would also consider SpeedyCGI (http://www.cpan.org/module- peterm> s/by-module/CGI/CGI-SpeedyCGI-2.11.tar.gz) as an alternative peterm> if you dont want to enforce the need for Apache. Ok, I'll check it out. peterm> I think it would be a good idea if the current version continues peterm> as is and I will spend some time on producing pilot code to test peterm> long term goal's. In particular: peterm> peterm> 1. Client/Server model that results in the network transfer of peterm> delta's peterm> 2. Server model that stores delta's and references to them. peterm> peterm> I will update the list as it unvails. Sounds great! The correct solutions in these areas could form the foundation of v2. Thanks again for all the discussion and pointers! (At some point we could split off another mail list for development, but, assuming others don't mind, let's keep it on this list. Plus anyone else should feel free to jump in too!) Regards, Craig |
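As an aside on the pool-copy item: the reason a naive recursive copy breaks the pool is that it copies every hardlinked file once per link. A minimal Perl sketch of a hardlink-preserving copy is shown below; this is purely illustrative (not the BackupPC utility or Craig's four-child script), permissions and timestamps are not preserved, and the %seen bookkeeping is just the standard dev:inode trick that GNU "cp -a" and dump/restore do internally.

    #!/usr/bin/perl
    # Sketch: copy a tree while preserving hardlinks (illustration only).
    use strict;
    use warnings;
    use File::Find;
    use File::Copy qw(copy);
    use File::Path qw(mkpath);

    my ($src, $dst) = @ARGV;
    die "usage: $0 src dst\n" unless defined $src && defined $dst;

    my %seen;   # "dev:inode" -> destination path of the first copy of that inode

    find({ no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        (my $rel = $path) =~ s{^\Q$src\E/?}{};
        my $dest = length($rel) ? "$dst/$rel" : $dst;
        if ( -d $path ) {
            mkpath($dest) unless -d $dest;
            return;
        }
        return unless -f $path;                   # this sketch skips special files
        my ($dev, $ino, $nlink) = (stat(_))[0, 1, 3];
        my $key = "$dev:$ino";
        if ( $nlink > 1 && exists $seen{$key} ) {
            # Already copied this inode once: re-create the hardlink in the copy.
            link($seen{$key}, $dest) or warn "link $dest: $!\n";
        } else {
            copy($path, $dest) or warn "copy $dest: $!\n";
            $seen{$key} = $dest if $nlink > 1;
        }
    } }, $src);

The %seen hash is the cost of doing this generically: one entry per multiply-linked inode, which for a large pool is exactly why cp -a or dump/restore are attractive alternatives.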
From: Craig B. <cr...@ar...> - 2002-01-03 07:55:54
|
> Rather than trying to explain (I'm doing a poor job of it) I'll make
> some points and get back to writing the proof of concept. Hopefully
> then it will all fall into place.
>
> Thus far I've converted the rsync crc to perl and am testing the
> application of deltas.

This sounds great!

Perhaps one thing that would help is some better notation to unify our discussions.

Let's assume there's a file f on a client, and this file changes over time. Let's call each version f0, f1, f2 where the numbers refer to each time the backup program inspects it. It's possible that f1==f2, or even (f1!=f2 and f1==f3).

Currently BackupPC stores f0, f1, f2 as separate files, and uses hardlinks in the case where any pair of files is identical.

We agree that using an rsync-like technique, if the client has a new file, f3, and the server has f2 (or can reconstruct it), then the backup client can compute a delta between the two:

    d2 = delta(f2, f3)

so that f3 can be constructed by knowing f2 and d2:

    f3 = apply(f2, d2)

The clever part of rsync is that delta() can be computed without knowing all of f2: just the block signatures. And the backup server only needs to know f2 and d2. But you know that already.

The point of this notation is so we can explicitly discuss what is computed and stored.

Example 1: store f0, f1, f2, f3, with hardlinks between identical files. That's what currently happens.

Example 2: store f0, d0, d1, d2. That's the first complete file and the forward deltas. f3 can be recreated by applying, in turn, each of d0, d1, d2 to f0.

Example 3: store r1, r2, r3, f3, where the r's are reverse deltas. This way the last file is complete, and older files can be rebuilt, if needed, by applying one or more reverse deltas.

Of course, there are much more complex examples (which you might be getting at): doing everything on a block basis, not a file basis, and somehow doing deltas on blocks, etc. We can add some notation for this too.

Also, what happens if another client has the same sequence of files f0, f1, f2, f3 (or if the same sequence appears elsewhere on the same client)? Can the deltas only be stored once?

Craig |
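To make the d = delta(old, new) / new = apply(old, d) notation concrete, here is a deliberately naive Perl sketch. The "delta" is just a per-block list saying either "copy block i of the old file" or "here are literal bytes"; it is block-aligned only, so it is nothing like rsync's rolling-checksum matching, and the block size and names are made up for illustration.

    #!/usr/bin/perl
    # Toy delta()/apply() illustrating the notation (not rsync).
    use strict;
    use warnings;

    my $BLOCK = 4;   # tiny block size so the example is easy to trace

    sub delta {
        my ($old, $new) = @_;
        my %where;                            # old block contents -> block index
        for (my $i = 0; $i * $BLOCK < length $old; $i++) {
            $where{ substr($old, $i * $BLOCK, $BLOCK) } //= $i;
        }
        my @d;
        for (my $i = 0; $i * $BLOCK < length $new; $i++) {
            my $blk = substr($new, $i * $BLOCK, $BLOCK);
            push @d, exists $where{$blk} ? [ copy => $where{$blk} ]  # reuse old block
                                         : [ data => $blk ];         # literal data
        }
        return \@d;
    }

    sub apply {
        my ($old, $d) = @_;
        my $new = '';
        for my $op (@$d) {
            $new .= $op->[0] eq 'copy' ? substr($old, $op->[1] * $BLOCK, $BLOCK)
                                       : $op->[1];
        }
        return $new;
    }

    my $f2 = "aaaabbbbccccdddd";
    my $f3 = "aaaaXXXXccccdddd";              # one block changed
    my $d2 = delta($f2, $f3);
    print apply($f2, $d2) eq $f3 ? "f3 reconstructed OK\n" : "mismatch\n";

The server only ever needs to hold f2 (or be able to reconstruct it) plus d2; whether the delta is computed on the client or on the server is the separate question discussed above.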
From: <ma...@ph...> - 2002-01-03 12:54:54
|
Ok... Simple picture.

File (f) has two backed-up versions, f0 and f1:

    f0 = d0, d1, d2, d3
    f1 = d0, d1, d2, d4

where dX is a delta associated with a file (f) and X uniquely identifies the delta, that is, it is different from any other delta.

Assume we want to back up the file again. Assume f0 and f1 have not expired on the server (i.e. retention period) and assume the deltas d0 to d4 are still referenced by f0 and f1.

The server sends the client the checksums and digests of all deltas referenced by f1 (the last backup). They are d0, d1, d2 and d4 (we don't send d3 because it is not referenced by this file's last backup).

The client compares the checksums/digests received against those generated from the current file source.

The client sends back two types of messages to the server:

1. Between offset X and Y delta dX matched.
2. Between offset X and Y no delta matched; use data Z instead.

The server creates f2 by:

1. Referencing existing deltas for case 1 above.
2. Creating new deltas and referencing them for case 2 above.

To restore version f1 one would merely concatenate the deltas referenced by f1 in offset order.

What's important is this:

1. A file is a collection of deltas. There is no concept of storing a complete file on the server. All backed-up versions of a file are treated as a grouping of deltas.
2. The file record contains the offsets at which the deltas are applied to form "the file data".
3. A delta may be associated with more than one version of the file.
4. We may choose to store multiple copies of the same delta only once. That is, before creating a new delta, compute the delta's digest and compare the digest with all existing deltas' digests (delta digest index). If it matches, reference the existing delta. If no match, create a new delta. This answers your question:

   > Also, what happens if another client has the same sequence of files
   > f0, f1, f2, f3 (or if the same sequence appears elsewhere on the
   > same client)? Can the deltas only be stored once?

5. The recovery of a particular file version does not depend on the presence of all previous file versions.
6. A reference count will be maintained for each delta in existence. Once a delta is no longer referenced (because file versions have expired due to retention), the disk space associated with it is reclaimed (delta deleted).
7. Deltas may be compressed to conserve even more disk space.

Regards
Peter Marelas |
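A rough Perl sketch of the storage side described above, for discussion only: deltas live in a pool keyed by their digest with a reference count, a file version is an ordered list of digests, and restore concatenates the referenced deltas in offset order. The function and field names are made up here, not a proposed schema, and the in-memory hashes stand in for whatever database or directory layout would actually hold this.

    #!/usr/bin/perl
    # Sketch: digest-keyed delta pool with reference counting.
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    my %pool;      # digest -> { data => ..., refcount => ... }
    my %version;   # version name -> ordered list of delta digests

    # Store one backed-up version of a file, given its data split into chunks.
    sub store_version {
        my ($name, @chunks) = @_;
        my @digests;
        for my $chunk (@chunks) {
            my $digest = md5_hex($chunk);
            $pool{$digest} //= { data => $chunk, refcount => 0 };  # new delta only if unseen
            $pool{$digest}{refcount}++;                            # shared across versions/hosts
            push @digests, $digest;
        }
        $version{$name} = \@digests;
    }

    # Restore = concatenate the referenced deltas in offset order.
    sub restore_version {
        my ($name) = @_;
        return join '', map { $pool{$_}{data} } @{ $version{$name} };
    }

    # Expire a version: drop references and reclaim unreferenced deltas.
    sub expire_version {
        my ($name) = @_;
        for my $digest (@{ delete $version{$name} }) {
            delete $pool{$digest} if --$pool{$digest}{refcount} == 0;
        }
    }

    store_version('f0', 'd0', 'd1', 'd2', 'd3');
    store_version('f1', 'd0', 'd1', 'd2', 'd4');   # d0..d2 stored once, shared
    print restore_version('f1'), "\n";             # prints d0d1d2d4
    expire_version('f0');                          # only d3 is reclaimed

Because the pool is keyed by digest (point 4 above), identical deltas from another client, or from elsewhere on the same client, are stored once and simply gain another reference.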
From: Peter L. B. <pl...@io...> - 2002-01-03 17:22:45
|
Craig:

> Perhaps one thing that would help is some better notation to unify
> our discussions.

Excellent notation. That helps a lot :)

> Also, what happens if another client has the same sequence of files
> f0, f1, f2, f3 (or if the same sequence appears elsewhere on the
> same client)? Can the deltas only be stored once?

I don't see why not. It is the same problem in principle as storing redundant files as hardlinks. You'd just have to store a hardlink to the original delta rather than a new delta itself, right? This would require going through the hassle of calculating a checksum on every delta generated, though, and doing the same comparisons on delta files as on "normal" files.

--PLB |
From: Peter L. B. <pl...@io...> - 2002-01-03 18:21:44
|
Peter:

I think I see what you're getting at. You are describing an architecture where the blocks are separated from the files that reference them.

In the same way that Craig's forest of hardlinks points duplicate files to the same original file, block pointers contained in deltas point duplicate blocks to data blocks in some kind of datastore.

What you term a "delta" below is a block of data of fixed size. Blocks (or deltas, as you term them) will not be removed from the datastore until the reference count drops to zero. The reference count will not drop to zero until all file versions that use it have expired.

Congratulations. You've just invented a filesystem with snapshot capability. :) The difference here is that most snapshot filesystems are limited in the number of snapshots they can present at a single time. This idea is rather more flexible since there are no implicit limits. The underlying technique, however, is the same. Done carefully, too, your concept could also accommodate variable block sizes (something key to improving efficiency in the differencing algorithm) where snapshot filesystems usually have fixed block sizes.

I see some interesting modelling scenarios arising from this. It could be a fascinating thing to implement given all the unique variables and potential areas for tuning (BerkeleyDB anyone?)

--PLB |
From: Peter L. B. <pl...@io...> - 2002-01-04 03:16:05
|
Check. Now the pieces all fall into place. :)

I'm not entirely clear, though, on how you are storing the blocks. When you first store the blocks for a file, are you splitting it up into fixed blocks and storing each block independently (I'm going to use "block" to mean a fixed-length data block and "delta" to mean a file of bytestreams + block pointers), or are you effectively storing the entire file and referencing back to it? Only the former would make sense to me in this architecture since you need to retain the ability to selectively expire blocks from the original file.

My point about variable block sizes is that the efficiency of the differencing algorithm for a single file can vary significantly with the block size that rsync uses to divide the source file into fixed-length blocks. For copying purposes, the block size can thus vary from invocation to invocation since the blocks do not need to be stored once the synchronization is complete.

For data backup, however, you are fixing the block size at backup time. For a given file, the block size thus needs to stay static to be of any use. However, there is still the possibility of using differing block sizes for other files to improve efficiency. Thus, small, rapidly changing files might use a block size of 16k while large binary databases of infrequently changed data could be more efficiently differenced with a block size of 1024k. The optimum block size will vary significantly depending on the type of file, and I consider it unlikely that a one-size-fits-all block size will prove practical in this scenario.

--PLB |
From: <ma...@ph...> - 2002-01-04 02:34:37
|
That's the one. Conceptually, the relationship is as follows, in DB table notation:

    FileTable
        fileid     bigint,
        filename   text,
        hostname   text

    FileVersionTable
        fileid     bigint,
        filevid    int,
        bytes      bigint,
        backupdate datetime

    DeltaTable
        deltaid    bigint,
        digest     char(16),
        checksum   char(4),
        bytes      int,
        refcount   int

    FileVersionToDeltaTable
        filevid    bigint,
        deltaid    bigint,
        fileoffset bigint

The data associated with a delta would live in the underlying filesystem, not the database, in some directory hierarchy, maybe based on the digest + deltaid, e.g. /AB/CD/EF/GH/IJ/KL/MN/OP/deltaid.file

Handling variable block sizes does not become an issue since, for any given file version, the maximum number of variable blocks generated by the source as a result of comparing against the file version's checksums can only be one.

Rsync (the application) only handles a variable block checksum under two conditions (from what I have read of the code):

1. When all fixed-block-size checksums have matched and there is a variable block checksum left over as the remainder.
2. When the algorithm is processing near the end of a file, where the given (file length - file offset) is less than the rsync fixed block size (700 bytes by default). Only during this range may a variable block checksum less than the fixed block size have any chance of matching. Variable block sizes greater than the fixed block size are not permitted.

Other than those conditions, a sliding checksum across 700 bytes is applied over the data stream and implements the rolling characteristics of the checksum:

    ************   700 bytes
    ***********+   700 bytes (shifted left 1 byte)
    **********++   700 bytes (shifted left 1 byte)
    *********+++   700 bytes (shifted left 1 byte)
    ********++++   700 bytes (shifted left 1 byte)
    *******+++++   700 bytes (shifted left 1 byte)
    ******++++++   700 bytes (shifted left 1 byte)
    *****+++++++   700 bytes (shifted left 1 byte)
    ****++++++++   700 bytes (shifted left 1 byte)
    ***+++++++++   700 bytes (shifted left 1 byte)
    **++++++++++   700 bytes (shifted left 1 byte)
    *+++++++++++   700 bytes (shifted left 1 byte)
    ++++++++++++   700 bytes (shifted left 1 byte)
    etc.

Regards
Peter Marelas |
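Since the thread mentions converting the rsync checksum to Perl, here is a minimal sketch of the weak rolling checksum from the rsync paper (a = byte sum, b = sum of running sums, both mod 2^16), showing that sliding the 700-byte window by one byte is an O(1) update. This is an illustration written for this discussion, not Peter's actual code; in rsync the strong per-block digest (MD4 there, MD5 in this thread) is computed separately and only checked when the weak checksum matches.

    #!/usr/bin/perl
    # Sketch of an rsync-style weak rolling checksum.
    use strict;
    use warnings;

    my $BLOCKSIZE = 700;   # rsync's default block size, per the discussion above

    # Checksum of one block from scratch.
    sub weak_checksum {
        my ($block) = @_;
        my ($a, $b) = (0, 0);
        for my $byte (unpack 'C*', $block) {
            $a = ($a + $byte) & 0xffff;
            $b = ($b + $a)    & 0xffff;
        }
        return ($a, $b, ($b << 16) | $a);
    }

    # Roll the window one byte: drop $old from the front, append $new at the back.
    sub roll {
        my ($a, $b, $old, $new, $len) = @_;
        $a = ($a - $old + $new)      & 0xffff;
        $b = ($b - $len * $old + $a) & 0xffff;
        return ($a, $b, ($b << 16) | $a);
    }

    # Demonstrate that rolling matches recomputing from scratch at every offset.
    my $data = join '', map { chr(int rand 256) } 1 .. 2000;
    my ($a, $b, $sum) = weak_checksum(substr $data, 0, $BLOCKSIZE);
    for my $off (1 .. length($data) - $BLOCKSIZE) {
        ($a, $b, $sum) = roll($a, $b,
                              ord(substr $data, $off - 1, 1),
                              ord(substr $data, $off + $BLOCKSIZE - 1, 1),
                              $BLOCKSIZE);
        my (undef, undef, $check) = weak_checksum(substr $data, $off, $BLOCKSIZE);
        die "mismatch at offset $off\n" unless $sum == $check;
    }
    print "rolled checksum matches full recomputation at every offset\n";

The cheap rolling update is what lets the client slide a window over its local file one byte at a time while comparing against the server's fixed-block signatures.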
From: <ma...@ph...> - 2002-01-06 02:07:41
|
Embedded.

----- Original Message -----
From: "Peter L. Buschman" <pl...@io...>
To: <ma...@ph...>; "Craig Barratt" <cr...@ar...>; "Peter L. Buschman" <pl...@io...>
Cc: <bac...@li...>
Sent: Friday, January 04, 2002 2:15 PM
Subject: Re: [BackupPC-users] Reducing Network Requirements

> Check. Now the pieces all fall into place. :)
>
> I'm not entirely clear though, on how you are storing the blocks. When you
> first store the blocks for a file, are you splitting it up into fixed blocks
> and storing each block independently (I'm going to use "block" to mean a
> fixed-length data block and "delta" to mean a file of bytestreams + block
> pointers) or are you effectively storing the entire file and referencing
> back to it? Only the former would make sense to me in this architecture
> since you need to retain the ability to selectively expire blocks from the
> original file.

* Splitting it up into fixed blocks and storing each block independently.

> My point about variable block sizes is that the efficiency of the
> differencing algorithm for a single file can vary significantly with the
> block size that rsync uses to divide the source file into fixed-length
> blocks. For copying purposes, the block size can thus vary from invocation
> to invocation since the blocks do not need to be stored once the
> synchronization is complete.

True.

> For data backup, however, you are fixing the block size at backup time. For
> a given file, the block size thus needs to stay static to be of any use.
> However, there is still the possibility of using differing block sizes for
> other files to improve efficiency. Thus, small, rapidly changing files might
> use a block size of 16k while large binary databases of infrequently changed
> data could be more efficiently differenced with a block size of 1024k. The
> optimum block size will vary significantly depending on the type of file and
> I consider it unlikely that a one-size-fits-all block size will prove
> practical in this scenario.

Rsync the application uses a static block size of 700 bytes by default unless specified on the command line. I'm pretty comfortable that a static fixed block size across the board will result in a substantial performance gain overall in comparison to not using the algorithm.

Consider a 20GB file at 32K fixed blocks. The server sends the client 6.25MB of data (compressed checksums/digests) for the algorithm to commence. To transfer 6.25MB will take less than 6 seconds over 10BaseT ethernet.

To transfer 20GB will take 4.55 hours.
To transfer 50% of 20GB will take 2.27 hours.

Long term, there are other things we could do to improve the situation with varying data sets. We could maintain a hierarchy of checksums/digests based on different fixed block sizes, as follows, on a per-file basis:

    --------------------------------------------------------------------------   20GB file
    ******************************  ******************************               10GB blocks
    **************  ***************  ***************  **************              5GB blocks
    *******  ******  *******  *******  *******  *******  *******  *******       2.5GB blocks
    etc, etc.

We would do a first pass starting with 10GB. If any block matches, we ignore the start and end offsets of the given file from here on. We then do a second pass with 5GB, then a third with 2.5GB, and so on.

This method allows us to formulate a rule which says: if file F is a multiple of size X, maintain N=F/X checksum/digest hierarchies of size Y.

Put it into practice: if a file of size 14GB is a multiple of 1GB, maintain 14=14/1 checksum/digest hierarchies of size 32K. So we have 32*1=32k, 32*2=64k, 32*3=96k, 32*4=128k, 32*5=160k, ... 32*14=448k.

So the first pass (448k) will then only require 0.3125 MB of compressed checksums+digests to be sent from the server to the client for a 14GB file.

It's important to note we are duplicating the number of checksums/digests we maintain, however we are not duplicating the actual data stored on the server.

The overhead associated with this is multiple reads of the source file and also of the destination blocks when they are saved on the server.

Regards
Peter Marelas |
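To make the arithmetic above easy to reproduce, here is a small Perl calculation of the signature data a server would send for a given file size and block size. It assumes 20 bytes per block (4-byte rolling checksum + 16-byte MD5 digest) and a roughly 2:1 compression of the signature stream; both are assumptions chosen to match the 6.25MB and 0.3125MB figures quoted above, not measured values.

    #!/usr/bin/perl
    # Back-of-the-envelope signature overhead for various block sizes.
    use strict;
    use warnings;

    my $PER_BLOCK   = 4 + 16;   # bytes: 32-bit rolling checksum + 128-bit MD5 digest
    my $COMPRESSION = 2;        # assumed ~2:1 compression of the signature stream

    sub signature_mb {
        my ($file_bytes, $block_bytes) = @_;
        my $blocks = int(($file_bytes + $block_bytes - 1) / $block_bytes);
        return ($blocks, $blocks * $PER_BLOCK / $COMPRESSION / 2**20);
    }

    for my $block_kb (16, 32, 64, 448) {
        my ($blocks, $mb) = signature_mb(20 * 2**30, $block_kb * 1024);
        printf "20GB file, %4dk blocks: %8d blocks, ~%.2f MB of signatures\n",
               $block_kb, $blocks, $mb;
    }

    my ($blocks14, $mb14) = signature_mb(14 * 2**30, 448 * 1024);
    printf "14GB file,  448k blocks: %8d blocks, ~%.4f MB of signatures\n",
           $blocks14, $mb14;

    # 32k blocks on a 20GB file give 655,360 blocks, i.e. ~6.25 MB of compressed
    # signatures: a few seconds on 10BaseT versus hours for the file itself.
    # 448k blocks on a 14GB file give 32,768 blocks, i.e. ~0.3125 MB.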
From: Peter L. B. <pl...@io...> - 2002-01-07 00:22:16
|
Comments below :-)

> I'm pretty comfortable that a static fixed block size across the board will
> result in a substantial performance gain overall in comparison to not using
> the algorithm.
>
> Consider a 20GB file at 32K fixed blocks. The server sends the client
> 6.25MB of data (compressed checksums/digests) for the algorithm to
> commence. To transfer 6.25MB will take less than 6 seconds over 10BaseT
> ethernet.
>
> To transfer 20GB will take 4.55 hours.
> To transfer 50% of 20GB will take 2.27 hours.

Agreed, but my point is regarding the efficiency of the long-term storage space going forward, not the bandwidth saved during the transfer. When you fix the block size at the beginning, you eliminate changing the block size (as derived via offsets from zero in the original file) as a valid optimization in the future. Rapidly-changing files are likely to benefit more from variable block sizes, in terms of storage space required for the deltas, than files that change relatively little.

> Long term, there are other things we could do to improve the situation with
> varying data sets. We could maintain a hierarchy of checksums/digests based
> on different fixed block sizes, as follows, on a per-file basis:
>
>     --------------------------------------------------------------------------   20GB file
>     ******************************  ******************************               10GB blocks
>     **************  ***************  ***************  **************              5GB blocks
>     *******  ******  *******  *******  *******  *******  *******  *******       2.5GB blocks

Yes, but you lose the efficiency implicit in the rsync algorithm as traditionally implemented, because the boundaries of the blocks are already fixed from the first backup. The block offsets are still derived from the block sizes used the very first time the file and blocks were backed up.

Note that I don't have a perfect solution to this problem yet (returning to my point about the lack of a cumulative checkpoint in the algorithm). I just happen to believe that the power of the rsync algorithm lies more in being able to generate a delta from one version of a file to another with whatever blocksize makes the most sense at the time, not in the base differencing function. Eliminating the ability to use variable block sizes also eliminates the heart of the algorithm's efficiency and general-purpose capability.

Just my $0.02 worth and probably not a universal opinion. Take that for what it's worth ;-)

--PLB |
From: Craig B. <cr...@ar...> - 2002-01-06 06:34:59
|
Peter,

Thanks for the continuing dialog. With the little time I have I am mostly trying to work on v1.04, so I haven't had too much time to think about this stuff. You get all the fun!

> We would do a first pass starting with 10GB. If any block matches
> we ignore the start and end offset of the given file from herein.
> We then do a second pass with 5GB, then third with 2.5GB and so on.

It might be risky to have such large blocks. The 32 bit checksum and 128 bit MD5 digest provide a 160 bit "signature". Rsync depends upon a match of the signature to guarantee that the blocks are the same. For blocks of more than 20 bytes (160 bits) there is a non-zero chance two different blocks will have the same signature. For reasonable-sized blocks the chance is vanishingly small. But for larger and larger blocks the probability increases.

On another topic, I did run two benchmarks that might be of interest to you. These are based on a 143GB (compressed) pool with 1.4 million files, resulting from backing up about 1200GB of data from 100 laptops (three fulls plus six incrementals).

The first test was dividing each pool file (after uncompressing) into 16k blocks, on 16k boundaries, and counting how many 16k blocks were unique (the last block of each file had size <= 16k). There were 15,973,672 blocks, of which 11,415,357 were unique. So breaking files into 16k blocks, in this case, gives maybe a 25-30% saving in storage (ignoring the extra storage required to represent each file as concatenated blocks, and handling reference counts). I haven't tried smaller blocks, nor did I check that the 11 million unique blocks compress as well as the 16 million original ones (they might not).

The second test was seeing whether storing reverse deltas between identically named files on the same host would help. The criterion was that a file would only be replaced by a delta if the file was not shared (hardlinked) elsewhere. This yielded about 37,000 files (out of 1.4 million) that would be replaced by deltas, saving about 26GB of storage (out of 143GB). So a simple delta strategy on identically named files might save around 15-20% of pool storage.

The results from each of these two tests can't be added together, since there is some overlap in what they are taking advantage of. For example, a user's mail folder (eudora in.mbx or outlook.pst) that is slightly changed each day is currently stored each time in its entirety. Splitting it into fixed-size blocks will save a good deal (unless an earlier part of the file changed, shifting the data), and doing deltas will also save a good deal.

This doesn't exactly correspond to what you are trying, but I thought you might be interested in some results from one real data set.

Regards,
Craig |
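For reference, a compact version of the kind of block-uniqueness benchmark described above might look like the sketch below (the path handling, the 16k block size, and the lack of pool decompression are all simplifications; this is not the actual script): it reads each file in fixed-size chunks and counts distinct chunk digests.

    #!/usr/bin/perl
    # Count total vs. unique fixed-size blocks across a directory tree (sketch).
    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5 qw(md5);

    my $BLOCK = 16 * 1024;
    my $total = 0;
    my %count;                       # binary MD5 digest -> occurrences

    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        binmode $fh;
        while (read($fh, my $buf, $BLOCK)) {
            $total++;
            $count{ md5($buf) }++;   # 16-byte binary digest as the hash key
        }
        close $fh;
    }, $ARGV[0] // '.');

    die "no blocks found\n" unless $total;
    my $unique = scalar keys %count;
    printf "%d blocks, %d unique (%.1f%% potential saving)\n",
           $total, $unique, 100 * (1 - $unique / $total);

Note that the %count hash holds every unique digest in memory, which is where the memory pressure on a multi-million-block pool comes from.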
From: <ma...@ph...> - 2002-01-07 11:32:44
|
> > We would do a first pass starting with 10GB. If any block matches
> > we ignore the start and end offset of the given file from herein.
> > We then do a second pass with 5GB, then third with 2.5GB and so on.
>
> It might be risky to have such large blocks. The 32 bit checksum and
> 128 bit MD5 digest provide a 160 bit "signature". Rsync depends upon
> a match of the signature to guarantee that the blocks are the same. For
> blocks of more than 20 bytes (160 bits) there is a non-zero chance two
> different blocks will have the same signature. For reasonable-sized
> blocks the chance is vanishingly small. But for larger and larger
> blocks the probability increases.

True. Something we need to think through if we can't find an optimal block
size.

> On another topic, I did run two benchmarks that might be of interest
> to you. These are based on a 143GB (compressed) pool with 1.4 million
> files, resulting from backing up about 1200GB of data from 100 laptops
> (three fulls plus six incrementals).
>
> The first test was dividing each pool file (after uncompressing)
> into 16k blocks, on 16k boundaries, and counting how many 16k blocks
> were unique (the last block of each file had size <= 16k). There
> were 15,973,672 blocks of which 11,415,357 were unique. So breaking
> files into 16k blocks, in this case, gives maybe a 25-30% saving in
> storage (ignoring the extra storage required to represent each file
> as concatenated blocks, and handling reference counts). I haven't
> tried smaller blocks, nor did I check that the 11 million unique
> blocks compress as well as the 16 million original ones (they might
> not).

Well, it's clear the current regime is already doing a good job of
minimising storage space: 143GB versus 1200GB. I just want to point out
again that the benefits of adopting the algorithm will lean more towards
network utilisation than storage. While I expect storage usage to reduce
even further, it won't be anywhere near as dramatic as the reduction in
network utilisation and, ultimately, the duration of the backup window.

> The second test was seeing whether storing reverse deltas between
> identically named files on the same host would help. The criterion
> was that a file would only be replaced by a delta if the file was not
> shared (hardlinked) elsewhere. This yielded about 37,000 files (out
> of 1.4 million) that would be replaced by deltas, saving about 26GB
> of storage (out of 143GB). So a simple delta strategy on identically
> named files might save around 15-20% of pool storage.
>
> The results from each of these two tests can't be added together,
> since there is some overlap in what they are taking advantage of.
> For example, a user's mail folder (eudora in.mbx or outlook.pst)
> that is slightly changed each day is currently stored each time in
> its entirety. Splitting it into fixed size blocks will save a
> good deal (unless an earlier part of the file changed, shifting
> the data), and doing deltas will also save a good deal.
>
> This doesn't exactly correspond to what you are trying, but I
> thought you might be interested in some results from one real
> data set.

It's good to have a data set to measure against. I certainly don't have
200GB of disk lying around spare, or 100 laptops. I'd be interested in the
same figures using smaller blocks and larger blocks, i.e. 4k, 8k, 32k, 64k.

Regards
Peter Marelas
|
From: Craig B. <cr...@ar...> - 2002-01-08 07:02:39
|
> It's good to have a data set to measure against. I certainly don't have
> 200GB of disk lying around spare, or 100 laptops. I'd be interested in the
> same figures using smaller blocks and larger blocks, i.e. 4k, 8k, 32k, 64k.

Yes, a data set is very useful for benchmarking. Of course, my data set
might be different to other installations. Perhaps in the future people on
this list might like to run some well-defined benchmarks on their systems
too?

Other block sizes are a good idea. It will be a few weeks until I get to
this because my current implementation (one perl script that passes over
the whole tree and does $Count{MD5(block)}++) needs a huge amount of memory
(around 1GB) with 16k blocks. I guess perl ends up needing about 80-100
bytes to store each entry in the hash (it has 11 million entries). 32k and
64k are no problem; I'll do them this weekend with my current script. For
the smaller block sizes I'll have to either write it in C or write some
scripts that do multiple passes using temporary files which are merged in a
later step. I'll probably pick the latter, but I need to work more on
v1.04 first.

Regards,
Craig
|
From: Craig B. <cr...@ar...> - 2002-02-11 21:30:47
|
peter> It's good to have a data set to measure against. I certainly
peter> don't have 200GB of disk lying around spare, or 100 laptops.
peter> I'd be interested in the same figures using smaller blocks
peter> and larger blocks.
peter> i.e. 4k, 8k, 32k, 64k.

craig> I'll have to either write it in C or write some scripts that do multiple
craig> passes using temporary files which are merged in a later step. I'll
craig> probably pick the latter, but I need to work more on v1.04 first.

peter> use Fcntl;
peter> use SDBM_File;
peter> tie(%Count, 'SDBM_File', '/tmp/md5_hash', O_RDWR|O_CREAT|O_TRUNC, 0644);
peter> $Count{MD5(block)}++;
peter> untie(%Count);

Nice suggestion! I tried this out but the sdbm files got pretty large
(over 2GB) for the small block sizes, and for some reason it got quite
slow even when less than 10% done.

Anyhow, I implemented a two-step solution: computing all the block hashes
at the different block sizes and writing them to files (almost 6GB for the
1k block size case), then doing the counting in multiple passes to avoid
large memory use.

Here are the results (counting all the blocks and unique blocks for
different block sizes in a pool that is around 240GB uncompressed):

    Block size    Total Count    Unique Count    Percentage duplicate
          1024      248015738       151849516          38.8%
          2048      124446044        78608689          36.8%
          4096       62693124        41013766          34.6%
          8192       31868433        21806649          31.6%
         16384       16500899        11627485          29.5%
         32768        8853617         6501388          26.6%
         65536        5073159         3950319          22.1%

Craig
|
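The two-step approach can be sketched roughly as follows. This is
hypothetical code, not the actual scripts: pass 1 appends each block's hex
digest to one of 256 bucket files chosen by its first two hex characters,
and pass 2 counts totals and uniques one bucket at a time so memory stays
bounded.

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    my $TmpDir = "/var/tmp/blkhash";    # hypothetical scratch directory
    mkdir $TmpDir;

    # Pass 1: for every block, append its hex digest to a bucket file
    # chosen by the first two hex characters of the digest.
    my %bucket;
    sub add_block {
        my ($data) = @_;
        my $d = md5_hex($data);
        my $b = substr($d, 0, 2);
        if ( !$bucket{$b} ) {
            open($bucket{$b}, ">>", "$TmpDir/$b") or die "open $b: $!";
        }
        print { $bucket{$b} } "$d\n";
    }

    # ... walk the pool here, calling add_block() for each block ...

    close($_) for grep { $_ } values %bucket;

    # Pass 2: count total and unique digests one bucket at a time, so
    # at most 1/256th of the digests are held in memory at once.
    my ($total, $unique) = (0, 0);
    for my $b ( map { sprintf("%02x", $_) } 0 .. 255 ) {
        open(my $fh, "<", "$TmpDir/$b") or next;
        my %seen;
        while ( my $d = <$fh> ) {
            $total++;
            $seen{$d} = 1;
        }
        $unique += scalar keys %seen;
        close($fh);
    }
    printf("%d blocks, %d unique\n", $total, $unique);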
From: <ma...@ph...> - 2002-02-12 08:04:57
Attachments:
sync
|
Craig Barratt writes:
> peter> It's good to have a data set to measure against. I certainly
> peter> don't have 200GB of disk lying around spare, or 100 laptops.
> peter> I'd be interested in the same figures using smaller blocks
> peter> and larger blocks.
> peter> i.e. 4k, 8k, 32k, 64k.
>
> craig> I'll have to either write it in C or write some scripts that do multiple
> craig> passes using temporary files which are merged in a later step. I'll
> craig> probably pick the latter, but I need to work more on v1.04 first.
>
> peter> use Fcntl;
> peter> use SDBM_File;
> peter> tie(%Count, 'SDBM_File', '/tmp/md5_hash', O_RDWR|O_CREAT|O_TRUNC, 0644);
> peter> $Count{MD5(block)}++;
> peter> untie(%Count);
>
> Nice suggestion! I tried this out but the sdbm files got pretty large
> (over 2GB) for the small block sizes, and for some reason it got quite
> slow even when less than 10% done.
>
> Anyhow, I implemented a two-step solution: computing all the block
> hashes at the different block sizes and writing them to files (almost
> 6GB for the 1k block size case), then doing the counting in multiple
> passes to avoid large memory use.
>
> Here are the results (counting all the blocks and unique blocks for
> different block sizes in a pool that is around 240GB uncompressed):
>
>     Block size    Total Count    Unique Count    Percentage duplicate
>           1024      248015738       151849516          38.8%
>           2048      124446044        78608689          36.8%
>           4096       62693124        41013766          34.6%
>           8192       31868433        21806649          31.6%
>          16384       16500899        11627485          29.5%
>          32768        8853617         6501388          26.6%
>          65536        5073159         3950319          22.1%

Thanks for that.

These are based on comparing against fixed block size boundaries, correct?
i.e. for a 1024 block size, comparing at offsets 0, 1024, 2048, as opposed
to 0, 1, 2, 3, 4, 5, 6, etc.

The CRC algorithm has been written in Perl for some time now and it's
surprisingly fast. I've been experimenting with digest algorithms other
than MD4, as I've read collisions have been found with MD4. I've also
experimented with techniques for representing the digests and CRCs in a
compressed form to the host performing the comparison. Thus far, with a
192-bit hash and a 1024-byte block size (rsync uses 700 bytes by default),
there is between 1% and 3% observed overhead per file, which in my opinion
is optimal considering the savings (> 38.8%). Otherwise, if we stick with
MD4 (128 bits), the observed overhead is between 1% and 2%.

Regards
Peter Marelas
|
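For readers following along, the weak rolling checksum under discussion
(chapter 3 of Tridgell's thesis) looks roughly like this in Perl. This is
an illustrative sketch, not Peter's code; both sums are kept modulo 2^16 as
in the thesis:

    use strict;
    use warnings;

    # Weak rolling checksum from the rsync algorithm:
    #   a = sum of bytes in the block                (mod 2^16)
    #   b = sum of offset-weighted bytes             (mod 2^16)
    #   s = a + 2^16 * b
    sub checksum_init {
        my ($buf) = @_;
        my ($a, $b) = (0, 0);
        my $n = length($buf);
        for my $i (0 .. $n - 1) {
            my $c = ord(substr($buf, $i, 1));
            $a += $c;
            $b += ($n - $i) * $c;
        }
        return ($a & 0xffff, $b & 0xffff);
    }

    # Roll the window one byte: drop $old (leftmost byte of the previous
    # window), take in $new (byte just past it); $n is the block size.
    sub checksum_roll {
        my ($a, $b, $old, $new, $n) = @_;
        $a = ($a - ord($old) + ord($new)) & 0xffff;
        $b = ($b - $n * ord($old) + $a)   & 0xffff;
        return ($a, $b);
    }

    my ($a, $b) = checksum_init("hello world, hello world, hello ");
    printf("weak checksum: 0x%08x\n", ($b << 16) | $a);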
From: Craig B. <cr...@ar...> - 2002-02-15 00:20:42
|
> > Block size    Total Count    Unique Count    Percentage duplicate
> >       1024      248015738       151849516          38.8%
> >       2048      124446044        78608689          36.8%
> >       4096       62693124        41013766          34.6%
> >       8192       31868433        21806649          31.6%
> >      16384       16500899        11627485          29.5%
> >      32768        8853617         6501388          26.6%
> >      65536        5073159         3950319          22.1%
>
> Thanks for that.
>
> These are based on comparing against fixed block size boundaries correct?
> i.e. for 1024 block size comparing at offset 0, 1024, 2048 as opposed to
> 0, 1, 2, 3, 4, 5, 6, etc..

That's right: the blocks are on multiples of the block size, so the 1024
block size are the blocks starting at offset 0, 1024, 2048, ...  The last
block of each file might be partial.

Note that the savings are without compression. Adding compression will
tend to reduce these numbers somewhat for two reasons:

 - The most common repeated block is all 0's. In the 1024 case the
   most common repeated block occurs 8171676 times, accounting for
   almost 10% of the total number of repeated blocks. These blocks
   stored in the original files will tend to compress very well, so
   the gains from block reuse will be lower by a couple of percent.

 - Because of the granularity of disk storage (eg: 512 byte blocks
   or often bigger on larger storage devices), storing small blocks
   as separate files might give no compression. Eg: a 1K block
   might compress to 600 bytes, but that's still two 512 byte
   blocks.

To estimate the impact of these two issues I would have to add code that
actually compresses each unique block to see how well it compresses and
then keep track of the compressed savings.

Craig
|
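A tiny illustration of the allocation-granularity point (assuming 512-byte
sectors):

    use strict;
    use warnings;
    use POSIX qw(ceil);

    # A block is charged in whole 512-byte sectors, so compressing a 1K
    # block down to 600 bytes saves nothing on disk.
    sub on_disk { my ($bytes) = @_; return ceil($bytes / 512) * 512 }

    printf("1024 bytes raw        -> %4d bytes on disk\n", on_disk(1024));  # 1024
    printf(" 600 bytes compressed -> %4d bytes on disk\n", on_disk(600));   # 1024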
From: <ma...@ph...> - 2002-02-15 01:25:19
|
Craig Barratt writes:
>> > Block size    Total Count    Unique Count    Percentage duplicate
>> >       1024      248015738       151849516          38.8%
>> >       2048      124446044        78608689          36.8%
>> >       4096       62693124        41013766          34.6%
>> >       8192       31868433        21806649          31.6%
>> >      16384       16500899        11627485          29.5%
>> >      32768        8853617         6501388          26.6%
>> >      65536        5073159         3950319          22.1%
>>
>> Thanks for that.
>>
>> These are based on comparing against fixed block size boundaries correct?
>> i.e. for 1024 block size comparing at offset 0, 1024, 2048 as opposed to
>> 0, 1, 2, 3, 4, 5, 6, etc..
>
> That's right: the blocks are on multiples of the block size, so the
> 1024 block size are the blocks starting at offset 0, 1024, 2048, ...
> The last block of each file might be partial.
>
> Note that the savings are without compression. Adding compression will
> tend to reduce these numbers somewhat for two reasons:
>
>  - The most common repeated block is all 0's. In the 1024 case the
>    most common repeated block occurs 8171676 times, accounting for
>    almost 10% of the total number of repeated blocks. These blocks
>    stored in the original files will tend to compress very well, so
>    the gains from block reuse will be lower by a couple of percent.
>
>  - Because of the granularity of disk storage (eg: 512 byte blocks
>    or often bigger on larger storage devices), storing small blocks
>    as separate files might give no compression. Eg: a 1K block
>    might compress to 600 bytes, but that's still two 512 byte
>    blocks.
>
> To estimate the impact of these two issues I would have to add
> code that actually compresses each unique block to see how well
> it compresses and then keep track of the compressed savings.

I would hold off on compressing the deltas themselves. Using the algorithm
is in itself a form of compression, and it will ultimately require
considerable processing power in the backend to make it all work.
Compressing the deltas would only worsen this, for what I imagine would be
a smaller gain.

In regards to blocks full of 0's, these would normally be dealt with as
sparse files by the underlying filesystem, in which case storage is not
actually allocated for them out of the available free blocks.

Regards
Peter Marelas
|
From: Craig B. <cr...@ar...> - 2002-02-15 02:07:35
|
> I would hold off on compressing the delta's themselves. > Using the algorithm in itself is a form of compression and will > ultimately require considerable processing power in the backend to > make it all work. Compressing the delta's would only worsen this for > what I imagine would be a smaller gain. Ok, but compression of complete files (already supported by BackupPC) yields around a 40% saving in overall storage (YMMV). If you represent everything as uncompressed blocks you will get up to a 40% saving instead. Unless you get some additional benefit (eg: from blocks at all file offsets or something similar), there doesn't seem to be any storage gain for all the added complexity. > In regards to blocks full of 0's these would normally be delt with as > sparse files by the underlying filesystem in which case storage is not > actually allocated for them out of the available free blocks. Agreed. BackupPC already seeks over chunks of initial 0x0 bytes when compression is off, although it doesn't do it in the middle of a file. When compression is on it simply compresses everything, since long streams of repeated bytes compress very well. Regards, Craig |
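The seek-over-zeros idea can be illustrated with a small sketch — not
BackupPC's actual code — that copies a file while turning aligned all-zero
blocks into holes the filesystem can store sparsely:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_CUR);

    # Copy $src to $dst, seeking over all-zero blocks instead of writing
    # them, so the output can be stored sparsely by the filesystem.
    sub copy_sparse {
        my ($src, $dst, $blksize) = @_;
        $blksize ||= 4096;
        my $zeros = "\0" x $blksize;
        open(my $in,  "<", $src) or die "open $src: $!";
        open(my $out, ">", $dst) or die "open $dst: $!";
        binmode($in); binmode($out);
        my $buf;
        while ( my $n = sysread($in, $buf, $blksize) ) {
            if ( $n == $blksize && $buf eq $zeros ) {
                sysseek($out, $blksize, SEEK_CUR);   # leave a hole
            } else {
                syswrite($out, $buf, $n);
            }
        }
        truncate($out, -s $src);   # fix the length if the file ends in a hole
        close($in); close($out);
    }

    copy_sparse($ARGV[0], $ARGV[1]) if @ARGV == 2;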
From: <ma...@ph...> - 2002-02-15 02:29:50
|
Craig Barratt writes:
>> I would hold off on compressing the deltas themselves.
>> Using the algorithm is in itself a form of compression, and it will
>> ultimately require considerable processing power in the backend to
>> make it all work. Compressing the deltas would only worsen this, for
>> what I imagine would be a smaller gain.
>
> Ok, but compression of complete files (already supported by BackupPC)
> yields around a 40% saving in overall storage (YMMV). If you represent
> everything as uncompressed blocks you will get up to a 40% saving
> instead. Unless you get some additional benefit (eg: from blocks at all
> file offsets or something similar), there doesn't seem to be any storage
> gain for all the added complexity.

The benefits you get from using deltas are:

1. a major reduction in network utilisation (you don't get that with
   server-side compression)
2. reduced storage without using a compression algorithm
3. I would also suggest backend processor utilisation would be reduced

Lastly, the saving from using the algorithm will be much greater than 40%
of storage, given you've already realised 38% at fixed file offsets.

Regards
Peter Marelas
|
From: Peter L. B. <pl...@io...> - 2001-12-29 02:53:04
|
I've been puzzling over this one for a few days. I've been using rsync from Windows 2000 to a Linux server quite successfully (using the cygwin port of rsync.) I don't see an effective way to use the stock rsync package with the BackupPC file structure unless the server is modified. However, I do believe that _only_ the server side need be modified for this to work. The server process has to be able to read the file delta information from a different location than is actually being written to and also has to be intelligent to substitute hardlinks for identical files. This is seem at first glance to be a trivial modification to rsyncd (really rsync --daemon) but means that only the piece being run on the backup server need be re-compiled. If the result of the rsync process produces the same file/directory/link structure that a normal smbclient-based backup produces, the .cgi interface and other processes should all work un-modified. > > Are there plans to implement a prorietary client to minimise network usage > > by implementing true incremental/delta backups? a.la rsync algorithm or > > equivalent. > >I am planning on adding tar over ssh and eventually rsync as >additional transport layers for unix/linux clients. Rsync might >need a few additional features to make it work with BackupPC, but >I still need to see if I can make it work with an unmodified rsync. > >I haven't contemplated doing a custom client, mainly because I don't >know a thing about win32 development and I generally like the idea >of not needing to install or maintain client-side software. For >unix/linux, rsync (or a slight addition to it) should work well. >For win32, in addition to reducing network traffic, a custom client >might also be able to workaround open-file and file locking issues, >and also provide faithful saving of file attributes. Discovering open files from Win32 Perl is a fairly trivial task. Assuming that the server-side rsync issue can be solved (and I fully believe it is a simple mod), a perl wrapper could generate a list of open files and perform any necessary activities (skipping, shutting down the locking program, ignoring, etc.,) before passing the list to rsync via an include list. Using an environment like cygwin (which has its faults and advantages), both ssh and rsync could be used in combination (ssh to allow server-initiated backups and also for secure tunnelling of rsync if desired) without having a great deal of differences between Unix and Windows as far as the backup server is concerned. > > By splitting file data into delta's (compared across and referenced by > > complete data set) > > I suspect online disk requirements can be reduced much much more. > >Yes!! I have been thinking about using something like xdelta1 >(see http://sourceforge.net/projects/xdelta) to do binary file >deltas, but that would only be against the previous file of the >same name from the same machine. I have to put a plug in here for the rsync algorithm since the tool has already received so much attention. (Appropriate too, since Andrew Tridgell also wrote Samba, which BackupPC already uses.) (There is a nice writeup on the rsync differencing algorithm in chapter 3.) http://samba.org/~tridge/phd_thesis.pdf >Better yet (as you suggest), I have also been thinking that there >should be a way to change the pool to be a pool of snippets or >deltas, rather than complete files. This could improve the overall >storage efficiency from the current 6-8x to perhaps 20x or more, >which would be really great. 
The problem is I haven't figured out >how to do this. Suggestions and discussion are welcome! This is certainly do-able, but it would require significant changes to the server side daemons to accomplish since the present daemons are only capable of looking to on-disk files for the deltas and checksums that are necessary (at least for rsync which is the only tool I've studied.) Given a differencing library like xdelta or librsync (http://rproxy.samba.org/doxygen/librsync/), having delta metadata queried from a database rather than generated dynamically from the files themselves becomes achievable. > > Lastly, should consider the structure of files on disk to > > support integration with a HSM backend (for tape storage) > > inline with data retention. As opposed to writing your own. > >These issues are very important. The current hardlink structure >is not friendly to many tape backup systems. I'm a novice in the >area of tape backup systems, so I'd be very happy to learn more >and figure out how to better glue BackupPC into other backend >systems (both open source and commercial). Depending on how tight you want the integration to be, no modification may be necessary. Many HSM implementations appear to the host operating system as just an extraordinarily large filesystem. Files are migrated to tape based on access times or other policies, but in all other respects remain visible to the operating system (inode data remains cached.) The only time the presence of tape is detectable occurs when a migrated file is opened because the open() system call takes much longer to complete (perhaps minutes or longer) due to the file needing to be read back from tape and cached. Some systems like ADIC's AMASS tape filesystem make efficient use of disk-space, though, to efficiently cache IO to migrated files so that the tape penalty does not become prohibitive. Given a large-enough tape library, virtual filesystems of many terabytes are quite possible, supported by a much smaller amount of spinning disk. An open source HSM system would be a worthy project in and of itself. ;-) > > Their my thoughts, happy to code and contribute, however looking for > > direction and an > > understanding of what is already in the works. > >If you want to help design and code that would be great!! Here's my >current roadmap for v1. Dramatic improvements (like making the pool >an rsync-like database of snippets and adding client tools) would >probably form the basis of v2: > > 1. Adding tar over ssh as an alternative to smbclient for unix/linux > clients. This will allow symlinks and other special files to be > backed up correctly. Later I will add rsync as a third transport > layer (in addition to smb and tar/ssh). Hmm.... this might be a dumb question, but why not start with just backing up Unix/Linux clients via an NFS export? The NFS mount could be brought up at backup time and scanned with ordinary filesystem tools (find, etc.,) or their equivalent Perl functions. Disk usage is minimized too since you only need copy files once you have identified them as non-redundant. (Note...this approach, while probably simpler, is inferior from a security perspective to a tar/ssh solution. It is, however, closer to the smbclient method that BackupPC presently uses for backup.) > 2. Providing a utility to copy the pool and all data files. Useful for > efficiently migrating the BackupPC data to a new (bigger) filesystem > while preserving all the hardlinks. (Perhaps gnu "cp -dR" will be > fast enough instead; I'll benchmark both.) 
How about dump / restore? Those are guaranteed to restore filesystem structure, including preserving hardlinks. (These are not available for all filesystem types, however.) > 3. Saving attributes (unix owner, group, permissions, mtime, atime etc). > I don't know enough about Windows ACLs to know how to save them too > (this would need updates to smbclient). > > 4. Generating zip or tar archives for restore. Currently files can > only be restored one at a time, unless you mount the server's disk > and drag files from the client. Generating a zip or tar archive > will allow the original attributes to be merged and restored too. > > 5. Ability to split the pool across several file systems. Since > I use hardlinks to efficiently store repeated files, everything > must currently be on a single file system. New 120GB drives > are just becoming available, so you could run BackupPC on two > or three 120GB file systems without worrying about merging the > physical drives into one big file system. Tools will allow you > to resplit the pool if you need to add more drives later. This is a tricky one to achieve if you wish to retain the hardlink structure since hardlinks cannot cross filesystems. You'd have to switch to symbolic links which might have unpredictable side-effects. Perhaps a better approach would be to leave disk-spanning to a software raid layer on the server operating system? Most Unix versions today, as well as Linux, ship with basic software raid that allows the creation of virtual volumes upon which normal filesystems can be created. Many even allow dynamic increasing of such volumes (with the appropriate raid level) as disks are added to a system. --PLB > 6. Update the CGI script BackupPC_Admin so it will optionally > run under mod_perl. This would require a dedicated httpd, > running as the BackupPC user. The advantages are that it > would eliminate the current setuid, and it would be much > faster (it would have persistent connections to the BackupPC > server, and also cache the hosts file). > > 7. Implementing binary file diffs so that files with small changes are > stored in a binary diff form (maybe using xdelta1). > >I'm planning on implementing 1-3 for the next version (v1.04), hopefully >by mid or late January, and maybe the next two or three for the version >after that. > >If you want to pitch in with, say, #4 or #6, or instead look at longer >term goals (pool structure, optional client side software, or backend >tape integration), then that would be great! > >Regards, >Craig > >_______________________________________________ >BackupPC-users mailing list >Bac...@li... >https://lists.sourceforge.net/lists/listinfo/backuppc-users |
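On the point raised earlier in this message about substituting hardlinks
for identical files, the core idea can be sketched as below. The pool path
layout and the whole-file MD5 digest used here are invented for
illustration; BackupPC's real pooling scheme differs.

    use strict;
    use warnings;
    use Digest::MD5;
    use File::Path qw(mkpath);

    my $PoolDir = "/var/lib/backuppc-pool";    # hypothetical pool location

    # Replace $file with a hardlink into a content-addressed pool if an
    # identical file is already there; otherwise the file becomes the
    # pool copy.
    sub pool_file {
        my ($file) = @_;
        open(my $fh, "<", $file) or die "open $file: $!";
        binmode($fh);
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);

        my $dir  = "$PoolDir/" . substr($digest, 0, 2);
        my $pool = "$dir/$digest";
        mkpath($dir) unless -d $dir;

        if ( -e $pool ) {
            unlink($file) or die "unlink $file: $!";
            link($pool, $file) or die "link: $!";   # reuse pooled content
        } else {
            link($file, $pool) or die "link: $!";   # first copy becomes pool file
        }
        return $pool;
    }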
From: <ma...@ph...> - 2001-12-29 06:04:09
|
Comments embedded.. ----- Original Message ----- From: "Craig Barratt" <cr...@ar...> To: <ma...@ph...> Cc: <bac...@li...> Sent: Saturday, December 29, 2001 10:45 AM Subject: Re: [BackupPC-users] Reducing Network Requirements > > Are there plans to implement a prorietary client to minimise network usage > > by implementing true incremental/delta backups? a.la rsync algorithm or > > equivalent. > > I am planning on adding tar over ssh and eventually rsync as > additional transport layers for unix/linux clients. Rsync might > need a few additional features to make it work with BackupPC, but > I still need to see if I can make it work with an unmodified rsync. > > I haven't contemplated doing a custom client, mainly because I don't > know a thing about win32 development and I generally like the idea > of not needing to install or maintain client-side software. For > unix/linux, rsync (or a slight addition to it) should work well. > For win32, in addition to reducing network traffic, a custom client > might also be able to workaround open-file and file locking issues, > and also provide faithful saving of file attributes. Im pretty sure theres enough in ActivePerl to support the required Win32 API calls one would need to write a basic proof of concept client. If its all good converting to an alternative language would be an option. I think its important to move to a custom client to be able to extend the softwares feature set. I do backup and recovery for a living and the most frequent problem I see is slow networks. Dont know if your aware, there is a similar commericial product that does what BackupPC does but alot more. Its called Veritas Netbackup Proffesional. Have a look at http://www.veritas.com/products/category/ProductDetail.jhtml?productId=nbupr o you may get some ideas. > > > By splitting file data into delta's (compared across and referenced by > > complete data set) > > I suspect online disk requirements can be reduced much much more. > > Yes!! I have been thinking about using something like xdelta1 > (see http://sourceforge.net/projects/xdelta) to do binary file > deltas, but that would only be against the previous file of the > same name from the same machine. > > Better yet (as you suggest), I have also been thinking that there > should be a way to change the pool to be a pool of snippets or > deltas, rather than complete files. This could improve the overall > storage efficiency from the current 6-8x to perhaps 20x or more, > which would be really great. The problem is I haven't figured out > how to do this. Suggestions and discussion are welcome! One option is to maintain logical references (pointers) to delta's. The pointers a file points to make up the file itself. The pointers and file attributes would be maintained in some form of a database. Be it an SQL RDBMS (www.postgresql.org) or an inverted index library like Berkeley DB (www.sleepycat.com). I think using rsync itself verbatim is probably not a good idea considering it would need to calculate the checksum's and digest (on the server side) each time a file is considered for backup. My view a better option would be to use the delta algorithm but not the rsync application itself. This way one could keep records of the checksums and digests for all delta's on the server side without having to recalculate them each time. > > Lastly, should consider the structure of files on disk to > > support integration with a HSM backend (for tape storage) > > inline with data retention. As opposed to writing your own. 
> > These issues are very important. The current hardlink structure > is not friendly to many tape backup systems. I'm a novice in the > area of tape backup systems, so I'd be very happy to learn more > and figure out how to better glue BackupPC into other backend > systems (both open source and commercial). Ive only dealt with commercial HSM solutions and the previous email from Peter explains the concept well. I guess from a long term perspective if BackupPC were to move away from managing files on disk, to managing files in a database and delta's on disk, then the structure of files on disk to support integration with a HSM backend should become irrelevant. > > Their my thoughts, happy to code and contribute, however looking for > > direction and an > > understanding of what is already in the works. > > If you want to help design and code that would be great!! Here's my > current roadmap for v1. Dramatic improvements (like making the pool > an rsync-like database of snippets and adding client tools) would > probably form the basis of v2: > > 1. Adding tar over ssh as an alternative to smbclient for unix/linux > clients. This will allow symlinks and other special files to be > backed up correctly. Later I will add rsync as a third transport > layer (in addition to smb and tar/ssh). > > 2. Providing a utility to copy the pool and all data files. Useful for > efficiently migrating the BackupPC data to a new (bigger) filesystem > while preserving all the hardlinks. (Perhaps gnu "cp -dR" will be > fast enough instead; I'll benchmark both.) > > 3. Saving attributes (unix owner, group, permissions, mtime, atime etc). > I don't know enough about Windows ACLs to know how to save them too > (this would need updates to smbclient). Most commercial Unix systems now provide ACL's as well. Example, Solaris. > 4. Generating zip or tar archives for restore. Currently files can > only be restored one at a time, unless you mount the server's disk > and drag files from the client. Generating a zip or tar archive > will allow the original attributes to be merged and restored too. A popular commercial backup product uses a modified version of GNU tar for backup and recovery (to tape). I think following a similar concept (potentially looking at star http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/ star.html) would be ideal. > 5. Ability to split the pool across several file systems. Since > I use hardlinks to efficiently store repeated files, everything > must currently be on a single file system. New 120GB drives > are just becoming available, so you could run BackupPC on two > or three 120GB file systems without worrying about merging the > physical drives into one big file system. Tools will allow you > to resplit the pool if you need to add more drives later. I agree with Peter's previous comments in that this issue can and probably should be delt with outside of the application. That is, some form of Logical Volume Manager. > 6. Update the CGI script BackupPC_Admin so it will optionally > run under mod_perl. This would require a dedicated httpd, > running as the BackupPC user. The advantages are that it > would eliminate the current setuid, and it would be much > faster (it would have persistent connections to the BackupPC > server, and also cache the hosts file). I would also consider SpeedyCGI (http://www.cpan.org/modules/by-module/CGI/CGI-SpeedyCGI-2.11.tar.gz) as an alternative if you dont want to enforce the need for Apache. > 7. 
Implementing binary file diffs so that files with small changes are
> stored in a binary diff form (maybe using xdelta1).
>
> I'm planning on implementing 1-3 for the next version (v1.04), hopefully
> by mid or late January, and maybe the next two or three for the version
> after that.
>
> If you want to pitch in with, say, #4 or #6, or instead look at longer
> term goals (pool structure, optional client side software, or backend
> tape integration), then that would be great!

I think it would be a good idea if the current version continues as is,
and I will spend some time on producing pilot code to test the long-term
goals. In particular:

1. A client/server model that results in the network transfer of deltas.
2. A server model that stores deltas and references to them (see the
   sketch after this message).

I will update the list as this work unfolds.

Regards
Peter Marelas
|
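As a sketch of point 2 above (a server model that stores deltas and
references to them), one minimal shape for the metadata, using Berkeley DB
as suggested earlier in the thread, might be the following. All file names
and the layout are invented for illustration:

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # versions.db: "host:path:backupnum" -> ordered list of block digests
    # blocks.db:   digest -> where that block/delta lives on disk
    my (%versions, %blocks);
    tie %versions, 'DB_File', 'versions.db', O_RDWR|O_CREAT, 0644, $DB_HASH;
    tie %blocks,   'DB_File', 'blocks.db',   O_RDWR|O_CREAT, 0644, $DB_HASH;

    sub record_version {
        my ($host, $path, $num, @digests) = @_;
        $versions{"$host:$path:$num"} = join(",", @digests);
    }

    sub version_blocks {
        my ($host, $path, $num) = @_;
        return split(/,/, $versions{"$host:$path:$num"} || "");
    }

    # Reconstructing a version is then a walk of its pointers, reading
    # each block from wherever blocks.db says it is stored.
    sub restore_version {
        my ($host, $path, $num, $outfh) = @_;
        for my $d ( version_blocks($host, $path, $num) ) {
            open(my $b, "<", $blocks{$d}) or die "missing block $d";
            binmode($b);
            local $/;                     # slurp the whole block file
            print {$outfh} scalar <$b>;
            close($b);
        }
    }

    untie %versions;
    untie %blocks;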
From: Peter L. B. <pl...@io...> - 2001-12-29 20:45:11
|
Responses in thread below:

>Im pretty sure theres enough in ActivePerl to support the required
>Win32 API calls one would need to write a basic proof of concept
>client.

There is precious little in the Win32 API that ActivePerl and its assorted
modules do not allow access to. Perl on Win32 has, to my own surprise,
grown to be as rich and powerful an operating system control mechanism as
Perl on Unix has been. The definitive books in this area seem to be by
David Roth (http://www.roth.net/) and cover many, many examples (I've only
just started expanding to the Win32 world in Perl, having done most of my
development on Unix variants.)

>If its all good converting to an alternative language
>would be an option.
>
>I think its important to move to a custom client to be able to
>extend the softwares feature set. I do backup and recovery
>for a living and the most frequent problem I see is slow networks.

Some issues worth considering on the custom-client route:

1. Create an open command/response protocol specified in the style of the
   RFCs. This leaves open the possibility of letting others develop unique
   clients that will plug seamlessly into the system as long as they
   implement the protocol properly. I can use a POP3/SMTP compatible
   mailer with any POP3/SMTP server regardless of whether they are from
   the same vendor or not. That is something I wish I could do with backup
   software (I too work with large network backup systems professionally,
   and the lock-in that the big vendors like Veritas and Legato force on
   their customers is draconian.) I would love to see a situation where
   standardized protocols exist between the client-side and server-side in
   network backup systems (and I consider NDMP to be far too low-level for
   this purpose.)

>Dont know if your aware, there is a similar commericial
>product that does what BackupPC does but alot more.
>Its called Veritas Netbackup Proffesional.
>Have a look at
>http://www.veritas.com/products/category/ProductDetail.jhtml?productId=nbupr
>o
>you may get some ideas.

NetBackup Professional shares a lot of similarities in its basic design
with BackupPC. Although NBU Pro (which is a completely different product
from NBU BusinessServer or DataCenter editions) has a native-client
architecture, it too relies on an on-disk datastore and has no integration
with tape libraries. Backup of the datastore is left to the system
administrator and some other software package.

Legato has a Networker Laptop edition that also has similar features. A
centralized on-disk datastore that eliminates redundant files seems to have
become the industry norm for pitching "laptop" oriented backup solutions.
For some reason, though, these solutions have typically been implemented as
point-products rather than being fully integrated with the vendor's
enterprise tape backup solutions. As a backup administrator, I hate the
idea of having to manage yet another backup package with a completely
different set of commands and control interfaces, especially when it comes
from the same vendor as one of my other solutions.

I think BackupPC is a great step in the right direction towards filling a
gap between the capabilities of open source backup applications and the
commercial applications presently available. If we put our heads together,
I'm sure we can bring it up to a par with the vendor offerings already on
the table.

>One option is to maintain logical references (pointers) to delta's.
>The pointers a file points to make up the file itself.
>The pointers and file attributes would be maintained in some form of >a database. Be it an SQL RDBMS (www.postgresql.org) or an inverted index >library like Berkeley DB (www.sleepycat.com). > >I think using rsync itself verbatim is probably not a good idea considering >it would need to calculate the checksum's and digest (on the server side) >each time a file is considered for backup. > >My view a better option would be to use the delta algorithm but not >the rsync application itself. This way one could keep records of the >checksums and digests for all delta's on the server side without having >to recalculate them each time. There is one caveat that I see here. For rsync to be at its most efficient, it has to generate its checksums and digests from the original file contents. The farther beyond the original version you get, the fewer of those calculated checksums remain applicable in a regularly-changing file. Although you can save the checksums generated during an incremental backup, you can't regenerate the digests from scratch based on the byte boundaries of the file without applying the deltas and then scanning the original file again. How badly this affects the efficiency of your deltas depends entirely on the change rate of the file. There should be a method of "catching up" the most recent deltas from the original and subsequent checksums but since rsync calculates those using rolling and not fixed off-sets, the only effective way I can see of doing that is to rebuild the entire file and rescan. >A popular commercial backup product uses a modified version of >GNU tar for backup and recovery (to tape). I think following a >similar concept (potentially looking at star >http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/ >star.html) would be ideal. FYI.... that same vendor is also required to publish the source of their modified version under the GPL. I haven't actually read through their modifications though, but I have wondered which elements of GNU tar's behaviour they saw fit to change. There is no reason why their modified version couldn't be used in another system too :) ftp://ftp.support.veritas.com/pub/support/products/NetBackup_DataCenter/tar-1.09.NBU_232072.tar.Z > > 5. Ability to split the pool across several file systems. Since > > I use hardlinks to efficiently store repeated files, everything > > must currently be on a single file system. New 120GB drives > > are just becoming available, so you could run BackupPC on two > > or three 120GB file systems without worrying about merging the > > physical drives into one big file system. Tools will allow you > > to resplit the pool if you need to add more drives later. > >I agree with Peter's previous comments in that this issue can and >probably should be delt with outside of the application. That is, >some form of Logical Volume Manager. I think a single filesystem approach is nice and simple. The techniques for backing up a filesystem are well-understood and the reference count in each inode gives you an easy way to track the number of times a file is referenced in different backups. With filesystem address-spaces reaching into the terabytes now, and lvm systems able to leverage the incredibly cheap and massive disks on the market today, keeping BackupPC supplied with a large pool of addressable space doesn't seem too difficult :) Regards, --PLB |
From: Peter L. B. <pl...@io...> - 2001-12-31 03:32:44
|
Peter: Hopefully I won't stick my foot in my mouth here (so please bear with me if I do ;-) but my understanding, from section 3.2.3 of Tridgell's PHD Thesis is as follows: 1. The destination file is divided up into fixed-length blocks of size blocksize (32k for example.) 2. For each block, a cheap checksum and an expensive checksum are generated. 3. Starting from the beginning of the source file, and continuing until EOF: - A cheap checksum for a block of length blocksize is calculated at each byte boundary. - If the cheap checksum matches a checksum in the table generated in step 1, the expensive checksum is also calculated. - If the expensive checksum also matches, a token is passed indicating that blocksize bytes from the present offset may be substituted with the remote block for which the two checksums matched. - Calculation of cheap checksums will continue from the present offset + blocksize. - All byte boundaries where no checksums matched will result in the transferring of the actual bytes rather than a block token. The key thing here is that while the blocks are calculated on fixed boundaries in the destination file, for the source file there is no concept of a fixed block boundary. Block boundaries are determined dynamically when both a checksum and a digest for blocksize bytes from a given byte boundary in the source file match one of the fixed blocks in the destination file. Thus, the way rsync itself works, the checksums and digests for fixed blocks in the most recent file cannot be calculated until it is fully reconstructed on the remote end, something that requires applying the deltas that are transferred in the process. However, I think I've been looking at this with a certain amount of tunnel vision because I analyzed the issue from the perspective of rsync's client/server process. If the procedure is changed such that the client begins the backup process by calculating checksums and digests on fixed boundaries, then passing them to the server, the server will have a record of the fixed-block checksums and digests for the most recent version of the file (something it cannot generate itself without reconstructing the file.) However, for these checksums and digests to be useful, they have to match blocks that are stored someplace on the backup server. This is where the process breaks down. The actual blocks that are stored are from the destination files only. Deltas from the source files are applied not in full blocks, but in variable length bytestreams. Thus, a "block" in rsync, always refers to an element of blocksize bytes in the destination file. Unique blocks never originate in the source file, they are derived from the destination file and may only be matched by the same pattern in the destination file. The gaps between blocks in the source file that represent new data will always consist of one or more bytes but are not going to be of any fixed size. The worst case we have here, then is that our catalog of checksums and digests contains a lot of references to blocks that do not physically exist in our datastore. This does not stop us from continuing to perform incremental backups as often as we like since we have the fixed block checksums and digests that the client passed us in the previous backup, but it does present a problem when it comes to restore. 
Restoring is horrendously in-efficient but also magical in a way ;-) To successfully restore a file to a point in time, you start with the original full, and have no other choice but go through the process of applying the deltas for each and every incremental backup and caching a copy of every fixed block that is referenced by a subsequent backup. Incidentally, this is pretty much identical to the concept of rolling forward a database using the last full backup and its transaction logs. The ugly part is that there is no middle ground between a full backup and an incremental. There is no easy way in this process to create a cumulative, or level backup, consisting of all the deltas since a given non-full backup. Regenerating the blocks from deltas always requires going back to the last full. It is getting late though, so my brain may not have fully thought this through. I've just spent the past 2 hours analyzing this issue and I think I've got it right, but I'll be happy to buy a round of beers if I'm wrong. In any case, Peter Marelas is absolutely correct in that the efficiency of the algorithm for each subsequent incremental can be retained simply by having the client-side generate a list of checksums and digests that will be retained on the server (even though this isn't how rsync normally operates.) Subsequent incrementals will be calculated based on the checksums and digests generated in the previous backup. This addresses the efficiency of the checksum/digest portion of the database. However, because blocks on the client-side are not backed up on fixed block boundaries, the efficiency of the datastore gets worse and worse with each generation. Indeed, the spectre of data integrity arises because the integrity of your most recent backup is also dependent on the integrity of your last full and every intervening incremental. This is usually compensated for in commercial systems by having regular cumulative backups that backup all changes since the previous full, skipping the most recent incrementals. To be viable, an rsync-style backup algorithm is going to need some method of providing rergular checkpoints between the full and the most recent incremental such that performing a restore doesn't require applying the deltas for every generation of the file that has been backed up since the last full. My apologies if I've gotten this wrong. ;-) --PLB <Additional comments in-line below> >The thing is (this is my understanding based on reading >up on the algorithm) the checksum does not roll from >START to EOF. It rolls from START OF BLOCK to >END OF BLOCK. Then its computed for the next >block independant of the previous block and so on. I believe that on the source side, the rolling checksum _does_ occur from START to EOF, skipping chunks of blocksize bytes every time a block matches. Thus, there is no concept of a "next block" on the source side since the processing is done one byte at a time. >So, from a rolling perspective, it can roll for ever, but >is not used in this fashion by rsync. > >In theory it would work like this.. > >1. Server saves all checksums/digests at fixed block size boundaries > for a files most recent version (which is made up of delta's in > itself). >2. Client requests incremental backup of the file >3. Server sends client the checksums/digests previously saved and referenced > for the files most recent version (previously computed from last >incremental/full). >4. Client computes checksums for each byte boundary and compares > against those received from server. 
If it finds a match it computes > digest and compares again. If they match the given block is > considered 'in sync'. The rolling checksum starts again from the next > block (independant of whats already been traversed). > If the checksum's do not match those received from the server the > client sends server the data blocks, offsets and checksums. > If the checksum's match those received from the server up to > EOF client sends server the data blocks, offsets and checksums. Are you referring to fixed-size or variable-size blocks here? My understanding is that rsync deltas consist of variable-length bytestreams interspersed with references to fixed-length blocks on the remote side. >5. Server updates delta references from first fixed block size boundary that > changed and recomputes/verifies checksums for these > blocks only and saves them for future reference. > It then removes old/unreferenced checksums including the > delta's (removal could be done via scheduled batch job > so that throughput is not affected by index updates, etc). > >So, re-scanning the file on the server side is avoided unless >the file changes which you point out. However, depending >where it changed will depend on how much needs to be >recalled (and it need not be recalled or computed to determine what >the client should compare against). > >Limiting file/delta recall is advantageous from a tape backup >perspective. I'm not sure if I've misunderstood what you've written above or not. The method you describe works as far as the checksums and digests are concerned but seems to break down when it gets to the data blocks themselves. Since no fixed-length blocks are generated by the differencing algorithm, those blocks can't be stored on the server, even though we have a record of their checksums and digests provided by the client. This gives us the peculiar situation of having checksums and digests for the blocks, but not the blocks themselves. Even so, we do have the deltas that allow us to regenerate the blocks by recreating successive versions of the file starting from the last full. >ftp://ftp.support.veritas.com/pub/support/products/NetBackup_DataCenter/tar- >1.09.NBU_232072.tar.Z > >I had a quick look.. > >DES Encryption is one. Interesting.... I wonder why they chose to do it that way? To my mind, encryption, compression, and other transformative tasks belong in the backup chain as filters in the same way that Unix uses pipes to allow post-processing of command output. Hmmm... --PLB >Regards >Peter Marelas |
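To make the fixed-block versus rolling-scan asymmetry discussed above
concrete, here is a deliberately tiny, unoptimised sketch of the matching
pass and its inverse. It uses a toy 8-byte block size and recomputes the
weak sum at every offset rather than rolling it; it is illustrative only,
not rsync's actual implementation:

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    my $BLK = 8;    # toy block size, just for the example

    sub weak { my ($s) = @_; my $a = 0; $a += ord($_) for split //, $s; return $a }

    # Signature of the "destination" (old) data: fixed, non-overlapping blocks.
    sub make_sig {
        my ($old) = @_;
        my %sig;
        for ( my $i = 0; $i < length($old); $i += $BLK ) {
            my $blk = substr($old, $i, $BLK);
            push @{ $sig{ weak($blk) } }, [ md5_hex($blk), $i / $BLK ];
        }
        return \%sig;
    }

    # Scan the "source" (new) data at every byte offset, emitting block
    # tokens where both checksums match and literal bytes elsewhere.
    sub make_delta {
        my ($sig, $new) = @_;
        my @delta;
        my $lit = "";
        my $i = 0;
        while ( $i < length($new) ) {
            my $blk = substr($new, $i, $BLK);
            my $hit;
            if ( length($blk) == $BLK && $sig->{ weak($blk) } ) {
                my $d = md5_hex($blk);
                ($hit) = grep { $_->[0] eq $d } @{ $sig->{ weak($blk) } };
            }
            if ($hit) {
                push @delta, [ literal => $lit ] if length $lit;
                push @delta, [ block => $hit->[1] ];
                ($lit, $i) = ("", $i + $BLK);
            } else {
                $lit .= substr($new, $i++, 1);
            }
        }
        push @delta, [ literal => $lit ] if length $lit;
        return \@delta;
    }

    # The inverse: rebuild the new data from the old data plus the delta.
    sub apply_delta {
        my ($old, $delta) = @_;
        my $out = "";
        for my $op (@$delta) {
            $out .= $op->[0] eq 'block'
                  ? substr($old, $op->[1] * $BLK, $BLK)
                  : $op->[1];
        }
        return $out;
    }

    my $old   = "the quick brown fox jumps over the lazy dog";
    my $new   = "the quick brown cat jumps over the lazy dog";
    my $delta = make_delta(make_sig($old), $new);
    print apply_delta($old, $delta) eq $new ? "round trip ok\n" : "bug\n";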
From: <ma...@ph...> - 2001-12-30 16:10:21
|
----- Original Message ----- From: "Peter L. Buschman" <pl...@io...> To: <ma...@ph...>; "Craig Barratt" <cr...@ar...> Cc: <bac...@li...> Sent: Sunday, December 30, 2001 7:45 AM Subject: Re: [BackupPC-users] Reducing Network Requirements > >I think its important to move to a custom client to be able to > >extend the softwares feature set. I do backup and recovery > >for a living and the most frequent problem I see is slow networks. > > Some issues worth considering on the custom-client route: > > 1. Create an open command/response protocol specified in the style of the > RFCs. This leaves open the > possibility of letting others develop unique clients that will plug > seamlessly into the system as long as > they implement the protocol properly. I can use a POP3/SMTP > compatible mailer with any POP3/SMTP > server regardless of whether they are from the save vendor or > not. That is something I wish I could do with > backup software (I too work with large network backup systems > professionally, and the lock-in that the big > vendors like Veritas and Legato force on their customers is > draconian.) I would love to see a situation where > standardized protocols exist between the client-side and server-side > in network backup systems (and I consider > NDMP to be far too low-level for this purpose.) Agreed. > >Dont know if your aware, there is a similar commericial > >product that does what BackupPC does but alot more. > >Its called Veritas Netbackup Proffesional. > >Have a look at > >http://www.veritas.com/products/category/ProductDetail.jhtml?productId=nbup r > >o > >you may get some ideas. > > NetBackup Professional shares s a lot of similarities in the basic design > with BackupPC. Although NBU Pro (which is a completely > different product from NBU BusinessServer or DataCenter editions) has a > native-client architecture, it too relies on an on-disk datastore > and has no integration with tape libraries. Backup of the datastore is > left to the system administrator and some other software package. > > Legato has a Networker Laptop edition that also has similar features. A > centralized on-disk datastore that elilminates redundant files seems > to have become the industry norm for pitching "laptop" oriented backup > solutions. For some reason, though, these solutions have typically > been implemented as point-products rather than being fully integrated with > the vendor's enterprise tape backup solutions. As a backup > administrator, I hate the idea of having to manage yet another backup > package with a completely different set of commands and control > interfaces, especially when it comes from the same vendor as one of my > other solutions. > > I think BackupPC is a great step in the right direction towards filling a > gap between the capabilities of open source backup applications and > the commercial applications presently available. If we put our heads > together, I'm sure we can bring it up to a par with the vendor offerings > already on the table. > > >One option is to maintain logical references (pointers) to delta's. > >The pointers a file points to make up the file itself. > >The pointers and file attributes would be maintained in some form of > >a database. Be it an SQL RDBMS (www.postgresql.org) or an inverted index > >library like Berkeley DB (www.sleepycat.com). > > > >I think using rsync itself verbatim is probably not a good idea considering > >it would need to calculate the checksum's and digest (on the server side) > >each time a file is considered for backup. 
> > > >My view a better option would be to use the delta algorithm but not > >the rsync application itself. This way one could keep records of the > >checksums and digests for all delta's on the server side without having > >to recalculate them each time. > > There is one caveat that I see here. For rsync to be at its most > efficient, it has to generate its checksums > and digests from the original file contents. The farther beyond the > original version you get, the fewer of those > calculated checksums remain applicable in a regularly-changing > file. Although you can save the checksums > generated during an incremental backup, you can't regenerate the digests > from scratch based on the byte boundaries > of the file without applying the deltas and then scanning the original file > again. How badly this affects the efficiency > of your deltas depends entirely on the change rate of the file. There > should be a method of "catching up" the most > recent deltas from the original and subsequent checksums but since rsync > calculates those using rolling and not > fixed off-sets, the only effective way I can see of doing that is to > rebuild the entire file and rescan. The thing is (this is my understanding based on reading up on the algorithm) the checksum does not roll from START to EOF. It rolls from START OF BLOCK to END OF BLOCK. Then its computed for the next block independant of the previous block and so on. So, from a rolling perspective, it can roll for ever, but is not used in this fashion by rsync. In theory it would work like this.. 1. Server saves all checksums/digests at fixed block size boundaries for a files most recent version (which is made up of delta's in itself). 2. Client requests incremental backup of the file 3. Server sends client the checksums/digests previously saved and referenced for the files most recent version (previously computed from last incremental/full). 4. Client computes checksums for each byte boundary and compares against those received from server. If it finds a match it computes digest and compares again. If they match the given block is considered 'in sync'. The rolling checksum starts again from the next block (independant of whats already been traversed). If the checksum's do not match those received from the server the client sends server the data blocks, offsets and checksums. If the checksum's match those received from the server up to EOF client sends server the data blocks, offsets and checksums. 5. Server updates delta references from first fixed block size boundary that changed and recomputes/verifies checksums for these blocks only and saves them for future reference. It then removes old/unreferenced checksums including the delta's (removal could be done via scheduled batch job so that throughput is not affected by index updates, etc). So, re-scanning the file on the server side is avoided unless the file changes which you point out. However, depending where it changed will depend on how much needs to be recalled (and it need not be recalled or computed to determine what the client should compare against). Limiting file/delta recall is advantageous from a tape backup perspective. > >A popular commercial backup product uses a modified version of > >GNU tar for backup and recovery (to tape). I think following a > >similar concept (potentially looking at star > >http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private / > >star.html) would be ideal. > > FYI.... 
that same vendor is also required to publish the source of their > modified version under the GPL. I haven't > actually read through their modifications though, but I have wondered which > elements of GNU tar's behaviour they > saw fit to change. There is no reason why their modified version couldn't > be used in another system too :) > > ftp://ftp.support.veritas.com/pub/support/products/NetBackup_DataCenter/tar- 1.09.NBU_232072.tar.Z I had a quick look.. DES Encryption is one. Regards Peter Marelas |
From: <ma...@ph...> - 2002-01-03 02:02:50
|
Hi Peter,

Happy new year. Rather than trying to explain (I'm doing a poor job of it)
I'll make some points and get back to writing the proof of concept.
Hopefully then it will all fall into place. Thus far I've converted the
rsync CRC to Perl and am testing the application of deltas.

The points I want to make are as follows:

1. Read page 96 or so of Andrew's thesis. It clearly states what we're
   attempting is possible. Of particular importance, he mentions that the
   checksums and digests can be stored on tape. That means they remain
   persistent so long as the source does not change.

2. The backup strategy employed by the delta mechanism I talk of will
   result in incremental backups. Cumulative backups don't make much sense
   to me for a disk cache. However, in some ways they can be considered
   cumulative if old file versions are not expired.

3. There is no concept of a full backup (i.e. whole file). A full is
   treated the same way as an incremental. It's broken up into fixed block
   sizes, summed, and stored as a bunch of independent deltas.

4. To restore a particular file version, the deltas referenced by the file
   pointer will be recalled in index order to reconstruct the file in
   full. That is, we are not using the rsync algorithm to restore (i.e.
   the inverse of backup).

5. To perform an incremental backup, only the last/latest file version
   previously backed up will be used, in accordance with the rsync
   algorithm. The checksums/digests associated with the deltas that are
   referenced by this file version will be sent to the source (client).
   This will follow the same concept of comparing the file timestamp since
   the last backup.

Regards
Peter Marelas

----- Original Message -----
From: "Peter L. Buschman" <pl...@io...>
To: <ma...@ph...>; "Craig Barratt" <cr...@ar...>; "Peter L. Buschman" <pl...@io...>
Cc: <bac...@li...>
Sent: Monday, December 31, 2001 2:32 PM
Subject: Re: [BackupPC-users] Reducing Network Requirements

> Peter:
>
> Hopefully I won't stick my foot in my mouth here (so please bear with me if
> I do ;-)
> but my understanding, from section 3.2.3 of Tridgell's PHD Thesis is as
> follows:
>
> 1. The destination file is divided up into fixed-length blocks of size
>    blocksize (32k for example.)
> 2. For each block, a cheap checksum and an expensive checksum are generated.
> 3. Starting from the beginning of the source file, and continuing until EOF:
>     - A cheap checksum for a block of length blocksize is calculated at
>       each byte boundary.
>     - If the cheap checksum matches a checksum in the table generated in
>       step 1, the expensive checksum is also calculated.
>     - If the expensive checksum also matches, a token is passed indicating
>       that blocksize bytes from the present offset may be
>       substituted with the remote block for which the two checksums matched.
>     - Calculation of cheap checksums will continue from the present offset
>       + blocksize.
>     - All byte boundaries where no checksums matched will result in the
>       transferring of the actual bytes rather than a block token.
>
> The key thing here is that while the blocks are calculated on fixed
> boundaries in the destination file, for the source file there is
> no concept of a fixed block boundary. Block boundaries are determined
> dynamically when both a checksum and a digest for
> blocksize bytes from a given byte boundary in the source file match one of
> the fixed blocks in the destination file.
> Thus, the way rsync itself works, the checksums and digests for fixed blocks in the most recent file cannot be calculated until it is fully reconstructed on the remote end, something that requires applying the deltas that are transferred in the process.
>
> However, I think I've been looking at this with a certain amount of tunnel vision, because I analyzed the issue from the perspective of rsync's client/server process.
>
> If the procedure is changed such that the client begins the backup process by calculating checksums and digests on fixed boundaries, then passing them to the server, the server will have a record of the fixed-block checksums and digests for the most recent version of the file (something it cannot generate itself without reconstructing the file). However, for these checksums and digests to be useful, they have to match blocks that are stored someplace on the backup server. This is where the process breaks down. The actual blocks that are stored are from the destination files only. Deltas from the source files are applied not in full blocks, but in variable-length bytestreams. Thus, a "block" in rsync always refers to an element of blocksize bytes in the destination file. Unique blocks never originate in the source file; they are derived from the destination file and may only be matched by the same pattern in the destination file. The gaps between blocks in the source file that represent new data will always consist of one or more bytes, but are not going to be of any fixed size.
>
> The worst case we have here, then, is that our catalog of checksums and digests contains a lot of references to blocks that do not physically exist in our datastore. This does not stop us from continuing to perform incremental backups as often as we like, since we have the fixed-block checksums and digests that the client passed us in the previous backup, but it does present a problem when it comes to restore.
>
> Restoring is horrendously inefficient but also magical in a way ;-)
>
> To successfully restore a file to a point in time, you start with the original full and have no choice but to go through the process of applying the deltas for each and every incremental backup, caching a copy of every fixed block that is referenced by a subsequent backup.
>
> Incidentally, this is pretty much identical to the concept of rolling forward a database using the last full backup and its transaction logs.
>
> The ugly part is that there is no middle ground between a full backup and an incremental. There is no easy way in this process to create a cumulative, or level, backup consisting of all the deltas since a given non-full backup. Regenerating the blocks from deltas always requires going back to the last full.
>
> It is getting late though, so my brain may not have fully thought this through. I've just spent the past 2 hours analyzing this issue and I think I've got it right, but I'll be happy to buy a round of beers if I'm wrong.
>
> In any case, Peter Marelas is absolutely correct that the efficiency of the algorithm for each subsequent incremental can be retained simply by having the client side generate a list of checksums and digests that will be retained on the server (even though this isn't how rsync normally operates). Subsequent incrementals will be calculated based on the checksums and digests generated in the previous backup.
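The "roll forward" restore described a few paragraphs above can be pictured with a small Perl sketch. The delta layout here, a list of literal runs and references to fixed blocks of the previous version, is a hypothetical illustration for this discussion, not BackupPC's or rsync's actual format.

    use strict;
    use warnings;

    # Rebuild one file version from the previous version plus a delta.
    # A delta here is an array of items, each either
    #   { literal => $bytes }   -- new data sent by the client, or
    #   { block   => $index }   -- a fixed block of the previous version.
    # Restoring to a point in time means applying every delta since the
    # last full, in order, much like rolling a database forward with
    # its transaction logs.
    sub apply_delta {
        my ($prev_path, $delta, $out_path, $blocksize) = @_;
        open(my $prev, "<", $prev_path) or die "can't open $prev_path: $!";
        open(my $out,  ">", $out_path)  or die "can't open $out_path: $!";
        binmode($prev);
        binmode($out);
        for my $item (@$delta) {
            if (exists $item->{literal}) {
                print {$out} $item->{literal};
            } else {
                my $block = "";
                seek($prev, $item->{block} * $blocksize, 0)
                    or die "seek failed: $!";
                read($prev, $block, $blocksize);
                print {$out} $block;
            }
        }
        close($prev);
        close($out);
    }

Restoring generation N this way means materializing (or at least caching blocks from) every intervening generation since the last full, which is exactly the inefficiency described above.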
> This addresses the efficiency of the checksum/digest portion of the database. However, because blocks on the client side are not backed up on fixed block boundaries, the efficiency of the datastore gets worse and worse with each generation. Indeed, the spectre of data integrity arises, because the integrity of your most recent backup is also dependent on the integrity of your last full and every intervening incremental. This is usually compensated for in commercial systems by having regular cumulative backups that back up all changes since the previous full, skipping the most recent incrementals. To be viable, an rsync-style backup algorithm is going to need some method of providing regular checkpoints between the full and the most recent incremental, such that performing a restore doesn't require applying the deltas for every generation of the file that has been backed up since the last full.
>
> My apologies if I've gotten this wrong. ;-)
>
> --PLB
>
> <Additional comments in-line below>
>
> >The thing is (this is my understanding based on reading up on the algorithm) the checksum does not roll from START to EOF. It rolls from START OF BLOCK to END OF BLOCK. Then it is computed for the next block, independent of the previous block, and so on.
>
> I believe that on the source side, the rolling checksum _does_ occur from START to EOF, skipping chunks of blocksize bytes every time a block matches. Thus, there is no concept of a "next block" on the source side, since the processing is done one byte at a time.
>
> >So, from a rolling perspective, it can roll forever, but it is not used in this fashion by rsync.
> >
> >In theory it would work like this:
> >
> >1. Server saves all checksums/digests at fixed block size boundaries for a file's most recent version (which is itself made up of deltas).
> >2. Client requests incremental backup of the file.
> >3. Server sends client the checksums/digests previously saved and referenced for the file's most recent version (previously computed from the last incremental/full).
> >4. Client computes checksums for each byte boundary and compares them against those received from the server. If it finds a match it computes the digest and compares again. If they match, the given block is considered 'in sync', and the rolling checksum starts again from the next block (independent of what has already been traversed). If the checksums do not match those received from the server, the client sends the server the data blocks, offsets and checksums. If the checksums match those received from the server up to EOF, the client sends the server the data blocks, offsets and checksums.
>
> Are you referring to fixed-size or variable-size blocks here? My understanding is that rsync deltas consist of variable-length bytestreams interspersed with references to fixed-length blocks on the remote side.
>
> >5. Server updates delta references from the first fixed block size boundary that changed and recomputes/verifies checksums for these blocks only, saving them for future reference. It then removes old/unreferenced checksums, including the deltas (removal could be done via a scheduled batch job so that throughput is not affected by index updates, etc).
> >
> >So, re-scanning the file on the server side is avoided unless the file changes, which you point out.
> >However, where the file changed will determine how much needs to be recalled (and it need not be recalled or computed to determine what the client should compare against).
> >
> >Limiting file/delta recall is advantageous from a tape backup perspective.
>
> I'm not sure if I've misunderstood what you've written above or not. The method you describe works as far as the checksums and digests are concerned, but seems to break down when it gets to the data blocks themselves. Since no fixed-length blocks are generated by the differencing algorithm, those blocks can't be stored on the server, even though we have a record of their checksums and digests provided by the client.
>
> This gives us the peculiar situation of having checksums and digests for the blocks, but not the blocks themselves. Even so, we do have the deltas that allow us to regenerate the blocks by recreating successive versions of the file starting from the last full.
>
> >ftp://ftp.support.veritas.com/pub/support/products/NetBackup_DataCenter/tar-1.09.NBU_232072.tar.Z
> >
> >I had a quick look..
> >
> >DES Encryption is one.
>
> Interesting... I wonder why they chose to do it that way? To my mind, encryption, compression, and other transformative tasks belong in the backup chain as filters, in the same way that Unix uses pipes to allow post-processing of command output.
>
> Hmmm...
>
> --PLB
>
> >Regards
> >Peter Marelas
>
> _______________________________________________
> BackupPC-users mailing list
> Bac...@li...
> https://lists.sourceforge.net/lists/listinfo/backuppc-users
|
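Finally, to make step 4 of the exchange quoted above concrete, here is a sketch of the client-side scan: the client slides through its current file one byte at a time, checks the cheap checksum against the server-supplied signature table, confirms with the digest, and emits either block references or literal bytes. The weak checksum is recomputed at every offset here for clarity; a real implementation uses the rolling update to do this in constant time per byte. The helpers weak_checksum() and block_signatures() are the hypothetical ones from the earlier sketches, not existing BackupPC or rsync functions.

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Client-side pass: given the server's signature list for the
    # previous version, emit a delta for the current file as block
    # references plus literal runs.
    sub generate_delta {
        my ($path, $sigs, $blocksize) = @_;
        open(my $fh, "<", $path) or die "can't open $path: $!";
        binmode($fh);
        local $/;                 # slurp; fine for a sketch, not for huge files
        my $data = <$fh>;
        close($fh);
        $data = "" unless defined $data;

        # Index the server-supplied signatures by weak checksum.
        my %by_weak;
        for my $i (0 .. $#{$sigs}) {
            push @{ $by_weak{ $sigs->[$i]{weak} } }, $i;
        }

        my (@delta, $literal);
        my $pos = 0;
        $literal = "";
        while ($pos + $blocksize <= length($data)) {
            my $window = substr($data, $pos, $blocksize);
            my $weak   = weak_checksum($window);
            my $match;
            for my $i (@{ $by_weak{$weak} || [] }) {
                # Cheap checksum hit; confirm with the expensive digest.
                if ($sigs->[$i]{strong} eq md5_hex($window)) {
                    $match = $i;
                    last;
                }
            }
            if (defined $match) {
                push @delta, { literal => $literal } if length $literal;
                $literal = "";
                push @delta, { block => $match };
                $pos += $blocksize;   # resume scanning after the matched block
            } else {
                $literal .= substr($data, $pos, 1);  # unmatched byte goes out literally
                $pos++;
            }
        }
        # Whatever is left (a tail shorter than a block) is sent as literal data.
        $literal .= substr($data, $pos);
        push @delta, { literal => $literal } if length $literal;
        return \@delta;
    }

The output of generate_delta() is what apply_delta() in the earlier sketch would consume on the server side, which is the sense in which the checksums and digests can be kept persistent even though the fixed blocks themselves only ever exist in the reconstructed destination file.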