> I like this feature:
> - A clever pooling scheme minimizes disk storage. Identical files
> across multiple backups of the same or different PCs are stored only
> once, resulting in substantial savings in disk storage.
> However I would imagine you have to transfer these over the network
> regardless, considering there is no client-side program to compute
> and compare checksums.
Yes, that's right. As you correctly note, the checksum calculation
happens on the server, not the client, so all the data needs to be
transferred. In any case, you still need to do a complete file
compare to be 100% sure you have a match. So even if you moved the
checksum computation to the client, you would still be transferring
the same amount of data (in this case from the server back to the
client) to do the compare.
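Just to illustrate the point, here is a rough Python sketch (not
BackupPC code; pool_lookup, fetch_pool_file and send_file are made-up
names): even with the digest computed on the client, a byte-for-byte
verify means the same volume of data crosses the wire, just in the
other direction.

    import hashlib

    def backup_file(path, pool_lookup, fetch_pool_file, send_file):
        data = open(path, "rb").read()
        digest = hashlib.md5(data).hexdigest()
        candidate = pool_lookup(digest)   # server: any pool file with this digest?
        if candidate is None:
            send_file(path)               # no match: send the whole file anyway
        else:
            # To rule out a digest collision we still compare every byte,
            # so the pool copy has to travel server -> client.
            if fetch_pool_file(candidate) != data:
                send_file(path)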
I need to look into what programs like rsync do. I presume they
compute digests on portions of each file to make the chance of a
false match infinitesimally small (but it's still non-zero). So
this avoids transferring the entire file, with the price that
you might get it wrong once every 10^x years...
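Something along these lines, I imagine (this is not rsync's actual
rolling-checksum algorithm, just the general block-digest shape, and
the block size is arbitrary): hash fixed-size blocks and only transfer
the blocks whose digests differ, accepting the tiny collision risk.

    import hashlib

    BLOCK = 64 * 1024

    def block_digests(path):
        with open(path, "rb") as f:
            return [hashlib.md5(chunk).hexdigest()
                    for chunk in iter(lambda: f.read(BLOCK), b"")]

    def blocks_to_send(local_digests, remote_digests):
        # indices of blocks that differ or exist only locally
        return [i for i, d in enumerate(local_digests)
                if i >= len(remote_digests) or d != remote_digests[i]]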
There is one big saving on the server side that I'm working on for a
new version. Currently all the client data is written to the server
disk by tar, and then checksums are computed and identical files are
replaced by hard links. So every incoming file is currently written
to disk and then read back for a complete compare against the pool.
For every 1MB file that means a 1MB write and two 1MB reads (one to
re-read the incoming file, one to read the matching pool file for the
compare). The new version replaces tar with code that computes the
checksums and does the compare on the fly, eliminating the write and
one of the reads. Files match the pool 80-90% of the time, so only
10-20% of files ever need to be written, reducing writes by a factor
of 5-10 and reads by a factor of around 2. A future version will also
add compression and binary file deltas to further reduce the server
storage. See the sketch below for the general shape of the on-the-fly
scheme.
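Roughly like this Python sketch (assumed names, not the actual
BackupPC code; pool_path_for is a hypothetical pool lookup, and a real
implementation would spool large files to a temp file rather than
memory): hash the stream as it arrives, then compare against the pool
candidate, and only write to disk when there is no match.

    import hashlib, os

    def receive_file(stream, pool_path_for, dest_path, chunk=64 * 1024):
        md5 = hashlib.md5()
        pieces = []
        for data in iter(lambda: stream.read(chunk), b""):
            md5.update(data)              # checksum computed on the fly
            pieces.append(data)
        contents = b"".join(pieces)

        candidate = pool_path_for(md5.hexdigest())
        if candidate is not None:
            with open(candidate, "rb") as f:
                if f.read() == contents:  # the one remaining read
                    os.link(candidate, dest_path)  # match: hard link, no write
                    return
        with open(dest_path, "wb") as f:  # no match: the only write
            f.write(contents)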
But in all these cases all the raw data still needs to be transferred
from the client to the server. I have not contemplated the use of
BackupPC over a slow network where minimizing network traffic is a
goal...