Craig Barratt wrote at about 23:43:40 -0800 on Wednesday, March 2, 2011:
> Jeffrey writes:
> > Craig Barratt wrote at about 00:31:58 -0800 on Wednesday, March 2, 2011:
> An open question is what approach I should use for filling a
> backup? Should it be configured (ie: every Nth backup is
> filled)? Or should it be based on whether the backup was a
> full or not (making it similar to 3.x)?
I would vote for 'configurable' plus the ability to manually go back
and fill (and ideally also unfill as per my earlier note) intermediate
Similarly, I like the idea of having a supported script to remove any
intermediate backup (whether filled or unfilled) because sometimes one
realizes that an intermediate backup has no value and is just wasting
Separately, I would like to revive a suggestion I made a while
back for 3.x. I think it would be great to have the
ability to mark a backup to be saved and not automatically deleted
based upon the expiry rules. Currently, I can fake it by renaming the
backup (optionally adding a symlink to the original name). But it would be
really nice to have an officially-supported convention that allows
individual backups to be protected. My recommendation would be to add
a suffix (e.g., .save) to the backup number. The particular use case I
have in mind is when you upgrade a system (or otherwise make major
changes) and specifically want to save the last backup of the
In general, while a one-time configuration of how many backups to keep
and which ones to fill is usually enough, users may sometimes want to
go back and prune or otherwise adjust their backup tree. For example,
if there is an old unfilled backup that you happen to reference a lot
because it stores some key files, you may want to fill it or mark it
to be saved. Conversely, you may want to delete some intermediate
backups (or just unfill them) if they are not very representative.
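
As a rough sketch of the protection idea, the pruning step could simply
skip any backup whose name carries the suffix. Everything here is
hypothetical: the .save suffix is only my proposal, backups_to_expire
and keep_count are made-up names, and the real expiry rules are more
involved than "keep the newest N".

```python
# Hypothetical sketch, not BackupPC code.
SAVE_SUFFIX = ".save"  # proposed convention; not an existing feature


def backups_to_expire(names, keep_count):
    """Return the backup names that a simple 'keep newest N' rule would delete.

    names      -- backup directory names such as "12" or "13.save"
    keep_count -- how many unprotected backups to keep (assumed rule)
    Protected backups are never returned and do not count toward keep_count.
    """
    unprotected = sorted(
        (n for n in names if not n.endswith(SAVE_SUFFIX)),
        key=int,  # unprotected names are plain backup numbers
    )
    excess = max(0, len(unprotected) - keep_count)
    return unprotected[:excess]  # oldest unprotected backups go first


if __name__ == "__main__":
    print(backups_to_expire(["0", "1.save", "2", "3", "4"], keep_count=2))
```

Whether protected backups should count toward the keep limit is a
design choice; here they do not.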
> > If so, does that mean that there is now a concept of how many backups
> > between fills that replaces the old incremental level concept?
> Not yet, but it's an open design question.
> > And if so, does the choice of how many backups between fills really just
> > boil down to a tradeoff between storage efficiency (due to duplication
> > of directory trees and attrib files) vs. speed of reconstructing
> > intermediate backups?
> Exactly right!
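
To put rough numbers on that tradeoff, here is a back-of-envelope
sketch. It assumes the newest backup is filled and every Nth backup
going back from it is filled as well; fill_tradeoff and its parameters
are hypothetical names, and the counts ignore the actual on-disk size
of trees and attrib files.

```python
import math


def fill_tradeoff(total_backups, fill_interval):
    """Return (filled_backups, worst_case_deltas) for a simplified model.

    Assumes total_backups >= 1 and fill_interval >= 1, with the newest
    backup filled and every fill_interval-th backup before it filled too.
    """
    # Each filled backup stores a complete directory tree.
    filled = math.ceil(total_backups / fill_interval)
    # The unfilled backup farthest from its newer filled neighbor needs
    # this many deltas applied before it can be viewed.
    worst_case_deltas = min(fill_interval, total_backups) - 1
    return filled, worst_case_deltas


if __name__ == "__main__":
    for interval in (1, 4, 12):
        print(interval, fill_tradeoff(12, interval))
```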
> > This is awesome! Certainly should be a big speedup given that disk IO
> > is often the primary bottleneck.
> That's right.
> > > This approach changes the deletion dependencies. The oldest backup can
> > > be deleted at any time, and more generally the oldest backup of a chain
> > > (ie: if the next older one is filled) can be deleted at any time. Any
> > > other backup can be deleted too, but it requires the deltas to be merged
> > > with the next older backup. A filled backup can be deleted too, and it
> > > will be merged to create a new filled backup with the prior deltas.
> > Am I right in assuming that this means you are exposing Perl library
> > routines (and maybe even full scripts) that:
> > 1. Allow one to manually convert a non-filled to a filled backup
> The script I have just duplicates the most recent (filled) backup,
> so that's the way a new filled backup is created. Doing it in the
> middle of the chain would be more general, but I haven't done it
> that way.
> > 2. Delete any backup (whether filled or unfilled) and automatically
> > fill the prior backups so that the integrity of the chain is
> > preserved.
> Yes, this is done. And yes, that's exactly what's required - you
> have to merge the changes into the prior backup (if it's not filled)
> and keep the reference counts correct.
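
To check that I understand the merge step, here is a toy model of a
chain in which each unfilled backup is a delta against the next newer
backup. It is only a sketch under my own assumptions: the Backup class,
view and delete are made-up names, file contents are plain strings, and
the reference counting you mention is ignored entirely.

```python
# Toy model only: not BackupPC code. Reference counting is not modeled.
DELETED = None  # marker in a delta: the path does not exist in this backup


class Backup:
    def __init__(self, num, filled, entries):
        self.num = num
        self.filled = filled
        # Filled: complete tree {path: content}.
        # Unfilled: delta relative to the next newer backup.
        self.entries = dict(entries)


def apply_delta(tree, delta):
    """Apply a delta to a tree in place and return the tree."""
    for path, content in delta.items():
        if content is DELETED:
            tree.pop(path, None)
        else:
            tree[path] = content
    return tree


def view(chain, i):
    """Reconstruct the full tree of chain[i]; chain is ordered oldest first."""
    j = i
    while not chain[j].filled:
        j += 1  # assumes the newest backup is always filled
    tree = dict(chain[j].entries)
    for k in range(j - 1, i - 1, -1):  # walk back from newer to older
        apply_delta(tree, chain[k].entries)
    return tree


def delete(chain, i):
    """Delete chain[i], keeping the next older backup reconstructible."""
    victim = chain[i]
    if i > 0 and not chain[i - 1].filled:
        older = chain[i - 1]
        if victim.filled:
            # The older backup becomes the new filled backup.
            older.entries = apply_delta(dict(victim.entries), older.entries)
            older.filled = True
        else:
            # Merge deltas; the older backup's own entries take precedence.
            merged = dict(victim.entries)
            merged.update(older.entries)
            older.entries = merged
    del chain[i]
```

If the next older backup is already filled (or there is none), the
deleted backup is simply dropped, which matches the "oldest backup of a
chain" case described above.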
> > 3. Conversely, is there a routine that would "unfill" a filled backup
> > by converting it to a delta relative to the next most recent
> > backup? (this should be possible and could be useful in some cases)
> No, this isn't done. It's an interesting idea.
> > I'm really struggling to understand what benefit there is anymore to
> > the notion of a "full" backup and whether it just adds more confusion
> > to have both a full vs. incremental and a filled vs. unfilled concept.
> You're right. "Full" and "Incremental" are probably the wrong
> terms to use. It really means how "thorough" the backup is. For
> rsync, "full" currently means checksum the blocks on both sides,
> and "incremental" means just check the metadata. But a reasonable
> "full" could be to just verify the full-file checksum (and compare
> it on the server like any other metadata) using the --checksum
> option. This puts very little load on the server. Or you could have
> a probability configuration (ie: roll the dice) that determines
> which files get the block-checksum compare, and which ones get
> just the full-file comparison.
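
Just to make the "roll the dice" idea concrete, a per-file choice could
look something like the sketch below. The names choose_compare and
block_checksum_prob are hypothetical and do not correspond to any
existing configuration setting.

```python
import random


def choose_compare(rng, block_checksum_prob):
    """Pick the comparison method for one file.

    rng                 -- a random.Random instance (seedable for testing)
    block_checksum_prob -- assumed setting in [0.0, 1.0]: the chance that
                           a file gets the expensive block-checksum compare
    """
    if rng.random() < block_checksum_prob:
        return "block-checksum"
    return "full-file"


if __name__ == "__main__":
    rng = random.Random()
    for path in ("etc/passwd", "home/user/notes.txt", "var/log/syslog"):
        print(path, choose_compare(rng, block_checksum_prob=0.1))
```

With a probability of 0.1 and independent rolls, each file would be
block-checksummed on average once every ten backups.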
> > As per my earlier email, is there code to do a one-time forward
> > conversion of 3.x backups to 4.x backups? The goal of course would be
> > to get rid of the hardlinks while also benefiting from the more robust
> > md5sum checksumming of 4.x.
> No, not currently. See my previous reply.