From: Craig B. <cba...@us...> - 2011-03-02 08:32:10
Jeffrey suggested I outline some of the features in 4.0 to solicit feedback and discussion. I apologize for the delay in doing this; I've been exceptionally busy during the last few months. Rather than write one huge email, I'll cover different topics in different emails. My descriptions here are relatively terse and assume an understanding of BackupPC 3.x. There are some quite significant changes and improvements in 4.0.

The first topic I want to cover is how backups are stored.

In 3.x, a full backup is stored in a "filled" manner, and incrementals are stored as forward-time deltas from the reference backup, which is either the last full or a prior incremental, depending on $Conf{IncrLevels}. In 3.x, incrementals always have a full directory tree. To create a view of the last backup in 3.x, one or more stored backups need to be merged in a forward-time manner, starting with the reference full and accumulating the forward deltas. Since the most recent backup is the most commonly used (for rsync or restores), there is an overhead in merging all the backup trees to create a view of it. This also creates dependencies when deleting backups: you cannot delete a backup if a more recent one depends on it. The common example is a full followed by incrementals; the full can't be deleted until the incrementals are deleted.

In 4.0, backup storage is quite different. The most recent backup is always filled, and the prior backups are stored as reverse-time deltas. There need be no connection between whether a backup is a full or an incremental and whether it is stored filled. There is no longer a concept of $Conf{IncrLevels}. It is possible to have earlier backups filled to reduce the number of merges required to reconstruct an old backup. Also, there is no need to store full directory trees for the deltas; only filled backups store a full tree.
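
To make the reverse-delta layout concrete, here is a minimal Python sketch of how a view of an older backup could be reconstructed. It is an illustration only, not BackupPC code: the `Backup` class, the `view` function, and the convention that `None` marks a path absent from the older backup are all assumptions made for this example, and file contents are stood in for by short strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Backup:
    num: int
    filled: bool
    # Filled backup: path -> content for every file.
    # Unfilled backup (reverse delta): only the paths that differ from the
    # next newer backup; None means the path did not exist in this backup.
    entries: Dict[str, Optional[str]]


def view(backups: List[Backup], num: int) -> Dict[str, str]:
    """Reconstruct the complete file view of backup `num`."""
    ordered = sorted(backups, key=lambda b: b.num)
    idx = next(i for i, b in enumerate(ordered) if b.num == num)
    # Start from the nearest filled backup at or after the requested one.
    start = next(i for i in range(idx, len(ordered)) if ordered[i].filled)
    tree = dict(ordered[start].entries)
    # Walk backwards in time, applying each reverse delta in turn.
    for i in range(start - 1, idx - 1, -1):
        for path, content in ordered[i].entries.items():
            if content is None:
                tree.pop(path, None)   # path did not exist in the older backup
            else:
                tree[path] = content   # path had different content back then
    return tree


# Hypothetical history: 16 is the most recent (filled) backup,
# 15 and 14 are stored only as reverse deltas.
backups = [
    Backup(14, False, {"a": "a1", "d": "d1"}),
    Backup(15, False, {"a": "a2", "c": None}),
    Backup(16, True, {"a": "a3", "b": "b1", "c": "c1"}),
]

print(view(backups, 16))
print(view(backups, 15))
print(view(backups, 14))
```

In this sketch, viewing the most recent backup needs no merging at all, while viewing backup 14 applies two deltas. Marking an intermediate backup as filled would shorten that walk for everything older than it.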
A backup starts by simply renaming the most recent backup, e.g., $TopDir/pc/HOST/15 to $TopDir/pc/HOST/16, and an initially empty tree is created below $TopDir/pc/HOST/15. The backup proceeds by using $TopDir/pc/HOST/16 as the reference (in the case of rsync), and each time there is a change, $TopDir/pc/HOST/16 is updated, and the opposite change is made below $TopDir/pc/HOST/15. This is a big improvement over 3.x, since very few disk writes are needed if the client data hasn't changed very much (currently 3.x creates a full tree of hardlinks when you do a full backup, even with no changes).

This approach changes the deletion dependencies. The oldest backup can be deleted at any time, and more generally the oldest backup of a chain (i.e., if the next older one is filled) can be deleted at any time. Any other backup can be deleted too, but it requires the deltas to be merged with the next older backup. A filled backup can be deleted too, and it will be merged with the prior deltas to create a new filled backup.

Everything described above is already implemented. The one open issue is when and how an intermediate filled backup is created. One approach is to continue to connect the concept of a full backup and a filled backup (although the design allows them to be decoupled).

The code continues to support the 3.x storage format, so you can upgrade to 4.x and still access/view/restore the 3.x backups. However, the first backup after upgrading to 4.x will need to be a full to establish the first filled reference backup.

Craig