From: Craig B. <cba...@us...> - 2007-02-27 07:53:50
Jason writes:

> I was just thinking today about the structure of the pc/* folders and
> wondered why it is that the files are stored in such a strange way.  I
> realize the need for compression is an important motivator.  But
> couldn't that be addressed by installing a compressed filesystem in a
> partition mounted where the backup data goes?  Why the mangling of
> filenames?  Presumably it's used to disambiguate what is data and what
> is metadata?  Maybe storing this information and other metadata
> someplace separately for a host would be better?

Yes, it was done to keep data and metadata separate.  Also, charset
conversions mean the file names on the server might differ from those
on the client.

> What I was getting at in this, mainly, is that the Linux operating
> system is flush with tools to operate on files and filesystems, but by
> essentially creating a separate and incompatible set of files, BackupPC
> cannot benefit from that functionality.

Even if the files were named the same, they are compressed and don't
have the right metadata.  So while it might appear that standard tools
would be more useful, there are pitfalls.

> My desire: it would be really nice if the file hierarchy reflected the
> reverse-changes rather than the changes.  Currently, the way backups are
> set up, in order for you to get the current status of a machine, you
> must "replay" the changes from the last full backup--I know, BackupPC
> does this for you in code--this is called a Redo stack.  An Undo stack
> has some benefits, though.  In effect, it would maintain the exact
> status of the host PC as a 'filled' copy of everything, and as you go
> back revisions, you have previous versions of files that were changed.
> The pool structure need not change, only the presentation of the
> hardlinking of machines.
>
> The benefit?  Restores are brain-dead easy, and do not require BackupPC
> to run.
> Tarring off full versions of machines is similarly supported
> without any effort on BackupPC's part.  Many other features, too.  Going
> back versions of a file still requires a bit of searching through
> revision directories, but it's something that BackupPC *should* be doing
> for users.  What users will want to do with minimum intervention on
> BackupPC's part is dealing with whole machine snapshots.  IMHO, at least.

This has been discussed before.  I agree that storing backups in the
way you describe would be better.  The most recent snapshot is always
filled, no matter whether it was created with an incremental or a full.
Prior snapshots simply store the changes, backwards in time.  Since the
most recent snapshot is typically used for browsing and restore, this
is the most efficient approach.  Older snapshots are recreated by
merging backwards (opposite to the forward merging done currently).

The other big advantage is that the oldest snapshot can be expired at
any time, since the snapshot dependencies are in reverse time order.

However, the challenges with implementing this are:

  - doing a new backup is problematic.  There are two approaches:

      - make a complete, new, filled snapshot, and then prune the
        prior filled snapshot to just represent the reverse
        difference.  However, this involves making hardlinks for
        every file in the backup, even if just an incremental backup
        is being done.  This approach would be the easier one to
        implement, but would have a performance penalty.

      - update the filled snapshot in-place as the backup is done,
        and concurrently create a delta snapshot that represents
        what will become the prior snapshot.  This is more efficient,
        but a lot harder to implement.  The major design issue is how
        to handle a backup failure: the code would have to undo all
        the changes to restore the filled snapshot to its original
        state.
  - this would be an incompatible change, and it would be a lot of
    work to support both storage methods for backward compatibility.
    So either the implementation complexity is high

> Thanks for your consideration, and please forgive me if I'm drastically
> oversimplifying things.  I do love what it does for me, and am simply
> trying to suggest possible improvements.

Overall your suggestion is a good one, and if I was writing BackupPC
over again I would do it this way.  Maybe this is how 4.x will work...

Craig
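
For illustration only, the reverse-delta layout discussed above can be
sketched in a few lines of Python.  This is not BackupPC code: the
names (ReverseStore, reverse_delta, apply_delta) are hypothetical, and
a snapshot is assumed to be a plain dict mapping a path to a content
digest rather than a tree of hardlinks.  The backup() method follows
the first, simpler approach from the list (build the new filled
snapshot, then reduce the prior one to a reverse difference).

```python
# Hypothetical in-memory model; none of these names come from BackupPC.
DELETED = None  # delta marker: the path did not exist in the older snapshot


def reverse_delta(old, new):
    """Return the changes that turn `new` back into `old`."""
    delta = {}
    for path, digest in old.items():
        if new.get(path) != digest:
            delta[path] = digest      # changed or removed since `old`
    for path in new:
        if path not in old:
            delta[path] = DELETED     # added since `old`
    return delta


def apply_delta(snapshot, delta):
    """Merge one step backwards in time, returning a new dict."""
    result = dict(snapshot)
    for path, digest in delta.items():
        if digest is DELETED:
            result.pop(path, None)
        else:
            result[path] = digest
    return result


class ReverseStore:
    def __init__(self):
        self.filled = None   # most recent snapshot, always complete
        self.deltas = []     # reverse deltas, oldest first

    def backup(self, new_snapshot):
        """Store a new filled snapshot; keep the prior one only as a delta."""
        if self.filled is not None:
            self.deltas.append(reverse_delta(self.filled, new_snapshot))
        self.filled = dict(new_snapshot)

    def view(self, age):
        """age=0 is the latest backup; age=k is k backups earlier."""
        if not 0 <= age <= len(self.deltas):
            raise IndexError("no such snapshot")
        snap = dict(self.filled)
        # Walk backwards from the newest delta, `age` steps in total.
        for delta in reversed(self.deltas[len(self.deltas) - age:]):
            snap = apply_delta(snap, delta)
        return snap

    def expire_oldest(self):
        """Nothing depends on the oldest delta, so it can simply be dropped."""
        if self.deltas:
            self.deltas.pop(0)
```

In this model view(0) needs no merging at all, and expire_oldest()
never touches newer snapshots, which matches the two advantages
described above.  It says nothing about the hard part, which is
recovering from a backup that fails partway through an in-place
update.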