Yes, but that doesn't meet the key requirement I am proposing: that the
downtime be limited to a few seconds. The only potentially significant
downtime with what I am proposing is merging the COW layers back if there
was a lot of disk I/O during 'snapshot mode'. The existing ability is
certainly useful, but not sufficient to get backups with the least impact
on a running system. Snapshot mode is suspend and flush/sync with the
additional semantics of an automatic, temporary push/pop of a COW layer
on all filesystem images. The remaining issue of merging the COWs could
be handled by gradual, RAID-recovery-like merging.
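To make the snapshot-mode idea concrete, here is a minimal sketch in
Python, assuming a toy in-memory model of a disk image. SnapshotDisk,
push_cow, merge_step and the 4-byte BLOCK_SIZE are hypothetical names and
values for illustration, not part of UML's ubd or COW code. After the
suspend and sync, a COW layer is pushed so the base image stays frozen for
copying; afterwards the overlay is folded back a few blocks at a time.

```python
BLOCK_SIZE = 4  # tiny blocks keep the example readable (assumption)


class SnapshotDisk:
    """Toy in-memory disk image with an optional COW overlay.

    Hypothetical illustration only; this is not UML's ubd/COW driver.
    """

    def __init__(self, nblocks):
        self.base = [bytes(BLOCK_SIZE)] * nblocks  # frozen while a COW layer is pushed
        self.cow = None  # dict of block number -> data while the overlay exists

    def read(self, blk):
        # The overlay holds the newest copy of any block written since the push.
        if self.cow is not None and blk in self.cow:
            return self.cow[blk]
        return self.base[blk]

    def write(self, blk, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if self.cow is not None:
            self.cow[blk] = data  # base is untouched, so it can be copied
        else:
            self.base[blk] = data

    def push_cow(self):
        # Done right after suspend + flush/sync; the guest can resume at once.
        if self.cow is not None:
            raise RuntimeError("COW layer already pushed")
        self.cow = {}

    def merge_step(self, max_blocks):
        """Fold up to max_blocks overlay blocks back into the base image.

        Call only after the backup copy of self.base has finished: once
        merging starts, the base is no longer a consistent snapshot.
        Returns True once the overlay is empty and has been popped.
        """
        if self.cow is None:
            return True
        for blk in sorted(self.cow)[:max_blocks]:
            self.base[blk] = self.cow.pop(blk)
        if not self.cow:
            self.cow = None  # pop the layer; writes go straight to base again
            return True
        return False
```

In use: call push_cow() right after the sync, copy disk.base while the
guest keeps running, then call merge_step() with a small max_blocks from a
rate-limited loop, much as a RAID resync throttles itself. Writes that
arrive during merging land in the overlay and are picked up by a later
step; in this sketch, writes that outpace the merge would keep it from
finishing.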
The problem is that if you have gigabytes of filesystem images, it takes
time to copy them, even if you use something like rsync to determine what
has changed. Unless there is a program that can determine which blocks are
actually in use without reading them, it also costs time to scan for zero
holes in
I should also have mentioned that it should be possible to get a
reliable feed of which blocks (or, more likely, ranges of blocks) have
changed since a certain event. This would allow very efficient,
near-realtime replication. At the very least, this could be used in
suspend or 'snapshot mode' for efficiency, but with proper pushing of
blocks, buffer visibility, or some kind of write-through notification,
synchronization could be realtime.
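A rough sketch of such a change feed, again in Python and assuming a
hypothetical hook in the block-write path: ChangeTracker, note_write,
checkpoint and replicate are illustrative names, not an existing UML
interface. Each checkpoint() plays the role of "a certain event": it
returns the coalesced (start, length) ranges written since the previous
one, and only those ranges are copied to the replica.

```python
class ChangeTracker:
    """Records which blocks were written since the last checkpoint.

    Hypothetical sketch: a real driver would hook its write path and
    would more likely keep a bitmap than a Python set.
    """

    def __init__(self):
        self.epoch = 0  # number of checkpoints taken so far
        self._dirty = set()

    def note_write(self, first_blk, count=1):
        # Called for every write the guest issues to the image.
        self._dirty.update(range(first_blk, first_blk + count))

    def checkpoint(self):
        """Return (start, length) ranges changed since the last checkpoint,
        with adjacent blocks coalesced, then start a new epoch."""
        ranges = []
        for blk in sorted(self._dirty):
            if ranges and ranges[-1][0] + ranges[-1][1] == blk:
                start, length = ranges[-1]
                ranges[-1] = (start, length + 1)  # extend the current run
            else:
                ranges.append((blk, 1))  # gap found: start a new run
        self._dirty.clear()
        self.epoch += 1
        return ranges


def replicate(source, replica, ranges):
    """Copy only the changed ranges from source to replica (lists of blocks)."""
    for start, length in ranges:
        replica[start:start + length] = source[start:start + length]
```

Coalescing matters because a replication target is usually much happier
with a few large contiguous copies than with many single-block ones;
running checkpoint() often enough would approach the near-realtime case.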
Jeff Dike wrote:
>On Sat, Jan 17, 2004 at 04:13:39PM -0500, Stephen D. Williams wrote:
>>When a UML instance is running an always-on service, such as a web
>>server, an administrator needs to be able to make live backups, or
>>replications, of a running system.
>>Can this be done now? What needs to be added to support it?
>Have you seen the stop, sysrq s, cp, go trick described at
swilliams@... http://www.hpti.com Personal: sdw@... http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw