From: Stephen D. W. <sd...@li...> - 2004-01-17 21:13:45
Attachments:
sdw.vcf
|
When a UML instance is running an always-on service, such as a web server, an administrator needs to be able to make live backups, or replicas, of the running system.

This can be done using LVM snapshots, although setting up LVM is not always appropriate for just this one feature. A UML instance can be paused and then restarted, but this can cause severe delays while large partitions are replicated.

The COW ability is a great basis for an ideal solution, but I believe we need to identify and implement some additional features. What I propose as a useful solution is:

A UML instance mounts filesystems directly or based on a COW image. When an administrator invokes a console snapshot mode, the UML instance freezes briefly, creates new delta COWs stacked on the existing mounts, then resumes. When the administrator completes whatever snapshot backup is needed, they invoke a console unsnapshot command, which pauses the instance, merges the delta COWs, remounts the original images with the updates, and resumes.

This relies on COW stacking, which I saw was added in a patch last year and I assume is still present. The downtime for the instance would generally be measured in seconds.

Can this be done now? What needs to be added to support it?

sdw

--
swi...@hp... http://www.hpti.com Personal: sd...@li... http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw |
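The snapshot/unsnapshot cycle proposed above can be modeled with a toy sketch. This is only an illustration of the push/merge semantics under simplifying assumptions: the class `CowStack` and its method names are hypothetical, blocks are held in Python dictionaries, and nothing here reflects UML's actual COW file format or console commands.

```python
class CowStack:
    """Toy model of stacked copy-on-write layers over a base image.

    Hypothetical illustration only; not UML's ubd/COW implementation.
    """

    def __init__(self, base):
        self.base = dict(base)   # block number -> data of the backing image
        self.layers = []         # delta layers, oldest first

    def write(self, block, data):
        # Writes land in the top delta, or in the base if nothing is stacked.
        target = self.layers[-1] if self.layers else self.base
        target[block] = data

    def read(self, block):
        # Look in the newest delta first, then fall back towards the base.
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return self.base.get(block)

    def snapshot(self):
        # "Snapshot mode": push an empty delta so everything below is frozen.
        self.layers.append({})

    def unsnapshot(self):
        # Merge the top delta into whatever lies beneath it, then drop it.
        delta = self.layers.pop()
        target = self.layers[-1] if self.layers else self.base
        target.update(delta)
        return len(delta)        # number of blocks that had to be merged


disk = CowStack({0: b"boot", 1: b"data-v1"})
disk.snapshot()                  # base image is now stable and can be copied
disk.write(1, b"data-v2")        # the guest keeps running; write goes to the delta
frozen_copy = dict(disk.base)    # the backup sees the pre-snapshot contents
merged = disk.unsnapshot()       # fold the delta back into the base
```

In this model the merge cost is proportional to the number of blocks written while the snapshot was active, which matches the concern that heavy disk I/O during snapshot mode lengthens the final pause.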
From: Jeff D. <jd...@ad...> - 2004-01-18 04:29:00
|
On Sat, Jan 17, 2004 at 04:13:39PM -0500, Stephen D. Williams wrote:
> When a UML instance is running a always-on service, such as a web
> server, an administrator needs to be able to make live backups, or
> replications, of a running system.
>
> Can this be done now? What needs to be added to support it?

Have you seen the stop, sysrq s, cp, go trick described at
http://user-mode-linux.sourceforge.net/mconsole.html ?

Jeff |
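The stop, sysrq s, cp, go sequence mentioned above could be scripted roughly as follows. This is a minimal sketch, assuming the `uml_mconsole` client is on the host's PATH; the function name, the instance name `uml1`, and the file names are hypothetical placeholders. The example call at the end is a dry run that only prints the commands.

```python
import subprocess


def stop_sync_copy_go(umid, image, backup, runner=subprocess.run):
    """Pause a UML instance, sync it, copy its image, and resume it."""
    mconsole = ["uml_mconsole", umid]
    runner(mconsole + ["stop"], check=True)            # pause the instance
    try:
        runner(mconsole + ["sysrq", "s"], check=True)  # flush guest data to the image
        runner(["cp", image, backup], check=True)      # copy while nothing is changing
    finally:
        # Always resume, even if the sync or the copy failed.
        runner(mconsole + ["go"], check=True)


# Dry run: print the commands instead of executing them.
stop_sync_copy_go(
    "uml1", "root_fs", "root_fs.bak",
    runner=lambda cmd, check: print(" ".join(cmd)),
)
```

The instance stays paused for the whole duration of the `cp`, so the pause grows with the size of the image.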
From: Stephen D. W. <sd...@li...> - 2004-01-18 07:07:56
Attachments:
sdw.vcf
|
Yes, but that doesn't meet the key requirement I am proposing: that the downtime be limited to a few seconds. The only potentially significant downtime with what I am proposing is merging the COWs if there was a lot of disk I/O during the 'snapshot mode'. The existing ability is certainly useful, but not sufficient to get backups with the least impact on a running system.

Snapshot mode is suspend and flush/sync with the additional semantics of an automatic temporary push/pop of a COW layer on all filesystem images. The remaining issue of merging COWs could be handled by gradual merging, similar to RAID recovery.

The problem is that if you have gigabytes of filesystem images, it takes time to copy them, even when using something like rsync to determine what has changed. Scanning the zero-filled holes in sparse files also costs time, unless there is a program that can determine which blocks are actually in use without reading them.

I should have also mentioned that it should be possible to get a reliable feed of which blocks, or more likely ranges of blocks, have changed since a certain event. This would allow very efficient, near-realtime replication. At the very least this could be used in suspend or 'snapshot mode' for efficiency, but with a proper push of blocks, buffer visibility, or some kind of write-through notification, synchronization could be realtime.

sdw

Jeff Dike wrote:

>On Sat, Jan 17, 2004 at 04:13:39PM -0500, Stephen D. Williams wrote:
>
>>When a UML instance is running a always-on service, such as a web
>>server, an administrator needs to be able to make live backups, or
>>replications, of a running system.
>>
>>Can this be done now? What needs to be added to support it?
>
>Have you seen the stop, sysrq s, cp, go trick described at
>http://user-mode-linux.sourceforge.net/mconsole.html ?
>
> Jeff

--
swi...@hp... http://www.hpti.com Personal: sd...@li... http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw |
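The "feed of changed blocks" idea above can be illustrated with a small sketch. It assumes some hook reports every block number written by the guest; the class `DirtyBlockLog` and its methods are hypothetical names, and no such interface is claimed to exist in UML.

```python
class DirtyBlockLog:
    """Track blocks written since a marked event and report them as ranges.

    Hypothetical illustration; the write hook feeding it is an assumption.
    """

    def __init__(self):
        self.dirty = set()

    def mark_event(self):
        # Start a new epoch, e.g. when a replication pass begins.
        self.dirty.clear()

    def record_write(self, block):
        # Called once per written block number.
        self.dirty.add(block)

    def ranges(self):
        # Coalesce dirty blocks into inclusive (first, last) ranges.
        result = []
        for block in sorted(self.dirty):
            if result and block == result[-1][1] + 1:
                result[-1] = (result[-1][0], block)
            else:
                result.append((block, block))
        return result


log = DirtyBlockLog()
log.mark_event()
for written in (7, 3, 4, 5, 9, 8):
    log.record_write(written)
changed = log.ranges()   # only these ranges need to be copied to the replica
```

A replicator working from such ranges would copy only what changed since the last event, instead of rescanning whole images as rsync has to.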
From: <s-...@rh...> - 2004-01-18 16:24:06
|
On Sun, Jan 18, 2004 at 02:07:56AM -0500, Stephen D. Williams wrote:
> Yes, but that doesn't meet the key requirement I am proposing: that the
> downtime be limited to a few seconds.

If you use LVM on the host, you can take a snapshot of the volume the ubd images are on while the UML instance is stopped. The snapshot is created almost instantly, so the instance would be stopped for very little time.

-Steve |
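A host-side LVM snapshot taken while the instance is paused might be sequenced as in the sketch below. It only builds and prints the command lines. The sync step is borrowed from the mconsole trick and is an assumption here, as are the function name, the instance name `uml1`, the volume `vg0/uml_images`, the snapshot name, and the 1G size reserved for changes.

```python
def lvm_snapshot_plan(umid, origin, snap_name, cow_size):
    """Return the commands that snapshot the host volume around a short pause."""
    return [
        ["uml_mconsole", umid, "stop"],         # pause the instance
        ["uml_mconsole", umid, "sysrq", "s"],   # assumed: flush guest data first
        ["lvcreate", "--snapshot", "--size", cow_size,
         "--name", snap_name, origin],          # near-instant host-side snapshot
        ["uml_mconsole", umid, "go"],           # resume; back up from the snapshot later
    ]


plan = lvm_snapshot_plan("uml1", "vg0/uml_images", "uml_backup", "1G")
for command in plan:
    print(" ".join(command))
```

The backup itself then reads from the snapshot volume while the instance is already running again, and the snapshot can be removed once the copy is done.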
From: BlaisorBlade <bla...@ya...> - 2004-01-18 18:30:22
|
At 22:13 on Saturday, 17 January 2004, Stephen D. Williams wrote:
> When a UML instance is running a always-on service, such as a web
> server, an administrator needs to be able to make live backups, or
> replications, of a running system.
>
> This can be done using LVM snapshots, although that is not always
> appropriate for just this feature. A UML instance can be paused and then
> restarted, but this can cause severe delays while large partitions are
> replicated.

In general, this should be done with LVM. There is a strong feeling that COW somewhat duplicates LVM, and I read that Jeff Dike (IIRC) wanted it to be rewritten as a new snapshot format for LVM. The reason for a different format is that LVM snapshots were designed with partitions in mind, while UML can use sparse files, which is a great advantage if you are able to use them.

LVM may be harder to set up, that's true, but take a look at EVMS, which provides a nicer interface.

Bye
--
cat <<EOSIGN
Paolo Giarrusso, aka Blaisorblade
Linux Kernel 2.4.23/2.6.0 on an i686; Linux registered user n. 292729
EOSIGN |