From: Gordan B. <go...@bo...> - 2009-12-28 08:21:52
|
Last new thread for today, I promise. ;) I've been thinking about this for a while, and I think it's worth sharing.

OSR is conceptually quite similar to virtualization - there is an init-root "hypervisor" that bootstraps the shared storage and then starts up the "guest" root chrooted to the shared storage. OpenVZ (http://en.wikipedia.org/wiki/OpenVZ) virtualization is very similar to this (very much like Solaris "zones" or FreeBSD "jails"). It starts up the "guest" installation in its own chroot on the local file system, without actually having a disk image file container, and the virtualization abstraction layer is paper thin, because the only things being virtualized for the guest are the process IDs (since some things are allegedly sensitive to init not being PID 1) and the networking (so that each guest can have its own completely independent network configuration). All this means that the performance penalty is negligible. The guest VM doesn't run its own kernel - the host kernel does all the kernel tasks, and the guest's lowest level is its init.

What I'm thinking about is coming up with an OSR modification that takes advantage of this - making the init root slightly more fully featured (useful for debug purposes!) so that it boots into its own init, has its own console login, and sets up the disk volumes (of whatever description) for the shared root guest. It then simply starts up the shared root guest VM.

Now, I know this is a lot to take in (and yes, I know it sounds like a mad idea at first), and it is conceptually a pretty big change. But do you think it makes any sense to even look into going in this direction?

The benefits would be:

1) A more fully featured standalone init-root host would allow for easier debugging.

2) The "guest" wouldn't need any modification or tweaks. This would have avoided a number of problems - e.g. the issue on the killall5 thread, volume mounting above/below the guest's init line (/etc/mtab thread), the need for patches to the guest's halt/network init scripts, and possibly other things that show up the fragility of the initrd.

3) The guest wouldn't need any awareness of the file system it lives on, or of any daemons required to sustain that file system - the host would take care of all of that with complete transparency. This means no need to worry about killing a process on which things like the rootfs depend.

The reasons against this that I can think of:

1) A tiny performance hit due to the networking stack and PID virtualization. (I don't think this would be measurable considering the inevitable cluster fs overheads.)

2) The initrd would end up being a bit bigger. If it ends up having its own init and gettys it would be doing more, so it is bound to grow slightly, but it would almost certainly grow by a lot less than the savings yielded recently by the pruning of the unused kernel modules and the pyc/pyo files. ;)

3) Any other unforeseen things that show up only once the prototype is built. This is a big one. There has been a whole array of bugs I had tripped in glfs because nobody ever considered the use case of using it for a rootfs during testing (the biggest one off the top of my head was a massive memory leak stemming from mmap()-induced memory fragmentation that only arose when shared libraries were kept on glfs). I suspect this would likely expose similar problems - but I guess that is inevitable when straying off the straight and narrow.

Gordan |
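[Editor's note: the host-side flow Gordan describes maps onto OpenVZ's standard `vzctl` tooling. A minimal sketch follows - the container ID, template name, hostname, and IP address are placeholders, not values from the thread:]

```shell
# Minimal OpenVZ container lifecycle, run on the init-root "host".
# CTID 101 and all names/addresses below are hypothetical examples.

# Create a container from a pre-cached OS template; this unpacks a
# plain directory tree under /vz/private/101 - no disk image file.
vzctl create 101 --ostemplate centos-5-x86_64

# Networking is configured on the host side, not inside the guest;
# each guest still gets its own independent network configuration.
vzctl set 101 --ipadd 10.0.0.101 --hostname node1 --save

# Start the guest: its lowest level is its own init, and PID
# virtualization lets that init see itself as PID 1.
vzctl start 101

# The guest can later be restarted without touching the host kernel
# (and so without triggering fencing):
vzctl restart 101
```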
From: Gordan B. <go...@bo...> - 2009-12-29 08:28:06
|
For the sake of education and diversity, I have just finished the first attempt at this, purely to see if there is an inherent problem in OpenVZ that might shoot this down. Well - I haven't found any such problems! :D

The basic setup was this: Two identical host machines (virtual, because it was easier, but fully virtualized with KVM, CentOS 5.4). The host OS was stripped down to a bare minimum (a "mere" 850MB...) since I didn't feel applying the more sensibly sized OSR init root was vital for now, and modding it would have required extra work. Shared virtual disk (shared image, presented as an IDE device), with GFS on top. So far, so standard.

The shared GFS device was mounted under /vz/private (where OpenVZ keeps the VM fs trees). A CentOS 5.4 guest template was initialized in there. The guest config file was the same on both hosts, except for the IP address (the IP is configured on the host rather than the guest, but each guest can have independent iptables rules if required). Thus - the two guests were running on a GFS shared root.

The one thing remaining would be to set up the entries in fstab to make sure that /cdsl.local gets bind mounted correctly at boot-up time, but other than that, I'd say the preliminary prototype test has passed. :)

The basic thing I wanted to achieve is to have a cleaner separation between the host-provided shared rootfs and the guest, so that there are no issues during shutdown with unmounting file systems, etc. This prototype appears to have completely met those requirements. There is also an added bonus benefit that I hadn't thought of before - the guest can be cleanly rebooted without rebooting the host (which also means without triggering fencing and suchlike).

Gordan |
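[Editor's note: the host-side mount and the remaining /cdsl.local step could be sketched roughly as below. The device name is a placeholder, and the /cluster/cdsl/&lt;nodeid&gt; source path follows the conventional OSR cdsl layout rather than anything stated in the thread, so it would need adjusting to the actual setup:]

```shell
# On each host: mount the shared GFS device where OpenVZ keeps its
# container file system trees (device path is a hypothetical example).
mount -t gfs /dev/vda1 /vz/private

# Inside the guest's /etc/fstab: a bind-mount entry so the per-node
# cdsl tree appears at /cdsl.local at boot-up time, e.g. for node 1:
#
#   /cluster/cdsl/1   /cdsl.local   none   bind   0 0
```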