Underdog -- Early and Late User Context Code
mkinitramfs & mkinitrd replacement and (eventually) alternate /init
Status: Planning
Brought to you by: ibitobear
File | Date | Commit |
---|---|---|
apps | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
configs | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
experimental | 2016-02-17 | [47b15a] -- (Ongoing) more work, nothing final |
include | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
prototype | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
src | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
utility | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
.gitignore | 2018-03-23 | [eb3ee1] -- (Maint) Just getting the chicken scratches i... |
CMakeLists.txt | 2023-08-26 | [a3bb3d] Make merged-usr compatible and some fixes |
README.txt | 2016-02-05 | [c3ef30] -- (Fix) Spelling! |
This is Underdog.

GIT NOTE: The two main branches are "master" and "incomplete". The grossly experimental code and pre-alpha stuff is in the "incomplete" branch. It's a mess, but it's the interesting stuff. The "master" branch holds the generally useful utilities so far.

NOTE: The whats and whys come after the installation section, in the name of organizational clarity.

STATUS OVERVIEW:

As of now, the underdog infrastructure _WILL_ successfully boot, including suspend/resume boots, from any combination of standard /boot and / devices, LVM2 devices, and cryptsetup devices locally attached to a computer.

One of the test-bed machines has an unencrypted /boot, followed by an encrypted region containing an LVM Volume Group named "System" with "Root" and "Swap" volumes. The system properly suspends to and resumes from the encrypted System/Swap device, etc. So underdog is usable as of this writing.

_BUT_ _BE_ _WARNED_... This is proof-of-concept code so far, and you have no warranty from me that it won't fail in new and exciting ways at any time. _DO_ have alternate boot methods ready before you commit. (I have a bootable external drive for those cases, and I wrote the thing... you would do well to have likewise.)

INSTALLATION: (That is why you came here, right?)

(1) Unpack the cpio archive or copy the directory tree to /opt/underdog. Yes, as of this writing it _MUST_ go in /opt/underdog, because the absolute path /opt/underdog/prototype is used to build the initramfs.

(2) Change directory to your Linux build directory and generate the initramfs description file. The name doesn't matter, but I use ".initramfs" to mirror the ".config" convention.

    cd /usr/src/linux-(whatever)
    /opt/underdog/utility/make_initramfs_description.bash >.initramfs

(3) While still inside your Linux source directory, do either step 3a or step 3b. I actually do both: I use step 3b to test modifications, and then I do 3a when I have a stable setup.

(3a) Run make menuconfig (etc.) to set CONFIG_INITRAMFS_SOURCE=".initramfs", then build and install your new kernel with underdog fully integrated into the kernel image.

(3b) Generate a stand-alone initramfs image for use in booting. This is done with the poorly named "gen_initramfs_list.sh" script provided by the kernel distribution.

    bash scripts/gen_initramfs_list.sh \
        -o /boot/initramfs.img-$(uname -r) \
        .initramfs

(4) Boot into the kernel. I have two entries in my boot loader configuration: one for booting normally, which has no "initrd" directive and so uses the last stable version I built into the kernel, and a second that does have the "initrd" directive so that I can test incremental changes.

Installation Notes:

-- "scripts/gen_initramfs_list.sh" is poorly named. It should be called "gen_initramfs_from_list.sh", since it makes the actual initramfs image from a list instead of making the list itself. The underdog utility make_initramfs_description.bash is what actually makes the list.

-- underdog must be in /opt/underdog because of one line in make_initramfs_description.bash. If you know what you are doing, you can change that one line after installing it somewhere else. I could have automated this, but consider it the "you must know this much to mess around in here" test.

-- You _DO_ _NOT_ normally need to re-run the make description script if you edit /opt/underdog/prototype/init, or if your system libraries are updated but keep the same names and paths. The description file is just a plain text list of path names that should be included in the initramfs image.

-- You _DO_ _NOT_ need to re-run the make description script, nor rebuild your kernel or initramfs, if you change the list of attached hardware. The initramfs image is fully plug-n-play for your storage systems as long as they can be seen by the kernel.

-- You _DO_ need to re-run the make description phase if you change system files you want/need to include.
This includes adding cryptsetup, lvm2, and mdadm support to your system _IFF_ you need that support to boot. It _DEFINITELY_ includes the case where you update a system library or tool in such a way that its name changes, and that library or tool is part of your boot needs. It is the make description step, not the kernel scripts that generate the actual initramfs, that decides on the names and paths of files.

-- The system uses the _full_ binaries as of this writing. Nothing is copied aside, stripped, or in need of rebuilding with static libraries. This means that some memory is used up during boot. That use is somewhat transient and will be steadily reduced as time goes by, but there are no plans to eliminate this "waste" entirely. What you need to boot safely, you need to un-boot as well.

What Is Going On and Why:

This project is a means to capture and maintain the Linux kernel early user space so that the root and subsequent mounts made in that context can be completely undone during shutdown.

To date, Unix and Linux make a last-ditch effort to save their storage by unmounting what they can, then remounting the remaining file systems read-only, then syncing the disks and exiting. This is sub-optimal in several ways.

(1) Remounting a file system read-only is not the same as unmounting it. It's been "close enough" for years, but from the perspective of some elements it is inferior. This may include checkpoint activities of solid-state file systems etc. now, and who knows what gymnastics in the future.

(2) Stateful storage servers (e.g. iSCSI et al.) may hold non-trivial resources, locks, and who-knows-what-else open because the drive media is still mounted, and likely still mounted read-write, even if the file system thereon has been switched to read-only.

(3) Using kernel built-ins to mount NFS is okay, but not so much for NFS4, and no such support exists for iSCSI etc.
So it's time to start ignoring the root-on-NFS type kernel bits and get them into the early user context.

(4) It offends my sense of symmetry to leave things open while the system dies. 8-)

The conceptual solution is fairly simple. If the early user context (the initramfs environment) can connect all this stuff up, then returning to that context should allow disconnecting it again. The eventual full solution will do several things:

-- Replace the initramfs /init, which is usually a script of some sort, with an executive that will do the normal /init stuff we expect.

-- Have that executive use the clone() syscall to create new process and mount namespaces so that the "real" root file system can be positioned and then run normally. (This is done instead of a switch_root.)

-- Use exec*() in the clone to run the "real" /sbin/init as a virtual pid==1 process that the /init in the early user context can wait for.

-- Kill the /sbin/init instead of calling halt/reboot, but in such a way that the initramfs knows whether to halt or reboot.

-- The death of /sbin/init will clean up all its local mounts, so /init can then clean up all the actual mounts it made prior to the clone.

-- Now that the system is as clean as if it never mounted or attached anything, the halt or reboot (or kexec, or loop back to the clone step) can proceed.

ROADMAP:

Currently the project consists of just a couple of scripts in a directory structure. So WTF, you ask?

Proof of Concept, Stage 1: In this stage I built the means to easily make my own initramfs image descriptions. There were no good tools to do this. All of the existing systems were complex and full of assumptions about how the boot process should work. They also built the initramfs image itself as a monolithic process. Nothing out there would just take a list of files, resolve the dependencies and conflicts, and then output the descriptor file. So I fixed that omission.
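
The "take a list of files, resolve the dependencies and conflicts, output the descriptor" idea can be sketched roughly as follows. This is a minimal illustration, not underdog's make_initramfs_description.bash; the function name describe_paths is hypothetical, and it assumes bash 4+, stat and readlink supporting -L/-c and -f, and ldd on the host. It emits lines in the kernel's gen_init_cpio text format (dir, file, slink), which CONFIG_INITRAMFS_SOURCE and gen_initramfs_list.sh accept as a file specification. Dependencies come from ldd, symlinks are kept and their targets pulled in, and "conflicts" are handled simply by emitting each path once. It records everything as owned by root and flattens symlinked parent directories (such as /lib on a merged-usr system) into real directories.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not underdog's make_initramfs_description.bash:
# turn absolute host paths into gen_init_cpio-style description lines,
# pulling in shared-library dependencies and emitting each path only once.
# Assumes bash 4+, stat/readlink supporting -L/-c and -f, and ldd.
set -euo pipefail

declare -A seen=()   # archive paths already emitted; repeats are skipped

# Emit "dir" lines for every ancestor of $1, outermost first.
# Limitation: a symlinked parent (e.g. /lib on merged-usr) becomes a real dir.
emit_parents() {
  local dir=${1%/*}
  if [[ -z $dir ]]; then return 0; fi
  if [[ -n ${seen[$dir]+x} ]]; then return 0; fi  # its ancestors are done too
  emit_parents "$dir"
  seen[$dir]=1
  printf 'dir %s %s 0 0\n' "$dir" "$(stat -L -c %a "$dir")"
}

# Print the absolute library paths ldd reports for $1; prints nothing for
# scripts, static binaries, or when ldd is missing.
deps_of() {
  ldd "$1" 2>/dev/null |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }' || true
}

describe_paths() {
  seen=()
  local -a queue=("$@") libs
  local i=0 p target
  while (( i < ${#queue[@]} )); do
    p=${queue[i]}
    i=$((i + 1))
    if [[ -z $p ]]; then continue; fi
    if [[ -n ${seen[$p]+x} ]]; then continue; fi      # duplicate request
    if [[ ! -e $p && ! -L $p ]]; then
      echo "skipping missing path: $p" >&2
      continue
    fi
    emit_parents "$p"
    seen[$p]=1
    if [[ -L $p ]]; then
      # Keep the link as-is and queue the file it finally resolves to.
      printf 'slink %s %s 0777 0 0\n' "$p" "$(readlink "$p")"
      target=$(readlink -f "$p") || target=""
      if [[ -n $target ]]; then queue+=("$target"); fi
    elif [[ -d $p ]]; then
      printf 'dir %s %s 0 0\n' "$p" "$(stat -L -c %a "$p")"
    elif [[ -f $p ]]; then
      printf 'file %s %s %s 0 0\n' "$p" "$p" "$(stat -L -c %a "$p")"
      # Queue the libraries this file needs; they get the same treatment.
      mapfile -t libs < <(deps_of "$p")
      if (( ${#libs[@]} )); then queue+=("${libs[@]}"); fi
    fi
  done
}

describe_paths "$@"
```

Saved under a name of your choosing, it could be run as "bash describe_paths.bash /sbin/cryptsetup /sbin/lvm >.initramfs". A bootable initramfs also needs an /init, /dev/console and the like, which this sketch does not add.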

Proof of Concept, Stage 2: (Our current stage) I am producing an init script based entirely on proven tools that will conditionally assemble an entire system. That is, a system that will honor suspend/resume, designation of root=, and use of cryptsetup, lvm, and mdadm.

Proof of Concept, Stage 3: Scrap busybox switch_root in favor of the first real underdog binary. This binary will do the clone_and_run and will eventually exit when the /sbin/init process is killed in the "real" environment.

Alpha Releases: Once the proof of concept stages work, the underdog binary will become /init, and the current init script will become /init.bash or similar. The initial alpha version will surround the execution of the shell script and handle the mounts of /proc and /sys, and optionally /dev, in the early user context, as well as the clone_and_exec. In the alpha releases stage it is expected that features will be added to the shell script (e.g. iSCSI and NFS support) even as features migrate from the shell script into the underdog binary.

UNDECIDED GOALS:

-- I am thinking of replacing sysvinit. The current version of sysvinit has no well-defined exit behaviors. Right now, if you manage to kill init you get a kernel panic; that will not happen within underdog, because the "apparent runtime init" won't be the real system-wide init process. Killing underdog would, of course, cause that same panic, as there must be a real first process running, but that's not important. Besides, it isn't like sysvinit is a terribly complex piece of logic... The _main_ reason for replacing sysvinit would be to capture and properly deal with things like "/sbin/halt".

-- I vaguely envision the complete absence of OS-level shell scripts in the early user context.
There isn't anything wrong with shell scripts per se, but if I can get things expressed in terms of shared object file plugins, then I can really clean out the initramfs once the system is booted (as switch_root does) and still be left with everything underdog needs to do its cleanup. Simpler systems, such as embedded devices, could have a real win if the whole default boot was reduced to one stripped executable that only had the tidbits needed to run, and then do cleanup, "stuck" in memory.

-- I envision being able to easily start more than one /sbin/init (root environment), each with its own mount and process space, by passing a message back to the early user context or by specifying root= more than once on the command line. This would beat the heck out of "user mode linux" or chroot for isolating functions and features.

-- It should be reasonable, once we are free of the tyranny of the current root= interpretation done by the kernel, to compose one or more file system hierarchies out of fragments and overlays. This sounds weird at first, but imagine having different versions of libc in different directories and selecting the one desired at runtime. That sounds contrived, I'm sure, but it has interesting possibilities. Alternatively, consider a firewall (etc.) box that has a minimal root and a series of maintenance-ready root overlays for, well, maintenance.

Robert White <rwhite@pobox.com>