Underdog -- Early and Late User Context Code

mkinitramfs & mkinitrd replacement and (eventually) alternate /init

Status: Planning

Brought to you by: ibitobear

File	Date	Author	Commit
apps	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
configs	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
experimental	2016-02-17	Robert White	[47b15a] -- (Ongoing) more work, nothing final
include	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
prototype	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
src	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
utility	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
.gitignore	2018-03-23	Robert White	[eb3ee1] -- (Maint) Just getting the chicken scratches i...
CMakeLists.txt	2023-08-26	Robert White	[a3bb3d] Make merged-usr compatible and some fixes
README.txt	2016-02-05	Robert White	[c3ef30] -- (Fix) Spelling!

Read Me

This is Underdog.

GIT Note: The two main branches are "master" and "incomplete".
The grossly experimental code and pre-alpha stuff is in the
"incomplete" branch. It's a mess, but it's the interesting stuff.
The "master" branch is the generally useful utilities so far.

NOTE: The Whats and Whys are after the installation section
in the name of organizational clarity.

STATUS OVERVIEW: As of now, the underdog infrastructure _WILL_
successfully boot, including suspend/resume boots, of any combination
of standard /boot and / devices, LVM2 devices, and cryptsetup devices
locally attached to a computer.  One of the test-bed machines has an
unencrypted /boot, and then an encrypted region containing an LVM Volume
Group named "System" with "Root" and "Swap" volumes. The system will
properly suspend/resume to the encrypted System/Swap device etc.

So underdog is usable as of this writing.

_BUT_ _BE_ _WARNED_... This is proof of concept code so far and you have
no warranty from me that it won't fail in new and exciting ways at any
time.  _DO_ have alternate boot methods ready before you commit. (I have
a bootable external drive for those cases and I wrote the thing... you
would do well to have likewise.)

INSTALLATION:

(That is why you came here, right?)

(1) Unpack the cpio archive or copy the directory tree to
/opt/underdog. Yes, as of this writing it _MUST_ go in /opt/underdog
because the absolute path /opt/underdog/prototype is used to build
the initramfs.

(2) Change directory to your linux build directory, and generate
the initramfs description file. The name doesn't matter, but I
use ".initramfs" to make it very like ".config" in convention.

cd /usr/src/linux-(whatever)
/opt/underdog/utility/make_initramfs_description.bash >.initramfs

(3) While still inside your linux source directory, do either step 3a
or step 3b. I actually do both: I use step 3b to test modifications,
and then I do 3a when I have a stable setup.

(3a) make menuconfig (etc) to set CONFIG_INITRAMFS_SOURCE=".initramfs"
then build and install your new kernel with underdog fully integrated
into the kernel image.

(3b) Generate a stand-alone initramfs image for use in booting. This
is done with the poorly named "gen_initramfs_list.sh" script provided
by the kernel distribution.

bash scripts/gen_initramfs_list.sh \
  -o /boot/initramfs.img-$(uname -r) \
  .initramfs

(4) Boot into the kernel. I have two entries in my boot loader config, one for
booting normally, which has no "initrd" directive, and so uses the last
stable version I built into the kernel. I have a second that does have
the initrd directive so that I can test incremental changes.

Installation Notes:

-- "scripts/gen_initramfs_list.sh" is poorly named. It should be called
"gen_initramfs_from_list.sh" since it makes the actual initramfs image
from a list instead of making the list itself. The underdog utility
make_initramfs_description.bash is what actually makes the list.

-- underdog must be in /opt/underdog because of one line in
make_initramfs_description.bash. If you know what you are doing you can
change that one line after installing it somewhere else. I could have
automated this but consider it the "you must know this much to mess
around in here" test.

-- You _DO_ _NOT_ normally need to re-run the make description script
if you edit /opt/underdog/prototype/init or if your system libraries
are updated but maintain the same names and paths. The description file
is just a plain text list of path names that should be included in the
initramfs image.

-- You _DO_ _NOT_ need to re-run the make description script, nor do
you need to rebuild your kernel or initramfs, if you change the list of
attached hardware.  The initramfs image built is fully plug-n-play for
your storage systems as long as they can be seen by the kernel.

-- You _DO_ need to re-run the make description phase if you change system
files you want/need to include. This includes adding cryptsetup, lvm2,
and mdadm support to your system _IFF_ you need that support to boot. It
_DEFINITELY_ includes the case where you update a system library or
tool in such a way that its name changes, and that library or tool is
part of your boot needs. It is the make description step that decides on
the names and paths of files, not the later generation of the initramfs
by the kernel scripts.
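
One rough way to tell whether a re-run is due is to look for source paths
in the description that have vanished. This is only a sketch. It assumes
the description uses the kernel's gen_init_cpio text format, where a
regular file is listed as "file <name> <location> <mode> <uid> <gid>".
The helper name stale_entries is made up for the example and is not part
of underdog.

```shell
# Sketch only: assumes gen_init_cpio-style lines such as
#   file <name-in-image> <source-path> <mode> <uid> <gid>
# stale_entries is a hypothetical helper, not an underdog utility.

# Print every source path named by a "file" line that no longer exists.
# The body runs in a subshell so its variables do not leak to the caller.
stale_entries() (
    while read -r kind name location rest; do
        [ "$kind" = "file" ] || continue   # skip dir, nod, slink and comment lines
        [ -e "$location" ] || printf '%s\n' "$location"
    done < "$1"
)
```

If "stale_entries .initramfs" prints anything, the listed names have
changed underneath the description and the make description step
probably needs to be run again.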

-- The system uses the _full_ binaries as of this writing. Nothing
is copied aside, stripped, or in need of rebuilding with static
libraries. This means that some memory is used up during boot. That
use is somewhat transient and will be steadily reduced as time goes by,
but there are no plans to eliminate this "waste" entirely. What you need
to boot safely, you need to un-boot as well.


What Is Going On and Why:

This project is a means to capture and maintain the Linux kernel early
user space so that the root and subsequent mounts made in that context
can be completely undone during shutdown.

To date, Unix and Linux perform a last-ditch effort to make their storage
safe by unmounting what they can, then remounting the remaining file systems
as read-only, then syncing the disks and exiting. This is sub-optimal
in several ways.

(1) Remounting a file system read-only is not the same as unmounting
it. It's been "close enough" for years, but from the perspective of
some elements it is inferior. This may include checkpoint activities
of solid-state file systems etc.  now and who knows what gymnastics in
the future.

(2) Stateful storage servers (e.g. iSCSI et al.) may hold non-trivial
resources, locks, and who knows what else open because the drive media
is still mounted, and likely still mounted read-write, even if the
filesystem thereon has been switched to read-only.

(3) Using kernel built-ins to mount NFS is okay, but not so much for NFS4,
and no such support exists for iSCSI etc. So it's time to start ignoring
the root-on-nfs type kernel bits and get them into the early user context.

(4) It offends my sense of symmetry to leave things open while the system
dies. 8-)

The conceptual solution is fairly simple. If early user context (initramfs
environments) can connect up all this stuff, then returning to that
context should allow the disconnect of same.

The eventual full solution will be to do several things:

-- Replace the initramfs /init, which is usually a script of some sort,
with an executive that will do the normal /init stuff we expect.

-- Have that executive use the clone() syscall to create new process and
mount namespaces so that the "real" root file system can be positioned
and then run normally. (This is done instead of a switch_root.)

-- Use exec*() in the clone to run the "real" /sbin/init as a virtual
pid==1 process that the /init in the early user context can wait for.

-- Kill the /sbin/init instead of calling halt/reboot, but in such a
way that the initramfs knows whether to halt or reboot.

-- The death of /sbin/init will clean up all its local mounts, so /init
can then clean up all the actual mounts it made prior to the clone.

-- Now that the system is as clean as if it never mounted or attached
anything, the halt or reboot (or kexec, or loop back to the clone step)
can proceed.
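
The tail end of that list can be sketched in shell. None of this is
underdog code; teardown_order and action_for_status are names invented
for the example. The halt-versus-reboot convention shown is an
assumption: it leans on what reboot(2) documents for a pid 1 running in
its own PID namespace, namely that the parent sees it killed by SIGHUP
for a restart and by SIGINT for a halt or power off.

```shell
# Sketch only; teardown_order and action_for_status are hypothetical
# helpers, not underdog code.

# Read /proc/mounts-style lines on stdin and print the mount points
# last-listed-first, so nested mounts are released before their parents.
teardown_order() {
    awk '{ point[NR] = $2 } END { for (i = NR; i >= 1; i--) print point[i] }'
}

# Map the shell exit status of the finished /sbin/init to a next step.
# Assumption: init was pid 1 of its own PID namespace, where reboot(2)
# ends it with SIGHUP (restart) or SIGINT (halt/power off). A shell
# reports death by signal N as status 128+N.
action_for_status() {
    case "$1" in
        129) echo reboot ;;    # 128 + SIGHUP (1)
        130) echo halt ;;      # 128 + SIGINT (2)
        *)   echo unknown ;;   # anything else: the caller must decide
    esac
}
```

A cleanup pass could then feed "teardown_order < /proc/mounts" into a
loop of umount calls and hand the saved exit status to
action_for_status. Mount points containing spaces appear escaped in
/proc/mounts and would need extra handling that this sketch skips.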


ROADMAP:

Currently the project consists of just a couple of scripts in a directory
structure. So WTF you ask?

Proof of Concept, Stage 1: In this stage I built the means to easily
make my own initramfs image descriptions. There were no good tools to do
this. All of the existing systems were complex and full of assumptions
about how the boot process should work. They also built the initramfs
image itself as a monolithic process. Nothing out there would just take
a list of files, resolve the dependencies and conflicts, and then output
the descriptor file. So I fixed that omission.
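
To give a feel for the "resolve the dependencies" part, here is a small
sketch that turns ldd output into descriptor lines. It is not the logic
of make_initramfs_description.bash. The helper names, the 0755 mode and
the root ownership are all assumptions made for the example, and the
output format is the kernel's gen_init_cpio "file" line.

```shell
# Sketch only; not the logic of make_initramfs_description.bash.
# library_paths and file_lines are hypothetical helpers.

# Read ldd output on stdin and print each absolute library path once.
# Entries that carry no path (e.g. linux-vdso.so.1) are skipped.
library_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
        LC_ALL=C sort -u
}

# Turn paths on stdin into gen_init_cpio "file" lines that keep the same
# path inside the image. Mode 0755 and uid/gid 0 are assumptions.
file_lines() {
    awk '{ printf "file %s %s 0755 0 0\n", $0, $0 }'
}
```

Something like "ldd /sbin/cryptsetup | library_paths | file_lines" would
then print one line per library. A usable description also needs "dir"
lines for the parent directories, which this sketch leaves out.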

Proof of Concept, Stage 2: (Our current stage) I am producing an init
script based entirely on proven tools that will conditionally assemble
an entire system.  That is, a system that will honor suspend/resume,
designation of root=, and use of cryptsetup, lvm, and mdadm.
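
Honoring root= and resume= starts with pulling them off the kernel
command line. The following is a minimal sketch of that one step, not
the code in /opt/underdog/prototype/init. The name cmdline_value is
invented here, and letting the last occurrence of a key win is a choice
of this sketch.

```shell
# Sketch only; cmdline_value is a hypothetical helper, not the code in
# /opt/underdog/prototype/init.

# Print the value of the last "key=value" word in a kernel command line.
# $1 is the key (e.g. root), $2 is the command line text.
# Returns non-zero when the key is absent. Quoted values that contain
# spaces are not handled.
cmdline_value() {
    printf '%s\n' "$2" | tr -s ' \t' '\n\n' | awk -v key="$1" '
        index($0, key "=") == 1 { value = substr($0, length(key) + 2); found = 1 }
        END { if (found) print value; exit !found }'
}
```

In an init script this might be called as
cmdline_value root "$(cat /proc/cmdline)", with the result deciding
which cryptsetup, lvm or mdadm steps have to run before the mount.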

Proof of Concept, Stage 3: Scrap busybox switch_root in favor of the
first real underdog binary. This binary will do the clone_and_run and
will eventually exit when the /sbin/init process is killed in the "real"
environment.

Alpha Releases: Once the proof of concept stages work, the underdog binary
will become /init, and the current init script will become /init.bash or
similar. The initial alpha version will surround the execution of the
shell script, and handle the mounts of /proc and /sys and optionally
/dev in the early user context, as well as the clone_and_exec. In the
alpha releases stage it is expected that features will be added to the
shell script (e.g.  iSCSI and nfs support) even as features migrate from
the shell script into the underdog binary.

UNDECIDED GOALS:

-- I am thinking of replacing sysvinit. The current version of sysvinit
has no well-defined exit behaviors. Right now if you manage to kill
init you get a kernel panic, which will not happen within underdog
because the "apparent runtime init" won't be the real system-wide init
process. Killing underdog would, of course, have that same panic as there
must be a real first process running, but that's not important. Besides,
it isn't like sysvinit is a terribly complex piece of logic... The _main_
reason for replacing sysvinit would be to capture and properly deal with
things like "/sbin/halt".

-- I vaguely envision the complete absence of os-level shell scripts in
the early user context. There isn't anything wrong with shell scripts
per se, but if I can get things expressed in terms of shared object file
plugins then I can really clean out the initramfs once the system is
booted (as switch_root does) but still be left with everything needed for
underdog to do its cleanup. Simpler systems, such as embedded devices,
could have a real win if the whole default boot was reduced to one
stripped executable that only had the tidbits needed to run, and then
do cleanup, "stuck" in memory.

-- I envision being able to easily start more than one /sbin/init (root
environment) with its own mount and process space by passing a message
back to the early user context, or by specifying root= more than once
on the command line. This would beat the heck out of "user mode linux"
or chroot for isolating functions and features.

-- It should be reasonable, once we are free of the tyranny of current
root= interpretations done by the kernel, to compose one or more file system
hierarchies out of fragments and overlays. This sounds weird at first,
but imagine having different versions of libc in different directories
and selecting the one desired at runtime. That sounds contrived, I'm sure,
but it has interesting possibilities. Alternately, consider a firewall
(etc) box that has a minimal root and a series of maintenance-ready root
overlays and such for, well, maintenance.


Robert White <rwhite@pobox.com>
