Multiboot kFreeBSD

WHR

An experiment to make a Multiboot-compliant kFreeBSD i386 image

Copyright 2019-2021 Rivoreo

Licensed under Creative Commons Attribution-Sharealike 4.0 International.

As a FreeBSD kernel hacker, I know of only 2 bootloaders capable of booting the FreeBSD kernel on x86: one is the BTX loader(8), which is part of the FreeBSD project, and the other is GRUB 2.
It would be very useful to make the FreeBSD kernel bootable via the Multiboot protocol, as Multiboot is supported by many x86 bootloaders, such as GRUB (both legacy and version 2), syslinux (via mboot.c32) and Linux kexec(2), as well as by QEMU direct kernel boot; the latter could make kernel debugging easier.

Terms used in this article

  • kFreeBSD: the kernel of FreeBSD; the kernel image is usually located at /boot/kernel/kernel in FreeBSD.
  • Multiboot module: a file preloaded into memory by a Multiboot bootloader for later use by the kernel.
  • KLD module: a kernel module used by kFreeBSD that links into the original kernel image; this type of module can either be preloaded by the bootloader or loaded later using the kldload(2) system call; a KLD module file must be an ELF file with a file name ending in .ko.
  • initrd: initial RAM disk; preloaded into memory by the bootloader for use as the root disk, which the kernel mounts at /; in Multiboot-compliant kernels it can be a Multiboot module.
  • BTX loader: the bootloader of the FreeBSD operating system that is responsible for booting kFreeBSD, usually located at /boot/loader; it is the third stage of the FreeBSD boot process; the program itself is a BTX client linked with the BTX kernel, hence the name BTX loader; see the loader(8) man page in FreeBSD for more details.

This work is based on Making DragonFly BSD operating system compliant with the Multiboot specification by Radek Szymczyszyn. Since DragonFly BSD is a fork of FreeBSD 4, the kernel startup process is very similar between the 2 operating systems; this saved a lot of initial work in making kFreeBSD bootable by Multiboot bootloaders.

The final work has been released as a diff file against the FreeBSD base 10.3-RELEASE source tree. The diff file itself is released under the FreeBSD license; the full license text is:

Copyright 2019 Rivoreo

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

Software

A FreeBSD 10.3-RELEASE-p29 amd64 operating system is used to develop the kernel; the target kernel version is also 10.3-RELEASE-p29, with many customizations.
The targeted kernel architecture is i386.

Testing is done on a QEMU i386 system emulator running on a Debian GNU/Linux 9 operating system.

Goals

There are 5 primary goals, or stages, in this project:
1. Add a Multiboot header, so that the kernel image is recognized as a Multiboot kernel.
2. Make the kernel bootable from a Multiboot bootloader.
3. Parse the command line passed from the Multiboot bootloader, to construct the kFreeBSD-specific boothowto variable and the kernel environment variables.
4. Use a Multiboot module passed from the bootloader as an initrd.
5. Use Multiboot modules passed from the bootloader as kFreeBSD KLD modules.

In addition to those goals, the kernel image should obviously remain compatible with the BTX loader; however, this is not possible with the BTX loader shipped with version 10.3-RELEASE or later, due to a design issue in the BTX loader.

Since 10.3-RELEASE, the BTX loader included in the distribution supports Multiboot, but only for Xen (the Xen image is a Multiboot-compliant kernel). The real issue is that the loader checks for a Multiboot kernel image first; if a Multiboot-compliant kernel is detected, it will try to load the kernel with the Multiboot protocol only. Because this Multiboot support in the BTX loader is designed for Xen only, it requires the first Multiboot module to be the original kFreeBSD image.
Trying to load a Multiboot kernel without loading any Multiboot modules results in the BTX loader complaining 'No FreeBSD kernel provided, aborting'. This happens right after goal 1 is achieved.

BTX loader complaining no kFreeBSD loaded as first Multiboot module

To work around this issue, I first modified the BTX loader to try the Multiboot format (for Xen) after the kFreeBSD format; but considering that Xen domain 0 support wasn't available in kernel version 10.3-RELEASE anyway, I later decided to remove Multiboot support from it completely. Alternatively, using an older version of the BTX loader also works.

Insert a Multiboot header

This part of the work is very similar to what Radek Szymczyszyn did in the DragonFly BSD kernel; in fact, the issue of placing the header in the first 8 KiB of the image is the same for kFreeBSD and the DragonFly BSD kernel. (4.2.2)

By copying the existing code almost verbatim, the Multiboot header was added to source file i386/i386/locore.s:

    .section    .mbheader
    .align 4
#define MULTIBOOT_HEADER_FLAGS (MULTIBOOT_PAGE_ALIGN|MULTIBOOT_MEMORY_INFO)
multiboot_header:
    .long   MULTIBOOT_HEADER_MAGIC
    .long   MULTIBOOT_HEADER_FLAGS
    .long   -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)

Instead of being placed in .text, this header has its own section, .mbheader; as Radek Szymczyszyn says in 4.2.2:

the .text section itself begins far in the file, after the program headers

The section .mbheader is used to control the location of the Multiboot header in the final image, via the linker script.
Unfortunately, GNU ld(1) doesn't provide a clean way to specify the location of a section; Radek Szymczyszyn actually asked a question on Stack Overflow about this issue.
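
The placement constraint can be checked from user space. The following is a minimal sketch, not part of the released diff, assuming the usual Multiboot rules: the header must lie entirely within the first 8192 bytes of the image, be 4-byte aligned, and its magic, flags and checksum fields must sum to zero modulo 2^32. The function name find_multiboot_header is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_SEARCH       8192u   /* header must lie within the first 8 KiB */

/* Read a little-endian 32-bit value from a byte pointer. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical helper: return the file offset of a valid Multiboot
 * header (magic, flags, checksum), or -1 if none is found in range.
 */
static long find_multiboot_header(const unsigned char *image, size_t len) {
    size_t limit = len < MULTIBOOT_SEARCH ? len : MULTIBOOT_SEARCH;
    assert(image != NULL);
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic = read_le32(image + off);
        uint32_t flags = read_le32(image + off + 4);
        uint32_t checksum = read_le32(image + off + 8);
        /* The three fields must add up to 0 in 32-bit arithmetic. */
        if (magic == MULTIBOOT_HEADER_MAGIC &&
            (uint32_t)(magic + flags + checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

With the flags used above (MULTIBOOT_PAGE_ALIGN|MULTIBOOT_MEMORY_INFO, that is 0x3), the checksum field works out to 0xE4524FFB.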

In the end I made this hack to the linker script:

diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/conf/ldscript.i386 b/sys/conf/ldscript.i386
--- a/sys/conf/ldscript.i386    2016-03-25 09:09:25.000000000 +0800
+++ b/sys/conf/ldscript.i386    2019-05-21 15:02:11.235524344 +0800
@@ -7,7 +7,7 @@
 {
   /* Read-only sections, merged into text segment: */
   . = kernbase + kernload + SIZEOF_HEADERS;
-  .interp         : { *(.interp) }
+  .mbheader       : AT (ADDR(.mbheader) - kernbase) { *(.mbheader) *(.interp) }
   .hash           : { *(.hash) }
   .gnu.hash       : { *(.gnu.hash) }
   .dynsym         : { *(.dynsym) }

Unlike what Radek Szymczyszyn did in his linker script (4.2.2 and 4.2.5), I replaced the .interp section with .mbheader; the original *(.interp) stays inside the new section so that this section remains in first place.
The interpreter string in the kFreeBSD image is set by an ld(1) option in file conf/kern.pre.mk; this string is meaningless, and it is completely useless as it isn't used by any part of the FreeBSD source tree.
Along with the .interp section, the PT_INTERP segment is also removed by this modification; don't worry, it is useless too.

Since the .interp section no longer exists, I tried to remove the interpreter string from the final image; simply removing the ld(1) option --dynamic-linker doesn't work, because ld(1) will then use the default interpreter, /usr/libexec/ld-elf.so.1 or /usr/lib/libc.so.1, for emulation types elf_i386_fbsd or elf_i386. I finally set the interpreter to an empty string; after that it disappeared from the image completely:

diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/conf/kern.pre.mk b/sys/conf/kern.pre.mk
--- a/sys/conf/kern.pre.mk  2016-03-25 09:09:25.000000000 +0800
+++ b/sys/conf/kern.pre.mk  2019-05-21 15:24:17.721373211 +0800
@@ -170,8 +170,8 @@
 SYSTEM_OBJS= locore.o ${MDOBJS} ${OBJS}
 SYSTEM_OBJS+= ${SYSTEM_CFILES:.c=.o}
 SYSTEM_OBJS+= hack.So
-SYSTEM_LD= @${LD} -Bdynamic -T ${LDSCRIPT} ${LDFLAGS} --no-warn-mismatch \
-   -warn-common -export-dynamic -dynamic-linker /red/herring \
+SYSTEM_LD= @${LD} --dy -T ${LDSCRIPT} ${LDFLAGS} --no-warn-mismatch \
+   --warn-common --export-dynamic --dynamic-linker "" \
    -o ${.TARGET} -X ${SYSTEM_OBJS} vers.o
 SYSTEM_LD_TAIL= @${OBJCOPY} --strip-symbol gcc2_compiled. ${.TARGET} ; \
    ${SIZE} ${.TARGET} ; chmod 755 ${.TARGET}

The resulting kernel image should now be recognized as a Multiboot kernel image. Testing with grub-file(1) shows:

$ grub-file --is-x86-kfreebsd kernel && echo Yes || echo No
Yes
$ grub-file --is-x86-multiboot kernel && echo Yes || echo No
Yes

Make the kernel bootable and functional

When the FreeBSD BTX loader boots a kernel, it passes some additional information in struct bootinfo bootinfo and some other variables, most importantly boothowto; kFreeBSD collects this information in function recover_bootinfo, defined in source file i386/i386/locore.s.
This step must be skipped if the kernel is booted by a Multiboot bootloader, because Multiboot bootloaders are not aware of this FreeBSD-specific information, and the kernel will simply halt if such information isn't present.

A check for the Multiboot protocol is needed immediately after kernel entry. A Multiboot bootloader sets eax to a magic number to identify itself as such.
The startup code is modified as follows:

--- a/sys/i386/i386/locore.s    2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/locore.s    2019-05-24 13:10:42.989381512 +0800
@@ -196,6 +223,17 @@
    movw    $0x1234,0x472
 #endif /* PC98 */

+#ifdef MULTIBOOT
+   /* Are we booted up by a multiboot bootloader? */
+   cmpl    $MULTIBOOT_BOOTLOADER_MAGIC, %eax
+   jne 1f
+   movl    $0, R(bootinfo+BI_KERNEND)
+   movl    %ebx, R(multiboot_env)
+   jmp 2f
+
+1:
+#endif
+
 /* Set up a real frame in case the double return in newboot is executed. */
    pushl   %ebp
    movl    %esp, %ebp
@@ -232,6 +270,8 @@

    call    recover_bootinfo

+2:
+
 /* Get onto a stack that we can trust. */
 /*
  * XXX this step is delayed in case recover_bootinfo needs to return via

As the code shows, if the Multiboot bootloader magic is not found, execution jumps to the original path to deal with the bootinfo passed by the BTX loader.

There is also a variable multiboot_env; it is used to access Multiboot-specific information, such as the memory information we requested via MULTIBOOT_MEMORY_INFO, and the kernel command line. Its address is passed by the bootloader in the ebx register; this address must be saved for later use (for goal 3 and onwards) before ebx gets overwritten.

At this point the kernel should be able to boot from a Multiboot bootloader; however, the kernel appeared to hang when I tried it with QEMU.

Kernel hangs in QEMU

QEMU has a built-in GDB-compatible server for remote debugging with gdb(1). This is very useful for kernel debugging. To enable this server, simply add option -s or -gdb tcp::<port>; if -s is used, it will listen(2) on TCP port 1234. See qemu-system(1) for more information.

With the help of gdb(1), I found that the kernel hit a panic shortly after boot; the panic occurs very early in the kernel's internal startup process, before local console initialization.

$ gdb kernel
GNU gdb (GDB) 7.11.1 [GDB v7.11.1 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...
Reading symbols from kernel...Reading symbols from / ... /kernel.symbols...done.
done.
(gdb) target remote x.x.x.x:1234
Remote debugging using x.x.x.x:1234
0x000cb218 in ?? ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xc091e8a4 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
489             EVENTHANDLER_INVOKE(shutdown_final, howto);
(gdb) bt
#0  0xc091e8a4 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
#1  0xc091ed87 in vpanic (fmt=0xc0eb8193 "double fault", 
    ap=0xc11c5e14 <dblfault_stack+4084> "q~\353\300")
    at ../../../kern/kern_shutdown.c:889
#2  0xc091edc0 in panic (fmt=0xc0eb8193 "double fault")
    at ../../../kern/kern_shutdown.c:818
#3  0xc0bc1481 in dblfault_handler () at ../../../i386/i386/trap.c:1052
#4  0x00000000 in ?? ()

Although this backtrace doesn't provide much useful information about where the issue occurred, the fact that panic(9) has been called indicates that paging was already activated and the IDT and GDT had been set up, while the console had not yet finished its initialization. The problem should therefore lie in function init386, within this code:
https://svnweb.freebsd.org/base/releng/10.3/sys/i386/i386/machdep.c?revision=296373&view=markup#l3367

    vm86_initialize();
    getmemsize(first);
    init_param2(physmem);

    /* now running on new page tables, configured,and u/iom is accessible */

    /*
     * Initialize the console before we print anything out.
     */
    cninit();

Further single-stepping through the nearby code shows that the kernel panic occurs in BIOS calls made from function getmemsize. It appears the kernel is trying to detect the memory size from the BIOS.

When kFreeBSD is booted by the BTX loader, this information (called SMAP) has already been passed by the loader, as the loader has already made those BIOS calls. The kernel will try to call the BIOS by itself if the information isn't available from the bootloader.

This made me suspect that the memory detection code in the kernel is somehow faulty, because the same kernel panic occurs even when it is booted by the BTX loader, after I manually wiped out the SMAP information in bootinfo.

By manually skipping the memory detection code in gdb(1) and letting it fall back to RTC I/O for the memory size, the kernel could continue to start up normally. However, this limited the detected memory size to 64 MiB on a QEMU emulated machine with 512 MiB of memory.

There is an easy solution: the Multiboot header previously placed into the kernel has the flag MULTIBOOT_MEMORY_INFO set, which means the kernel requests memory information from the bootloader; the bootloader will then provide the base memory and extended memory sizes to the kernel.
However, according to the Multiboot Specification, this information is for reference only; the bootloader may or may not provide an accurate memory size, and it is up to the kernel whether to use this information.
For now, to avoid debugging the VM86 code that triggered the kernel panic, I decided to trust the memory size provided by the bootloader; at least it looked fine most of the time when testing an i386 kernel, compared to the memory size from the RTC.
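
As a rough illustration of what the bootloader provides, here is a small C sketch, separate from the actual patch. It assumes the standard Multiboot information layout, where bit 0 of flags marks mem_lower and mem_upper as valid and both are counted in KiB. The names mb_info_head and mb_memory_sizes are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_INFO_MEMORY 0x00000001u  /* bit 0: mem_lower and mem_upper are valid */

/* Leading fields of the Multiboot information structure (offsets 0, 4, 8). */
struct mb_info_head {
    uint32_t flags;
    uint32_t mem_lower;   /* KiB of memory starting at address 0 */
    uint32_t mem_upper;   /* KiB of memory starting at 1 MiB */
};

/*
 * Hypothetical helper: convert the reported sizes to bytes.
 * Returns 1 and fills both outputs if the bootloader supplied memory
 * information, 0 otherwise (outputs are left untouched).
 */
static int mb_memory_sizes(const struct mb_info_head *mbi,
    uint64_t *base_bytes, uint64_t *ext_bytes) {
    assert(mbi != NULL && base_bytes != NULL && ext_bytes != NULL);
    if ((mbi->flags & MB_INFO_MEMORY) == 0)
        return 0;
    *base_bytes = (uint64_t)mbi->mem_lower * 1024;
    *ext_bytes = (uint64_t)mbi->mem_upper * 1024;
    return 1;
}
```

A caller that gets 0 back would have to fall back to another detection method, which in this kernel means the BIOS calls or the RTC.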

Saving the information passed from the bootloader needs a bit more code in i386/i386/locore.s:

--- a/sys/i386/i386/locore.s    2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/locore.s    2019-05-24 13:10:42.989381512 +0800
@@ -126,6 +127,22 @@
    .space  0x240
 #endif

+#ifdef MULTIBOOT
+   .globl  multiboot_env
+multiboot_env: .long   0
+   .globl  multiboot_mem_lower, multiboot_mem_upper, multiboot_cmdline, multiboot_mods
+multiboot_mem_lower:
+   .long   0
+multiboot_mem_upper:
+   .long   0

...

@@ -239,6 +279,12 @@
  * returns via the old frame.
  */
    movl    $R(tmpstk),%esp
+#ifdef MULTIBOOT
+   cmpl    $MULTIBOOT_BOOTLOADER_MAGIC, %eax   /* %eax remains same if we jumped here by label 2 */
+   jne 3f
+   call    recover_multiboot_env
+3:
+#endif

 #ifdef PC98
    /* pc98_machine_type & M_EPSON_PC98 */
@@ -552,6 +598,69 @@

    ret

+#ifdef MULTIBOOT
+/* https://www.gnu.org/software/grub/manual/multiboot/html_node/Boot-information-format.html */
+recover_multiboot_env:
+   movl    R(multiboot_env), %eax
+   testl   $MI_FLAG_MEMORY, (%eax)
+   jz  1f
+   /* Recover memory information */
+   movl    MI_MEM_LOWER(%eax), %ecx
+   movl    %ecx, R(multiboot_mem_lower)
+   movl    MI_MEM_UPPER(%eax), %ecx
+   movl    %ecx, R(multiboot_mem_upper)
+1:

...

Near the entry point btext, after a temporary stack has been set up, the magic number is checked again for Multiboot; if it matches, function recover_multiboot_env is called to save the Multiboot-specific information.

The kernel command line and Multiboot module information may also be available, and will be saved by this function if present; this is described in a later chapter.

Next, function getmemsize is modified to make use of these memory sizes if they are available:

diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c
--- a/sys/i386/i386/machdep.c   2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c   2019-05-21 11:18:28.491508546 +0800
@@ -2556,6 +2564,15 @@
        goto have_smap;
    }

+#ifdef MULTIBOOT
+   if(multiboot_env && multiboot_mem_lower && multiboot_mem_upper) {
+       basemem = multiboot_mem_lower;
+       basemem_setup();
+       extmem = multiboot_mem_upper;
+       goto skip_bios_calls;
+   }
+#endif
+
    /*
     * Some newer BIOSes have a broken INT 12H implementation
     * which causes a kernel panic immediately.  In this case, we
@@ -2644,6 +2662,9 @@
 #endif
    }

+#ifdef MULTIBOOT
+skip_bios_calls:
+#endif
    /*
     * Special hack for chipsets that still remap the 384k hole when
     * there's 16MB of memory - this really confuses people that

The kernel is now functional with Multiboot, achieving goal 2.
The following screenshots show it booted with QEMU direct kernel loading, but asking for the root device because the kernel environment isn't available.

Kernel got booted in QEMU
Kernel got booted in QEMU, scrolled up

Parsing the kernel command line

The Multiboot Specification provides a way to configure the kernel from the bootloader by passing it a command line. Unlike the arguments of ordinary user-space programs in UNIX, this command line is a single C string terminated by 0; kernels have to parse it into an argv-style string array themselves if needed.

The FreeBSD BTX loader also supports some command line options that change the behavior of the loader itself or of kFreeBSD; when the meaning of a command line option needs to be passed to the kernel, the loader stores it in a bit-flag variable; when this variable is later passed to the kernel, it is stored in boothowto. The possible bits of boothowto are defined in sys/reboot.h; note that only some of the bits make sense to pass from a bootloader. Some useful bits are RB_SINGLE, to boot into single user mode by passing option -s to init(8), and RB_VERBOSE, to turn verbose logging on by setting the bootverbose variable.

Options that were originally interpreted by the BTX loader must now be parsed by the kernel itself; the result should be stored in the boothowto variable as well. This means that filling boothowto from the command line options should happen as early as possible; it is best done before any use of boothowto.
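
To make the mapping concrete, here is a hedged sketch of turning one option word into boothowto bits. It is not the code from the diff: the function name howto_from_option is invented, only 3 letters are handled, and the flag values are written out as I understand them from sys/reboot.h, so they should be checked against the actual header.

```c
#include <assert.h>
#include <stddef.h>

/* Values as I understand them from sys/reboot.h; verify against your tree. */
#define RB_ASKNAME 0x001   /* -a: ask for the root device */
#define RB_SINGLE  0x002   /* -s: boot into single user mode */
#define RB_VERBOSE 0x800   /* -v: verbose boot messages */

/*
 * Hypothetical helper: translate an option word such as "-sv" into
 * boothowto bits. Words that do not start with '-' and unknown
 * letters contribute nothing.
 */
static int howto_from_option(const char *opt) {
    int howto = 0;
    assert(opt != NULL);
    if (opt[0] != '-')
        return 0;
    for (const char *p = opt + 1; *p != '\0'; p++) {
        switch (*p) {
        case 'a': howto |= RB_ASKNAME; break;
        case 's': howto |= RB_SINGLE; break;
        case 'v': howto |= RB_VERBOSE; break;
        default: break;   /* ignored in this sketch */
        }
    }
    return howto;
}
```

The kernel would OR the result of each option word into boothowto before anything reads that variable.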

Another mechanism by which the BTX loader passes information to the kernel is the kernel environment. Just like the environment of user-space programs, the kernel environment is made up of individual environment variables, where an environment variable is a C string of the form <key>=<value>.
The kernel environment in kFreeBSD is an important mechanism for adjusting the kernel configuration at boot time; many environment variables are used to set the initial value of the corresponding sysctl variables. Some variables are read-only to user space, as they can only be set from the kernel environment, which in turn comes from the bootloader; they are called kernel tunables.
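
A static environment of this kind can be pictured as a flat buffer of <key>=<value> strings. The sketch below assumes, and this is an assumption rather than something taken from the diff, that the strings are stored back to back, each terminated by 0, with an empty string marking the end. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append "key=value" to a static environment
 * buffer. *used is the offset of the terminating empty string.
 * Returns 0 on success, -1 if the variable is malformed or does not fit.
 */
static int kenv_append(char *buf, size_t buflen, size_t *used, const char *var) {
    size_t len;
    assert(buf != NULL && used != NULL && var != NULL);
    len = strlen(var) + 1;
    /* Keep one spare byte for the empty string that ends the list. */
    if (strchr(var, '=') == NULL || *used + len + 1 > buflen)
        return -1;
    memcpy(buf + *used, var, len);
    *used += len;
    buf[*used] = '\0';
    return 0;
}

/* Hypothetical helper: return the value of name, or NULL if it is not set. */
static const char *kenv_lookup(const char *buf, const char *name) {
    size_t nlen;
    assert(buf != NULL && name != NULL);
    nlen = strlen(name);
    for (const char *p = buf; *p != '\0'; p += strlen(p) + 1) {
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=')
            return p + nlen + 1;
    }
    return NULL;
}
```

The buffer must start out with buf[0] set to 0 and *used set to 0, which represents an empty environment.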

The most important variables for booting the system automatically are vfs.root.mountfrom*; without them the kernel will ask which device to mount as root, as if RB_ASKNAME were set in boothowto.

For those kernel tunables to take effect, the kernel environment must be initialized before any tunable is retrieved.

At the beginning of function init386, the kernel module information pointer and the kernel environment pointer are prepared, then init_param1 is called:

        metadata_missing = 0;
        if (bootinfo.bi_modulep) {
                preload_metadata = (caddr_t)bootinfo.bi_modulep + KERNBASE;
                preload_bootstrap_relocate(KERNBASE);
        } else {
                metadata_missing = 1;
        }

        if (bootinfo.bi_envp)
                init_static_kenv((caddr_t)bootinfo.bi_envp + KERNBASE, 0);
        else
                init_static_kenv(NULL, 0);

        /* Init basic tunables, hz etc */
        init_param1();

This code shows that both preload_metadata and the kernel environment are retrieved from the bootinfo structure, which won't be available if the kernel was booted from a Multiboot bootloader.
It should also be pointed out that function init_param1 is called immediately after the kernel environment is initialized, and the comment on this call indicates that some tunables are initialized there; this means there is no chance to initialize the kernel environment from the Multiboot kernel command line any later than this.

To parse the command line string, the first task is to break the string at whitespace into multiple strings, forming an array of pointers, argv; this needs a buffer to copy the strings into, and some more space to store the temporary array.
However, at this early stage of kernel initialization the dynamic memory allocation system is far from ready, which means malloc(9) cannot be used at all; in fact, when init386 is called, virtual memory isn't fully initialized until function getmemsize is done.
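
The splitting step can be done without any allocator, as long as the caller supplies the storage. The following is an illustrative sketch only; the function name split_command_line and its signature are invented here and do not come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy cmdline into buf, cut it at runs of
 * spaces or tabs, and store a pointer to each word in argv.
 * Returns the number of words, or -1 if buf or argv is too small.
 */
static int split_command_line(const char *cmdline, char *buf, size_t buflen,
    char **argv, int max_args) {
    size_t len;
    int argc = 0;
    int in_word = 0;
    assert(cmdline != NULL && buf != NULL && argv != NULL);
    len = strlen(cmdline);
    if (len + 1 > buflen)
        return -1;
    memcpy(buf, cmdline, len + 1);
    for (char *p = buf; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t') {
            *p = '\0';            /* terminate the previous word */
            in_word = 0;
        } else if (!in_word) {
            if (argc == max_args)
                return -1;
            argv[argc++] = p;     /* first character of a new word */
            in_word = 1;
        }
    }
    return argc;
}
```

In the kernel, buf and argv would both have to come from early, manually mapped memory rather than from malloc(9).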

To allocate memory here, one can use the only argument of init386, first; it is passed from variable physfree in i386/i386/locore.s, and is believed to point to the physical memory just past the end of the current kernel image.
By calling pmap_kenter with the virtual and physical addresses corresponding to the value of first, one page of that physical memory is mapped into virtual memory; first should then be advanced by one page size, since that page is now mapped and in use.

Here is an example, in function init386:

        pcpu_init(pc, 0, sizeof(struct pcpu));
        for (pa = first; pa < first + DPCPU_SIZE; pa += PAGE_SIZE)
                pmap_kenter(pa + KERNBASE, pa);
        dpcpu_init((void *)(first + KERNBASE), 0);
        first += DPCPU_SIZE;
        PCPU_SET(prvspace, pc);
        PCPU_SET(curthread, &thread0);
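
Detached from the kernel, the same bump-allocation idea can be sketched as below. The helper early_alloc_pages is hypothetical; in the kernel each page in the returned range would additionally have to be mapped with pmap_kenter before use, which this user-space sketch only notes in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * Hypothetical helper: reserve enough whole pages for len bytes,
 * starting at *first, and advance *first past them. Returns the base
 * address of the reserved range.
 */
static uintptr_t early_alloc_pages(uintptr_t *first, size_t len) {
    uintptr_t base;
    size_t pages;
    assert(first != NULL && *first % PAGE_SIZE == 0);
    base = *first;
    pages = (len + PAGE_SIZE - 1) / PAGE_SIZE;   /* round up */
    /* In the kernel: pmap_kenter(pa + KERNBASE, pa) for each page here. */
    *first += pages * PAGE_SIZE;
    return base;
}
```

Nothing is ever freed in this scheme; memory handed out this way simply stays in use.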

Parsing the command line should use no more than one page of memory, because the maximum length of a command line is limited by the Multiboot Specification to 4 KiB, exactly the same as the common page size on i386.

The added code allocates one page and passes its virtual address to a new function, parse_kernel_command_line; this function parses the command line and initializes the kernel environment, so the later call to init_static_kenv is skipped in the Multiboot case.
After the command line has been fully parsed in parse_kernel_command_line, the allocated page holds the initial static kernel environment.

diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c
--- a/sys/i386/i386/machdep.c   2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c   2019-05-22 11:31:03.982382400 +0800
@@ -3183,6 +3471,15 @@
    pc98_init_dmac();
 #endif

+#ifdef MULTIBOOT
+   if(multiboot_env) {
+       pmap_kenter(first + KERNBASE, first);
+       parse_kernel_command_line(multiboot_cmdline,
+           (char *)(first + KERNBASE), PAGE_SIZE);
+       first += PAGE_SIZE;

...

@@ -3191,10 +3488,18 @@
        metadata_missing = 1;
    }

-   if (bootinfo.bi_envp)
-       init_static_kenv((caddr_t)bootinfo.bi_envp + KERNBASE, 0);
-   else
-       init_static_kenv(NULL, 0);
+   /* Static environment is already initialized in
+    * parse_kernel_command_line if the kernel is loaded by a muiltboot
+    * boot loader. */
+#ifdef MULTIBOOT
+   if(!multiboot_env) {
+#endif
+       init_static_kenv(bootinfo.bi_envp ?
+           (caddr_t)bootinfo.bi_envp + KERNBASE : NULL,
+           0);
+#ifdef MULTIBOOT
+   }
+#endif

    /* Init basic tunables, hz etc */
    init_param1();

The implementation of function parse_kernel_command_line was added to source file kern/init_main.c.

In addition to the options that can be translated to boothowto flags, 3 custom options were also implemented:

  • -e <env-var>: set a kernel environment variable, exactly the same as giving the variable without an option.
  • -i <init-path>: set the init(8) path; overrides kernel environment variable init_path.
  • -M: ignore the memory information passed from the Multiboot bootloader and use BIOS calls instead.

Handle Multiboot modules

Apart from the Multiboot kernel image itself, Multiboot bootloaders may also load one or more files into memory for the kernel; these preloaded files are usually called modules in Multiboot.

There are 2 typical uses of preloaded files in kFreeBSD: a memory disk for the root file system (initrd), and KLD modules.

The FreeBSD BTX loader loads additional files as KLD modules by default; the file type must be specified explicitly for any other type. To load an initrd, the type must be set to md_image or mfs_root.

Since it is not possible to specify such a type string from a Multiboot bootloader when loading a Multiboot module, the module type must be determined by the kernel itself.
An easy way to determine the module type is to check whether the module is an ELF file, because all KLD modules must be ELF files.
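
A minimal version of that check might look like the sketch below. It only compares the 4 ELF identification bytes; the real patch validates more of the ELF header. The function names are invented for this example, while the type strings elf module and md_image are the ones used for preloaded files in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if the buffer starts with the ELF magic bytes. */
static int looks_like_elf(const unsigned char *start, size_t size) {
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(start != NULL);
    return size >= sizeof elf_magic &&
        memcmp(start, elf_magic, sizeof elf_magic) == 0;
}

/*
 * Hypothetical helper: guess the preloaded file type of a module.
 * Anything that is not ELF is treated as a memory disk image.
 */
static const char *guess_module_type(const unsigned char *start, size_t size) {
    return looks_like_elf(start, size) ? "elf module" : "md_image";
}
```

The obvious weakness of this rule is that an ELF file can never be used as a memory disk image, which is unlikely to matter in practice.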

Every preloaded file loaded by the BTX loader has its metadata stored in a buffer pointed to by bootinfo.bi_modulep; the kernel image also has its own metadata stored there, with file type elf kernel.

When the kernel prepares this metadata itself after being booted by a Multiboot bootloader, the early memory allocation method is needed again; the metadata may consume more than a page of memory, and the total amount of memory required for it is not easy to know without examining each preloaded file.

The metadata buffer contains multiple 'nodes', each aligned to 4 bytes relative to the start of the metadata buffer; the format of a metadata node is as follows:

  • node type, uint32
  • data length, uint32
  • data
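
The node layout above can be sketched as a small writer function. This is an illustration rather than the code from the diff: meta_put is a made-up name, and the two MODINFO values are given as I understand them from sys/linker.h, so treat them as assumptions. Passing a null buffer makes the function only count bytes, which allows a sizing pass before the real buffer exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as I understand them from sys/linker.h; verify against your tree. */
#define MODINFO_NAME 0x0001
#define MODINFO_TYPE 0x0002

/* Round n up to the next multiple of 4. */
static size_t round_up4(size_t n) {
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Hypothetical helper: write one node (type, length, data, padding)
 * at offset off and return the offset of the next node. If buf is
 * NULL nothing is written and only the offset is computed.
 */
static size_t meta_put(unsigned char *buf, size_t off, uint32_t type,
    const void *data, uint32_t len) {
    assert(off % 4 == 0 && data != NULL);
    if (buf != NULL) {
        memcpy(buf + off, &type, sizeof type);
        memcpy(buf + off + 4, &len, sizeof len);
        memcpy(buf + off + 8, data, len);
        /* Zero the bytes that pad the data up to a 4-byte boundary. */
        memset(buf + off + 8 + len, 0, round_up4(len) - len);
    }
    return off + 8 + round_up4(len);
}
```

Calling it once per node with a null buffer gives the total size; repeating the same calls with a real buffer of that size then fills it in.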

Each preloaded file needs at least 4 nodes to describe it: MODINFO_NAME, MODINFO_TYPE, MODINFO_ADDR and MODINFO_SIZE. Type MODINFO_NAME is special: it must be placed before all other nodes that describe a particular file, because this type marks the start of a new preloaded file, and all nodes of other types that follow describe this file.

Initially all addresses in this metadata buffer are physical addresses; they are converted to virtual addresses later, in function preload_bootstrap_relocate.

As previously stated, the first file that needs to be described is the kernel image itself; its nodes are constructed as follows:

  • MODINFO_NAME set from variable kernelname
  • MODINFO_TYPE set to elf kernel
  • MODINFO_ARGS will be set if the kernel command line appears to contain extra options or environment variables
  • MODINFO_ADDR set from KERNLOAD
  • MODINFO_SIZE is calculated from _end and KERNLOAD

The Multiboot module information is again retrieved in function recover_multiboot_env.
The beginning of function init386 was modified again to add another call that initializes bootinfo.bi_modulep:

--- a/sys/i386/i386/machdep.c   2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c   2019-05-22 11:31:03.982382400 +0800
@@ -3183,6 +3471,15 @@
    pc98_init_dmac();
 #endif

+#ifdef MULTIBOOT
+   if(multiboot_env) {
+       pmap_kenter(first + KERNBASE, first);
+       parse_kernel_command_line(multiboot_cmdline,
+           (char *)(first + KERNBASE), PAGE_SIZE);
+       first += PAGE_SIZE;
+       first += init_multiboot_modules(first);
+   }
+#endif
    metadata_missing = 0;
    if (bootinfo.bi_modulep) {
        preload_metadata = (caddr_t)bootinfo.bi_modulep + KERNBASE;

Because of the flexibility the metadata buffer needs, variable first is passed directly to the new function init_multiboot_modules, which allocates memory from it; the function returns the amount of memory it used, as a multiple of the page size, so that first can be advanced accordingly.

static int init_multiboot_modules(int addr) {
    size_t module_len = fill_module_info_from_multiboot(NULL);
    int page = addr;
    do {
        pmap_kenter(page + KERNBASE, page);
        page += PAGE_SIZE;
    } while(page < addr + module_len);
    bootinfo.bi_modulep = addr;
    fill_module_info_from_multiboot((caddr_t)(addr + KERNBASE));
    return page - addr;
}

This function first calls fill_module_info_from_multiboot with a null pointer to get the required buffer size, then allocates a buffer of that size and calls fill_module_info_from_multiboot again with that buffer.

To retrieve the Multiboot module information, the new function fill_module_info_from_multiboot needs to access the previously saved multiboot_mods; multiboot_mods contains 2 fields, the module count and a pointer to the array of module information blocks:

extern struct {
    uint32_t count;
    struct multiboot_module *address;
} multiboot_mods;

Since the address pointer is stored as a physical address, it needs to be converted to a virtual address before use.

static size_t fill_module_info_from_multiboot(caddr_t addr) {
    ...
    unsigned int i = 0;
    ...
    while(i < multiboot_mods.count) {
        struct multiboot_module *mbmod = (struct multiboot_module *)((caddr_t)(multiboot_mods.address + i) + KERNBASE);
        ...
        i++;
    }
    ...

Each Multiboot module is described by 3 values: a start address, an end address, and a pointer to an optional command line, as in the following C structure:

struct multiboot_module {
    uint32_t mod_start;
    uint32_t mod_end;
    uint32_t cmdline;
    uint32_t pad;
};

Field pad is reserved in the current specification, and should always be set to 0 by bootloaders.

This Multiboot-specific information is converted to kFreeBSD preloaded file metadata as follows:

  • MODINFO_NAME is set from the module command line, because many Multiboot bootloaders put the file name in the first part of the command line.
  • MODINFO_TYPE is determined by examining and validating the ELF header of the module, and is then set to either elf module or md_image.
  • MODINFO_ARGS is also set if there is any additional command line text after the module name.
  • MODINFO_ADDR and MODINFO_SIZE are set from mod_start and mod_end; the address is kept as a physical address for later conversion.
  • Some extra checks and adjustments are made if the module is a KLD module (has a valid ELF header), in order to make the KLD functional later.
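
Two of these conversions are simple enough to sketch in isolation: the size is the difference between the end and start addresses, and the command line is cut at the first whitespace into a name and optional arguments. The helper names below are invented for illustration and are not taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct multiboot_module {
    uint32_t mod_start;
    uint32_t mod_end;
    uint32_t cmdline;
    uint32_t pad;
};

/* Size in bytes of the module's memory range. */
static uint32_t module_size(const struct multiboot_module *m) {
    assert(m != NULL && m->mod_end >= m->mod_start);
    return m->mod_end - m->mod_start;
}

/*
 * Hypothetical helper: split "name args..." in place. After the call,
 * cmdline holds only the name; the return value points at the
 * arguments, or is NULL if there are none.
 */
static char *split_module_cmdline(char *cmdline) {
    char *args;
    assert(cmdline != NULL);
    args = cmdline + strcspn(cmdline, " \t");
    if (*args == '\0')
        return NULL;
    *args++ = '\0';
    args += strspn(args, " \t");
    return *args != '\0' ? args : NULL;
}
```

The name part would feed MODINFO_NAME and the remainder, when present, MODINFO_ARGS.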

Testing with initrd

Before implementing the last (tricky) part, support for KLD modules, I decided to test initrd first. By now the kernel should be able to use a Multiboot module as an initrd, without digging into the details of how KLD modules work.

A 32 MiB initrd with UFS2 was prepared to test the kernel; it contains the FreeBSD C library, other base libraries, sh(1), many other useful commands, and an /init shell script that starts sh(1) after opening /dev/console for stdin, stdout and stderr.

The command line to start QEMU is:

qemu-system-i386 -m 512 -kernel kernel -append "-i /init" -initrd initrd -s

Kernel option -i is a custom extension for setting kern.init_path, implemented in the previous chapter, Parsing the kernel command line.

The test shows a kernel panic right after the kernel component md(4) tries to access the initrd.

Kernel panic on accessing initrd with 512 MiB RAM

The good news is that the kernel treated this module as a preloaded memory disk, and tried to handle it with md(4).

By changing the physical memory size, the test result becomes a little different:

Kernel panic on mounting initrd with 2048 MiB RAM

This time the kernel panic occurred when the kernel was actually trying to mount the initrd, which means the initial work of md(4) for this initrd completed successfully. This indicates that the initrd was very likely corrupted during kernel startup.

Debugging in gdb(1) shows something suspicious:

(gdb) target remote x.x.x.x:1234
Remote debugging using x.x.x.x:1234
delay_tc (n=100000) at ../../../x86/isa/clock.c:285
285                     u = func(tc) & mask;
(gdb) bt
#0  delay_tc (n=100000) at ../../../x86/isa/clock.c:285
#1  DELAY (n=100000) at ../../../x86/isa/clock.c:321
#2  0xc091e358 in shutdown_panic (junk=0x0, howto=260)
    at ../../../kern/kern_shutdown.c:630
#3  0xc091e9ac in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
#4  0xc091eef3 in vpanic (
    fmt=0xc0d36d0d "vm_fault: fault on nofault entry, addr: %lx", 
    ap=0xede96970 "") at ../../../kern/kern_shutdown.c:889
#5  0xc091ef2c in panic (
    fmt=0xc0d36d0d "vm_fault: fault on nofault entry, addr: %lx")
    at ../../../kern/kern_shutdown.c:818
#6  0xc0afc092 in vm_fault_hold (map=0xc17c1000, vaddr=3248734208, 
    fault_type=1 '\001', fault_flags=0, m_hold=0x0)
    at ../../../vm/vm_fault.c:329
#7  0xc0afdb28 in vm_fault (map=0xc17c1000, vaddr=3248734208, 
    fault_type=1 '\001', fault_flags=0) at ../../../vm/vm_fault.c:273
#8  0xc0bc1146 in trap_pfault (frame=0xede96c08, usermode=0, eva=3248734208)
    at ../../../i386/i386/trap.c:914
#9  0xc0bc169d in trap (frame=0xede96c08) at ../../../i386/i386/trap.c:532
#10 0xc0bb13b7 in calltrap () at ../../../i386/i386/exception.s:173
#11 0x00000008 in ?? ()
#12 0x00000028 in ?? ()
#13 0x00000028 in ?? ()
---Type <return> to continue, or q <return> to quit---
#14 0xe1814000 in ?? ()
#15 0xc06c5e54 in mdstart_preload (sc=0xc680b000, bp=0xc6a0d4c0)
    at ../../../dev/md/md.c:799
#16 0xc06c5be9 in md_kthread (arg=0xc680b000) at ../../../dev/md/md.c:1147
#17 0xc08f769a in fork_exit (callout=0xc06c5a46 <md_kthread>, arg=0xc680b000, 
    frame=0xede96ce8) at ../../../kern/kern_fork.c:1027
#18 0xc0bb146c in fork_trampoline () at ../../../i386/i386/exception.s:288
(gdb) b fill_module_info_from_multiboot
Breakpoint 1 at 0xc0bb3ee4: file ../../../i386/i386/machdep.c, line 2914.
(gdb) c
Continuing.

Breakpoint 1, fill_module_info_from_multiboot (addr=addr@entry=0x0)
    at ../../../i386/i386/machdep.c:2914
2914
(gdb) finish 
Run till exit from #0  fill_module_info_from_multiboot (addr=addr@entry=0x0)
    at ../../../i386/i386/machdep.c:2914
0xc0bb5d6b in init_multiboot_modules (addr=21135360)
    at ../../../i386/i386/machdep.c:3083
3083    size_t module_len = fill_module_info_from_multiboot(NULL);
Value returned is $1 = 148
(gdb) c
Continuing.

Breakpoint 1, fill_module_info_from_multiboot (
    addr=addr@entry=0xc1428000 "```P,_,#`P,#`P", '`' <repeats 11 times>, "9F9F9F9F?&!@P````'[T]/1T%!04%!0`\nM", '`' <repeats 13 times>, "!@8", '`' <repeats 13 times>, "&QL`'S&_L#`QGP``````,C(ZNW=VLC/\nMR,C(", '`' <repeats 12 times>, "?,;`^,#&?", '`' <repeats 11 times>, "V&PV-FS8``````````8&```&\nM!@8&!@8&QL9\\``!\\Q"...) at ../../../i386/i386/machdep.c:2914
2914
(gdb) bt
#0  fill_module_info_from_multiboot (
    addr=addr@entry=0xc1428000 "```P,_,#`P,#`P", '`' <repeats 11 times>, "9F9F9F9F?&!@P````'[T]/1T%!04%!0`\nM", '`' <repeats 13 times>, "!@8", '`' <repeats 13 times>, "&QL`'S&_L#`QGP``````,C(ZNW=VLC/\nMR,C(", '`' <repeats 12 times>, "?,;`^,#&?", '`' <repeats 11 times>, "V&PV-FS8``````````8&```&\nM!@8&!@8&QL9\\``!\\Q"...) at ../../../i386/i386/machdep.c:2914
#1  0xc0bb5d9b in init_multiboot_modules (addr=21135360)
    at ../../../i386/i386/machdep.c:3090
#2  init386 (first=21135360) at ../../../i386/i386/machdep.c:3387
#3  0xc04b05c0 in begin () at ../../../i386/i386/locore.s:353
(gdb) p multiboot_mods
$2 = {count = 1, address = 0x120b000}
(gdb) p/x *(struct multiboot_module *)0xc120b000
$3 = {mod_start = 0x120c000, mod_end = 0x320c000, cmdline = 0x120b010, 
  pad = 0x0}
(gdb) p/x first
$4 = 0x1428000
(gdb) p/x physfree 
$5 = 0x1427000

The Multiboot module information structure shows that this module was loaded between physical addresses 0x120c000 and 0x320c000, but first, the argument of function init386 that points to the first free page of physical memory, is 0x1428000 (and physfree is 0x1427000). Since this 'free' memory is also used when parsing the kernel command line and storing the module metadata, preparing the metadata for the module corrupts the module itself!
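
The overlap can be confirmed with a small range check using the values printed by gdb. This is a sketch for illustration only; struct phys_range and range_contains are hypothetical names, and mod_end is treated as exclusive, which matches the 32 MiB size of the initrd (0x320c000 - 0x120c000 = 0x2000000).

```c
#include <stdint.h>

/* Half-open physical range [start, end), like mod_start and mod_end. */
struct phys_range {
    uint32_t start;
    uint32_t end;
};

/* True if addr lies inside the range. */
static int range_contains(const struct phys_range *r, uint32_t addr) {
    return addr >= r->start && addr < r->end;
}
```

Calling range_contains with the range { 0x120c000, 0x320c000 } returns true for both 0x1427000 and 0x1428000, so the memory the kernel considers free sits about 2 MiB into the initrd image.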

Variable first is passed from begin in i386/i386/locore.s, by way of another variable, physfree, which is in turn set from bootinfo.bi_kernend or, if that isn't available, &_end, in function create_pagetables.

The BTX loader sets bootinfo.bi_kernend according to the end of the loaded kernel image and all modules. A Multiboot bootloader, of course, won't set it, so the kernel falls back to setting physfree from the value of &_end; it is simply not aware of any additional modules when it uses the memory.

To get the correct end of the Multiboot modules, bootinfo.bi_kernend must be set if any such modules are present. This must be done before bootinfo.bi_kernend is used in create_pagetables, preferably in recover_multiboot_env.

recover_multiboot_env:
    movl    R(multiboot_env), %eax
    testl   $MI_FLAG_MEMORY, (%eax)

...

6:
    testl   $MI_FLAG_MODS, (%eax)
    jz  7f
    movl    MI_MODS_ADDR(%eax), %ecx
    movl    %ecx, R(multiboot_mods+4)
    movl    MI_MODS_COUNT(%eax), %ecx
    movl    %ecx, R(multiboot_mods)
    /* Set bootinfo.bi_kernend from multiboot modules if any */
    testl   %ecx, %ecx
    jz  7f
    pushl   %ecx
    pushl   R(multiboot_mods+4)
    call    get_kernend_from_multiboot_mods
    addl    $8, %esp
7:
    ret

Function get_kernend_from_multiboot_mods is defined in i386/i386/machdep.c:

/* Called from locore.s, before paging */
void get_kernend_from_multiboot_mods(const struct multiboot_module *mbmod, size_t count) {
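    /* Paging is off, so reach bootinfo through its physical address */
    struct bootinfo *bi = (struct bootinfo *)((char *)&bootinfo - KERNBASE);
    while(count > 0) {
        /* Push bi_kernend past the end of every module, page-aligned */
        if(bi->bi_kernend < mbmod->mod_end) {
            bi->bi_kernend = roundup(mbmod->mod_end, PAGE_SIZE);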
        }
        mbmod++;
        count--;
    }
}

Because virtual memory is not yet set up when this function is called, referencing any symbol requires first converting its virtual address to a physical address; this is done by manually subtracting KERNBASE, just as the R macro does in i386/i386/locore.s.

The kernel now boots into the initrd without problems in QEMU.
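
The address arithmetic involved can be shown as a worked example. The sketch below is not kernel code: KERNBASE_ASSUMED is an assumption inferred from the gdb session (where physical 0x120b000 is read at virtual 0xc120b000), PAGE_SIZE_ASSUMED is the usual 4 KiB i386 page, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumptions for illustration; see the note above the block. */
#define KERNBASE_ASSUMED  0xc0000000u
#define PAGE_SIZE_ASSUMED 0x1000u

/* Kernel virtual address to physical address, as done before paging. */
static uint32_t virt_to_phys(uint32_t vaddr) {
    return vaddr - KERNBASE_ASSUMED;
}

/* The reverse conversion, usable once paging is enabled. */
static uint32_t phys_to_virt(uint32_t paddr) {
    return paddr + KERNBASE_ASSUMED;
}

/* Round an address up to the next page boundary, like roundup(). */
static uint32_t round_up_page(uint32_t addr) {
    return (addr + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}
```

For the module in the test above, mod_end is 0x320c000, which is already page-aligned, so bi_kernend becomes 0x320c000 and the first free page moves past the initrd.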

Booting QEMU Multiboot preloaded initrd

Making KLD modules work

As the last part of this project, loading KLD modules is not as simple as putting the whole file somewhere in memory and telling the KLD subsystem where it is and how big it is.
KLD modules of kFreeBSD are ELF shared object files on i386. Properly loading a KLD module requires the bootloader to load each loadable segment of the ELF file to its target virtual address, according to the ELF program headers.

For example, KLD module ipfw.ko has the following program headers:

$ readelf --program-headers ipfw.ko

Elf file type is DYN (Shared object file)
Entry point 0x48b0
There are 4 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x0e78c 0x0e78c R E 0x1000
  LOAD           0x00e78c 0x0000f78c 0x0000f78c 0x0069c 0x007d0 RW  0x1000
  DYNAMIC        0x00e78c 0x0000f78c 0x0000f78c 0x00078 0x00078 RW  0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00     .hash .dynsym .dynstr .rel.dyn .text .rodata set_sysuninit_set set_sysinit_set set_modmetadata_set set_sysctl_set .eh_frame 
   01     .dynamic .data .bss 
   02     .dynamic 
   03     

Offset is the offset within the ELF file. VirtAddr is the target address; it is relative because this file is a shared object. PhysAddr is ignored, since KLD only cares about virtual addresses. FileSiz is the size of the segment as stored in the ELF file. MemSiz is the size that must be reserved for the segment once it is loaded into memory.
Only segments of type LOAD (PT_LOAD) need to be loaded.

As the example above shows, not every loadable segment has its offset equal to its relative virtual address; this causes problems when the entire ELF file has been loaded as-is by a Multiboot bootloader. Don't expect such a KLD module to work without fixing the load location.
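
To make the mismatch concrete, the program headers of ipfw.ko can be written down as data and checked. This is a minimal sketch: struct sk_phdr is a hypothetical stand-in holding only a few Elf32_Phdr fields, and count_misplaced_segments is an illustrative helper, not part of the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PT_LOAD 1u   /* value of PT_LOAD in the ELF specification */

/* Minimal stand-in for Elf32_Phdr with only the fields used here. */
struct sk_phdr {
    uint32_t type;
    uint32_t offset;
    uint32_t vaddr;
    uint32_t filesz;
    uint32_t memsz;
};

/* Count PT_LOAD segments whose file offset differs from their VirtAddr. */
static size_t count_misplaced_segments(const struct sk_phdr *ph, size_t n) {
    size_t misplaced = 0;
    for(size_t i = 0; i < n; i++) {
        if(ph[i].type != SK_PT_LOAD) continue;
        if(ph[i].vaddr != ph[i].offset) misplaced++;
    }
    return misplaced;
}

/* The four program headers of ipfw.ko from the readelf output above. */
static const struct sk_phdr ipfw_phdrs[] = {
    { 1u,          0x000000, 0x00000000, 0x0e78c, 0x0e78c }, /* LOAD R E */
    { 1u,          0x00e78c, 0x0000f78c, 0x0069c, 0x007d0 }, /* LOAD RW  */
    { 2u,          0x00e78c, 0x0000f78c, 0x00078, 0x00078 }, /* DYNAMIC  */
    { 0x6474e551u, 0x000000, 0x00000000, 0x00000, 0x00000 }, /* GNU_STACK */
};
```

For ipfw.ko the function reports one misplaced segment: the second LOAD segment sits at file offset 0xe78c but belongs at relative address 0xf78c, a difference of 0x1000, and its MemSiz exceeds its FileSiz by 0x134 bytes.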

In some segments FileSiz may be smaller than MemSiz; these are almost always data segments that hold the static variables of the program. Some static variables are initialized with a value; the values of all initialized static variables are stored in this segment and thus counted in FileSiz. Other static variables are not initialized; they are mapped to the .bss section and are expected to have value 0 when loaded. Their values do not need to be stored in the file, but the variables do need memory at run time, so MemSiz has to be larger than FileSiz to reserve space for them.

For KLD modules built for the kernel, the data segment always has an Offset less than its VirtAddr, meaning that such a segment must be moved to a higher memory address (by 1 page in this case). Combined with the initialization of the .bss section, there is a risk of corrupting later segments or, even worse, later Multiboot modules.

First, this implementation assumes that any data segment that needs to be relocated is the last loadable segment of the given KLD module. Second, the end addresses of Multiboot modules are checked: a module that would require writing past the end of the current Multiboot module is skipped for KLD handling. Such a KLD module remains in memory but won't be functional, due to missing module metadata.
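
The two rules can be sketched as a single in-place fix-up routine. This is an illustration under stated assumptions rather than the actual implementation: relocate_last_segment is a hypothetical name, the module is assumed to be loaded wholesale at image, and capacity is taken to be mod_end - mod_start, so anything that would reach past it is refused.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Move the last loadable segment from its file offset to its relative
 * virtual address inside the module image, then zero the part that is
 * not backed by the file (.bss).
 * Returns 0 on success, or -1 if the segment would not fit within
 * capacity bytes, in which case nothing is modified.
 */
static int relocate_last_segment(unsigned char *image, size_t capacity,
        uint32_t offset, uint32_t vaddr, uint32_t filesz, uint32_t memsz) {
    if(memsz < filesz) return -1;
    /* Source bytes must lie inside the loaded file */
    if(offset > capacity || filesz > capacity - offset) return -1;
    /* Destination, including .bss, must not pass the end of the module */
    if(vaddr > capacity || memsz > capacity - vaddr) return -1;
    /* memmove because source and destination may overlap */
    memmove(image + vaddr, image + offset, filesz);
    memset(image + vaddr + filesz, 0, memsz - filesz);
    return 0;
}
```

A caller that gets -1 back would leave the module untouched and skip the KLD-specific metadata, which corresponds to the skipped case described above.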

Remember that these KLD modules are shared objects; they need to be linked into the kernel to function.
In order to link a KLD module, the KLD subsystem needs to know its .dynamic section, which is mapped by the DYNAMIC segment. The DYNAMIC segment is assumed to be included in another LOAD segment, usually the data segment; the relative start address of the DYNAMIC segment is stored in a new metadata node with type MODINFO_METADATA | MODINFOMD_DYNAMIC.

The BTX loader also examines the section headers of a KLD module, if any, to get more information about its symbol table. This is optional, because section headers are not needed to make a KLD module functional, but they provide useful symbol information that can be used in the in-kernel debugger DDB.
A KLD module could have its symbol table stripped to reduce file size; this step is skipped if the symbol table is not there.

The implementation of retrieving symbol information is mostly copied from the BTX loader. 2 more metadata nodes are added if such information is available from a particular KLD module: node types MODINFO_METADATA | MODINFOMD_SSYM and MODINFO_METADATA | MODINFOMD_ESYM store the start and end addresses of the symbol table. The addresses stored in the metadata nodes are physical addresses, and are expected to be converted to virtual addresses later, just like MODINFO_ADDR.

Final test

This test in QEMU loads many KLD modules and an initrd.

The QEMU command line is:

qemu-system-i386 -m 512 -kernel kernel -append "-i /init -vp" -initrd tmpfs.ko,msdosfs.ko,fuse.ko,initrd,ipl.ko,ipfw.ko,dummynet.ko,ipdivert.ko,if_tap.ko,netgraph.ko,vesa.ko,snp.ko,imgact_binmisc.ko,svr4.ko

The kernel boots in verbose mode this time, to show more information, including the preloaded files.
Option -p is used to pause after each console output line during early startup.

Final test in QEMU, starting up
Final test in QEMU, starting up
Final test in QEMU, test tmpfs
Final test in QEMU, test ipfw
Final test in QEMU, test ipfw and dummynet
Final test in QEMU, kldstat -v
Final test in QEMU, kldstat -v, scrolled up

References

