Copyright 2019-2021 Rivoreo
Licensed under Creative Commons Attribution-Sharealike 4.0 International.
As a FreeBSD kernel hacker, I know of only two bootloaders capable of booting the FreeBSD kernel on x86: one is the BTX loader(8), which is part of the FreeBSD project; the other is GRUB 2.
It would be very useful to make the FreeBSD kernel bootable with the Multiboot protocol, since Multiboot is supported by many x86 bootloaders, such as GRUB (both legacy and version 2), syslinux (via mboot.c32) and Linux kexec(2), as well as by QEMU direct kernel boot, which could make kernel debugging easier.
/boot/kernel/kernel
in FreeBSD..ko
./
by kernel; it could be a Multiboot module in Multiboot-compliant kernels.
/boot/loader
; it is the third stage of the FreeBSD boot process; this program is itself a BTX client linked with the BTX kernel, hence the name BTX loader; see the loader(8) man page in FreeBSD for more details.
This work is based on Making DragonFly BSD operating system compliant with the Multiboot specification by Radek Szymczyszyn. Since DragonFly BSD is a fork of FreeBSD 4, the kernel startup process is very similar between the two operating systems; this saved a lot of initial work in making kFreeBSD bootable by Multiboot bootloaders.
The final work has been released as a diff file for the FreeBSD base 10.3-RELEASE source tree. This diff file itself is released under the FreeBSD license; the full license text is:
Copyright 2019 Rivoreo
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
A FreeBSD 10.3-RELEASE-p29 amd64 operating system is used to develop the kernel; the target kernel version is also 10.3-RELEASE-p29, with many customizations.
The targeted kernel architecture is i386.
Testing is done on a QEMU i386 system emulator running on a Debian GNU/Linux 9 operating system.
There are 5 primary goals, or stages, in this project:
1. Add a Multiboot header, so that the kernel image is recognized as a Multiboot kernel.
2. Make the kernel bootable from a Multiboot bootloader.
3. Parse the command line passed from the Multiboot bootloader, to construct the kFreeBSD-specific boothowto variable and the kernel environment variables.
4. Use a Multiboot module passed from the bootloader as an initrd.
5. Use Multiboot modules passed from the bootloader as kFreeBSD KLD modules.
In addition to those goals, the kernel image should obviously remain compatible with the BTX loader; however, this is not possible with the BTX loader provided by version 10.3-RELEASE or later, due to a design issue in the loader.
Since 10.3-RELEASE, the BTX loader included in the distribution supports Multiboot, but only for Xen (the Xen image is a Multiboot-compliant kernel). The real issue is that the loader checks for a Multiboot kernel image first; if a Multiboot-compliant kernel is detected, it will try to load the kernel with the Multiboot protocol only. Because this Multiboot support in the BTX loader is designed for Xen only, it requires the first Multiboot module to be the original kFreeBSD image.
Trying to load a Multiboot kernel without loading any Multiboot modules results in the BTX loader complaining 'No FreeBSD kernel provided, aborting'. This happens right after goal 1 is achieved.
To work around this issue, I first modified the BTX loader to try the Multiboot format (for Xen) after the kFreeBSD format; but considering that Xen domain 0 support wasn't available in kernel version 10.3-RELEASE anyway, I later decided to remove Multiboot support from the loader completely. Alternatively, using an older version of the BTX loader also works.
This part of the work is very similar to what Radek Szymczyszyn did in the DragonFly BSD kernel; in fact, the issue of placing the header in the first 8 KiB of the image is the same for kFreeBSD and the DragonFly BSD kernel. (4.2.2)
The Multiboot header was added to the source file i386/i386/locore.s, almost entirely by copying existing code:
.section .mbheader
.align 4
#define MULTIBOOT_HEADER_FLAGS (MULTIBOOT_PAGE_ALIGN|MULTIBOOT_MEMORY_INFO)
multiboot_header:
.long MULTIBOOT_HEADER_MAGIC
.long MULTIBOOT_HEADER_FLAGS
.long -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)
Instead of being placed into .text, this header has its own section, .mbheader; as Radek Szymczyszyn says in 4.2.2:
"the .text section itself begins far in the file, after the program headers"
The section .mbheader is used to control the location of the Multiboot header in the final image, through the linker script.
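To see why the header's position in the image matters, the check a bootloader performs can be sketched in C as below. This is only an illustration under stated assumptions: the constants follow the Multiboot Specification (magic 0x1BADB002, header within the first 8192 bytes, 4-byte aligned, the three fields summing to zero), while the helper names read_le32 and find_multiboot_header are hypothetical and do not come from GRUB or from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_SEARCH        8192        /* header must lie within this many bytes */
#define MULTIBOOT_HEADER_MAGIC  0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN    0x00000001u
#define MULTIBOOT_MEMORY_INFO   0x00000002u

/* Read a little-endian 32-bit word, as stored in an i386 image. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return the file offset of a valid Multiboot header, or -1 if none is
 * found in the first MULTIBOOT_SEARCH bytes of the image.
 */
static long find_multiboot_header(const unsigned char *image, size_t len)
{
	size_t limit = len < MULTIBOOT_SEARCH ? len : MULTIBOOT_SEARCH;
	size_t off;

	for (off = 0; off + 12 <= limit; off += 4) {
		uint32_t magic = read_le32(image + off);
		uint32_t flags = read_le32(image + off + 4);
		uint32_t checksum = read_le32(image + off + 8);

		if (magic != MULTIBOOT_HEADER_MAGIC)
			continue;
		/* The three fields must sum to zero modulo 2^32. */
		if ((uint32_t)(magic + flags + checksum) == 0)
			return (long)off;
	}
	return -1;
}
```

A header that ends up beyond the 8192-byte limit is simply never found, which is why the .mbheader section has to be forced towards the start of the file.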
Unfortunately, GNU ld(1) doesn't provide a clear way to specify the location of a section; Radek Szymczyszyn actually asked a question on Stack Overflow about this issue.
In the end, I made this hack to the linker script:
diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/conf/ldscript.i386 b/sys/conf/ldscript.i386
--- a/sys/conf/ldscript.i386 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/conf/ldscript.i386 2019-05-21 15:02:11.235524344 +0800
@@ -7,7 +7,7 @@
{
/* Read-only sections, merged into text segment: */
. = kernbase + kernload + SIZEOF_HEADERS;
- .interp : { *(.interp) }
+ .mbheader : AT (ADDR(.mbheader) - kernbase) { *(.mbheader) *(.interp) }
.hash : { *(.hash) }
.gnu.hash : { *(.gnu.hash) }
.dynsym : { *(.dynsym) }
Unlike what Radek Szymczyszyn did in the linker script (4.2.2 and 4.2.5), I actually replaced the .interp section with .mbheader; the original *(.interp) stays in the new section so that this section remains in first place.
The interpreter string in the kFreeBSD image is set by an ld(1) option in the file conf/kern.pre.mk; this string is meaningless, and it is completely useless as it isn't used by any part of the FreeBSD source tree.
Along with the .interp section, the PT_INTERP segment is also removed by this modification; don't worry, it is useless too.
Since the .interp section no longer exists, I tried to remove the interpreter string from the final image; simply removing the ld(1) option --dynamic-linker doesn't work, because ld(1) then uses the default interpreter, /usr/libexec/ld-elf.so.1 or /usr/lib/libc.so.1, for emulation types elf_i386_fbsd or elf_i386 respectively. I finally set the interpreter to an empty string; after that it disappeared from the image completely:
diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/conf/kern.pre.mk b/sys/conf/kern.pre.mk
--- a/sys/conf/kern.pre.mk 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/conf/kern.pre.mk 2019-05-21 15:24:17.721373211 +0800
@@ -170,8 +170,8 @@
SYSTEM_OBJS= locore.o ${MDOBJS} ${OBJS}
SYSTEM_OBJS+= ${SYSTEM_CFILES:.c=.o}
SYSTEM_OBJS+= hack.So
-SYSTEM_LD= @${LD} -Bdynamic -T ${LDSCRIPT} ${LDFLAGS} --no-warn-mismatch \
- -warn-common -export-dynamic -dynamic-linker /red/herring \
+SYSTEM_LD= @${LD} --dy -T ${LDSCRIPT} ${LDFLAGS} --no-warn-mismatch \
+ --warn-common --export-dynamic --dynamic-linker "" \
-o ${.TARGET} -X ${SYSTEM_OBJS} vers.o
SYSTEM_LD_TAIL= @${OBJCOPY} --strip-symbol gcc2_compiled. ${.TARGET} ; \
${SIZE} ${.TARGET} ; chmod 755 ${.TARGET}
The resulting kernel image should now be recognized as a Multiboot kernel image. Testing with grub-file(1) shows:
$ grub-file --is-x86-kfreebsd kernel && echo Yes || echo No
Yes
$ grub-file --is-x86-multiboot kernel && echo Yes || echo No
Yes
When the FreeBSD BTX loader boots a kernel, it passes some additional information in struct bootinfo bootinfo and some other variables, most importantly boothowto; kFreeBSD collects this information in the function recover_bootinfo, defined in the source file i386/i386/locore.s.
This step must be skipped if the kernel is booted by a Multiboot bootloader, because Multiboot bootloaders are not aware of this FreeBSD-specific information, and the kernel will simply halt if such information isn't present.
A check for the Multiboot protocol is needed immediately after the kernel entry point. A Multiboot bootloader sets eax to a magic number to indicate that it is a Multiboot bootloader.
The startup code is modified as follows:
--- a/sys/i386/i386/locore.s 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/locore.s 2019-05-24 13:10:42.989381512 +0800
@@ -196,6 +223,17 @@
movw $0x1234,0x472
#endif /* PC98 */
+#ifdef MULTIBOOT
+ /* Are we booted up by a multiboot bootloader? */
+ cmpl $MULTIBOOT_BOOTLOADER_MAGIC, %eax
+ jne 1f
+ movl $0, R(bootinfo+BI_KERNEND)
+ movl %ebx, R(multiboot_env)
+ jmp 2f
+
+1:
+#endif
+
/* Set up a real frame in case the double return in newboot is executed. */
pushl %ebp
movl %esp, %ebp
@@ -232,6 +270,8 @@
call recover_bootinfo
+2:
+
/* Get onto a stack that we can trust. */
/*
* XXX this step is delayed in case recover_bootinfo needs to return via
As the code shows, if the Multiboot bootloader magic is not found, the kernel jumps to the original path to process the bootinfo passed by the BTX loader.
There is also a variable multiboot_env; this is used to access Multiboot-specific information, such as the memory information we requested via MULTIBOOT_MEMORY_INFO, and the kernel command line. Its address is passed by the bootloader in the ebx register; this address must be saved for later use (for goal 3 and later) before ebx gets overwritten.
At this point the kernel should be able to boot from a Multiboot bootloader; however, the kernel appeared to hang when I tried it with QEMU.
QEMU has a built-in GDB-compatible server for remote debugging with gdb(1). This is very useful for kernel debugging. To enable this server, simply add the option -s or -gdb tcp::<port>; if -s is used, it will listen(2) on TCP port 1234. See qemu-system(1) for more information.
With the help of gdb(1), I found that the kernel hit a panic shortly after boot; the panic occurs very early in the kernel's internal startup process, before local console initialization.
$ gdb kernel
GNU gdb (GDB) 7.11.1 [GDB v7.11.1 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...
Reading symbols from kernel...Reading symbols from / ... /kernel.symbols...done.
done.
(gdb) target remote x.x.x.x:1234
Remote debugging using x.x.x.x:1234
0x000cb218 in ?? ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xc091e8a4 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
489 EVENTHANDLER_INVOKE(shutdown_final, howto);
(gdb) bt
#0 0xc091e8a4 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
#1 0xc091ed87 in vpanic (fmt=0xc0eb8193 "double fault",
ap=0xc11c5e14 <dblfault_stack+4084> "q~\353\300")
at ../../../kern/kern_shutdown.c:889
#2 0xc091edc0 in panic (fmt=0xc0eb8193 "double fault")
at ../../../kern/kern_shutdown.c:818
#3 0xc0bc1481 in dblfault_handler () at ../../../i386/i386/trap.c:1052
#4 0x00000000 in ?? ()
Although this backtrace doesn't provide much useful information about where the issue occurred, the fact that panic(9) has been called indicates that paging was activated and that the IDT and GDT have been set up, at a point just before the console finishes its initialization. The problem should be sitting in the function init386, within this code:
https://svnweb.freebsd.org/base/releng/10.3/sys/i386/i386/machdep.c?revision=296373&view=markup#l3367
vm86_initialize();
getmemsize(first);
init_param2(physmem);
/* now running on new page tables, configured,and u/iom is accessible */
/*
* Initialize the console before we print anything out.
*/
cninit();
Further single-stepping through the nearby code shows that the kernel panic occurs in BIOS calls made from the function getmemsize. It appears the kernel is trying to detect the memory size from the BIOS.
When kFreeBSD is booted by the BTX loader, this information (called SMAP) has already been passed by the loader, as the loader has already done those BIOS calls. The kernel will try to call the BIOS by itself if the information isn't available from the bootloader.
This problem made me suspect that the memory detection code in the kernel is somehow faulty, because the same kernel panic occurs even when it is booted by the BTX loader, after I manually wiped out the SMAP information in bootinfo.
By manually skipping the memory detection code in gdb(1) and letting it fall back to RTC I/O for the memory size, the kernel could continue starting up normally. However, this limited the final detected memory size to 64 MiB, on a QEMU emulated machine with 512 MiB of memory.
There is an easy solution: the Multiboot header previously placed into the kernel has the flag MULTIBOOT_MEMORY_INFO set, which means the kernel is requesting memory information from the bootloader; the bootloader will then provide the base memory and extended memory sizes to the kernel.
However, according to the Multiboot Specification, this information is for reference only; the bootloader may or may not provide an accurate memory size, and it is up to the kernel whether to use this information.
For now, to avoid debugging the VM86 code that triggered the kernel panic, I decided to trust the memory size provided by the bootloader; at least it looks fine most of the time when testing an i386 kernel, compared to the memory size from the RTC.
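For reference, the two values the bootloader reports are counted in KiB: mem_lower is the memory below 1 MiB (at most 640 KiB according to the Multiboot Specification) and mem_upper is the memory starting at 1 MiB. A minimal sketch of interpreting them follows; both helper names are hypothetical, and the plausibility test is my own illustration rather than the check used in the patch.

```c
#include <stdint.h>

/* Highest address implied by mem_upper, in bytes. */
static uint64_t multiboot_top_of_memory(uint32_t mem_upper_kib)
{
	/* mem_upper counts KiB starting at the 1 MiB boundary. */
	return (UINT64_C(1024) + mem_upper_kib) * 1024;
}

/* Reject values that cannot describe a usable PC memory layout. */
static int multiboot_meminfo_plausible(uint32_t mem_lower_kib,
    uint32_t mem_upper_kib)
{
	return mem_lower_kib != 0 && mem_lower_kib <= 640 &&
	    mem_upper_kib != 0;
}
```

Since the specification does not guarantee that mem_upper covers all installed memory, a result smaller than the real size is still a valid answer from the bootloader.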
Saving the information passed from the bootloader needs a bit more code in i386/i386/locore.s:
--- a/sys/i386/i386/locore.s 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/locore.s 2019-05-24 13:10:42.989381512 +0800
@@ -126,6 +127,22 @@
.space 0x240
#endif
+#ifdef MULTIBOOT
+ .globl multiboot_env
+multiboot_env: .long 0
+ .globl multiboot_mem_lower, multiboot_mem_upper, multiboot_cmdline, multiboot_mods
+multiboot_mem_lower:
+ .long 0
+multiboot_mem_upper:
+ .long 0
...
@@ -239,6 +279,12 @@
* returns via the old frame.
*/
movl $R(tmpstk),%esp
+#ifdef MULTIBOOT
+ cmpl $MULTIBOOT_BOOTLOADER_MAGIC, %eax /* %eax remains same if we jumped here by label 2 */
+ jne 3f
+ call recover_multiboot_env
+3:
+#endif
#ifdef PC98
/* pc98_machine_type & M_EPSON_PC98 */
@@ -552,6 +598,69 @@
ret
+#ifdef MULTIBOOT
+/* https://www.gnu.org/software/grub/manual/multiboot/html_node/Boot-information-format.html */
+recover_multiboot_env:
+ movl R(multiboot_env), %eax
+ testl $MI_FLAG_MEMORY, (%eax)
+ jz 1f
+ /* Recover memory information */
+ movl MI_MEM_LOWER(%eax), %ecx
+ movl %ecx, R(multiboot_mem_lower)
+ movl MI_MEM_UPPER(%eax), %ecx
+ movl %ecx, R(multiboot_mem_upper)
+1:
...
Near the entry point btext, after a temporary stack has been set up, the magic number is checked again for Multiboot; the function recover_multiboot_env may then be called to save the Multiboot-specific information.
The kernel command line and Multiboot module information may also be available, and will be saved by this function if present; this is described in a later chapter.
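What recover_multiboot_env does in assembly can be expressed in C roughly as follows. This is a sketch, not the author's code: the structure layout and flag bits are those of the Multiboot Specification's boot information format, MI_FLAG_MEMORY is the name used in the patch, and MI_FLAG_CMDLINE, MI_FLAG_MODS, struct saved_multiboot and save_multiboot_info are names assumed here for illustration.

```c
#include <stdint.h>

/* Leading fields of the Multiboot boot information structure. */
struct multiboot_info {
	uint32_t flags;
	uint32_t mem_lower;	/* KiB below 1 MiB, valid if flags bit 0 is set */
	uint32_t mem_upper;	/* KiB above 1 MiB, valid if flags bit 0 is set */
	uint32_t boot_device;	/* valid if flags bit 1 is set */
	uint32_t cmdline;	/* physical address, valid if flags bit 2 is set */
	uint32_t mods_count;	/* valid if flags bit 3 is set */
	uint32_t mods_addr;	/* physical address, valid if flags bit 3 is set */
};

#define MI_FLAG_MEMORY	(1u << 0)
#define MI_FLAG_CMDLINE	(1u << 2)	/* name assumed for this sketch */
#define MI_FLAG_MODS	(1u << 3)	/* name assumed for this sketch */

/* Hypothetical container for the values the kernel keeps. */
struct saved_multiboot {
	uint32_t mem_lower, mem_upper;
	uint32_t cmdline;
	uint32_t mods_count, mods_addr;
};

/* Copy only the fields that the flags word declares valid. */
static void save_multiboot_info(const struct multiboot_info *mi,
    struct saved_multiboot *out)
{
	if (mi->flags & MI_FLAG_MEMORY) {
		out->mem_lower = mi->mem_lower;
		out->mem_upper = mi->mem_upper;
	}
	if (mi->flags & MI_FLAG_CMDLINE)
		out->cmdline = mi->cmdline;
	if (mi->flags & MI_FLAG_MODS) {
		out->mods_count = mi->mods_count;
		out->mods_addr = mi->mods_addr;
	}
}
```

Fields whose flag bit is clear are left untouched, so a zero-initialized destination keeps zero for anything the bootloader did not provide.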
Next, the function getmemsize is modified to make use of those memory sizes if they are available:
diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c
--- a/sys/i386/i386/machdep.c 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c 2019-05-21 11:18:28.491508546 +0800
@@ -2556,6 +2564,15 @@
goto have_smap;
}
+#ifdef MULTIBOOT
+ if(multiboot_env && multiboot_mem_lower && multiboot_mem_upper) {
+ basemem = multiboot_mem_lower;
+ basemem_setup();
+ extmem = multiboot_mem_upper;
+ goto skip_bios_calls;
+ }
+#endif
+
/*
* Some newer BIOSes have a broken INT 12H implementation
* which causes a kernel panic immediately. In this case, we
@@ -2644,6 +2662,9 @@
#endif
}
+#ifdef MULTIBOOT
+skip_bios_calls:
+#endif
/*
* Special hack for chipsets that still remap the 384k hole when
* there's 16MB of memory - this really confuses people that
The kernel is now functional with Multiboot, achieving goal 2.
The following screenshots show it booted with QEMU direct kernel loading, but asking for the root device because the kernel environment isn't available.
The Multiboot Specification provides a way to configure the kernel from the bootloader by passing it a command line. Unlike the arguments of usual user-space programs in UNIX, this command line is a single C string terminated by 0; kernels have to parse it into an argv-style array of strings manually if needed.
The FreeBSD BTX loader also supports some command line options that change the behavior of the loader itself or of kFreeBSD; when the meaning of a command line option needs to be passed to the kernel, the loader stores it in a bit-mask variable; when this variable is later passed to the kernel, it is stored into boothowto. The possible bits of boothowto are defined in sys/reboot.h; note that only some of the bits make sense to pass from a bootloader. Some useful bits are RB_SINGLE, to boot into single user mode by passing the option -s to init(8), and RB_VERBOSE, to turn verbose logging on by setting the bootverbose variable.
The options originally interpreted by the BTX loader must now be parsed by the kernel itself, and the result should be stored in the boothowto variable as well. This means that filling boothowto from the command line options should happen as early as possible; it is best to do that before any use of boothowto.
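As an illustration of turning option letters into boothowto bits, here is a minimal sketch in C. The RB_* values are believed to match sys/reboot.h but should be checked against the target source tree, the letters follow the usual loader conventions (-a, -s, -v), and the helper howto_from_option is hypothetical; it is not the author's parse_kernel_command_line.

```c
/* Values believed to match sys/reboot.h; verify against the target tree. */
#define RB_ASKNAME	0x001	/* ask for the root device */
#define RB_SINGLE	0x002	/* boot into single user mode */
#define RB_VERBOSE	0x800	/* verbose boot messages */

/* Fold one option string such as "-sv" into a boothowto-style bit mask. */
static int howto_from_option(const char *opt)
{
	int howto = 0;

	if (opt[0] != '-')
		return 0;
	for (opt++; *opt != '\0'; opt++) {
		switch (*opt) {
		case 'a':
			howto |= RB_ASKNAME;
			break;
		case 's':
			howto |= RB_SINGLE;
			break;
		case 'v':
			howto |= RB_VERBOSE;
			break;
		default:
			/* Unknown letters are ignored in this sketch. */
			break;
		}
	}
	return howto;
}
```

The caller would OR the result of every option token into boothowto before anything else reads that variable.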
Another mechanism the BTX loader uses to pass information to the kernel is the kernel environment. Just like the environment for user-space programs, the kernel environment is built up from individual environment variables, where an environment variable is a C string of the format <key>=<value>.
The kernel environment in kFreeBSD is an important mechanism for adjusting the kernel configuration at boot time; many environment variables are used to set the initial value of the corresponding sysctl variables. Some variables are read-only to user space, as they can only be set from the kernel environment, which in turn comes from the bootloader; they are called kernel tunables.
The most important variables for automatically booting the system are vfs.root.mountfrom*; without them the kernel will ask which device to mount as root, as if RB_ASKNAME were set in boothowto.
For those kernel tunables to take effect, the kernel environment must be initialized before any tunable is retrieved.
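The static kernel environment handed over at boot is a flat buffer of consecutive <key>=<value> C strings, terminated by an empty string. A lookup over such a buffer can be sketched as below; static_kenv_get is a hypothetical helper written for this illustration and is not a FreeBSD kernel function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up `name` in a static environment laid out as consecutive
 * "key=value" C strings, terminated by an empty string.
 * Returns a pointer to the value, or NULL if the key is absent.
 */
static const char *static_kenv_get(const char *env, const char *name)
{
	size_t namelen = strlen(name);
	const char *p;

	for (p = env; *p != '\0'; p += strlen(p) + 1) {
		/* The key must match completely and be followed by '='. */
		if (strncmp(p, name, namelen) == 0 && p[namelen] == '=')
			return p + namelen + 1;
	}
	return NULL;
}
```

Because a tunable is read by scanning this buffer, the buffer has to exist in its final form before the first tunable is fetched.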
At the beginning of the function init386, the kernel module information pointer and the kernel environment pointer are prepared, then init_param1 is called:
metadata_missing = 0;
if (bootinfo.bi_modulep) {
preload_metadata = (caddr_t)bootinfo.bi_modulep + KERNBASE;
preload_bootstrap_relocate(KERNBASE);
} else {
metadata_missing = 1;
}
if (bootinfo.bi_envp)
init_static_kenv((caddr_t)bootinfo.bi_envp + KERNBASE, 0);
else
init_static_kenv(NULL, 0);
/* Init basic tunables, hz etc */
init_param1();
This code shows that both preload_metadata and the kernel environment are retrieved from the bootinfo structure, which won't be available if the kernel was booted from a Multiboot bootloader.
It should also be pointed out that the function init_param1 is called immediately after the kernel environment is initialized, and the comment on this call indicates that some tunables are initialized there; this means there is no chance to initialize the kernel environment from the Multiboot kernel command line any later.
To parse the command line string, the first task is to break it into multiple strings at whitespace, forming an array of pointers, argv; this needs a buffer to copy those strings into, and some more space to store the temporary array.
However, at this early stage of kernel initialization, the dynamic memory allocation system is far from ready, which means malloc(9) cannot be used at all; in fact, when init386 is called, virtual memory isn't fully initialized until the function getmemsize is done.
To allocate memory here, one can use the only argument of init386, first; it is passed from the variable physfree in i386/i386/locore.s, which is believed to point to the physical memory just past the end of the current kernel image.
By calling pmap_kenter with the corresponding virtual and physical addresses for the first value, one page of that physical memory is mapped into virtual memory; first should then be advanced by one page size, since that page is now mapped and in use.
Here is an example, in the function init386:
pcpu_init(pc, 0, sizeof(struct pcpu));
for (pa = first; pa < first + DPCPU_SIZE; pa += PAGE_SIZE)
pmap_kenter(pa + KERNBASE, pa);
dpcpu_init((void *)(first + KERNBASE), 0);
first += DPCPU_SIZE;
PCPU_SET(prvspace, pc);
PCPU_SET(curthread, &thread0);
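The pattern in this example (map pages at first + KERNBASE, then advance first) amounts to a bump allocator. A standalone sketch is shown below; map_page is a stand-in for pmap_kenter that only counts calls so the sketch can run in user space, early_alloc is a hypothetical helper, KERNBASE is assumed to be the i386 default of 0xC0000000, and first is assumed to be page-aligned on entry.

```c
#include <stdint.h>

#define PAGE_SIZE	4096u
#define KERNBASE	0xC0000000u	/* i386 default; an assumption of this sketch */

/* Stand-in for pmap_kenter(va, pa); here it only counts the calls. */
static unsigned mapped_pages;

static void map_page(uint32_t va, uint32_t pa)
{
	(void)va;
	(void)pa;
	mapped_pages++;
}

/*
 * Take `size` bytes from the physical memory at *first, map the pages
 * at KERNBASE + pa, advance *first past them, and return the virtual
 * address of the allocation.
 */
static uint32_t early_alloc(uint32_t *first, uint32_t size)
{
	uint32_t pa = *first;
	uint32_t end = pa + ((size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
	uint32_t page;

	for (page = pa; page < end; page += PAGE_SIZE)
		map_page(page + KERNBASE, page);
	*first = end;
	return pa + KERNBASE;
}
```

Memory obtained this way is never freed; it simply becomes part of what the kernel considers in use once init386 finishes.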
Parsing the command line should use no more than one page of memory, because the maximum length of a command line is limited by the Multiboot Specification to 4 KiB, exactly the same as the common page size on i386.
The added code allocates one page and passes its virtual address to a new function, parse_kernel_command_line; this function parses the command line and initializes the kernel environment, so the later call to init_static_kenv is skipped in the Multiboot case.
Once the command line is fully parsed in parse_kernel_command_line, this allocated page holds the initial static kernel environment.
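The idea of building the static environment inside a caller-provided buffer, with no dynamic allocation, can be sketched as follows. This is not the author's parse_kernel_command_line: cmdline_to_static_env is a hypothetical, simplified helper that only copies whitespace-separated <key>=<value> tokens and skips option tokens starting with '-'.

```c
#include <stddef.h>

static int is_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Copy every whitespace-separated "key=value" token of cmdline into buf
 * as consecutive C strings, ending with an empty string. Returns the
 * number of variables stored, or -1 if buf is too small.
 */
static int cmdline_to_static_env(const char *cmdline, char *buf, size_t buflen)
{
	size_t used = 0;
	int count = 0;

	if (buflen == 0)
		return -1;
	while (*cmdline != '\0') {
		const char *start;
		size_t len, i;
		int has_eq = 0;

		while (is_space(*cmdline))
			cmdline++;
		start = cmdline;
		while (*cmdline != '\0' && !is_space(*cmdline)) {
			if (*cmdline == '=')
				has_eq = 1;
			cmdline++;
		}
		len = (size_t)(cmdline - start);
		/* Skip empty tokens, options, and words without '='. */
		if (len == 0 || !has_eq || start[0] == '-')
			continue;
		/* Room for the token, its NUL, and the final empty string. */
		if (used + len + 2 > buflen)
			return -1;
		for (i = 0; i < len; i++)
			buf[used + i] = start[i];
		buf[used + len] = '\0';
		used += len + 1;
		count++;
	}
	buf[used] = '\0';
	return count;
}
```

With a one-page buffer the result can never be longer than the command line it was built from, which is what makes a single page sufficient under the 4 KiB limit stated above.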
diff -ru --exclude-from freebsd-src-diff-exclude-names --new-file a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c
--- a/sys/i386/i386/machdep.c 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c 2019-05-22 11:31:03.982382400 +0800
@@ -3183,6 +3471,15 @@
pc98_init_dmac();
#endif
+#ifdef MULTIBOOT
+ if(multiboot_env) {
+ pmap_kenter(first + KERNBASE, first);
+ parse_kernel_command_line(multiboot_cmdline,
+ (char *)(first + KERNBASE), PAGE_SIZE);
+ first += PAGE_SIZE;
...
@@ -3191,10 +3488,18 @@
metadata_missing = 1;
}
- if (bootinfo.bi_envp)
- init_static_kenv((caddr_t)bootinfo.bi_envp + KERNBASE, 0);
- else
- init_static_kenv(NULL, 0);
+ /* Static environment is already initialized in
+ * parse_kernel_command_line if the kernel is loaded by a muiltboot
+ * boot loader. */
+#ifdef MULTIBOOT
+ if(!multiboot_env) {
+#endif
+ init_static_kenv(bootinfo.bi_envp ?
+ (caddr_t)bootinfo.bi_envp + KERNBASE : NULL,
+ 0);
+#ifdef MULTIBOOT
+ }
+#endif
/* Init basic tunables, hz etc */
init_param1();
The implementation of the function parse_kernel_command_line is added to the source file kern/init_main.c.
In addition to the options that can be translated into boothowto flags, 3 custom options were also implemented:
-e <env-var>
Set a kernel environment variable, exactly the same as giving a variable without an option.
-i <init-path>
Set the init(8) path, overriding the kernel environment variable init_path.
-M
Ignore the memory information passed from the Multiboot bootloader; use BIOS calls instead.
Apart from the Multiboot kernel image itself, Multiboot bootloaders may also load one or more files into memory for the kernel; these preloaded files are usually called modules in Multiboot.
There are 2 typical uses of preloaded files in kFreeBSD: a memory disk for the root file system (initrd), and KLD modules.
The FreeBSD BTX loader loads additional files as KLD modules by default; an explicit file type must be specified for any other type. To load an initrd, the type must be set to md_image or mfs_root.
Since it is not possible to specify such a type string from a Multiboot bootloader when loading a Multiboot module, the module type must be determined by the kernel itself.
An easy way to determine the module type is to check whether the file is an ELF file, because all KLD modules must be ELF files.
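A minimal version of that check only looks at the four ELF identification bytes at the start of the file. The sketch below assumes nothing beyond the ELF magic number; looks_like_elf and guess_module_type are hypothetical names, and a real implementation would go on to validate the rest of the ELF header, as described later in this chapter.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the buffer starts with the ELF magic: 0x7f 'E' 'L' 'F'. */
static int looks_like_elf(const void *image, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	return len >= sizeof magic && memcmp(image, magic, sizeof magic) == 0;
}

/* Choose the preloaded file type string from the file contents. */
static const char *guess_module_type(const void *image, size_t len)
{
	return looks_like_elf(image, len) ? "elf module" : "md_image";
}
```

Anything that is not ELF is treated as a memory disk image here, so a corrupted KLD module would be misclassified rather than rejected.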
All preloaded files loaded by the BTX loader have their metadata stored in a buffer pointed to by bootinfo.bi_modulep; the kernel image also has its own metadata stored there, with the file type elf kernel.
When preparing this metadata in a kernel booted by a Multiboot bootloader, the early memory allocation method is needed again; the metadata may consume more than one page of memory, and the total amount of memory required is not easy to know without examining each preloaded file.
The metadata buffer contains multiple 'nodes', each aligned to 4 bytes relative to the start of the metadata buffer; the format of a metadata node is as follows:
Each preloaded file needs at least 4 nodes to describe it: MODINFO_NAME, MODINFO_TYPE, MODINFO_ADDR and MODINFO_SIZE. The type MODINFO_NAME is special: it must be placed before all other nodes that describe a particular file, because it marks the start of a new preloaded file, and all nodes of other types that follow describe this file.
Initially, all addresses in this metadata buffer are physical addresses; they are converted to virtual addresses later, in the function preload_bootstrap_relocate.
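To make the node layout concrete, the sketch below appends nodes to a buffer under the following assumptions, which are mine and not taken from the patch: each node is a 32-bit type, a 32-bit payload size, then the payload padded to 4 bytes; the MODINFO_* values are believed to match sys/linker.h; addresses and sizes are 32-bit as on i386; and put_node and put_file are hypothetical helpers. Passing a null buffer only measures the space needed, which allows sizing the buffer before filling it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Node types; values believed to match sys/linker.h. */
#define MODINFO_NAME	0x0001
#define MODINFO_TYPE	0x0002
#define MODINFO_ADDR	0x0003
#define MODINFO_SIZE	0x0004

#define NODE_ALIGN	4u	/* node alignment stated in the text */

/*
 * Append one node (type, size, payload padded to NODE_ALIGN) at offset
 * `off`. With buf == NULL nothing is written, so the same call sequence
 * can be used to measure the space needed. Returns the new offset.
 */
static size_t put_node(unsigned char *buf, size_t off, uint32_t type,
    const void *data, uint32_t size)
{
	size_t padded = (size + NODE_ALIGN - 1) & ~(size_t)(NODE_ALIGN - 1);

	if (buf != NULL) {
		memcpy(buf + off, &type, sizeof type);
		memcpy(buf + off + 4, &size, sizeof size);
		memset(buf + off + 8, 0, padded);
		if (size != 0)
			memcpy(buf + off + 8, data, size);
	}
	return off + 8 + padded;
}

/* Describe one preloaded file with the four mandatory nodes. */
static size_t put_file(unsigned char *buf, size_t off, const char *name,
    const char *type, uint32_t addr, uint32_t size)
{
	off = put_node(buf, off, MODINFO_NAME, name, (uint32_t)strlen(name) + 1);
	off = put_node(buf, off, MODINFO_TYPE, type, (uint32_t)strlen(type) + 1);
	off = put_node(buf, off, MODINFO_ADDR, &addr, sizeof addr);
	off = put_node(buf, off, MODINFO_SIZE, &size, sizeof size);
	return off;
}
```

The sketch writes values in host byte order, which matches the target only when it is run on a little-endian machine.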
As previously stated, the first file that needs to be described is the kernel image itself; the nodes are constructed as follows:
MODINFO_NAME: kernelname
MODINFO_TYPE: elf kernel
MODINFO_ADDR: KERNLOAD
MODINFO_SIZE: computed from _end and KERNLOAD
The Multiboot module information is again retrieved in the function recover_multiboot_env.
The beginning of the function init386 was modified again, adding another call to initialize bootinfo.bi_modulep:
--- a/sys/i386/i386/machdep.c 2016-03-25 09:09:25.000000000 +0800
+++ b/sys/i386/i386/machdep.c 2019-05-22 11:31:03.982382400 +0800
@@ -3183,6 +3471,15 @@
pc98_init_dmac();
#endif
+#ifdef MULTIBOOT
+ if(multiboot_env) {
+ pmap_kenter(first + KERNBASE, first);
+ parse_kernel_command_line(multiboot_cmdline,
+ (char *)(first + KERNBASE), PAGE_SIZE);
+ first += PAGE_SIZE;
+ first += init_multiboot_modules(first);
+ }
+#endif
metadata_missing = 0;
if (bootinfo.bi_modulep) {
preload_metadata = (caddr_t)bootinfo.bi_modulep + KERNBASE;
Because of the flexibility the metadata buffer needs, the variable first is passed directly to the new function init_multiboot_modules, which allocates memory from it; the function returns the amount of memory it used, as a multiple of the page size, so that first can be advanced accordingly.
static int init_multiboot_modules(int addr) {
size_t module_len = fill_module_info_from_multiboot(NULL);
int page = addr;
do {
pmap_kenter(page + KERNBASE, page);
page += PAGE_SIZE;
} while(page < addr + module_len);
bootinfo.bi_modulep = addr;
fill_module_info_from_multiboot((caddr_t)(addr + KERNBASE));
return page - addr;
}
This function first calls fill_module_info_from_multiboot with a null pointer to get the required buffer size, then allocates such a buffer and calls fill_module_info_from_multiboot again with that buffer.
To retrieve the Multiboot module information, the new function fill_module_info_from_multiboot needs to access the previously saved multiboot_mods; multiboot_mods contains 2 fields, the module count and a pointer to the array of module information blocks:
extern struct {
uint32_t count;
struct multiboot_module *address;
} multiboot_mods;
Since the address pointer is stored as a physical address, it needs to be converted to a virtual address before use.
static size_t fill_module_info_from_multiboot(caddr_t addr) {
...
unsigned int i = 0;
...
while(i < multiboot_mods.count) {
struct multiboot_module *mbmod = (struct multiboot_module *)((caddr_t)(multiboot_mods.address + i) + KERNBASE);
...
i++;
}
...
Each Multiboot module is described by 3 values: a start address, an end address, and a pointer to an optional command line, as in the following C structure:
struct multiboot_module {
uint32_t mod_start;
uint32_t mod_end;
uint32_t cmdline;
uint32_t pad;
};
The field pad is reserved in the current specification, and should always be set to 0 by bootloaders.
Converting this Multiboot-specific information to kFreeBSD preloaded file metadata is done as follows.
MODINFO_NAME is set from the module command line, because many Multiboot bootloaders put the file name in the first part of the command line.
MODINFO_TYPE is determined by examining and validating the ELF header of the module, and is set to either elf module or md_image.
MODINFO_ARGS is also set if there is any additional command line after the module name.
MODINFO_ADDR and MODINFO_SIZE are set from mod_start and mod_end; the address is kept as a physical address for later conversion.
Before implementing the last (tricky) part, support for KLD modules, I decided to test the initrd first. By now the kernel should be able to use a Multiboot module as an initrd, without digging into the details of how a KLD module works.
A 32 MiB initrd with UFS2 was prepared to test the kernel; it contains the FreeBSD C library, other base libraries, sh(1), many other useful commands, and an /init shell script that starts sh(1) after opening /dev/console for stdin, stdout and stderr.
The command line to start QEMU is:
qemu-system-i386 -m 512 -kernel kernel -append "-i /init" -initrd initrd -s
The kernel option -i is a custom extension for setting kern.init_path, implemented in the previous chapter, Parsing the kernel command line.
The test shows a kernel panic right after the kernel component md(4) tries to access the initrd.
The good news is that the kernel has treated this module as a preloaded memory disk, and tried to handle it with md(4).
After changing the physical memory size, the test result becomes a little different:
This time the kernel panic occurred when it actually tried to mount the initrd, which means the initial work of md(4) for this initrd was done successfully. This indicates that the initrd was very likely corrupted during kernel startup.
Debugging in gdb(1) shows something suspicious:
(gdb) target remote x.x.x.x:1234
Remote debugging using x.x.x.x:1234
delay_tc (n=100000) at ../../../x86/isa/clock.c:285
285 u = func(tc) & mask;
(gdb) bt
#0 delay_tc (n=100000) at ../../../x86/isa/clock.c:285
#1 DELAY (n=100000) at ../../../x86/isa/clock.c:321
#2 0xc091e358 in shutdown_panic (junk=0x0, howto=260)
at ../../../kern/kern_shutdown.c:630
#3 0xc091e9ac in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:489
#4 0xc091eef3 in vpanic (
fmt=0xc0d36d0d "vm_fault: fault on nofault entry, addr: %lx",
ap=0xede96970 "") at ../../../kern/kern_shutdown.c:889
#5 0xc091ef2c in panic (
fmt=0xc0d36d0d "vm_fault: fault on nofault entry, addr: %lx")
at ../../../kern/kern_shutdown.c:818
#6 0xc0afc092 in vm_fault_hold (map=0xc17c1000, vaddr=3248734208,
fault_type=1 '\001', fault_flags=0, m_hold=0x0)
at ../../../vm/vm_fault.c:329
#7 0xc0afdb28 in vm_fault (map=0xc17c1000, vaddr=3248734208,
fault_type=1 '\001', fault_flags=0) at ../../../vm/vm_fault.c:273
#8 0xc0bc1146 in trap_pfault (frame=0xede96c08, usermode=0, eva=3248734208)
at ../../../i386/i386/trap.c:914
#9 0xc0bc169d in trap (frame=0xede96c08) at ../../../i386/i386/trap.c:532
#10 0xc0bb13b7 in calltrap () at ../../../i386/i386/exception.s:173
#11 0x00000008 in ?? ()
#12 0x00000028 in ?? ()
#13 0x00000028 in ?? ()
---Type <return> to continue, or q <return> to quit---
#14 0xe1814000 in ?? ()
#15 0xc06c5e54 in mdstart_preload (sc=0xc680b000, bp=0xc6a0d4c0)
at ../../../dev/md/md.c:799
#16 0xc06c5be9 in md_kthread (arg=0xc680b000) at ../../../dev/md/md.c:1147
#17 0xc08f769a in fork_exit (callout=0xc06c5a46 <md_kthread>, arg=0xc680b000,
frame=0xede96ce8) at ../../../kern/kern_fork.c:1027
#18 0xc0bb146c in fork_trampoline () at ../../../i386/i386/exception.s:288
(gdb) b fill_module_info_from_multiboot
Breakpoint 1 at 0xc0bb3ee4: file ../../../i386/i386/machdep.c, line 2914.
(gdb) c
Continuing.
Breakpoint 1, fill_module_info_from_multiboot (addr=addr@entry=0x0)
at ../../../i386/i386/machdep.c:2914
2914
(gdb) finish
Run till exit from #0 fill_module_info_from_multiboot (addr=addr@entry=0x0)
at ../../../i386/i386/machdep.c:2914
0xc0bb5d6b in init_multiboot_modules (addr=21135360)
at ../../../i386/i386/machdep.c:3083
3083 size_t module_len = fill_module_info_from_multiboot(NULL);
Value returned is $1 = 148
(gdb) c
Continuing.
Breakpoint 1, fill_module_info_from_multiboot (
addr=addr@entry=0xc1428000 "```P,_,#`P,#`P", '`' <repeats 11 times>, "9F9F9F9F?&!@P````'[T]/1T%!04%!0`\nM", '`' <repeats 13 times>, "!@8", '`' <repeats 13 times>, "&QL`'S&_L#`QGP``````,C(ZNW=VLC/\nMR,C(", '`' <repeats 12 times>, "?,;`^,#&?", '`' <repeats 11 times>, "V&PV-FS8``````````8&```&\nM!@8&!@8&QL9\\``!\\Q"...) at ../../../i386/i386/machdep.c:2914
2914
(gdb) bt
#0 fill_module_info_from_multiboot (
addr=addr@entry=0xc1428000 "```P,_,#`P,#`P", '`' <repeats 11 times>, "9F9F9F9F?&!@P````'[T]/1T%!04%!0`\nM", '`' <repeats 13 times>, "!@8", '`' <repeats 13 times>, "&QL`'S&_L#`QGP``````,C(ZNW=VLC/\nMR,C(", '`' <repeats 12 times>, "?,;`^,#&?", '`' <repeats 11 times>, "V&PV-FS8``````````8&```&\nM!@8&!@8&QL9\\``!\\Q"...) at ../../../i386/i386/machdep.c:2914
#1 0xc0bb5d9b in init_multiboot_modules (addr=21135360)
at ../../../i386/i386/machdep.c:3090
#2 init386 (first=21135360) at ../../../i386/i386/machdep.c:3387
#3 0xc04b05c0 in begin () at ../../../i386/i386/locore.s:353
(gdb) p multiboot_mods
$2 = {count = 1, address = 0x120b000}
(gdb) p/x *(struct multiboot_module *)0xc120b000
$3 = {mod_start = 0x120c000, mod_end = 0x320c000, cmdline = 0x120b010,
pad = 0x0}
(gdb) p/x first
$4 = 0x1428000
(gdb) p/x physfree
$5 = 0x1427000
The Multiboot module information structure shows that this module was loaded between physical addresses 0x120c000 and 0x320c000, but the value of first in function init386, which points to the first free page of physical memory, is 0x1428000 (physfree is 0x1427000), well inside that range. Since this 'free' memory is also used when parsing the kernel command line and storing the module metadata, preparing the metadata for the module corrupts the module itself!
Variable first is passed from begin in i386/i386/locore.s, via another variable physfree, which is in turn set from bootinfo.bi_kernend or, if that isn't available, from &_end, in function create_pagetables.
The BTX loader sets bootinfo.bi_kernend according to the ending location of the loaded kernel image and all modules. A Multiboot bootloader, of course, won't set it, so the kernel falls back to setting physfree from &_end and is simply unaware of any additional modules when it uses that memory.
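The overlap can be reproduced with plain arithmetic. The following is only a user-space sketch, not kernel code: the struct and both helper names are hypothetical, "not set" is assumed to mean a zero bi_kernend, and the numbers in the usage note come from the gdb session above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields this example needs. */
struct ex_module {
	uint32_t mod_start;	/* physical start address of the module */
	uint32_t mod_end;	/* physical end address of the module */
};

/*
 * Simplified version of the fallback described above: use bi_kernend
 * when the bootloader provided it (assumed non-zero), otherwise the
 * end of the kernel image.
 */
static uint32_t
ex_choose_physfree(uint32_t bi_kernend, uint32_t kernel_end)
{
	return bi_kernend != 0 ? bi_kernend : kernel_end;
}

/* True when the supposedly free address lies inside the module. */
static bool
ex_physfree_hits_module(uint32_t physfree, const struct ex_module *m)
{
	assert(m != NULL);
	return physfree >= m->mod_start && physfree < m->mod_end;
}
```

With the module at 0x120c000 to 0x320c000 and a kernel image ending at 0x1427000, ex_choose_physfree(0, 0x1427000) lands inside the module, while passing 0x320c000 as bi_kernend moves the free pointer past it.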
To get the correct end of the Multiboot modules, bootinfo.bi_kernend must be set if any such modules are present. This must be done before bootinfo.bi_kernend is used in create_pagetables, preferably in recover_multiboot_env.
recover_multiboot_env:
	movl	R(multiboot_env), %eax
	testl	$MI_FLAG_MEMORY, (%eax)
	...
6:
	testl	$MI_FLAG_MODS, (%eax)
	jz	7f
	movl	MI_MODS_ADDR(%eax), %ecx
	movl	%ecx, R(multiboot_mods+4)
	movl	MI_MODS_COUNT(%eax), %ecx
	movl	%ecx, R(multiboot_mods)
	/* Set bootinfo.bi_kernend from multiboot modules if any */
	testl	%ecx, %ecx
	jz	7f
	pushl	%ecx
	pushl	R(multiboot_mods+4)
	call	get_kernend_from_multiboot_mods
	addl	$8, %esp
7:
	ret
The function get_kernend_from_multiboot_mods is defined in i386/i386/machdep.c:
/* Called from locore.s, before paging */
void get_kernend_from_multiboot_mods(const struct multiboot_module *mbmod, size_t count) {
	struct bootinfo *bi = (struct bootinfo *)((char *)&bootinfo - KERNBASE);
	while(count > 0) {
		if(bi->bi_kernend < mbmod->mod_end) {
			bi->bi_kernend = roundup(mbmod->mod_end, PAGE_SIZE);
		}
		mbmod++;
		count--;
	}
}
Because virtual memory is not yet set up when this function is called, referencing any symbol first requires converting its virtual address to a physical address; this is done by manually subtracting KERNBASE, just like the R macro does in i386/i386/locore.s.
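Both the address conversion and the end-of-modules computation can be tried in user space. The sketch below is an illustration only: every ex_ name is hypothetical, KERNBASE is assumed to be 0xc0000000 (consistent with the addresses in the gdb session, where physical 0x120b000 is read at 0xc120b000), and the page size is assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_KERNBASE	0xc0000000u	/* assumed i386 KERNBASE */
#define EX_PAGE_SIZE	4096u		/* assumed page size */

/* Round x up to the next multiple of y (y > 0). */
#define EX_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical mirror of the module structure printed by gdb. */
struct ex_module {
	uint32_t mod_start;
	uint32_t mod_end;
	uint32_t cmdline;
	uint32_t pad;
};

/* Same idea as the R macro: link-time virtual address to physical. */
static uint32_t
ex_virt_to_phys(uint32_t va)
{
	assert(va >= EX_KERNBASE);
	return va - EX_KERNBASE;
}

/* Highest page-rounded module end, starting from the current value. */
static uint32_t
ex_kernend(uint32_t kernend, const struct ex_module *mods, size_t count)
{
	assert(count == 0 || mods != NULL);
	for (size_t i = 0; i < count; i++) {
		if (kernend < mods[i].mod_end)
			kernend = EX_ROUNDUP(mods[i].mod_end, EX_PAGE_SIZE);
	}
	return kernend;
}
```

For the single module from the gdb session (mod_end = 0x320c000) and an initially unset value of 0, ex_kernend returns 0x320c000, which is past the end of the initrd.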
The kernel now boots into the initrd without problems in QEMU.
The last part of this project is loading KLD modules, which is not as simple as putting the whole file somewhere in memory and telling the KLD subsystem where it is located and how big it is.
KLD modules of kFreeBSD are ELF shared object files on i386. Properly loading a KLD module requires the bootloader to load each loadable segment of the ELF file to its target virtual address, according to the ELF program headers.
For example, the KLD module ipfw.ko has the following program headers:
$ readelf --program-headers ipfw.ko
Elf file type is DYN (Shared object file)
Entry point 0x48b0
There are 4 program headers, starting at offset 52
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x0e78c 0x0e78c R E 0x1000
  LOAD           0x00e78c 0x0000f78c 0x0000f78c 0x0069c 0x007d0 RW  0x1000
  DYNAMIC        0x00e78c 0x0000f78c 0x0000f78c 0x00078 0x00078 RW  0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
Section to Segment mapping:
Segment Sections...
00 .hash .dynsym .dynstr .rel.dyn .text .rodata set_sysuninit_set set_sysinit_set set_modmetadata_set set_sysctl_set .eh_frame
01 .dynamic .data .bss
02 .dynamic
03
Offset is the offset of the segment within the ELF file. VirtAddr is the target address; it is relative because this file is a shared object. PhysAddr is ignored since KLD only cares about virtual addresses. FileSiz indicates the size of this segment as stored in the ELF file. MemSiz is the size that must be reserved in memory after loading this segment.
Only segments with the LOAD type (PT_LOAD) need to be loaded.
As the example above shows, not every loadable segment has its offset equal to its relative virtual address. This causes problems when the entire ELF file has been loaded as-is by a Multiboot bootloader; don't expect such a KLD module to work without fixing the load location.
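To make the mismatch concrete, the distance each segment has to move can be computed from the program headers alone. This is a minimal sketch with a hypothetical, reduced program header struct (a real loader would use Elf32_Phdr); the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PT_LOAD	1u	/* PT_LOAD as defined by the ELF specification */

/* Hypothetical subset of an ELF32 program header. */
struct ex_phdr {
	uint32_t p_type;
	uint32_t p_offset;
	uint32_t p_vaddr;
	uint32_t p_filesz;
	uint32_t p_memsz;
};

/* How far a segment sits from its target after a whole-file load. */
static uint32_t
ex_segment_shift(const struct ex_phdr *ph)
{
	assert(ph != NULL);
	return ph->p_vaddr - ph->p_offset;
}

/* Count the PT_LOAD segments that are not yet at their target address. */
static size_t
ex_count_misplaced(const struct ex_phdr *ph, size_t n)
{
	size_t misplaced = 0;

	assert(n == 0 || ph != NULL);
	for (size_t i = 0; i < n; i++) {
		if (ph[i].p_type == EX_PT_LOAD && ex_segment_shift(&ph[i]) != 0)
			misplaced++;
	}
	return misplaced;
}
```

Fed with the ipfw.ko headers listed above, the first LOAD segment has a shift of 0 and the second a shift of 0x1000, so exactly one segment needs to be moved.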
In some segments, FileSiz may be smaller than MemSiz; those are almost always the data segments that hold the program's static variables. Some static variables are initialized with a value; the values of all initialized static variables are stored in this segment and are thus counted in FileSiz. Other static variables are not initialized; these are mapped to the .bss section and are expected to have the value 0 when loaded. Their values do not need to be stored in the file, but the variables do need memory at run time, so MemSiz has to be larger than FileSiz to reserve space for them.
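In code, placing one segment therefore takes two steps: copy FileSiz bytes, then zero the remaining MemSiz - FileSiz bytes. For the data segment of ipfw.ko that tail is 0x7d0 - 0x69c = 0x134 bytes. The function below is a hedged sketch rather than the patch's actual code; its name and the assumption that the buffer is large enough for vaddr + memsz are mine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * image points at a whole ELF file that is already in memory; the
 * buffer is assumed to be at least vaddr + memsz bytes long.
 */
static void
ex_place_segment(unsigned char *image, uint32_t offset, uint32_t vaddr,
    uint32_t filesz, uint32_t memsz)
{
	assert(image != NULL && filesz <= memsz);
	/* memmove, not memcpy: source and destination may overlap. */
	memmove(image + vaddr, image + offset, filesz);
	/* The tail beyond FileSiz is .bss and must read as zero. */
	memset(image + vaddr + filesz, 0, memsz - filesz);
}
```

Using memmove matters here because a segment is typically shifted by a single page, which is far less than its size for anything but tiny segments.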
For the KLD modules built for the kernel, the data segment always has an Offset less than its VirtAddr, meaning such a segment must be moved to a higher memory address (by 1 page in this case). Combined with the initialization of the .bss section, there is a risk of corrupting later segments or, even worse, later Multiboot modules.
This implementation deals with that risk in two ways. First, it assumes that any data segment that needs to be relocated is the last loadable segment of the given KLD module. Second, it checks the ending addresses of Multiboot modules: a module that would need to write past the end of its own Multiboot module is skipped for KLD handling; such a KLD module remains in memory but won't be functional due to the missing module metadata.
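The second check boils down to one comparison per segment. The sketch below assumes that mod_end is an exclusive end address and uses hypothetical names throughout; it is meant to show the shape of the test, not the exact code of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields this example needs. */
struct ex_module {
	uint32_t mod_start;	/* physical start address */
	uint32_t mod_end;	/* physical end address, assumed exclusive */
};

/*
 * True when a segment placed at mod_start + vaddr, including its
 * zero-filled tail, still ends inside the module's own memory.
 */
static bool
ex_segment_fits(const struct ex_module *m, uint32_t vaddr, uint32_t memsz)
{
	uint64_t end;

	assert(m != NULL);
	/* 64-bit arithmetic so the sum cannot wrap around. */
	end = (uint64_t)m->mod_start + vaddr + memsz;
	return m->mod_start <= m->mod_end && end <= m->mod_end;
}
```

For the ipfw.ko data segment the relocated end is 0xf78c + 0x7d0 = 0xff5c bytes past the module start, so the module needs at least that much room; a loader following the rule above would skip KLD handling when the function returns false.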
Remember that these KLD modules are shared objects; they need to be linked into the kernel to function.
In order to link a KLD module, the KLD subsystem needs to know its .dynamic section, which is mapped by the DYNAMIC segment. The DYNAMIC segment is assumed to be contained in another LOAD segment, usually the data segment; the relative start address of the DYNAMIC segment is stored in a new metadata node with type MODINFO_METADATA | MODINFOMD_DYNAMIC.
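The containment assumption can be verified before the address is recorded. The following sketch uses a hypothetical, reduced program header struct and a made-up function name; it returns the index of the LOAD segment that covers the DYNAMIC segment, or -1 if there is none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PT_LOAD	1u	/* PT_LOAD as defined by the ELF specification */
#define EX_PT_DYNAMIC	2u	/* PT_DYNAMIC as defined by the ELF specification */

/* Hypothetical subset of an ELF32 program header. */
struct ex_phdr {
	uint32_t p_type;
	uint32_t p_offset;
	uint32_t p_vaddr;
	uint32_t p_filesz;
	uint32_t p_memsz;
};

/* Index of the PT_LOAD segment containing PT_DYNAMIC, or -1. */
static int
ex_find_dynamic_load(const struct ex_phdr *ph, size_t n)
{
	const struct ex_phdr *dyn = NULL;

	assert(n == 0 || ph != NULL);
	for (size_t i = 0; i < n && dyn == NULL; i++) {
		if (ph[i].p_type == EX_PT_DYNAMIC)
			dyn = &ph[i];
	}
	if (dyn == NULL)
		return -1;
	for (size_t i = 0; i < n; i++) {
		uint64_t load_end = (uint64_t)ph[i].p_vaddr + ph[i].p_memsz;
		uint64_t dyn_end = (uint64_t)dyn->p_vaddr + dyn->p_memsz;

		if (ph[i].p_type == EX_PT_LOAD &&
		    dyn->p_vaddr >= ph[i].p_vaddr && dyn_end <= load_end)
			return (int)i;
	}
	return -1;
}
```

For ipfw.ko the DYNAMIC segment (0xf78c, 0x78 bytes) lies at the very start of the second LOAD segment, so the function returns 1 and 0xf78c is the relative address that would go into the metadata node.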
The BTX loader also examines the section headers of a KLD module, if any, to get more information about its symbol table. This is optional because section headers are not needed to make a KLD module functional, but they provide useful symbol information that can be used by the in-kernel debugger DDB.
A KLD module may have its symbol table stripped to reduce file size; this step is skipped if the symbol table is not there.
The implementation for retrieving symbol information is mostly copied from the BTX loader. Two more metadata nodes are added if such information is available from a particular KLD module: nodes of types MODINFO_METADATA | MODINFOMD_SSYM and MODINFO_METADATA | MODINFOMD_ESYM store the start and end addresses of the symbol table. The addresses stored in these metadata nodes are physical addresses and are expected to be converted to virtual addresses later, just like MODINFO_ADDR.
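As a rough illustration of what adding such a node involves, the sketch below appends one record to a byte buffer. It assumes the preload metadata layout of a 32-bit type, a 32-bit size and a payload padded to 4 bytes on i386, and the numeric constants are my recollection of sys/linker.h; both the layout and the values should be checked against the target source tree. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; verify against sys/linker.h in the target tree. */
#define EX_MODINFO_METADATA	0x8000u
#define EX_MODINFOMD_SSYM	0x0003u
#define EX_MODINFOMD_ESYM	0x0004u

/*
 * Append one record (type, size, payload) to buf at *off, padding the
 * payload to a 4-byte boundary. Returns 0 on success, or -1 if the
 * record does not fit into the remaining space.
 */
static int
ex_put_record(unsigned char *buf, size_t cap, size_t *off,
    uint32_t type, const void *data, uint32_t size)
{
	size_t padded = ((size_t)size + 3u) & ~(size_t)3u;

	assert(buf != NULL && off != NULL && (size == 0 || data != NULL));
	if (*off > cap || cap - *off < 8u + padded)
		return -1;
	memcpy(buf + *off, &type, 4);
	memcpy(buf + *off + 4, &size, 4);
	memset(buf + *off + 8, 0, padded);
	if (size != 0)
		memcpy(buf + *off + 8, data, size);
	*off += 8u + padded;
	return 0;
}
```

A caller would invoke it once with EX_MODINFO_METADATA | EX_MODINFOMD_SSYM and once with EX_MODINFO_METADATA | EX_MODINFOMD_ESYM, each time passing a 4-byte physical address as the payload.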
This test in QEMU tries to load many KLD modules and an initrd.
The QEMU command line is:
qemu-system-i386 -m 512 -kernel kernel -append "-i /init -vp" -initrd tmpfs.ko,msdosfs.ko,fuse.ko,initrd,ipl.ko,ipfw.ko,dummynet.ko,ipdivert.ko,if_tap.ko,netgraph.ko,vesa.ko,snp.ko,imgact_binmisc.ko,svr4.ko
The kernel boots in verbose mode this time, to show more information, including the preloaded files.
Option -p is used to pause after each console output line during the early startup process.