I ran into a problem with gcc-6.3 allocating registers for some asm code:
it re-used a register, which resulted in the code not working.
Before submitting a bug report to gcc, I compiled the latest version, 8.2,
and tried that. Unfortunately, that version insists on inserting a
prologue into my _start routine, which means it will not work. The same
problem occurred with 6.3 at first, but with the optimisation
level -Os, the prologue and epilogue are not generated....
To debug my bare metal code, I found that I could use qemu with the following command, but only with a simple modification to qemu.
qemu-system-aarch64 -m 1024 -cpu cortex-a53 -machine raspi3 -smp 4 -bios kernel8.img
Without modifying qemu, my code, the "bios", gets loaded at 0x80000; there's no option to change this, but editing the definition of FIRMWARE_ADDR_3 in qemu-3.0.0/hw/arm/raspi.c and re-compiling works....
To get my kernel to load and be run by the Cortex-A53 processors at reset, the config.txt file contains this single line:
kernel_old=1
And my code is in the file kernel8.img.
The cmdline.txt file is read by the GPU firmware (start.elf) and, with some other tags, overwrites memory, including some of the contents of kernel8.img, from 0x100....
I have just isolated possibly the most confusing bug I have ever created! It is possibly the most confusing bug ever created!
The initial block of Isembard code run via u-boot on the beagleboard (all hand crafted ARM code) consists of the kernel, followed by a list of initial drivers.
The kernel consists of 128 bytes of relocation code that copies the rest of the kernel and the drivers to sensible places in physical RAM, and 7216 bytes of core code and data. The drivers are currently all less than 1000 bytes each and at this point there are only three. The total size of the file loaded into memory at boot is 8572 bytes. The final driver opens up the serial port in order to download other code....
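As a quick check on those figures, here is a small C sketch of the arithmetic. It assumes the boot file holds nothing but the relocation code, the core kernel and the three drivers (no headers or padding), which the post does not state; the constant and function names are made up for illustration.

```c
#include <assert.h>

/* Sizes quoted in the post, in bytes. */
enum {
    RELOC_BYTES = 128,   /* relocation code */
    CORE_BYTES  = 7216,  /* core code and data */
    TOTAL_BYTES = 8572,  /* whole file loaded at boot */
    NUM_DRIVERS = 3
};

/* Bytes left over for the drivers, ASSUMING the file contains
   only the kernel followed by the drivers. */
unsigned driver_bytes(void)
{
    assert(TOTAL_BYTES >= RELOC_BYTES + CORE_BYTES);
    return TOTAL_BYTES - (RELOC_BYTES + CORE_BYTES);
}
```

Under that assumption the three drivers share 1228 bytes, a little over 400 bytes each on average, which fits with each being under 1000 bytes.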
I'm going to write resource allocation drivers for muxed pins (physical omap35x pins which can be connected to a range of internal features).
Modules in the omap SoC can be powered down, and their clocks switched off when they are not needed. The drivers will power up the required modules, and enable any required pins, as features are requested by programs; they will power them down again if all the users of the module are released.
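The power-up-on-first-request, power-down-on-last-release behaviour could be sketched with a per-module reference count. This is only an illustration in C, not the Isembard implementation: the names module_t, module_request and module_release are hypothetical, and the powered flag stands in for the real writes to the power, clock and pin-mux registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one SoC module. */
typedef struct {
    unsigned users;   /* number of programs currently using the module */
    bool     powered; /* stand-in for the real power/clock register state */
} module_t;

/* Called when a program requests a feature provided by the module. */
void module_request(module_t *m)
{
    if (m->users == 0) {
        m->powered = true;   /* first user: power up, enable clocks and pins */
    }
    m->users++;
}

/* Called when a program releases the feature. */
void module_release(module_t *m)
{
    assert(m->users > 0);    /* a release without a matching request is a bug */
    m->users--;
    if (m->users == 0) {
        m->powered = false;  /* last user gone: power the module down again */
    }
}
```

With this shape, two requests followed by one release leave the module powered, and only the second release switches it off.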
After making my previous blog entry, I realised that 250k system calls per second wasn't really acceptable, so I set out to find any bottlenecks in the process.
My biggest worry was that using the data abort mechanism to enter the kernel was flawed, and that the traditional SWI mechanism was significantly faster.
What I did find was that reading the abort address from coprocessor register 15 is extremely slow, taking the equivalent of over 30 normal instructions. I had assumed (hoped) that it would be almost like reading a processor register or, at worst, like reading a word from memory. It has to be read, however, so I modified the kernel to store the value in memory and retrieve it as necessary....
Now that I've got two programs running in separate maps, one a simple UART (serial port) driver that uses busy-waiting and the other a program that uses the service of the driver, I thought I'd try to work out how long it takes to make a simple inter-map call. Since there are no timers running yet, the simplest way of timing the calls is to do lots of them in a loop, output a character after every million or so, and time the frequency of the output characters....
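Turning the observed character rate into a call rate is a single division. The C sketch below shows it; the function name is hypothetical, and the interval between characters is assumed to be timed externally (for example with a stopwatch on the serial terminal), since the target has no timers running.

```c
#include <assert.h>
#include <stdint.h>

/* Calls per second, given how many calls are made per output character
   and the measured interval between characters in milliseconds. */
uint64_t calls_per_second(uint64_t calls_per_char, uint64_t interval_ms)
{
    assert(interval_ms > 0);
    return calls_per_char * 1000u / interval_ms;
}
```

As a made-up example, a million calls per character with characters arriving 4 seconds apart gives calls_per_second(1000000, 4000), which is 250000 calls per second, or 4 microseconds per call including the loop overhead.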