When the legacy bootloader doesn't completely fit above segment 9000h, it is hardcoded to load at either segment 8800h or 8000h (TC__BOOT_LOADER_SEGMENT_LOW), depending on TC__BOOT_MEMORY_REQUIRED, which in turn depends on the chosen encryption cipher. The problem is that the load segment is hardcoded based on the chosen cipher without taking into account the memory actually available on the system. This leads to an interesting problem:
#define TC__BOOT_LOADER_SEGMENT TC_HEX (9000)
#if TC__BOOT_MEMORY_REQUIRED <= 32 //(kilobytes)
# define TC__BOOT_LOADER_SEGMENT_LOW (TC__BOOT_LOADER_SEGMENT - 32 * 1024 / 16)
#else
# define TC__BOOT_LOADER_SEGMENT_LOW (TC__BOOT_LOADER_SEGMENT - 64 * 1024 / 16)
#endif
That is the exact problematic code. For AES, 29K is needed. For Twofish, 41K is needed. Consider a system that has 568K free (highest reserved segment is 8e00h). If I select AES as my cipher, VC will try to load at segment 8800h because 29 <= 32. It would need to reserve segments 8800h - 8f3fh. This overlaps the already reserved area starting at 8e00h, so the boot would fail with "BIOS reserved too much RAM: 568".
Now, if I select Twofish as my cipher, VC will try to load at segment 8000h because 41 > 32. It would need to reserve segments 8000h - 8a3fh. 8a3fh is less than 8e00h so the machine would boot without problem.
As you can see, on such a machine, using a cipher with a heavier memory footprint would work while using one with a smaller memory footprint would, surprisingly, fail. The solution would be to properly determine TC__BOOT_LOADER_SEGMENT_LOW in the 1st stage (or a 2nd stage) bootloader.
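The arithmetic behind the two cases can be checked with a short C sketch. It is only a model of the reasoning in this post, not code from the bootloader: the function names are hypothetical, and the "fits" test simply compares the last paragraph the loader would occupy against the first reserved paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 9000h base segment discussed above. */
#define LOADER_SEGMENT 0x9000u

/* Same rule as the compile-time macro: step down 32K or 64K from 9000h. */
static uint16_t static_low_segment(unsigned required_kb)
{
    unsigned step_kb = (required_kb <= 32) ? 32 : 64;
    return (uint16_t)(LOADER_SEGMENT - step_kb * 1024u / 16u);
}

/* Last 16-byte paragraph occupied by required_kb loaded at load_seg. */
static uint16_t last_segment(uint16_t load_seg, unsigned required_kb)
{
    return (uint16_t)(load_seg + required_kb * 1024u / 16u - 1u);
}

/* First reserved paragraph on a machine with free_kb of base memory. */
static uint16_t first_reserved_segment(unsigned free_kb)
{
    return (uint16_t)(free_kb * 1024u / 16u);
}

/* Simplified model: the loader fits if it ends below the reserved area. */
static bool fits(unsigned required_kb, unsigned free_kb)
{
    uint16_t seg = static_low_segment(required_kb);
    return last_segment(seg, required_kb) < first_reserved_segment(free_kb);
}
```

With 568K free, fits(29, 568) is false (8800h - 8f3fh runs into 8e00h) while fits(41, 568) is true (8000h - 8a3fh stays below it), which is the surprising result described above.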
It would also be nice to have the NT driver look for the BootArguments in more than the 3 hardcoded memory locations. Currently, up to 31K of low memory can be wasted because BootArguments have to be aligned to a 32K boundary.
Here's the solution. Replaces a block in bootsector.asm.

Thank you for reporting this and for the interesting analysis.
MBR bootloader logic for memory handling has always been a nightmare and
a great deal of time was already spent to make it work on most
configurations. But as you found out, there are many cases where the
implementation doesn't work.
The size limit of the bootloader's 1st stage makes it almost impossible to
implement foolproof logic there, and the second stage is too late for it.
The issue is that we have different bootloaders with different memory
requirements and at the same time we want to be compatible with as many
hardware configurations as possible.
I personally stopped spending time on the MBR bootloader for lack of
time: EFI takes all my resources, since it has a larger user base and
enables more advanced features that are on the VeraCrypt roadmap
(like smart card support). I welcome any help or contribution on the
MBR bootloader that would enhance its compatibility.
Concerning your specific case, I don't see an easy solution right now.
Don't hesitate if you have a proposal.
For the BootArguments memory locations, supporting more memory locations
can make the driver startup slower in some cases. Nevertheless, I will
explore this proposal since it will allow more flexibility for those
building customized bootloader (EFI or MBR).
Thank you for your reply.
Replacing lines 44 - 62 of BootSector.asm with the code I posted above solves this issue, as far as I can see. It effectively obsoletes the TC_BOOT_LOADER_SEGMENT_LOW define and calculates it, with the same math, dynamically in the 1st stage loader instead. That define is only used in that one place in BootSector.asm, so this shouldn't break anything else.

By the way, plenty of space can be reclaimed in the first stage by getting rid of the decompressor (2nd) stage and jumping straight into the UPX-compressed final stage. The decompressor stage literally wastes 8 sectors of boot area and almost 100 bytes of the first stage just to shave ~10 bytes off the final stage.
Last edit: neos6464 2018-11-07
Sorry for missing the code you posted: I was answering using email and
somehow Sourceforge didn't send me your reply containing the code.
Thank you for the code, it's a clever implementation. However, I have a
doubt about the execution path when the memory reported by the BIOS is
lower than TC_BOOT_MEMORY_REQUIRED: in this case, the sub instruction will
give a negative result and we will certainly end up setting ax to
TC_BOOT_LOADER_SEGMENT instead of TC_BOOT_LOADER_LOWMEM_SEGMENT.
I think you should add a line "jc mem_toolow" after "sub ax,
TC_BOOT_MEMORY_REQUIRED" and the generated assembly will still be lower
than the maximum size 0x190 (TC_BOOT_SECTOR_PIM_VALUE_OFFSET). Do you agree?
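To illustrate why the carry check matters, here is a rough C model of a dynamic segment calculation. It is not the assembly posted in this thread: the function names, the rounding down to a 32K boundary and the cap at 9000h are assumptions made for the sketch. sub16 mimics a 16-bit "sub" and reports the borrow that "jc mem_toolow" would act on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the preferred 9000h load segment. */
#define LOADER_SEGMENT 0x9000u

/* Model of a 16-bit `sub`: wrapped result plus the carry (borrow) flag. */
static uint16_t sub16(uint16_t a, uint16_t b, bool *carry)
{
    *carry = a < b;
    return (uint16_t)(a - b);
}

/* Pick a load segment from the BIOS base memory size (in KB).
   Returns false on the "mem_toolow" path, i.e. when the subtraction borrows. */
static bool pick_load_segment(uint16_t bios_kb, uint16_t required_kb,
                              uint16_t *segment)
{
    bool carry;
    uint16_t start_kb = sub16(bios_kb, required_kb, &carry);
    if (carry)
        return false;               /* equivalent of: jc mem_toolow */

    start_kb &= (uint16_t)~31u;     /* assumption: round down to a 32K boundary */
    unsigned seg = start_kb * 64u;  /* KB to paragraphs: 1024 / 16 */
    if (seg > LOADER_SEGMENT)
        seg = LOADER_SEGMENT;       /* assumption: never load above 9000h */

    *segment = (uint16_t)seg;
    return true;
}
```

In this model, a machine reporting 20K with 29K required makes the subtraction wrap to 65527. Without the carry check, that value would pass through the cap and select 9000h, which matches the concern raised above.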
Concerning removing the decompressor, this was my intent when I first
introduced UPX usage but I did not do it because the decompressor offers
the advantage of being able to detect corruption in compressed data and
returning the decompression status to the first stage before executing
the UPX compressed code. This feature was important during the security
audit that was performed although it is not 100% foolproof. So, if we
remove the decompressor, we have to come up with an alternative that
would validate the compressed code before execution. There is a basic
checksum done in bootsector but we need a better one. Do you have any
proposal?
Good catch, I agree. I didn't think to cover this case, since with so little free low memory you won't be able to start Windows anyway.
As for the checksum, if the current one is not enough to reliably detect corruption, perhaps it can be replaced with CRC32? There should be enough space after removing the decompressor logic. Neither the decompressor nor a better checksum method prevents an attacker from executing arbitrary code, so is the issue just that the current checksum is not reliable enough to detect unintentional corruption?
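For reference, a minimal table-free CRC32 looks like the C sketch below. It uses the common reflected polynomial EDB88320h; which CRC variant and which byte range the bootsector would actually use are open questions, and a real first-stage version would have to be written in 16-bit assembly. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR
   0xFFFFFFFF). Slow but tiny, since it needs no 1 KB lookup table. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right one bit; fold in the polynomial if a 1 fell out. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}
```

The standard check value for this variant is CBF43926h for the ASCII string "123456789", which is a quick way to validate an assembly port.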