When running a VirtualBox virtual machine from a LiveCD with rEFInd (it doesn't matter whether it is the
official refind_cd_0.11.3.iso or a custom Linux distribution using rEFInd as its boot manager), a
missing-icon bug occurs.
Description: In the case of refind_cd_0.11.3.iso the bug is easily reproducible and has a "toggling" symptom:
on the first boot some icons are missing and are replaced with black-and-yellow striped squares.
After the timeout is hit and the "Boot Fallback" boot loader is run, rEFInd rescans boot loaders and ALL icons
appear on screen.
One more rescan, triggered by running the "Boot Fallback" boot loader again, makes them disappear, and this can go on
ad infinitum.
How to reproduce:
1. Install VirtualBox for Linux (5.2.16 r123745 in my case, but earlier versions such as 5.1 and the latest development version also reproduce it)
2. Create a Linux machine with EFI enabled
3. Attach refind_cd_0.11.3.iso as the only boot medium
4. Run the machine
5. Wait until the rEFInd timeout expires with the "Boot Fallback boot loader" option selected, or run that option manually
Result: the icons of some boot options and tools alternately appear and disappear
PS: this bug CANNOT be reproduced on real hardware or in a QEMU/KVM virtual machine.
--
Best regards,
Nikolai Kostrigin
ALT Linux Team
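For anyone who wants to script the reproduction steps above, here is a minimal Python sketch that only assembles the `VBoxManage` command lines. The VM name, the "IDE" controller name, and the `Linux_64` OS type are assumptions and are not taken from the report. Nothing is executed unless `run_all` is called explicitly.

```python
import subprocess


def build_repro_commands(iso_path, vm_name="refind-test"):
    """Return VBoxManage argument lists that mirror the manual steps.

    vm_name, the "IDE" controller name and the Linux_64 OS type are
    illustrative assumptions, not values from the original report.
    """
    return [
        # Step 2: create and register a Linux VM, then switch it to EFI.
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", "Linux_64", "--register"],
        ["VBoxManage", "modifyvm", vm_name, "--firmware", "efi"],
        # Step 3: attach the ISO as the only boot medium.
        ["VBoxManage", "storagectl", vm_name, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive",
         "--medium", iso_path],
        # Step 4: start the machine.
        ["VBoxManage", "startvm", vm_name],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for argv in build_repro_commands("refind_cd_0.11.3.iso"):
        print(" ".join(argv))
```

Step 5 (waiting for the timeout or selecting "Boot Fallback boot loader") still has to be done by hand in the VM window.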
I've just poked around with this, and the problem seems to be related to the ISO-9660 filesystem driver in VirtualBox. Here's why:
Unlike most computers' EFIs, VirtualBox's EFI includes an ISO-9660 filesystem driver. Most computers boot via an El Torito image, which itself holds a FAT filesystem. When I boot rEFInd with VirtualBox on my system, I get the ugly display you describe, and the last boot entry (on the far right of the display) is identified as "Boot Fallback boot loader from ElTorito." This is the copy of rEFInd inside the El Torito image on the rEFInd boot CD, which is what most computers will launch. Selecting this option produces a rEFInd display with its regular icons (loaded from the El Torito image's FAT filesystem), not the ugly yellow/black striped things (attempted, and failed, to be loaded from the ISO-9660 filesystem). The last item then becomes "Boot Fallback boot loader from ISO-9660 volume." Selecting it produces the original ugly display.
Thus, rEFInd is working fine from the El Torito image, but not from the ISO-9660 filesystem. The ISO-9660 filesystem does contain all the necessary files, and in fact if you launch an EFI shell, you can see that the EFI does show the files as being present; however, attempting to access them (via edit os_ubuntu.png, for instance) fails with an "access denied" error. Attempting the same with icons that do show up correctly (edit func_about.png, for instance) works, although of course the binary PNG file looks like gibberish in the editor.
Thus, this looks like it's either a bug in the ISO-9660 filesystem driver provided by VirtualBox or some quirk in the way mkisofs creates the filesystem that VirtualBox's ISO-9660 driver doesn't like -- possibly the driver isn't fully utilizing the Rock Ridge or Joliet long filename support (whichever the driver is using). I can see no differences in permissions on either the source files on my Linux development computer or in the files on the CD as available in Linux or Windows, and I'm able to copy files from the CD from both Linux and Windows 7, so I don't think the CD is objectively bad.
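The host-side check described above (copying every file off the CD) can be approximated with a short script. This is a minimal Python sketch, assuming the ISO is already mounted somewhere on a Linux host; the function name and the mount point in the usage note are hypothetical. It only shows whether the host's own ISO-9660 driver can read the files, and says nothing directly about VirtualBox's EFI driver.

```python
import os


def find_unreadable(root):
    """Walk `root` and return paths of files that cannot be read in full."""
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    while handle.read(1 << 20):
                        pass
            except OSError:
                # Covers "permission denied", I/O errors and missing targets.
                unreadable.append(path)
    return sorted(unreadable)
```

With the CD mounted at, say, `/mnt/refind`, calling `find_unreadable("/mnt/refind")` should return an empty list if the image is sound from the host's point of view.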
I have a vague recollection of looking at VirtualBox's EFI drivers a few years ago, and IIRC, its ISO-9660 driver was based on rEFIt's ISO-9660 driver, which was unfinished. I don't know offhand if the version in VirtualBox today has been significantly improved since then, but it's entirely plausible that it's got significant bugs. If so, there might not be much that can be done in rEFInd to work around the issue; this may need to be reported to VirtualBox's developers as a bug. OTOH, it's conceivable that some workaround in the call to mkisofs in my mkcdimage script (which is in the project's git repository) might get it working better. Of course, a workaround in rEFInd's script would not fix the bug in VirtualBox's EFI, if indeed that is the cause.
Note also that rEFInd's ISO-9660 driver is also based on the incomplete rEFIt driver, and so is likely buggy; but it sees very little use itself. The Clover boot loader is another rEFIt derivative, but their developers have significantly modified the ISO-9660 driver. I looked into pulling in their version a while back, but I had problems getting it to compile, so I dropped that effort, since it didn't seem like a high-priority task. The VirtualBox developers might be more motivated to update their driver. Unfortunately, even using the compiled Clover driver in VirtualBox is unlikely to help, since the built-in VirtualBox driver is almost certain to take priority over the Clover driver. If I'm right, the fix must be included in the VirtualBox "firmware." Also, I've not tested either rEFInd's or Clover's ISO-9660 driver, so I don't know if either of them might have the same problem. If they did, it would show up only after booting rEFInd from some medium that includes the ISO-9660 driver and then launching another rEFInd from the ISO-9660 medium -- something that few people would do.
Thank you for your efforts! I mentioned your opinion in a VirtualBox bugtracker thread: https://www.virtualbox.org/ticket/18039#comment:2
Recently I examined the difference between the UDK2014.SP1 and UDK2017 code (the OVMF bases for VirtualBox and for my current QEMU setup, respectively); this is also described in the VirtualBox bug tracker (https://www.virtualbox.org/ticket/18039#comment:3). Could you also comment on that?
While we wait for changes on their side, could you please consider introducing a workaround that would test whether a found partition contains inaccessible files and, if so, refuse to use that partition by default (switching to another one in the list)?
I discovered that VirtualBox's main issue now is that it "sees" files on a "broken" filesystem, while hardware PCs and QEMU virtual machines report "no files" on such partitions when you try to 'ls' them. That's why VirtualBox runs /EFI/BOOT/bootx64.efi from the "wrong" partition. That results in inaccessible files (not only pictures but, in my case, the kernel image as well).
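To make the request concrete, here is a minimal Python sketch of the kind of check being asked for. It is not rEFInd code and not a proposed patch. The `Volume` type, its `read` callable, and the list of required files are all hypothetical stand-ins for whatever the firmware's file protocol would provide.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Volume:
    """Hypothetical model of a filesystem the firmware has exposed."""
    name: str
    listing: List[str]            # paths the directory listing shows
    read: Callable[[str], bytes]  # raises OSError on "access denied"


def is_usable(volume: Volume, required: Iterable[str]) -> bool:
    """True only if every required file is listed and actually readable."""
    for path in required:
        if path not in volume.listing:
            return False
        try:
            volume.read(path)
        except OSError:
            return False
    return True


def pick_volume(volumes: Iterable[Volume],
                required: Iterable[str]) -> Optional[Volume]:
    """Return the first volume whose required files can all be read."""
    required = list(required)
    for volume in volumes:
        if is_usable(volume, required):
            return volume
    return None
```

The point of the sketch is the distinction between a file being listed and a file being readable: a volume whose listing looks complete but whose reads fail is skipped in favour of the next candidate.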
The small change you noted in the UDK code relates to media block size. AFAIK, VirtualBox identifies virtualized CD media as having a conventional 2048-byte block size, so the change shouldn't affect it; and if it did, I'd expect the media to turn up as completely unreadable without the change. Thus, this doesn't look like a likely root cause. As I said, the last I checked, VirtualBox took unfinished ISO-9660 code from rEFIt, not from the UDK, and that's where I suspect the problem lies; however, I've not looked at the code very closely, and not recently, either -- my comments are based on my recollection when I looked into the code's origin several years ago.
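The block size recorded on the ISO itself can be checked on the host. The following is a minimal Python sketch that reads the ISO-9660 primary volume descriptor, which the standard places at sector 16, with the logical block size stored as a little-endian 16-bit value at byte offset 128. The function name is an assumption for illustration. It reports what the image declares, not what VirtualBox's virtual drive reports to the firmware, so treat it only as a sanity check.

```python
import struct

SECTOR_SIZE = 2048  # ISO-9660 volume descriptors occupy 2048-byte sectors
PVD_SECTOR = 16     # the volume descriptor set starts at sector 16


def read_logical_block_size(stream):
    """Return the logical block size declared in the primary volume descriptor.

    `stream` is any seekable binary file object.
    Raises ValueError if no primary volume descriptor is found.
    """
    sector = PVD_SECTOR
    while True:
        stream.seek(sector * SECTOR_SIZE)
        descriptor = stream.read(SECTOR_SIZE)
        if len(descriptor) < SECTOR_SIZE or descriptor[1:6] != b"CD001":
            raise ValueError("not an ISO-9660 image")
        kind = descriptor[0]
        if kind == 1:    # primary volume descriptor
            (block_size,) = struct.unpack_from("<H", descriptor, 128)
            return block_size
        if kind == 255:  # volume descriptor set terminator
            raise ValueError("no primary volume descriptor")
        sector += 1
```

To use it, open the image in binary mode, for example `open("refind_cd_0.11.3.iso", "rb")`, and pass the file object to `read_logical_block_size`.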
Working around this bug by looking for accessible files would be tricky at best, and would almost certainly introduce new problems in other cases. Currently, rEFInd reads icons from a subdirectory of its own location. What you're suggesting is that, if this fails, rEFInd should go on a fishing expedition to find other icons. This would require a fair amount of extra code to find valid icons elsewhere, which in turn means that new bugs would likely crop up. Thus, I'm reluctant to do this. The bug is almost certainly not rEFInd's; it affects a very specialized use case; and it does not impair functionality, just aesthetics. In my experience, attempts to fix such minor and rare problems often create new and bigger problems. It's far better to address the source of the problem, which seems to be in VirtualBox.
As I mentioned earlier, in our use case the issue is not only about aesthetics.
We have to put the Linux kernel image on the El Torito partition. So, just as the pictures aren't accessible, the kernel image also becomes inaccessible to the boot loader, and therefore the boot gets stuck.
What I really meant by requesting a workaround was not to try to find icons in another location, but rather to find another, "right" location and start the boot process from scratch if the "wrong partition" was unfortunately selected by the EFI boot algorithm, e.g.:
The example algorithm is not strict enough, but it's just for explanation.
What do you think, is it worth implementing?
rEFInd should detect kernels on any and all media, so even with the problem caused by the ISO-9660 filesystem driver, kernels located elsewhere should be bootable. If the problem is that the kernel is only on an ISO-9660 filesystem, I don't see a way to work around that.
Here's another possible workaround, given that you're using VirtualBox: Instead of relying on an ISO-9660 image, create a virtual hard disk image that holds the same files in a FAT filesystem. This should work just as well in VirtualBox, and will bypass the ISO-9660 filesystem driver bug.
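One way to try this workaround is sketched below in Python; it only assembles command lines for `truncate`, `mkfs.vfat` (dosfstools), `mcopy` (mtools), and `VBoxManage convertfromraw`. The file names, the 64M size, and the assumption that the source directory contains an `EFI` subdirectory are all illustrative. The image is formatted without a partition table, and whether VirtualBox's EFI boots from such a disk has not been verified here; a GPT with an EFI System Partition may be needed instead.

```python
def build_fat_disk_commands(source_dir, raw_image="refind.img",
                            vdi_image="refind.vdi", size="64M"):
    """Return command lists that build a FAT-formatted VirtualBox disk.

    All names and the size are illustrative assumptions; source_dir is
    expected to contain an EFI directory (for example a mounted boot CD).
    """
    return [
        # Create an empty raw image file of the requested size.
        ["truncate", "-s", size, raw_image],
        # Put a FAT filesystem directly on the image file.
        ["mkfs.vfat", raw_image],
        # Recursively copy the EFI directory tree into the image root.
        ["mcopy", "-s", "-i", raw_image, source_dir + "/EFI", "::/"],
        # Convert the raw image into a VDI that VirtualBox can attach.
        ["VBoxManage", "convertfromraw", raw_image, vdi_image,
         "--format", "VDI"],
    ]
```

Each returned list can be passed to `subprocess.run(argv, check=True)`; the resulting VDI would then be attached to the VM as a hard disk in place of the ISO.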