
General Protection Fault in refind.efi +00DD5C on iDRAC

Frigo
2021-03-08
2021-03-17
  • Frigo

    Frigo - 2021-03-08

    Hello! I gave refind.efi a try for the first time. On the second run I am hitting a general protection fault. Any idea what I am doing wrong?

    I am running refind-bin-0.13.1.zip

    I am attaching a picture; what follows is its content as parsed by OCR, with some typos.

    I can reproduce the issue by opening the HTML5 console, booting into refind, then closing the console window and reopening it.

    Regards

    PowerEdge C6420 - BIOS 2.9.3
    Virtual Media
    Disconnect Viewer
    Console Controls
    system restart is required. The system detected an exception during the UEFI pre-boot
    environment. Check serial output or iDRAC debug logs for detailed information.
    Type: General Protection Fault (13) Source: Software (UEFI0011) on BSP
    RAX=000000006244E068
    RCX=000000006244E068
    R10=0000000065FB2558
    R14=0000000000000000
    RIP=0000000050662D5C
    LastHsg :
    RBX=000000005068F0E0
    RDX=0000000052A0F220
    R11=0000000000000004
    R15=0000000000000000
    Flags=00010202
    RSI=0000000000000001
    R12=0000000000000000
    RBP=0000000000000000
    RDI=0000000052A0F220
    B=0000000000000000
    R13=0000000000000000
    RSP=0000000052A0F140
    CurrentTPL = 04, LastEventTime 0000001F3422
    LBRfr2 51214794 ConSplitterDxe.efi +006794
    LBRfr1 50662D70 refind.efi +00DD70
    LBRt01 50662DEA refind.efi +00DDEA
    LBRfr0 50662DED refind.efi +00DDED
    LBRt00 50662D23 refind.efi +00DD23
    -->RIP 50662D5C refind.efi +00DD5C
    Stack trace not available
    Use arrow keys to move cursor; Enter to boot;
    Insert, Tab, or F2 for more options; ESC or Backspace to refresh
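
For readers following the crash screen, the faulting address and the image-relative offset in the thread title (refind.efi +00DD5C) together imply where the firmware loaded refind.efi. The sketch below is only that subtraction; the function and variable names are illustrative and are not part of rEFInd or the Dell firmware.

```python
# Values copied from the crash screen: the faulting instruction pointer and
# the offset the firmware reports as "refind.efi +00DD5C".
RIP = 0x50662D5C
OFFSET_IN_IMAGE = 0x00DD5C


def image_base(address: int, offset: int) -> int:
    """Return the load address implied by an absolute address and its image-relative offset."""
    return address - offset


base = image_base(RIP, OFFSET_IN_IMAGE)
print(f"refind.efi load base: {base:#010x}")  # refind.efi load base: 0x50655000
```

The other refind.efi entries in the LBR list give the same base (for example 0x50662D70 with offset 0x00DD70), which suggests the offsets were read correctly and can be looked up against a debug build of the same release.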

     
  • Roderick W. Smith

    I've fixed some memory management bugs since the 0.13.1 release. You may want to give the latest version from the git repository a try, or this binary build:

    https://www.rodsbooks.com/refind-bin-0.13.1.12.zip

    If that doesn't help, please try enabling logging in rEFInd by setting log_level 4 in refind.conf. This should produce a log file called refind.log in rEFInd's home directory. You'll need to access it by booting in some other way and send it to me.

     
  • Frigo

    Frigo - 2021-03-11

    tail /vagrant/refind.log
    07:00:52 - Loading file 'icons\mouse.png'
    07:00:52 - Scaling image to 16 x 16
    07:00:52 - Scaling of image complete

    ==========Entering main loop==========
    07:00:52 - Entering RunMainMenu()
    07:00:52 - Running menu screen: 'Main Menu'
    07:00:52 - Scaling image to 64 x 64
    07:00:52 - Scaling image to 144 x 144

     
  • Roderick W. Smith

    Sorry it's taken a while to respond; this one just sort of slipped off my radar. I've made a change to a pre-release build that might help with your problem, but that's far from certain. Could you please try the following version?

    https://www.rodsbooks.com/refind-bin-0.13.2.2.zip

    If that doesn't help, please send me another log file.

     
  • Frigo

    Frigo - 2021-03-16

    Thanks for your time. The problem persists.

     
  • Roderick W. Smith

    Thanks to some new logging lines in rEFInd, I think I see at least part of what's going on: rEFInd is receiving a huge number of keystroke events (282) and pointer events (mouse or touchscreen; 101) over a short period of time (5 seconds). These inputs are clearly bogus; there's no way you could be typing ~50 keystrokes per second. I have some suggestions:

    • Check your refind.conf file and disable the mouse (enable_mouse) and touchscreen (enable_touch) options, if they're enabled. If neither of these is currently enabled, please tell me and I'll take a closer look at the code to try to figure out how log entries implying that one of them is enabled are being generated.
    • I've posted a new version (link below) that adds information on the precise keystrokes received. This may provide more information, or enable a workaround -- for instance, if the keystroke is something nonsensical, I could program rEFInd to ignore it.
    • Check to ensure that nothing is plugged into the computer's USB ports except (if appropriate) a real keyboard and a real mouse. If some oddball device is plugged in, it's conceivable that the firmware is misidentifying it as a keyboard and/or mouse and generating these bogus inputs.
    • Check your BMC's settings to ensure that it's not doing something weird to generate bogus keyboard/mouse inputs. I have no specific suggestions for things to examine; I don't recall ever seeing anything in a BMC that might produce such problems. Still, it's worth checking.
    • Do the same with the computer's main firmware settings. Many desktop/laptop computers have a "fast boot" option that (among other things) disables keyboard input during POST. Such a setting, if malfunctioning, might conceivably create the sort of problem you're seeing. I don't know if your server might have something comparable, though.
    • Update your firmware (both the computer's main UEFI and the BMC's firmware) to the latest. It's possible that this is caused by a firmware bug, and it's conceivable that Dell has already released a fix.
    • If possible, try shutting off the BMC's remote access and testing in person with a real keyboard and mouse.
    • Even if you can't test in person, if the computer has no keyboard and mouse plugged in, try plugging one in while doing your testing via the BMC's remote KVM. It's conceivable that the firmware is generating "phantom" keypresses because there's no physical keyboard plugged in.

    Here's the new test version:

    https://www.rodsbooks.com/refind-bin-0.13.2.3.zip

    I'll also review the code some more. Theoretically, rEFInd shouldn't crash even under the load of a ridiculous number of bogus keystrokes as input, so there may be some subtle flaw in rEFInd's keystroke and/or mouse/touchpad handling. OTOH, it could be that it's the firmware that's crashing, not rEFInd -- rEFInd could just be processing the events and asking for more too quickly for the firmware to handle.
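
The event rates behind the "~50 keystrokes per second" estimate can be checked from the counts quoted in this post (282 keystrokes and 101 pointer events in 5 seconds). This is plain arithmetic; the helper name is made up for illustration.

```python
def events_per_second(count: int, seconds: float) -> float:
    """Average event rate over the observation window."""
    return count / seconds


# Counts and window length as quoted from the rEFInd log summary above.
keystroke_rate = events_per_second(282, 5)
pointer_rate = events_per_second(101, 5)

print(f"keystrokes: {keystroke_rate:.1f}/s, pointer events: {pointer_rate:.1f}/s")
```

Both rates are well beyond what a person at a keyboard and mouse would generate, which is consistent with the inputs being spurious.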

     
  • Frigo

    Frigo - 2021-03-16

    nice! don't underestimate my typing skills ;)
    Logs are basically filled with Processing keystroke 91 and end with 108

    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 91)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
    17:41:32 - Processing keystroke (ScanCode = 0)....
    17:41:32 - Processing keystroke (UnicodeChar = 108)....
    17:41:32 - Entering WaitForInput(), Timeout = 0
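
A log like this is easier to summarize by tallying the UnicodeChar values than by reading it line by line. The following is a minimal sketch that assumes the line format shown in the excerpt above; the regular expression and function name are illustrative and do not come from rEFInd.

```python
import re
from collections import Counter

# Matches lines such as "17:41:32 - Processing keystroke (UnicodeChar = 91)...."
KEYSTROKE_RE = re.compile(r"Processing keystroke \(UnicodeChar = (\d+)\)")


def tally_unicode_chars(lines):
    """Count how often each UnicodeChar value appears in refind.log lines."""
    counts = Counter()
    for line in lines:
        match = KEYSTROKE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "17:41:32 - Processing keystroke (UnicodeChar = 91)....",
    "17:41:32 - Entering WaitForInput(), Timeout = 0",
    "17:41:32 - Processing keystroke (ScanCode = 0)....",
    "17:41:32 - Processing keystroke (UnicodeChar = 91)....",
    "17:41:32 - Processing keystroke (UnicodeChar = 108)....",
]
counts = tally_unicode_chars(sample)
print(counts.most_common())
```

To run it over a whole log, pass an open file object instead of the sample list, for example `tally_unicode_chars(open("refind.log"))`; the path is a placeholder.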
    

    I only run the latest and greatest firmware from Dell. However, it's not the first time I have seen this symptom (sometimes during a reboot, when the console window is open, the system complains that I have a key stuck!)

    Note that this version of refind (maybe because of the debug logs) is super slow. enable_touch is disabled; enable_mouse is enabled.

    EDIT: The problem DOES NOT happen when enable_mouse is disabled. At least, I could not reproduce it.

    (last edit) just opened the window again and I see a bunch of [ in there. I will do my best to notify Dell about this, I think it's a recurring problem with the iDRAC HTML5 Window that "opens with a stuck key"

    (last last edit) or maybe it's my laptop that indeed has a stuck control key :D

     

    Last edit: Frigo 2021-03-16
  • Roderick W. Smith

    According to Wikipedia, Unicode 91 is left square bracket ([) and 108 is lowercase L (l), so the rEFInd logs track with your observation of seeing a bunch of the former.

    I'll take a look at the mouse code. It could be that it's triggering or interacting with the Dell's BMC. A day or two ago I tried poking around with a Dell C6320p to which I have access, but I couldn't reproduce your problem. I didn't enable the mouse support then, however, so I'll go back and try that. In the meantime, at least you have a workaround.
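
The code-point-to-character mapping mentioned above can also be confirmed locally rather than looked up. A short check using Python's built-in chr(), with the two values taken from the log:

```python
# UnicodeChar values reported in refind.log, decoded to their characters.
decoded = {code: chr(code) for code in (91, 108)}

for code, char in decoded.items():
    print(f"{code} -> {char!r}")
# 91 -> '['
# 108 -> 'l'
```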

     
  • Roderick W. Smith

    Oh, and yes, the logging can slow down rEFInd quite a bit, particularly if disk accesses are slow on a particular EFI implementation. Disabling logging should help.

     
