Encrypted root partition: rEFInd hangs

2021-02-16
2021-02-21
  • Harish Rajagopal

    I'm using rEFInd as my boot loader. My setup consists of a separate BTRFS /boot partition (/dev/sdb4, with the main contents in the @boot subvolume) mounted at /boot, which holds refind_linux.conf, etc.; a BTRFS root partition (/dev/sdb5, with the main contents in the @ subvolume); and an EFI partition (/dev/sdb1) mounted at /efi. Until now, rEFInd booted Arch (and Windows, since I have a dual-boot config) correctly.

    I decided to encrypt my root partition. I did it using cryptsetup's reencrypt command to make it a LUKS2 partition. I verified (in a live USB) that I can open the container and mount my BTRFS partition as before. However, rEFInd is now simply hanging when it starts. I can't even see the list of OSes.

    The only changes I made after the encryption are:
    Add the sd-encrypt hook in mkinitcpio.conf
    Change the kernel parameters in refind_linux.conf
    Run mkinitcpio -P
    Run refind-install

    I don't know why rEFInd isn't even showing up. In text mode, it just hangs at "rEFInd: Initializing", without throwing any errors.

    lsblk -f output (as seen from a live USB):

    NAME     FSTYPE      FSVER            LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINT
    loop0    squashfs    4.0                                                                              
    sda                                                                                                   
    ├─sda1   ntfs                         Data        01D621BFF7721F90                      163.3G    75% /mnt/Data
    └─sda2   crypto_LUKS 2                            dfb58f5f-34e2-4c73-b990-fb6826d86f84                
    sdb                                                                                                   
    ├─sdb1   vfat        FAT32            EFI         2631-73D7                             214.4M    16% /efi
    ├─sdb2                                                                                                
    ├─sdb3   ntfs                         Windows     60BC37E7BC37B5FE                                    
    ├─sdb4   btrfs                        Arch-Boot   0f1c6f5f-6056-4ba7-8e7c-f22b1671c33b  394.3M    21% /boot/.snapshots
    └─sdb5   crypto_LUKS 2                            4fd15a7d-f552-4e12-9ac4-4bb3a05396d2                
      └─root btrfs                        Arch        f63b9c05-a4a3-4677-82b6-9f5b14377e7d   25.8G    43% /.snapshots
    sdc      iso9660     Joliet Extension ARCH_202101 2021-01-01-09-18-56-00                              
    ├─sdc1   iso9660     Joliet Extension ARCH_202101 2021-01-01-09-18-56-00                              
    ├─sdc2   vfat        FAT16            ARCHISO_EFI 96C2-37AA                                           
    └─sdc3                                                                                                
    sr0                                                                                                   
    

    The output doesn't show everything, namely how @boot and @snapshots on /dev/sdb4 are mounted at /boot and /boot/.snapshots. Similarly for my BTRFS subvolumes on /dev/sdb5 (inside the LUKS2 container).
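
    Since lsblk only lists one mount point per device, the subvolume layout can be read from the mount table instead. The following is a minimal sketch, assuming the usual /proc/self/mounts line format (device, mount point, filesystem type, options). The function name and the sample lines, including the subvolid values, are made up for illustration.

```python
def btrfs_subvolume_mounts(mounts_text):
    """Return (device, mountpoint, subvol) for each btrfs line in a mount table."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mountpoint, fstype, options
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        device, mountpoint, _fstype, options = fields[:4]
        subvol = None
        for opt in options.split(","):
            # "subvolid=..." does not start with "subvol=", so it is skipped
            if opt.startswith("subvol="):
                subvol = opt[len("subvol="):]
        results.append((device, mountpoint, subvol))
    return results


# Hypothetical sample resembling the layout described above.
SAMPLE = """\
/dev/sdb4 /boot btrfs rw,relatime,subvolid=256,subvol=/@boot 0 0
/dev/sdb4 /boot/.snapshots btrfs rw,relatime,subvolid=257,subvol=/@snapshots 0 0
/dev/sdb1 /efi vfat rw,relatime 0 0
"""

if __name__ == "__main__":
    for row in btrfs_subvolume_mounts(SAMPLE):
        print(row)
```

    On a running system, the same function could be fed the contents of /proc/self/mounts instead of SAMPLE.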

    I've also attached mkinitcpio.conf, refind_linux.conf, and refind.conf.

    EDIT: I installed GRUB (alongside rEFInd), and it works perfectly. Seems like it's definitely a rEFInd issue.

     

    Last edit: Harish Rajagopal 2021-02-16
  • Roderick W. Smith

    My guess is that rEFInd's Btrfs driver is flaking out when it gets to the encrypted partition. If so, then the solution is to convert the separate Btrfs /boot partition to use another filesystem. If you're using Arch, FAT should be fine for /boot, and will require no extra drivers; or you could use any other filesystem for which you can find EFI drivers. Ext4fs should work well, and rEFInd ships with an ext4fs driver.

    You can test this by removing the Btrfs driver from EFI/refind/drivers_x64 on the ESP. That should get rEFInd to launch, but it won't be able to detect your Linux kernels. It should still be able to boot to GRUB, though. If that works, then you can try converting /boot and, if necessary, add a new EFI driver for whatever filesystem you use. If you adjusted rEFInd's auto-scanning configuration or used a manual boot stanza, you might need to adjust that after making the change, since you used a Btrfs subvolume that will no longer apply.

     
  • Harish Rajagopal

    Thanks for your reply! I tried removing either and both of the drivers in EFI/refind/drivers_x64 (without doing anything else), but it still failed to show the list of OSes. I think I'll just convert my /boot to ext4.

    Before I do that, do you want me to test something else or get some logs for the Btrfs driver? I like the snapshotting feature in Btrfs, so if you can get this driver fixed, I'll shift back to it.

     
  • Roderick W. Smith

    If you removed all the drivers from the drivers_x64 subdirectory of the directory from which rEFInd launches and it's still hanging, then converting to ext4fs will not help, since the problem is not with the drivers. rEFInd sometimes hangs because of buggy filesystem drivers or damaged filesystems, but this happens only if a relevant driver is loaded. If you've actually disabled the drivers, then this isn't the case, and you need to start looking for other causes. (OTOH, if you've actually deleted the wrong files -- say, from the rEFInd installation source that's not on the ESP -- then you could try again with disabling the drivers.)

    All that said, it's very strange that rEFInd should begin hanging now if a filesystem driver is not at fault. Have you upgraded rEFInd along with your other changes? If so, try backing out to the earlier version.

     
  • Harish Rajagopal

    I've tested removing everything (and individual files) in /efi/EFI/refind/drivers_x64 (by moving them to /efi/EFI, outside rEFInd's directory) with both versions 0.12.0 and 0.13.0 (since these are the only two versions available in Arch Linux's archives), and the problem persists. I was initially running 0.12.0, but upgraded later to 0.13.0 after the problem started, hoping that the newer version had a fix, but it didn't.

    In text mode, it just hangs at "rEFInd: Initializing", without throwing any errors.

    Is there a way to somehow get verbose output from the text mode? I guess it might show where and why it's hanging.

     

    Last edit: Harish Rajagopal 2021-02-17
  • Roderick W. Smith

    No, there's currently no debug output in rEFInd, aside from the occasional error message when something bad (but recoverable) happens. RefindPlus adds a logging feature, so you could try that, but the output might or might not be helpful in debugging rEFInd's problem. (RefindPlus has diverged a lot from rEFInd in a short time.) I do plan to add an optional logging feature to rEFInd, but I haven't even begun coding it yet.

    You may want to try rEFInd 0.11.5, which you can obtain from the SourceForge downloads page. You can simply copy the refind_x64.efi binary from the binary .zip file over your regular rEFInd binary on the ESP.

    All that said, I'm still puzzled about why it stopped working. It's plausible that your change to an encrypted filesystem would cause the Btrfs driver to fail; but then removing that filesystem driver would have restored the system to bootability. A change in rEFInd version (upgrading or downgrading) could also plausibly cause a failure, if a bug was introduced (or fixed) between versions; but your description suggests that wasn't the case -- or at least, that you've tried both the old known-working version and a newer version with no difference. All of this leads me to wonder if you're really launching the rEFInd version you think you are. That's easy to get wrong -- even I do it from time to time. Putting rEFInd on a USB flash drive (as EFI/BOOT/bootx64.efi) and using your firmware's boot manager to boot the flash drive can be a way to double-check this detail. Note that the rEFInd flash drive images I distribute include all the drivers, so if you start with that image, you should probably delete or move the drivers before you begin.

     
  • joevt

    joevt - 2021-02-17

    RefindPlus has a log file saved to the EFI partition. You can try that temporarily (I think the conf file needs to be renamed?) to see if it crashes at a similar point.

     
  • Harish Rajagopal

    I tried running v0.11.5 on a USB using the flashdrive zip. I tested with all drivers, with all but Btrfs, and with no drivers. The same error occurs. I'm attaching the blank picture for reference.

    Thanks for the RefindPlus reference! I put RefindPlus v0.12.0.AQ on the same USB drive that I had flashed with rEFInd v0.11.5, and ran it with the default config, except that I made it use text mode. It flashes an error message for about a second before blanking out and hanging, both with and without drivers. The message is:

    Error: No Media Found While Reading Boot Sector on Volume Below

    After that, I got the logs that were saved to the USB (in the EFI partition of the USB). I'm attaching them here too. Could you have a look at them?

    FYI, the way I was changing rEFInd versions was by installing a new version through pacman (Arch's package manager), then running refind-install, which I hoped would change the rEFInd version in the EFI partition.
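
    One way to double-check that refind-install really replaced the binary on the ESP is to compare checksums of the packaged and installed files. This is a minimal sketch: both paths in the main block are assumptions (the first is where I'd expect the Arch package to keep its copy, the second follows from the /efi/EFI/refind directory mentioned earlier), and the function names are made up. If refind-install re-signs the binary for Secure Boot, the hashes would be expected to differ even when the versions match.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(packaged, installed):
    """True if both files have identical contents."""
    return sha256_of(packaged) == sha256_of(installed)


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to the actual system.
    packaged = Path("/usr/share/refind/refind_x64.efi")
    installed = Path("/efi/EFI/refind/refind_x64.efi")
    if packaged.exists() and installed.exists():
        print("identical" if same_binary(packaged, installed) else "different")
    else:
        print("one of the assumed paths does not exist")
```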

     

    Last edit: Harish Rajagopal 2021-02-17
  • Roderick W. Smith

    The RefindPlus logs show that, whether the drivers were loaded or not, RefindPlus finished most or all of the scans and then hung. The one scan where it might be hanging is in parsing the manual boot stanzas. Your refind.conf file doesn't specify any new manual boot stanzas (beyond the sample ones that have been parsed successfully by thousands of rEFInd installations for years), so it shouldn't be causing problems; however, it does include a secondary configuration file (refind-theme-regular/theme.conf) on the last line. If that file has an error or is defining a bad entry, that might conceivably cause a hang. It's a long shot, but I suggest commenting out that line in refind.conf.

    Another thing you might try: Uncomment the scanfor line in refind.conf and remove the manual option from that line. This should prevent rEFInd from running the "scans" for manual boot stanzas. Try this both with my rEFInd and with RefindPlus. Neither program should be hanging when doing this, but looking at the code, it's conceivable that an error when reading the files might cause a hang.
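
    The scanfor change is a one-line edit that is easy to do by hand, but as a rough sketch of what it amounts to, the helper below uncomments a scanfor line and drops the manual option. The function name is hypothetical, and the example line in the main block is an assumed sample, not necessarily what your refind.conf contains.

```python
def without_manual(scanfor_line):
    """Uncomment a scanfor line and drop the 'manual' option from it."""
    body = scanfor_line.strip().lstrip("#").strip()
    parts = body.split(None, 1)
    if not parts or parts[0] != "scanfor":
        raise ValueError("not a scanfor line: %r" % scanfor_line)
    value = parts[1] if len(parts) > 1 else ""
    options = [item.strip() for item in value.split(",")]
    # Keep every non-empty option except "manual", preserving order.
    kept = [item for item in options if item and item != "manual"]
    return "scanfor " + ",".join(kept)


if __name__ == "__main__":
    # Assumed sample line; check the actual line in your own refind.conf.
    print(without_manual("#scanfor internal,external,optical,manual"))
```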

    Along those lines, it occurs to me that filesystem damage on the ESP might be causing problems when rEFInd tries to read the configuration file to parse the manual boot stanzas. (rEFInd can actually read that file multiple times, depending on the configuration options.) If this is the problem, then doing a filesystem check on the ESP might fix the problem. As a more radical solution, backing up the ESP (using cp, zip, tar, or other file-level tools), unmounting it, creating a fresh FAT filesystem on it, and restoring the files might fix the problem. That's also a bit of a long shot, and if you're not careful, you could cause further damage; but if you're comfortable with this sort of thing and take care when doing it, it shouldn't take long to do.
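
    For the file-level backup step, any of the tools mentioned will do. As one possible sketch, the function below archives a directory tree with Python's tarfile module and checks that the number of files stored matches the number found on disk. The function name and the paths in the usage note are assumptions, and the archive should be written somewhere outside the ESP.

```python
import tarfile
from pathlib import Path


def backup_tree(source_dir, archive_path):
    """Archive source_dir into a gzip-compressed tar and return the file count."""
    source = Path(source_dir)
    # Count regular files on disk before archiving.
    expected = sum(1 for p in source.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(str(source), arcname=".")
    # Re-open the archive and count the regular files it actually holds.
    with tarfile.open(archive_path, "r:gz") as archive:
        stored = sum(1 for member in archive.getmembers() if member.isfile())
    if stored != expected:
        raise RuntimeError("archive holds %d files, expected %d" % (stored, expected))
    return stored
```

    With the ESP mounted at /efi, a call such as backup_tree("/efi", "/root/esp-backup.tar.gz") (destination path hypothetical) would produce the backup before the filesystem is recreated.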

     

    Last edit: Roderick W. Smith 2021-02-17
    • JT Moree

      JT Moree - 2021-02-18

      It may not be rEFInd. Is it possible that rEFInd is booting the kernel
      and initrd, but it is asking you for a password to decrypt LUKS and
      you can't see the prompt because splash and quiet are turned on?


       
      • Harish Rajagopal

        The problem isn't rEFInd hiding the encryption dialog. A passphrase dialog is raised before booting the kernel only when the boot partition itself is encrypted. rEFInd doesn't even support that, so I didn't do it. My boot partition is unencrypted; only my root partition is encrypted.

        In my case, the encryption dialog shows up during boot. I use the quiet and splash parameters because I use Plymouth for a graphical boot, and it supports encryption dialogs. It shows up after I tell the boot loader (GRUB or rEFInd) to boot my encrypted Arch Linux root, gives me a graphical password entry box, and then continues booting.

        My issue was before boot, because rEFInd wasn't giving me the list of OSes installed.

         

        Last edit: Harish Rajagopal 2021-02-18
  • Harish Rajagopal

    SOLVED!!!

    Removing the manual option in scanfor solved the issue. I first tried this with RefindPlus on the USB, which worked (the error message still flashed for a second, but it went ahead and showed me the boot options). Then I tried rEFInd 0.11.5 on the USB, then 0.13.0 on my laptop's EFI partition, and all of them worked after changing scanfor in their config files.

    Now I'll experiment with the config on the USB to see which manual stanza causes this, and report here, so that this can get fixed.

    Thanks a lot for your help and patience!

     
  • Harish Rajagopal

    Got the source of the error. It's in the Arch Linux manual stanza. Specifically, this line:

        volume   "Arch Linux"
    

    I restored the scanfor option to the default one, i.e. commented out my changes. Then I changed that line to:

        volume   "My Arch Linux"
    

    It works now.

    I know why this is causing the issue. It's because my encrypted root partition is actually named "Arch Linux" (GParted says that this is the partition name, not the label), and rEFInd is trying to get into that partition to see if the other options (like loader, initrd, etc.) are valid or not. The disabled option just "hides" this manual stanza - it doesn't prevent rEFInd from parsing and trying to interpret it. Removing manual from scanfor, by contrast, does prevent that.

    I guess this is a problem specific to my setup, so I don't know if this is something that can be "fixed". Maybe changing the volume option in this default stanza to something like "Arch Linux Volume Name" would be "safer"? Or should the disabled option be changed so that rEFInd completely ignores that stanza?
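
    To spot this kind of clash ahead of time, the manual stanzas can be checked against the partition names on the machine. Below is a minimal sketch, assuming the menuentry { ... } layout of refind.conf; it is not rEFInd's own parser, the function names are made up, and the SAMPLE text is a hypothetical config loosely modeled on my stanza.

```python
import shlex


def stanza_volumes(conf_text):
    """Return title, volume and disabled flag for each top-level menuentry."""
    stanzas = []
    current = None
    depth = 0
    for raw in conf_text.splitlines():
        try:
            tokens = shlex.split(raw, comments=True)
        except ValueError:
            continue  # skip lines shlex cannot tokenize
        if not tokens:
            continue
        if depth == 0 and tokens[0] == "menuentry" and len(tokens) >= 2:
            current = {"title": tokens[1], "volume": None, "disabled": False}
        elif current is not None and depth == 1:
            # Only look at tokens directly inside the menuentry, not submenus.
            if tokens[0] == "volume" and len(tokens) >= 2:
                current["volume"] = tokens[1]
            elif tokens[0] == "disabled":
                current["disabled"] = True
        depth += tokens.count("{") - tokens.count("}")
        if depth == 0 and current is not None and "}" in tokens:
            stanzas.append(current)
            current = None
    return stanzas


def stanzas_touching(stanzas, partition_names):
    """Titles of stanzas whose volume token matches a known partition name."""
    return [s["title"] for s in stanzas if s["volume"] in partition_names]


# Hypothetical config text for illustration only.
SAMPLE = '''
menuentry "Arch Linux" {
    volume   "Arch Linux"
    loader   /boot/vmlinuz-linux
    submenuentry "Boot to terminal" {
        add_options "systemd.unit=multi-user.target"
    }
    disabled
}
menuentry "Renamed" {
    volume   "My Arch Linux"
    loader   /boot/vmlinuz-linux
}
'''

if __name__ == "__main__":
    print(stanzas_touching(stanza_volumes(SAMPLE), {"Arch Linux"}))
```

    In my case the set of partition names would contain "Arch Linux", so the first stanza is reported even though it is marked disabled.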

     

    Last edit: Harish Rajagopal 2021-02-18
  • Roderick W. Smith

    Thanks for the follow-up, and I'm glad you've gotten it working. Doing disabled the way it does (where rEFInd does most of the parsing and work but then throws it away when it hits disabled) was convenient and easy from a coding perspective, but you've obviously discovered a big flaw in that approach. I'll have to look over the code and give it some thought to see if there's a better/safer way to do that....
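
    To illustrate the flaw in general terms, here is a toy sketch contrasting the two orderings. It is not rEFInd's C code and not the eventual fix; the stanza dictionaries, function names, and the recording resolver are all hypothetical. The point is only that the first ordering touches a disabled stanza's volume before discarding it, while the second never touches it.

```python
def load_late_discard(stanzas, resolve_volume):
    """Resolve every stanza's volume, then drop the disabled ones."""
    entries = []
    for stanza in stanzas:
        handle = resolve_volume(stanza["volume"])  # touched even if disabled
        if stanza["disabled"]:
            continue
        entries.append((stanza["title"], handle))
    return entries


def load_early_skip(stanzas, resolve_volume):
    """Skip disabled stanzas before touching their volume at all."""
    entries = []
    for stanza in stanzas:
        if stanza["disabled"]:
            continue
        entries.append((stanza["title"], resolve_volume(stanza["volume"])))
    return entries


def make_recorder():
    """Return a list of touched volumes and a fake resolver that fills it."""
    touched = []

    def resolve(volume):
        touched.append(volume)
        return "handle:" + volume

    return touched, resolve


# Hypothetical stanzas; the first stands in for the disabled sample entry.
STANZAS = [
    {"title": "Arch Linux", "volume": "Arch Linux", "disabled": True},
    {"title": "Other", "volume": "Arch-Boot", "disabled": False},
]

if __name__ == "__main__":
    for loader in (load_late_discard, load_early_skip):
        touched, resolve = make_recorder()
        loader(STANZAS, resolve)
        print(loader.__name__, "touched", touched)
```

    Both loaders produce the same menu entries; they differ only in which volumes get accessed along the way. In the real file the disabled token comes at the end of a stanza, so skipping early would need some form of look-ahead.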

     
  • dakanji

    dakanji - 2021-02-18

    Please try this build of RefindPlus with your original manual stanza. That is, don't remove or comment anything out from the original setup.

    @srs5694: this build just adds an initial scan for the disabled flag in AddStanzaEntries.
    If it works, I can issue a merge request.

     

    Last edit: dakanji 2021-02-18
  • Harish Rajagopal

    Thanks for the build! I tested it, both with the default config, and after modifying the volume line. Both of them work exactly the same! Please go ahead and issue a merge request.

     

    Last edit: Harish Rajagopal 2021-02-18
  • Roderick W. Smith

    I'm afraid that @Dayo's fix is badly broken; it basically skips over any entry that's not hidden, and leaves the in-file pointer in an inconsistent state. It happens to resolve your problem, @Harish Rajagopal, but creates others in the process.

    I've tried another approach that I hope will be more robust, although I'm not 100% positive it will cope with your encrypted Btrfs volume. Could you please give this one a try with manual still enabled in scanfor? Thanks.

    https://www.rodsbooks.com/refind-bin-0.13.0.2.zip

     
  • dakanji

    dakanji - 2021-02-19

    The aim was to skip ahead looking specifically for one token, but I see what you mean about the pointer, which may need to be reset to its original position after the first loop.

    I'm not yet sure that it actually ends up in the wrong position, but I will review.

     

    Last edit: dakanji 2021-02-19
  • Harish Rajagopal

    @srs5694 I tried out this version with the default refind.conf, and it works!

     
    • Roderick W. Smith

      Thanks for checking. I've just pushed the update to rEFInd's git repository.

       
      • Harish Rajagopal

        Thanks for the bug fix! And thanks a lot for your help throughout the process!

         
  • dakanji

    dakanji - 2021-02-21

    Dropped the problematic implementation and synced with Rod's.

     
