#22 failed "mount -o loop,encryption=xxx" leaves loop device

closed
nobody
None
5
2012-11-13
2011-05-24
Anonymous
No

I've just noticed that when I enter a wrong password with "mount -o loop,encryption=AES128", mount exits with an error saying it is unable to recognize a filesystem, and it leaves an active loop device behind. After repeating the mount with the correct password, "losetup -a" therefore reports two active loop devices for the partition (only one of them mounted, of course). When mount fails, it should also delete the loop device that was created by the "loop" mount option.

While this is primarily a problem of util-linux 2.19's mount, it does not
seem to happen with a non-encrypted failed "mount -o loop".

I'm using loop-aes-3.6c and util-linux 2.19
(openSUSE 11.4 with a 2.6.37 kernel).

-Martin

Discussion

  • Jari Ruusu
    2011-05-26

    So far, I have not been able to reproduce this error.

    Is it easily reproducible? How?

    What exact messages do you see? Does it say "ioctl:
    LOOP_CLR_FD: Device or resource busy"?

    However, there is a small but usually avoidable race, or
    window of opportunity, that can trigger this error case. It
    involves loop device status queries. When a loop device is
    detached from its backing device or file, the kernel driver
    checks that the loop device is not mounted and not opened by
    more than one process (that one process being the one
    actually doing the detach operation). If the loop device is
    mounted or opened by more than one process, the kernel driver
    refuses the detach operation, which leads to the "ioctl:
    LOOP_CLR_FD: Device or resource busy" error message.

    A loop device status query involves opening the loop device,
    reading its status, and closing it again. So, while a status
    query is in progress, the loop device's open count is
    temporarily increased for a short period of time. If a detach
    operation is attempted during this window of opportunity, it
    will fail and lead to the "ioctl: LOOP_CLR_FD: Device or
    resource busy" error message.

    So, avoiding loop device status queries during loop device
    detach operations is the key to avoiding this window of
    opportunity.

    Operations that may attempt a loop device detach:

    1) Unmount of a file system using a loop device
    2) Wrong-passphrase mount of a file system using a loop device
    3) Root user running "losetup -d"
    4) Swapoff of a loop-encrypted swap device

    Operations that do a loop device status query:

    1) Loop device status query using losetup. Examples:
    "losetup -a", "losetup /dev/loop6", "losetup -f"
    2) Mount of a file system using a loop device, where the
    exact loop device to use is not specified. In this case
    mount has to resort to loop device status queries to find
    an unused loop device (see the sketch after this list).
    Example: "mount -o loop,encryption=AES128"

    How to avoid this window of opportunity:

    1) Always specify the exact loop device for a loop mount,
    either on the command line or using /etc/fstab options
    (see the example fstab line below). Example:
    "mount -o loop=/dev/loop3,encryption=AES128"
    2) Do not run "watch losetup -a" on a terminal/window/console.

     
  • Jari Ruusu
    2011-05-26

    Another question: on your setup, is /etc/mtab a normal file
    or a symlink to /proc/mounts?

     
  • Thank you very much for the detailed analysis.

    It could be a race condition.

    I was able to reproduce the issue 3 times before I posted this report. Then I cross-checked by loop-mounting a non-encrypted device (explicitly specifying an incompatible filesystem), and it did not leave a dangling loop device.

    But I'm currently having difficulties reproducing the issue myself. :-o

    I will investigate further, considering your background information and suggestions, and come back.

    -Martin

     
  • Yup, there appears to be a race condition. Now that you have described it, I noticed the suspicious additional error message:

    # mount /f2
    Password:
    ioctl: LOOP_CLR_FD: Device or resource busy
    mount: wrong fs type, bad option, bad superblock on /dev/loop0,
    missing codepage or helper program, or other error
    In some cases useful info is found in syslog - try
    dmesg | tail or so

    root@linux-xu84 [Documents] # losetup -a
    /dev/loop0: [0005]:1395 (/dev/sda8) encryption=AES128

    I'm currently running that Linux installation in VMware
    Player, and I get the "ioctl: LOOP_CLR_FD: Device or resource
    busy" error frequently, but not always. Sometimes the cleanup
    works fine.

    -Martin

     
  • Jari Ruusu
    2011-05-27

    Running it inside VMware is important info here. Kernel
    driver code that runs inside a virtual machine may be subject
    to much longer code execution delays than it would be on
    bare-metal hardware.

    Last year (November 10, 2010) loop-AES was changed to work
    around an open/close reference count race that some kernels
    seem to have. The fix was to wait up to one second for the
    open/close reference count to reach the expected value. On
    bare-metal hardware that is enough. When that race triggered,
    it caused that same "ioctl: LOOP_CLR_FD: Device or resource
    busy" error message.

    Now I am not sure whether that one second is enough when the
    code runs inside a virtual machine. My guess is that the
    previously mentioned race, triggered by another process
    reading loop status, is not what is happening here. That race
    still exists, but maybe it is not causing the error that you
    are seeing.

    I have attached a bzip2-compressed patch,
    loop-AES-v3.6c-20110527.diff.bz2, that changes the
    LOOP_CLR_FD ioctl() max wait delay from 1 second to 4
    seconds. The max wait delay can't be indefinite, because in
    the real in-use case (device really mounted) the code has to
    wait out the full timeout before it can decide that the
    device is really in use.

    After applying this patch, recompiling the new loop.ko
    module, and loading the new module into the kernel, does it
    work more reliably?

     
  • Jari Ruusu
    2011-05-27

    Are you 100% sure that your system doesn't have any
    auto-mounter type daemon process running that tries to
    "help" by probing new devices? If there is such a daemon
    process, it can ruin a loop device detach by having the
    loop device open at the wrong time.

    If I create a patch that makes the loop driver log to the
    kernel log when and what process opened, closed, or attempted
    the LOOP_CLR_FD ioctl(), would you be willing to test that
    patch?

     
  • Jari Ruusu
    2011-05-28

    I uploaded a new patch, loop-AES-v3.6c-20110528.diff.bz2,
    that adds a few debug outputs to the kernel log. It prints
    what process opens / closes / issues ioctl(LOOP_CLR_FD), and
    when. Maybe this info provides a clue about who holds the
    extra reference to a loop device during the detach operation.
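
    This is not the actual patch, but the instrumentation
    described is presumably along these lines inside
    drivers/block/loop.c (sketched against the 2.6.37-era
    lo_open(); the real patch may differ, and matching printk()s
    would go in lo_release() and in the LOOP_CLR_FD handler):

        /* Log which process opens a loop device and the
         * resulting reference count. */
        static int lo_open(struct block_device *bdev, fmode_t mode)
        {
                struct loop_device *lo = bdev->bd_disk->private_data;

                mutex_lock(&lo->lo_ctl_mutex);
                lo->lo_refcnt++;
                printk(KERN_INFO "loop%d: open by %s[%d], refcnt=%d\n",
                       lo->lo_number, current->comm, current->pid,
                       lo->lo_refcnt);
                mutex_unlock(&lo->lo_ctl_mutex);
                return 0;
        }

    With instrumentation like this, "dmesg | grep '^loop'" would
    show lines such as "loop0: open by losetup[1234], refcnt=2"
    (that format follows the sketch above, not the actual patch).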

    After applying this patch, recompiling the new loop.ko
    module, and loading the new module into the kernel, re-run
    your failing test case and then dump the kernel messages
    using the "dmesg | grep '^loop'" command.

    Note that the longer maximum wait (4 seconds) for the detach
    operation may prevent the error from happening, but the debug
    time stamps and the order of the debug messages are important
    info even if the error does not trigger.

     