#25 Panic/System Hang

Status: closed-fixed
Category: Crash (44)
Priority: 9
Last updated: 2004-08-02
Submitted: 2004-06-02
Assigned to: Brian Bergstrand
Private: No

Hi!

While trying to compile Linux, I found a bug in ext2fsx: if I extract
http://icculus.org/~schaffner/ext2fs-bugtrigger.tar.gz
onto an ext2 partition (256MB) or disk image (10MB), and then from
that directory execute
bash trigger.bash
then Mac OS X crashes.

The crash can have varying effects: once I saw the "You need to
restart your computer. Hold down the power button..." kernel
panic; other times the Terminal window stops responding, though
other things still work, until I try to force-close that window.

Mac OS X 10.3.4
384 MB SDRAM
ext2fsx 1.2b2

--
Martin Schaffner <schaffner _at_ gmx _dot_ li>

BDB - Note, you have to chmod 755 trigger.bash before running it.

Discussion

  • Logged In: YES
    user_id=595265

    For some reason, I can't get the trigger archive to attach, so I've
    uploaded it to <http://www.bergstrand.org/brian/software/bin/
    ext2fs_bugtrigger.tar.gz>.

     
  • Logged In: YES
    user_id=595265

    I've reproduced the bug on a 10MB image and 256 MB image. The 10MB
    image runs out of space and then hangs or kp's (I've seen both). The
    256MB image has so far always kp'd. With the hang, the linker process is
    holding the inode spinlock while doing a cluster_push() from
    VOP_CLOSE(). It seems cluster_push() gets into the VOP_STRATEGY() of
    the disk image driver and never returns.

    The panic happens in the same place, but it actually happens on the Disk
    Image driver's I/O thread with an invalid physical page address error.

    I'm not sure where the problem is, but my first guesses are with
    VOP_PAGEOUT() or possibly the block size of the disk images.

    I've only seen this bug with this specific test case. It doesn't seem to be
    a general problem.

     
  • Logged In: YES
    user_id=595265

Just tested this on a 10GB HD partition, and there was no problem --
vmlinux is successfully linked. So, this leads me to believe the problem is
with the device/fs block size.

     
  • Logged In: YES
    user_id=595265

    Okay, the problem is with fs block sizes less than 4KB. The device type
    does not matter. This trigger case causes a system hang or panic on any
    device containing an fs with a block size < 4KB.

I verified this by formatting both the 10MB and 256MB disk images with '-
b 4096'. With the 4KB block size, neither one hung or panicked (the link
completed on the 256MB image, and returned an ENOSPC error on the
10MB image). I then formatted a 4.5GB HD partition with '-b 2048' and
ran the trigger. It hung in the exact same place as the disk images had
with a 1KB block size.

    I'm still not sure what is causing this particular test case to trigger the
    bug. It may have to do with file size -- maybe when the file gets into
    indirect or double indirect blocks.
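
As a point of reference for the indirect-block theory: with the standard ext2 inode layout (12 direct pointers, 32-bit block pointers), the thresholds are easy to compute. A quick sketch, not specific to this test case:

```c
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12 /* direct block pointers per ext2 inode */

/* Largest file size (bytes) that needs no indirect blocks. */
uint64_t direct_limit(uint64_t block_size)
{
    return EXT2_NDIR_BLOCKS * block_size;
}

/* Largest file size reachable using the single indirect block as well;
 * an indirect block holds block_size / 4 32-bit block pointers. */
uint64_t single_indirect_limit(uint64_t block_size)
{
    return direct_limit(block_size) + (block_size / 4) * block_size;
}
```

With 1KB blocks, anything past 12 KB needs an indirect block and anything past 268 KB needs the double indirect block, so a multi-megabyte vmlinux is well into both.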

     
  • Logged In: YES
    user_id=595265

    I did another test today.

1. Untar the trigger archive onto an HFS volume, chmod 755 trigger.bash,
and run it. It completes and vmlinux is created.

2. Create a 256MB disk image and format it as ext2:

    $ hdiutil create -megabytes 256 -layout NONE ~/Desktop/e2tst256.dmg
    $ hdid -nomount ~/Desktop/e2tst256.dmg
    /dev/disk1
    $ newfs_fsck /dev/disk1
    # Note by default, this will create an fs with a 1KB block size

    3. cp -pR <HFS TRIGGER DIR> /Volumes/UNTITLED

    This copies the whole trigger dir, including the newly created vmlinux.

This scenario completes w/o problems, including the copy of vmlinux.
    This means that the write access pattern of the linker is causing the
    problem -- something specific to the way it creates the vmlinux file. So
    far, I've been unable to reproduce this bug with any other program.

    More debug info:

    It looks like the cause of this bug is a corruption of vp->v_clusters. I'm
    still not really sure though.

     
  • Logged In: YES
    user_id=595265

    A breakthrough! I found the why of this bug. The powerpc-750 linker
    creates a sparse file. It looks like it writes the exec header, then skips
    some # of KB and writes the rest of the file.

I wrote my own simple test case, and it causes the same problem
(although it is more likely to hang than kp). The hang/crash happens in the
same place: cluster_push() from VOP_CLOSE(). It doesn't happen in
VOP_WRITE() because cluster writes are delayed until close or sync.

    Now to trace down the how of this problem. We're a little closer to a fix.
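
That write pattern is easy to reproduce with a couple of syscalls. A user-space sketch (the path and sizes are illustrative; 17 bytes at offsets 0 and 2048 match the 2065-byte file used in the traces later in this thread):

```c
#include <fcntl.h>
#include <unistd.h>

/* Write a small header at offset 0, skip ahead, and write again --
 * roughly what the linker does -- leaving a hole in between.
 * Returns the resulting file size (2065), or -1 on error. */
long make_sparse(const char *path)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    const char buf[17] = "0123456789abcdef"; /* 16 chars + NUL = 17 bytes */
    long size = -1;
    if (pwrite(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
        pwrite(fd, buf, sizeof(buf), 2048) == (ssize_t)sizeof(buf))
        size = 2048 + (long)sizeof(buf);
    close(fd);
    return size;
}
```

On a 1KB-block filesystem this leaves block 1 (bytes 1024-2047) as a hole, and closing the file is what sends cluster_push() into the bad loop.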

     
  • Logged In: YES
    user_id=595265

    More info.. Any size sparse file will do. It does not have to extend into
    the indirect blocks to cause the problem.

     
  • Logged In: YES
    user_id=595265

    And some more info...

    I've been having problems getting gdb to stay connected to my PB so I
    can trace what's happening (10.3.4). Eventually, after a few steps, the
    screen on my PB just goes dark, and I start getting kdp connection errors
    in GDB.

    So, I switched to Jaguar in the hopes I could get a trace going. Not only
    did that work, but I found out the hang/panic does not occur on Jag
    (10.2.8)! Instead, the system "hangs" for a few seconds while
    cluster_push() (from VOP_CLOSE()) syncs the file, and then the close
    completes. Now this is still not normal behavior, but at least there is no
    permanent hang.

    I was also able to trace what was going on. It looks like cluster_io()
    continually calls VOP_CMAP() trying to map higher and higher offsets.
    And since we are holding the VNODE spin lock while this happens, the
    machine is effectively locked. But for some reason, Jaguar eventually
    stops this loop while Panther doesn't.

CMAP offsets for a sparse file with a handful of bytes (more than 1,
fewer than 25) written at offsets 0 and 2048 -- this gives a file a little
over 2K in size. But look at the offsets; they definitely get a lot bigger
than 2K.

    a_foffset = 0,
    a_size = 2560,
    /* Sparse block here */
    a_foffset = 1024,
    a_size = 1536,
    /* And then we jump way past 2K */
    a_foffset = 5120,
    a_size = 131072,
    a_foffset = 9216,
    a_size = 131072,
    a_foffset = 13312,
    a_size = 131072,
    a_foffset = 17408,
    a_size = 131072,
    ...
    a_foffset = 46080,
    a_size = 131072,
    /* Stopped trace here */

     
  • Logged In: YES
    user_id=595265

Looks like I spoke too soon when I said Jag did not hang. The permanent
hang on Jag happens during an unmount -- it hangs in ubc_umount(). This
still isn't a whole-system hang, but the umount process is stuck.

    0x11de3a80 0x90210 <switch_context+332>
    0x11de3ae0 0x34300 <thread_invoke+1104>
    0x11de3b40 0x344dc <thread_block_reason+212>
    0x11de3ba0 0x335c0 <thread_sleep_mutex+56>
    0x11de3bf0 0x58fb0 <vm_object_update+1668>
    0x11de3cd0 0x5874c <memory_object_lock_request+164>
    0x11de3d40 0x1f7064 <ubc_clean+244>
    0x11de3da0 0x1f6adc <ubc_umount+148>
    0x11de3e00 0xb7300 <dounmount+180>
    0x11de3e60 0xb7238 <unmount+216>
    0x11de3f10 0x20ff8c <unix_syscall+592>

     
  • Logged In: YES
    user_id=595265

Okay, I found the loop that is calling our VOP_CMAP() repeatedly. It's in
cluster_io(), xnu-517.7.7/bsd/vfs/vfs_cluster.c:465. The thing is, this
loop should break once our file size is exceeded, but it's not. So this
made me think the ubc size was not being updated, but breaking in
VOP_CLOSE() right before cluster_push() we see:

    Breakpoint 1, ext2_close (ap=0x11dbbdc0) at src/gnu/ext2fs/
    ext2_vnops.c:488
    488 if (vp->v_type != type || vp->v_id != vid
    (gdb) p *vp->v_ubcinfo
    $9 = {
    ui_pager = 0x28e4dd4,
    ui_control = 0x28b8240,
    ui_flags = 7,
    ui_vnode = 0x29d6438,
    ui_ucred = 0x0,
    ui_refcount = 2,
    ui_size = 2068,
    ui_mapped = 0,
    ui_owner = 0x0
    }
    (gdb) p vp->v_clen
    $11 = 1
    (gdb) p vp->v_un1
    $12 = {
    v_ilk = {
    ...
    },
    v_cl = {
    v_pad = 0,
    v_c = {{
    start_pg = 0,
    last_pg = 1
    }, {
    start_pg = 0,
    last_pg = 0
    }, {
    start_pg = 0,
    last_pg = 0
    }, {
    start_pg = 0,
    last_pg = 0
    }}
    }
    }

    Which shows us ui_size == ip->i_size. But, cluster_io() is still repeatedly
    calling VOP_CMAP() with offsets way past the file size. So, next, I added
    the following to VOP_CMAP:

if (ap->a_foffset > roundup(ip->i_size,
        (u_int64_t)EXT2_BLOCK_SIZE(fs)))
    return (EFBIG);

    This stopped the hangs in VOP_CLOSE(), but when we try to read back
    the file (with cat), we hang with:

    (gdb) bt
    #0 0x00090e88 in cswnovect ()
    #1 0x0009024c in switch_context (old=0x24e24bc, continuation=0,
    new=0x2631978) at /SourceCache/xnu/xnu-344.49/osfmk/ppc/pcb.c:248
    #2 0x000344dc in thread_block_reason (continuation=0, reason=0) at /
    SourceCache/xnu/xnu-344.49/osfmk/kern/sched_prim.c:1629
    #3 0x000335c0 in thread_sleep_mutex (event=0x898a280,
    mutex=0x26c99d4, interruptible=213884) at /SourceCache/xnu/xnu
    -344.49/osfmk/kern/sched_prim.c:680
    #4 0x00059e00 in memory_object_page_op (control=0x898a280,
    offset=0, ops=1073741840, phys_entry=0x11e63b70, flags=0x0) at /
    SourceCache/xnu/xnu-344.49/osfmk/vm/memory_object.c:1654
    #5 0x000aec04 in cluster_read_x (vp=0x29d6438, uio=0x11e63e40,
    filesize=2068, devblocksize=512, flags=564138624) at /SourceCache/
    xnu/xnu-344.49/bsd/vfs/vfs_cluster.c:2313
    #6 0x000ae82c in cluster_read (vp=0x11e63b70, uio=0x0,
    filesize=4611686087187536340, devblocksize=606216386,
    flags=606216386) at /SourceCache/xnu/xnu-344.49/bsd/vfs/
    vfs_cluster.c:2180
    #7 0x22548938 in ext2_read (ap=0x0) at src/gnu/ext2fs/
    ext2_readwrite.c:122
    #8 0x000be97c in vn_read (fp=0x0, uio=0x40000010, cred=0x0,
    flags=0, p=0x242220c2) at /SourceCache/xnu/xnu-344.49/bsd/sys/
    vnode_if.h:358

    So, after all of this, I'm still at a loss as to what is happening. Especially
    since UFS works fine (and the ext2 cluster code is basically the same).

     
  • Logged In: YES
    user_id=595265

    Okay, thanks to S. Vaughn (from SGI), I was able to figure out where
    this bug occurs. It's a bug in the kernel -- cluster_io() to be exact.
Basically, cluster_io() will wrap the unsigned var it uses to track the size
of the IO around zero when a sparse file is read or written on a filesystem
with a block size < PAGE_SIZE (4096 in all current Darwin releases). This
causes the continuous calls to VOP_CMAP(), and it can also cause data to
be lost during a write.

    I suspect this same bug would happen with Apple UFS if it allowed < 4KB
    block sizes, but it currently doesn't.

    I'm going to open a bug with Apple and see what they say.

    Here are the gory details:

    I made an internal copy of cluster_io(), and put a bp on VOP_CLOSE()
    and traced into my cluster_io() copy.

    On a 1KB block filesystem, I wrote 17 bytes at offset 0, and then the
    same 17 bytes at offset 2048. This created a 2065 byte file and left a 1
    block hole in the file:

    i_size = 2065
    i_db = {547, 0, 548, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    i_ib = {0, 0, 0},

    cluster_push(vp=0x29ae570)
    cluster_try_push (vp=0x29ae570, EOF=2065, can_delay=0, push_all=1)
    cluster_push_x (vp=0x29ae570, EOF=2065, first=0, last=1,
    can_delay=0)
    cluster_io (vp=0x29ae570, upl=0x29a4d80, upl_offset=0, f_offset=0,
    non_rounded_size=2065, devblocksize=512, flags=38, real_bp=0x0,
    iostate=0x0)
    // flags = CL_COMMIT | CL_AGE | CL_ASYNC

    388 size = (non_rounded_size + (devblocksize - 1)) & ~(devblocksize
    - 1);

    (gdb) p size
    $36 = 2560

    In the while loop on line 443 (vfs_cluster.c - xnu-517.7.7)

    The first time through the loop, VOP_CMAP() replies with:

    (gdb) p blkno
    $40 = 1094
    (gdb) p io_size
    $41 = 1024

    And we get to the end of the loop with:

    (gdb) p upl_offset
    $55 = 1024
    (gdb) p f_offset
    $56 = 1024
    (gdb) p size
    $57 = 1536

    This is all fine.

    Now the second time through the loop, we are on the sparse hole and
    VOP_CMAP() returns:

    (gdb) p blkno
    $61 = -1
    (gdb) p io_size
    $62 = 1024

    So now the following is true, and we enter that condition block:

    if ( (!(flags & CL_READ) && (long)blkno == -1) || io_size == 0) {
    ....
    482 upl_offset += PAGE_SIZE_64;
    (gdb)
    483 f_offset += PAGE_SIZE_64;
    (gdb)
    484 size -= PAGE_SIZE_64;
    (gdb)
    485 continue;
    }

And here's where the trouble happens:

    (gdb) p upl_offset
    $66 = 5120
    (gdb) p f_offset
    $67 = 5120
    (gdb) p size
    $68 = 4294964736

    So not only does the size overflow, but the block after the hole is
    skipped!

    If io_size was used instead, then everything would be correct:

    upl_offset += io_size == 2048
    f_offset += io_size == 2048
    size -= io_size == 512

    In fact, this is what is done in the READ case:

    510 if ((flags & CL_READ) && (long)blkno == -1) {
    ...
    575 upl_offset += io_size;
    576 f_offset += io_size;
    577 size -= io_size;
    ...
    }
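
The wraparound can be reproduced in isolation. A minimal sketch, assuming the same 32-bit unsigned size counter and 4KB PAGE_SIZE as in the trace above (function names are mine, for illustration):

```c
#include <stdint.h>

#define PAGE_SIZE_64 4096u

/* What the buggy write path does when it hits a hole: advance by a
 * whole page, even though io_size said only 1024 bytes were covered. */
uint32_t hole_step_buggy(uint32_t size)
{
    return size - PAGE_SIZE_64; /* wraps around zero when size < 4096 */
}

/* What the read path (lines 575-577) does instead: advance by io_size. */
uint32_t hole_step_fixed(uint32_t size, uint32_t io_size)
{
    return size - io_size;
}
```

Starting from the traced value size == 1536, the buggy step produces exactly the 4294964736 seen in gdb, while stepping by io_size == 1024 leaves the expected 512.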

     
    • status: open --> open-fixed
     
  • Logged In: YES
    user_id=595265

Okay, I found a workaround for this bug. We no longer support sparse
files on filesystems with a block size < 4KB. When such a file is closed,
any holes in the file have disk blocks allocated to them. This wastes some
space, but it avoids the panics and system hangs. 1.3b2 contains this
workaround, and with that version, the linker in the trigger case completes.
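
The idea behind the workaround can be sketched in user space (an illustration of the concept only -- the real change lives in the kernel close path, and the helper names and sizes here are hypothetical): writing explicit zeros over a hole makes the filesystem allocate real blocks for it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Back the byte range [off, off + len) of fd with explicit zeros so
 * real disk blocks are allocated instead of a hole. */
int fill_hole(int fd, off_t off, size_t len)
{
    char zeros[512];
    memset(zeros, 0, sizeof(zeros));
    while (len > 0) {
        size_t n = len < sizeof(zeros) ? len : sizeof(zeros);
        if (pwrite(fd, zeros, n, off) != (ssize_t)n)
            return -1;
        off += (off_t)n;
        len -= n;
    }
    return 0;
}

/* Demo: create a 2065-byte file that is one big hole, fill the hole,
 * and return the final size. The size must not change -- only the
 * backing blocks do. Returns -1 on error. */
long demo_fill(const char *path)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    long size = -1;
    struct stat st;
    if (ftruncate(fd, 2065) == 0 &&
        fill_hole(fd, 0, 2065) == 0 &&
        fstat(fd, &st) == 0)
        size = (long)st.st_size;
    close(fd);
    return size;
}
```

This trades a little disk space for never handing cluster_io() a hole it can mis-step over.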

     
    • status: open-fixed --> closed-fixed