
#147 deadlock on loop mounted fs

Milestone: default
Status: closed-fixed
Owner: Roger Tsang
Labels: Filesystem (49)
Priority: 5
Updated: 2010-03-13
Created: 2007-10-11
Creator: John Hughes
Private: No

1. Make a sparse file

perl -e 'open BIGFILE, ">BIGFILE"; seek BIGFILE, 1024 * 1024 * 1024, 0; print BIGFILE "big"'
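If perl is not to hand, the same kind of sparse file can be made with `truncate` from GNU coreutils (a sketch; the 1 GiB size matches the one-liner above):

```shell
# Create a sparse 1 GiB file: the apparent size is 1 GiB,
# but almost no disk blocks are allocated until data is written.
truncate -s 1G BIGFILE

# Compare apparent size vs. actual allocation (GNU du).
du --apparent-size --block-size=1 BIGFILE
du --block-size=1 BIGFILE
```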

2. Make a filesystem on it

losetup /dev/loop/0 BIGFILE
mkfs -t ext3 /dev/loop/0

3. Mount it

mount -t ext3 /dev/loop/0 /mnt

4. Write a lot of files to it

cd /mnt
dump 0f - / | restore rf -

Eventually the node writing to the loopback-mounted filesystem deadlocks. It is still up as far as the cluster is concerned, but any attempt to start a process on it blocks.
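The hung tasks sit in uninterruptible sleep (state D in the kdb listings later in this ticket). A quick way to spot candidates from a shell, assuming a standard procps `ps`:

```shell
# List tasks stuck in uninterruptible sleep (state D).
# A task that stays in D across repeated runs is a deadlock suspect.
ps -eo pid,stat,comm | awk 'NR > 1 && $2 ~ /^D/'
```

On a healthy system this prints little or nothing; on the wedged node the loop0, kjournald, and restore processes from the traces below would show up here.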

Discussion

  • John Hughes
    2007-10-11

    Logged In: YES
    user_id=166336
    Originator: YES

    This is with the 2.6.11 kernel

     
  • John Hughes
    2007-10-11

    • labels: --> Filesystem
    • milestone: --> 663197
     
  • Roger Tsang
    2007-10-12

    Logged In: YES
    user_id=1246761
    Originator: NO

    Sounds like [ 686748 ] Filesystem stacking deadlock.

     
  • John Hughes
    2007-10-12

    Logged In: YES
    user_id=166336
    Originator: YES

    Here's some debugging. I've got to the point where the "restore" process on node 1 seems hung. On node 2 I try an "onnode 1 pwd". It hangs.

On node 1:

    Entering kdb (current=0xc0502bc0, pid 0) on processor 0 due to Keyboard Entry
    [0]kdb> ps
    1 idle process (state I) and 50 sleeping system daemon (state M) processes suppressed
    Task Addr Pid Parent [*] cpu State Thread Command

    0xcf82a5d0 5 2 0 0 R 0xcf82a7b0 events/0
    0xcf68b990 117 11 0 0 D 0xcf68bb70 pdflush
    0xcf68a310 121 2 0 0 D 0xcf68a4f0 cfs_async
    0xcf6b99b0 122 2 0 0 D 0xcf6b9b90 cfs_async
    0xcf6b9410 123 2 0 0 D 0xcf6b95f0 cfs_async
    0xcf6b8e70 124 2 0 0 D 0xcf6b9050 cfs_async
    0xcf6b88d0 125 2 0 0 D 0xcf6b8ab0 cfs_async
    0xcf6b8330 126 2 0 0 D 0xcf6b8510 cfs_async
    0xcf6c99d0 127 2 0 0 D 0xcf6c9bb0 cfs_async
    0xcf6c9430 128 2 0 0 D 0xcf6c9610 cfs_async
    0xce92b730 1 0 0 0 D 0xce92b910 init
    [...]
    0xce90b150 67763 2 0 0 D 0xce90b330 loop0
    0xce90d170 67820 2 0 0 D 0xce90d350 kjournald
    0xce8f96d0 67822 67636 0 0 S 0xce8f98b0 dump
    0xce8f8b90 67823 67636 0 0 D 0xce8f8d70 restore
    0xcf13f970 67824 67822 0 0 S 0xcf13fb50 dump
    0xcf13f3d0 67825 67824 0 0 S 0xcf13f5b0 dump
    0xcf7861f0 67826 67824 0 0 S 0xcf7863d0 dump
    0xcf786790 67827 67824 0 0 S 0xcf786970 dump
    0xcf47d9b0 132773 2 0 0 D 0xcf47db90 onnode
    [0]kdb> btp 132773
    Stack traceback for pid 132773
    0xcf47d9b0 132773 2 0 0 D 0xcf47db90 onnode
    EBP EIP Function (args)
    0xce879ba8 0xc046c2e6 schedule+0x3a6 (0xce879c10)
    0xce879bb4 0xc046d348 io_schedule+0x28 (0xc1271c70)
    0xce879bc0 0xc014aed5 sync_page+0x45 (0xc10c37f8, 0x0, 0xc014ae90, 0xcf47d9b0, 0xce879c10)
    0xce879be0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc10c37f8, 0xc10c37f8, 0x0, 0x0)
    0xce879c3c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xce7c31a0, 0x0, 0x1)
    0xce879cc4 0xc014beeb do_generic_mapping_read+0x3db (0xce88ca00, 0xce7c31f0, 0xce7c31a0, 0xce879e00, 0xce879d00)
    0xce879d1c 0xc014c3ed __generic_file_aio_read+0x1ed (0xce879dc4, 0xce879d34, 0x1, 0xce879e00, 0xcf06d600)
    0xce879d48 0xc014c473 generic_file_aio_read+0x53 (0xce879dc4, 0xcf06d600, 0x80, 0x0, 0x0)
    0xce879d84 0xc028375a __cfs_file_read+0xaa (0xce879dc4, 0x0, 0xcf06d600, 0x80, 0xce879da0)
    0xce879da8 0xc0283828 cfs_file_aio_read+0x38 (0xce879dc4, 0xcf06d600, 0x80, 0x0, 0x0)
    0xce879e50 0xc016c3b3 do_sync_read+0xa3 (0xce7c31a0, 0xcf06d600, 0x80, 0xce879e8c, 0xce879000)
    0xce879e74 0xc016c490 vfs_read+0xb0 (0xce7c31a0, 0xcf06d600, 0x80, 0xce879e8c, 0x0)
    0xce879e9c 0xc017895a kernel_read+0x4a (0xce7c31a0, 0x0, 0xcf06d600, 0x80, 0xcf06d600)
    0xce879ec0 0xc017946a prepare_binprm+0xca (0xcf06d600, 0x7fff, 0xc13b4080, 0x0, 0x0)
    0xce879eec 0xc0179a16 ssi_do_execve+0x1a6 (0xcf012920, 0xce6f8800, 0xcf6aa400, 0xce879fa0, 0x0)
    0xce879f78 0xc0245c3a rexecve_server+0xea (0xcf50e000, 0xcf47d9b0, 0xcf012920, 0xce6f8800, 0xcf6aa400)
    0xce879fec 0xc02454f5 rexecve_server_setup+0x55
    0xc01023a5 kernel_thread_helper+0x5
    [0]kdb> btp 67823
    Stack traceback for pid 67823
    0xce8f8b90 67823 67636 0 0 D 0xce8f8d70 restore
    EBP EIP Function (args)
    0xca2fcea0 0xc046c2e6 schedule+0x3a6 (0x0, 0xce8f8b90, 0xc013f0a0, 0xca2fced4, 0xca2fced4)
    0xca2fcef4 0xc029f3ba cfs_wait_on_request+0x7a (0xc9a8c200, 0xca2fcf14, 0x0, 0x1, 0x0)
    0xca2fcf24 0xc0285a9e cfs_wait_on_requests+0x8e (0xccb63be4, 0x0, 0x0, 0x0, 0xce7c3600)
    0xca2fcf48 0xc0286f66 cfs_sync_inode+0x76 (0xccb63be4, 0x0, 0x0, 0x2, 0x0)
    0xca2fcf80 0xc0283653 cfs_file_flush+0x93 (0xce7c3600, 0x81a4, 0xccdef200, 0x5, 0xccdef204)
    0xca2fcf9c 0xc016bb3c filp_close+0x6c (0xce7c3600, 0xccdef200, 0xce7c3600, 0x5, 0x0)
    0xca2fcfbc 0xc016bbce sys_close+0x6e
    0xc0105a3b syscall_call+0x7
    [0]kdb>
    [0]kdb> btp 67763
    Stack traceback for pid 67763
    0xce90b150 67763 2 0 0 D 0xce90b330 loop0
    EBP EIP Function (args)
    0xca488db8 0xc046c2e6 schedule+0x3a6 (0xca488e20)
    0xca488dc4 0xc046d348 io_schedule+0x28 (0xc12711e0)
    0xca488dd0 0xc014aed5 sync_page+0x45 (0xc11d6be0, 0x0, 0xc014ae90, 0xce90b150, 0xca488e20)
    0xca488df0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc11d6be0, 0xc11d6be0, 0x0, 0x0)
    0xca488e4c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xcd6ca600, 0x38002, 0x1)
    0xca488ed4 0xc014beeb do_generic_mapping_read+0x3db (0xcb632f40, 0xcd6ca650, 0xcd6ca600, 0xca488f58, 0xca488ef4)
    0xca488f04 0xc014c61b generic_file_sendfile+0x5b (0xcd6ca600, 0xca488f58, 0x1000, 0xd08f15d0, 0xca488f60)
    0xca488f3c 0xc02838bd cfs_file_sendfile+0x8d (0xcd6ca600, 0xca488f58, 0x1000, 0xd08f15d0, 0xca488f60)
    0xca488f74 0xd08f16fc [loop]do_lo_receive+0x5c (0xc9353000, 0xc4279630, 0x1000, 0x38002000, 0x0)
    0xca488fa4 0xd08f176e [loop]lo_receive+0x5e (0xc9353000, 0xc1ed33e0, 0x1000, 0x38002000, 0x0)
    0xca488fc8 0xd08f17eb [loop]do_bio_filebacked+0x4b (0xc9353000, 0xc1ed33e0, 0x0, 0xc9353138, 0xd08f1a60)
    0xca488fec 0xd08f1b3b [loop]loop_thread+0xdb
    0xc01023a5 kernel_thread_helper+0x5

     
  • Logged In: NO

    Still looks the same as the old bug... This time it is stacked generic_file_writev().

    cfs_async (has i_sem)
    loop0
    pdflush
    kjournald
    cfs_async (waiting for i_sem)

     
  • Roger Tsang
    2007-10-20

    Logged In: YES
    user_id=1246761
    Originator: NO

    Does 2.6.10-ssi run into this bug?

     
  • Roger Tsang
    2007-10-20

    • milestone: 663197 --> 767946
     
  • Roger Tsang
    2007-10-21

    Logged In: YES
    user_id=1246761
    Originator: NO

    It looks like CFS ran out of memory. Try the latest checkin of kernel/cluster/ssi/cfs code that re-enables commit for soft mounts.

     
  • Roger Tsang
    2008-03-17

    Logged In: YES
    user_id=1246761
    Originator: NO

    Should be fixed in 2.0.0pre3...

     
  • Roger Tsang
    2008-06-14

    • assigned_to: nobody --> rogertsang
     
  • Roger Tsang
    2008-06-19

     
  • Roger Tsang
    2008-06-19

    Logged In: YES
    user_id=1246761
    Originator: NO

    Try the attached patch.

More work would be needed to pass a flag to kernel space so that CFS could use a different congestion bit for the CFS-on-loopback case. However, that proposed solution only works if you are not going to CFS-mount another loopback on top of a CFS mount on loopback on CFS. So the simple fix is this patch: loopback becomes a standard mount.
    File Added: util-linux.1811510.patch
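One way to check the patch's effect after remounting (an illustrative check, not part of the patch itself): `/proc/mounts` reports the filesystem type for every mount, so the loop device's mount point should now be listed as plain ext3 rather than a CFS mount.

```shell
# Show the filesystem type of the loopback mount point.
# With the patch applied, /mnt should be listed as ext3, not cfs.
grep ' /mnt ' /proc/mounts || echo "/mnt not currently mounted"
```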

     
  • Roger Tsang
    2008-07-04

    • status: open --> open-fixed
     
  • Roger Tsang
    2008-10-10

    checked-in

     
  • Roger Tsang
    2009-03-22

    • status: open-fixed --> open-remind
     
  • Roger Tsang
    2009-03-22

Testing new code that associates a separate BDI (backing device info) with each mount. This should allow us to support recursively stacked CFS.

     
  • Roger Tsang
    2009-03-22

    • milestone: 767946 --> default
     
  • Roger Tsang
    2009-03-24

    checked-in but needs verification

     
  • Roger Tsang
    2009-03-24

    • status: open-remind --> open-accepted
     
  • Roger Tsang
    2009-10-27

    • status: open-accepted --> open-fixed
     
  • Roger Tsang
    2010-03-13

    • status: open-fixed --> closed-fixed