[SSI-devel] [ ssic-linux-Bugs-1811510 ] deadlock on loop mounted fs
From: SourceForge.net <no...@so...> - 2008-07-04 03:22:25
Bugs item #1811510, was opened at 2007-10-11 08:22
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1811510&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Filesystem
Group: v1.9.3
Status: Open
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Roger Tsang (rogertsang)
Summary: deadlock on loop mounted fs

Initial Comment:
1. Make a sparse file:

perl -e 'open BIGFILE, ">BIGFILE"; seek BIGFILE, 1024 * 1024 * 1024, 0; print BIGFILE "big"'

2. Make a filesystem on it:

losetup /dev/loop/0 BIGFILE
mkfs -t ext3 /dev/loop/0

3. Mount it:

mount -t ext3 /dev/loop/0 /mnt

4. Write a lot of files to it:

cd /mnt
dump 0f - / | restore rf -

Eventually the node where we are writing to the loopback-mounted fs gets
deadlocked. It's still up as far as the cluster is concerned, but any attempt
to start a process on it blocks.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-06-19 02:32

Message:
Logged In: YES
user_id=1246761
Originator: NO

Try the attached patch.

A fuller fix would need more work: passing a flag to kernel space so that CFS
uses a different congestion bit when CFS sits on a loopback device. However,
that approach only works as long as you do not CFS-mount another loopback
device on top of a CFS mount on loopback on CFS. So the simple fix is this
patch: loopback becomes a standard mount.

File Added: util-linux.1811510.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-03-16 20:35

Message:
Logged In: YES
user_id=1246761
Originator: NO

Should be fixed in 2.0.0pre3...
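Step 1 of the initial comment builds the sparse backing file with a perl one-liner that seeks to offset 1 GiB and writes three bytes. A minimal shell sketch of the same step, assuming GNU coreutils `truncate` is available, gives a file with the same apparent size (1 GiB + 3 bytes) whose first 1 GiB is a hole on filesystems that support sparse files:

```shell
#!/bin/sh
# Sketch of step 1 without perl (assumes GNU coreutils truncate).
# Result: apparent size 1 GiB + 3 bytes, the first 1 GiB left as a hole.
set -e
size=$((1024 * 1024 * 1024))

# Set the file length to exactly 1 GiB; the extended range is not allocated.
truncate -s "$size" BIGFILE

# Append the same three bytes the perl one-liner prints at offset 1 GiB.
printf 'big' >> BIGFILE
```

`du --apparent-size -h BIGFILE` reports roughly 1 GiB, while plain `du -h BIGFILE` reports only the blocks actually allocated; the file can then be attached with `losetup` as in step 2.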
----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 21:58

Message:
Logged In: YES
user_id=1246761
Originator: NO

It looks like CFS ran out of memory. Try the latest checkin of the
kernel/cluster/ssi/cfs code, which re-enables commit for soft mounts.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 14:42

Message:
Logged In: YES
user_id=1246761
Originator: NO

Does 2.6.10-ssi run into this bug?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2007-10-16 10:33

Message:
Logged In: NO

Still looks the same as the old bug... This time it is a stacked
generic_file_writev():

cfs_async (has i_sem)
loop0
pdflush
kjournald
cfs_async (waiting for i_sem)

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-12 07:36

Message:
Logged In: YES
user_id=166336
Originator: YES

Here's some debugging.

I've got to the point where the "restore" process on node 1 seems hung.
On node 2 I try an "onnode 1 pwd". It hangs.

On node 1:

Entering kdb (current=0xc0502bc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> ps
1 idle process (state I) and
50 sleeping system daemon (state M) processes suppressed
Task Addr       Pid   Parent [*] cpu State Thread     Command
0xcf82a5d0        5        2  0    0   R  0xcf82a7b0  events/0
0xcf68b990      117       11  0    0   D  0xcf68bb70  pdflush
0xcf68a310      121        2  0    0   D  0xcf68a4f0  cfs_async
0xcf6b99b0      122        2  0    0   D  0xcf6b9b90  cfs_async
0xcf6b9410      123        2  0    0   D  0xcf6b95f0  cfs_async
0xcf6b8e70      124        2  0    0   D  0xcf6b9050  cfs_async
0xcf6b88d0      125        2  0    0   D  0xcf6b8ab0  cfs_async
0xcf6b8330      126        2  0    0   D  0xcf6b8510  cfs_async
0xcf6c99d0      127        2  0    0   D  0xcf6c9bb0  cfs_async
0xcf6c9430      128        2  0    0   D  0xcf6c9610  cfs_async
0xce92b730        1        0  0    0   D  0xce92b910  init
[...]
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
0xce90d170    67820        2  0    0   D  0xce90d350  kjournald
0xce8f96d0    67822    67636  0    0   S  0xce8f98b0  dump
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
0xcf13f970    67824    67822  0    0   S  0xcf13fb50  dump
0xcf13f3d0    67825    67824  0    0   S  0xcf13f5b0  dump
0xcf7861f0    67826    67824  0    0   S  0xcf7863d0  dump
0xcf786790    67827    67824  0    0   S  0xcf786970  dump
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
[0]kdb> btp 132773
Stack traceback for pid 132773
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
EBP        EIP        Function (args)
0xce879ba8 0xc046c2e6 schedule+0x3a6 (0xce879c10)
0xce879bb4 0xc046d348 io_schedule+0x28 (0xc1271c70)
0xce879bc0 0xc014aed5 sync_page+0x45 (0xc10c37f8, 0x0, 0xc014ae90, 0xcf47d9b0, 0xce879c10)
0xce879be0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc10c37f8, 0xc10c37f8, 0x0, 0x0)
0xce879c3c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xce7c31a0, 0x0, 0x1)
0xce879cc4 0xc014beeb do_generic_mapping_read+0x3db (0xce88ca00, 0xce7c31f0, 0xce7c31a0, 0xce879e00, 0xce879d00)
0xce879d1c 0xc014c3ed __generic_file_aio_read+0x1ed (0xce879dc4, 0xce879d34, 0x1, 0xce879e00, 0xcf06d600)
0xce879d48 0xc014c473 generic_file_aio_read+0x53 (0xce879dc4, 0xcf06d600, 0x80, 0x0, 0x0)
0xce879d84 0xc028375a __cfs_file_read+0xaa (0xce879dc4, 0x0, 0xcf06d600, 0x80, 0xce879da0)
0xce879da8 0xc0283828 cfs_file_aio_read+0x38 (0xce879dc4, 0xcf06d600, 0x80, 0x0, 0x0)
0xce879e50 0xc016c3b3 do_sync_read+0xa3 (0xce7c31a0, 0xcf06d600, 0x80, 0xce879e8c, 0xce879000)
0xce879e74 0xc016c490 vfs_read+0xb0 (0xce7c31a0, 0xcf06d600, 0x80, 0xce879e8c, 0x0)
0xce879e9c 0xc017895a kernel_read+0x4a (0xce7c31a0, 0x0, 0xcf06d600, 0x80, 0xcf06d600)
0xce879ec0 0xc017946a prepare_binprm+0xca (0xcf06d600, 0x7fff, 0xc13b4080, 0x0, 0x0)
0xce879eec 0xc0179a16 ssi_do_execve+0x1a6 (0xcf012920, 0xce6f8800, 0xcf6aa400, 0xce879fa0, 0x0)
0xce879f78 0xc0245c3a rexecve_server+0xea (0xcf50e000, 0xcf47d9b0, 0xcf012920, 0xce6f8800, 0xcf6aa400)
0xce879fec 0xc02454f5 rexecve_server_setup+0x55
           0xc01023a5 kernel_thread_helper+0x5
[0]kdb> btp 67823
Stack traceback for pid 67823
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
EBP        EIP        Function (args)
0xca2fcea0 0xc046c2e6 schedule+0x3a6 (0x0, 0xce8f8b90, 0xc013f0a0, 0xca2fced4, 0xca2fced4)
0xca2fcef4 0xc029f3ba cfs_wait_on_request+0x7a (0xc9a8c200, 0xca2fcf14, 0x0, 0x1, 0x0)
0xca2fcf24 0xc0285a9e cfs_wait_on_requests+0x8e (0xccb63be4, 0x0, 0x0, 0x0, 0xce7c3600)
0xca2fcf48 0xc0286f66 cfs_sync_inode+0x76 (0xccb63be4, 0x0, 0x0, 0x2, 0x0)
0xca2fcf80 0xc0283653 cfs_file_flush+0x93 (0xce7c3600, 0x81a4, 0xccdef200, 0x5, 0xccdef204)
0xca2fcf9c 0xc016bb3c filp_close+0x6c (0xce7c3600, 0xccdef200, 0xce7c3600, 0x5, 0x0)
0xca2fcfbc 0xc016bbce sys_close+0x6e
           0xc0105a3b syscall_call+0x7
[0]kdb>
[0]kdb> btp 67763
Stack traceback for pid 67763
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
EBP        EIP        Function (args)
0xca488db8 0xc046c2e6 schedule+0x3a6 (0xca488e20)
0xca488dc4 0xc046d348 io_schedule+0x28 (0xc12711e0)
0xca488dd0 0xc014aed5 sync_page+0x45 (0xc11d6be0, 0x0, 0xc014ae90, 0xce90b150, 0xca488e20)
0xca488df0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc11d6be0, 0xc11d6be0, 0x0, 0x0)
0xca488e4c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xcd6ca600, 0x38002, 0x1)
0xca488ed4 0xc014beeb do_generic_mapping_read+0x3db (0xcb632f40, 0xcd6ca650, 0xcd6ca600, 0xca488f58, 0xca488ef4)
0xca488f04 0xc014c61b generic_file_sendfile+0x5b (0xcd6ca600, 0xca488f58, 0x1000, 0xd08f15d0, 0xca488f60)
0xca488f3c 0xc02838bd cfs_file_sendfile+0x8d (0xcd6ca600, 0xca488f58, 0x1000, 0xd08f15d0, 0xca488f60)
0xca488f74 0xd08f16fc [loop]do_lo_receive+0x5c (0xc9353000, 0xc4279630, 0x1000, 0x38002000, 0x0)
0xca488fa4 0xd08f176e [loop]lo_receive+0x5e (0xc9353000, 0xc1ed33e0, 0x1000, 0x38002000, 0x0)
0xca488fc8 0xd08f17eb [loop]do_bio_filebacked+0x4b (0xc9353000, 0xc1ed33e0, 0x0, 0xc9353138, 0xd08f1a60)
0xca488fec 0xd08f1b3b [loop]loop_thread+0xdb
           0xc01023a5 kernel_thread_helper+0x5

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-11 22:21

Message:
Logged In: YES
user_id=1246761
Originator: NO

Sounds like [ 686748 ] Filesystem stacking deadlock.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-11 08:22

Message:
Logged In: YES
user_id=166336
Originator: YES

This is with the 2.6.11 kernel.

----------------------------------------------------------------------
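The 2007-10-16 comment lists the stacked chain cfs_async (holding i_sem) -> loop0 -> pdflush -> kjournald -> cfs_async (waiting for i_sem). If each step is read as "cannot proceed until the next one does" (an interpretation; the comment does not spell out the edges), the chain closes on itself, which is the circular wait that makes a deadlock. A small shell sketch can make that check mechanical with the standard tsort(1) utility, which prints "input contains a loop" and exits non-zero when its input pairs contain a cycle. The helper name has_cycle is hypothetical:

```shell
#!/bin/sh
# has_cycle (hypothetical helper): reads "A B" pairs on stdin, read here as
# "A cannot proceed until B does", and succeeds if the pairs contain a cycle.
# tsort(1) reports "input contains a loop" on stderr and exits non-zero
# when the pairs cannot be put in order, i.e. when a cycle exists.
has_cycle() {
    ! tsort >/dev/null 2>&1
}

# Chain from the 2007-10-16 comment. Both cfs_async entries are folded into
# one node, because the last one waits for the i_sem the first one holds.
if printf '%s\n' \
    'cfs_async loop0' \
    'loop0 pdflush' \
    'pdflush kjournald' \
    'kjournald cfs_async' | has_cycle
then
    echo "circular wait: none of these threads can make progress"
else
    echo "no cycle in the wait chain"
fi
```

Dropping the last pair (so kjournald no longer waits on the i_sem holder) lets tsort order the remaining pairs, and the script then reports no cycle.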