Re: [Jfs-discussion] Strange problem with JFS
Brought to you by:
blaschke-oss,
shaggyk
From: Sonny R. <so...@bu...> - 2004-12-20 12:48:55
|
On Mon, Dec 20, 2004 at 03:20:58AM -0500, Sonny Rao wrote: > On Sat, Dec 18, 2004 at 02:49:38PM -0600, Jon Nelson wrote: > > > > > > I should note that if I crack open another terminal and strace the find, > > this is what I get: > > > > > > open(".", O_RDONLY|O_LARGEFILE) = 5 > > fchdir(5) = 0 > > lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 > > lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 > > chdir(".") = 0 > > lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 > > lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 > > fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0 > > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > > 0) = 0x40252000 > > write(1, ".\n", 2. > > ) = 2 > > open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 6 > > fstat64(6, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 > > fcntl64(6, F_SETFD, FD_CLOEXEC) = 0 > > getdents(6, > > ^^^^^ hangs here. > > Yeah, there appears to be a deadlock somewhere in the directory > handling code. I find that it's not always reproducible on my laptop > depending on how much memory is free, other stuff going on, etc. But > I was able to trigger it once by simply running fstest and a find > concurrently, then killing fstest and running ls on the directory > where fstest was running. This message is mostly for Shaggy, but here it is anyway (kinda long): Ok I have some suspicious looking backtraces now where some threads are sleeping in jfs_unlink (which holds the directory i_sem I think) which hold the i_sem (from sys_mkdir). Overall of the 100 fstest threads, here's numbers for what syscall each was running: unlink - 4 pwrite64 - 24 close - 7 open/create - 3 mkdir - 1 unknown - 60 (error path, blocked on namei semaphore, probably JFS namei.c ? ) Here's some more backtraces: pdflusher writing back inodes: pdflush D C0348640 0 59 3 61 58 (L-TLB) c1717e2c 00000046 dfeb0b90 c0348640 c1717e30 dcaf8a44 c1717e20 00000000 00000010 c1717f34 db127260 00000000 e19f81c0 000f424a dfeb0d38 dcaf8800 00000282 c1716000 dfeb0b90 c026d7d7 dcaf8808 00000001 dfeb0b90 c0118714 Call Trace: [<c026d7d7>] __down+0x8b/0xfd [<c0118714>] default_wake_function+0x0/0x12 [<c026d984>] __down_failed+0x8/0xc [<e096360a>] .text.lock.inode+0x5/0x9b [jfs] [<e0962e4c>] jfs_write_inode+0x4c/0x60 [jfs] [<c0170e43>] write_inode+0x46/0x48 [<c017102b>] __sync_single_inode+0x1e6/0x202 [<c0171263>] sync_sb_inodes+0x169/0x282 [<c0171462>] writeback_inodes+0xe6/0xf2 [<c01393c8>] wb_kupdate+0x99/0x10e [<c0139d77>] __pdflush+0xcb/0x1c4 [<c0139e70>] pdflush+0x0/0x2c [<c0139e98>] pdflush+0x28/0x2c [<c013932f>] wb_kupdate+0x0/0x10e [<c0139e70>] pdflush+0x0/0x2c [<c012d579>] kthread+0xa5/0xab [<c012d4d4>] kthread+0x0/0xab [<c010425d>] kernel_thread_helper+0x5/0xb jfs kernel daemon threads: jfsIO S C0348640 0 1801 1 1802 1795 (L-TLB) deeadf90 00000046 dfaed770 c0348640 dfb619c0 00000050 00000000 df2cdd40 df2cdd40 df75d5c0 d809d780 00000000 0e3514c0 000f424b dfaed918 deeac000 deeac000 deeadfc0 ffffe000 e097faa5 df2cdd40 dfaed770 00000000 dfaed770 Call Trace: [<e097faa5>] jfsIOWait+0x125/0x160 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0105d26>] ret_from_fork+0x6/0x14 [<c0118714>] default_wake_function+0x0/0x12 [<e097f980>] jfsIOWait+0x0/0x160 [jfs] [<c010425d>] kernel_thread_helper+0x5/0xb jfsCommit D C0348640 0 1802 1 1803 1801 (L-TLB) de429ee0 00000046 dee18800 c0348640 df7dc000 dec76234 00004000 e09357e4 de428000 00270284 dbe060f0 00000000 f6e6e500 000f424a dee189a8 dec7607c de429ef4 dee18800 dec76234 c026e3cd deeaa000 00001bcc dec76080 dec76080 Call Trace: [<c026e3cd>] rwsem_down_read_failed+0x8f/0x17c [<e097085d>] .text.lock.jfs_imap+0x3ff/0x422 [jfs] [<e09824de>] txUpdateMap+0xae/0x250 [jfs] [<e09808d7>] txEnd+0xd7/0x140 [jfs] [<e0982d00>] txLazyCommit+0x20/0xe0 [jfs] [<e0982f78>] jfs_lazycommit+0x1b8/0x1e0 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0105d26>] ret_from_fork+0x6/0x14 [<c0118714>] default_wake_function+0x0/0x12 [<e0982dc0>] jfs_lazycommit+0x0/0x1e0 [jfs] [<c010425d>] kernel_thread_helper+0x5/0xb jfsSync D C0348640 0 1803 1 2174 1802 (L-TLB) de94bdd8 00000046 dee198b0 c0348640 def1d72c 00000008 c01f1c3a def1d72c d808f600 000f424b dbe060f0 00000000 12707480 000f424b dee19a58 de94a000 de94bde8 c0118714 de94be08 e097fe87 00000000 00000000 00000000 dee198b0 Call Trace: [<c01f1c3a>] generic_make_request+0x15e/0x1de [<c0118714>] default_wake_function+0x0/0x12 [<e097fe87>] txLockAlloc+0xa7/0x160 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<e0980bb5>] txLock+0x275/0x470 [jfs] [<e097f5b1>] lbmStartIO+0xb1/0xe0 [jfs] [<e097f46f>] lbmWrite+0x10f/0x150 [jfs] [<e097c99e>] __get_metapage+0x5e/0x3d0 [jfs] [<e097dcbd>] lmGCwrite+0xdd/0xf0 [jfs] [<e096ce50>] diWrite+0x190/0x5f0 [jfs] [<e0981473>] txCommit+0x1b3/0x320 [jfs] [<e098083b>] txEnd+0x3b/0x140 [jfs] [<e098351f>] jfs_sync+0x21f/0x2c0 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0105d26>] ret_from_fork+0x6/0x14 [<c0118714>] default_wake_function+0x0/0x12 [<e0983300>] jfs_sync+0x0/0x2c0 [jfs] [<c010425d>] kernel_thread_helper+0x5/0xb (some of the 100) fstest threads: fstest D C0348640 0 4565 4561 4566 4564 (NOTLB) db6a7e8c 00000082 d9acfa30 c0348640 da11a6e0 c0167a32 da11a6e0 c0356cb0 db6a7eac 00000000 dbe07730 00000000 d8255c00 000f424a d9acfbd8 db6a6000 00000000 db6a7e9c db6a7f70 e098061f df75d5c0 db6a6000 00000000 d9acfa30 Call Trace: [<c0167a32>] dput+0x22/0x212 [<e098061f>] txBegin+0x36f/0x3a0 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<e0963f4a>] jfs_unlink+0x6a/0x420 [jfs] [<c01609a1>] vfs_unlink+0x102/0x1d5 [<c0160b3c>] sys_unlink+0xc8/0x13a [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4567 4561 4568 4566 (NOTLB) dc447f14 00000082 d9ace3f0 c0348640 fffffff5 dce97c68 c015eeca dce97c68 00000001 dc447f70 d9912230 000f4240 753f0940 000f4242 d9ace598 dce97cd0 00000292 dc446000 d9ace3f0 c026d7d7 dce97cd8 00000001 d9ace3f0 c0118714 Call Trace: [<c015eeca>] link_path_walk+0xccf/0xdb2 [<c026d7d7>] __down+0x8b/0xfd [<c0118714>] default_wake_function+0x0/0x12 [<c026d984>] __down_failed+0x8/0xc [<c0162261>] .text.lock.namei+0x109/0x168 [<c01502da>] filp_close+0x59/0x86 [<c015036a>] sys_close+0x63/0x96 [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4572 4561 4573 4571 (NOTLB) da275c00 00000082 d9940d10 c0348640 da094c68 e0966a0e da094c68 00000000 00000000 da275c08 d9978640 00000000 e0314bc0 000f424a d9940eb8 da275c08 df75d5c0 da274000 df75d500 e09807b4 00000000 d9940d10 c0118714 db19bc14 Call Trace: [<e0966a0e>] xtLookup+0xde/0x1e0 [jfs] [<e09807b4>] txBeginAnon+0x164/0x1b0 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<e097be66>] extAlloc+0x46/0x3d0 [jfs] [<c011875e>] __wake_up_common+0x38/0x57 [<c011880e>] __wake_up_locked+0x2a/0x2e [<e09631a5>] jfs_get_blocks+0x2a5/0x330 [jfs] [<e0963268>] jfs_get_block+0x38/0x40 [jfs] [<c0154d24>] nobh_prepare_write+0xed/0x3eb [<c0137ee4>] buffered_rmqueue+0xf8/0x1c5 [<c019d013>] radix_tree_node_alloc+0x1f/0x5a [<c019d2a1>] radix_tree_insert+0xe5/0x105 [<c0133b82>] add_to_page_cache+0x69/0xaf [<c0135bfa>] generic_file_aio_write_nolock+0x3af/0xb3e [<e0963230>] jfs_get_block+0x0/0x40 [jfs] [<e0963874>] jfs_create+0x1d4/0x2a0 [jfs] [<c01363fd>] generic_file_write_nolock+0x74/0x8c [<c015f731>] may_open+0x59/0x253 [<c014ff6b>] dentry_open+0x109/0x222 [<c013650a>] generic_file_write+0x55/0x6c [<c01364b5>] generic_file_write+0x0/0x6c [<c0150ca1>] vfs_write+0xd0/0x135 [<c0150f1a>] sys_pwrite64+0x88/0x8c [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4579 4561 4580 4578 (NOTLB) dc12dcac 00000086 db126cd0 c0348640 dee7f1c0 dc12dcc0 c026dc40 db1261b0 c0348640 00000001 00000000 00000000 04225880 000f424b db126e78 dc12c000 debfc754 debfc784 dc12dcd4 e097c77e 00000000 db126cd0 c0118714 00000000 Call Trace: [<c026dc40>] schedule+0x294/0x485 [<e097c77e>] __lock_metapage+0x8e/0xd0 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<e097ca69>] __get_metapage+0x129/0x3d0 [jfs] [<e097cedc>] release_metapage+0x7c/0x320 [jfs] [<e096ce04>] diWrite+0x144/0x5f0 [jfs] [<e0981473>] txCommit+0x1b3/0x320 [jfs] [<e096b02f>] xtInitRoot+0x2f/0x80 [jfs] [<e096392b>] jfs_create+0x28b/0x2a0 [jfs] [<c015dce7>] permission+0x67/0x79 [<c015dce7>] permission+0x67/0x79 [<c015f64b>] vfs_create+0xb9/0x146 [<c015feba>] open_namei+0x58f/0x5de [<c014fe3c>] filp_open+0x3e/0x64 [<c01500bb>] get_unused_fd+0x37/0xd9 [<c015022e>] sys_open+0x5b/0x8b [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4577 4561 4578 4576 (NOTLB) d991dc64 00000086 db1261b0 c0348640 d991dc54 00000282 da053608 00000003 00000001 00000000 db126cd0 00000000 12707480 000f424b db126358 d991c000 d991dc74 c0118714 d991dc94 e097fe87 dbe060f0 db1261b0 00000000 db1261b0 Call Trace: [<c0118714>] default_wake_function+0x0/0x12 [<e097fe87>] txLockAlloc+0xa7/0x160 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<c0118714>] default_wake_function+0x0/0x12 [<e0980bb5>] txLock+0x275/0x470 [jfs] [<c0118714>] default_wake_function+0x0/0x12 [<e097c99e>] __get_metapage+0x5e/0x3d0 [jfs] [<e097cedc>] release_metapage+0x7c/0x320 [jfs] [<e096ce50>] diWrite+0x190/0x5f0 [jfs] [<e0981473>] txCommit+0x1b3/0x320 [jfs] [<e096b02f>] xtInitRoot+0x2f/0x80 [jfs] [<e096392b>] jfs_create+0x28b/0x2a0 [jfs] [<c015dce7>] permission+0x67/0x79 [<c015dce7>] permission+0x67/0x79 [<c015f64b>] vfs_create+0xb9/0x146 [<c015feba>] open_namei+0x58f/0x5de [<c014fe3c>] filp_open+0x3e/0x64 [<c01500bb>] get_unused_fd+0x37/0xd9 [<c015022e>] sys_open+0x5b/0x8b [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4598 4561 4599 4597 (NOTLB) dbc11d88 00000086 dbd57770 c0348640 00000000 00000000 00000000 00000000 00000000 df1ed83c d9912d50 00000000 d8dc7700 000f424a dbd57918 df7dd244 00000286 dbc10000 dbd57770 c026d7d7 df7dd24c 00000001 dbd57770 c0118714 Call Trace: [<c026d7d7>] __down+0x8b/0xfd [<c0118714>] default_wake_function+0x0/0x12 [<c026d984>] __down_failed+0x8/0xc [<e0970663>] .text.lock.jfs_imap+0x205/0x422 [jfs] [<c0169722>] alloc_inode+0xf2/0x19b [<e097bccb>] ialloc+0x5b/0x1b0 [jfs] [<e096399d>] jfs_mkdir+0x5d/0x2f0 [jfs] [<c0117c03>] recalc_task_prio+0x93/0x188 [<c015dce7>] permission+0x67/0x79 [<c015eeca>] link_path_walk+0xccf/0xdb2 [<c015de3d>] cached_lookup+0x23/0x85 [<c015dce7>] permission+0x67/0x79 [<c0160379>] vfs_mkdir+0xad/0x137 [<c01604ba>] sys_mkdir+0xb7/0xf6 [<c01181fd>] schedule_tail+0x41/0x4d [<c0105dfd>] sysenter_past_esp+0x52/0x71 fstest D C0348640 0 4599 4561 4600 4598 (NOTLB) dc31bf14 00000086 dbd571e0 c0348640 fffffff5 dce97c68 c015eeca dce97c68 00000001 dc31bf70 d9940780 00000000 d8dc7700 000f424a dbd57388 dce97cd0 00000282 dc31a000 dbd571e0 c026d7d7 dce97cd8 00000001 dbd571e0 c0118714 Call Trace: [<c015eeca>] link_path_walk+0xccf/0xdb2 [<c026d7d7>] __down+0x8b/0xfd [<c0118714>] default_wake_function+0x0/0x12 [<c026d984>] __down_failed+0x8/0xc [<c0162261>] .text.lock.namei+0x109/0x168 [<c01068f1>] error_code+0x2d/0x38 [<c01181fd>] schedule_tail+0x41/0x4d [<c0105dfd>] sysenter_past_esp+0x52/0x71 |