[SSI-devel] CFS i_sem deadlock with do_truncate() - Still needs to be fixed
Brought to you by:
brucewalker,
rogertsang
From: Vladimir R. <one...@gm...> - 2006-05-12 21:19:30
|
A race condition between do_truncate() and cfs_setattr() for down(inode->i_sem) was discussed in November 2005, and was fixed on 2005/10/20 with rev. 1.37 of cluster/ssi/cfs/inode.c. The fix in cfs_setattr() function was simple, strightforward and naive - the two calls for up(&inode->i_sem) and down(&inode->i_sem) were just commented out. That fixes the deadlock between do_truncate() and cfs_setattr() processes fighting for inode->i_sem and inode->i_alloc_sem semaphores, but re-creates the original problem with a race for inode->i_sem and cfs token. With a very simple test - two processes on a client node are writing into the same file which is cfs-mounted on a master node - I was able to created the deadlock situation: d1 process holds the cfs token, and waits on inode->i_sem d2 process owns inode->i_sem and waits for the cfs token That situation is very bad, because the cfs token got stucked on a node which runs d1 and d2, and all other nodes might be effected by that. What's the proper way to address the issue? Thanks Vladimir ---------------------------------------------------------------------------= -------------------------------------------------------- Stack traceback for pid 136918 0xdfd30cb0 136918 136874 0 2 D 0xdfd30e70 d1 EBP EIP Function (args) 0xf5f8bdd4 0xc03de58c schedule+0x3fc (0xdfd30cb0, 0x1, 0xdfd30cb0, 0xc0121e00, 0xf5a1db30) 0xf5f8be0c 0xc03ddf75 __down+0x75 (0x0, 0x0) 0xf5f8be1c 0xc03de126 __down_failed+0xa (0xc04039c8, 0x79, 0x0, 0x0, 0x7fff= fffe) 0xc0152799 .text.lock.filemap+0x22 0xf5f8be5c 0xc0151fef generic_file_aio_write+0x6f (0xf5f8bed8, 0x401ed000, 0x2f, 0x0, 0x0) 0xf5f8be98 0xc027d22e __cfs_file_write+0xce (0xf5f8bed8, 0x0, 0x401ed000, 0x2f, 0xf5f8beb4) 0xf5f8bebc 0xc027d318 cfs_file_aio_write+0x38 (0xf5f8bed8, 0x401ed000, 0x2f, 0x0, 0x0) 0xf5f8bf64 0xc0170749 do_sync_write+0xa9 (0xf6af7780, 0x401ed000, 0x2f, 0xf5f8bfa0, 0xf6af7780) 0xf5f8bf88 0xc0170850 vfs_write+0xb0 (0xf6af7780, 0x401ed000, 0x2f, 0xf5f8bfa0, 0x0) 0xf5f8bfb4 0xc017099b sys_write+0x4b 0xc0106ae1 syscall_call+0x7 ---------------------------------------------------------------------------= ------------------------------------------------------- Stack traceback for pid 138576 0xf668eeb0 138576 137812 0 1 S 0xf668f070 d2 EBP EIP Function (args) 0xf602bd54 0xc03de58c schedule+0x3fc (0x3218f60, 0xdfd02ed4, 0x0, 0x0, 0xdfd02ecc) 0xf602bd8c 0xc027411a tok_wait+0xba (0xf5a1d9fc, 0xdfd02ec0, 0x0, 0x2, 0x2) 0xf602be0c 0xc028c94e cfstok_req+0x20e (0xf5a1daa4, 0x1, 0x4, 0x3, 0x0) 0xf602be54 0xc027bbb8 cfs_setattr+0x58 (0xf5eaccb0, 0xf602be8c, 0xf5a1daa4, 0x4464b83e, 0x0) 0xf602be7c 0xc0190730 notify_change+0x280 (0xf5eaccb0, 0xf602be8c, 0x48, 0xf5a1daa4, 0x1) 0xf602bed8 0xc016e4b0 do_truncate+0x70 (0xf5eaccb0, 0x0, 0x0, 0xf5eaccb0, 0xffffffeb) 0xf602befc 0xc0182e31 may_open+0x1f1 (0xf602bf50, 0x2, 0x8242, 0xc04d6080, 0xf5eaccb0) 0xf602bf38 0xc01833d9 open_namei+0x589 (0xdfbf7000, 0x8242, 0x1b6, 0xf602bf50, 0xf5eaccb0) 0xf602bf94 0xc016f75a filp_open+0x3a (0xdfbf7000, 0x8241, 0x1b6, 0x80c8610, 0x8241) 0xf602bfb4 0xc016fb85 sys_open+0x55 0xc0106ae1 syscall_call+0x7 ---------------------------------------------------------------------------= ---------------------------------------------------- |