I think I finally fixed this.  Fingers crossed.  I actually noticed that there is no livelock.  During heavy IO writeout to unstable and/or chard, CFS async write skips these inodes at the server.  So the unstable list gets flooded at the server and a lot of processes spend time in balance_dirty_pages().


On 10/20/05, Roger Tsang <roger.tsang@gmail.com> wrote:
btw, the background_writeout() code path also leads to sync_sb_inodes()...  We have to fix sync_sb_inodes().


On 10/20/05, Roger Tsang <roger.tsang@gmail.com> wrote:
Apparently the current workaround is insufficient.  I just hit this again.  This also has to do with cfs_async doing balance_dirty_pages(), which eventually calls sync_sb_inodes(), and with the livelock avoidance code interfering: sync_sb_inodes() just quits without writing the inode.  pdflush alone is not enough, so eventually the system load goes way up.


On 10/16/05, Roger Tsang <roger.tsang@gmail.com> wrote:
For now I'm just going along with my hunch that OpenSSI can break the vanilla livelock avoidance code in sync_sb_inodes(), and that the counters never get incremented in balance_dirty_pages() when this happens because sync_sb_inodes() quits.  So my workaround is to leave those pages for background writeout.


On 10/15/05, Roger Tsang <roger.tsang@gmail.com> wrote:

In sync_sb_inodes() there is livelock avoidance code based on the local node's jiffies.  Perhaps there is a race with the CFS section in writeback_inodes() when balance_dirty_pages() and background_writeout() or wb_kupdate() are running simultaneously on different nodes.
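
To make the hunch concrete, here is a minimal user-space sketch of a jiffies-based cutoff.  It assumes the livelock avoidance has the shape "record start = jiffies on entry, then stop at the first inode whose dirtied_when is after start", and it uses the same wraparound-safe comparison idea as the kernel's time_after().  The names time_after_ul, model_inode and sync_model are hypothetical and are not kernel or OpenSSI code.  Whether a timestamp from one node can really end up compared against another node's jiffies is the unverified part of the theory.

```c
#include <assert.h>

/* Wraparound-safe "a is later than b", in the style of time_after(). */
int time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical stand-in for an inode on the dirty list. */
struct model_inode {
    unsigned long dirtied_when; /* tick count when the inode was dirtied */
    int written;                /* set to 1 once the model "writes" it */
};

/*
 * Model of the cutoff: walk the list in order and stop at the first
 * inode that appears to have been dirtied after 'start'.
 * Returns the number of inodes written.
 */
int sync_model(struct model_inode *inodes, int n, unsigned long start)
{
    int written = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Looks newer than this pass: give up on the rest of the list. */
        if (time_after_ul(inodes[i].dirtied_when, start))
            break;
        inodes[i].written = 1;
        written++;
    }
    return written;
}
```

With start = 1000 and inodes stamped 900 and 950, both get written.  If the first inode carries a stamp of 5000 (say, taken from a node whose tick counter runs ahead, which is an assumption), the walk stops immediately and nothing is written, even though the second inode is old.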


On 10/15/05, Roger Tsang <roger.tsang@gmail.com> wrote:
* CFS hang while remote copying very large files. (believed to be fixed)

This one just hit again.  I think this has to do with CFS writeback somewhere.  A lot of processes with the same backtrace are stuck waiting for IO.  The condition is cleared by doing a `sync` on the node that is experiencing this.  This is similar to what Andy Philips originally reported about getting stuck while doing large writes.

These processes were stuck in an infinite for loop in balance_dirty_pages().  It means the if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh) test always fails.  This could happen when !nr_reclaimable and nr_writeback > dirty_thresh.  Doing a `sync` fixes the problem because it flushes the entire writeback.  Any more ideas?
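
A rough user-space model of that loop, to show which inputs keep it spinning.  This is a sketch that follows the general shape of balance_dirty_pages() in 2.6 kernels as I understand it; it is not the actual source.  The struct wb_state, the writeback_fn callback and balance_model() are hypothetical names, and the sleep in blk_congestion_wait() is reduced to a comment.

```c
#include <assert.h>

/* Simplified, hypothetical stand-in for the kernel's page counters. */
struct wb_state {
    long nr_dirty;
    long nr_unstable;
    long nr_writeback;
};

/* One writeback attempt; returns how many pages it wrote. */
typedef long (*writeback_fn)(struct wb_state *s, long nr_to_write);

/* Healthy case: writes out up to nr_to_write dirty pages. */
long writeback_ok(struct wb_state *s, long nr_to_write)
{
    long n = s->nr_dirty < nr_to_write ? s->nr_dirty : nr_to_write;
    s->nr_dirty -= n;
    return n;
}

/* Failure case: the writeback path returns without writing anything. */
long writeback_skips(struct wb_state *s, long nr_to_write)
{
    (void)s;
    (void)nr_to_write;
    return 0;
}

/*
 * Returns the pass on which the loop exits, or -1 if it is still
 * looping after max_iters passes (the "stuck" case).
 */
int balance_model(struct wb_state *s, long dirty_thresh, long write_chunk,
                  writeback_fn wb, int max_iters)
{
    long pages_written = 0;

    assert(max_iters > 0);
    for (int i = 1; i <= max_iters; i++) {
        long nr_reclaimable = s->nr_dirty + s->nr_unstable;

        if (nr_reclaimable + s->nr_writeback <= dirty_thresh)
            return i;                       /* under the limit */
        if (nr_reclaimable) {
            pages_written += wb(s, write_chunk);
            nr_reclaimable = s->nr_dirty + s->nr_unstable;
            if (nr_reclaimable + s->nr_writeback <= dirty_thresh)
                return i;
            if (pages_written >= write_chunk)
                return i;                   /* wrote our share */
        }
        /* The kernel would sleep in blk_congestion_wait() here. */
    }
    return -1;
}
```

In this model, a caller over the threshold gets out after one pass when writeback_ok makes progress.  With writeback_skips, or with nr_reclaimable at zero and nr_writeback above dirty_thresh, no counter ever changes and balance_model() returns -1, which matches processes sitting in blk_congestion_wait() until something else (such as `sync`) drains the pages.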


0xd8c6c550   160716   160712  0    0   D  0xd8c6c710  httpd
EBP        EIP        Function (args)
0xd802bbb8 0xc03b6b23 schedule+0x2b3 (0xd802bbcc, 0x268b84b3, 0xc011fb15, 0xc0617c40, 0xcc9bfbcc)
0xd802bbf4 0xc03b7126 schedule_timeout+0x76
0xd802bbfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xd8c6c550, 0xc01305f0, 0xd802bc34)
0xd802bc54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xd802bc74, 0xc04afb24, 0x1)
0xd802bcc4 0xc013f593 balance_dirty_pages+0x93 (0xd8508660)
0xd802bcd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xd8508660, 0xc1499030, 0xcd3, 0x81, 0xd54)
0xd802bd80 0xc013bc1e generic_file_buffered_write+0x2ee (0xd802bed8, 0xd802be50, 0x1, 0xddbd54, 0x0)
0xd802bdf8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xd802bed8, 0xd802be50, 0x1, 0xd802bf14, 0xd8508548)
0xd802be2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xd802bed8, 0xd802be50, 0x1, 0xd802bf14, 0x0)
0xd802be64 0xc013c715 generic_file_aio_write+0x75 (0xd802bed8, 0xbfffe7e0, 0x81, 0xddbcd3, 0x0)
0xd802bea0 0xc026c28a __cfs_file_write+0xda (0xd802bed8, 0x0, 0xbfffe7e0, 0x81, 0xd802bed0)
0xd802bebc 0xc026c33e cfs_file_aio_write+0x2e (0xd802bed8, 0xbfffe7e0, 0x81, 0xddbcd3, 0x0)
0xd802bf64 0xc015854b do_sync_write+0xab (0xdcf4e920, 0xbfffe7e0, 0x81, 0xd802bfa8, 0x558004)
0xd802bf90 0xc0158688 vfs_write+0xe8 (0xdcf4e920, 0xbfffe7e0, 0x81, 0xd802bfa8, 0xddbcd3)
0xd802bfbc 0xc015879b sys_write+0x4b
           0xc0103c65 sysenter_past_esp+0x52

0xc8237550   133910   133897  0    0   D  0xc8237710  ndbd
EBP        EIP        Function (args)
0xc81d3bb8 0xc03b6b23 schedule+0x2b3 (0xc81d3bcc, 0x268aeeb5, 0x1, 0xc8de7bcc, 0xcaac3bcc)
0xc81d3bf4 0xc03b7126 schedule_timeout+0x76
0xc81d3bfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xc8237550, 0xc01305f0, 0xc81d3c34)
0xc81d3c54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xc81d3c74, 0xdf8f7684, 0x2e)
0xc81d3cc4 0xc013f593 balance_dirty_pages+0x93 (0xdebe61e0)
0xc81d3cd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xdebe61e0, 0xc13540d0, 0x0, 0x1000, 0x1000)
0xc81d3d80 0xc013bc1e generic_file_buffered_write+0x2ee (0xc81d3ed8, 0xc81d3e50, 0x1, 0x6c6000, 0x0)
0xc81d3df8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xc81d3ed8, 0xc81d3e50, 0x1, 0xc81d3f14, 0xdebe60c8)
0xc81d3e2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xc81d3ed8, 0xc81d3e50, 0x1, 0xc81d3f14, 0x0)
0xc81d3e64 0xc013c715 generic_file_aio_write+0x75 (0xc81d3ed8, 0xa768a008, 0x8000, 0x6c0000, 0x0)
0xc81d3ea0 0xc026c28a __cfs_file_write+0xda (0xc81d3ed8, 0x0, 0xa768a008, 0x8000, 0xc81d3ed0)
0xc81d3ebc 0xc026c33e cfs_file_aio_write+0x2e (0xc81d3ed8, 0xa768a008, 0x8000, 0x6c0000, 0x0)
0xc81d3f64 0xc015854b do_sync_write+0xab (0x0, 0xc8237550, 0x0, 0x0, 0x6c0000)
           0xc0363700 ip_rcv_finish (0xd446cce0, 0xa768a008, 0x8000, 0xc81d3fa8, 0x8)
           0xc0158688 vfs_write+0xe8 (0xd446cce0, 0xa768a008, 0x8000, 0xc81d3fa8, 0x6c0000)
0xc81d3fbc 0xc015879b sys_write+0x4b
           0xc0103c65 sysenter_past_esp+0x52

0xdfeaeaa0   133079        1  0    0   D  0xdfeaec60  ntpd
EBP        EIP        Function (args)
0xdcdd9bb8 0xc03b6b23 schedule+0x2b3 (0xdcdd9bcc, 0x26868821, 0xdcdd9c28, 0xdc4bbbcc, 0xdd541b5c)
0xdcdd9bf4 0xc03b7126 schedule_timeout+0x76
0xdcdd9bfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xdfeaeaa0, 0xc01305f0, 0xdcdd9c34)
0xdcdd9c54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xdcdd9c74, 0xc04afb24, 0x5)
0xdcdd9cc4 0xc013f593 balance_dirty_pages+0x93 (0xc6dd2660)
0xdcdd9cd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xc6dd2660, 0xc10c6430, 0x0, 0x1, 0x1)
0xdcdd9d80 0xc013bc1e generic_file_buffered_write+0x2ee (0xdcdd9ed8, 0xdcdd9e50, 0x1, 0x1, 0x0)
0xdcdd9df8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xdcdd9ed8, 0xdcdd9e50, 0x1, 0xdcdd9f14, 0xc6dd2548)
0xdcdd9e2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xdcdd9ed8, 0xdcdd9e50, 0x1, 0xdcdd9f14, 0x0)
0xdcdd9e64 0xc013c715 generic_file_aio_write+0x75 (0xdcdd9ed8, 0x800fef0f, 0x1, 0x0, 0x0)
0xdcdd9ea0 0xc026c28a __cfs_file_write+0xda (0xdcdd9ed8, 0x0, 0x800fef0f, 0x1, 0xdcdd9ed0)
0xdcdd9ebc 0xc026c33e cfs_file_aio_write+0x2e (0xdcdd9ed8, 0x800fef0f, 0x1, 0x0, 0x0)
0xdcdd9f64 0xc015854b do_sync_write+0xab (0xdf517920, 0x800fef0f, 0x1, 0xdcdd9fa8, 0x26)
0xdcdd9f90 0xc0158688 vfs_write+0xe8 (0xdf517920, 0x800fef0f, 0x1, 0xdcdd9fa8, 0x0)
0xdcdd9fbc 0xc015879b sys_write+0x4b
           0xc0103c65 sysenter_past_esp+0x52