[SSI-devel] Re: [SSI-users] hangs on write()
From: Roger T. <rog...@gm...> - 2005-09-18 14:01:14
Further, a lot of processes at the client are waiting in cfs_wait_on_request() but not necessarily doing generic_file_aio_read(). It looks like the server was overwhelmed: processes at the server also became unresponsive and the load average goes up on both sides. I won't know which processes were stuck on the server until later. I was able to trigger this with the cp command instead of just mc alone, by doing parallel cp's to a remote CFS filesystem and starting localview top. I haven't been able to reproduce this by doing a single massive cp job on the cluster.

Roger

On 9/17/05, Roger Tsang <rog...@gm...> wrote:
>
> Alright, I can reproduce this by doing the large file copy. It gets stuck
> here...
>
> Stack traceback for pid 136777
> 0xc57f3a80 136777 136733 0 0 D 0xc57f3c40 mc
> EBP        EIP        Function (args)
> 0xd8e01cc0 0xc03b4103 schedule+0x2b3
> 0xd8e01cc8 0xc03b462e io_schedule+0xe (0xc15001b0)
> 0xd8e01cd4 0xc0136745 sync_page+0x35 (0xc1251ea8, 0x0, 0xc0136710, 0xc57f3a80, 0xd8e01d24)
> 0xd8e01cf4 0xc03b49a9 __wait_on_bit_lock+0x49 (0x2, 0xc1251ea8, 0xc1251ea8, 0x0, 0x0)
> 0xd8e01d50 0xc0136f5a __lock_page+0x8a (0xd666c8a0, 0x1d838, 0xda0e96a0, 0x1d838, 0x2)
> 0xd8e01de8 0xc013764b do_generic_mapping_read+0x3db (0xd666c8a0, 0xda0e96e8, 0xda0e96a0, 0xd8e01f14, 0xd8e01e1c)
> 0xd8e01e38 0xc0137b24 __generic_file_aio_read+0x194 (0xd8e01ed8, 0xd8e01e50, 0x1, 0xd8e01f14, 0x8135998)
> 0xd8e01e64 0xc0137be2 generic_file_aio_read+0x52 (0xd8e01ed8, 0x8135998, 0x2000, 0x1d838000, 0x0)
> 0xd8e01ea0 0xc0268f40 __cfs_file_read+0xc0 (0xd8e01ed8, 0x0, 0x8135998, 0x2000, 0xd8e01ed0)
> 0xd8e01ebc 0xc0268ffe cfs_file_aio_read+0x2e (0xd8e01ed8, 0x8135998, 0x2000, 0x1d838000, 0x0)
> 0xd8e01f64 0xc015561b do_sync_read+0xab (0xda0e96a0, 0x8135998, 0x2000, 0xd8e01fa8, 0x0)
> 0xd8e01f90 0xc0155758 vfs_read+0xe8 (0xda0e96a0, 0x8135998, 0x2000, 0xd8e01fa8, 0x1d838000)
> 0xd8e01fbc 0xc0155a1b sys_read+0x4b
>            0xc0103c55 sysenter_past_esp+0x52
>
> On 9/17/05, Roger Tsang <rog...@gm...> wrote:
> >
> > Okay, I ran into this hang just a moment ago while copying a very large
> > file from node 2 to the init node. It hangs at the very end of the file.
> > Then if I do "sync" as you have suggested, the copy completes. I guess
> > next time I see this I'll do a backtrace on the copy process. My guess
> > is it's probably waiting in CFS wait_for_congestion().
> >
> > Have you tried a different IO scheduler? Try deadline if you were using
> > cfq.
> >
> > Roger
> >
> > On 8/25/05, John Byrne <joh...@hp...> wrote:
> > >
> > > Andy Phillips wrote:
> > > > Following on;
> > > >
> > > > It appears that if I remount the file system with the
> > > > "sync" option then this problem goes away. But performance
> > > > is bad. Shutting down the other node in the cluster does
> > > > not seem to affect this at all.
> > > >
> > > > Would SSI or the CFS cause issues with async I/O? Would
> > > > that follow a different path to a normal kernel?
> > > >
> > > > Andy
> > >
> > > There can certainly be bugs, and I do note that your hanging write is
> > > rather large. Maybe that is the cause of the problem. Maybe you could
> > > make a simple test case with 256k writes and see if that hangs.
> > >
> > > John
> > >
> > > _______________________________________________
> > > Ssic-linux-users mailing list
> > > Ssi...@li...
> > > https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
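[Archive note: the parallel-cp reproduction described at the top of the thread can be sketched as a script. The CFS mount point /cfs, the source file, and the size are assumptions; substitute your own.]

```shell
#!/bin/sh
# Sketch of the reproduction: several concurrent cp's of a large file
# onto a CFS mount, while running top in localview on another terminal.
SRC=/tmp/bigfile
DST=/cfs            # assumed CFS mount point - adjust

# Make a 512 MB source file
dd if=/dev/zero of=$SRC bs=1M count=512 2>/dev/null

# Start four copies in parallel and wait for all of them
for i in 1 2 3 4; do
    cp $SRC $DST/copy.$i &
done
wait
```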
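[Archive note: on the IO scheduler suggestion above, 2.6 kernels that support runtime elevator switching expose it per device through sysfs. The device name sda is an assumption; use the disk backing the CFS filesystem.]

```shell
# Show the schedulers compiled in for this disk; the active one is
# shown in brackets, e.g.: noop anticipatory deadline [cfq]
cat /sys/block/sda/queue/scheduler

# Switch that device to the deadline elevator at runtime
echo deadline > /sys/block/sda/queue/scheduler

# Or select deadline for all devices at boot with the kernel
# command-line parameter:  elevator=deadline
```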
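[Archive note: John's suggested minimal test case, repeated 256k writes, can be approximated with dd; the target path is an assumption.]

```shell
# 256 KB writes, 256 of them (64 MB total), to a CFS path.
# conv=fsync forces writeback before dd exits, so a hang in
# writeback shows up as dd never completing rather than a
# later stall on sync.
dd if=/dev/zero of=/cfs/write-test bs=256k count=256 conv=fsync
```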