Re: [SSI-devel] cfs_wait - anyone else seeing this from CVS?
Brought to you by:
brucewalker,
rogertsang
From: Roger T. <rog...@gm...> - 2005-11-03 01:12:44
|
Hi John, Can you see if you can trigger the problem sooner with a lower vm.dirty_ratio? Tweak the other vm knobs in the kernel too and see what you get. Roger On 11/1/05, J.H. <war...@ea...> wrote: > > Roger, > > I did have slightly more success with my rsyncing: > > started the rsync on the local drives with a bwlimit of 500kbps, this > worked for a while and I decided to let it loose again and removed the > bandwidth limiting. It has been working, seemingly fine for quite a > while (read: probably about an hour or so of constant rsync traffic) > until I got some messages being tossed out on a terminal concerning > highmem and DMA, which eventually leads to rsync being killed off with > and "Out of Memory: Killed process <pid> (rsync)" > > The killed process seems a much happier thing than the write just > hanging, however I cannot seem to restart the rsync as it's erroring > out. (a reboot should clear that up) > > I'm including the messages from the killing of rsync below if they help > at all. > > - John > > -------- > from /var/log/messages > -------- > Nov 1 16:23:23 gir kernel: oom-killer: gfp_mask=3D0xd0 > Nov 1 16:23:23 gir kernel: DMA per-cpu: > Nov 1 16:23:23 gir kernel: cpu 0 hot: low 2, high 6, batch 1 > Nov 1 16:23:23 gir kernel: cpu 0 cold: low 0, high 2, batch 1 > Nov 1 16:23:23 gir kernel: Normal per-cpu: > Nov 1 16:23:23 gir kernel: cpu 0 hot: low 32, high 96, batch 16 > Nov 1 16:23:23 gir kernel: cpu 0 cold: low 0, high 32, batch 16 > Nov 1 16:23:23 gir kernel: HighMem per-cpu: > Nov 1 16:23:23 gir kernel: cpu 0 hot: low 14, high 42, batch 7 > Nov 1 16:23:23 gir kernel: cpu 0 cold: low 0, high 14, batch 7 > Nov 1 16:23:23 gir kernel: > Nov 1 16:23:23 gir kernel: Free pages: 7208kB (3444kB HighMem) > Nov 1 16:23:23 gir kernel: Active:25105 inactive:5754 dirty:22 > writeback:1 unstable:0 free:1802 slab:222371 mapped:24508 pagetables:215 > Nov 1 16:23:23 gir kernel: DMA free:68kB min:68kB low:84kB high:100kB > active:0kB inactive:4kB present:16384kB pages_scanned:1 > all_unreclaimable? yes > Nov 1 16:23:23 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:23 gir kernel: Normal free:3696kB min:3756kB low:4692kB > high:5632kB active:280kB inactive:580kB present:901120kB > pages_scanned:722 all_unreclaimable? yes > Nov 1 16:23:23 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:23 gir kernel: HighMem free:3444kB min:128kB low:160kB > high:192kB active:100140kB inactive:22432kB present:131008kB > pages_scanned:0 all_unreclaimable? no > Nov 1 16:23:23 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:23 gir kernel: DMA: 1*4kB 2*8kB 1*16kB 1*32kB 0*64kB > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 68kB > Nov 1 16:23:23 gir kernel: Normal: 0*4kB 68*8kB 29*16kB 0*32kB 0*64kB > 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB =3D 3696kB > Nov 1 16:23:23 gir kernel: HighMem: 459*4kB 187*8kB 3*16kB 0*32kB > 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 3444kB > Nov 1 16:23:23 gir kernel: Swap cache: add 18222, delete 17167, find > 2737/3394, race 0+0 > Nov 1 16:23:23 gir kernel: Out of Memory: Killed process 71294 > (dbus-daemon-1). > Nov 1 16:23:28 gir kernel: oom-killer: gfp_mask=3D0xd0 > Nov 1 16:23:28 gir kernel: DMA per-cpu: > Nov 1 16:23:28 gir kernel: cpu 0 hot: low 2, high 6, batch 1 > Nov 1 16:23:28 gir kernel: cpu 0 cold: low 0, high 2, batch 1 > Nov 1 16:23:28 gir kernel: Normal per-cpu: > Nov 1 16:23:28 gir kernel: cpu 0 hot: low 32, high 96, batch 16 > Nov 1 16:23:28 gir kernel: cpu 0 cold: low 0, high 32, batch 16 > Nov 1 16:23:28 gir kernel: HighMem per-cpu: > Nov 1 16:23:28 gir kernel: cpu 0 hot: low 14, high 42, batch 7 > Nov 1 16:23:28 gir kernel: cpu 0 cold: low 0, high 14, batch 7 > Nov 1 16:23:28 gir kernel: > Nov 1 16:23:28 gir kernel: Free pages: 4324kB (392kB HighMem) > Nov 1 16:23:28 gir kernel: Active:25088 inactive:6388 dirty:5 > writeback:2 unstable:0 free:1081 slab:222473 mapped:24511 pagetables:209 > Nov 1 16:23:28 gir kernel: DMA free:68kB min:68kB low:84kB high:100kB > active:0kB inactive:4kB present:16384kB pages_scanned:14 > all_unreclaimable? yes > Nov 1 16:23:29 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:29 gir kernel: Normal free:3864kB min:3756kB low:4692kB > high:5632kB active:12kB inactive:324kB present:901120kB > pages_scanned:343 all_unreclaimable? yes > Nov 1 16:23:29 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:29 gir kernel: HighMem free:392kB min:128kB low:160kB > high:192kB active:100340kB inactive:25224kB present:131008kB > pages_scanned:0 all_unreclaimable? no > Nov 1 16:23:29 gir kernel: protections[]: 0 0 0 > Nov 1 16:23:29 gir kernel: DMA: 1*4kB 2*8kB 1*16kB 1*32kB 0*64kB > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 68kB > Nov 1 16:23:29 gir kernel: Normal: 12*4kB 75*8kB 33*16kB 0*32kB 0*64kB > 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB =3D 3864kB > Nov 1 16:23:29 gir kernel: HighMem: 50*4kB 10*8kB 3*16kB 0*32kB 1*64kB > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 392kB > Nov 1 16:23:29 gir kernel: Swap cache: add 18301, delete 17181, find > 2753/3420, race 0+0 > Nov 1 16:23:29 gir kernel: Out of Memory: Killed process 72338 (rsync). > Nov 1 16:34:40 gir kernel: oom-killer: gfp_mask=3D0xd0 > Nov 1 16:34:40 gir kernel: DMA per-cpu: > Nov 1 16:34:40 gir kernel: cpu 0 hot: low 2, high 6, batch 1 > Nov 1 16:34:40 gir kernel: cpu 0 cold: low 0, high 2, batch 1 > Nov 1 16:34:40 gir kernel: Normal per-cpu: > Nov 1 16:34:40 gir kernel: cpu 0 hot: low 32, high 96, batch 16 > Nov 1 16:34:40 gir kernel: cpu 0 cold: low 0, high 32, batch 16 > Nov 1 16:34:40 gir kernel: HighMem per-cpu: > Nov 1 16:34:40 gir kernel: cpu 0 hot: low 14, high 42, batch 7 > Nov 1 16:34:40 gir kernel: cpu 0 cold: low 0, high 14, batch 7 > Nov 1 16:34:40 gir kernel: > Nov 1 16:34:40 gir kernel: Free pages: 21032kB (17220kB HighMem) > Nov 1 16:34:41 gir kernel: Active:20995 inactive:6523 dirty:5 > writeback:0 unstable:0 free:5258 slab:222283 mapped:20264 pagetables:176 > Nov 1 16:34:41 gir kernel: DMA free:68kB min:68kB low:84kB high:100kB > active:4kB inactive:0kB present:16384kB pages_scanned:1748 > all_unreclaimable? yes > Nov 1 16:34:41 gir kernel: protections[]: 0 0 0 > Nov 1 16:34:41 gir kernel: Normal free:3744kB min:3756kB low:4692kB > high:5632kB active:520kB inactive:816kB present:901120kB > pages_scanned:1266 all_unreclaimable? yes > Nov 1 16:34:41 gir kernel: protections[]: 0 0 0 > Nov 1 16:34:41 gir kernel: HighMem free:17220kB min:128kB low:160kB > high:192kB active:83456kB inactive:25276kB present:131008kB > pages_scanned:0 all_unreclaimable? no > Nov 1 16:34:41 gir kernel: protections[]: 0 0 0 > Nov 1 16:34:41 gir kernel: DMA: 1*4kB 2*8kB 1*16kB 1*32kB 0*64kB > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 68kB > Nov 1 16:34:41 gir kernel: Normal: 0*4kB 54*8kB 39*16kB 0*32kB 0*64kB > 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB =3D 3744kB > Nov 1 16:34:41 gir kernel: HighMem: 3*4kB 5*8kB 1*16kB 0*32kB 110*64kB > 67*128kB 6*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =3D 17220kB > Nov 1 16:34:41 gir kernel: Swap cache: add 18436, delete 17850, find > 2794/3482, race 0+0 > Nov 1 16:34:41 gir kernel: Out of Memory: Killed process 73408 (rsync). > > On Tue, 2005-11-01 at 17:35 -0500, Roger Tsang wrote: > > Hi, > > > > I think this could be an IO scheduler problem. If you don't see > > cfs_wait_xxx() anymore then that means the patch worked. :) > > blk_congestion_wait() is another problem. > > > > I'm dealing with blk_congestion_wait() on my AMD64 3000+ as well, but > > the stack trace shows it is stuck in balance_dirty_pages() during > > write. It seems the memory is congested with dirty pages that need to > > be written to disk before anything else can be written. It looks like > > the kernel can't cope with a large flood of write requests - > > particularly with filesizes larger than your physical memory. That's > > one possible explanation. > > > > I'm going to try lowering the nsc_daemons (cfs_async, icssvr_daemon, > > etc.) to non-realtime priorities and see if that helps. Next time you > > run into this can you do a localview ps -axo ... for processes stuck > > in IO? That way it will report any local kernel daemons stuck in IO > > too. > > > > Roger > > > > > > On 11/1/05, J.H. <war...@ea...> wrote: > > Roger, > > > > I've gotten the patch up and running, however the problem > > seems to have > > possibly gotten worse. Doing an rsync caused the process to > > end up in > > another D+ state waiting on blk_congestion_wait: > > > > ps -axo pid,fname,user,stat,wchan > > 69471 rsync root S+ ssidev_do_poll_nodes > > 69472 rsync root S+ ssidev_do_pool_nodes > > 69499 rsync root D+ blk_congestion_wait > > > > also the loads on the box have spun off and are currently at > > about 7.00 > > with the rsync as the only process (this is an athlon64 3000 > > box as well > > so it's a fair amount of processing) > > > > Also to note that with the patch things don't seem to recover > > after a > > few minutes, they just seem to sit there. > > > > - John > > > > On Thu, 2005-10-27 at 13:17 -0400, Roger Tsang wrote: > > > Hi, > > > > > > Try this patch against linux-ssi/ Let me know how well it > > works for > > > you I'll consider adding this into CVS. > > > > > > -Roger > > > > > > > > > On 10/26/05, J.H. <war...@ea...> wrote: > > > Upgraded to CVS (OPENSSI-FC) yesterday and am trying > > to tweak > > > the kernel > > > a little (mainly just recompiling it), and noticed > > that > > > untaring the > > > linux-2.6.10.tar.bz2 file was taking an inordinate > > amount of > > > time. > > > Doing some checking I noticed that it's (seemingly) > > getting > > > stuck in the > > > kernel inside of cfs_wait in a D+ state (from ps > > -alx). This > > > didn't > > > seem to be the case in 1.9.1. It eventually > > continues without > > > issue > > > (maybe a minute or so later) but I thought I'd see > > if anyone > > > else is > > > seeing this. > > > > > > - John > > > > > > > > > -----BEGIN PGP SIGNATURE----- > > > Version: GnuPG v1.4.2 (GNU/Linux) > > > > > > > > iD8DBQBDX/m9/E3kyWU9dicRAhgrAJ9xiewC1HyJ1qIj7jDJ2CUgOsjb5wCgghK3 > > > pkVaKHerj/bzc24yStaY8gw=3D > > > =3D7v2Z > > > -----END PGP SIGNATURE----- > > > > > > > > > > > > > > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.2 (GNU/Linux) > > > > iD8DBQBDZ/D > > +/E3kyWU9dicRAmsQAJ9pxlPheqQSEodoyTwz7FsAZcYNdgCfUQyJ > > w/eBXcGAplxISWp7OLhDCeI=3D > > =3DXlKI > > -----END PGP SIGNATURE----- > > > > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2 (GNU/Linux) > > iD8DBQBDaBm0/E3kyWU9dicRAupbAJ4tFDQk2xbE4ubUY038NRNoqwMk0QCeLLhm > A7mRNxCC6RZ8te13sTDvVT8=3D > =3DnYqn > -----END PGP SIGNATURE----- > > > |