From: Struan B. <st...@pr...> - 2004-11-07 17:35:00
|
Posted to SourceForge:

For the record, this is not entirely fixed. I'm running a Debian host with 2.4.25 kernel and host-skas3-2.4.25.patch, and a Debian guest with 2.4.27 kernel and uml-patch-2.4.26-3.

In one xterm I ran "while true; do ls -la /dev/*; done" (see also bug report 617709). In the other, "cat /dev/urandom". After 5-10 seconds, the 'linux' host process crashes completely with:

Scheduling in interrupt
Kernel panic: kernel BUG at sched.c:564!
In interrupt handler - not syncing
<6>SysRq : Show Regs

EIP: 0023:[<400e4b38>] CPU: 0 Not tainted ESP: 002b:bffffc68 EFLAGS: 00000246
    Not tainted
EAX: ffffffda EBX: 00000001 ECX: 0804cdf0 EDX: 00001000
ESI: 00001000 EDI: 0804cdf0 EBP: bffffc88 DS: 002b ES: 002b
Call Trace:
[<a0010e5f>] [<a00cae60>] [<a018db51>] [<a018dade>] [<a000dc93>] [<a00cade9>]
[<a00b526f>] [<a0010e5f>] [<a001d2ec>] [<a012d637>] [<a017561e>] [<a0010e5f>]
[<a017561e>] [<a0010405>] [<a00103f1>] [<a0010e5f>] [<a000d965>] [<a017561e>]
[<a0175616>] [<a00b26ec>] [<a00bd0d1>] [<a0015e2f>] [<a00bd268>] [<a00e1fc4>]
[<a00b4caf>] [<a00e0057>] [<a00c32d5>] [<a00b4caf>] [<a00c3a6a>] [<a012f024>]
[<a00c42e2>] [<a00b2816>] [<a00b2855>] [<a00b277e>] [<a00c3035>] [<a00c8bc0>]
[<a001622f>] [<a001a8e3>] [<a0016153>] [<a001607b>] [<a0016037>] [<a00b26ec>]
[<a0015de1>] [<a0015dcf>] [<a00ada41>] [<a00ada52>] [<a00b3dfe>] [<a00b4108>]
[<a00bb278>] [<a00bb211>] [<a00bb211>] [<a00bb278>] [<a00b4bed>] [<a00b4ba8>]
[<a012ef38>] [<a014639d>] [<a014639d>] [<a012ef38>] [<a00c8db0>] [<a00bb56e>]
[<a00bb3e0>] [<a00bb7f0>] [<a00b0404>] [<a00bb7f0>] [<a00bb304>] [<a0053f2f>]
[<a00bb64c>] [<a00ab920>] [<a00b0404>] [<a00bb3d4>] [<a00c8c7b>] [<a00bb304>]
[<a00bb458>] [<a00bb59c>] [<a00bb64c>] [<a012f024>] [<a00bb59c>] [<a00bb3d4>]
[<a00b0404>] [<a00bb64c>] [<a00bb458>] [<a00bb59c>] [<a00bb59c>] [<a00b597f>]
[<a00b2855>] [<a00bb458>] [<a00be776>] [<a00be776>] [<a00be81d>] [<a0146380>]
[<a00b0658>] [<a00be81d>] [<a00e031b>] [<a00e05d6>] [<a00bb59c>] [<a00e1458>]
[<a00bb637>] [<a00e0057>] [<a00e0057>] [<a00c3a29>] [<a00c32d5>] [<a00c5f3d>]
[<a00c5e11>] [<a00c0e91>] [<a00c5d50>] [<a0033e1a>] [<a00bad8e>] [<a00b35bb>]
[<a00b36e3>] [<a00baddc>] [<a00badd3>] [<a00b9fb5>] [<a00ba0bd>] [<a00b01ba>]
[<a00ba31a>] [<a00b26ec>] [<a00baa87>] [<a00baa71>] [<a012ef38>] [<a012efd1>]
|
From: Struan B. <st...@pr...> - 2004-11-14 10:47:51
|
I've now tested for this bug using a simplified methodology on the latest patched versions of the host (2.4.27 and 2.6.8) and guest (2.4.27 and 2.6.9).

_New methodology_

My Debian guest system raises two xterms at startup. On each I log in as root. On one, I enter:

    while true; do echo >/dev/null; done

On the second, I enter:

    cat /dev/urandom

The devastating symptom: within 5 seconds the guest kernel has panicked and exited with the message (below) "kernel BUG at sched.c:564!".

_Findings_

guest 2.6.9: patched with uml-2.6.9-bb2.patch.bz2 - I am pleased to say I have not so far been able to reproduce this bug on this configuration.

guest 2.4.27: patched with uml-2.4.27-bs1.patch and uml-patch-2.4.24-1base.patch - no change; this bug is readily reproducible and so IS NOT FIXED, regardless of whether the guest is run on host 2.4.27 or 2.6.8.

_Conclusion_

If such a simple pair of commands causes such devastating consequences, it seems line 564 of sched.c in the 2.4.27 kernel needs looking at - unless, that is, Jeff Dike's recent post about uml-patch-2.4.27-1 ("a nasty scheduler race fixed") addresses this bug.

Jeff: where can I get your uml-patch-2.4.27-1?

Struan

Struan Bartlett wrote:
> Posted to SourceForge:
>
> For the record, this is not entirely fixed. I'm running a Debian host
> with 2.4.25 kernel and host-skas3-2.4.25.patch, and a Debian
> guest with 2.4.27 kernel and uml-patch-2.4.26-3.
>
> In one xterm I ran "while true; do ls -la /dev/*; done" (see
> also bug report 617709). In the other, "cat /dev/urandom".
> After 5-10 seconds, the 'linux' host process crashes
> completely with:
>
> Scheduling in interrupt
> Kernel panic: kernel BUG at sched.c:564!
> [...]
|
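The two-xterm procedure described above can also be run as one bounded script, which makes it easier to repeat the same test against different patch levels. What follows is only a minimal sketch, assuming a POSIX shell inside the guest. The function name run_repro, its two arguments, and the "survived" message are illustrative and do not come from the thread. Note that the original test let cat write to an xterm, while this sketch defaults to /dev/null, which may not exercise the same code path.

```shell
#!/bin/sh
# Bounded variant of the two-terminal test from this thread.
# run_repro and its arguments are hypothetical names for illustration.

run_repro() {
    duration="${1:-10}"      # seconds to keep both loads running
    sink="${2:-/dev/null}"   # where the random bytes go (the thread used an xterm)

    # Load 1: the busy loop typed into the first xterm.
    while true; do echo >/dev/null; done &
    loop_pid=$!

    # Load 2: the /dev/urandom reader typed into the second xterm.
    cat /dev/urandom >"$sink" &
    cat_pid=$!

    sleep "$duration"

    # Only reached if the guest kernel is still alive after the delay.
    kill "$loop_pid" "$cat_pid" 2>/dev/null || true
    wait "$loop_pid" "$cat_pid" 2>/dev/null || true
    echo "survived ${duration}s"
}
```

Calling `run_repro 10 /dev/tty` from a terminal in the guest stays closer to the original procedure, since the random data is then written to the terminal rather than discarded.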
From: Blaisorblade <bla...@ya...> - 2004-11-15 17:19:44
|
On Sunday 14 November 2004 11:47, Struan Bartlett wrote:
> I've now tested for this bug using a simplified methodology on the
> latest patched versions of the host (2.4.27 and 2.6.8) and guest (2.4.27
> and 2.6.9).

> _New methodology_

> My Debian guest system raises two xterms at startup. On each I log in as
> root. On one, I enter:

> while true; do echo >/dev/null; done

> On the second, I enter:

> cat /dev/urandom

> The devastating symptom: within 5 seconds the guest kernel has panicked
> and exited with the message (below) "kernel BUG at sched.c:564!".
>
> _Findings_

> guest 2.6.9: patched with uml-2.6.9-bb2.patch.bz2 - I am pleased to say
> I have not so far been able to reproduce this bug on this configuration.

> guest 2.4.27: patched with uml-2.4.27-bs1.patch and
> uml-patch-2.4.24-1base.patch - no change; this bug is readily
> reproducible and so IS NOT FIXED, regardless of whether the guest is run
> on host 2.4.27 or 2.6.8

> _Conclusion_
>
> If such a simple pair of commands as this causes such devastating
> consequences, it seems the 2.4.27 kernel sched.c line 564 needs to be
> taken a look at - unless, that is, Jeff Dike's recent post about
> uml-patch-2.4.27-1 "a nasty scheduler race fixed" addresses this bug.

I may be wrong, but I think I should have picked that patch into 2.4.27-bs1.

> Jeff: where can I get your uml-patch-2.4.27-1 ?

It's at http://user-mode-linux.sourceforge.net/dl-ists.html

I'll take a look at your report - there is one patch applied on 2.6 that seemed unneeded on 2.4 until now - this may be reconsidered...

> Struan

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
|
From: Struan B. <st...@pr...> - 2004-11-16 15:45:20
|
I've compiled and run the test (below) on a virgin 2.4.27 kernel patched with uml-patch-2.4.27-1.bz2.

Result: the bug is still readily reproducible and _is not fixed_ in this patch.

P.S. Jeff Dike: if CONFIG_NETFILTER=y is set in .config, the patched kernel fails to compile.

Blaisorblade wrote:
>On Sunday 14 November 2004 11:47, Struan Bartlett wrote:
>
>>I've now tested for this bug using a simplified methodology on the
>>latest patched versions of the host (2.4.27 and 2.6.8) and guest (2.4.27
>>and 2.6.9).
>>
>>_New methodology_
>>
>>My Debian guest system raises two xterms at startup. On each I log in as
>>root. On one, I enter:
>>
>>while true; do echo >/dev/null; done
>>
>>On the second, I enter:
>>
>>cat /dev/urandom
>>
>>The devastating symptom: within 5 seconds the guest kernel has panicked
>>and exited with the message (below) "kernel BUG at sched.c:564!".
>>
>>_Findings_
>>
>>guest 2.4.27: patched with uml-2.4.27-bs1.patch and
>>uml-patch-2.4.24-1base.patch - no change; this bug is readily
>>reproducible and so IS NOT FIXED, regardless of whether the guest is run
>>on host 2.4.27 or 2.6.8
>>
>>_Conclusion_
>>
>>If such a simple pair of commands as this causes such devastating
>>consequences, it seems the 2.4.27 kernel sched.c line 564 needs to be
>>taken a look at - unless, that is, Jeff Dike's recent post about
>>uml-patch-2.4.27-1 "a nasty scheduler race fixed" addresses this bug.
>>
>I may be wrong, but I think I should have picked that patch into 2.4.27-bs1.
>
>I'll take a look at your report - there is one patch applied on 2.6 that
>seemed unneeded on 2.4 until now - this may be reconsidered...
>
|
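Since the P.S. above reports that the patched tree fails to build with CONFIG_NETFILTER=y, it may be worth checking the .config before starting a long compile. The following is a small sketch only: netfilter_enabled is a hypothetical helper name, and it looks only for the exact line a kernel .config uses for a built-in option.

```shell
#!/bin/sh
# Report whether a kernel .config has netfilter built in.
# netfilter_enabled is a hypothetical helper, not part of any UML tooling.

netfilter_enabled() {
    config="${1:-.config}"   # path to the kernel configuration file
    # A disabled option appears as "# CONFIG_NETFILTER is not set",
    # so matching the whole line avoids false positives.
    grep -q '^CONFIG_NETFILTER=y$' "$config"
}
```

For example, `netfilter_enabled .config && echo "netfilter is on; build may fail with this patch"` prints the warning only when the option is set.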
From: Blaisorblade <bla...@ya...> - 2005-02-04 05:40:21
|
On Tuesday 16 November 2004 16:44, Struan Bartlett wrote:
> I've compiled and run the test (below) on a virgin 2.4.27 kernel patched
> with uml-patch-2.4.27-1.bz2.
>
> Result: the bug is still readily reproducible and _is not fixed_ in this
> patch.

> P.S. Jeff Dike: if CONFIG_NETFILTER=y is set in .config, the patched
> kernel fails to compile.

This one was probably just a "clean-to-fix" bug.

Ok, I have just one more clue now (not the solution yet).

The first problem was that I went searching for "scheduler fixes" from 2.6: well, I merged some more, but without success.

So, I've looked again at the infamous line 564, and what I realized (don't ask me why I missed this last time) is that this BUG line:

    if (unlikely(in_interrupt())) {
        printk("Scheduling in interrupt\n");
        BUG();
    }

is triggered because of a bug in the urandom driver - which you use in both of your reproduction methods.

So, we are down to some bug related to code in urandom.c; since it works in mainline, I think the problem probably lies in the code managing interrupts and/or timing (since they are related to random number generation).

Btw, I just got the same panic by simply running "cat /dev/urandom"!

> >>cat /dev/urandom

> >>The devastating symptom: within 5 seconds the guest kernel has panicked
> >>and exited with the message (below) "kernel BUG at sched.c:564!".

> >>_Conclusion_
> >>
> >>If such a simple pair of commands as this causes such devastating
> >>consequences, it seems the 2.4.27 kernel sched.c line 564 needs to be
> >>taken a look at - unless, that is, Jeff Dike's recent post about
> >>uml-patch-2.4.27-1 "a nasty scheduler race fixed" addresses this bug.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
|
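For anyone re-running the test across patch levels, the printk text and the BUG location discussed above give a simple signature to search for in saved console output. This is a sketch only, assuming the guest's console output was captured to a file; hit_sched_bug and the file name uml.log in the usage line are hypothetical.

```shell
#!/bin/sh
# Check a saved console log for the panic discussed in this thread.
# hit_sched_bug is a hypothetical helper name.

hit_sched_bug() {
    log="$1"   # file holding captured guest console output
    # Either message is treated as a hit: the printk that precedes BUG(),
    # or the panic line naming sched.c:564.
    grep -q -e 'Scheduling in interrupt' \
            -e 'kernel BUG at sched\.c:564' "$log"
}
```

Usage might look like `hit_sched_bug uml.log && echo "bug reproduced"`. The line number 564 is specific to the 2.4.27 tree tested here and would need adjusting for other kernel versions.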