From: Daniel T. <d.t...@gm...> - 2005-04-19 20:39:50
I have a recently built 2.6.11 kernel built from the patches in
http://www.gc-linux.org/down/isobel

It gets its root partition via NFS and its swap from a combination of
ARAM and NBD (I have used priorities to make ARAM the preferred swap
device). The only reason I suspect swap is because the crashes always
list the 'swapper' thread.

Unfortunately my kernel is panicking due to one of the BUG_ONs in the EXI
framework (the one that prevents us selecting an already selected EXI
channel). The death messages are as follows (I have had to guess the
first character since it was off the side of my TV):

<2>kernel BUG in exi_cmd_select at drivers/exi/exi-hw.c:412!
Oops: Exception in kernel mode. sig: 5 [#3]
?IP: C010C998 LR: C010CB58 SP: C020B580 REGS: c020b4d0 TRAP: 0700 Not tainted
?SR: 00021032 EE: 0: PR: 0: FP: 0 ME: 1 IR/DR: 11
TASK = c01d9470[0] 'swapper' THREAD: c020a000
Last syscall: 120
PR00: 00000001 C020B580 C01D9470 C020B5E0 00000000 00000000 C020B600 C0220000
PR08: 426563DA 00000000 C01E8610 00000001 F4826EC9 8130D720 CC006400 03000000
PR16: 00001808 00000000 00000000 00000000 FFFFFFFF 04120000 000000BA 800C0000
PR24: C0230000 C0230000 C02270A8 00000000 C020B5E0 C020B5E0 00001032 C01E8610
?IP [c010c998] exi_run_command+0x278/0x2ac
?R [c010cb58] exi_set_time+0x1c/0x68
Call trace:
 [c010cb58] exi_select+0x44/0x58
 [c000b9f8] rtc_set_time+0x1c/0x68
 [c000bab8] gcn_set_rtc_time+0x30/0x44
 [c0005354] timer_interrupt+0x168/0x210
 [c00042cc] ret_from_except+0x0/0x14
 [c000598c] __delay+0xc/0x14
 [c0015580] complete_and_exit+0x0/0x28
 [c0004590] _exception+0x0/0x74
 [c0004600] _exception+0x70/0x74
 [c0005280] ret_from_except_full+0x0/0x4c
 [c010c998] exi_run_command+0x278/0x2ac
 [c010cb58] exi_select+0x44/0x58
 [c000b9f8] rtc_set_time+0x1c/0x68
 [c000bab8] gcn_set_rtc_time+0x30/0x44
 [c0005354] timer_interrupt+0x168/0x210
Kernel panic - not syncing: Aiee, killing interrupt handler!

Any clues, ideas or experiments you think I should perform?

-- 
Daniel Thompson (Merlin) <d.t...@gm...>

If at first you don't succeed then sky diving is probably not for you.
From: Albert H. <alb...@gm...> - 2005-04-19 23:12:59
Daniel Thompson <d.thompson <at> gmx.net> writes:

> I have a recently built 2.6.11 kernel built from the patches in
> http://www.gc-linux.org/down/isobel
[...]
> Unfortunately my kernel is panicking due to one of the BUG_ONs in the EXI
> framework (the one that prevents us selecting an already selected EXI
> channel). The death messages are as follows (I have had to guess the
> first character since it was off the side of my TV):
[...]
> Any clues, ideas or experiments you think I should perform?

Hi,

Try the CVS code (exi-hw.c) or disable preemption.
IIRC, the patch you are using doesn't include a small fix for
preemption-enabled kernels.

Cheers,
Albert
From: Daniel T. <da...@re...> - 2005-04-23 09:52:34
On Tue, 2005-04-19 at 23:08 +0000, Albert Herranz wrote:
> Daniel Thompson <d.thompson <at> gmx.net> writes:
> > I have a recently built 2.6.11 kernel built from the patches in
> > http://www.gc-linux.org/down/isobel
> [...]
> > Unfortunately my kernel is panicking due to one of the BUG_ONs in the EXI
> > framework (the one that prevents us selecting an already selected EXI
> > channel). The death messages are as follows (I have had to guess the
> > first character since it was off the side of my TV):
> [...]
> > Any clues, ideas or experiments you think I should perform?
>
> Hi,
>
> Try the CVS code (exi-hw.c) or disable preemption.

I wasn't running with pre-emption; however, I have updated to the CVS
version since it appears to have quite a lot of changes compared to the
one I was using.

Unfortunately I am still seeing a crash when the 'cube starts to swap a
little. The bulk of the crash info is the same as before but I have
observed a different stack. Although the top is similar, the bottom half
looks less broken than the other.

exi_run_command+0x288/2bc
exi_select+0x44/0x58
rtc_set_time+0x1c/0x68
gcn_set_rtc_time+0x30/0x44
timer_interrupt+0x168
ret_from_except+0x0/0x14
ppc6xx_idle+0xe4/0xf0
cpu_idle+0x28/0x38
rest_init+0x24/0x34
start_kernel+0x170/0x1a8

-- 
Daniel Thompson (Merlin) <da...@re...>
signature.asc? http://www.redfelineninja.dsl.pipex.com/signature.html

Did Sigmund's wife wear Freudian slips?
From: Albert H. <alb...@gm...> - 2005-04-24 15:23:25
Daniel Thompson <daniel <at> redfelineninja.org.uk> writes:

> I wasn't running with pre-emption however I have updated to the CVS
> version since it appears to have quite a lot of changes compared to the
> one I was using.
>
> Unfortunately I am still seeing a crash when the 'cube starts to swap a
> little.

Could you please post your .config and describe a bit when the crash happens?
(Do you run an ntpd daemon, or something?)

From your traces, the crash seems to have nothing to do with swap, but with the
rtc code. It gets called from interrupt context, something that the rtc driver
currently does not support (it does not protect the select/deselect region).

I would like to get enough information about your environment to
reproduce your crash on my system.

Cheers,
Albert
From: Daniel T. <da...@re...> - 2005-04-24 17:58:16
Attachments:
.config
On Sun, 2005-04-24 at 15:17 +0000, Albert Herranz wrote:
> Daniel Thompson <daniel <at> redfelineninja.org.uk> writes:
> > I wasn't running with pre-emption however I have updated to the CVS
> > version since it appears to have quite a lot of changes compared to the
> > one I was using.
> >
> > Unfortunately I am still seeing a crash when the 'cube starts to swap a
> > little.
>
> Could you please post your .config and describe a bit when the crash happens?
> (Do you run an ntpd daemon, or something?)

See attached for my .config, though note that it is almost exactly the
same as the one in your download area on the web site.

I do run ntp:

drt@cube:~$ dpkg -l | grep ntp
ii  ntp         4.2.0a+stable-  Network Time Protocol: network utilities
ii  ntp-server  4.2.0a+stable-  Network Time Protocol: common server tools
ii  ntp-simple  4.2.0a+stable-  Network Time Protocol: daemon for simple sys
rc  ntpdate     4.1.0-8         The ntpdate client for setting system time f

Reading the timer_interrupt() code, you might find that the following
file, which stores the rate of drift of the RTC, also helps with
reproduction:

drt@cube:~$ more /etc/adjtime
-0.412188 1114242043 0.000000
1114242043
UTC

> From your traces, the crash seems to have nothing to do with swap, but with the
> rtc code. It gets called from interrupt context, something that the rtc driver
> currently does not support (it does not protect the select/deselect region).

Agreed. After reading this I tried using another means to stress the BBA
and therefore the EXI bus. The following command, issued from the 'cube
at my main Linux box, caused a crash in less than a minute (with swap
to /dev/aram only):

drt@cube:~$ ssh starfish dd if=/dev/zero > /dev/null

-- 
Daniel Thompson (Merlin) <da...@re...>
signature.asc? http://www.redfelineninja.dsl.pipex.com/signature.html

Did Sigmund's wife wear Freudian slips?
From: Albert H. <alb...@gm...> - 2005-04-24 18:45:17
Daniel Thompson <daniel <at> redfelineninja.org.uk> writes:

> See attached for my .config though note that it is almost exactly the
> same as the one in your download area on the web site.
>
> I do run ntp:

I tried running an ntp server too and the bug happened after a few seconds.
So, yes, the bug is indeed triggered by ntpd.

The problem is that at interrupt time we can't always guarantee that the exi
channel needed to get the current time from the RTC is free, especially at timer
interrupt time.
If the channel is busy, we could derive a time value from the last true value
read and the jiffies elapsed since then.

I'll look into it. Meanwhile, you'll need to stop the ntp daemon.

Cheers,
Albert
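The fallback suggested above (derive the time from the last true RTC reading plus the ticks elapsed since then) can be sketched in plain user-space C. This is only an illustration under stated assumptions, not code from the gc-linux tree: the struct, the function names and the `hz` field are all hypothetical, and the tick counter stands in for the kernel's jiffies.

```c
#include <assert.h>

/* Hypothetical cache of the last value actually read from the RTC. */
struct rtc_cache_sketch {
    unsigned long last_rtc_secs; /* seconds reported by the last real read */
    unsigned long last_ticks;    /* tick counter sampled at that read */
    unsigned long hz;            /* ticks per second (assumed constant) */
};

/* Record a successful hardware read together with the current tick count. */
static void rtc_cache_update(struct rtc_cache_sketch *c,
                             unsigned long rtc_secs,
                             unsigned long now_ticks)
{
    c->last_rtc_secs = rtc_secs;
    c->last_ticks = now_ticks;
}

/* Estimate the current time without touching the bus. */
static unsigned long rtc_cache_estimate(const struct rtc_cache_sketch *c,
                                        unsigned long now_ticks)
{
    unsigned long elapsed;

    assert(c->hz > 0);
    /* Unsigned subtraction stays correct across one counter wraparound. */
    elapsed = now_ticks - c->last_ticks;
    return c->last_rtc_secs + elapsed / c->hz;
}
```

A caller would refresh the cache whenever the channel is free and fall back to `rtc_cache_estimate()` when it is busy. The estimate is only as good as the tick rate, so it drifts until the next real read.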
From: Albert H. <alb...@gm...> - 2005-04-25 18:54:44
Daniel Thompson <daniel <at> redfelineninja.org.uk> writes:

> Agreed. After reading this I tried using another means to stress the BBA
> and therefore the EXI bus. The following command issued from the 'cube
> at my main Linux box caused a crash in less than a minute (with swap
> to /dev/aram only):
>
> drt@cube:~$ ssh starfish dd if=/dev/zero > /dev/null
>

I took a closer look at it, and it seems better than expected.

The get_rtc_time hook is not called from interrupt context, so we do not need to
protect it. It will succeed.

The set_rtc_time hook is called from user and interrupt context, depending on
whether you use the /dev/rtc interface or the adjtimex syscall.
The good news is that the system is prepared for set_rtc_time to fail, so we can
simply return an error if the exi channel is busy while trying to write to the
rtc.

I'll merge a patch into CVS to fix this.

Cheers,
Albert
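The "return an error if the channel is busy" approach can be illustrated with a small user-space sketch. It is not the patch that went into CVS: the struct and every function name below are hypothetical, a C11 atomic flag stands in for the real channel state, and -EBUSY is an assumed choice of error code. The point is only that the writer tries to claim the channel once and backs off instead of tripping the "already selected" check.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one EXI channel; not the gc-linux structure. */
struct exi_channel_sketch {
    atomic_bool selected;     /* true while a device on the channel is selected */
    unsigned long rtc_value;  /* pretend RTC register behind the channel */
};

/* Try to claim the channel without waiting; returns true on success. */
static bool exi_try_select(struct exi_channel_sketch *ch)
{
    bool expected = false;

    /* Succeeds only if nobody else currently holds the channel. */
    return atomic_compare_exchange_strong(&ch->selected, &expected, true);
}

/* Release the channel; releasing an unselected channel is a bug. */
static void exi_deselect_sketch(struct exi_channel_sketch *ch)
{
    assert(atomic_load(&ch->selected));
    atomic_store(&ch->selected, false);
}

/* Write the RTC, or fail with -EBUSY when the channel is already in use. */
static int rtc_set_time_sketch(struct exi_channel_sketch *ch,
                               unsigned long secs)
{
    if (!exi_try_select(ch))
        return -EBUSY;   /* caller is expected to tolerate this and retry later */

    ch->rtc_value = secs;
    exi_deselect_sketch(ch);
    return 0;
}
```

If the timer interrupt arrives while another transfer (for example BBA traffic) holds the channel, `rtc_set_time_sketch()` returns -EBUSY and the write is simply skipped for that attempt. A real driver would also have to make the claim safe against interrupts on the same CPU, which this sketch does not model.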