From: Andrew M. <ak...@li...> - 2009-05-16 23:20:36
On Fri, 15 May 2009 14:41:26 +0200 Andi Kleen <an...@fi...> wrote:

> "Brandeburg, Jesse" <jes...@in...> writes:
>
> Hi Jesse,
>
> > when starting a profile run on the latest net-next kernel, I'm currently
> > trying to reproduce on 2.6.30-rc5 stock.
>
> Were you able to reproduce it?
>
> >
> > config available upon request, arch=x86_64, recent (F10 or newer) oprofile
> > userspace.
>
> it looks like two bugs: oprofile didn't catch a NMI that belongs to
> it (most likely) and the NMI watchdog referenced a NULL pointer
> while processing an NMI.
>
> Did you have the nmi watchdog enabled on the command line?
>
> >
> > BUG: unable to handle kernel NULL pointer dereference at (null)
> > IP: [<ffffffff8066080a>] nmi_watchdog_tick+0xa1/0x1d6
>
> I don't get the same code as you. But the oopsing instruction in your
> oops is
>
> 2b:* 44 0f a3 28 bt %r13d,(%rax) <-- trapping instruction
>
> with rax == 0 and I suspect it's one of the new cpu mask checks
> I would try reverting
>
> fcc5c4a2feea3886dc058498b28508b2731720d5
> 2f537a9f8e82f55c241b002c8cfbf34303b45ada
> fcef8576d8a64fc603e719c97d423f9f6d4e0e8b
>
> and see which one causes it. That would only fix the NMI watchdog bug
> of course.
>
> The oprofile not catching a event problem would be still open then.
> I think the checks for overflowed counters are not 100% perfect
> so that could happen. I have some patches in the works to use the new
> global status register on arch perfmon 2, with that the overflow
> check is somewhat more reliable. But that's more work.
>

Ping? This is in Rafael's regression list but I suspect that it's a
linux-next-only thing?