From: Steven J N. <st...@sn...> - 2008-11-08 21:27:37
On Sat, 2008-11-08 at 10:27 -0500, Robert Noland wrote:
> On Fri, 2008-11-07 at 22:00 +0000, Steven J Newbury wrote:
> > On Fri, 2008-11-07 at 21:44 +0000, Steven J Newbury wrote:
> > > On Fri, 2008-11-07 at 20:45 +0000, Steven J Newbury wrote:
> > > > On Fri, 2008-11-07 at 11:11 -0800, Eric Anholt wrote:
> > > > > On Fri, 2008-11-07 at 14:01 +0000, Steven J Newbury wrote:
> > >
> > > > > > I'm on 965GM and I'm having a serious interrupt problem since this patch
> > > > > > went into for-review:
> > > > > >
> > > > > > Nov 7 04:20:22 infinity irq 16: nobody cared (try booting with the "irqpoll" option)
> > > > > > Nov 7 04:20:22 infinity Pid: 0, comm: swapper Not tainted 2.6.28-rc3-00236-g1d7eff8 #23
> > > > > > Nov 7 04:20:22 infinity Call Trace:
> > > > > > Nov 7 04:20:22 infinity <IRQ> [<ffffffff80491a25>] ? i915_driver_irq_handler+0x53/0x186
> > > > > > Nov 7 04:20:22 infinity [<ffffffff80270b55>] __report_bad_irq+0x3d/0x8c
> > > > > > Nov 7 04:20:22 infinity [<ffffffff80270cb7>] note_interrupt+0x113/0x178
> > > > > > Nov 7 04:20:22 infinity [<ffffffff802713db>] handle_fasteoi_irq+0x99/0xc3
> > > > > > Nov 7 04:20:22 infinity [<ffffffff8020ee5f>] do_IRQ+0x9c/0x11d
> > > > > > Nov 7 04:20:22 infinity [<ffffffff8020c826>] ret_from_intr+0x0/0xa
> > > > > > Nov 7 04:20:22 infinity <EOI> [<ffffffff804572c0>] ? acpi_idle_enter_simple+0x175/0x1a8
> > > > > > Nov 7 04:20:22 infinity [<ffffffff804572b6>] ? acpi_idle_enter_simple+0x16b/0x1a8
> > > > > > Nov 7 04:20:22 infinity [<ffffffff8052af56>] ? cpuidle_idle_call+0xa6/0xe0
> > > > > > Nov 7 04:20:22 infinity [<ffffffff8020b47a>] ? cpu_idle+0x4c/0xb0
> > > > > > Nov 7 04:20:22 infinity [<ffffffff80614551>] ? rest_init+0x75/0x77
> > > > > > Nov 7 04:20:22 infinity handlers:
> > > > > > Nov 7 04:20:22 infinity [<ffffffff804919d2>] (i915_driver_irq_handler+0x0/0x186)
> > > > > > Nov 7 04:20:22 infinity Disabling IRQ #16
> > > > > >
> > > > > > This happens after a random amount of time in X, although never very
> > > > > > long. From this point on there are no interrupts generated unless I
> > > > > > switch vts away from X and back again.
> > > I'm wrong here. Switching vts only "fixes" the second problem below.
> > > > > > > This gets interrupts working
> > > > > > again for a short while.
> > > > >
> > > > > Can you get /proc/dri/0/i915_gem_interrupt from before and just after
> > > > > the problem occurs?
> > > > >
> > > > I'll fire up a for-review kernel and see what it says.
> > >
> > > Before X:
> > >
> > > Interrupt enable: 00000000
> > > Interrupt identity: 00000000
> > > Interrupt mask: fffedfff
> > > Pipe A stat: 00000203
> > > Pipe B stat: 80000206
> > > Interrupts received: 0
> > > Current sequence: 0
> > > Waiter sequence: 0
> > > IRQ sequence: 0
> > >
> > > After X has started:
> > >
> > > Interrupt enable: 00000051
> > > Interrupt identity: 00000002
> > > Interrupt mask: fffedfac
> > > Pipe A stat: 00020204
> > > Pipe B stat: 00000206
> > > Interrupts received: 1327
> > > Current sequence: 1742
> > > Waiter sequence: 0
> > > IRQ sequence: 1738
> > >
> > > Interrupt enable: 00000051
> > > Interrupt identity: 00000002
> > > Interrupt mask: fffedfac
> > > Pipe A stat: 00020204
> > > Pipe B stat: 00000206
> > > Interrupts received: 33424
> > > Current sequence: 43154
> > > Waiter sequence: 0
> > > IRQ sequence: 43132
> > >
> > > Interrupt enable: 00000051
> > > Interrupt identity: 00000002
> > > Interrupt mask: fffedfac
> > > Pipe A stat: 00020204
> > > Pipe B stat: 00020000
> > > Interrupts received: 42250
> > > Current sequence: 58442
> > > Waiter sequence: 0
> > > IRQ sequence: 58434
> > > ____
> > >
> > > After interrupt failure:
> > >
> > > Interrupt enable: 00000051
> > > Interrupt identity: 00000000
> > > Interrupt mask: fffedfac
> > > Pipe A stat: 00020204
> > > Pipe B stat: 00000206
> > > Interrupts received: 200097
> > > Current sequence: 96282
> > > Waiter sequence: 0
> > > IRQ sequence: 96282
> > >
> > > Output of 'cat /proc/interrupts' :
> > >         CPU0     CPU1
> > >   0:  309831   301848  IO-APIC-edge     timer
> > >   1:     964     1747  IO-APIC-edge     i8042
> > >   4:       1        1  IO-APIC-edge
> > >   8:       1        0  IO-APIC-edge     rtc0
> > >   9:       0        1  IO-APIC-fasteoi  acpi
> > >  12:   11555    16280  IO-APIC-edge     i8042
> > >  14:       0        0  IO-APIC-edge     ata_piix
> > >  15:       0        0  IO-APIC-edge     ata_piix
> > >  16:   99522   100479  IO-APIC-fasteoi  i915@pci:0000:00:02.0
> > >  19:       6        9  IO-APIC-fasteoi  yenta, firewire_ohci
> > >  20:      75       63  IO-APIC-fasteoi  uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb7
> > >  21:     204      216  IO-APIC-fasteoi  uhci_hcd:usb2, uhci_hcd:usb4, HDA Intel
> > >  22:     352      644  IO-APIC-fasteoi  uhci_hcd:usb5, ehci_hcd:usb6
> > >  43:    4898     5996  PCI-MSI-edge     ahci
> > > NMI:       0        0  Non-maskable interrupts
> > > LOC:  116278    86951  Local timer interrupts
> > > RES:   27385    27476  Rescheduling interrupts
> > > CAL:      91       32  Function call interrupts
> > > TLB:      32       96  TLB shootdowns
> > > TRM:       0        0  Thermal event interrupts
> > > THR:       0        0  Threshold APIC interrupts
> > > SPU:       0        0  Spurious interrupts
> > > ERR:       0
> > > MIS:       0
> >
> > Curiously, the i915_gem_interrupt count continues to rise despite no
> > more interrupts being recorded in /proc/interrupts. Clearly interrupts
> > are not working, X is very slow, and glxgears reports interrupts are not
> > working correctly.
> >
> > Currently:
> > cat /proc/dri/0/i915_gem_interrupt
> > Interrupt enable: 00000051
> > Interrupt identity: 00000002
> > Interrupt mask: fffedfac
> > Pipe A stat: 00000000
> > Pipe B stat: 00000206
> > Interrupts received: 615479
> > Current sequence: 308340
> > Waiter sequence: 0
> > IRQ sequence: 308338
>
> Unless keithp's most recent patch moving BREADCRUMB_INDEX prevents some
> internal brain damage, messing with IER often seems to be a bad idea, at
> least on 965gm. I've spent most of the week fighting this issue on
> FreeBSD. Last night, I flipped the logic back to setting up IER during
> interrupt handler install and flipping bits in IMR to enable / disable
> irqs and everything is working correctly again. I have made some other
> code changes in the handler, but none of them resolved the issue.
> Inverting the logic got everything working again, for both INTx and MSI.
> I know that it is published that MSI should not be used on the 965gm,
> but I've not seen any issues on my hardware.
>
> robert.

Now this is really weird: if I suspend to RAM and then resume, everything
seems to work fine from that point on, at least so far!?! My guess is that
re-installing the interrupt handler on resume happens with different
register values than the initial setup.
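
For anyone following along, here is a rough sketch of the scheme Robert
describes: write IER once when the handler is installed, then only flip
bits in IMR to enable or disable individual sources. This is a toy model
and not the driver code. Plain integers stand in for the registers, the
class and method names are made up for illustration, and it assumes a set
IMR bit blocks the corresponding source.

```python
# Toy model only: integers stand in for the IER and IMR hardware registers.
# FakeIrqRegs and its methods are hypothetical names, not driver functions.
ALL_BITS = 0xFFFFFFFF


class FakeIrqRegs:
    def __init__(self):
        self.ier = 0          # interrupt enable register
        self.imr = ALL_BITS   # interrupt mask register (assumed: 1 = masked)

    def install(self, wanted):
        """Handler install: mask everything, then set IER exactly once."""
        self.imr = ALL_BITS
        self.ier = wanted & ALL_BITS

    def enable(self, bit):
        """Enable one source by clearing its IMR bit; IER is not touched."""
        self.imr &= ~bit & ALL_BITS

    def disable(self, bit):
        """Disable one source by setting its IMR bit; IER is not touched."""
        self.imr |= bit & ALL_BITS

    def deliverable(self):
        """Sources that are both enabled in IER and unmasked in IMR."""
        return self.ier & ~self.imr & ALL_BITS


regs = FakeIrqRegs()
regs.install(0x53)       # illustrative bit set, not taken from the driver
regs.enable(0x02)
after_enable = regs.deliverable()
regs.disable(0x02)
after_disable = regs.deliverable()
print(f"{after_enable:08x} {after_disable:08x} ier={regs.ier:08x}")
```

The point of the sketch is only that IER stays constant after install,
so enabling and disabling a source never rewrites it.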