From: Patrick M. <dia...@gm...> - 2004-09-04 09:17:14
Attachments:
config.gz
netconsole.log.gz
|
I'm currently using an r200 (specifically, an agp 'ATI Technologies Inc
Radeon R200 QM [Radeon 9100]') on a uniproc Pentium 3 board equipped with
an intel 440bx/piix4 type chipset (the agp controller is identified as
'Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)').

All of this was tested with a virgin 2.6.8.1 (with debug info and
frame pointers enabled) and Debian's XFree86 4.3.0.1, using DarkPlaces
and Twilight (both popular quakeGL engine forks) as test apps, unless
otherwise noted.

Thanks to wli (who I owe at least one beer for this, maybe an entire
case), we've been able to figure out exactly what's going on. The driver
is turning off interrupts, then deadlocking. (No sysrq, no sshing in,
capslock's light doesn't work.)

Turning the NMI watchdog on 'fixes' the deadlock: thanks to the
watchdog, ssh and sysrq now work (capslock's light still doesn't), but
the app and X are still dead. I can ssh in and kill -9 them both. Quite
obviously, I can't start another X, but I can reboot cleanly.

Things already tested that don't affect the bug:
Turning SMP on or off
Turning 4k stacks on or off
Using new radeon fbcon, using old radeon fbcon, using no fbcon
Turning Local APIC for uniproc and/or IO-APIC for uniproc on or off
Turning preempt on or off
Using mem=nopentium
Waving a dead chicken over the box

Things already tested for:
Kernels as far back as 2.6.0 have this bug, haven't tested any earlier

Thanks to netconsole (which I recommend to anyone that can't set up
serial console stuff), I was able to capture the entire kernel output,
including sysrq-t output right after my test app crashes.

I'm including both the netconsole output and the .config. vmlinux and
radeon.ko (and anything else you need) are available upon request.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms, munching magic pills and listening to repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989 |
From: Dave A. <ai...@li...> - 2004-09-04 10:59:17
|
Can you insmod the radeon drm module with drm_opts=debug, do the test,
and send on the trace? It may be getting wedged somewhere unexpected...

Dave.

>
> All of this was tested with a virgin 2.6.8.1 (with debug info and
> frame pointers enabled) and Debian's XFree86 4.3.0.1, using DarkPlaces
> and Twilight (both popular quakeGL engine forks) as test apps, unless
> otherwise noted.
>
> Thanks to wli (who I owe at least one beer for this, maybe an entire
> case), we've been able to figure out exactly what's going on. The driver
> is turning off interrupts, then deadlocking. (No sysrq, no sshing in,
> capslock's light doesn't work.)
>
> Turning the NMI watchdog on 'fixes' the deadlock: thanks to the
> watchdog, ssh and sysrq now work (capslock's light still doesn't), but
> the app and X are still dead. I can ssh in and kill -9 them both. Quite
> obviously, I can't start another X, but I can reboot cleanly.
>
> Things already tested that don't affect the bug:
> Turning SMP on or off
> Turning 4k stacks on or off
> Using new radeon fbcon, using old radeon fbcon, using no fbcon
> Turning Local APIC for uniproc and/or IO-APIC for uniproc on or off
> Turning preempt on or off
> Using mem=nopentium
> Waving a dead chicken over the box
>
> Things already tested for:
> Kernels as far back as 2.6.0 have this bug, haven't tested any earlier
>
> Thanks to netconsole (which I recommend to anyone that can't set up
> serial console stuff), I was able to capture the entire kernel output,
> including sysrq-t output right after my test app crashes.
>
> I'm including both the netconsole output and the .config. vmlinux and
> radeon.ko (and anything else you need) are available upon request.
>

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person |
From: Patrick M. <dia...@gm...> - 2004-09-05 08:41:07
Attachments:
netconsole.log.gz
|
On Sat, 4 Sep 2004 11:59:12 +0100 (IST), Dave Airlie <ai...@li...> wrote:
>
> Can you insmod the radeon drm module with drm_opts=debug do the test and
> send on the trace, it may be getting wedged somewhere unexpected...

Here you go, but it doesn't look like it has output anything interesting.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Michel <mi...@da...> - 2004-09-04 18:14:44
|
On Sat, 2004-09-04 at 05:16 -0400, Patrick McFarland wrote:
>
> All of this was tested with a virgin 2.6.8.1 (with debug info and
> frame pointers enabled) and Debian's XFree86 4.3.0.1, [...]

What version of the DRI driver?

--
Earthling Michel Dänzer      |  Debian (powerpc), X and DRI developer
Libre software enthusiast    |  http://svcs.affero.net/rm.php?r=daenzer |
From: Patrick M. <dia...@gm...> - 2004-09-04 20:36:43
|
On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <mi...@da...> wrote:
> What version of the DRI driver?

Where do I look for that?

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Michel <mi...@da...> - 2004-09-05 06:34:44
|
On Sat, 2004-09-04 at 16:36 -0400, Patrick McFarland wrote:
> On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <mi...@da...> wrote:
> > What version of the DRI driver?
>
> Where do I look for that?

Where did you get r200_dri.so from?

--
Earthling Michel Dänzer      |  Debian (powerpc), X and DRI developer
Libre software enthusiast    |  http://svcs.affero.net/rm.php?r=daenzer |
From: Patrick M. <dia...@gm...> - 2004-09-05 08:22:34
|
On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <mi...@da...> wrote:
> On Sat, 2004-09-04 at 16:36 -0400, Patrick McFarland wrote:
> > On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <mi...@da...> wrote:
> > > What version of the DRI driver?
> >
> > Where do I look for that?
>
> Where did you get r200_dri.so from?

From the one that comes with the Deb X I mentioned above.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Michel <mi...@da...> - 2004-09-05 17:40:38
|
On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <mi...@da...> wrote:
> >
> > Where did you get r200_dri.so from?
>
> From the one that comes with the Deb X I mentioned above.

Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
snapshot from dri.sf.net.

--
Earthling Michel Dänzer      |  Debian (powerpc), X and DRI developer
Libre software enthusiast    |  http://svcs.affero.net/rm.php?r=daenzer |
From: Patrick M. <dia...@gm...> - 2004-09-05 20:18:48
|
On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <mi...@da...> wrote:
> On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <mi...@da...> wrote:
> > >
> > > Where did you get r200_dri.so from?
> >
> > From the one that comes with the Deb X I mentioned above.
>
> Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> snapshot from dri.sf.net.

That shouldn't matter, should it? The userland stuff should never lock
the machine up.
I'll test it anyhow, though.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Michel <mi...@da...> - 2004-09-05 20:25:45
|
On Sun, 2004-09-05 at 16:18 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <mi...@da...> wrote:
> > On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <mi...@da...> wrote:
> > > >
> > > > Where did you get r200_dri.so from?
> > >
> > > From the one that comes with the Deb X I mentioned above.
> >
> > Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> > snapshot from dri.sf.net.
>
> That shouldn't matter, should it? The userland stuff should never lock
> the machine up.

In an ideal world... Feel free to track down the cause and add code to
the DRM to prevent it.

--
Earthling Michel Dänzer      |  Debian (powerpc), X and DRI developer
Libre software enthusiast    |  http://svcs.affero.net/rm.php?r=daenzer |
From: Patrick M. <dia...@gm...> - 2004-09-05 21:47:24
|
On Sun, 05 Sep 2004 16:25:00 -0400, Michel Dänzer <mi...@da...> wrote:
> On Sun, 2004-09-05 at 16:18 -0400, Patrick McFarland wrote:
> > That shouldn't matter, should it? The userland stuff should never lock
> > the machine up.
>
> In an ideal world... Feel free to track down the cause and add code to
> the DRM to prevent it.

I would, except, as many have noted before, even looking at the r200
driver requires years of therapy to get rid of the nightmares.

So, yeah, I'll check to see if today's dri cvs snapshot works. If it
doesn't, I'm not sure what to do.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Lee R. <rlr...@jo...> - 2004-09-06 00:14:43
|
On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <mi...@da...> wrote:
> > On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <mi...@da...> wrote:
> > > >
> > > > Where did you get r200_dri.so from?
> > >
> > > From the one that comes with the Deb X I mentioned above.
> >
> > Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> > snapshot from dri.sf.net.
>
> That shouldn't matter, should it? The userland stuff should never lock
> the machine up.
> I'll test it anyhow, though.

No, it shouldn't. Anything that directly accesses hardware belongs in
the kernel. How to fix this is a pretty hot topic now.

Lee |
From: Felix <fx...@gm...> - 2004-09-06 10:49:02
|
On Sun, 05 Sep 2004 20:14:43 -0400
Lee Revell <rlr...@jo...> wrote:

> On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
[snip]
> >
> > That shouldn't matter, should it? The userland stuff should never lock
> > the machine up.
> > I'll test it anyhow, though.
>
> No, it shouldn't. Anything that directly accesses hardware belongs in
> the kernel. How to fix this is a pretty hot topic now.

That's not the whole truth. There are just too many ways to lock up
those 3D chips. For instance I fixed a lockup in the r100 driver where
the order in which state changing commands were sent to the hardware
would cause a lockup. Each individual state changing command is
perfectly valid. Finding all permutations that trigger a lockup would
have been too much of a hassle and may not even have been true for all
supported hardware out there. So we made the user-space driver emit
state changing commands in a fixed order, which seems to work
everywhere.

Regards,
  Felix

>
> Lee
>

| Felix Kühling <fx...@gm...>                     http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 | |
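The fixed-order approach Felix describes can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python for brevity (the real drivers are C), and the class, the atom names, and the ordering are all invented here rather than taken from the r100 or r200 code. The idea it tries to show is that the application may change state in any order, while the driver only ever emits the changed state in one canonical order.

```python
# Hypothetical sketch: none of these names come from the r100/r200 drivers.
# The driver keeps one "atom" per group of hardware state and marks it dirty
# when the application changes it. At flush time the dirty atoms are emitted
# in a single canonical order, whatever order the application touched them in.

# Canonical emission order (invented for illustration).
EMIT_ORDER = ["context", "viewport", "texture", "material", "lights"]


class StateTracker:
    def __init__(self):
        self.values = {}    # atom name -> latest value set by the application
        self.dirty = set()  # atoms changed since the last flush

    def set_state(self, atom, value):
        if atom not in EMIT_ORDER:
            raise ValueError(f"unknown state atom: {atom}")
        self.values[atom] = value
        self.dirty.add(atom)

    def flush(self):
        """Return the pending commands, always in EMIT_ORDER."""
        commands = [(atom, self.values[atom])
                    for atom in EMIT_ORDER if atom in self.dirty]
        self.dirty.clear()
        return commands


tracker = StateTracker()
# The application changes state in an arbitrary order...
tracker.set_state("lights", 2)
tracker.set_state("context", 1)
tracker.set_state("texture", 7)
# ...but the command stream comes out in the canonical order.
stream = tracker.flush()
print(stream)
```

Because every stream produced this way is a subsequence of one ordering, only that ordering has to be known to be safe on the hardware, instead of every permutation.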
From: Mike M. <che...@ya...> - 2004-09-07 06:34:44
|
--- Felix Kühling <fx...@gm...> wrote:

> On Sun, 05 Sep 2004 20:14:43 -0400
> Lee Revell <rlr...@jo...> wrote:
>
> > On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
> [snip]
> > >
> > > That shouldn't matter, should it? The userland stuff should never
> > > lock the machine up.
> > > I'll test it anyhow, though.
> >
> > No, it shouldn't. Anything that directly accesses hardware belongs in
> > the kernel. How to fix this is a pretty hot topic now.
>
> That's not the whole truth. There are just too many ways to lock up
> those 3D chips. For instance I fixed a lockup in the r100 driver where
> the order in which state changing commands were sent to the hardware
> would cause a lockup. Each individual state changing command is
> perfectly valid. Finding all permutations that trigger a lockup would
> have been too much of a hassle and may not even have been true for all
> supported hardware out there. So we made the user-space driver emit
> state changing commands in a fixed order, which seems to work
> everywhere.
>

Does the DRM verify that the cmds are in this order? Why not just have
the DRM 'sort' the cmds? A simple bubble sort would have no more overhead
than the check for correct order, but it would fix misordered cmd
streams. Once this is done the statement holds true, userland stuff
should never...

> Regards,
>   Felix
>
> >
> > Lee
> >
>
> | Felix Kühling <fx...@gm...>                     http://fxk.de.vu |
> | PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |
|
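Mike's overhead claim can be checked on a toy model. The sketch below is an assumption-laden illustration in Python, not DRM code: the command names and their required order are invented, and the thread does not say whether the DRM performs any such check. It counts comparisons for a plain order check and for a bubble sort that stops as soon as a pass makes no swaps.

```python
# Hypothetical sketch: command names and their required order are invented.
REQUIRED_ORDER = {"context": 0, "viewport": 1, "texture": 2, "lights": 3}


def is_ordered(cmds):
    """Order check only: a single pass. Returns (ok, comparisons)."""
    comparisons = 0
    for a, b in zip(cmds, cmds[1:]):
        comparisons += 1
        if REQUIRED_ORDER[a] > REQUIRED_ORDER[b]:
            return False, comparisons
    return True, comparisons


def bubble_sort(cmds):
    """Bubble sort with early exit. Returns (sorted copy, comparisons)."""
    out = list(cmds)
    comparisons = 0
    n = len(out)
    while n > 1:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if REQUIRED_ORDER[out[i]] > REQUIRED_ORDER[out[i + 1]]:
                out[i], out[i + 1] = out[i + 1], out[i]
                swapped = True
        if not swapped:
            break  # a pass without swaps means the stream is in order
        n -= 1     # the largest remaining element is now in place
    return out, comparisons


ordered = ["context", "viewport", "texture", "lights"]
print(is_ordered(ordered))   # (True, 3)
print(bubble_sort(ordered))  # same list, also 3 comparisons
print(bubble_sort(["lights", "texture", "viewport", "context"]))  # 6 comparisons
```

On a stream that is already in order, both functions do the same n-1 comparisons, which matches the claim for the common case. A misordered stream costs more, up to n(n-1)/2 comparisons. The sketch also says nothing about whether reordering commands preserves what the application meant, or whether ordering alone rules out every lockup Felix mentions.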
From: Patrick M. <dia...@gm...> - 2004-09-06 11:01:54
|
On Sun, 05 Sep 2004 20:14:43 -0400, Lee Revell <rlr...@jo...> wrote:
> How to fix this is a pretty hot topic now.

Yow, I didn't mean to cause such an upset. ;)

Currently, the dri cvs snapshot for 20040905 doesn't compile with
2.6.8.1 for me (I've sent a bug report to the dri-devel mailing list
about this) so Lee and Michel, you'll have to wait until tomorrow (or
maybe even the day after that) to see how the test goes.

I'm hoping it does work, this bug is pretty nasty imho. Who knew Quake
could take an entire box out in under 10 seconds. ;)

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Michel <mi...@da...> - 2004-09-06 18:11:59
|
On Mon, 2004-09-06 at 07:01 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 20:14:43 -0400, Lee Revell <rlr...@jo...> wrote:
> > How to fix this is a pretty hot topic now.
>
> Yow, I didn't mean to cause such an upset. ;)
>
> Currently, the dri cvs snapshot for 20040905 doesn't compile with
> 2.6.8.1 for me (I've sent a bug report to the dri-devel mailing list
> about this) so Lee and Michel, you'll have to wait until tomorrow (or
> maybe even the day after that) to see how the test goes.

You can test the r200_dri.so from the snapshot with the DRM from the
kernel...

--
Earthling Michel Dänzer      |  Debian (powerpc), X and DRI developer
Libre software enthusiast    |  http://svcs.affero.net/rm.php?r=daenzer |
From: Patrick M. <dia...@gm...> - 2004-09-07 09:07:48
|
On Mon, 06 Sep 2004 14:12:08 -0400, Michel Dänzer <mi...@da...> wrote:
> You can test the r200_dri.so from the snapshot with the DRM from the
> kernel...

And drum roll please...

The dri cvs snapshot works fine on both its own kernel module, and the
one that comes with 2.6.8.1.

So now what? (And does this mean it isn't a kernel bug?)

<rant>
Also, what happens to r200 users who happen to use Debian? Using dri
cvs snapshots obviously isn't an option for everyone (though I don't
mind at all) and upgrading to Xorg (when Xorg gets this fix if it
doesn't already) is even less of an option. The official word from the
Debian X Strike Force is not to switch to Xorg until debriX (modular X)
gets somewhere.
</rant>

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Patrick M. <dia...@gm...> - 2004-09-07 09:09:31
|
On Tue, 7 Sep 2004 05:07:45 -0400, Patrick McFarland <dia...@gm...> wrote:
> Lots of badly formatted text.

I do apologize for anyone who had to read that.

--
Patrick "Diablo-D3" McFarland || dia...@gm...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989 |
From: Alan C. <al...@lx...> - 2004-09-07 10:43:27
|
On Tue, 2004-09-07 at 10:07, Patrick McFarland wrote:
> Also, what happens to r200 users who happen to use Debian? Using dri
> cvs snapshots

If Debian is currently shipping a buggy driver then Debian needs to ship
a working driver. Same as anyone else. You'll also need the newest dri
driver for Radeon IGP (most ATI chipset laptops) and the newer R2xx
hardware.

Alan |