From: Nick L. <nj...@ec...> - 2005-04-15 02:20:30
|
I'm trying to diagnose and hopefully fix a recurring hang with my 9200SE (details below) and all recent DRM/DRI drivers on my desktop Linux system.

I have a Duron 700 on a Gigabyte 7ZM (VIA-based) motherboard, configured for AGP 1x. With e.g. the current Fedora Core 3 kernel (2.6.11) and X.org (6.8.2), or with recent DRI snapshots, the graphics will eventually hang (after minutes or hours, rarely longer).

From a remote machine in another room I can see that the process (when playing a game this will be the game process; if just scooting around it will be X) is spinning at 100% CPU. Typically the last process using the card, if it is killed, will spin in system time, uninterruptible, forever. When this happens the machine must be rebooted...

Some use of the debug=1 module parameter reduced performance, and sadly made the hang much less frequent, but suggested that somehow the card itself is "stuck" and DRM is waiting for it to become idle, which never happens. The 100% CPU is consumed repeatedly calling the same ioctl(), a getparam looking for the last frame # IIRC, and the result never changes.

As an experiment I tried writing a mini-client which opens /dev/dri/card0 and uses some of the dangerous-looking ioctl()s to try to "fix" a card in this hung state. Most of the experiments were failures, but I found that DRM_IOCTL_RADEON_CP_RESUME (used to restart a card after software suspend) kicks it back into life enough that I can often continue with a fresh X session without rebooting.

Does the above symptom mean anything to anyone? Any suggestions where to look or what to try next? Presumably DRM_IOCTL_RADEON_CP_RESUME is a very rude thing to do while the machine is running, but it seems to help. Does that provide any clues?

Nick.
|
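[Editor's sketch] Nick's mini-client was not posted, so the following is only a minimal illustration of what such a client might look like, not his code. It assumes the x86 Linux ioctl encoding (where `_IO(type, nr)` is `(type << 8) | nr`) and assumes the values `DRM_COMMAND_BASE = 0x40` and `DRM_RADEON_CP_RESUME = 0x18` from the legacy `radeon_drm.h`; check these against the headers of the kernel actually in use. The helper names `io_cmd` and `cp_resume` are made up for this sketch.

```python
import fcntl
import os

# Assumed constants (legacy DRM headers, x86 ioctl layout); verify locally.
DRM_IOCTL_BASE = ord("d")      # ioctl "type" byte used by DRM
DRM_COMMAND_BASE = 0x40        # first driver-specific ioctl number
DRM_RADEON_CP_RESUME = 0x18    # radeon driver-specific command index


def io_cmd(ioctl_type: int, nr: int) -> int:
    """Mimic the kernel's _IO(type, nr) macro for the x86 layout.

    _IO carries no data, so the direction and size fields are zero and
    only the type and number bytes remain.
    """
    return (ioctl_type << 8) | nr


DRM_IOCTL_RADEON_CP_RESUME = io_cmd(
    DRM_IOCTL_BASE, DRM_COMMAND_BASE + DRM_RADEON_CP_RESUME
)


def cp_resume(device: str = "/dev/dri/card0") -> None:
    """Open the DRM device and issue the CP_RESUME ioctl (hypothetical helper).

    This pokes the hardware directly. As the post says, it is a rude thing
    to do on a running machine, and it will likely need root privileges.
    """
    fd = os.open(device, os.O_RDWR)
    try:
        fcntl.ioctl(fd, DRM_IOCTL_RADEON_CP_RESUME)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Only print the computed request number; calling cp_resume() is left
    # to the reader because it touches real hardware.
    print(hex(DRM_IOCTL_RADEON_CP_RESUME))
```

Computing the request number from its parts, rather than hard-coding it, makes it easier to compare against the `cmd=` and `nr=` values that appear in DRM debug output.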
From: Dave A. <ai...@li...> - 2005-04-15 04:29:39
|
> I'm trying to diagnose and hopefully fix a recurring hang with my 9200SE
> (details below) and all recent DRM/DRI drivers on my desktop Linux system.
>
> I have a Duron 700 on Gigabyte 7ZM (via based) motherboard, configured for
> AGP 1x and with e.g. current Fedora Core 3 kernel (2.6.11) and X.org (6.8.2)
> or with recent DRI snapshots the graphics will eventually (minutes, hours
> rarely longer) hang.

I've been seeing these but I've no idea when it started. I get hangs on 8500LE and 9200 with CVS everything. I think turning Render accel off makes it happen less often but doesn't remove it completely... I just don't have the bandwidth to debug these at the moment (debugging these can take days of staring at debug dumps...).

> Does the above symptom mean anything to anyone? Any suggestions where to
> look or what to try next? Presumably DRM_IOCTL_RADEON_CP_RESUME is a very
> rude thing to do while the machine is running, but it seems to help, does
> that provide any clues?

No, it pretty much just restarts the card. It might be nice to detect the hang and do something like that, as it can't make things any worse.

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG
|
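[Editor's sketch] The "detect the hang" idea can be illustrated in user space, under heavy assumptions. The function below is hypothetical and is not part of DRM or any driver: it takes any callable that returns a progress counter (for instance the last-frame value mentioned earlier in the thread) and reports a suspected hang if the value does not change over several polls. An idle card also shows an unchanging counter, so this is only meaningful while rendering work is known to be pending.

```python
import time
from typing import Callable


def looks_hung(
    read_counter: Callable[[], int],
    polls: int = 5,
    interval_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the counter stays unchanged for `polls` consecutive reads.

    `read_counter` is an assumption of this sketch: how the value is
    obtained (ioctl, debug output, ...) is left to the caller.
    """
    first = read_counter()
    for _ in range(polls):
        sleep(interval_s)
        if read_counter() != first:
            return False  # progress was made, so not hung
    return True


if __name__ == "__main__":
    # A counter that never advances is reported as hung.
    print(looks_hung(lambda: 42, polls=3, sleep=lambda s: None))
```

Passing the `sleep` function as a parameter keeps the check testable without real delays; a recovery action such as the CP resume discussed above would be a separate step once this returns True.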
From: khaqq <kh...@fr...> - 2005-04-16 11:08:52
|
On Fri, 15 Apr 2005 05:29:35 +0100 (IST) Dave Airlie <ai...@li...> wrote:

>
> > I'm trying to diagnose and hopefully fix a recurring hang with my 9200SE
> > (details below) and all recent DRM/DRI drivers on my desktop Linux system.
> >
> > I have a Duron 700 on Gigabyte 7ZM (via based) motherboard, configured for
> > AGP 1x and with e.g. current Fedora Core 3 kernel (2.6.11) and X.org (6.8.2)
> > or with recent DRI snapshots the graphics will eventually (minutes, hours
> > rarely longer) hang.
>
> I've been seeing these but I've no idea when it started, I get hangs on
> 8500LE and 9200 with CVS everything, I think turning Render accel off
> makes it happen less often but doesn't remove it completely... I just
> don't have the bandwidth to debug these at the moment (debugging these can
> take days of staring at debug dumps...),

I had lockups a few months ago with my FireGL8800 with X/DRI. I gave up trying to find a stable X/DRI combination after trying 3 X versions and about 20 DRI snapshots. I'm *not* saying DRI isn't good; it used to work perfectly on my 7500. But the R200 was never stable here (I've had it for ~6 months now).

I'm using ATI-drivers atm and I've not seen a single crash with those. If debug logs are interesting to you, or to any X/DRI developers, I'll go back to DRI and try to generate them if you tell me how. I'm happy to say my PC doesn't seem to have any hardware problem (as was suggested on dri-devel), since going to ati-drivers solved *all* the stability issues I had with X+DRI (and before switching from the 7500 to the 8800 it was stable, and it was still stable in 3D apps in Windows (yuck) after the switch).

Maybe if all users seeing crashes generated a debug log and put it online for all DRI developers to see, finding a common denominator would be easier...
|
From: Geller S. <wi...@pe...> - 2005-04-18 12:45:25
|
On Sat, 16 Apr 2005, khaqq wrote:

> I had lockups a few months ago with my FireGL8800 with X/DRI, I gave up trying
> to find a stable X/DRI combination after trying 3 X versions and about 20 DRI
> snapshots. I'm *not* saying DRI isn't good, it used to work perfectly on my 7500.
> But the R200 was never stable here (I've had it for ~6 months now).

I have to agree with you. I complained about the r200 driver some months ago. I was told to try to track down the time when the instability of the r200 driver started (some developers suggested the end of September 2004). Unfortunately, I wasn't able to compile older CVS snapshots on my Debian sid system. It's very strange; I don't think that X.org CVS was broken for months... (I tried compiling snapshots from 2004.08.31 to 2004.11.30 without any success...)

> I'm using ATI-drivers atm and I've not seen a single crash with those. If debug logs are
> interesting to you, or to any X/DRI developers, I'll go back to DRI and try to
> generate them if you tell me how. I'm happy to say my PC doesn't seem to have
> any hardware problem (as it was suggested on dri-devel) since going to ati-drivers
> solved *all* the stability issues I had with X+DRI (and before switching from
> the 7500 to the 8800, it was stable; and it was still stable in 3D apps in windows
> (yuck) after the switch).
> Maybe if all users seeing crashes generated a debug log and put it online for all DRI
> developers to see, finding a common denominator would be easier...

I use Descent3 for testing. With recent snapshots it's getting harder to crash the game. Some months ago a simple X restart was enough to crash Descent3 (and of course the X server) in a few seconds. Nowadays I have to switch to the microwave gun and fire some dozen waves to crash X. It's 100% reproducible, so I offered my help to create gdb backtraces, but one of the DRI developers (Michael Daenzer, if I remember correctly) pointed out that gdb backtraces won't help in diagnosing GPU lockups.

Geller Sandor <wi...@pe...> |
From: Michel <mi...@da...> - 2005-04-18 16:09:34
|
On Mon, 2005-18-04 at 14:45 +0200, Geller Sandor wrote:
> On Sat, 16 Apr 2005, khaqq wrote:
>
> > I had lockups a few months ago with my FireGL8800 with X/DRI, I gave up trying
> > to find a stable X/DRI combination after trying 3 X versions and about 20 DRI
> > snapshots. I'm *not* saying DRI isn't good, it used to work perfectly on my 7500.
> > But the R200 was never stable here (I've had it for ~6 months now).
>
> I have to agree with you. I complained about the r200 driver some months
> ago. I was told to try to track down the time when the instability of the
> r200 driver started (some developers suggested the end of
> September, 2004). Unfortunately, I wasn't able to compile older CVS
> snapshots on my debian sid system. It's very strange, I don't think that
> X.org CVS was broken for months... (I tried compiling snapshots from
> 2004.08.31 to 2004.11.30 without any success...)

Did you post the problems you encountered?

> > I'm using ATI-drivers atm and I've not seen a single crash with those. If debug logs are
> > interesting to you, or to any X/DRI developers, I'll go back to DRI and try to
> > generate them if you tell me how. I'm happy to say my PC doesn't seem to have
> > any hardware problem (as it was suggested on dri-devel) since going to ati-drivers
> > solved *all* the stability issues I had with X+DRI (and before switching from
> > the 7500 to the 8800, it was stable; and it was still stable in 3D apps in windows
> > (yuck) after the switch).
> > Maybe if all users seeing crashes generated a debug log and put it online for all DRI
> > developers to see, finding a common denominator would be easier...
>
> I use Descent3 for testing. With recent snapshots it's getting harder to
> crash the game. Some months ago a simple X restart was enough to crash
> Descent3 (and of course the X server) in a few seconds.

Define 'a simple X restart'. Do you mean running Descent 3 right after restarting the X server?

> Nowadays I have to switch to the microwave gun and fire some dozen waves
> to crash X. It's 100% reproducible, so I offered my help to create gdb
> backtraces, but one of the DRI developers (Michael Daenzer, if I remember
> correctly) pointed out that gdb backtraces won't help diagnosing GPU lockups.

Yeah, but maybe he was thinking of DRM debugging output or something. That can be useful, but is still tedious to wade through in the best case.

--
Earthling Michel Dänzer      |     Debian (powerpc), X and DRI developer
Libre software enthusiast    |   http://svcs.affero.net/rm.php?r=daenzer
|
From: Geller S. <wi...@pe...> - 2005-04-18 16:44:30
|
On Mon, 18 Apr 2005, Michel Dänzer wrote:

> Did you post the problems you encountered?

Yes, on Mon, 14 Feb 2005. You were one of the recipients :)) There was a thread 'OpenGL apps causes frequent system locks' on dri-devel.

> Define 'a simple X restart'. Do you mean running Descent 3 right after
> restarting the X server?

Issued startx, exited X. Issued startx again, started D3, played for a while. I wrote my step-by-step test in one of the mails I sent to this thread.

> > Nowadays I have to switch to the microwave gun and fire some dozen waves
> > to crash X. It's 100% reproducible, so I offered my help to create gdb
> > backtraces, but one of the DRI developers (Michael Daenzer, if I remember
> > correctly) pointed out that gdb backtraces won't help diagnosing GPU lockups.
>
> Yeah, but maybe he was thinking of DRM debugging output or something.
> That can be useful, but is still tedious to wade through in the best
> case.

Nice to see that you refer to yourself in the 3rd person :)) If I can help with backtraces/debug info, don't hesitate to tell me what kind of information you are interested in!

Geller Sandor <wi...@pe...>
|
From: khaqq <kh...@fr...> - 2005-04-18 17:00:42
|
On Mon, 18 Apr 2005 18:44:17 +0200 (CEST) Geller Sandor <wi...@pe...> wrote:

> On Mon, 18 Apr 2005, Michel Dänzer wrote:
>
> > Did you post the problems you encountered?
>
> Yes, on Mon, 14 Feb 2005. You were one of the recipients :)) There was a
> thread 'OpenGL apps causes frequent system locks' on dri-devel.
>
> > Define 'a simple X restart'. Do you mean running Descent 3 right after
> > restarting the X server?
>
> Issued startx, exited X. Issued startx again, started D3, played for a
> while. I wrote my step-by-step test in one of the mails I sent to this
> thread.
>
> > > Nowadays I have to switch to the microwave gun and fire some dozen waves
> > > to crash X. It's 100% reproducible, so I offered my help to create gdb
> > > backtraces, but one of the DRI developers (Michael Daenzer, if I remember
> > > correctly) pointed out that gdb backtraces won't help diagnosing GPU lockups.
> >
> > Yeah, but maybe he was thinking of DRM debugging output or something.
> > That can be useful, but is still tedious to wade through in the best
> > case.
>
> Nice to see that you refer to yourself in 3rd person :)) If I can help
> with backtraces/ debug info, don't hesitate, tell me, what kind
> of information are you interested in!

Same here, really.
|
From: Michel <mi...@da...> - 2005-04-18 18:55:50
|
On Mon, 2005-18-04 at 18:44 +0200, Geller Sandor wrote:
> On Mon, 18 Apr 2005, Michel Dänzer wrote:
>
> > Did you post the problems you encountered?
>
> Yes, on Mon, 14 Feb 2005. You were one of the recipients :)) There was a
> thread 'OpenGL apps causes frequent system locks' on dri-devel.

I mean the problems compiling older snapshots.

> > Define 'a simple X restart'. Do you mean running Descent 3 right after
> > restarting the X server?
>
> Issued startx, exited X. Issued startx again, started D3, played for a
> while.

But it doesn't happen if you only start the X server once?

> I wrote my step-by-step test in one of the mails I sent to this
> thread.

I don't see that in your other post to *this* thread. Do you mean the thread you started months ago?

> > > Nowadays I have to switch to the microwave gun and fire some dozen waves
> > > to crash X. It's 100% reproducible, so I offered my help to create gdb
> > > backtraces, but one of the DRI developers (Michael Daenzer, if I remember
> > > correctly) pointed out that gdb backtraces won't help diagnosing GPU lockups.
> >
> > Yeah, but maybe he was thinking of DRM debugging output or something.
> > That can be useful, but is still tedious to wade through in the best
> > case.
>
> Nice to see that you refer to yourself in 3rd person :))

No, I was referring to the part of the post that started this thread that I quoted just above what you quoted here.

> If I can help with backtraces/ debug info, don't hesitate, tell me, what
> kind of information are you interested in!

If you can reproduce the problem with DRM debugging output enabled, send the output, for example. But don't get your hopes up too high; this is highly non-trivial stuff.

--
Earthling Michel Dänzer      |     Debian (powerpc), X and DRI developer
Libre software enthusiast    |   http://svcs.affero.net/rm.php?r=daenzer
|
From: Geller S. <wi...@pe...> - 2005-04-19 06:43:13
|
On Mon, 18 Apr 2005, Michel Dänzer wrote:

> On Mon, 2005-18-04 at 18:44 +0200, Geller Sandor wrote:
> > On Mon, 18 Apr 2005, Michel Dänzer wrote:
> >
> > > Did you post the problems you encountered?
> >
> > Yes, on Mon, 14 Feb 2005. You were one of the recipients :)) There was a
> > thread 'OpenGL apps causes frequent system locks' on dri-devel.
>
> I mean the problems compiling older snapshots.

The tail of the build logs was attached to that mail. The usual 'parse error...' messages, which were caused IMHO by several missing headers.

> > > Define 'a simple X restart'. Do you mean running Descent 3 right after
> > > restarting the X server?
> >
> > Issued startx, exited X. Issued startx again, started D3, played for a
> > while.
>
> But it doesn't happen if you only start the X server once?

Unfortunately, I can't remember whether the X server was stable or whether it took only a few minutes to freeze; that was 2 months ago. I remember that when I used Mesa CVS, X was quite unstable. Without Mesa CVS it was stable until I restarted X. I vaguely recall that the first X session seemed to be stable...

> > I wrote my step-by-step test in one of the mails I sent to this
> > thread.
>
> I don't see that in your other post to *this* thread. Do you mean the
> thread you started months ago?

Yes, I meant the other thread (it wasn't started by me).

> If you can reproduce the problem with DRM debugging output enabled, send
> the output, e.g. But don't put your hopes too high

OK, I will continue testing the CVS version this weekend. I'll report back with the results.

Geller Sandor <wi...@pe...>
|
From: Geller S. <wi...@pe...> - 2005-04-25 14:00:48
Attachments:
kern.log.bz2
|
On Tue, 19 Apr 2005, Geller Sandor wrote:

> > If you can reproduce the problem with DRM debugging output enabled, send
> > the output, e.g. But don't put your hopes too high
>
> OK, I will continue testing the CVS version this weekend. I'll report
> back the results.

I've attached the debug output to this mail. The uncompressed size of the log is nearly 10MB... This crash happened while I was navigating the main menu in Descent 3. The last two messages

  [drm:radeon_cp_cmdbuf] DONE
  [drm:drm_ioctl] pid=4875, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1

were repeated until the shutdown.

Unfortunately, Descent3 still crashes, but it's getting harder to reproduce. I continued testing, and it took 10-15 minutes to crash the game a second time. I haven't captured the log of this crash, because my machine can't handle the amount of data which drm generates... Maybe that was the cause of the first crash.

Geller Sandor <wi...@pe...>
|
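[Editor's sketch] The repeated `cmd=0xc0086451` can be decoded by hand, and doing so ties this log back to the first post in the thread. The sketch below assumes the x86 Linux ioctl layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction) and assumes `DRM_COMMAND_BASE = 0x40` and `DRM_RADEON_GETPARAM = 0x11` from the legacy `radeon_drm.h`. The function names and the regular expression are made up for this example; the log line format is the one quoted above.

```python
import re
from collections import Counter

DRM_COMMAND_BASE = 0x40      # assumed: first driver-specific DRM ioctl number
DRM_RADEON_GETPARAM = 0x11   # assumed: from the legacy radeon_drm.h

_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into its fields (x86 layout assumed)."""
    return {
        "nr": cmd & 0xFF,                 # bits 0-7: command number
        "type": chr((cmd >> 8) & 0xFF),   # bits 8-15: subsystem letter
        "size": (cmd >> 16) & 0x3FFF,     # bits 16-29: argument size in bytes
        "dir": _DIRECTIONS[(cmd >> 30) & 0x3],  # bits 30-31: data direction
    }


# Matches lines like the [drm:drm_ioctl] message quoted in the post.
IOCTL_LINE = re.compile(r"\[drm:drm_ioctl\] pid=(\d+), cmd=(0x[0-9a-fA-F]+)")


def count_ioctls(lines) -> Counter:
    """Count how often each ioctl request number appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = IOCTL_LINE.search(line)
        if match:
            counts[int(match.group(2), 16)] += 1
    return counts


if __name__ == "__main__":
    fields = decode_ioctl(0xC0086451)
    print(fields)
    print(fields["nr"] - DRM_COMMAND_BASE == DRM_RADEON_GETPARAM)
```

Under these assumptions the command is a read/write DRM ioctl with an 8-byte argument and driver index 0x11, which would be the radeon getparam call. That is consistent with the report at the start of the thread of a process spinning on a getparam whose result never changes. Running `count_ioctls` over the tail of a large kern.log is one way to confirm which request dominates without reading 10MB by eye.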