From: Andreas S. <A.S...@gm...> - 2003-03-02 17:55:46
|
Hello!

The radeon.o kernel module from the drm-filp-0-1-branch works well. xmms with different OpenGL-based visual plugins works, even with vtxfmt enabled. I activated and deactivated the plugins often and nothing bad happened: no "-22" and no bad entries in /var/log/messages. I got a segfault from xmms once, but no entry in the log (maybe a vtxfmt-only problem?).

On the recycle problem of the X server: I was able to hang the box completely (no ping/ssh possible) even with the old radeon.o from the dri trunk (Feb. 2003 and Mar. 2003) and the current DRI driver.

Has anyone tried to run XFree86 4.3.0 (with the Mesa 4 based DRI driver) with the radeon.o from the drm-filp-0-1-branch? And does that show the same recycle problem?

How I triggered the recycle problem with the old radeon.o:

 - start a multi-threaded OpenGL app (xmms with vis plugins) to trigger the "-22"
 - close your session -> gdm; in the xmessages box you can see that gdm restarted the X server (maybe there was already a problem recycling the server...)
 - log in to a new session
 - close the session -> screen goes black, no ping, no ssh possible -> reset

Since then I wasn't able to use the graphical login anymore, as every time I log out from a session -> lockup.

I pulled the power cable, waited, plugged the cable back in, started the box up again and tried without DRI: the X server recycles well!

Any hints? What about the updated radeon module in the -ac kernel?

best regards,
Andreas |
From: Linus T. <tor...@tr...> - 2003-03-02 18:37:15
|
On Sun, 2 Mar 2003, Andreas Stenglein wrote:
>
> I pulled the power cable, waited, plugged the cable,
> started the box up again and tried without dri:
> Xserver recycles well!

I have apparently seen something like this even on 2.5.x. What kernels have you tried?

The symptoms I saw were kernel oopses in totally unrelated pieces of code when re-starting the X server. The times I was able to reproduce it I could re-start a non-DRI X server several times (by just specifying "-depth 8"), but then when I restarted a DRI one it would cause "impossible" oopses (where "impossible" means that they were in totally normal code-paths in the kernel that had nothing to do with DRI, and looked like major internal data-corruption).

I have my kernel DRM modules compiled in, and to me it really looked like something had free'd the resources on the first X session shutdown, but then left a pointer around to the free'd resources, so that when the second DRI session was started it used the long-since-free'd resources and obviously started corrupting things.

But that's just a wild guess from the behaviour I saw (which was not entirely reproducible, btw - I recompiled my kernel with slab and spinlock debugging to try to catch it better, and I wasn't able to make it happen again). I don't actually have any such code in DRI that I could really point to.

In short: non-dri X setups seemed to work well, even with a setup that did seem to be able to reproduce the problem reliably. So it really looked like something DRI did.

The first X startup had no problems, which means that this can have been going on for a long time as far as I'm concerned (I usually don't cycle out from X: I don't use XDM, and I reboot the kernel more often than I have reason to exit X ;)

The _second_ DRI-enabled X startup caused problems, even if I had done multiple non-DRI X sessions in between. This is what makes me think that the DRI kernel modules keep some history around that they shouldn't. And maybe the problem is hidden if you actually unload and re-load the modules (is that what most people do?)

Linus |
From: Linus T. <tor...@tr...> - 2003-03-02 18:59:21
|
On Sun, 2 Mar 2003, Linus Torvalds wrote:
>
> The _second_ DRI-enabled X startup caused problems, even if I had done
> multiple non-DRI X sessions in between. This is what makes me think that
> the DRI kernel modules keep some history around that they shouldn't. And
> maybe the problem is hidden if you actually unload and re-load the
> modules (is that what most people do?)

Ok, I went in and looked for suspicious behaviour, and I found some.

Look at AGP and MTRR behaviour: both of them are initialized by drm_init() at module load time.

Both of them are _de-initialized_ by the "DRM(takedown)()" code, and never re-initialized by the "DRM(setup)()" code.

So an example of badness would be:

 - load DRM modules (in my case as part of kernel bootup, since they are compiled in):
    - initialize MTRR and AGP mappings
 - run X with DRI.
    - Everything is happy.
 - exit DRI X
    - we are the "last close" case for DRI, so DRM(release)() calls DRM(takedown)(), which frees AGP and MTRR
 - restart non-DRI X
    - nothing happens
 - kill non-DRI X
    - nothing happens
 - run X with DRI again
    - oops. We now have neither AGP nor MTRR's set up, even though the code looks like it is assuming it.

Yeah, maybe I'm missing where somebody else re-initializes AGP and MTRR, but my point is that these things do not seem to nest correctly. That mtrr_del() in particular seems to be wrong, and I do indeed get a

	mtrr: MTRR 2 not used

when shutting down X normally.

Comments? I haven't really gone through the whole path of what happens at open()/release() time, and these are really nothing more than "that looks suspicious", maybe somebody who knows the code better than I can take a better look at it.

Linus |
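The nesting concern in the message above can be replayed with a small toy model. This is only a sketch of the sequence as described there, not the real DRM code: the `toy_*` names and the `struct toy_dev` fields are hypothetical stand-ins, and whether the real driver re-initializes AGP and MTRR somewhere else is exactly the open question in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-device state; not the real DRM device struct. */
struct toy_dev {
    bool agp_ready;   /* models the AGP setup done once at module load */
    bool mtrr_ready;  /* models the MTRR added once at module load */
    int  open_count;  /* number of currently open handles on the device */
};

/* Models drm_init(): runs once, at module load (or at boot if compiled in). */
void toy_init(struct toy_dev *dev)
{
    dev->agp_ready = true;
    dev->mtrr_ready = true;
    dev->open_count = 0;
}

/* Models DRM(setup)() as described in the message: it runs on first open
 * but does not touch the AGP or MTRR state. */
static void toy_setup(struct toy_dev *dev)
{
    (void)dev;
}

/* Models DRM(takedown)(): frees AGP and MTRR on last close. */
static void toy_takedown(struct toy_dev *dev)
{
    dev->agp_ready = false;
    dev->mtrr_ready = false;
}

/* Models open(): the first opener triggers setup. */
void toy_open(struct toy_dev *dev)
{
    if (dev->open_count++ == 0)
        toy_setup(dev);
}

/* Models release(): the last closer triggers takedown. */
void toy_release(struct toy_dev *dev)
{
    assert(dev->open_count > 0);
    if (--dev->open_count == 0)
        toy_takedown(dev);
}

/* True if a DRI session started now would find AGP and MTRR set up. */
bool toy_session_ok(const struct toy_dev *dev)
{
    return dev->agp_ready && dev->mtrr_ready;
}
```

In this model the first `toy_open()` after `toy_init()` sees both resources, while any open that follows a last-close `toy_release()` does not, because nothing pairs the takedown with a matching setup. Pairing them (acquire in setup, release in takedown) or moving the release to module unload would both restore the nesting.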
From: Jens O. <je...@tu...> - 2003-03-06 04:47:45
|
Linus Torvalds wrote:
> Look at AGP and MTRR behaviour: both of them are initialized by drm_init()
> at module load time.
>
> Both of them are _de-initialized_ by the "DRM(takedown)()" code, and never
> re-initialized by the "DRM(setup)()" code.
>
> [...]
>
> Comments? I haven't really gone through the whole path of what happens at
> open()/release() time, and these are really nothing more than "that looks
> suspicious", maybe somebody who knows the code better than I can take a
> better look at it.

Linus,

Sounds like you may be on to something. Server recycles are common with display managers.

An easy way to test (and debug) this scenario would be to bring up a raw X server (no window manager, display manager or clients), then run glxinfo, xdpyinfo or start and stop any X client (this will cause the server to recycle when they exit, if they are the only client). You should then be in a state where you can try running gears or some other simple 3D application on an X server that has recycled.

--
Jens Owen
je...@tu...
Steamboat Springs, Colorado |
From: Keith W. <ke...@tu...> - 2003-03-08 12:54:00
|
Linus Torvalds wrote:
> Look at AGP and MTRR behaviour: both of them are initialized by drm_init()
> at module load time.
>
> Both of them are _de-initialized_ by the "DRM(takedown)()" code, and never
> re-initialized by the "DRM(setup)()" code.
>
> [...]
>
> Comments? I haven't really gone through the whole path of what happens at
> open()/release() time, and these are really nothing more than "that looks
> suspicious", maybe somebody who knows the code better than I can take a
> better look at it.

Yes, it looks suspicious, but I don't think it's the cause of the lockups on X recycle.

Evidence for this:
 - The lockup is new, while the code has been suspicious forever...
 - I can exit and restart X just fine, it's only recycle that locks. From the kernel point of view, these should be the same.
 - In the Mesa embedded branch, I have a demo that closes & reopens its connections to the kernel without exiting. Again this works fine.

I've also verified that this lockup wasn't introduced in the filp work, i.e. it had already sneaked into the trunk somehow.

More & more I want to clean up the drm_*.h files, starting by removing code that isn't widely used... At this time my eyes turn towards the gamma driver, which is the hook a lot of the bogus code in those files hangs on -- does *anyone* use this in the current tree?

Keith |
From: Andreas S. <A.S...@gm...> - 2003-03-08 14:31:24
|
Hello,

It looks like the behaviour differs depending on whether radeon (and agpgart) are built in or loaded as the modules radeon.o and agpgart.o:

If I start X from the command line, exit the session, and run startx again, X and DRI seem to work fine; at least there is no lockup (radeon and agpgart as modules). As I understand Linus, he has a problem even in this case.

I grepped /var/log/messages for mtrr, but found nothing (kernel 2.4.20 with v4l2).

MY box only locks up when exiting a session if I am using a graphical login.

(Maybe it would be interesting to see what happens when using the radeon.o module with built-in agpgart, or the radeon.o module _without_ agpgart, or radeon compiled in without agpgart.)

On the history of the "bug": I remember that I had to use gdm, because kdm "didn't work". That was about a year ago, when I bought this box with the Radeon 7500. gdm just started the X server again if it exited with an error. So "something" seems to have been there for over a year (maybe since XFree86 4.2.0 or even before?), but it "only" crashed the X server, not the whole box. And since that wasn't really hurting anyone, nobody cared about it...

best regards,
Andreas

On 2003.03.08 13:31:15 +0100, Keith Whitwell wrote:
> Linus Torvalds wrote:
>> [...]
>
> Yes it looks suspicious, but I don't think it's the cause of the
> lockups on X recycle.
>
> Evidence for this:
>  - The lockup is new, while the code has been suspicious forever...
>  - I can exit and restart X just fine, it's only recycle that
>    locks. From the kernel point of view, these should be the same.
>  - In the Mesa embedded branch, I have a demo that closes &
>    reopens its connections to the kernel without exiting. Again this
>    works fine.
>
> I've also verified that this lockup wasn't introduced in the filp
> work, ie. it had already sneaked into the trunk somehow.
>
> More & more I want to clean up the drm_*.h files. Starting by
> removing code that isn't widely used... At this time my eyes turn
> towards the gamma driver, which is the hook a lot of the bogus code
> in those files hangs on -- does *anyone* use this in the current
> tree?
>
> Keith |
From: Keith W. <ke...@tu...> - 2003-03-11 13:41:56
|
Keith Whitwell wrote:
> Linus Torvalds wrote:
>> [...]
>
> Yes it looks suspicious, but I don't think it's the cause of the lockups
> on X recycle.
>
> [...]
>
> I've also verified that this lockup wasn't introduced in the filp work,
> ie. it had already sneaked into the trunk somehow.

OK, I've had some time to track this down. It comes down to the changes introduced to radeon_driver.c around 29 Oct last year.

----------------------------
revision 1.45
date: 2002/10/29 13:49:25;  author: mdaenzer;  state: Exp;  lines: +30 -19
* preserve CRTC{,2}_OFFSET_CNTL in 2D driver to avoid bad effects when
  pageflipping after a mode switch
* take current page into account in AdjustFrame(); writing the CRTC offset
  via the CP was probably a bad idea as this can happen asynchronously,
  reverted
* take frame offset into account when flipping pages
* handle CRTC2 as well for pageflipping (untested)
* preserve GEN_INT_CNTL on mode switches to prevent interrupts from getting
  disabled
----------------------------

Michel, have you got time to look into why this is causing server recycles to hang? I can't reproduce it on all machines, so it's possible that your test box is one of the ones unaffected by this lockup -- if you can't reproduce, let me know...

I'll poke around in the meantime & try & figure it out.

Keith |
From: Michel <mi...@da...> - 2003-03-11 15:25:20
|
On Tue, 2003-03-11 at 14:41, Keith Whitwell wrote:
> Keith Whitwell wrote:
> >
> > Evidence for this:
> >  - The lockup is new, while the code has been suspicious forever...
> >  - I can exit and restart X just fine, it's only recycle that locks.
> >    From the kernel point of view, these should be the same.
> >  - In the Mesa embedded branch, I have a demo that closes & reopens
> >    its connections to the kernel without exiting. Again this works fine.
> >
> > I've also verified that this lockup wasn't introduced in the filp work,
> > ie. it had already sneaked into the trunk somehow.
>
> OK, I've had some time to track this down. It comes down to the changes
> introduced to radeon_driver.c around 29 Oct last year.
>
> ----------------------------
> revision 1.45
> date: 2002/10/29 13:49:25;  author: mdaenzer;  state: Exp;  lines: +30 -19
> * preserve CRTC{,2}_OFFSET_CNTL in 2D driver to avoid bad effects when
>   pageflipping after a mode switch
> * take current page into account in AdjustFrame(); writing the CRTC offset
>   via the CP was probably a bad idea as this can happen asynchronously,
>   reverted
> * take frame offset into account when flipping pages
> * handle CRTC2 as well for pageflipping (untested)
> * preserve GEN_INT_CNTL on mode switches to prevent interrupts from getting
>   disabled
> ----------------------------

Whoops. I plead guilty. :\

> Michel, have you got time to look into why this is causing server recycles to
> hang?

Unfortunately not really right now... I'll try to, anyway.

> I can't reproduce it on all machines, so it's possible that your test
> box is one of the ones unaffected by this lockup -- if you can't reproduce,
> let me know...

I haven't been able to reproduce it yet, but then I haven't tried very hard (gdm defaults to restarting the server these days).

--
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast |
From: Keith W. <ke...@tu...> - 2003-03-12 09:55:28
|
Keith Whitwell wrote:
> OK, I've had some time to track this down. It comes down to the changes
> introduced to radeon_driver.c around 29 Oct last year.
>
> [...]
>
> Michel, have you got time to look into why this is causing server
> recycles to hang? I can't reproduce it on all machines, so it's
> possible that your test box is one of the ones unaffected by this lockup
> -- if you can't reproduce, let me know...
>
> I'll poke around in the meantime & try & figure it out.

In fact the lockup comes down to this one line:

--- radeon_driver.c     28 Oct 2002 02:21:14 -0000      1.44
+++ radeon_driver.c     29 Oct 2002 13:49:25 -0000      1.45
@@ -4639,6 +4639,7 @@
     save->cap0_trig_cntl = 0;
     save->cap1_trig_cntl = 0;
     save->bus_cntl = info->BusCntl;
+    save->gen_int_cntl = info->gen_int_cntl;
     /*
      * If bursts are enabled, turn on discards
      * Radeon doesn't have write bursts

Michel, what are the consequences of removing this?

Keith |
From: Keith W. <ke...@tu...> - 2003-03-12 10:51:24
Attachments:
l4.diff
|
> In fact the lockup comes down to this one line:
>
> --- radeon_driver.c     28 Oct 2002 02:21:14 -0000      1.44
> +++ radeon_driver.c     29 Oct 2002 13:49:25 -0000      1.45
> @@ -4639,6 +4639,7 @@
>      save->cap0_trig_cntl = 0;
>      save->cap1_trig_cntl = 0;
>      save->bus_cntl = info->BusCntl;
> +    save->gen_int_cntl = info->gen_int_cntl;
>      /*
>       * If bursts are enabled, turn on discards
>       * Radeon doesn't have write bursts
>
> Michel, what are the consequences of removing this?

Hmm. Things are slightly complicated by the fact that this code has been reworked since this change was made. To get rid of the lockup on the dri trunk I have to use the attached patch. It's a little heavy handed...

Anyone have any better ideas? Otherwise I'm going to commit this here, as it at least resolves the lockup.

Keith |
From: Michel <mi...@da...> - 2003-03-12 11:07:44
|
On Wed, 2003-03-12 at 11:51, Keith Whitwell wrote:
> > In fact the lockup comes down to this one line:
> >
> > --- radeon_driver.c     28 Oct 2002 02:21:14 -0000      1.44
> > +++ radeon_driver.c     29 Oct 2002 13:49:25 -0000      1.45
> > @@ -4639,6 +4639,7 @@
> >      save->cap0_trig_cntl = 0;
> >      save->cap1_trig_cntl = 0;
> >      save->bus_cntl = info->BusCntl;
> > +    save->gen_int_cntl = info->gen_int_cntl;
> >      /*
> >       * If bursts are enabled, turn on discards
> >       * Radeon doesn't have write bursts
> >
> > Michel, what are the consequences of removing this?
>
> Hmm. Things are slightly complicated by the fact that this code has been
> reworked since this change was made. To get rid of the lockup on the dri
> trunk I have to use the attached patch. It's a little heavy handed...

It basically disables interrupts AFAICS.

--
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast |
From: Keith W. <ke...@tu...> - 2003-03-12 11:29:35
|
Michel Dänzer wrote:
> On Wed, 2003-03-12 at 11:51, Keith Whitwell wrote:
>
>>> In fact the lockup comes down to this one line:
>>>
>>> --- radeon_driver.c     28 Oct 2002 02:21:14 -0000      1.44
>>> +++ radeon_driver.c     29 Oct 2002 13:49:25 -0000      1.45
>>> @@ -4639,6 +4639,7 @@
>>>      save->cap0_trig_cntl = 0;
>>>      save->cap1_trig_cntl = 0;
>>>      save->bus_cntl = info->BusCntl;
>>> +    save->gen_int_cntl = info->gen_int_cntl;
>>>      /*
>>>       * If bursts are enabled, turn on discards
>>>       * Radeon doesn't have write bursts
>>>
>>> Michel, what are the consequences of removing this?
>>
>> Hmm. Things are slightly complicated by the fact that this code has been
>> reworked since this change was made. To get rid of the lockup on the dri
>> trunk I have to use the attached patch. It's a little heavy handed...
>
> It basically disables interrupts AFAICS.

No - they still seem to work, which makes sense as they are turned on in the drm module (which also writes to RADEON_GEN_INT_CONTROL).

Keith |
From: Michel <mi...@da...> - 2003-03-12 13:10:52
Attachments:
radeon_dri.c.diff
|
On Wed, 2003-03-12 at 12:29, Keith Whitwell wrote:
> Michel Dänzer wrote:
> > On Wed, 2003-03-12 at 11:51, Keith Whitwell wrote:
> >
> >> Hmm. Things are slightly complicated by the fact that this code has been
> >> reworked since this change was made. To get rid of the lockup on the dri
> >> trunk I have to use the attached patch. It's a little heavy handed...
> >
> > It basically disables interrupts AFAICS.
>
> No - they still seem to work, which makes sense as they are turned on in the
> drm module (which also writes to RADEON_GEN_INT_CONTROL).

But they stop working when you switch modes, don't they?

Does this patch (against the XFree86 trunk) help instead?

--
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast |
From: Keith W. <ke...@tu...> - 2003-03-12 13:53:07
|
Michel Dänzer wrote:
> On Wed, 2003-03-12 at 12:29, Keith Whitwell wrote:
>
>> Michel Dänzer wrote:
>>
>>> It basically disables interrupts AFAICS.
>>
>> No - they still seem to work, which makes sense as they are turned on in the
>> drm module (which also writes to RADEON_GEN_INT_CONTROL).
>
> But they stop working when you switch modes, don't they?
>
> Does this patch (against the XFree86 trunk) help instead?
>
> ------------------------------------------------------------------------
>
> Index: programs/Xserver/hw/xfree86/drivers/ati/radeon_dri.c
> ===================================================================
> RCS file: /cvs/xc/programs/Xserver/hw/xfree86/drivers/ati/radeon_dri.c,v
> retrieving revision 1.32
> diff -p -u -r1.32 radeon_dri.c
> --- programs/Xserver/hw/xfree86/drivers/ati/radeon_dri.c  2003/02/19 09:17:30  1.32
> +++ programs/Xserver/hw/xfree86/drivers/ati/radeon_dri.c  2003/03/12 13:08:30
> @@ -1585,6 +1585,7 @@ void RADEONDRICloseScreen(ScreenPtr pScr
>      if (info->irq) {
>          drmCtlUninstHandler(info->drmFD);
>          info->irq = 0;
> +        info->ModeReg.gen_int_cntl = 0;
>      }
>
>      /* De-allocate vertex buffers */

Yes, committed.

Keith |
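One plausible reading of why the committed one-line change helps can be sketched as a toy model. The thread does not spell out the mechanism, so everything below is an assumption: that the cached interrupt-control value survives a server recycle, that the next server generation writes it back to the hardware during its first mode set, and that this happens before any IRQ handler is installed again. The `toy_*` names and the enable bit are hypothetical and are not taken from the Radeon driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_INT_ENABLE 0x1u   /* hypothetical enable bit, not a real Radeon bit */

struct toy_card {
    uint32_t hw_gen_int_cntl;  /* stands in for the hardware register */
    uint32_t cached_int_cntl;  /* stands in for info->ModeReg.gen_int_cntl */
    bool     irq_installed;    /* stands in for info->irq */
};

/* DRI start-up: install the handler and turn interrupts on. */
void toy_dri_init(struct toy_card *c)
{
    c->irq_installed = true;
    c->hw_gen_int_cntl = TOY_INT_ENABLE;
    c->cached_int_cntl = TOY_INT_ENABLE;
}

/* Mode set: write the cached value back to the hardware, which is what
 * revision 1.45 introduced to keep interrupts alive across mode changes. */
void toy_mode_switch(struct toy_card *c)
{
    c->hw_gen_int_cntl = c->cached_int_cntl;
}

/* Screen close: uninstall the handler (assumed here to quiesce the
 * hardware). With the patch, the cached enable bits are dropped as well. */
void toy_close_screen(struct toy_card *c, bool with_patch)
{
    if (c->irq_installed) {
        c->irq_installed = false;
        c->hw_gen_int_cntl = 0;
        if (with_patch)
            c->cached_int_cntl = 0;
    }
}

/* True if the hardware may raise interrupts that nobody handles. */
bool toy_unhandled_interrupts(const struct toy_card *c)
{
    return !c->irq_installed && (c->hw_gen_int_cntl & TOY_INT_ENABLE) != 0;
}

/* Replays one recycle: run with DRI, close the screen, then perform the
 * next generation's first mode set before DRI is initialised again. */
bool toy_recycle_is_safe(bool with_patch)
{
    struct toy_card c = {0};

    toy_dri_init(&c);
    assert(!toy_unhandled_interrupts(&c));  /* handler is installed here */
    toy_close_screen(&c, with_patch);
    toy_mode_switch(&c);
    return !toy_unhandled_interrupts(&c);
}
```

Under these assumptions, `toy_recycle_is_safe(false)` reports the stale enable bits being restored with no handler present, while `toy_recycle_is_safe(true)` does not. The sketch also shows why the patch is narrower than dropping the save line outright: mode switches during a running DRI session still restore the cached value.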
From: Michel <mi...@da...> - 2003-03-12 11:03:06
|
On Wed, 2003-03-12 at 10:55, Keith Whitwell wrote:
>
> In fact the lockup comes down to this one line:
>
> --- radeon_driver.c     28 Oct 2002 02:21:14 -0000      1.44
> +++ radeon_driver.c     29 Oct 2002 13:49:25 -0000      1.45
> @@ -4639,6 +4639,7 @@
>      save->cap0_trig_cntl = 0;
>      save->cap1_trig_cntl = 0;
>      save->bus_cntl = info->BusCntl;
> +    save->gen_int_cntl = info->gen_int_cntl;
>      /*
>       * If bursts are enabled, turn on discards
>       * Radeon doesn't have write bursts
>
> Michel, what are the consequences of removing this?

Well, the idea of this line is to preserve the interrupts the chip generates over mode changes. What does this get set to before the lockup? Something other than during the first server generation?

Thanks for tracking this down, Keith. I'd love to dive into this, but I'm supposed to be studying for the exam tomorrow. :( I'll hopefully find the time during the weekend at the latest.

--
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast |
From: Keith W. <ke...@tu...> - 2003-03-12 13:57:00
|
OK, now that the recycle lockup has been found & fixed, I don't see any problems with this patch. Has anyone got any objections to merging it to the trunk?

Eric, do you think this will be impossible to support on bsd? It seems to fix some fundamental brainos in the original drm...

Keith |
From: Charl P. B. <c.p...@it...> - 2003-03-12 16:47:31
|
On Wed, Mar 12, 2003 at 01:57:03PM +0000, Keith Whitwell wrote: > OK, now that the recycle lockup has been found & fixed, I don't see any > problems with this patch. Has anyone got any objections to merging it to > the trunk? FW(L)IW, you have my vote. As mentioned earlier, your filp work fixes all kinds of nasty problems on my setup. -- charl p. botha http://cpbotha.net/ http://visualisation.tudelft.nl/ |
From: Eric A. <et...@lc...> - 2003-03-13 00:55:09
|
On Wed, 2003-03-12 at 05:57, Keith Whitwell wrote: > OK, now that the recycle lockup has been found & fixed, I don't see any > problems with this patch. Has anyone got any objections to merging it to the > trunk? > > Eric, do you think this will be impossible to support on bsd? It seems to fix > some fundamental braino's in the orignal drm... It's the second thing on my list, after some mopup after the XFree86 4.3.0 update. I haven't looked at it seriously enough yet. Please give me a few days to get it at least building again on BSD. -- Eric Anholt et...@lc... http://people.freebsd.org/~anholt/ anholt@FreeBSD.org |
From: Keith W. <ke...@tu...> - 2003-03-13 08:51:22
|
Eric Anholt wrote: > On Wed, 2003-03-12 at 05:57, Keith Whitwell wrote: > >>OK, now that the recycle lockup has been found & fixed, I don't see any >>problems with this patch. Has anyone got any objections to merging it to the >>trunk? >> >>Eric, do you think this will be impossible to support on bsd? It seems to fix >>some fundamental braino's in the orignal drm... > > > It's the second thing on my list, after some mopup after the XFree86 > 4.3.0 update. I haven't looked at it seriously enough yet. Please give > me a few days to get it at least building again on BSD. OK, No worries. Keith |
From: Eric A. <et...@lc...> - 2003-03-14 05:36:41
|
On Thu, 2003-03-13 at 00:51, Keith Whitwell wrote: > Eric Anholt wrote: > > On Wed, 2003-03-12 at 05:57, Keith Whitwell wrote: > > > >>OK, now that the recycle lockup has been found & fixed, I don't see any > >>problems with this patch. Has anyone got any objections to merging it to the > >>trunk? > >> > >>Eric, do you think this will be impossible to support on bsd? It seems to fix > >>some fundamental braino's in the orignal drm... > > > > > > It's the second thing on my list, after some mopup after the XFree86 > > 4.3.0 update. I haven't looked at it seriously enough yet. Please give > > me a few days to get it at least building again on BSD. > > OK, No worries. Okay, it's possible on FreeBSD, and will also clean up some other ugliness in the BSD DRM. However, it's going to require some significant work. Haven't looked at NetBSD yet. -- Eric Anholt et...@lc... http://people.freebsd.org/~anholt/ anholt@FreeBSD.org |
From: Felix <fx...@gm...> - 2003-03-02 19:10:08
|
On Sun, 2 Mar 2003 10:34:35 -0800 (PST) Linus Torvalds <tor...@tr...> wrote:
>
> On Sun, 2 Mar 2003, Andreas Stenglein wrote:
> >
> > I pulled the powercable, waited, plugged the cable,
> > startet the box up again and tried without dri:
> > Xserver recycles well!
>
> I have apparently seen something like this even on 2.5.x. What kernels
> have you tried?
>
> The symptoms I saw were kernel oopses in totally unrelated pieces of code
> when re-starting the X server. The times I was able to reproduce it I
> could re-start a non-DRI X server several times (by just specifying
> "-depth 8"), but then when I restarted a DRI one it would cause
> "impossible" oopses (where "impossible" means that they were in totally
> normal code-paths in the kernel that had nothing to do with DRI, and
> looked like major internal data-corruption).
>
> I have my kernel DRM modules compiled in, and to me it really looked like
> something had free'd the resources on the first X session shutdown, but
> then left a pointer around to the free'd resources, so that when the
> second DRI session was started it used the long-since-free'd resources and
> obviously started corrupting things.
>
> But that's just a wild guess from the behaviour I saw (which was not
> entirely reproducible, btw - I recompiled my kernel with slab and spinlock
> debugging to try to catch it better, and I wasn't able to make it happen
> again). I don't actually have ay such code in DRI that I could really
> point to.
>
> In short: non-dri X setups seemed to work well, even with a setup that did
> seem to be able to reproduce the problem reliably. So it really looked
> like something DRI did.
>
> The first X startup had no problems, which means that this can have been
> going on for a long time as far as I'm concerned (I usually don't cycle
> out from X: I don't use XDM, and I reboot the kernel more often than I
> have reason to exit X ;)
>
> The _second_ DRI-enabled X startup caused problems, even if I had done
> multiple non-DRI X sessions in between. This is what makes me think that
> the DRI kernel modules keep some history around that they shouldn't. And
> maybe the problem is hidden if you actually unload and re-load the
> modules (is that what most people do?)

I have seen recycle problems as well, but only when using gdm. It never crashed when restarting X via startx several times. Anyway, that experience is 2 months old as I had some important stuff running lately so I didn't want to risk crashing my box.

I was using 2.4.19 from kernel.org with preempt patch and drm kernel modules from DRI CVS head at the time the crashes happened. The first time I saw this problem was shortly after the big merge from XFree86 CVS (4.2.99.2). It was never completely reproducible. Sometimes logging out to gdm crashed the box, sometimes it worked fine. It happened several times in December, that I thought, the problem had gone away, and just then - boom! - it crashed again :-/

>
> Linus

--
Felix Kühling
fx...@gm...
You can do anything, just not everything at the same time. |
From: Andreas S. <A.S...@gm...> - 2003-03-02 20:08:58
|
On 2003.03.02 19:34:35 +0100, Linus Torvalds wrote:
>
> On Sun, 2 Mar 2003, Andreas Stenglein wrote:
> >
> > I pulled the powercable, waited, plugged the cable,
> > startet the box up again and tried without dri:
> > Xserver recycles well!
>
> I have apparently seen something like this even on 2.5.x. What kernels
> have you tried?

I used a 2.4.20 kernel with the v4l2 patches from kraxel (bytesex.org).

A workaround for the recycle problem "works": boot only to runlevel 3 (without graphical login) and start the X server manually with startx. When logging off from the X session you are back on the console, with no lockup. Run startx again, and glxgears is hardware-accelerated again.

best regards,
Andreas |