From: Nicolai H. <pre...@gm...> - 2005-02-18 11:53:52
|
Hi everybody,

As reported earlier, I had a perfectly repeatable lockup in VB mode that always happened after the exact same number of frames in glxgears. I can't explain everything about the lockup, mostly because I still don't know what the two registers in the begin3d/end3d sequence actually mean, but here's what I know:

It turns out that after the first 4 DMA buffers had been used to completion, r300FlushCmdBuf() was called from r300RefillCurrentDmaRegion(). This only caused simple state setting commands as well as an upload of the current vertex program into the VAP. There was no rendering going on, and neither the begin3d nor the end3d sequence was part of the commands that were sent to the card.
However, for some reason, it was this sequence that caused the lockup.

This leads me to believe that there's somehow more "magic" to the begin3d/end3d sequence than just cache control as I originally assumed (or maybe it *is* cache control, but there's something weird going on in connection with it, I simply don't know).

In any case, what I did is *always* emit the begin3d sequence at the top of r300_do_cp_cmdbuf and end3d at the bottom of r300_do_cp_cmdbuf (it is also emitted in the case of an error). This works for me: I can run glxgears for several minutes, even doing some stuff that sometimes tends to produce lockups, without any problems.

Please, everybody, get the latest CVS (anonymous will take some time to catch up...) and test vertex buffer mode with it (go to r300_run_render() in r300_render.c and change the #if so that r300_vb_run_render() is called). I want to be really sure that this fixes it for other people as well (after all, there may be other causes for lockups that haven't occurred on my machine yet), and that there are no regressions for those who already had working VB mode.

Once we can be fairly certain that VB mode is stable (i.e. crash and lockup-free), let's talk about removing any mention of the begin3d and end3d sequence from the userspace driver. This is really far too subtle an issue to allow userspace to mess with it. This goes for the X server as well - if anybody feels like implementing Render acceleration, which I doubt at this stage, please leave the begin3d/end3d handling to the kernel module, as it's the only instance that really knows what's going on.

cu,
Nicolai
|
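The approach in the message above, bracketing every command buffer with begin3d and end3d even when processing fails, can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct ring`, `ring_emit()`, `do_cmdbuf()` and the `EV_*` constants are hypothetical stand-ins, not the real r300_do_cp_cmdbuf code or the actual register sequences.

```c
#include <stddef.h>

/* Hypothetical event tags; the real driver writes register sequences. */
enum { EV_BEGIN3D = 1, EV_END3D = 2, EV_CMD = 3 };

#define RING_SIZE 64

struct ring {
    int events[RING_SIZE];
    size_t count;
};

/* Append one event to the (hypothetical) ring, ignoring overflow. */
static void ring_emit(struct ring *r, int ev)
{
    if (r->count < RING_SIZE)
        r->events[r->count++] = ev;
}

/*
 * Process a command buffer. begin3d is emitted first and end3d last,
 * including on the error path. A negative command value stands in
 * for a malformed command. Returns 0 on success, -1 on error.
 */
int do_cmdbuf(struct ring *r, const int *cmds, size_t n)
{
    int ret = 0;

    ring_emit(r, EV_BEGIN3D);
    for (size_t i = 0; i < n; i++) {
        if (cmds[i] < 0) {
            ret = -1;
            goto cleanup;
        }
        ring_emit(r, EV_CMD);
    }
cleanup:
    ring_emit(r, EV_END3D);   /* emitted on success and on error */
    return ret;
}
```

Routing both exits through a single cleanup label is one common way to make sure the closing sequence cannot be skipped; whether the kernel module is structured this way is not stated in the thread.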
From: Ben S. <dar...@ii...> - 2005-02-18 14:50:59
|
Nicolai Haehnle wrote:

> Hi everybody,
>
> As reported earlier, I had a perfectly repeatable lockup in VB mode that always happened after the exact same number of frames in glxgears. I can't explain everything about the lockup, mostly because I still don't know what the two registers in the begin3d/end3d sequence actually mean, but here's what I know:

--snip--

Thank you for the explanation!

> Please, everybody, get the latest CVS (anonymous will take some time to catch up...) and test vertex buffer mode with it (go to r300_run_render() in r300_render.c and change the #if so that r300_vb_run_render() is called). I want to be really sure that this fixes it for other people as well (after all, there may be other causes for lockups that haven't occurred on my machine yet), and that there are no regressions for those who already had working VB mode.

I wasn't able to produce lockups before this patch (with an RV350 AP), and I didn't notice any regressions with your patch. The most stressful app that I can run is ut2004-demo, and I didn't see any lockups after running around DM-Rankin for 20-30 minutes. I also ran glxgears, tuxracer and neverball briefly without problems.

> Once we can be fairly certain that VB mode is stable (i.e. crash and lockup-free), let's talk about removing any mention of the begin3d and end3d sequence from the userspace driver. This is really far too subtle an issue to allow userspace to mess with it. This counts for the X server as well - if anybody feels like implementing Render acceleration, which I doubt at this stage, please leave the begin3d/end3d handling to the kernel module, as it's the only instance that really knows what's going on.

I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine. Hopefully I'll have time to take a look at this on Sunday.

Cheers,
Ben Skeggs.
|
From: Ben S. <dar...@ii...> - 2005-02-18 14:56:02
|
Hello again.

> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.

I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, after which X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.

Ben Skeggs.
|
From: Keith W. <kei...@ya...> - 2005-02-18 15:04:00
|
Ben Skeggs wrote:

> Hello again.
>
>> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.
>
> I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, then X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.

I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in the zero cliprect case to get away from this sort of behaviour.

Keith
|
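Keith's suggestion of yielding the CPU when the window has no cliprects might look roughly like the sketch below. `sched_yield()` is the standard POSIX call from `<sched.h>`; `struct drawable` and `flush_or_yield()` are hypothetical names made up for illustration and do not come from any of the drivers mentioned.

```c
#include <sched.h>

/* Hypothetical view of the drawable: only the cliprect count matters here. */
struct drawable {
    int num_cliprects;
};

/*
 * If nothing is visible, give other processes (for example the X server)
 * a chance to run instead of spinning. Returns 1 if rendering was
 * submitted, 0 if the frame was skipped and the CPU was yielded.
 */
int flush_or_yield(const struct drawable *d, int *frames_submitted)
{
    if (d->num_cliprects == 0) {
        sched_yield();          /* fully obscured: step aside */
        return 0;
    }
    (*frames_submitted)++;      /* stand-in for the real submit */
    return 1;
}
```

Note that yielding on its own only keeps a hidden client from monopolising the CPU; it does nothing about resources still queued in a dropped command buffer.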
From: Nicolai H. <pre...@gm...> - 2005-02-18 16:50:06
|
On Friday 18 February 2005 16:03, Keith Whitwell wrote:
> Ben Skeggs wrote:
>>> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.
>>
>> I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, then X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.
>
> I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in the zero cliprect case to get away from this sort of behaviour.

Well, I can reproduce this bug and I tracked it down. There are a number of problems here, and they all have to do with DMA buffer accounting.

The first (trivial) problem is that nr_released_bufs was never reset to 0. I've already fixed that in CVS.

The real problem is that the following situation can occur when we have zero cliprects:
1. The command buffer contains a DISCARD command for a DMA buffer.
2. We simply drop that command buffer because there are no cliprects, i.e. nothing can be drawn.
3. As a consequence, DMA buffers aren't freed.
4. The rendering loop continues even though DMA buffers have been leaked, which eventually causes all DMA buffers to be exhausted, and this causes an infinite loop in r300RefillCurrentDmaRegion.

The root cause is that we drop the command buffers with the DISCARD. I can see two possible solutions to this problem:
1. Wait until we have a cliprect again before submitting command buffers.
2. Submit command buffers even when we have no cliprects. The kernel module would basically ignore everything but the DISCARD commands.
3. Something else?

I don't like option (1) because it somehow assumes that the user program only cares about OpenGL (and that's quite selfish). There are many use cases where it is plainly the incorrect thing to do:
- A user running something like Quake in listenserver mode; if they switch away from Quake for some reason (incoming messages, whatever), the server will stop and eventually all clients will time out.
- Imagine a chat application that uses some fancy 3D graphics for whatever reason (glitz, for example). Now this application may just be in the middle of drawing something when the user moves some other application above it. The end result will be that the application essentially becomes locked up until it becomes visible again; in the meantime, the chat might time out and disconnect the user.
So (1) clearly isn't a good solution.

Option (2) is more correct, but it does seem a little bit hackish.

Any better ideas? Perhaps tracking which buffers were discarded? That's not exactly beautiful either.

cu,
Nicolai

>
> Keith
|
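Option (2) from the message above can be illustrated with a small C model: the command buffer is always walked, drawing is skipped when there are no cliprects, and DISCARD commands are still honoured so DMA buffers return to the pool. All names here (`struct cmd`, `struct dma_pool`, `process_cmdbuf()`, `NUM_DMA_BUFS`) are hypothetical and chosen for illustration; the real command stream format and kernel code are different.

```c
#include <stddef.h>

/* Hypothetical command types; the real command stream format differs. */
enum cmd_type { CMD_DRAW, CMD_DISCARD };

struct cmd {
    enum cmd_type type;
    int buf_idx;                /* DMA buffer index, used by CMD_DISCARD */
};

#define NUM_DMA_BUFS 4

struct dma_pool {
    int in_use[NUM_DMA_BUFS];   /* 1 = held by the client, 0 = free */
};

/* Count the buffers that are currently free. */
int dma_free_count(const struct dma_pool *pool)
{
    int n = 0;
    for (int i = 0; i < NUM_DMA_BUFS; i++)
        if (!pool->in_use[i])
            n++;
    return n;
}

/*
 * Walk a command buffer. With zero cliprects, drawing is skipped,
 * but DISCARD commands still free their buffers, so nothing leaks.
 * Returns the number of draw commands actually executed.
 */
int process_cmdbuf(struct dma_pool *pool, const struct cmd *cmds,
                   size_t n, int num_cliprects)
{
    int draws = 0;

    for (size_t i = 0; i < n; i++) {
        switch (cmds[i].type) {
        case CMD_DRAW:
            if (num_cliprects > 0)
                draws++;
            break;
        case CMD_DISCARD:
            if (cmds[i].buf_idx >= 0 && cmds[i].buf_idx < NUM_DMA_BUFS)
                pool->in_use[cmds[i].buf_idx] = 0;
            break;
        }
    }
    return draws;
}
```

In this model, dropping the whole buffer when `num_cliprects` is 0 would leave `in_use` untouched, which is the leak described in steps 1 to 4.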
From: Keith W. <kei...@ya...> - 2005-02-18 17:13:53
|
Nicolai Haehnle wrote:

> On Friday 18 February 2005 16:03, Keith Whitwell wrote:
>> Ben Skeggs wrote:
>>>> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.
>>>
>>> I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, then X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.
>>
>> I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in the zero cliprect case to get away from this sort of behaviour.
>
> Well, I can reproduce this bug and I tracked it down. There are a number of problems here, and they all have to do with DMA buffer accounting.
> The first (trivial) problem is that nr_released_bufs was never reset to 0. I've already fixed that in CVS.
> The real problem is that the following situation can occur when we have zero cliprects:
> 1. The command buffer contains a DISCARD command for a DMA buffer.
> 2. We simply drop that command buffer because there are no cliprects, i.e. nothing can be drawn.
> 3. As a consequence, DMA buffers aren't freed.
> 4. The rendering loop continues even though DMA buffers have been leaked, which eventually causes all DMA buffers to be exhausted, and this causes an infinite loop in r300RefillCurrentDmaRegion.
>
> The root cause is that we drop the command buffers with the DISCARD. I can see two possible solutions to this problem:
> 1. Wait until we have a cliprect again before submitting command buffers.
> 2. Submit command buffers even when we have no cliprects. The kernel module would basically ignore everything but the DISCARD commands.
> 3. Something else?
>
> I don't like option (1) because it somehow assumes that the user program only cares about OpenGL (and that's quite selfish). There are many use cases where it is plainly the incorrect thing to do:
> - A user running something like Quake in listenserver mode; if they switch away from Quake for some reason (incoming messages, whatever), the server will stop and eventually all clients will time out.
> - Imagine a chat application that uses some fancy 3D graphics for whatever reason (glitz, for example). Now this application may just be in the middle of drawing something when the user moves some other application above it. The end result will be that the application essentially becomes locked up until it becomes visible again; in the meantime, the chat might time out and disconnect the user.
> So (1) clearly isn't a good solution.
>
> Option (2) is more correct, but it does seem a little bit hackish.

Note that well-written applications will notice when there are no cliprects and stop rendering. Probably gears will do this too, but that can't guarantee that you'll never see rendering in the zero-cliprect case.

Option 2 plus a yield/usleep/whatever is pretty much standard.

Keith
|
From: Mike M. <che...@ya...> - 2005-02-26 16:08:54
|
--- Nicolai Haehnle <pre...@gm...> wrote:

> On Friday 18 February 2005 16:03, Keith Whitwell wrote:
>> Ben Skeggs wrote:
>>>> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.
>>>
>>> I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, then X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.
>>
>> I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in the zero cliprect case to get away from this sort of behaviour.
>
> Well, I can reproduce this bug and I tracked it down. There are a number of problems here, and they all have to do with DMA buffer accounting.
> The first (trivial) problem is that nr_released_bufs was never reset to 0. I've already fixed that in CVS.
> The real problem is that the following situation can occur when we have zero cliprects:
> 1. The command buffer contains a DISCARD command for a DMA buffer.
> 2. We simply drop that command buffer because there are no cliprects, i.e. nothing can be drawn.
> 3. As a consequence, DMA buffers aren't freed.
> 4. The rendering loop continues even though DMA buffers have been leaked, which eventually causes all DMA buffers to be exhausted, and this causes an infinite loop in r300RefillCurrentDmaRegion.
>
> The root cause is that we drop the command buffers with the DISCARD. I can see two possible solutions to this problem:
> 1. Wait until we have a cliprect again before submitting command buffers.
> 2. Submit command buffers even when we have no cliprects. The kernel module would basically ignore everything but the DISCARD commands.
> 3. Something else?
>
> I don't like option (1) because it somehow assumes that the user program only cares about OpenGL (and that's quite selfish). There are many use cases where it is plainly the incorrect thing to do:
> - A user running something like Quake in listenserver mode; if they switch away from Quake for some reason (incoming messages, whatever), the server will stop and eventually all clients will time out.

There are actually more common reasons why video games NEED a renderer that does not block, or else they should do all their rendering in another thread ?Doom3?.

A video game typically looks like this:

Get user input/Network Input
Process
Draw/Play/Network Response
Loop

If the Draw part above does not return ASAP, then user input and network pings will suffer greatly! What I mean is that the player knows about a target (from a previous frame or sound) and is stuck waiting 1/nth of a second for the n FPS OpenGL driver to return.

This is not the first time I have brought this up, and I'm sad to see that the point still has not been visible; it must be getting clipped by gethostbyname.

> - Imagine a chat application that uses some fancy 3D graphics for whatever reason (glitz, for example). Now this application may just be in the middle of drawing something when the user moves some other application above it. The end result will be that the application essentially becomes locked up until it becomes visible again; in the meantime, the chat might time out and disconnect the user.
> So (1) clearly isn't a good solution.
>
> Option (2) is more correct, but it does seem a little bit hackish.
>
> Any better ideas? Perhaps tracking which buffers were discarded? That's not exactly beautiful either.
>
> cu,
> Nicolai
>
>>
>> Keith
|
From: Adam K K. <ad...@vo...> - 2005-02-18 14:57:23
|
Nicolai Haehnle wrote:

> Hi everybody,
>
> As reported earlier, I had a perfectly repeatable lockup in VB mode that always happened after the exact same number of frames in glxgears. I can't explain everything about the lockup, mostly because I still don't know what the two registers in the begin3d/end3d sequence actually mean, but here's what I know:
>
> It turns out that after the first 4 DMA buffers had been used to completion, r300FlushCmdBuf() was called from r300RefillCurrentDmaRegion(). This only caused simple state setting commands as well as an upload of the current vertex program into the VAP. There was no rendering going on, and neither the begin3d nor the end3d sequence was part of the commands that were sent to the card.
> However for some reason, it was this sequence that caused the lockup.
>
> This leads me to believe that there's somehow more "magic" to the begin3d/end3d sequence than just cache control as I originally assumed (or maybe it *is* cache control, but there's something weird going on in connection with it, I simply don't know).
>
> In any case, what I did is *always* emit the begin3d sequence at the top of r300_do_cp_cmdbuf and end3d at the bottom of r300_do_cp_cmdbuf (it is also emitted in the case of an error). This works for me, I can run glxgears for several minutes, even doing some stuff that sometimes tends to produce lockups without any problems.
>
> Please, everybody, get the latest CVS (anonymous will take some time to catch up...) and test vertex buffer mode with it (go to r300_run_render() in r300_render.c and change the #if so that r300_vb_run_render() is called). I want to be really sure that this fixes it for other people as well (after all, there may be other causes for lockups that haven't occurred on my machine yet), and that there are no regressions for those who already had working VB mode.
>
> Once we can be fairly certain that VB mode is stable (i.e. crash and lockup-free), let's talk about removing any mention of the begin3d and end3d sequence from the userspace driver. This is really far too subtle an issue to allow userspace to mess with it. This counts for the X server as well - if anybody feels like implementing Render acceleration, which I doubt at this stage, please leave the begin3d/end3d handling to the kernel module, as it's the only instance that really knows what's going on.
>
> cu,
> Nicolai

I had fairly reproducible lockups prior to this fix. I'm at work at the moment, but I'll be able to test it this weekend.

Adam
|
From: Ben S. <dar...@ii...> - 2005-02-18 17:14:55
|
Nicolai Haehnle wrote:

> On Friday 18 February 2005 16:03, Keith Whitwell wrote:
>> Ben Skeggs wrote:
>>>> I still have a 100% reproducible bug which I need to find the cause of, but time is once again a problem for me. If I move a window over the top of a glxgears window my machine locks up immediately, but sysrq still works fine.
>>>
>>> I just discovered (and should've checked before) that I can ssh in and successfully kill glxgears, then X returns to normal. I can have a partially covered glxgears window and everything is fine, but as soon as the entire window (not incl. window decorations) is covered, it seems that the 2d driver is unable to update the screen.
>>
>> I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in the zero cliprect case to get away from this sort of behaviour.
>
> Well, I can reproduce this bug and I tracked it down. There are a number of problems here, and they all have to do with DMA buffer accounting.
> The first (trivial) problem is that nr_released_bufs was never reset to 0. I've already fixed that in CVS.
> The real problem is that the following situation can occur when we have zero cliprects:
> 1. The command buffer contains a DISCARD command for a DMA buffer.
> 2. We simply drop that command buffer because there are no cliprects, i.e. nothing can be drawn.
> 3. As a consequence, DMA buffers aren't freed.
> 4. The rendering loop continues even though DMA buffers have been leaked, which eventually causes all DMA buffers to be exhausted, and this causes an infinite loop in r300RefillCurrentDmaRegion.
>
> The root cause is that we drop the command buffers with the DISCARD. I can see two possible solutions to this problem:
> 1. Wait until we have a cliprect again before submitting command buffers.
> 2. Submit command buffers even when we have no cliprects. The kernel module would basically ignore everything but the DISCARD commands.
> 3. Something else?

I'm still rather new at this, so forgive me if this is a bad suggestion. How about going with option 2, but only submitting the command buffer when there are no cliprects if nr_released_bufs != 0?

Would this cause any unwanted side effects? It seems better than just always submitting buffers with no cliprects anyhow.

Ben Skeggs.
|
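Ben's variant of option 2 amounts to a small decision at flush time, which could be sketched as below. Only the name `nr_released_bufs` comes from the thread; `struct flush_state`, its `submits` field, and `flush_cmdbuf()` are hypothetical, and the reset of the counter after each submit reflects the trivial fix Nicolai mentions rather than actual driver code.

```c
/* Hypothetical per-context flush bookkeeping. */
struct flush_state {
    int nr_released_bufs;   /* DISCARDs queued since the last submit */
    int submits;            /* command buffers actually sent to the kernel */
};

/*
 * Decide whether to submit the current command buffer.
 * With zero cliprects the buffer is dropped unless it carries
 * DISCARD commands that must reach the kernel.
 * Returns 1 if the buffer was submitted, 0 if it was dropped.
 */
int flush_cmdbuf(struct flush_state *s, int num_cliprects)
{
    if (num_cliprects == 0 && s->nr_released_bufs == 0)
        return 0;               /* nothing visible, nothing to free */

    s->submits++;               /* stand-in for the real submission */
    s->nr_released_bufs = 0;    /* reset the counter after every submit */
    return 1;
}
```

The drop branch is exactly the case Keith's follow-up warns about: state queued in a dropped buffer never reaches the hardware.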
From: Keith W. <kei...@ya...> - 2005-02-18 17:17:54
|
Ben Skeggs wrote:

> I'm still rather new at this, so forgive me if this is a bad suggestion. How about going with option 2, but only submitting the command buffer anyway if nr_released_bufs != 0.
>
> Would this cause any unwanted side effects? It seems better than just always submitting buffers with no cliprects anyhow.

Oh, btw - note that if you start throwing buffers out, you have to account for the fact that the hardware hasn't been programmed with the state that you thought it had - probably by setting a dirty flag or lost context flag.

Keith
|
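The bookkeeping Keith describes could be modelled with a single flag, as in the sketch below. `struct hw_ctx`, `note_dropped_cmdbuf()` and `validate_state()` are hypothetical names used for illustration; the thread does not say how (or whether) the r300 driver implements this.

```c
/* Hypothetical driver context tracking whether hardware state is stale. */
struct hw_ctx {
    int lost_context;    /* set when a command buffer was thrown away */
    int state_uploads;   /* how many times full state was re-emitted */
};

/* Call whenever a command buffer is dropped instead of submitted. */
void note_dropped_cmdbuf(struct hw_ctx *c)
{
    c->lost_context = 1;
}

/*
 * Call before building the next command buffer. If an earlier buffer
 * was dropped, the hardware never saw its state changes, so all state
 * is emitted again. Returns 1 if a full re-emit happened, 0 otherwise.
 */
int validate_state(struct hw_ctx *c)
{
    if (!c->lost_context)
        return 0;

    c->state_uploads++;   /* stand-in for re-emitting all state atoms */
    c->lost_context = 0;
    return 1;
}
```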
From: Nicolai H. <pre...@gm...> - 2005-02-18 19:01:43
|
On Friday 18 February 2005 18:17, Keith Whitwell wrote:
> Ben Skeggs wrote:
>> I'm still rather new at this, so forgive me if this is a bad suggestion. How about going with option 2, but only submitting the command buffer anyway if nr_released_bufs != 0.
>>
>> Would this cause any unwanted side effects? It seems better than just always submitting buffers with no cliprects anyhow.
>
> Oh, btw - note that if you start throwing buffers out, you have to account for the fact that the hardware hasn't been programmed with the state that you thought it had - probably by setting a dirty flag or lost context flag.

The command buffer is always sent to the kernel now (and clipping is used to prevent any real rendering from happening), so this particular bug should be gone.

There's still at least one hardware lockup bug that can be triggered with glxgears; unfortunately, this one doesn't seem to be so easily reproducible.

cu,
Nicolai

> Keith
|
From: Peter Z. <pz...@po...> - 2005-02-18 22:18:41
Attachments:
p1.diff
|
Hi,

I tested the latest CVS of the r300 driver. The card is a Radeon 9700Pro (NE). I tested both immediate and VB mode, and both work OK with no lockups (tested with glxgears, tuxracer and NeHe lesson06). Even with x11perf -shmput 500 I cannot lock up the machine.

Some screenshots from Enemy Territory (VB mode):
http://www.gaya.sk/~pzubaj/scr/et1.jpg
http://www.gaya.sk/~pzubaj/scr/et2.jpg
http://www.gaya.sk/~pzubaj/scr/et3.jpg
http://www.gaya.sk/~pzubaj/scr/et4.jpg
http://www.gaya.sk/~pzubaj/scr/et5.jpg
http://www.gaya.sk/~pzubaj/scr/et6.jpg

I tried the attached patch. With this patch everything still works OK, with no lockups.

Then I removed all occurrences of the following from the r300 driver:

reg_start(0x4f18,0);
e32(0x00000003);

Everything still works OK, with no lockups.

Then I removed every

reg_start(R300_RB3D_DSTCACHE_CTLSTAT,0);
e32(0x0000000a);

and the computer locked up immediately.

To me it looks like this (my opinion):
R300_RB3D_DSTCACHE_CTLSTAT needs to be inserted into the command stream regularly.
Register 0x4f18 is only needed at the beginning and end of 3D drawing.

Peter Zubaj
|
From: Nicolai H. <pre...@gm...> - 2005-02-18 23:04:51
|
On Friday 18 February 2005 20:04, Nicolai Haehnle wrote:
> There's still at least one hardware lockup bug that can be triggered with glxgears; unfortunately, this one doesn't seem to be so easily reproducible.

This bug can be triggered on my machine by a single instance of glxgears. It seems to be unrelated to 2D activity.

The lockup is a lot more likely to occur for me at high framerates. This is unfortunate, because it means that I need to turn down debug message volume, otherwise the lockup is actually very unlikely to appear at all.

However, the lockup seems to be unrelated to the use of "sync": It has happened both with RADEON_DEBUG= empty and with RADEON_DEBUG=sync.

When I don't issue any of the magic "begin3d" sequences from the userspace driver, the lockup always happens just after one DMA block has been discarded, but I can't find a pattern as to when it happens exactly.

Emitting some of those magic sequences changes the time when the lockup happens a bit, but I have no idea what's really going on.

cu,
Nicolai
|
From: Vladimir D. <vo...@mi...> - 2005-02-18 22:34:18
|
On Fri, 18 Feb 2005, Nicolai Haehnle wrote:

> Hi everybody,
>
> As reported earlier, I had a perfectly repeatable lockup in VB mode that always happened after the exact same number of frames in glxgears. I can't explain everything about the lockup, mostly because I still don't know what the two registers in the begin3d/end3d sequence actually mean, but here's what I know:
>
> It turns out that after the first 4 DMA buffers had been used to completion, r300FlushCmdBuf() was called from r300RefillCurrentDmaRegion(). This only caused simple state setting commands as well as an upload of the current vertex program into the VAP. There was no rendering going on, and neither the begin3d nor the end3d sequence was part of the commands that were sent to the card.
> However for some reason, it was this sequence that caused the lockup.

Great!

Unfortunately, VB mode still locks up for me. (I did update to the very latest CVS and made sure there were no other changes.)

There could be two "obvious" causes:

1. We do not emit the sync_VAP() sequence after programming VAP registers.

2. I am using MergedFB mode, and in that case cursor movement triggers an INREG() from RADEON_CRTC2_GEN_CNTL and RADEON_CRTC_GEN_CNTL. I would not expect INREG from these registers to cause immediate lockups, but if I comment it out the lockups do not occur as often (though when they do, they are still hard lockups).

Any ideas?

best

Vladimir Dergachev
|
From: Adam K K. <ad...@vo...> - 2005-02-19 00:05:42
|
Nicolai Haehnle wrote:

> Please, everybody, get the latest CVS (anonymous will take some time to catch up...) and test vertex buffer mode with it (go to r300_run_render() in r300_render.c and change the #if so that r300_vb_run_render() is called). I want to be really sure that this fixes it for other people as well (after all, there may be other causes for lockups that haven't occurred on my machine yet), and that there are no regressions for those who already had working VB mode.

Correct me if I'm wrong, but to get the driver to automatically use vb mode, all you have to do is to change:

#if 1
    return r300_run_immediate_render(ctx, stage);
#else
    return r300_run_vb_render(ctx, stage);
#endif

to

#if 1
    return r300_run_vb_render(ctx, stage);
#else
    return r300_run_vb_render(ctx, stage);
#endif

Correct?

If that's the case, I'm experiencing lockups with neverputt in both immediate and vb modes, though the symptoms are slightly different. In both cases, I have to ssh in and reboot. Simply killing neverputt doesn't bring back the machine. With immediate mode, the lockup seems to happen quicker: I can't get past the first hole. The mouse still responds, so I can move it around, though of course it does no good. In vb mode, the mouse locks up, too.

Any ideas?

Adam
|
From: Nicolai H. <pre...@gm...> - 2005-02-19 00:30:20
|
On Saturday 19 February 2005 01:05, Adam K Kirchhoff wrote:
> Nicolai Haehnle wrote:
>> Please, everybody, get the latest CVS (anonymous will take some time to catch up...) and test vertex buffer mode with it (go to r300_run_render() in r300_render.c and change the #if so that r300_vb_run_render() is called). I want to be really sure that this fixes it for other people as well (after all, there may be other causes for lockups that haven't occurred on my machine yet), and that there are no regressions for those who already had working VB mode.
>
> Correct me if I'm wrong, but to get the driver to automatically use vb mode, all you have to do is to change:
>
> #if 1
>     return r300_run_immediate_render(ctx, stage);
> #else
>     return r300_run_vb_render(ctx, stage);
> #endif
>
> to
>
> #if 1
>     return r300_run_vb_render(ctx, stage);
> #else
>     return r300_run_vb_render(ctx, stage);
> #endif
>
> Correct?

That's correct, although it would be easier to just change the 1 into a 0 ;)

> If that's the case, I'm experiencing lockups with neverputt in both immediate and vb modes, though the symptoms are slightly different. In both cases, I have to ssh in and reboot. Simply killing neverputt doesn't bring back the machine. With immediate mode, the lockup seems to happen quicker. I can't get past the first hole. The mouse still responds.. I can move it around though, of course, it does no good. In vb mode, the mouse locks up, too.
>
> Any ideas?

Interesting, I haven't had lockups that hard for quite some time. Then again, I'm only trying to get glxgears to run without lockups...
So this could really be anything.

The first rule of thumb is to run with the environment variable RADEON_DEBUG=all set and pipe stderr into a file (beware that this will reduce performance a lot), make sure you capture the entire file, and examine that. The last line should be something like "R200 timed out... exiting" in "normal" lockups.

cu,
Nicolai
|
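The simpler toggle Nicolai mentions (changing the 1 into a 0) is ordinary preprocessor selection. The sketch below shows the pattern in isolation; `USE_IMMEDIATE_RENDER`, `run_immediate_render()`, `run_vb_render()` and `run_render()` are hypothetical stand-ins, not the real functions in r300_render.c, which take `ctx` and `stage` arguments.

```c
/* Set to 1 for the immediate path, 0 for the vertex buffer (VB) path. */
#define USE_IMMEDIATE_RENDER 0

/* Hypothetical stubs that only report which path was taken. */
int run_immediate_render(void) { return 1; }
int run_vb_render(void)        { return 2; }

/*
 * Only one branch is compiled in; the other is removed by the
 * preprocessor before the compiler ever sees it.
 */
int run_render(void)
{
#if USE_IMMEDIATE_RENDER
    return run_immediate_render();
#else
    return run_vb_render();
#endif
}
```

Editing both branches to call the VB function, as in the quoted snippet, has the same effect but leaves no quick way to switch back.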
From: Adam K K. <ad...@vo...> - 2005-02-19 15:55:18
Attachments:
never2.txt
|
Nicolai Haehnle wrote:

>On Saturday 19 February 2005 01:05, Adam K Kirchhoff wrote:
>>Nicolai Haehnle wrote:
>>>Please, everybody, get the latest CVS (anonymous will take some time to
>>>catch up...) and test vertex buffer mode with it (go to r300_run_render()
>>>in r300_render.c and change the #if so that r300_vb_run_render() is
>>>called). I want to be really sure that this fixes it for other people as
>>>well (after all, there may be other causes for lockups that haven't
>>>occured on my machine yet), and that there are no regressions for those
>>>who already had working VB mode.
>>
>>Correct me if I'm wrong, but to get the driver to automatically use vb
>>mode, all you have to do is to change:
>>
>>#if 1
>>    return r300_run_immediate_render(ctx, stage);
>>#else
>>    return r300_run_vb_render(ctx, stage);
>>#endif
>>
>>to
>>
>>#if 1
>>    return r300_run_vb_render(ctx, stage);
>>#else
>>    return r300_run_vb_render(ctx, stage);
>>#endif
>>
>>Correct?
>
>That's correct, although it would be easier to just change the 1 into a 0 ;)

Yeah, if I had actually taken the time to look at and understand the code,
I would have just done that :-)

>>If that's the case, I'm experiencing lockups with neverputt in both
>>immediate and vb modes, though the symptoms are slightly different. In
>>both cases, I have to ssh in and reboot. Simply killing neverputt
>>doesn't bring back the machine. With immediate mode, the lockup seems
>>to happen quicker. I can't get past the first hole. The mouse still
>>responds.. I can move it around though, of course, it does no good. In
>>vb mode, the mouse locks up, too.
>>
>>Any ideas?
>
>Interesting, I didn't have lockups that hard for quite some time. Then
>again, I'm only trying to get glxgears to run without lockups...
>So this could really be anything.
>
>The first rule of thumb is to run with the environment variable
>RADEON_DEBUG=all set and pipe stderr into a file (beware that this will
>reduce performance a lot), make sure you capture the entire file and
>examine that. The last line should be something like "R200 timed out...
>exiting" in "normal" lockups.

So I updated my Xorg cvs, as per Vladimir's recent suggestion, and gave
neverputt another shot. It locked up, including the mouse... It dies with:

r300BindTexture( 0x831d050 ) unit=0
r300ResetHwState
r300Flush
r300FlushCmdBufLocked from r300Flush - 1 cliprects
Syncing in r300FlushCmdBufLocked (from r300Flush)

Error: R200 timed out... exiting

I can upload the full debug log to a server at work, but it's about 62 megs
and it's gonna take a while to upload.

I'm attaching the last 200 lines or so.

Adam
|
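The capture step Nicolai describes (set RADEON_DEBUG=all, send stderr to a file, then look at the end of the file) could be scripted roughly as follows. This is only a sketch: the function names and the log file name are made up for illustration and are not part of the driver or of Mesa. Only the RADEON_DEBUG=all setting and the "last 200 lines" figure come from the thread.

```python
import os
import subprocess
from collections import deque


def run_with_radeon_debug(cmd, log_path):
    """Run cmd with RADEON_DEBUG=all set and its stderr written to log_path.

    cmd is an argument list such as ["neverputt"]; log_path is any writable
    file name (both are illustrative, chosen by the caller).
    """
    env = dict(os.environ, RADEON_DEBUG="all")
    with open(log_path, "wb") as log:
        # The child writes its stderr directly into the file.
        return subprocess.call(cmd, env=env, stderr=log)


def tail(log_path, count=200):
    """Return the last `count` lines of the log as a list of strings."""
    with open(log_path, "r", errors="replace") as log:
        # deque with maxlen keeps only the final lines, so a very large
        # log is never held in memory as a whole.
        return list(deque(log, maxlen=count))
```

For example, run_with_radeon_debug(["neverputt"], "neverputt.log") followed by tail("neverputt.log") would give the part of the log worth attaching to a mail. After a hard lockup the program never returns, so tail() would have to be run on the log file after a reboot.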
From: Adam K K. <ad...@vo...> - 2005-02-19 17:11:35
|
Adam K Kirchhoff wrote:

> So I updated my Xorg cvs, as per Vladimir's recent suggestion, and
> gave neverputt another shot. It locked up, including the mouse...
> It dies with:
>
> r300BindTexture( 0x831d050 ) unit=0
> r300ResetHwState
> r300Flush
> r300FlushCmdBufLocked from r300Flush - 1 cliprects
> Syncing in r300FlushCmdBufLocked (from r300Flush)
>
> Error: R200 timed out... exiting
>
> I can upload the full debug log to a server at work, but it's about 62
> megs and it's gonna take a while to upload.
>
> I'm attaching the last 200 lines or so.
>
> Adam

Same lockups with tuxracer, but it happened much quicker. You can view the
full debug output from tuxracer at:

http://go.visualtech.com/adam/tuxracer.txt

It's about 6 megs in size.

Adam
|
From: Adam K K. <ad...@vo...> - 2005-02-19 21:11:16
|
Vladimir Dergachev wrote:

>> Same lockups with tuxracer, but it happened much quicker. You can
>> view the full debug output from tuxracer at:
>>
>> http://go.visualtech.com/adam/tuxracer.txt
>>
>> It's about 6 megs in size.
>
> I suggest you gzip it next time - this should work exceedingly well
> with log files and most people can still view it within a webbrowser.
>
> Also, to cover the obvious, you did update the DRM driver, recompile
> it and reloaded it ? Check that there are no stray binaries around..

Sorry 'bout that.

Yeah, the DRM was updated as well. I've compared md5sums on the drm.ko and
radeon.ko modules in /lib/modules/2.6.10/kernel/drivers/char/drm to the ones
in the drm directory of the r300_driver tree that I built this morning.
Exact match. Even after doing a "make clean" in the drm source directory and
rebuilding it just now, they match.

I'm not sure what stray binaries would cause this.. However, libGL.so is
definitely loading /usr/X11R6/lib/modules/dri/r300_dri.so, and I built and
installed that this morning (and have since updated it this afternoon).

Adam
|
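The md5sum comparison described above can also be done in a few lines of script, which is handy when several modules need checking at once. The sketch below is an illustration only: the helper names are invented here, and the location of the freshly built modules depends on where the r300_driver tree was checked out, so any build path passed in is an assumption of the caller.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(installed, built):
    """True when the installed file and the built file hash identically."""
    return md5_of(installed) == md5_of(built)
```

Calling same_file_contents() with the installed radeon.ko under /lib/modules/2.6.10/kernel/drivers/char/drm and the copy in the build tree should return True if the installed module really is the one that was just built. A matching checksum only shows the file on disk is current; it does not show that the module was reloaded after installation.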
From: Yann V. <ya...@al...> - 2005-02-20 12:54:19
|
On Sat, Feb 19, 2005 at 04:11:08PM -0500, Adam K Kirchhoff wrote:
> Vladimir Dergachev wrote:
> >Also, to cover the obvious, you did update the DRM driver, recompile
> >it and reloaded it ? Check that there are no stray binaries around..
> Yeah, the DRM was updated as well. I've compared md5sums on the drm.ko
> and radeon.ko modules in /lib/modules/2.6.10/kernel/drivers/char/drm to
> the ones in the drm directory of r300_driver tree that I just built from
> this morning. Exact match. Even after doing a "make clean" in the drm
> source directory and rebuilding it just now, they match.
>
> I'm not sure what stray binaries would cause this.. However, libGL.so
> is definitely loading /usr/X11R6/lib/modules/dri/r300_dri.so, and I
> build and installed that this morning (and have since updated it this
> afternoon).

First post on this topic, sorry if it's messy.

I have an amd64 portable (Mitac 8355) with a RV350 (Mobility Radeon 9600
M10). I just updated the drivers yesterday, and tried enabling VB mode by
altering that #if 1. The system did a hard lockup pretty much immediately on
running glxgears; at least one frame was drawn, then not even network logins
worked.

The update also brought another curious change. Back in immediate mode,
there now seems to be some sort of waiting going on that shouldn't be. GL
apps tend to freeze their display unless there are events for the X server
to deal with, such as touchpad input. dmesg is full of this:

[drm:radeon_cp_dispatch_swap] *ERROR* Engine timed out before swap buffer blit

glxgears tends to resume for short bursts occasionally, but neverputt does
not. I tried rolling back the Mesa and Xorg drivers separately, but it
appears to me like this is caused by a DRM change. crack-attack does not
suffer from this problem. glxgears reports normal framerates (>90 fps on
1400x1050, window size 1398x1030) despite few visual updates occurring.

Tips on how to proceed isolating this would be very welcome. The previous
update, if I am not mistaken, was from about four or five days ago.

-- 
PGP fingerprint = 9242 DC15 2502 FEAB E15F 84C6 D538 EC09 5380 5746
|
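One possible way to narrow this down while rolling individual components back is to count how often the swap timeout shows up in the kernel log for each combination. The sketch below assumes the dmesg output has already been saved as text. The function name is invented for illustration, and the pattern is simply the error line quoted in the message above.

```python
import re

# Error text as it appears in the dmesg output quoted in the message.
SWAP_TIMEOUT = re.compile(
    r"\[drm:radeon_cp_dispatch_swap\] \*ERROR\* "
    r"Engine timed out before swap buffer blit"
)


def count_swap_timeouts(dmesg_text):
    """Count the lines of dmesg_text that contain the swap timeout error."""
    return sum(
        1 for line in dmesg_text.splitlines() if SWAP_TIMEOUT.search(line)
    )
```

Comparing the count from a short glxgears run with the old DRM module against the same run with the new one would give a rough indication of whether the DRM change is responsible. It is not a diagnosis of the cause.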