From: Henry W. <haw...@at...> - 2002-07-18 06:09:54
Attachments:
r128-commit.diff
|
Michel,

How do these changes for r128 COMMIT_RING look? With these I can run concurrent xine and glx programs in pci and agp mode with XV dma disabled.

With XV dma enabled, in both pci and agp mode, I can run xine for some time without any hangs. But start up a glx program and X will hang until I kill the glx program, and then X and xine will recover. I guess sync is the next issue; I have a follow-on question on that below.

The patch also includes a change to RING_SPACE_TEST..., based on the radeon version, adding a register read fallback for determining ring space.

The current drm r128 does not have any WAIT_UNTIL_*_IDLE macros. I assume I'm going to need at least a general idle wait to address sync issues. The only WAIT_UNTIL mask bit defined in r128_drv.h is for page flip; are there any other bits available for wait functions?

Henry |
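For readers without the attached diff: the sketch below is not the patch, only one way a "register read fallback" for ring space could be structured. The free space is first recomputed from a read-pointer snapshot in memory, and the register copy is consulted only if that still looks too small. The struct, field, and function names are hypothetical and do not come from r128_drv.h; the two pointers stand in for the memory snapshot and the MMIO register.

```c
#include <stdint.h>

/* Hypothetical ring descriptor; names are illustrative only and are not
 * taken from the r128 or radeon drm sources. Units are dwords. */
struct ring_state {
    uint32_t size;      /* ring size */
    uint32_t tail;      /* next dword the CPU will write */
    uint32_t high_mark; /* free space wanted before emitting commands */
    int32_t  space;     /* cached free space */
};

/* Recompute free space from a read pointer (head).
 * head == tail is treated as an empty ring, so the result is the full size. */
static void ring_update_space(struct ring_state *ring, uint32_t head)
{
    int32_t space = (int32_t)head - (int32_t)ring->tail;

    if (space <= 0)
        space += (int32_t)ring->size;
    ring->space = space;
}

/* Return 0 if at least high_mark dwords are free, -1 otherwise.
 * The register is only dereferenced when the memory snapshot is not enough,
 * which is the "fallback" part of this sketch (an assumption, see above). */
static int ring_space_test(struct ring_state *ring,
                           const volatile uint32_t *snapshot_head,
                           const volatile uint32_t *register_head)
{
    if (ring->space >= (int32_t)ring->high_mark)
        return 0;

    ring_update_space(ring, *snapshot_head);
    if (ring->space >= (int32_t)ring->high_mark)
        return 0;

    ring_update_space(ring, *register_head);
    return ring->space >= (int32_t)ring->high_mark ? 0 : -1;
}
```

A real driver would retry with a delay before giving up; that part is left out here to keep the fallback order visible.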
From: Henry W. <haw...@at...> - 2002-07-18 07:30:56
|
Henry Worth wrote:
>
> Michel,
>
> How do these changes for r128 COMMIT_RING look? With these I can run
> concurrent xine and glx programs in pci and agp mode with XV dma
> disabled.

Digging thru the X logs, I see that agp is failing to map the ring buffer with this drm module. It does with the kernel module built from BenH's linuxppc dev tree, but I haven't tried the dri CVS before, so I don't know yet if it even works without the COMMIT_RING changes.

Anyway, let me know if there are any obvious problems/omissions in that patch.

Thanks,
Henry |
From: Henry W. <haw...@at...> - 2002-07-18 07:46:36
|
Henry Worth wrote:
>
> Digging thru the X logs, I see that agp is failing to map the ring
> buffer with this drm module. It does with kernel module built from
> BenH's linuxppc dev tree, but I haven't tried the dri CVS before, so
> I don't know yet if it even works without the COMMIT_RING changes. [...]

Decided to do a quick compile and check before calling it a night. Without the patch, agp also fails to map the ring buffer. I'll take a look at merging in whatever fixes are in BenH's src tree tomorrow.

Henry |
From: Michel <mi...@da...> - 2002-07-18 08:02:37
|
On Thu, 2002-07-18 at 09:47, Henry Worth wrote:
> Henry Worth wrote:
>
> >
> > Digging thru the X logs, I see that agp is failing to map the ring
> > buffer with this drm module. It does with kernel module built from
> > BenH's linuxppc dev tree, but I haven't tried the dri CVS before, so
> > I don't know yet if it even works without the COMMIT_RING changes. [...]
>
> Decided to do a quick compile and check before calling it a night.
> Without the patch, agp also fails to map the ring buffer. I'll take a
> look at merging in whatever fixes are in BenH's src tree tomorrow.

Like http://www.penguinppc.org/~daenzer/DRI/drm-ioremapagp.diff ? :)

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
From: Henry W. <haw...@at...> - 2002-07-18 19:09:06
|
Michel Dänzer wrote:

>On Thu, 2002-07-18 at 09:47, Henry Worth wrote:
>
>>Henry Worth wrote:
>>
>>>Digging thru the X logs, I see that agp is failing to map the ring
>>>buffer with this drm module. It does with kernel module built from
>>>BenH's linuxppc dev tree, but I haven't tried the dri CVS before, so
>>>I don't know yet if it even works without the COMMIT_RING changes. [...]
>>>
>>Decided to do a quick compile and check before calling it a night.
>>Without the patch, agp also fails to map the ring buffer. I'll take a
>>look at merging in whatever fixes are in BenH's src tree tomorrow.
>>
>
>Like http://www.penguinppc.org/~daenzer/DRI/drm-ioremapagp.diff ? :)
>

Exactly like that... AGP is mapping the ring now.

So to the results -- again this is with dual 450 G4's with SMP linux 2.4.19-rc1-ben0, DRI CVS from Monday on top of XFree86 2.4.0 with Michel's XV dma patch, indirect buffering patch, ioremap patch, the r128 endianness patches and the COMMIT_RING changes.

While it is still possible to hang X in any combination of PCI/AGP mode and XV dma enabled/disabled by mixing various pairs of XV, DRI, and concurrent X activity, like x11perf, in general it takes more effort. When it does hang I've yet to see a hard hang of the X server or of the system, as was common before. In all hang cases the offending processes could be killed and X would continue; if I was lucky enough to kill the right process, other processes would restart.

Previously, with XV dma enabled, xine would almost always hang shortly after starting playback on SMP kernels. On the occasions it did play for a while, trying to move another window was extremely jerky and slow, despite a very low CPU load (10-15%), and would always result in an eventual hang. Hangs almost always required a reboot to recover.

Now, xine with XV dma will play as long as there isn't much other X activity. X remains responsive with windows moving fairly smoothly. I was even able to run X11perf for a couple of minutes before problems occurred. But I've yet to see a literal hang, except when starting a GLX program, and killing the GLX program allowed xine to continue.

The problem I now see repeatedly with XV dma is that eventually X11perf or moving a window will cause video sync to go haywire. Xine is playing audio and the keyboard is responsive, and I can ctl-alt-del. But video sync does not recover when X exits or is restarted; a reboot is required to reset the video card. Previously I had sometimes seen a problem like this, but never had keyboard responsiveness, and attempts to recover from a telnet session often resulted in a system hang when trying to kill X or perform a shutdown.

Overall, pci mode and XV dma disabled is still the most stable mode, but agp mode with XV dma disabled is more usable. I was able to run the Mesa terrain demo with texture, two gears demos and x11perf in agp mode for 20 minutes, with frequent moving and raising of windows, before a deadlock that was cleared by killing the glx processes.

BTW, I do see quite a few "128(0): Idle timed out, resetting engine..." being logged by r128_accel.c's R128WaitForIdle() routine when running x11perf with both the old and new drivers.

Henry |
From: Henry W. <haw...@at...> - 2002-07-18 20:30:40
|
Henry Worth wrote:
>
>
> BTW, I do see quite a few "128(0): Idle timed out, resetting engine..."
> being logged by r128_accel.c's R128WaitForIdle() routine when running
> x11perf with both the old and new drivers.

Tripped a different hang running xine in agp mode with XV dma disabled. With xine running, started glxgears and X hung. Killed glxgears and kde appeared to be restarting the desktop, and then xine stuttered a few times before X hung again. During this, the cpu monitor showed high system cycles and lots of the following messages were being logged by r128_accel.c's R128CCEGetBuffer:

    (EE) R128(0): R128CCEGetBuffer: CCE GetBuffer -1002
    (EE) R128(0): GetBuffer timed out, resetting engine...

And the kernel log has this a few thousand times:

    kernel: [drm:r128_freelist_get] *ERROR* returning NULL!

I could kill X from a telnet session, but the screen wouldn't clear and the keyboard didn't seem responsive; had to reboot.

Introduced by the COMMIT_RING changes, an example of the sync problems, or a bit of both? I think I've seen a few of these messages in the past, but don't recall a flood of them.

Henry |
From: Henry W. <haw...@at...> - 2002-07-18 23:49:51
|
I've been doing some more testing. With, and without, the commit_ring patches, even in pci mode, I can get X to hang without too much effort by firing up several glx demos and then x11perf.

One of the glx programs will hang holding hw locks in r128WaitForFrameCompletion() from r128CopyBuffer(). The other glx programs are in the drmGetLock() ioctl from one of the r128render* functions, and x11perf is in select from _XWaitForWritable().

Once the glx programs and x11perf are killed, the X server hang clears, but any attempt to start a glx program will hang the X server in the same r128WaitForFrameCompletion() from r128CopyBuffer() condition on its first frame. A restart of X is required to reset drm.

Henry |
From: Michel <mi...@da...> - 2002-07-22 12:32:07
|
On Thu, 2002-07-18 at 21:09, Henry Worth wrote:
> Michel Dänzer wrote:
>
> >On Thu, 2002-07-18 at 09:47, Henry Worth wrote:
> >
> >>Henry Worth wrote:
> >>
> >>>Digging thru the X logs, I see that agp is failing to map the ring
> >>>buffer with this drm module. It does with kernel module built from
> >>>BenH's linuxppc dev tree, but I haven't tried the dri CVS before, so
> >>>I don't know yet if it even works without the COMMIT_RING changes. [...]
> >>>
> >>Decided to do a quick compile and check before calling it a night.
> >>Without the patch, agp also fails to map the ring buffer. I'll take a
> >>look at merging in whatever fixes are in BenH's src tree tomorrow.
> >>
> >
> >Like http://www.penguinppc.org/~daenzer/DRI/drm-ioremapagp.diff ? :)
> >
>
> Exactly like that... AGP is mapping the ring now.
>
> So to the results -- again this is with dual 450 G4's with SMP linux
> 2.4.19-rc1-ben0, DRI CVS from Monday on top of XFre86 2.4.0 with the
> Michel's XV dma patch, indirect buffering patch, ioremap patch, the
> r128 endiness patches and the COMMIT_RING changes.
>
> While it is still possible to hang X in any combination of PCI/AGP mode
> and XV dma enabled/disabled by mixing various pairs of XV, DRI, and
> concurrent X activity, like x11perf, in general it takes more effort.
> When it does hang I've yet to see a hard hang of the X server or of the
> system, as was common before.

So should I commit the COMMIT_RING changes?

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
From: Henry W. <haw...@at...> - 2002-07-22 18:40:53
|
Michel Dänzer wrote:

>
>So should I commit the COMMIT_RING changes?
>

The more testing I do, the less convinced I am that they make a really notable difference. All too often, when it seems to be working well, I discover that dri isn't actually enabled due to some module or lib loading problem. So I'm somewhat neutral at this point, at least until/if the graphics engine hangs can be addressed. Adding them would bring the r128 code back more in common with the radeon. |
From: Michel <mi...@da...> - 2002-07-24 14:03:49
|
On Mon, 2002-07-22 at 20:41, Henry Worth wrote:
> Michel Dänzer wrote:
>
> >
> >So should I commit the COMMIT_RING changes?
> >
>
> The more testing I do, the less convinced I am that they make a really
> notable difference, all too often when it seems to be working well I
> discover that dri isn't actually enabled do to some module or lib loading
> problem. So I'm somewhat neutral at this point, at least until/if the
> graphics engine hangs can be addressed. Adding them would bring the r128
> code back more in common with the radeon.

I'm still reluctant to commit changes without a clear benefit. As for the bus banging, AFAICS all MMIO busy loops already have udelay(1)s, you might want to play with higher values?

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
From: Henry W. <haw...@at...> - 2002-07-24 19:36:15
|
Michel Dänzer wrote:

>
>I'm still reluctant to commit changes without a clear benefit. As for
>

Agreed.

>
>the bus banging, AFAICS all MMIO busy loops already have udelay(1)s, you
>might want to play with higher values?
>

But the 2d driver doesn't have usleeps. The tight udelay() noop loop in the drm driver might have platform issues; increasing the udelay to 10 didn't have as much impact as increasing the max number of retries.

I've been looking at the AGP/PCI configuration. The r128 on both the PowerBook and PowerMac have a PCI latency timeout value of 255 (the max). Is this consistent with r128 on other platforms, or a sign of problems?

I tried restricting the AGP RQ depth to various values. A value of 0 shows significant improvements. Mixed GLX and XV DMA will still hang on both UP and SMP systems (both pci and agp mode). AGP mode GLX only or XV DMA only works well on the UP, but eventual hangs still occur on SMP. Hangs seem to be more consistent in behavior, so debugging locking problems might be more productive.

RQ depth values between 0 and the UniNorth max of 7 show no improvement. But running multiple GLX demos on the SMP system would hang when starting a texture-using demo after starting several gears demos, and then clear after ~2 minutes (I didn't try the intermediate values on the UP system). During the hang, system cpu cycles were at 100% and a number of drm module cce idle timeouts were logged.

The XV DMA and dri texture blit don't appear to obtain the drm hw lock. From the hang behavior, I suspect they are stomping on the DRM idle waits. But such locking behavior should also impact x86. What is the state of agp-mode r128 on x86? How about SMP x86?

Henry |
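To make the RQ depth experiment concrete, here is a minimal sketch of clamping the request-queue field of an AGP command word. It assumes the AGP 2.0 layout in which the RQ field occupies bits 31:24 of the status and command registers; the helper names are made up for illustration, and this is not the code used in the tests above. As far as I recall the field is encoded as depth minus one, so a value of 0 would still allow a single outstanding request, but that reading should be checked against the spec.

```c
#include <stdint.h>

/* Assumed AGP 2.0 layout: RQ field in bits 31:24. */
#define AGP_RQ_SHIFT 24
#define AGP_RQ_MASK  0xff000000u

/* Extract the RQ field from a status or command word. */
static uint32_t agp_rq_depth(uint32_t reg)
{
    return (reg & AGP_RQ_MASK) >> AGP_RQ_SHIFT;
}

/* Return the command word with its RQ field limited to max_depth.
 * All other bits (rate, SBA, enable, ...) are passed through untouched. */
static uint32_t agp_limit_rq_depth(uint32_t command, uint32_t max_depth)
{
    uint32_t depth = agp_rq_depth(command);

    if (depth > max_depth)
        depth = max_depth;
    return (command & ~AGP_RQ_MASK) | (depth << AGP_RQ_SHIFT);
}
```

For example, `agp_limit_rq_depth(cmd, 0)` models the "RQ depth 0" case, and `agp_limit_rq_depth(cmd, 7)` leaves a bridge maximum of 7 unchanged.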
From: Benjamin H. <be...@ke...> - 2002-07-24 19:50:19
|
>I've been looking at the AGP/PCI configuration. The r128 on both the >PowerBook >and PowerMac have a PCI latency timeout value of 255 (the max), is this >consistent with r128 on other platforms, or a sign of problems? You may want to look at the values in MacOS. Also, the card has all sort of registers where you can set things like AGP clock skew values etc... You may want to compare what MacOS puts in there with what we have in there (inherited from the firmware). >I tried restricting the AGP RQ depth to various values. A value of 0 show s >significant improvements. Mixed GLX and XV DMA will still hang on both >UP and SMP systems (both pci and agp mode). AGP mode GLX only or >XV DMA only works well on the UP, but eventual hangs still occur on SMP. >Hangs seem to be more consistent in behavior, so debugging locking >problems might be more productive. > >RQ depth values between 0 and the UniNorth max of 7 show no improvment. >But running multiple GLX demos on the SMP system would hang when starting >a texture using demo after starting several gears demos and then clear >after ~2 >minutes (I didn't try the intermediate values on the UP system). During >the hang, >system cpu cycles where at 100% and a number of drm module cce idle >timeouts are logged. > >The XV DMA and dri texture blit don't appear to obtain the drm hw lock. From >the hang behavior, I suspect they are stomping on the DRM idle waits. >But, such >locking behavior should also impact x86. What is the state of agp-mode >r128 on >x86? How about SMP x86? |
From: Michel <mi...@da...> - 2002-07-24 20:24:03
|
On Wed, 2002-07-24 at 21:36, Henry Worth wrote:
> Michel Dänzer wrote:
>
> >
> >the bus banging, AFAICS all MMIO busy loops already have udelay(1)s, you
> >might want to play with higher values?
> >
> But the 2d driver doesn't have usleeps.

That doesn't matter for the bus, does it?

> The tight udelay() noop loop in the drm driver might have plaform issues,
> increasing the udelay to 10 didn't have as much impact is increasing
> the max number of retries.

Platform issues? You mean the udelay doesn't work as it should?

> The XV DMA and dri texture blit don't appear to obtain the drm hw lock.

Are you sure? That would be a grave bug.

> From the hang behavior, I suspect they are stomping on the DRM idle waits.

Can you describe what you imagine happens?

> But, such locking behavior should also impact x86. What is the state of
> agp-mode r128 on x86?

Stable per se, also unstable when mixing in 2D or Xv in particular.

> How about SMP x86?

No idea.

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
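As an illustration of what "obtaining the drm hw lock" means for a code path such as a blit, the sketch below models a lock word with a "held" flag plus the owning context, and a guard that refuses to touch the engine unless the caller owns the lock. The bit layout and all names here are assumptions made for the example; this is not the check the r128 driver actually performs.

```c
#include <stdint.h>

/* Assumed lock-word layout for this sketch: top bit = held,
 * next bit = contended, remaining bits = owning context id. */
#define HW_LOCK_HELD 0x80000000u
#define HW_LOCK_CONT 0x40000000u

/* Nonzero if some context currently holds the lock. */
static int hw_lock_is_held(uint32_t lock)
{
    return (lock & HW_LOCK_HELD) != 0;
}

/* Context id stored in the lock word, with the flag bits stripped. */
static uint32_t hw_lock_context(uint32_t lock)
{
    return lock & ~(HW_LOCK_HELD | HW_LOCK_CONT);
}

/* Return 0 if the caller may program the engine, -1 otherwise.
 * A path that skips a guard like this can race with another client's
 * idle wait, which is the failure mode being asked about above. */
static int hw_lock_test(uint32_t lock, uint32_t caller_context)
{
    if (!hw_lock_is_held(lock))
        return -1;
    if (hw_lock_context(lock) != caller_context)
        return -1;
    return 0;
}
```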
From: Henry W. <haw...@at...> - 2002-07-25 16:26:42
|
Michel Dänzer wrote:

>On Wed, 2002-07-24 at 21:36, Henry Worth wrote:
>
>>Michel Dänzer wrote:
>>
>>>the bus banging, AFAICS all MMIO busy loops already have udelay(1)s, you
>>>might want to play with higher values?
>>>
>>But the 2d driver doesn't have usleeps.
>>
>
>That doesn't matter for the bus, does it?
>

It still causes bus contention, and graphics devices have been known to have problems updating status registers if polled too rapidly.

>
>>The tight udelay() noop loop in the drm driver might have plaform issues,
>>increasing the udelay to 10 didn't have as much impact is increasing
>>the max number of retries.
>>
>
>Platform issues? You mean the udelay doesn't work as it should?
>

The timing of the loop may change significantly with different processors; some can execute such a tight loop out of prefetch queues without branch penalties. That a 10-fold increase in the udelay parm had less impact than increasing the number of retries is suspicious.

>
>
>>The XV DMA and dri texture blit don't appear to obtain the drm hw lock.
>>
>
>Are you sure? That would be a grave bug.
>

I've been able to check the texture, and it's ok. Checking XV is going to take some time I don't have right now to trace through the data structs.

>
>>From the hang behavior, I suspect they are stomping on the DRM idle waits.
>>
>
>Can you describe what you imagine happens?
>

Something is interfering with the engines going idle when a program using textures starts up, yet they often clear after a couple of minutes without a cce reset being logged and the GLX programs continue to operate. XV dma causes a similar hang, but they don't clear, probably since the blits are continuous. |
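The delay-versus-retries point can be sketched as a bounded polling loop. This is a user-space model under stated assumptions, not the drm code: the status word is read through a pointer that stands in for an MMIO register, the delay is a caller-supplied callback standing in for the kernel's udelay(), and all names are hypothetical. The nominal time budget is simply retries times delay, so 100000 polls at 1 us and 10000 polls at 10 us wait equally long on paper while the latter touches the bus ten times less often. If the delay itself is shorter than advertised on some platform, only the retry count really bounds the wait, which would fit the observation above.

```c
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied delay; stands in for the kernel's udelay(). */
typedef void (*delay_fn)(unsigned int usec);

/* Poll *status until no bit in busy_mask is set.
 * Returns 0 once idle, -1 if max_retries polls all saw the engine busy.
 * A NULL delay callback means "spin without waiting". */
static int wait_for_idle(const volatile uint32_t *status, uint32_t busy_mask,
                         unsigned int max_retries, unsigned int delay_us,
                         delay_fn delay)
{
    for (unsigned int i = 0; i < max_retries; i++) {
        if ((*status & busy_mask) == 0)
            return 0;
        if (delay != NULL)
            delay(delay_us);
    }
    return -1;
}

/* Nominal worst-case wait in microseconds, ignoring the cost of the reads. */
static unsigned long poll_budget_us(unsigned int max_retries,
                                    unsigned int delay_us)
{
    return (unsigned long)max_retries * delay_us;
}
```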
From: Michel <mi...@da...> - 2002-07-25 22:40:11
|
On Thu, 2002-07-25 at 18:26, Henry Worth wrote:
> Michel Dänzer wrote:
>
> >On Wed, 2002-07-24 at 21:36, Henry Worth wrote:
> >
> >>Michel Dänzer wrote:
> >>
> >>>the bus banging, AFAICS all MMIO busy loops already have udelay(1)s, you
> >>>might want to play with higher values?
> >>>
> >>But the 2d driver doesn't have usleeps.
> >>
> >
> >That doesn't matter for the bus, does it?
> >
> I still causes bus contention and graphics devices have been know to
> have problems updating status registers if polled too rapidly.

Do two consecutive bus cycles without a delay in between really cause contention?

> >>The tight udelay() noop loop in the drm driver might have plaform issues,
> >>increasing the udelay to 10 didn't have as much impact is increasing
> >>the max number of retries.
> >>
> >
> >Platform issues? You mean the udelay doesn't work as it should?
> >
> The timing of the loop may change significantly with different processors,
> some can execute such a tight loop out of prefetch queues without branch
> penalties. That a l0-fold increase in the udelay parm had less impact than
> increasing the number of retries is suspicious.

True, but then those problems should be fixed - I assume other parts of the kernel rely on them working as advertised. Have you tried even higher delays?

> >>The XV DMA and dri texture blit don't appear to obtain the drm hw lock.
> >>
> >
> >Are you sure? That would be a grave bug.
> >
> I've been able to check the texture, and it's ok. Checking XV is going
> to take some time I don't have right now to trace through the data structs.

Trace no more, it uses the texture blit ioctl.

> >>From the hang behavior, I suspect they are stomping on the DRM idle waits.
> >>
> >
> >Can you describe what you imagine happens?
> >
> Something is interfering with the engines going idle when a program
> using textures starts up, yet they often clear after a couple of minutes
> without a cce reset being logged and the GLX programs continue to operate.
> XV dma causes a similar hang, but they don't clear, probably since the
> blits are continous.

I see. Hopefully we 'just' have to prevent bus contention to avoid this.

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
From: Keith W. <ke...@tu...> - 2002-07-18 14:40:13
|
Henry Worth wrote:
>
> Michel,
>
> How do these changes for r128 COMMIT_RING look? With these I can run
> concurrent xine and glx programs in pci and agp mode with XV dma
> disabled.
>
> With XV dma enabled, in both pci and agp mode, I can run xine for some
> time without any hangs. But startup a glx program and X will hang until I
> kill the glx program, and then X and xine will recover. I guess sync is the
> next issue, I have a follow-on question on that below.
>
> The patch also includes a change to RING_SPACE_TEST..., based on
> the radeon version, adding a register read fallback for determining ring
> space.

I have no idea why there is a need for the RING_SPACE_TEST macro. It's disabled in the r200 branch.

> The current drm r128 does not have any WAIT_UNTIL_*_IDLE macros.
> I assume I'm going to need at least a general idle wait to address sync
> issues. The only WAIT_UNTIL mask bit defined in r128_drv.h is for page
> flip, are there any other bits available for wait functions?

What does the r128 currently do to synchronize access to the framebuffer? It may be that 2d & 3d are synchronized by the hardware automatically, but you'll always need to do something before accessing the framebuffer directly.

Keith |
From: Michel <mi...@da...> - 2002-07-18 15:20:56
|
On Thu, 2002-07-18 at 16:40, Keith Whitwell wrote:
> Henry Worth wrote:
> >
> > Michel,
> >
> > How do these changes for r128 COMMIT_RING look? With these I can run
> > concurrent xine and glx programs in pci and agp mode with XV dma
> > disabled.
> >
> > With XV dma enabled, in both pci and agp mode, I can run xine for some
> > time without any hangs. But startup a glx program and X will hang until I
> > kill the glx program, and then X and xine will recover. I guess sync is the
> > next issue, I have a follow-on question on that below.
> >
> > The patch also includes a change to RING_SPACE_TEST..., based on
> > the radeon version, adding a register read fallback for determining ring
> > space.
>
> I have no idea why there is a need for the RING_SPACE_TEST macro. It's
> disabled in the r200 branch.

Besides, I've never hit the added code there.

> > The current drm r128 does not have any WAIT_UNTIL_*_IDLE macros.
> > I assume I'm going to need at least a general idle wait to address sync
> > issues. The only WAIT_UNTIL mask bit defined in r128_drv.h is for page
> > flip, are there any other bits available for wait functions?
>
> What does the r128 currently do to synchronize access to the framebuffer?

XAA handles that, and both drivers provide WaitForIdle() as the Sync function.

> It may be that 2d & 3d are synchronized by the hardware automatically,

I doubt that, e.g. the texture blit ioctl also flushes the pixel cache. Maybe there's more to do with the PC_{,N}GUI_* registers?

> but you'll always need to do something before accessing the framebuffer
> directly.

I don't see where direct framebuffer access is involved when using DMA transfers for Xv.

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast |
From: Henry W. <haw...@at...> - 2002-07-18 19:18:14
|
Michel Dänzer wrote:

>On Thu, 2002-07-18 at 16:40, Keith Whitwell wrote:
>
>>Henry Worth wrote:
>>
>>>
>>I have no idea why there is a need for the RING_SPACE_TEST macro. It's
>>disabled in the r200 branch.
>>
>
>Besides, I've never hit the added code there.
>

Ok, I won't worry about that.

>>>The current drm r128 does not have any WAIT_UNTIL_*_IDLE macros.
>>>I assume I'm going to need at least a general idle wait to address sync
>>>issues. The only WAIT_UNTIL mask bit defined in r128_drv.h is for page
>>>flip, are there any other bits available for wait functions?
>>>
>>What does the r128 currently do to synchronize access to the framebuffer?
>>
>
>XAA handles that, and both drivers provide WaitForIdle() as the Sync
>function.
>
>>It may be that 2d & 3d are synchronized by the hardware automatically,
>>
>
>I doubt that, e.g. the texture blit ioctl also flushes the pixel cache.
>Maybe there's more to do with the PC_{,N}GUI_* registers?
>
>>but you'll always need to do something before accessing the framebuffer
>>directly.
>>
>
>I don't see where direct framebuffer access is involved when using DMA
>transfers for Xv.
>

So where to go from here? I can only spend a couple more days on this in the near term, so I don't have time to deal with the NDA and digging thru the docs. Unless someone else has a good recommendation, I'll try to flesh out some sync code based on the radeon code and maybe someone else can correct the details later.

Henry |