Re: [Dri-devel] r128 CCE issues


On Fri, Nov 03, 2000 at 01:24:32AM +1100, Gareth Hughes wrote:
> Recently, I moved all the Concurrent Command Engine (CCE) handling out
> of the 2D driver and into the kernel module on the ati-4-1-1 branch. 

Good.  I've done the same thing with the Radeon for the CP.

> I noticed that the interaction between the MMIO 2D acceleration and the
> new kernel module interface for starting and stopping the CCE caused
> noticeable lag in apps that were running at full hardware utilization
...
> Examples of apps that exhibit this behaviour are Mesa/demos/geartrain
> and the Q3 main menu.
> 
> I spent the first part of tonight reworking the CCE stop operation,
> which was causing the problem.  This basically involves flushing the
> ring, waiting for engine idle, stopping the CCE and then resetting the
> engine.  The thing is, you don't necessarily have to be pedantic about
> waiting for the engine to idle.  The downside of this is that you get
> flashing of triangles as the CCE is stopped as it is still rendering.

Exactly.

> I've fixed things enough that geartrain is much nicer (basically no
> lag), but the Q3 menu is still jerky.  I've had it so that it will drop
> triangles left, right and center but be nice and smooth - this is
> obviously not particularly desirable.  Doing this will also cause the
> engine to lock up every now and then.  Running with "UseCCEFor2D", which
> disables 2D acceleration, is fine.  If anything, my work has cleaned
> things up enough that it's even smoother now.
> 
> Kevin, what are your thoughts on this?  I'd be happy with leaving a
> slightly jerky Q3 menu rather than allow the engine to lock up

Of course, slightly jerky is better than lockups. :-)

I wish we had time to implement full CCE/CP based 2D accel code for the
Rage 128 and Radeon; however, there is a reasonable intermediate
solution...

Since we already have 2D code that directly programs the registers via
MMIO, all that we need to do is to program those same registers via the
CCE/CP.  We can use the CCE/CP register programming packets (Type-0 and
Type-1) to set the registers and then use the privileged SubmitPackets
routine to submit them to the ring buffer when a Sync is required or the
buffer is full.  It will not be the most efficient implementation, but I
believe it is a reasonable trade-off until we have the time and funding
for a full CCE/CP implementation.  We will obviously have to run xtest
and our other test programs at all screen depths to make sure that the
2D accel routines are correct.
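
As a rough illustration of the buffering described above, here is a minimal
sketch in C.  It is not the driver code: the buffer size, the pkt_buf type,
and submit_packets() (a stand-in for the privileged SubmitPackets routine)
are hypothetical, and the Type-0 header layout in PACKET0() is an assumption
(register dword index in the low bits, data dword count minus one in bits
16 and up).  Each register write is queued as a header plus one data dword,
and the buffer is handed off either when it is full or when a Sync is
requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_BUF_DWORDS 64   /* hypothetical buffer size, in dwords */

/* Assumed Type-0 header layout: packet type 0 in the top bits,
 * (number of data dwords - 1) starting at bit 16, and the register
 * offset expressed as a dword index in the low bits. */
#define PACKET0(reg, n) ((uint32_t)(((uint32_t)(n) << 16) | ((uint32_t)(reg) >> 2)))

typedef struct {
    uint32_t data[PKT_BUF_DWORDS];
    size_t   used;       /* dwords currently queued */
    size_t   submitted;  /* total dwords handed off so far (illustration only) */
    unsigned submits;    /* number of hand-offs so far (illustration only) */
} pkt_buf;

/* Hypothetical stand-in for the privileged SubmitPackets routine: a real
 * driver would copy the queued dwords to the ring buffer here. */
static void submit_packets(pkt_buf *b)
{
    if (b->used == 0)
        return;
    b->submitted += b->used;
    b->submits++;
    b->used = 0;
}

/* Queue one register write as a Type-0 packet (header + one data dword),
 * submitting first if the two dwords would not fit. */
static void emit_reg(pkt_buf *b, uint32_t reg, uint32_t value)
{
    assert((reg & 3) == 0);  /* register offsets are dword aligned */
    if (b->used + 2 > PKT_BUF_DWORDS)
        submit_packets(b);
    b->data[b->used++] = PACKET0(reg, 0);
    b->data[b->used++] = value;
}

/* Sync: push out whatever is still queued. */
static void accel_sync(pkt_buf *b)
{
    submit_packets(b);
}
```

With this shape, the existing MMIO accel routines would only need their
register writes redirected to something like emit_reg(), which is why the
approach is cheap to implement even though it is not the most efficient.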

To minimize the amount of work/testing and gain the largest improvement
in performance, I suggest that we only accelerate the two most common 2D
operations: solid fills and screen to screen copies.  This will also
allow us to use these accel functions for the DRIInitBuffers and
DRIMoveBuffers routines.  I have already done this for the Radeon as
part of my 3D/DRI work.
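
To make the solid fill case concrete, here is a small sketch of how one
fill rectangle might be expressed as register-write packets.  Everything
named HYP_* or hyp_* is hypothetical: the register offsets are placeholder
values, not the real ones, and both the single-dword Type-0 header and the
packing of coordinates as (y << 16) | x and (w << 16) | h are assumptions
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder register offsets: illustrative values only, not taken
 * from the hardware documentation. */
enum {
    HYP_REG_FILL_COLOR = 0x1000,
    HYP_REG_DST_Y_X    = 0x1004,
    HYP_REG_DST_W_H    = 0x1008
};

/* Assumed Type-0 header for a single data dword: the register offset
 * as a dword index, with a count field of zero. */
static uint32_t hyp_packet0(uint32_t reg)
{
    return reg >> 2;
}

/* Write the packets for one solid fill into out[], which must have room
 * for at least 6 dwords.  Returns the number of dwords written. */
static size_t build_solid_fill(uint32_t *out, uint32_t color,
                               uint16_t x, uint16_t y,
                               uint16_t w, uint16_t h)
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = hyp_packet0(HYP_REG_FILL_COLOR);
    out[n++] = color;
    out[n++] = hyp_packet0(HYP_REG_DST_Y_X);
    out[n++] = ((uint32_t)y << 16) | x;      /* assumed packing */
    out[n++] = hyp_packet0(HYP_REG_DST_W_H);
    out[n++] = ((uint32_t)w << 16) | h;      /* assumed packing */
    return n;
}
```

A screen to screen copy would look much the same, with a source
coordinate register added before the destination ones.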

Once this work is done, we can go one step further and automatically
enable the "UseCCEFor2D" option when the DRI is enabled.  If the user
wants to restore the previous behavior, they can set the "UseCCEFor2D"
option to false.

Kevin