From: Kevin E M. <ma...@va...> - 2000-11-03 15:32:07
On Fri, Nov 03, 2000 at 01:24:32AM +1100, Gareth Hughes wrote:
> Recently, I moved all the Concurrent Command Engine (CCE) handling out
> of the 2D driver and into the kernel module on the ati-4-1-1 branch.

Good. I've done the same thing with the Radeon for the CP.

> I noticed that the interaction between the MMIO 2D acceleration and the
> new kernel module interface for starting and stopping the CCE caused
> noticeable lag in apps that were running at full hardware utilization ...
> Examples of apps that exhibit this behaviour are Mesa/demos/geartrain
> and the Q3 main menu.
>
> I spent the first part of tonight reworking the CCE stop operation,
> which was causing the problem. This basically involves flushing the
> ring, waiting for engine idle, stopping the CCE and then resetting the
> engine. The thing is, you don't necessarily have to be pedantic about
> waiting for the engine to idle. The downside of this is that you get
> flashing of triangles as the CCE is stopped as it is still rendering.

Exactly.

> I've fixed things enough that geartrain is much nicer (basically no
> lag), but the Q3 menu is still jerky. I've had it so that it will drop
> triangles left right and center but be nice and smooth - this is
> obviously not particularly desirable. Doing this will also cause the
> engine to lock up every now and then. Running with "UseCCEFor2D", which
> disables 2D acceleration, is fine. If anything, my work has cleaned
> things up enough that it's even smoother now.
>
> Kevin, what are your thoughts on this? I'd be happy with leaving a
> slightly jerky Q3 menu rather than allow the engine to lock up

Of course, slightly jerky is better than lock ups. :-)

I wish we had time to implement full CCE/CP based 2D accel code for the
Rage 128 and Radeon; however, there is a reasonable intermediate
solution... Since we already have 2D code that directly programs the
registers via MMIO, all that we need to do is to program those same
registers via the CCE/CP.
We can use the CCE/CP register programming packets (Type-0 and
Type-1) to set the registers and then use the privileged SubmitPackets
routine to submit them to the ring buffer when a Sync is required or the
buffer is full. It will not be the most efficient implementation, but I
believe it is a reasonable trade-off until we have the time and funding
for a full CCE/CP implementation.

We will obviously have to run xtest and our other test programs at all
screen depths to make sure that the 2D accel routines are correct. To
minimize the amount of work/testing and gain the largest improvement in
performance, I suggest that we only accelerate the two most common 2D
operations: solid fills and screen to screen copies. This will also
allow us to use these accel functions for the DRIInitBuffers and
DRIMoveBuffers routines. I have already done this for the Radeon as
part of my 3D/DRI work.

Once this work is done, we can go one step further and automatically
enable the "UseCCEfor2D" option when the DRI is enabled. And, if the
user wants to restore the previous behavior, then they can set the
"UseCCEfor2D" option to be false.

Kevin
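
For illustration only, the batching scheme described above (register
writes queued as Type-0 packets and handed to SubmitPackets on a Sync or
when the buffer fills) might look roughly like the sketch below. Nothing
here is taken from the actual driver: the register offsets, the buffer
size, the helper names and the submit_packets() stub are all
hypothetical, and the Type-0 header layout (count minus one in bits
29:16, register dword index in the low bits) is an assumption that
should be checked against the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deliberately tiny so the "buffer full" path is easy to exercise. */
#define PKT_BUF_DWORDS 8

/* Hypothetical register offsets, NOT real Rage 128 / Radeon values. */
#define REG_FILL_COLOR 0x1000u
#define REG_DST_XY     0x1004u
#define REG_DST_WH     0x1008u

static uint32_t pkt_buf[PKT_BUF_DWORDS];
static size_t   pkt_used;          /* dwords currently queued */
static unsigned submit_calls;      /* bookkeeping for the stub below */
static size_t   dwords_submitted;

/* Stand-in for the privileged SubmitPackets routine (assumed interface). */
static void submit_packets(const uint32_t *buf, size_t n)
{
    (void)buf;
    submit_calls++;
    dwords_submitted += n;
}

/* Hand everything queued so far to the ring, preserving order. */
static void flush_packets(void)
{
    if (pkt_used > 0) {
        submit_packets(pkt_buf, pkt_used);
        pkt_used = 0;
    }
}

/* Assumed Type-0 header: type bits 31:30 = 0, (count - 1) in bits 29:16,
 * register offset in dwords in the low bits. */
static uint32_t type0_header(uint32_t reg, uint32_t count)
{
    assert(count >= 1);
    return ((count - 1u) << 16) | (reg >> 2);
}

/* Replaces a direct MMIO write: queue header + value, flush when full. */
static void out_reg(uint32_t reg, uint32_t val)
{
    if (pkt_used + 2 > PKT_BUF_DWORDS)
        flush_packets();
    pkt_buf[pkt_used++] = type0_header(reg, 1);
    pkt_buf[pkt_used++] = val;
}

/* Solid fill expressed as the same register writes the MMIO path uses. */
static void solid_fill(uint32_t color, uint16_t x, uint16_t y,
                       uint16_t w, uint16_t h)
{
    out_reg(REG_FILL_COLOR, color);
    out_reg(REG_DST_XY, ((uint32_t)y << 16) | x);
    out_reg(REG_DST_WH, ((uint32_t)w << 16) | h);
}

/* Sync hook: anything still queued must reach the ring first. */
static void accel_sync(void)
{
    flush_packets();
}
```

A caller would issue solid_fill() as often as needed and call
accel_sync() before touching the framebuffer directly. One packet per
register keeps the sketch short; consecutive registers could share a
single Type-0 header with a larger count, which is part of why this
approach is simple rather than efficient.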