[Dri-devel] New realfastpath in mga-0-0-3-branch...

OK, for anyone who's interested:

I've committed (to the DRI tree under the mga-0-0-3-branch tag) a new, even
faster path for MGA setupdma (indexed vertex buffers).  I see a 10-15% speedup
at the q3dm1 spawn point at 640x480 (more benchmarks later).

The path is pretty simple, and looks something like this:

	- do obj->clip transform
	- cliptest and project
	- walk the clipmask array 
		- for each unclipped vertex, emit that vertex to a waiting dma buffer

	- walk the element list and identify triangles to render
		- if triangle is unclipped, emit 3 indices to dma
		- if triangle is clipped:
			- build 3 clipspace vertices
			- perform clipping
			- project and emit any newly created vertices to dma
			- emit indices.
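
As a rough illustration of the cliptest and element-walk steps above, here is a
minimal C sketch.  It is not the committed driver code: the clip bit values and
the names clip_code and emit_triangles are made up for the example, and the
trivial-reject branch is standard practice rather than something described in
the list above.  Triangles that need real clipping are only counted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clip-plane bits; the real driver's values may differ. */
enum {
    CLIP_RIGHT = 0x01, CLIP_LEFT   = 0x02,
    CLIP_TOP   = 0x04, CLIP_BOTTOM = 0x08,
    CLIP_FAR   = 0x10, CLIP_NEAR   = 0x20
};

/* Cliptest one clip-space vertex (x, y, z, w) against -w <= x,y,z <= w. */
static uint8_t clip_code(const float v[4])
{
    uint8_t m = 0;
    if (v[0] >  v[3]) m |= CLIP_RIGHT;
    if (v[0] < -v[3]) m |= CLIP_LEFT;
    if (v[1] >  v[3]) m |= CLIP_TOP;
    if (v[1] < -v[3]) m |= CLIP_BOTTOM;
    if (v[2] >  v[3]) m |= CLIP_FAR;
    if (v[2] < -v[3]) m |= CLIP_NEAR;
    return m;
}

/* Walk the element list three entries at a time.  Unclipped triangles have
 * their three elements copied to out_idx; triangles that straddle a clip
 * plane are counted in *n_need_clip for a separate clipping step.
 * Returns the number of indices written. */
static size_t emit_triangles(const uint8_t *clipmask,
                             const uint32_t *elts, size_t n_elts,
                             uint32_t *out_idx, size_t *n_need_clip)
{
    size_t n_out = 0;

    assert(clipmask && elts && out_idx && n_need_clip);
    *n_need_clip = 0;

    for (size_t i = 0; i + 2 < n_elts; i += 3) {
        uint32_t e0 = elts[i], e1 = elts[i + 1], e2 = elts[i + 2];
        unsigned any = clipmask[e0] | clipmask[e1] | clipmask[e2];
        unsigned all = clipmask[e0] & clipmask[e1] & clipmask[e2];

        if (any == 0) {
            /* Fully inside: emit the three indices unchanged. */
            out_idx[n_out++] = e0;
            out_idx[n_out++] = e1;
            out_idx[n_out++] = e2;
        } else if (all != 0) {
            /* All three vertices outside the same plane: nothing to draw
             * (assumption: trivial reject, not stated in the post). */
        } else {
            /* Partially visible: needs the clip/project/emit steps. */
            (*n_need_clip)++;
        }
    }
    return n_out;
}
```

The OR of the three clipmasks decides whether a triangle can take the cheap
branch, which is why the per-vertex cliptest results are kept around.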

Buffers are organized so that indices are emitted from low addresses up (the
way you would expect them to), and vertices are emitted from the other end of
the buffer, growing downwards.
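
The two-ended layout can be pictured with a small C sketch.  This is only an
illustration of the idea: the dual_buf type and its functions are hypothetical
names, sizes are counted in dwords for simplicity, and nothing here touches
real dma or agp memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-ended buffer: indices grow up from the start,
 * vertices grow down from the end, and the free space is in between. */
typedef struct {
    uint32_t *base;     /* first dword of the buffer            */
    size_t    size;     /* total size in dwords                 */
    size_t    idx_top;  /* next free dword for an index         */
    size_t    vtx_bot;  /* lowest dword already used by a vertex */
} dual_buf;

static void dual_buf_init(dual_buf *b, uint32_t *mem, size_t dwords)
{
    assert(b != NULL && mem != NULL);
    b->base = mem;
    b->size = dwords;
    b->idx_top = 0;
    b->vtx_bot = dwords;
}

/* Dwords still available between the two growing regions. */
static size_t dual_buf_free(const dual_buf *b)
{
    return b->vtx_bot - b->idx_top;
}

/* Append one index at the low end.  Returns 1 on success, 0 if full. */
static int dual_buf_push_index(dual_buf *b, uint32_t value)
{
    if (dual_buf_free(b) < 1)
        return 0;
    b->base[b->idx_top++] = value;
    return 1;
}

/* Reserve n dwords for a vertex at the high end.
 * Returns a pointer to the reserved space, or NULL if it does not fit. */
static uint32_t *dual_buf_alloc_vertex(dual_buf *b, size_t n)
{
    if (dual_buf_free(b) < n)
        return NULL;
    b->vtx_bot -= n;
    return b->base + b->vtx_bot;
}
```

Because both regions share one pool of free space, neither the index count nor
the vertex count has to be known up front; the buffer is full only when the
two ends meet.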

There are a couple of restrictions in the current implementation, which may be
difficult to remove:

	- indices are emitted to hardware as the physical address of the referenced
vertex in agp space.  In order to calculate these, I emit unclipped vertices
to a contiguous piece of agp space, allowing a simple relationship between
element and physical address.  In effect this requires that all the unclipped
vertices fit in a single dma buffer.  Quake3 calls this path on arrays of
up to 1024 vertices.  Vertices are 10 dwords but must be 4-dword aligned, so
each effectively takes 12 dwords (48 bytes).  That is 48k for the vertices
alone, so dma buffers must be 64k or so in size for this path to be useful
for q3.

	- there is no way to perform the vertex manipulations necessary to draw
accelerated lines or points, or two-sided, flat-shaded, or other exotic
triangles with this path.

	- mundane assembly issues add a requirement that the ModelProject matrix be
"general" -- typically requiring a perspective transform.

These are fairly small limitations, but do justify the continued existence of
the fastpath mga hardware.
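
To make the buffer-size restriction concrete, here is the arithmetic as a small
C sketch.  The function names are hypothetical, and the linear mapping from
element number to address (base plus element times stride) is an assumption
about what the "simple relationship" could look like, not a description of the
committed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a vertex size up to the required alignment, both in dwords. */
static size_t vertex_stride_dwords(size_t vertex_dwords, size_t align_dwords)
{
    assert(align_dwords > 0);
    return (vertex_dwords + align_dwords - 1) / align_dwords * align_dwords;
}

/* Bytes needed to hold n_verts aligned vertices contiguously. */
static size_t vertex_block_bytes(size_t n_verts, size_t vertex_dwords,
                                 size_t align_dwords)
{
    return n_verts * vertex_stride_dwords(vertex_dwords, align_dwords)
                   * sizeof(uint32_t);
}

/* Assumed mapping: vertices sit back to back starting at base_addr, so the
 * address of element elt is a multiply and an add. */
static uint32_t element_to_address(uint32_t base_addr, uint32_t elt,
                                   size_t stride_dwords)
{
    return base_addr + elt * (uint32_t)(stride_dwords * sizeof(uint32_t));
}
```

With 10-dword vertices and 4-dword alignment the stride is 12 dwords, and 1024
such vertices take 49152 bytes (48k), which leaves the rest of a 64k buffer
for indices and any vertices created by clipping.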

I've been able to reuse the assembly support from the "main" mesa path (the
slowpath?) in this code.  Writing some new assembly might give a small
additional speedup.

Keith