Thread: [Mesa3d-dev] New tdfx driver, locking, small device driver changes

[Mesa3d-dev] New tdfx driver, locking, small device driver changes

From: Keith W. <ke...@va...> - 2000-10-09 19:54:11

OK, so in examining the newest tdfx driver, I got to wondering about the calls
to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
the slowdown after Brian added cliprect handling to those calls, I wondered
what would happen if I moved locking out of the triangle function.

The answer was surprising:

                        old             lock in         cliprect        lock outside
                        trunk           trifunc         in trifunc      trifunc

        gears           448             560             550             650     fps
        isosurf         56              60              60              85      fps
        trispd-50       520k            572k            567k            921k    tris/sec
	
on a celeron 400 with a v3-3000.  We are getting close to a 50% overall
speedup on this branch (and better for certain apps)...

So...  What's the catch?

Basically, to lock outside the trifuncs, I need somewhere to lock.  The
obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
trouble with this is the span fallbacks: we lock in these on a per-spanline
basis.  We can remove locking from the span callbacks, and be fine on triangle
rendering.  However, the span fallbacks are also called from DrawPixels, etc. 

DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
should the locking occur there?

To my mind, the obvious thing to do is:

	- Add RenderStart/RenderFinish calls around all possible calls to the
	  span/pixel functions
	- Do locking in RenderStart/RenderFinish in the tdfx driver
	- Remove locking from triangle and spanline functions in the tdfx driver

One potential problem with this is that in fallback cases we will hold the
hardware lock for the time it takes to render an entire vertex buffer of
triangles, one spanline at a time.  I propose to get around this by 'flashing'
the lock in the spanline and pixel functions, e.g.:

	UNLOCK_HARDWARE(fxMesa);
	LOCK_HARDWARE(fxMesa);
	...

This opens a (tiny) window for the X server or other clients to grab the lock.
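
As a rough illustration of the call pattern proposed above (not the actual tdfx code), here is a minimal sketch with a mock context and mock lock macros. MockContext, render_start, render_finish, write_span and render_fallback are hypothetical names, and the real LOCK_HARDWARE/UNLOCK_HARDWARE macros are not reproduced here. The lock is taken once per RenderStart/RenderFinish pair and flashed once per span:

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context; not the real tdfx struct. */
typedef struct {
    int locked;         /* 1 while this client holds the hardware lock */
    int lock_count;     /* number of times the lock has been taken */
    int spans_written;  /* spans rendered by the fallback path */
} MockContext;

/* Mock versions of the lock macros; they only track state for this sketch. */
#define LOCK_HARDWARE(ctx) \
    do { assert(!(ctx)->locked); (ctx)->locked = 1; (ctx)->lock_count++; } while (0)
#define UNLOCK_HARDWARE(ctx) \
    do { assert((ctx)->locked); (ctx)->locked = 0; } while (0)

/* Locking moves into the RenderStart/RenderFinish callbacks. */
static void render_start(MockContext *ctx)  { LOCK_HARDWARE(ctx); }
static void render_finish(MockContext *ctx) { UNLOCK_HARDWARE(ctx); }

/* Span fallback: expects the lock to be held on entry and "flashes" it. */
static void write_span(MockContext *ctx)
{
    UNLOCK_HARDWARE(ctx);   /* tiny window for the X server or other clients */
    LOCK_HARDWARE(ctx);
    ctx->spans_written++;   /* stand-in for the actual framebuffer writes */
}

/* Render nspans spans inside one RenderStart/RenderFinish pair.
 * Returns the total number of lock acquisitions so far. */
static int render_fallback(MockContext *ctx, int nspans)
{
    int i;
    render_start(ctx);
    for (i = 0; i < nspans; i++)
        write_span(ctx);
    render_finish(ctx);
    return ctx->lock_count;
}
```

With three spans the lock is acquired four times in total: once in render_start and once per flash. On the hardware triangle path there would be no flash, so the count would stay at one per vertex buffer.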

Keith

Re: [Mesa3d-dev] New tdfx driver, locking, small device driver changes

From: Brian P. <br...@va...> - 2000-10-09 20:43:38

Keith Whitwell wrote:
> 
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
> 
> The answer was surprising:
> 
>                         old             lock in         cliprect        lock outside
>                         trunk           trifunc         in trifunc      trifunc
> 
>         gears           448             560             550             650     fps
>         isosurf         56              60              60              85      fps
>         trispd-50       520k            572k            567k            921k    tris/sec
> 
> on a celeron 400 with a v3-3000.  We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
> 
> So...  What's the catch?
> 
> Basically, to lock outside the trifuncs, I need somewhere to lock.  The
> obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis.  We can remove locking from the span callbacks, and be fine on triangle
> rendering.  However, the span fallbacks are also called from DrawPixels, etc.
> 
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
> 
> To my mind, the obvious thing to do is:
> 
>         - Add RenderStart/RenderFinish calls around all possible calls to the
>           span/pixel functions
>         - Do locking in RenderStart/RenderFinish in the tdfx driver
>         - Remove locking from triangle and spanline functions in the tdfx driver

That's what I would do.

I can add the RenderStart/Finish calls to Mesa (if you haven't already).
glClear also uses the span functions, BTW.


> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time.  I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, eg:
> 
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
> 
> To allow a (tiny) window for the X server or other clients to grab the lock.

Good idea.


One more thing to consider: moving the locking to a higher level may make
debugging harder.  When the driver has the lock, the whole display is
locked so you'd have to debug from a different X display.  It would be nice
if we could choose between the two locking levels at compile time.  That
might be a bit ugly but could make life easier when debugging the driver.
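
One way the compile-time choice could look, purely as a sketch: LOCK_PER_PRIMITIVE is a hypothetical build flag, and the RENDER_*/PRIM_* wrappers and MockContext are made-up names rather than anything in the driver. Exactly one of the two levels expands to real lock calls:

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context. */
typedef struct {
    int locked;      /* 1 while the hardware lock is held */
    int lock_count;  /* number of acquisitions */
} MockContext;

/* Mock lock macros; they only track state for this sketch. */
#define LOCK_HARDWARE(ctx) \
    do { assert(!(ctx)->locked); (ctx)->locked = 1; (ctx)->lock_count++; } while (0)
#define UNLOCK_HARDWARE(ctx) \
    do { assert((ctx)->locked); (ctx)->locked = 0; } while (0)

/* LOCK_PER_PRIMITIVE is a hypothetical debug-build switch: when defined,
 * locking happens around each primitive instead of around the whole batch. */
#ifdef LOCK_PER_PRIMITIVE
#define RENDER_LOCK(ctx)   ((void)0)
#define RENDER_UNLOCK(ctx) ((void)0)
#define PRIM_LOCK(ctx)     LOCK_HARDWARE(ctx)
#define PRIM_UNLOCK(ctx)   UNLOCK_HARDWARE(ctx)
#else
#define RENDER_LOCK(ctx)   LOCK_HARDWARE(ctx)
#define RENDER_UNLOCK(ctx) UNLOCK_HARDWARE(ctx)
#define PRIM_LOCK(ctx)     ((void)0)
#define PRIM_UNLOCK(ctx)   ((void)0)
#endif

/* Draw n triangles; returns how many times the lock was taken. */
static int render_triangles(MockContext *ctx, int n)
{
    int i;
    RENDER_LOCK(ctx);
    for (i = 0; i < n; i++) {
        PRIM_LOCK(ctx);
        assert(ctx->locked);   /* drawing always happens under the lock */
        PRIM_UNLOCK(ctx);
    }
    RENDER_UNLOCK(ctx);
    return ctx->lock_count;
}
```

In the default build, 100 triangles cost one lock acquisition; building with -DLOCK_PER_PRIMITIVE would cost 100 but release the display between triangles.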

-Brian

Re: [Mesa3d-dev] New tdfx driver, locking, small device driver changes

From: Keith W. <ke...@va...> - 2000-10-09 21:05:04

Brian Paul wrote:
>
> One more thing to consider: moving the locking to a higher level may make
> debugging harder.  When the driver has the lock, the whole display is
> locked so you'd have to debug from a different X display.  It would be nice
> if we could choose between the two locking levels at compile time.  That
> might be a bit ugly but could make life easier when debugging the driver.

This is feasible at compile-time, though I think that debugging remotely is
the better approach in any case.

Keith

Re: [Mesa3d-dev] New tdfx driver, locking, small device driver changes

From: Gareth H. <ga...@va...> - 2000-10-09 22:34:48

Keith Whitwell wrote:
> 
> Brian Paul wrote:
> >
> > One more thing to consider: moving the locking to a higher level may make
> > debugging harder.  When the driver has the lock, the whole display is
> > locked so you'd have to debug from a different X display.  It would be nice
> > if we could choose between the two locking levels at compile time.  That
> > might be a bit ugly but could make life easier when debugging the driver.
> 
> This is feasible at compile-time, though I think that debugging remotely is
> the better approach in any case.

I agree - I can say from personal experience that debugging on the same
machine you're running the DRI on is typically very difficult.
I think the code will be significantly cleaner if we don't allow this
option at all.

-- Gareth

Re: [Mesa3d-dev] New tdfx driver, locking, small device driver changes

From: Keith W. <ke...@va...> - 2000-10-09 21:05:57

Brian Paul wrote:
> 
> Keith Whitwell wrote:
> >
> > OK, so in examining the newest tdfx driver, I got to wondering about the calls
> > to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
> > the slowdown after Brian added cliprect handling to those calls, I wondered
> > what would happen if I moved locking out of the triangle function.
> >
> > The answer was surprising:
> >
> >                         old             lock in         cliprect        lock outside
> >                         trunk           trifunc         in trifunc      trifunc
> >
> >         gears           448             560             550             650     fps
> >         isosurf         56              60              60              85      fps
> >         trispd-50       520k            572k            567k            921k    tris/sec
> >
> > on a celeron 400 with a v3-3000.  We are getting close to a 50% overall
> > speedup on this branch (and better for certain apps)...
> >
> > So...  What's the catch?
> >
> > Basically, to lock outside the trifuncs, I need somewhere to lock.  The
> > obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
> > trouble with this is the span fallbacks: we lock in these on a per-spanline
> > basis.  We can remove locking from the span callbacks, and be fine on triangle
> > rendering.  However, the span fallbacks are also called from DrawPixels, etc.
> >
> > DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> > should the locking occur there?
> >
> > To my mind, the obvious thing to do is:
> >
> >         - Add RenderStart/RenderFinish calls around all possible calls to the
> >           span/pixel functions
> >         - Do locking in RenderStart/RenderFinish in the tdfx driver
> >         - Remove locking from triangle and spanline functions in the tdfx driver
> 
> That's what I would do.
> 
> I can add the RenderStart/Finish calls to Mesa (if you haven't already).
> glClear also uses the span functions, BTW.

That would be helpful - I'm looking at other consequences of this (relating to
transition between single and multiple cliprects) at the moment.  

Keith

[Mesa3d-dev] Re: [Dri-devel] New tdfx driver, locking, small device driver changes

From: Gareth H. <ga...@va...> - 2000-10-09 23:07:32

Keith Whitwell wrote:
> 
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.

Just before I left, I had begun implementing exactly the same
optimization.  Thanks for taking care of this!

> So...  What's the catch?
> 
> Basically, to lock outside the trifuncs, I need somewhere to lock.  The
> obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis.  We can remove locking from the span callbacks, and be fine on triangle
> rendering.  However, the span fallbacks are also called from DrawPixels, etc.
> 
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
> 
> To my mind, the obvious thing to do is:
> 
>         - Add RenderStart/RenderFinish calls around all possible calls to the
>           span/pixel functions
>         - Do locking in RenderStart/RenderFinish in the tdfx driver
>         - Remove locking from triangle and spanline functions in the tdfx driver

I think this is the nicest way to handle driver-side immediate mode
("direct") rendering.  If the tdfx driver buffered vertices and then
submitted them as required (like in the other MGA-style drivers) this
wouldn't be a problem.

Aside: I've been looking at taking advantage of the COMMAND_TRANSPORT
extension in Glide, which will basically allow us to bypass the Glide
triangle functions and write directly to the FIFO.  This would involve
buffering of vertices and submitting them in a bunch.  Only this
submission would require the hardware lock, just like in the other
drivers.  If only I had a damned machine...
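
To make the buffering idea concrete, here is a minimal sketch under stated assumptions: Batch, BATCH_SIZE, emit_vertex and flush_batch are hypothetical names, the capacity is arbitrary, and nothing here uses the Glide API or the real FIFO. Vertices accumulate without the lock, and only the flush would need it:

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 64   /* arbitrary capacity, chosen for illustration */

typedef struct { float x, y, z; } Vertex;

typedef struct {
    Vertex buf[BATCH_SIZE];
    size_t count;       /* vertices currently buffered */
    size_t submitted;   /* vertices handed to the (mock) hardware */
    int lock_count;     /* times the hardware lock would have been taken */
} Batch;

static void batch_init(Batch *b)
{
    b->count = 0;
    b->submitted = 0;
    b->lock_count = 0;
}

/* Submit everything buffered so far; the only step that needs the lock. */
static void flush_batch(Batch *b)
{
    if (b->count == 0)
        return;
    b->lock_count++;            /* LOCK_HARDWARE would go here */
    b->submitted += b->count;   /* stand-in for writing to the FIFO */
    b->count = 0;               /* UNLOCK_HARDWARE would go here */
}

/* Buffer one vertex without locking; flush first if the buffer is full. */
static void emit_vertex(Batch *b, Vertex v)
{
    if (b->count == BATCH_SIZE)
        flush_batch(b);
    b->buf[b->count++] = v;
}

/* Emit n dummy vertices, flush the remainder, return the total submitted. */
static size_t emit_many(Batch *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        Vertex v = { (float)i, 0.0f, 0.0f };
        emit_vertex(b, v);
    }
    flush_batch(b);
    return b->submitted;
}
```

With a capacity of 64, emitting 150 vertices takes the lock three times (64 + 64 + 22) instead of once per triangle.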

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time.  I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, eg:
> 
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
> 
> To allow a (tiny) window for the X server or other clients to grab the lock.

This looks like a nice way to go.

-- Gareth

[Mesa3d-dev] Re: [Dri-devel] New tdfx driver, locking, small device driver changes

From: Nathan H. <na...@ma...> - 2000-10-10 01:48:03

On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
> 
> 			old 		lock in		cliprect 	lock outside
> 			trunk		trifunc		in trifunc	trifunc
> 	
> 	gears		448		560		550		650	fps
> 	isosurf		56		60		60		85	fps
> 	trispd-50	520k		572k		567k		921k	tris/sec
> 	
> on a celeron 400 with a v3-3000.  We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
> 
> So...  What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock.  The
> obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis.  We can remove locking from the span callbacks, and be fine on triangle
> rendering.  However, the span fallbacks are also called from DrawPixels, etc. 
> 
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
> 
> To my mind, the obvious thing to do is:
> 
> 	- Add RenderStart/RenderFinish calls around all possible calls to the
> 	  span/pixel functions
> 	- Do locking in RenderStart/RenderFinish in the tdfx driver
> 	- Remove locking from triangle and spanline functions in the tdfx driver

That would be great. It'd make all the drivers simpler because they can
rely on the cliprects not "changing from under them" while rendering.

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time.  I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, eg:
> 
> 	UNLOCK_HARDWARE(fxMesa);
> 	LOCK_HARDWARE(fxMesa);
> 	...
> 
> To allow a (tiny) window for the X server or other clients to grab the lock.  

Hrm.

We can't do that. Imagine the case where the span functions are writing
some form of large slow blit (say a DrawPixels into a large region with
many cliprects). If the X server gets any time, the user could move
a window over the region and the clip rects could change.

The result is that the span functions work on outdated clip rects and
you get nasty screen corruption.

This was what we had before and this is exactly why BEGIN/END_CLIP_LOOP
were added. We were getting screen corruptions in the span functions.

Now I agree the clip loops need to go elsewhere. They are a performance
disaster. In the pathological case (5-6 cliprects) you will be rendering
the same triangle 5-6 times, degrading performance to 15%. But the real
problem is that clip loops make the normal case (1 clip rect) slower.

I think your proposal is good, but we can't do this "unlock/lock" thing
or the problems all come back (though in a smaller time window).
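
The clip-loop cost described above can be sketched roughly as follows. This is not the real BEGIN/END_CLIP_LOOP code; ClipRect, MockDrawable and both functions are hypothetical stand-ins that only count draw calls. Every triangle is issued once per cliprect:

```c
#include <assert.h>

typedef struct { int x1, y1, x2, y2; } ClipRect;

/* Hypothetical drawable: a cliprect list plus a draw-call counter. */
typedef struct {
    const ClipRect *rects;
    int num_rects;
    int draw_calls;
} MockDrawable;

/* Stand-in for drawing one triangle clipped to one rectangle. */
static void draw_triangle_clipped(MockDrawable *d, const ClipRect *r)
{
    (void)r;            /* a real driver would program the clip window here */
    d->draw_calls++;
}

/* Clip loop: each triangle is submitted once for every cliprect.
 * Returns the total number of draw calls issued so far. */
static int draw_triangles_clip_loop(MockDrawable *d, int ntris)
{
    int t, i;
    for (t = 0; t < ntris; t++)
        for (i = 0; i < d->num_rects; i++)
            draw_triangle_clipped(d, &d->rects[i]);
    return d->draw_calls;
}
```

With six cliprects, 100 triangles turn into 600 draw calls; with a single cliprect the count is 100, but the loop overhead is still paid on every triangle.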

[Mesa3d-dev] Re: [Dri-devel] New tdfx driver, locking, small device driver changes

From: Nathan H. <na...@ma...> - 2000-10-10 02:08:52

On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
> 
> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time.  I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, eg:
> 
> 	UNLOCK_HARDWARE(fxMesa);
> 	LOCK_HARDWARE(fxMesa);
> 	...
> 
> To allow a (tiny) window for the X server or other clients to grab the lock.  

Instead of putting the UNLOCK/LOCK pairs inside the span functions, and
having LOCK/UNLOCK around the slow fallbacks, how about having the slow
fallback default to no lock, but with LOCK/UNLOCK around each call to the
span functions?

The cliprects would be reworked (if necessary) around each span, inside
the LOCK/UNLOCK pair and just before drawing the span.
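
A minimal sketch of that per-span scheme, assuming the client can detect a cliprect change by comparing a stamp. The stamp fields, MockDrawable and all function names are hypothetical and used only for illustration. Each span takes the lock, revalidates the cliprects if they are stale, draws, and unlocks:

```c
#include <assert.h>

/* Hypothetical drawable state shared between the X server and a client. */
typedef struct {
    unsigned server_stamp;  /* bumped by the (simulated) server on a change */
    unsigned client_stamp;  /* stamp of the cliprects the client last read */
    int locked;
    int refreshes;          /* times the client had to re-read cliprects */
    int spans_drawn;
} MockDrawable;

/* Take the lock, then rework the cliprects if they changed meanwhile. */
static void lock_and_validate(MockDrawable *d)
{
    assert(!d->locked);
    d->locked = 1;
    if (d->client_stamp != d->server_stamp) {
        d->client_stamp = d->server_stamp;  /* stand-in for re-reading them */
        d->refreshes++;
    }
}

static void unlock_hw(MockDrawable *d)
{
    assert(d->locked);
    d->locked = 0;
}

/* One span: LOCK, validate, draw, UNLOCK. */
static void draw_span_locked(MockDrawable *d)
{
    lock_and_validate(d);
    assert(d->client_stamp == d->server_stamp);  /* never draw when stale */
    d->spans_drawn++;
    unlock_hw(d);
}

/* Slow fallback with no outer lock. A window move is simulated just
 * before span number move_at. Returns the number of refreshes. */
static int slow_fallback(MockDrawable *d, int nspans, int move_at)
{
    int i;
    for (i = 0; i < nspans; i++) {
        if (i == move_at)
            d->server_stamp++;   /* server ran while we were unlocked */
        draw_span_locked(d);
    }
    return d->refreshes;
}
```

The point of the design is that the cliprects are only ever read while the lock is held, so a window move between spans costs one refresh rather than a corrupted span.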

The other option is to extend the X server for two forms of lock: one for
changing the cliprects and one for everything else. Then you can lock the
cliprects around the whole fallback and unlock everything else with the
"flashing" concept.
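
A sketch of how the two-lock variant might behave, with everything mocked: no such pair of locks exists in the X server as described here, and MockLocks and the functions below are hypothetical. The cliprect lock is held across the whole fallback while the general lock is flashed per span, so the simulated server gets time but cannot change the cliprects:

```c
#include <assert.h>

typedef struct {
    int clip_locked;       /* hypothetical lock guarding the cliprect list */
    int hw_locked;         /* lock guarding everything else */
    int hw_lock_count;     /* acquisitions of the general lock */
    int cliprect_changes;  /* cliprect updates made by the simulated server */
} MockLocks;

/* Simulated X server: may change cliprects only if that lock is free.
 * Returns 1 if a change was made, 0 if it was refused. */
static int server_try_change_cliprects(MockLocks *l)
{
    if (l->clip_locked)
        return 0;
    l->cliprect_changes++;
    return 1;
}

static void lock_hw(MockLocks *l)
{
    assert(!l->hw_locked);
    l->hw_locked = 1;
    l->hw_lock_count++;
}

static void unlock_hw(MockLocks *l)
{
    assert(l->hw_locked);
    l->hw_locked = 0;
}

/* Fallback: hold the cliprect lock throughout, flash the general lock
 * once per span. Returns how many cliprect changes slipped through. */
static int fallback_two_locks(MockLocks *l, int nspans)
{
    int i, changed = 0;
    l->clip_locked = 1;
    lock_hw(l);
    for (i = 0; i < nspans; i++) {
        unlock_hw(l);                               /* flash: server may run */
        changed += server_try_change_cliprects(l);  /* but is refused */
        lock_hw(l);
        /* span would be drawn here against unchanged cliprects */
    }
    unlock_hw(l);
    l->clip_locked = 0;
    return changed;
}
```

In this model no cliprect change gets through during the fallback, and the server succeeds as soon as the fallback has released the cliprect lock.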