From: Brian P. <br...@va...> - 2000-10-09 20:43:38
Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions.  In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>              old      lock in    cliprect      lock outside
>              trunk    trifunc    in trifunc    trifunc
>
> gears        448      560        550           650     fps
> isosurf      56       60         60            85      fps
> trispd-50    520k     572k       567k          921k    tris/sec
>
> on a celeron 400 with a v3-3000.  We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
>
> So...  What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock.  The
> obvious place is in the RenderStart/RenderFinish driver callbacks.  The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis.  We can remove locking from the span callbacks, and be fine on triangle
> rendering.  However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
>   - Add RenderStart/RenderFinish calls around all possible calls to the
>     span/pixel functions
>   - Do locking in RenderStart/RenderFinish in the tdfx driver
>   - Remove locking from triangle and spanline functions in the tdfx driver

That's what I would do.  I can add the RenderStart/Finish calls to
Mesa (if you haven't already).  glClear also uses the span functions, BTW.

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time.  I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, e.g.:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

Good idea.

One more thing to consider: moving the locking to a higher level may
make debugging harder.  When the driver has the lock, the whole display
is locked, so you'd have to debug from a different X display.

It would be nice if we could choose between the two locking levels at
compile time.  That might be a bit ugly but could make life easier when
debugging the driver.

-Brian
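
For illustration only, here is a rough sketch of how the two locking levels discussed above might be selected at compile time, with the lock 'flashed' in the span path. It is not the tdfx driver code. MockContext, FX_LOCK_PER_PRIMITIVE, the BATCH_/PRIM_/FLASH_ macros and the function names are all hypothetical, and LOCK_HARDWARE/UNLOCK_HARDWARE are replaced by mock versions that only count acquisitions so the sketch can be compiled anywhere.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context (fxMesa in the thread). */
typedef struct {
    int locked;      /* 1 while the mock hardware lock is held */
    int lock_count;  /* how many times the lock has been taken */
} MockContext;

/* Mock versions of the driver macros: they only track state and count
 * acquisitions; the real ones take the DRI hardware lock. */
#define LOCK_HARDWARE(ctx) \
    do { assert(!(ctx)->locked); (ctx)->locked = 1; (ctx)->lock_count++; } while (0)
#define UNLOCK_HARDWARE(ctx) \
    do { assert((ctx)->locked); (ctx)->locked = 0; } while (0)

/* Hypothetical compile-time switch. Define FX_LOCK_PER_PRIMITIVE to get
 * the old per-triangle/per-span locking (easier to debug); leave it
 * undefined to lock once per batch in render_start/render_finish. */
#ifdef FX_LOCK_PER_PRIMITIVE
#define BATCH_LOCK(ctx)    ((void)(ctx))
#define BATCH_UNLOCK(ctx)  ((void)(ctx))
#define PRIM_LOCK(ctx)     LOCK_HARDWARE(ctx)
#define PRIM_UNLOCK(ctx)   UNLOCK_HARDWARE(ctx)
#define FLASH_LOCK(ctx)    ((void)(ctx))
#else
#define BATCH_LOCK(ctx)    LOCK_HARDWARE(ctx)
#define BATCH_UNLOCK(ctx)  UNLOCK_HARDWARE(ctx)
#define PRIM_LOCK(ctx)     ((void)(ctx))
#define PRIM_UNLOCK(ctx)   ((void)(ctx))
/* Briefly drop the lock so the X server or another client can grab it. */
#define FLASH_LOCK(ctx) \
    do { UNLOCK_HARDWARE(ctx); LOCK_HARDWARE(ctx); } while (0)
#endif

/* Stand-ins for the RenderStart/RenderFinish driver callbacks. */
static void render_start(MockContext *ctx)  { BATCH_LOCK(ctx); }
static void render_finish(MockContext *ctx) { BATCH_UNLOCK(ctx); }

/* Hardware triangle path: no locking of its own in batch mode. */
static void draw_triangle(MockContext *ctx)
{
    PRIM_LOCK(ctx);
    /* triangle setup would be issued to the hardware here */
    PRIM_UNLOCK(ctx);
}

/* Software span fallback: flashes the lock once per spanline in batch mode. */
static void write_span(MockContext *ctx)
{
    PRIM_LOCK(ctx);
    FLASH_LOCK(ctx);
    /* one spanline would be written to the framebuffer here */
    PRIM_UNLOCK(ctx);
}

/* Render n triangles in one batch; returns the total lock acquisitions. */
static int render_triangles(MockContext *ctx, int n)
{
    int i;
    render_start(ctx);
    for (i = 0; i < n; i++)
        draw_triangle(ctx);
    render_finish(ctx);
    return ctx->lock_count;
}

/* Render n fallback spans in one batch; returns the total lock acquisitions. */
static int render_spans(MockContext *ctx, int n)
{
    int i;
    render_start(ctx);
    for (i = 0; i < n; i++)
        write_span(ctx);
    render_finish(ctx);
    return ctx->lock_count;
}
```

In the default build of this sketch, render_triangles() on a zeroed MockContext with n = 100 takes the lock once, while render_spans() with n = 10 takes it 11 times (once in render_start plus one flash per span). Building with -DFX_LOCK_PER_PRIMITIVE goes back to one acquisition per triangle or span, which keeps the display unlocked between primitives while debugging.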