From: Nathan H. <na...@ma...> - 2000-10-10 01:48:03
On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>              old      lock in    cliprect      lock outside
>              trunk    trifunc    in trifunc    trifunc
>
> gears        448      560        550           650     fps
> isosurf      56       60         60            85      fps
> trispd-50    520k     572k       567k          921k    tris/sec
>
> on a celeron 400 with a v3-3000. We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
>
> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The
> obvious place is in the RenderStart/RenderFinish driver callbacks. The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis. We can remove locking from the span callbacks, and be fine on triangle
> rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
> - Add RenderStart/RenderFinish calls around all possible calls to the
>   span/pixel functions
> - Do locking in RenderStart/RenderFinish in the tdfx driver
> - Remove locking from triangle and spanline functions in the tdfx driver

That would be great. It'd make all the drivers simpler, because they could
rely on the cliprects not "changing from under them" while rendering.

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time. I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, eg:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

Hrm. We can't do that. Imagine the case where the span functions are
writing some form of large, slow blit (say a DrawPixels into a large region
with many cliprects). If the X server gets any time, the user could move a
window over the region and the clip rects could change. The result is that
the span functions are working on outdated clip rects and you get nasty
screen corruption.

This is what we had before, and it is exactly why BEGIN/END_CLIP_LOOP were
added: we were getting screen corruption in the span functions.

Now, I agree the clip loops need to go elsewhere. They are a performance
disaster. In the pathological case (5-6 cliprects) you will be rendering the
same triangle 5-6 times, degrading performance to 15%. But the real problem
is that clip loops make the normal case (1 clip rect) slower.

I think your proposal is good, but we can't do this "unlock/lock" thing or
the problems all come back (though in a smaller time window).
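
To make the two points above concrete, here is a minimal sketch in C. It is not the tdfx driver code: MockContext, x_server_tick, and the helper functions are all hypothetical, and LOCK_HARDWARE/UNLOCK_HARDWARE are redefined as toy macros over a mock context. It only shows (a) how many lock acquisitions per-triangle locking costs compared with locking once in RenderStart/RenderFinish, and (b) why "flashing" the lock lets the cliprects go stale, assuming the worst case where the X server wins every window.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context; NOT the real tdfx struct. */
typedef struct {
    int lock_held;       /* 1 while we own the (mock) hardware lock       */
    int lock_count;      /* number of times the lock has been taken       */
    int cliprect_stamp;  /* bumped each time the "X server" moves windows */
} MockContext;

/* Toy versions of the macros named in the thread (assumed behaviour). */
#define LOCK_HARDWARE(ctx) \
    do { assert(!(ctx)->lock_held); (ctx)->lock_held = 1; (ctx)->lock_count++; } while (0)
#define UNLOCK_HARDWARE(ctx) \
    do { assert((ctx)->lock_held); (ctx)->lock_held = 0; } while (0)

/* Proposed placement: lock once per batch in the Start/Finish callbacks. */
static void render_start(MockContext *ctx)  { LOCK_HARDWARE(ctx); }
static void render_finish(MockContext *ctx) { UNLOCK_HARDWARE(ctx); }

/* Worst-case X server: changes the cliprects whenever the lock is free. */
static void x_server_tick(MockContext *ctx)
{
    if (!ctx->lock_held)
        ctx->cliprect_stamp++;
}

/* Old scheme: every triangle function takes and drops the lock itself. */
int locks_per_triangle(int ntris)
{
    MockContext ctx = {0, 0, 0};
    for (int i = 0; i < ntris; i++) {
        LOCK_HARDWARE(&ctx);
        /* ... emit one triangle ... */
        UNLOCK_HARDWARE(&ctx);
    }
    return ctx.lock_count;
}

/* Proposed scheme: triangle functions assume the caller holds the lock. */
int locks_per_batch(int ntris)
{
    MockContext ctx = {0, 0, 0};
    render_start(&ctx);
    for (int i = 0; i < ntris; i++) {
        assert(ctx.lock_held);  /* precondition of the unlocked trifunc */
        /* ... emit one triangle ... */
    }
    render_finish(&ctx);
    return ctx.lock_count;
}

/* Counts spans drawn against cliprects that changed since RenderStart.
 * With flash != 0 the lock is dropped and retaken before each span. */
int stale_spans(int nspans, int flash)
{
    MockContext ctx = {0, 0, 0};
    int stale = 0;

    render_start(&ctx);
    int stamp_at_start = ctx.cliprect_stamp;  /* cliprects we planned with */
    for (int i = 0; i < nspans; i++) {
        if (flash) {
            UNLOCK_HARDWARE(&ctx);
            x_server_tick(&ctx);              /* server grabs the window */
            LOCK_HARDWARE(&ctx);
        }
        if (ctx.cliprect_stamp != stamp_at_start)
            stale++;                          /* drawing with old cliprects */
    }
    render_finish(&ctx);
    return stale;
}
```

In this toy model, locks_per_triangle(1000) takes the lock 1000 times while locks_per_batch(1000) takes it once. stale_spans(8, 0) reports no stale spans, and stale_spans(8, 1) reports all 8 as stale, which is the corruption case described above unless the cliprects are revalidated after every re-lock.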