From: Keith W. <ke...@va...> - 2000-10-09 19:54:11
OK, so in examining the newest tdfx driver, I got to wondering about the calls
to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
the slowdown after Brian added cliprect handling to those calls, I wondered
what would happen if I moved locking out of the triangle function.

The answer was surprising:

             old      lock in    cliprect     lock outside
             trunk    trifunc    in trifunc   trifunc

gears        448      560        550          650      fps
isosurf      56       60         60           85       fps
trispd-50    520k     572k       567k         921k     tris/sec

on a celeron 400 with a v3-3000. We are getting close to a 50% overall
speedup on this branch (and better for certain apps)...

So... What's the catch?

Basically, to lock outside the trifuncs, I need somewhere to lock. The
obvious place is in the RenderStart/RenderFinish driver callbacks. The only
trouble with this is the span fallbacks: we lock in these on a per-spanline
basis. We can remove locking from the span callbacks, and be fine on triangle
rendering. However, the span fallbacks are also called from DrawPixels, etc.

DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
should the locking occur there?

To my mind, the obvious thing to do is:

 - Add RenderStart/RenderFinish calls around all possible calls to the
   span/pixel functions
 - Do locking in RenderStart/RenderFinish in the tdfx driver
 - Remove locking from triangle and spanline functions in the tdfx driver

One potential problem with this is that in fallback cases we will hold the
hardware lock for the time it takes to render an entire vertex buffer of
triangles, one spanline at a time. I propose to get around this by 'flashing'
the lock in the spanline and pixel functions, e.g.:

        UNLOCK_HARDWARE(fxMesa);
        LOCK_HARDWARE(fxMesa);
        ...

To allow a (tiny) window for the X server or other clients to grab the lock.

Keith
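A minimal sketch of the locking scheme proposed above, assuming a single client and a lock modelled by a flag and a counter. HypotheticalFxMesa, render_start, render_finish, draw_triangle, write_span and render_buffer are illustrative names, not the tdfx driver's actual code, and the LOCK_HARDWARE/UNLOCK_HARDWARE macros here are stand-ins for the real DRI ones. It shows the lock being taken once per vertex buffer instead of once per triangle, with the span fallback briefly releasing and re-taking it.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context and the DRI hardware lock.
 * The real LOCK_HARDWARE/UNLOCK_HARDWARE talk to the kernel; these only
 * record state so the locking pattern can be checked. */
typedef struct {
    int locked;        /* nonzero while this client holds the lock */
    int acquisitions;  /* how many times the lock has been taken */
} HypotheticalFxMesa;

#define LOCK_HARDWARE(m) \
    do { assert(!(m)->locked); (m)->locked = 1; (m)->acquisitions++; } while (0)
#define UNLOCK_HARDWARE(m) \
    do { assert((m)->locked); (m)->locked = 0; } while (0)

/* Take the lock once per vertex buffer... */
static void render_start(HypotheticalFxMesa *m)  { LOCK_HARDWARE(m); }
static void render_finish(HypotheticalFxMesa *m) { UNLOCK_HARDWARE(m); }

/* ...so the triangle function does no locking of its own. */
static void draw_triangle(HypotheticalFxMesa *m)
{
    assert(m->locked);
    /* emit the triangle to the hardware here */
}

/* Span fallback: entered with the lock held, but "flashes" it so the
 * X server or another client gets a (tiny) window to grab it. */
static void write_span(HypotheticalFxMesa *m)
{
    assert(m->locked);
    UNLOCK_HARDWARE(m);
    LOCK_HARDWARE(m);
    /* write one spanline here */
}

/* One vertex buffer: ntris hardware triangles plus nspans fallback spans. */
static void render_buffer(HypotheticalFxMesa *m, int ntris, int nspans)
{
    int i;
    render_start(m);
    for (i = 0; i < ntris; i++)
        draw_triangle(m);
    for (i = 0; i < nspans; i++)
        write_span(m);
    render_finish(m);
}
```

Rendering 1000 triangles in one buffer takes the lock once rather than 1000 times; a buffer that falls back to 10 spans takes it 11 times, which is the cost of flashing.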
From: Brian P. <br...@va...> - 2000-10-09 20:43:38
Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>              old      lock in    cliprect     lock outside
>              trunk    trifunc    in trifunc   trifunc
>
> gears        448      560        550          650      fps
> isosurf      56       60         60           85       fps
> trispd-50    520k     572k       567k         921k     tris/sec
>
> on a celeron 400 with a v3-3000. We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
>
> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The
> obvious place is in the RenderStart/RenderFinish driver callbacks. The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis. We can remove locking from the span callbacks, and be fine on triangle
> rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
>  - Add RenderStart/RenderFinish calls around all possible calls to the
>    span/pixel functions
>  - Do locking in RenderStart/RenderFinish in the tdfx driver
>  - Remove locking from triangle and spanline functions in the tdfx driver

That's what I would do.

I can add the RenderStart/Finish calls to Mesa (if you haven't already).
glClear also uses the span functions, BTW.

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time. I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, e.g.:
>
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

Good idea.

One more thing to consider: moving the locking to a higher level may make
debugging harder. When the driver has the lock, the whole display is
locked so you'd have to debug from a different X display. It would be nice
if we could choose between the two locking levels at compile time. That
might be a bit ugly but could make life easier when debugging the driver.

-Brian
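One way the compile-time choice Brian describes could look, as a hypothetical sketch: a single preprocessor symbol (HYPOTHETICAL_LOCK_PER_TRIANGLE, invented here) selects whether the lock is taken around each triangle, which keeps the display usable between triangles while debugging, or once per buffer, which is the fast path. The lock macros are the same flag-and-counter stand-ins as before, not the real DRI ones.

```c
#include <assert.h>

typedef struct {
    int locked;        /* nonzero while the lock is held */
    int acquisitions;  /* number of times the lock was taken */
} HypotheticalFxMesa;

/* Stand-ins for the real DRI lock macros (hypothetical). */
#define LOCK_HARDWARE(m) \
    do { assert(!(m)->locked); (m)->locked = 1; (m)->acquisitions++; } while (0)
#define UNLOCK_HARDWARE(m) \
    do { assert((m)->locked); (m)->locked = 0; } while (0)

/* Build with -DHYPOTHETICAL_LOCK_PER_TRIANGLE to take the lock around each
 * triangle (debug build); by default it is held for the whole buffer. */
#ifdef HYPOTHETICAL_LOCK_PER_TRIANGLE
#define BUFFER_LOCK(m)      ((void)(m))
#define BUFFER_UNLOCK(m)    ((void)(m))
#define TRIANGLE_LOCK(m)    LOCK_HARDWARE(m)
#define TRIANGLE_UNLOCK(m)  UNLOCK_HARDWARE(m)
#else
#define BUFFER_LOCK(m)      LOCK_HARDWARE(m)
#define BUFFER_UNLOCK(m)    UNLOCK_HARDWARE(m)
#define TRIANGLE_LOCK(m)    ((void)(m))
#define TRIANGLE_UNLOCK(m)  ((void)(m))
#endif

static void draw_triangle(HypotheticalFxMesa *m)
{
    TRIANGLE_LOCK(m);
    assert(m->locked);   /* either locking level leaves the lock held here */
    /* emit the triangle here */
    TRIANGLE_UNLOCK(m);
}

static void render_buffer(HypotheticalFxMesa *m, int ntris)
{
    int i;
    BUFFER_LOCK(m);
    for (i = 0; i < ntris; i++)
        draw_triangle(m);
    BUFFER_UNLOCK(m);
}
```

In the default build a 50-triangle buffer takes the lock once; with the debug symbol defined it would take it 50 times. The cost of the option is the extra pair of macros at every locking site, which is the ugliness Brian mentions.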
From: Keith W. <ke...@va...> - 2000-10-09 21:05:04
Brian Paul wrote:
>
>
> One more thing to consider: moving the locking to a higher level may make
> debugging harder. When the driver has the lock, the whole display is
> locked so you'd have to debug from a different X display. It would be nice
> if we could choose between the two locking levels at compile time. That
> might be a bit ugly but could make life easier when debugging the driver.

This is feasible at compile-time, though I think that debugging remotely is
the better approach in any case.

Keith
From: Gareth H. <ga...@va...> - 2000-10-09 22:34:48
Keith Whitwell wrote:
>
> Brian Paul wrote:
> >
> > One more thing to consider: moving the locking to a higher level may make
> > debugging harder. When the driver has the lock, the whole display is
> > locked so you'd have to debug from a different X display. It would be nice
> > if we could choose between the two locking levels at compile time. That
> > might be a bit ugly but could make life easier when debugging the driver.
>
> This is feasible at compile-time, though I think that debugging remotely is
> the better approach in any case.

I agree; I can say from personal experience that debugging on the same
machine you're running the DRI on is typically very difficult. I think the
code will be significantly cleaner if we don't allow this option at all.

--
Gareth
From: Keith W. <ke...@va...> - 2000-10-09 21:05:57
Brian Paul wrote:
>
> Keith Whitwell wrote:
> >
> > OK, so in examining the newest tdfx driver, I got to wondering about the calls
> > to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
> > the slowdown after Brian added cliprect handling to those calls, I wondered
> > what would happen if I moved locking out of the triangle function.
> >
> > The answer was surprising:
> >
> >              old      lock in    cliprect     lock outside
> >              trunk    trifunc    in trifunc   trifunc
> >
> > gears        448      560        550          650      fps
> > isosurf      56       60         60           85       fps
> > trispd-50    520k     572k       567k         921k     tris/sec
> >
> > on a celeron 400 with a v3-3000. We are getting close to a 50% overall
> > speedup on this branch (and better for certain apps)...
> >
> > So... What's the catch?
> >
> > Basically, to lock outside the trifuncs, I need somewhere to lock. The
> > obvious place is in the RenderStart/RenderFinish driver callbacks. The only
> > trouble with this is the span fallbacks: we lock in these on a per-spanline
> > basis. We can remove locking from the span callbacks, and be fine on triangle
> > rendering. However, the span fallbacks are also called from DrawPixels, etc.
> >
> > DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> > should the locking occur there?
> >
> > To my mind, the obvious thing to do is:
> >
> >  - Add RenderStart/RenderFinish calls around all possible calls to the
> >    span/pixel functions
> >  - Do locking in RenderStart/RenderFinish in the tdfx driver
> >  - Remove locking from triangle and spanline functions in the tdfx driver
>
> That's what I would do.
>
> I can add the RenderStart/Finish calls to Mesa (if you haven't already).
> glClear also uses the span functions, BTW.

That would be helpful; I'm looking at other consequences of this (relating
to the transition between single and multiple cliprects) at the moment.

Keith
From: Gareth H. <ga...@va...> - 2000-10-09 23:07:32
Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.

Just before I left, I had begun implementing exactly the same optimization.
Thanks for taking care of this!

> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The
> obvious place is in the RenderStart/RenderFinish driver callbacks. The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis. We can remove locking from the span callbacks, and be fine on triangle
> rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
>  - Add RenderStart/RenderFinish calls around all possible calls to the
>    span/pixel functions
>  - Do locking in RenderStart/RenderFinish in the tdfx driver
>  - Remove locking from triangle and spanline functions in the tdfx driver

I think this is the nicest way to handle driver-side immediate mode ("direct")
rendering. If the tdfx driver buffered vertices and then submitted them as
required (as the other MGA-style drivers do), this wouldn't be a problem.

Aside: I've been looking at taking advantage of the COMMAND_TRANSPORT
extension in Glide, which will basically allow us to bypass the Glide triangle
functions and write directly to the FIFO. This would involve buffering
vertices and submitting them in a batch. Only this submission would require
the hardware lock, just like in the other drivers. If only I had a damned
machine...
> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time. I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, e.g.:
>
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

This looks like a nice way to go.

--
Gareth
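A hypothetical sketch of the buffered approach Gareth describes: triangles are queued in a client-side array without touching the lock, and only the flush, which would write to the FIFO, takes it. The buffer size, the struct and function names, and the empty flush body are assumptions for illustration; no actual Glide or COMMAND_TRANSPORT call is shown.

```c
#include <assert.h>
#include <string.h>

#define HYPOTHETICAL_VB_SIZE 64   /* vertices per batch; arbitrary */

typedef struct { float x, y, z; } Vertex;

typedef struct {
    int locked;         /* nonzero while the lock is held */
    int acquisitions;   /* number of times the lock was taken */
    int submitted;      /* vertices handed to the "hardware" so far */
    int count;          /* vertices waiting in buf */
    Vertex buf[HYPOTHETICAL_VB_SIZE];
} HypotheticalCtx;

static void lock_hw(HypotheticalCtx *c)   { assert(!c->locked); c->locked = 1; c->acquisitions++; }
static void unlock_hw(HypotheticalCtx *c) { assert(c->locked); c->locked = 0; }

/* Only submission touches the hardware, so only it takes the lock. */
static void flush_vertices(HypotheticalCtx *c)
{
    if (c->count == 0)
        return;
    lock_hw(c);
    /* copy c->buf[0..count) to the FIFO here */
    c->submitted += c->count;
    c->count = 0;
    unlock_hw(c);
}

/* Queue a triangle without locking; flush first if it would not fit. */
static void emit_triangle(HypotheticalCtx *c, const Vertex v[3])
{
    if (c->count + 3 > HYPOTHETICAL_VB_SIZE)
        flush_vertices(c);
    memcpy(&c->buf[c->count], v, 3 * sizeof(Vertex));
    c->count += 3;
}
```

With a 64-vertex buffer (21 triangles per batch), 100 triangles reach the hardware under 5 lock acquisitions, and the lock is never held while the application is still generating geometry.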
From: Nathan H. <na...@ma...> - 2000-10-10 01:48:03
On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
> OK, so in examining the newest tdfx driver, I got to wondering about the calls
> to LOCK_HARDWARE inside the triangle functions. In particular, after noticing
> the slowdown after Brian added cliprect handling to those calls, I wondered
> what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>              old      lock in    cliprect     lock outside
>              trunk    trifunc    in trifunc   trifunc
>
> gears        448      560        550          650      fps
> isosurf      56       60         60           85       fps
> trispd-50    520k     572k       567k         921k     tris/sec
>
> on a celeron 400 with a v3-3000. We are getting close to a 50% overall
> speedup on this branch (and better for certain apps)...
>
> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The
> obvious place is in the RenderStart/RenderFinish driver callbacks. The only
> trouble with this is the span fallbacks: we lock in these on a per-spanline
> basis. We can remove locking from the span callbacks, and be fine on triangle
> rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where
> should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
>  - Add RenderStart/RenderFinish calls around all possible calls to the
>    span/pixel functions
>  - Do locking in RenderStart/RenderFinish in the tdfx driver
>  - Remove locking from triangle and spanline functions in the tdfx driver

That would be great. It would make all the drivers simpler, because they
could rely on the cliprects not "changing from under them" while rendering.

> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time. I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, e.g.:
>
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

Hrm. We can't do that. Imagine the case where the span functions are doing
some form of large, slow blit (say, a DrawPixels into a large region with
many cliprects). If the X server gets any time, the user could move a window
over the region and the cliprects could change. The span functions would
then be working on outdated cliprects, and you get nasty screen corruption.

This is what we had before, and it is exactly why BEGIN/END_CLIP_LOOP were
added: we were getting screen corruption in the span functions.

Now, I agree the clip loops need to go elsewhere. They are a performance
disaster. In the pathological case (5-6 cliprects) you will be rendering the
same triangle 5-6 times, degrading performance to 15%. But the real problem
is that clip loops make the normal case (1 cliprect) slower.

I think your proposal is good, but we can't do this "unlock/lock" thing or
the problems all come back (though in a smaller time window).
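To make the cost argument above concrete, a hypothetical sketch of a per-triangle clip loop in the spirit of BEGIN/END_CLIP_LOOP (a reconstruction for illustration, not the driver's actual macros): each triangle is submitted once per cliprect, with the hardware clip window reprogrammed each time. The struct and function names are invented here.

```c
#include <assert.h>

typedef struct { int x1, y1, x2, y2; } ClipRect;

typedef struct {
    int nrects;              /* cliprects for the drawable */
    const ClipRect *rects;
    int clip_changes;        /* times the hardware clip window was reprogrammed */
    int hw_triangles;        /* triangles actually sent to the hardware */
} HypotheticalCtx;

static void set_hw_clip(HypotheticalCtx *c, const ClipRect *r)
{
    (void)r;                 /* program the scissor/clip window from *r here */
    c->clip_changes++;
}

static void hw_triangle(HypotheticalCtx *c) { c->hw_triangles++; }

/* Per-triangle clip loop: the same triangle is submitted once per cliprect,
 * with the hardware clip window set to that rectangle each time. */
static void clipped_triangle(HypotheticalCtx *c)
{
    int i;
    for (i = 0; i < c->nrects; i++) {
        set_hw_clip(c, &c->rects[i]);
        hw_triangle(c);
    }
}
```

With 6 cliprects, 10 triangles become 60 hardware triangles, so throughput falls to roughly 1/6 (about 17%), in line with the figure above. With a single cliprect the clip window is still reprogrammed once per triangle, which is one plausible source of the slowdown on the normal case.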
From: Nathan H. <na...@ma...> - 2000-10-10 02:08:52
On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
>
> One potential problem with this is that in fallback cases we will hold the
> hardware lock for the time it takes to render an entire vertex buffer of
> triangles, one spanline at a time. I propose to get around this by 'flashing'
> the lock in the spanline and pixel functions, e.g.:
>
>         UNLOCK_HARDWARE(fxMesa);
>         LOCK_HARDWARE(fxMesa);
>         ...
>
> To allow a (tiny) window for the X server or other clients to grab the lock.

Instead of putting the UNLOCK/LOCK pairs inside the span functions and having
LOCK/UNLOCK around the slow fallbacks, how about having the slow fallbacks run
unlocked by default, with LOCK/UNLOCK around each call to the span functions?
The cliprects would be reworked (if necessary) around each span, inside the
LOCK/UNLOCK pair and just before drawing the span.

The other option is to extend the X server with two forms of lock: one for
changing the cliprects and one for everything else. Then you can hold the
cliprect lock around the whole fallback and use the "flashing" concept on the
lock for everything else.
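A hypothetical sketch of the first alternative above: the fallback loop itself runs unlocked, and each span write takes the lock, refreshes the cached cliprects if the window system has changed them, clips and draws the span, then releases the lock. The "stamp" counter used to detect a change, the single one-dimensional cliprect, and all names are assumptions made for illustration.

```c
#include <assert.h>

typedef struct {
    int locked;                 /* nonzero while this client holds the lock */
    /* State owned by the window system; only read while locked. */
    unsigned server_stamp;      /* bumped whenever the cliprects change */
    int server_x1, server_x2;   /* current cliprect (one 1-D rect, for brevity) */
    /* This client's cached copy. */
    unsigned cached_stamp;
    int clip_x1, clip_x2;
    int revalidations;          /* times the cache was refreshed */
    int pixels_written;
} HypotheticalCtx;

static void lock_hw(HypotheticalCtx *c)   { assert(!c->locked); c->locked = 1; }
static void unlock_hw(HypotheticalCtx *c) { assert(c->locked); c->locked = 0; }

/* Must be called with the lock held: refresh the cached cliprect only
 * if the window system has changed it since we last looked. */
static void validate_cliprects(HypotheticalCtx *c)
{
    assert(c->locked);
    if (c->cached_stamp != c->server_stamp) {
        c->clip_x1 = c->server_x1;
        c->clip_x2 = c->server_x2;
        c->cached_stamp = c->server_stamp;
        c->revalidations++;
    }
}

/* One span of n pixels starting at x: lock, revalidate, clip, draw, unlock.
 * The fallback loop that calls this runs without the lock. */
static void locked_write_span(HypotheticalCtx *c, int x, int n)
{
    int x1, x2;
    lock_hw(c);
    validate_cliprects(c);
    x1 = x > c->clip_x1 ? x : c->clip_x1;
    x2 = x + n < c->clip_x2 ? x + n : c->clip_x2;
    if (x2 > x1)
        c->pixels_written += x2 - x1;   /* write pixels [x1, x2) here */
    unlock_hw(c);
}
```

If a window moves between two spans, the next span sees the new stamp, reloads the cliprect and clips against it, so no span is ever drawn with stale cliprects. The second option, a separate cliprect lock, needs X server support and is not sketched here.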