From: Roland S. <rsc...@hi...> - 2005-08-25 15:15:12
I've looked at that crossbar patch for r200 again and improved it a bit. It will now

- disable texture sampling of units if the result is not used
- reorder tex env instructions to be always in-order on the gpu (according to earlier tests, this can make a performance difference, http://marc.theaimsgroup.com/?l=dri-devel&m=112308244205670&w=2, though I've yet to find an app which doesn't enable the units in-order. The only thing in the real world I've found which doesn't was a marbleblastdemo, and it only doesn't because it fails the texture completeness test, not because it actually doesn't enable the unit...)
- try to optimize away env instructions. This is not a general optimizer, which would be very hard to do anyway and more or less impossible due to the requirement of OpenGL to clamp the results after each stage, but it will try to ditch a tex env if it is GL_REPLACE (for both rgb and alpha) by replacing the args in the next tex env.

Seems to work: for instance, ut2003 sometimes uses tex envs with 4 units enabled, and the optimizer reduces this to 3 sampled textures and 2 env instructions. Impressive, isn't it? Unfortunately this makes absolutely no difference in performance... (ut2003 is horribly limited by vertex throughput with the current state of the driver, and anything which causes more cpu cycles to be used will probably make it slower, no matter how many gpu cycles this might save. Plus, I believe the tex envs which can be optimized are only used for small parts of the screen, powerups maybe.) It MIGHT make more of a performance difference with radeon 8500/9100, as those can sample more textures per pass (at least under some circumstances afaik), but have the same amount of arithmetic resources (afaik).

Does this look somewhat reasonable?
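To make the GL_REPLACE folding described above a bit more concrete, here is a rough sketch of the idea in C. It is not the patch's actual code: the types and function names (struct stage, fold_replace, used_textures) are made up for illustration, and it assumes a heavily simplified model in which operands and scale factors are ignored and every source names an explicit texture unit, which is what crossbar allows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sources: crossbar lets a stage name any unit. */
enum src  { SRC_PREVIOUS, SRC_PRIMARY, SRC_TEX0, SRC_TEX1, SRC_TEX2, SRC_TEX3 };
enum mode { MODE_REPLACE, MODE_MODULATE, MODE_ADD };

struct stage {
    enum mode mode_rgb, mode_alpha;
    enum src  arg_rgb[2], arg_alpha[2];   /* REPLACE only reads arg[0] */
};

static int nargs(enum mode m) { return m == MODE_REPLACE ? 1 : 2; }

/* Drop every stage that is REPLACE for both rgb and alpha by rewriting
 * the PREVIOUS args of the following stage. Works in place and returns
 * the new stage count. The last stage is never dropped, since there is
 * no following stage to fold it into. */
static int fold_replace(struct stage *st, int n)
{
    int out = 0;
    assert(st != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        int is_replace = st[i].mode_rgb == MODE_REPLACE &&
                         st[i].mode_alpha == MODE_REPLACE;
        if (is_replace && i + 1 < n) {
            struct stage *next = &st[i + 1];
            for (int a = 0; a < nargs(next->mode_rgb); a++)
                if (next->arg_rgb[a] == SRC_PREVIOUS)
                    next->arg_rgb[a] = st[i].arg_rgb[0];
            for (int a = 0; a < nargs(next->mode_alpha); a++)
                if (next->arg_alpha[a] == SRC_PREVIOUS)
                    next->arg_alpha[a] = st[i].arg_alpha[0];
            continue;                     /* stage i is not emitted */
        }
        st[out++] = st[i];
    }
    return out;
}

/* Bitmask of texture units still referenced by the remaining stages;
 * units not in the mask would not need to be sampled. */
static unsigned used_textures(const struct stage *st, int n)
{
    unsigned mask = 0;
    for (int i = 0; i < n; i++) {
        for (int a = 0; a < nargs(st[i].mode_rgb); a++)
            if (st[i].arg_rgb[a] >= SRC_TEX0)
                mask |= 1u << (st[i].arg_rgb[a] - SRC_TEX0);
        for (int a = 0; a < nargs(st[i].mode_alpha); a++)
            if (st[i].arg_alpha[a] >= SRC_TEX0)
                mask |= 1u << (st[i].arg_alpha[a] - SRC_TEX0);
    }
    return mask;
}
```

As an invented example (not ut2003's real setup), take four enabled stages: REPLACE tex0, MODULATE previous*tex1, REPLACE previous, ADD previous+tex3. fold_replace leaves two stages (MODULATE tex0*tex1 and ADD previous+tex3), and used_textures reports units 0, 1 and 3, so unit 2 would not be sampled at all. Folding is safe with respect to clamping in this model because a plain REPLACE of a texture or the primary color produces a value that is already in range.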
The code is a bit ugly (especially the GL_REPLACE env optimize stuff). I don't like that the env args have to be parsed twice, and it does cost some more cpu cycles (roughly 2.5 times as much as previously in the driver's tex env functions according to some quick profiling, though that was still only 0.2 percent or so). But there doesn't seem to be a good way to clean it up (at least not without making it quite a bit slower).

Roland