Modern GPUs (and presumably many non-modern GPUs too!) work in 'quad pipelines': pixels are shaded in 2x2 blocks. This requires triangles to be sufficiently large to deliver good efficiency; if a triangle covers just 1 pixel, the GPU still shades the whole quad, so you'll only be 25% as efficient as with a single triangle covering the entire screen (okay, extreme case, but you get the point).
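To make that 25% figure concrete, here's a tiny back-of-the-envelope model (the function name and the large-triangle numbers are mine, purely illustrative):

```python
# Back-of-the-envelope model of 2x2 quad shading efficiency.
# A triangle touching `quads_touched` quads pays for quads_touched * 4
# fragment invocations, even when most of those lanes are helper lanes
# outside the triangle.
def quad_efficiency(covered_pixels, quads_touched):
    return covered_pixels / (quads_touched * 4)

# A 1-pixel triangle still occupies a full quad:
print(quad_efficiency(1, 1))         # 0.25
# A big triangle mostly fills the quads it touches
# (say 10000 pixels spread over ~2600 quads, edges partially covered):
print(quad_efficiency(10000, 2600))  # ~0.96
```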
Furthermore, triangle setup can become a severe bottleneck with small triangles; in fact, for a Z-only pass, the average triangle size needs to be ridiculously large to achieve very high efficiency on the latest GPUs, especially NVIDIA's. And when I say ridiculously, I mean it.
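To illustrate why (with completely made-up rates, not measured from any real GPU): suppose the rasterizer sets up one triangle every 2 clocks while the back end can fill 16 pixels per clock; any average triangle size below the break-even point leaves the fill units idle, waiting on setup:

```python
# Toy setup-rate model; both figures are hypothetical.
pixels_per_clock = 16     # assumed back-end fill rate
clocks_per_triangle = 2   # assumed triangle setup cost

# Below this average triangle size (in pixels), the pipeline is
# setup-bound rather than fill-bound:
break_even_size = pixels_per_clock * clocks_per_triangle
print(break_even_size)  # 32
```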
And there's yet another factor to consider: framebuffer compression efficiency and MSAA. Compression will obviously be worse with lots of small triangles that have different Z planes, so you're more likely to be bandwidth limited; in theory this wouldn't be a big deal, because you'd get killed by other stages first, but there's a catch: if you write to a shadowmap buffer, it stays compressed in memory when you read it back in your main pass. If you have a lot of small triangles, your main pass is more likely to be bandwidth limited, providing yet another nearly impossible-to-identify but very real source of slowdown.
Finally, consider what happens with MSAA. You're much more likely to have multiple unique triangles per pixel; therefore, your pixel shading efficiency might be less than 25% of what it would be otherwise. All these things add up big time, and they're also why LODs are important, and not just for the obvious reasons and traditional bottlenecks!
Now, in the future (aka: PC devs, forget about it, because you'll need to wait forever for it to be standard; console devs, wait for the PS4/Xbox720), triangle setup will be done in the shader core, so peak setup performance will be much higher. The pixel shading problem will remain, however; one way to make it substantially less problematic is deferred rendering and its associated exotic techniques. If we want to reduce average triangle size significantly in the future, that's very likely the way forward IMO.
P.S.: I've been lurking here for a long time but I just realized this is actually my first post (or at least as far as I can remember) - so hello everyone! :)
P.P.S.: I sent this with the wrong e-mail address so it ended up in the moderation queue, oops. Resending now with the one I registered with; or at least I hope that's it...
----- Original Message -----
From: Jeff Russell
To: Game Development Algorithms
Sent: Wednesday, May 28, 2008 8:30 PM
Subject: [Algorithms] why not geometry?

So, say hypothetically that a person wanted to render a tree trunk. The typical approach is to use solid triangles for the trunk and a few major branches and use alpha-tested or blended geometry for the smaller branches and twigs, the goal being of course to reduce the triangle count to get good performance. One then ends up rendering for example something like 1-2k triangles instead of the 50k or more that might be required to render the 'full' tree. Seems logical.

But my question is - is this still wise? And if so, why?


- A typical vertex shader often compiles to about as many instructions as a fragment shader nowadays (for us, in the 50-100 range). I would expect, then, that from a pure computation standpoint computing a single vertex would be roughly as fast as shading a single pixel on a modern GPU, since vertices and fragments now share ALUs in most cases.

- Geometry in this case saves fill rate. Even if the tradeoff of vertices for fragments isn't 1:1, drawing your tree branches with geometry will avoid the massive overdraw that large alpha-tested triangles can incur. Even proper sorting won't save all those fragments that get discarded by the alpha test.
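A rough sketch of that overdraw argument (every number here is invented for illustration, not measured): an alpha-tested card shades every fragment it covers, including the ones the alpha test then throws away, while real geometry only shades the pixels it actually fills.

```python
# Hypothetical fill-cost comparison for alpha-tested cards vs. geometry.
card_pixels = 128 * 128   # assumed screen area of one billboard card
card_coverage = 0.30      # assumed fraction of texels passing the alpha test
overdraw_layers = 4       # assumed overlapping cards along a view ray

# Alpha-tested cards: every covered fragment is shaded before the test.
fragments_alpha = card_pixels * overdraw_layers
# Geometry: roughly only the visible (opaque) pixels get shaded.
fragments_geometry = int(card_pixels * card_coverage)

print(fragments_alpha, fragments_geometry)  # 65536 4915
```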

- Geometry looks better! You get antialiasing, proper depth buffer interaction, parallax, etc. all of which are trickier to attain in full for impostor geometry.

- The advent of the 'geometry shader' makes replicating parts of your data stream more feasible. I could see a situation of local geometry instancing where a large branch has the same 'twig' present in several different positions/orientations/scales along its length. This means you allow maybe 100 tris in your vertex buffer for a given twig, and the geometry shader replicates it at draw time using a constant table to maybe 16 instances or so to fill out your tree. Then you'd get a very large number of triangles generated by a single draw call that only passes in a fraction of the data.
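The data savings in that twig example are easy to put numbers on (using the figures from the paragraph above; the vertex counts are a simplification that ignores index reuse):

```python
# Rough data-savings estimate for geometry-shader twig replication.
tris_per_twig = 100   # tris stored in the vertex buffer (from the example)
instances = 16        # replications via the GS constant table (from the example)
verts_per_tri = 3     # no vertex sharing assumed, for simplicity

baked_verts = tris_per_twig * instances * verts_per_tri  # pre-expanded mesh
streamed_verts = tris_per_twig * verts_per_tri           # GS expands the rest

print(baked_verts, streamed_verts)  # 4800 300 -> 16x less data passed in
```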

I suppose maybe the memory problems alone could be what hurt vertex processing so much, given there's so much data flying around. Plus I've heard that geometry shaders aren't even very fast yet.

Any thoughts? Has anyone gone down this road - using lots of triangles in place of flat textures?

Jeff Russell

