Thread: [Algorithms] why not geometry?
From: Jeff R. <je...@gm...> - 2008-05-28 18:31:16
So, say hypothetically that a person wanted to render a tree. The typical approach is to use solid triangles for the trunk and a few major branches, and alpha-tested or blended geometry for the smaller branches and twigs, the goal of course being to reduce the triangle count for good performance. One ends up rendering something like 1-2k triangles instead of the 50k or more that might be required to render the 'full' tree. Seems logical.

But my question is: is this still wise? And if so, why? Consider:

- A typical vertex shader often compiles to about as many instructions as a fragment shader nowadays (for us, in the 50-100 range). From a pure computation standpoint, then, I would expect a single vertex to cost roughly as much as a single pixel on a modern GPU, since vertices and fragments now share ALUs in most cases.

- Geometry in this case saves fill rate. Even if the tradeoff of vertices for fragments isn't 1:1, drawing your tree branches with geometry avoids the massive overdraw that large alpha-tested triangles can incur. Even proper sorting won't save all those fragments that get discarded by the alpha test.

- Geometry looks better! You get antialiasing, proper depth buffer interaction, parallax, etc., all of which are trickier to attain in full for impostor geometry.

- The advent of the 'geometry shader' makes replicating parts of your data stream more feasible. I could see a form of local geometry instancing where a large branch has the same 'twig' present at several different positions/orientations/scales along its length: you allow maybe 100 tris in your vertex buffer for a given twig, and the geometry shader replicates it at draw time, using a constant table, into maybe 16 instances or so to fill out your tree. You'd then get a very large number of triangles from a single draw call that passes in only a fraction of the data.

I suppose the memory traffic alone could be what hurts vertex processing so much, given there's so much data flying around. Plus I've heard that geometry shaders aren't even very fast yet.

Any thoughts? Has anyone gone down this road - using lots of triangles in place of flat textures?

Jeff Russell
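For concreteness, a minimal sketch of the data amplification Jeff describes, done with plain SM4 hardware instancing rather than a geometry shader (this is not the MS demo code; the function and buffer names are hypothetical, and the vertex shader is assumed to index a constant table of transforms by SV_InstanceID):

    #include <d3d10.h>

    struct TwigInstance { float world[16]; };  // per-instance transform

    static const UINT kTwigInstances = 16;     // twig copies per branch

    void DrawTwigsForBranch(ID3D10Device* dev,
                            ID3D10Buffer* twigCB,       // cbuffer of TwigInstance[kTwigInstances]
                            const TwigInstance* xforms, // filled out per branch
                            UINT twigIndexCount)        // ~300 indices for a ~100-tri twig
    {
        // Upload this branch's constant table (assumes D3D10_USAGE_DEFAULT).
        dev->UpdateSubresource(twigCB, 0, NULL, xforms, 0, 0);

        // One draw call emits all 16 copies of the twig mesh.
        dev->DrawIndexedInstanced(twigIndexCount, kTwigInstances, 0, 0, 0);
    }

Whether this beats the geometry shader path is exactly the open question: instancing replays the whole vertex stream per copy, while a GS amplifies on chip but (as noted above) wasn't very fast on first-generation SM4 hardware.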
From: Paul at H. <pa...@ru...> - 2008-05-28 20:02:33
At a recent XFest in Reading (UK), MS presented exactly this tree scenario as a way to use DX10 instancing. There were actually stacks of trees on screen too, allegedly from just 4 draw calls.

My problem with this is that it's a pipe dream, imo. Until some distant time in the future when everyone has an SM4.0 card and the *next* set of consoles is out, you can't put this stuff in commercial games; it's like developing two different titles at the same time. There are simply no convenient fallbacks for this stuff, which means you need two codepaths and two sets of art.

The final nail for me is that (apart from a very narrow space where it's the best target) PC games are rarely the lead version. A big-budget title is going to be for 360 and/or PS3, with sometimes a PC version thrown in, and powerful though they are, neither console can approximate this technique well. So what's the point?

Bloody impressive demos though. Hope I'm not starting a flamewar here.

Regards,
Paul Johnson
www.rubicondev.com
From: Jeff R. <je...@gm...> - 2008-05-28 21:13:43
Have a link, by any chance? Were they just discussing instancing of a single large mesh (1 full tree), or duplicating several small branches?

Jeff
From: Paul at H. <pa...@ru...> - 2008-05-28 21:40:17
No links - didn't even remember to pick up the conf materials at the time. :(

They built the whole thing from geometry. One instance of a trunk did all the branches, another instance of this lot made all the trees, etc.

Regards,
Paul Johnson
www.rubicondev.com
From: Eric H. <eri...@gm...> - 2008-06-05 00:41:01
I would guess this demo is pretty much the same as the "Instancing10" demo code that comes with the MS DirectX SDK sample browser: 3 calls for all the trees.
From: Rob D. <bli...@go...> - 2008-05-30 12:04:44
One more reason why it's not a good idea: when you use extremely thin triangles, as you would get with the smallest twigs, you inevitably reach a point where a triangle doesn't cover a full set of pixel centres, and so doesn't get rendered in some parts. You end up with gaps in your twigs. By comparison, the alpha-blended version will instead be reducing the alpha level (because it moves into lower mipmaps of the texture), so the twigs gradually fade out without bits vanishing inappropriately.

You are right, of course, that sorting is never quite right, and there are other issues, but you cannot get away from the fact that once triangles become very small or very thin, they no longer behave nicely.
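For what it's worth, the distance at which this starts to bite is cheap to estimate up front. A minimal sketch, assuming a symmetric perspective projection (all names hypothetical): once a twig's projected width falls below about a pixel, switch to the alpha-faded representation.

    #include <cmath>

    // Rough on-screen width, in pixels, of a twig of half-width
    // 'worldRadius' seen at distance 'dist'.
    float ProjectedWidthPixels(float worldRadius, float dist,
                               float fovY, float screenHeight)
    {
        // World units per pixel at this distance: 2*dist*tan(fovY/2) world
        // units span 'screenHeight' pixels vertically.
        float pixelsPerUnit = screenHeight / (2.0f * dist * std::tan(fovY * 0.5f));
        return 2.0f * worldRadius * pixelsPerUnit;
    }

    // Below ~1 pixel the triangle can straddle pixel centres and drop out,
    // so fall back to the alpha-blended impostor instead.
    bool UseGeometryTwig(float worldRadius, float dist,
                         float fovY, float screenHeight)
    {
        return ProjectedWidthPixels(worldRadius, dist, fovY, screenHeight) > 1.0f;
    }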
From: Eric H. <eri...@gm...> - 2008-06-04 13:35:19
I've been surveying the literature for how best to display large models when using rasterization. Frustum culling, LOD, occlusion culling - sure. What I'm most interested in is what sort of efficiency structure is best to use for large models. I personally favor a BSP tree, with each object's bounding box inserted at one location in the tree (unlike ray tracing, which has different needs, where objects are often pushed towards the leaves and duplicated). BSPs can be used for frustum culling, the cells can be used with LOD (too small a cell, toss it out or approximate it by drawing a box), and with the BSP providing front-to-back traversal, occlusion culling can also be folded in fairly effectively. Even without occlusion culling, front-to-back order means less pixel shader evaluation, etc.

My questions for the day:

1) Is there some other scheme I should be seriously considering? Spheres are quicker to test for frustum culling, but the spheres themselves seem pretty inefficient as bounds for arbitrary groups of geometry.

2) Bounding box hierarchies might actually be better than BSP trees, but it's trickier to form them well. Really, I like using BSP tree formation algorithms that allow the creation of an efficient hierarchy, but for rasterization I can afford to let the two child boxes of a parent overlap - a strict front-to-back ordering is not required; it's more that you want the child bounding boxes to be as tight as possible. Efficient bounding volume creation for the purposes of rasterization seems underexplored, but maybe I'm missing someone's work here.

3) Memory locality is a big deal nowadays. Do people have any experience with van Emde Boas schemes to localize chunks of the tree, or other methods to improve locality (e.g. flattening the tree)? These seem good on paper, but I'd like to know before I spend a few days implementing and find no gain whatever (or, hopefully, large gains). The problem with a BSP is that, if you want front-to-back traversal, the order of traversal changes depending on the view.

4) I favor spending time making an efficient BSP because most of my data is static, not animated, so the time I put into building the BSP is paid back. However, the build cost of a BSP is noticeable. Animated objects I'm currently just putting in a list without any efficiency structure. For animations involving a lot of objects, I've wondered whether something like loose octrees or some other quick-to-create structure would pay off in terms of time saved culling vs. time spent forming and maintaining it.

Anyway, I'm interested in anyone's practical experiences here. As usual, I suspect it's a case of "it all depends", but I'd love to know what people have found useful and useless.

The one cute little result I thought I'd share: for ray tracing, the surface area heuristic (SAH) gets used in forming efficient BSP trees. That is, the efficiency of a split is measured by taking the number of objects in a child box times the area of the child's box; the lower the summed value for the two children (and for objects split by the cutting plane), the more likely the split is gaining you efficiency. What's interesting for rasterization is that frustum culling is a process of planes intersecting boxes, unlike ray tracing, where it's rays vs. boxes. When shooting rays, you want to use the surface area of the box for the relative probability that a random ray hits the box. When testing planes, you want to use the mean width of the box, which is proportional to the sum of the lengths of the sides of the box (or the radius of a sphere). Using the mean width to decide on the efficiency of a box gives me more what I would expect for a BSP tree for rasterization: the leaf nodes tend to have 10 children or so, as further splitting is less effective than with traditional SAH. This matches my own experience forming such hierarchies with simple median-split schemes: the leaves are less effective if you try to chop them too fine.

All for now,

Eric
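For concreteness, a sketch of the two split metrics Eric contrasts (the Aabb struct and the margin-free "worthwhile" test are simplifications; constant factors are dropped since only relative costs matter):

    struct Aabb { float dx, dy, dz; };  // edge lengths

    // For rays (SAH): hit probability is proportional to surface area.
    float SurfaceAreaCost(const Aabb& b, int numObjects)
    {
        float area = 2.0f * (b.dx * b.dy + b.dy * b.dz + b.dz * b.dx);
        return area * numObjects;
    }

    // For planes (frustum culling): the probability a random plane cuts the
    // box is proportional to its mean width, i.e. the sum of edge lengths.
    float MeanWidthCost(const Aabb& b, int numObjects)
    {
        return (b.dx + b.dy + b.dz) * numObjects;
    }

    // A split pays off when the summed child costs fall below the parent's.
    bool SplitIsWorthwhile(const Aabb& parent, int nParent,
                           const Aabb& left,  int nLeft,
                           const Aabb& right, int nRight)
    {
        return MeanWidthCost(left, nLeft) + MeanWidthCost(right, nRight)
             < MeanWidthCost(parent, nParent);
    }

Because mean width grows linearly with box size while surface area grows quadratically, child boxes shrink more slowly under the mean-width metric, so splitting stops paying off sooner and leaves end up fatter - consistent with the ~10-children leaves Eric observes.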
From: Stefan S. <kef...@gm...> - 2008-06-04 13:58:54
If you're talking software rasterization, a BSP coupled with a PVS is probably the most efficient means of rendering static geometry, as proven by Quake, I guess. But that's not a great solution for the current state of the art in GPU rasterization; you'd probably be better off with a portal solution guided by occlusion queries.

As far as frustum culling is concerned, early-out is the way to go.

(What do you mean when you say 'large models' - a 1M-poly dragon, an indoor FPS-style level, or a Crysis-style world?)
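For reference, a sketch of the query-guided part (D3D10-style, since that's the context of the thread; names are hypothetical and error handling is omitted): render a cheap proxy - a portal quad or a cell's bounding box - with color and depth writes disabled, and draw the cell contents only if any samples passed.

    #include <d3d10.h>

    ID3D10Query* CreateOcclusionQuery(ID3D10Device* dev)
    {
        D3D10_QUERY_DESC desc = { D3D10_QUERY_OCCLUSION, 0 };
        ID3D10Query* q = NULL;
        dev->CreateQuery(&desc, &q);
        return q;
    }

    bool AnySamplesPassed(ID3D10Query* q, void (*drawProxy)())
    {
        q->Begin();
        drawProxy();   // bounding box / portal quad only
        q->End();

        UINT64 samples = 0;
        // Spinning like this stalls the pipeline; real engines consume the
        // result a frame late, or issue many queries before reading any back.
        while (q->GetData(&samples, sizeof(samples), 0) == S_FALSE) { /* wait */ }
        return samples > 0;
    }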
From: Eric H. <eri...@gm...> - 2008-06-04 14:30:35
I should have been clearer: yes, I'm using the GPU. By BSP trees I mean axis-aligned hierarchical trees, not the polygon-plane-aligned old-school Doom stuff. By large I mean lots of objects, say 10K on up, each object a mesh of say 20 to 20K polygons. Scene type is "unpredictable"; right now I'm just looking at general solutions for large scenes: could be indoor architectural, or just "complex stuff". I'm looking to see if there's a good "this scheme works for most data sets in some reasonable way" choice. Certainly things like terrain rendering have their own specialized solutions (CLOD, etc. - DICE's from SIGGRAPH 2007, on a constrained quadtree, seems very cool), I understand that.

Eric
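For reference, the standard plane-vs-AABB rejection test this kind of frustum culling boils down to - a sketch with center/half-extent boxes, plane normals assumed to point into the frustum:

    #include <cmath>

    struct Plane { float nx, ny, nz, d; };          // inside when n.p + d >= 0
    struct Box   { float cx, cy, cz, ex, ey, ez; }; // center and half-extents

    // True if the box lies entirely on the outside of the plane.
    bool BoxOutsidePlane(const Plane& p, const Box& b)
    {
        // Projected "radius" of the box onto the plane normal.
        float r = b.ex * std::fabs(p.nx) + b.ey * std::fabs(p.ny)
                + b.ez * std::fabs(p.nz);
        float dist = p.nx * b.cx + p.ny * b.cy + p.nz * b.cz + p.d;
        return dist < -r;
    }

    // Conservative: may keep a box that is outside but near a frustum corner.
    bool BoxOutsideFrustum(const Plane planes[6], const Box& b)
    {
        for (int i = 0; i < 6; ++i)
            if (BoxOutsidePlane(planes[i], b))
                return true;
        return false;
    }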
From: Jon W. <hp...@mi...> - 2008-06-04 15:10:10
Eric Haines wrote:
> I'm looking to see if there's a good "this scheme works for most data
> sets in some reasonable way" choice.

For static stuff, a kd-tree might work just fine - sounds like that's similar to the axis-aligned plane tree you're using. For moving stuff, if only a dozen things are moving, stuff them in a list/vector. If more things are (potentially) moving, then a loose octree has served us well. For indoor scenes where you want a PVS, cells with portals are usually the optimal way to go.

No bounding hierarchy will be able to cull out walls behind walls behind walls (or valleys behind hills behind valleys...), and even with early Z tests, if you end up pseudo-filling the entire screen a dozen times, that will hurt your frame rate.

All of those mechanisms can be improved, from a rasterization point of view, by adding dynamic occluders (ideally with occluder fusion). I know the IHVs say their cards are efficient enough that you shouldn't bother, but that's only true for the "GTX" versions - the 64-bit versions that end up in the laptops most of the buying public are buying are still very fill-rate sensitive.

Sincerely,

jw
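For concreteness, the property that makes loose octrees cheap for moving objects, sketched with the common looseness factor of 2 (each cell's loose bounds are twice its nominal size, so an object fits at the deepest level whose nominal cell size still covers its diameter, and re-insertion never has to split objects):

    // Pick the depth for an object of radius 'objectRadius' in a loose
    // octree spanning 'worldSize' (a sketch; names hypothetical).
    int LooseOctreeLevel(float worldSize, float objectRadius, int maxDepth)
    {
        int depth = 0;
        float cellSize = worldSize;
        // Descend while the child cell would still cover the object's diameter.
        while (depth < maxDepth && cellSize * 0.5f >= 2.0f * objectRadius) {
            cellSize *= 0.5f;
            ++depth;
        }
        return depth;
    }

The level lookup plus one point-to-cell index computation is all an update costs, which is why the structure suits the "more things are (potentially) moving" case.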
From: <gd...@er...> - 2008-06-04 15:45:10
Hi Eric,

We've been using various kinds of AABB trees for static and dynamic triangle meshes (deformable objects), broadphase collision detection, and view frustum culling.

Static AABB trees: memory-optimized layout that allows fast stackless traversal, in parallel on SPUs. It supports fast and cache-friendly local and global refit operations.

Dynamic AABB trees: we have also implemented a very fast dynamic AABB tree that rebalances itself, rather than refitting. For broadphase purposes we keep two of those dynamic trees, and an object can move from one tree to the other. We integrated this into Pierre Terdiman's CDTestBenchmark, and the dynamic AABB tree implementation outperforms OPCODE box pruning.

All of this is available in the latest open source Bullet 2.69 at http://bulletphysics.com

Is your implementation exclusive to the GPU, or do you consider a CPU implementation? Perhaps we can do some performance comparisons/benchmarks between BSP and other methods (AABB trees etc.)?

Thanks,
Erwin
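For readers unfamiliar with refit: a minimal sketch (not Bullet's actual code) of the bottom-up operation, where each internal node's box is recomputed as the union of its children after the leaf bounds have moved:

    #include <algorithm>

    struct Aabb {
        float min[3], max[3];
        void growToContain(const Aabb& o) {
            for (int i = 0; i < 3; ++i) {
                min[i] = std::min(min[i], o.min[i]);
                max[i] = std::max(max[i], o.max[i]);
            }
        }
    };

    struct Node {
        Aabb  bounds;
        Node* left  = nullptr;  // both null for a leaf
        Node* right = nullptr;
    };

    // Recompute internal bounds from already-updated leaf bounds.
    void Refit(Node* n)
    {
        if (!n->left) return;   // leaf: bounds are already current
        Refit(n->left);
        Refit(n->right);
        n->bounds = n->left->bounds;
        n->bounds.growToContain(n->right->bounds);
    }

The stackless SPU variant Erwin describes would presumably store the tree as a flat array walked with skip offsets instead of recursion, but the bounds math is the same.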
From: Pierre T. <pie...@gr...> - 2008-06-04 14:18:21
> If stuff is moving, do you sort of make a copy of the boxes' current
> locations and build off of that, or...?

I rebuild using a copy of the boxes' current locations, indeed. The other tree is constantly refit. Since the tree switch happens before the refit, the new tree is first refit with the new current positions before being used. So it's always a bit out of date, but never wrong.

I did that in PhysX; it was a lot faster than my previous loose octree for all collision queries. I don't know if this would work well if you put triangles in the structure - I only put game objects in it, so I had maybe 10,000 entities max, not millions.

> Also, how do you form your trees quickly and so they're efficient?
> I'm finding simple median splits stink.

Same build options as in Opcode 1.3, really.

- Pierre
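Sketched out (this is not the PhysX source, just the scheme as described above):

    #include <vector>

    struct Aabb { float min[3], max[3]; };
    struct Tree { /* dynamic AABB tree internals omitted */ };

    void Rebuild(Tree&, const std::vector<Aabb>& snapshot); // full (slow) build
    void Refit  (Tree&, const std::vector<Aabb>& current);  // cheap bounds update

    struct DoubleTree {
        Tree trees[2];
        int  active = 0;                        // the tree queries run against

        void Step(const std::vector<Aabb>& boxes)
        {
            Rebuild(trees[1 - active], boxes);  // in practice amortized over many steps
            active = 1 - active;                // switch first...
            Refit(trees[active], boxes);        // ...then refit with current positions
        }
    };

The refit-before-use step is what guarantees the freshly swapped-in tree is "a bit out of date, but never wrong".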
From: Aras P. <ar...@un...> - 2008-06-04 15:02:00
> I should have been clearer: yes, I'm using the GPU. By BSP trees I mean
> axis-aligned hierarchical trees, not the polygon-plane-aligned
> old-school Doom stuff.

So it's a kd-tree then?

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
From: Stefan S. <kef...@gm...> - 2008-06-04 15:20:36
Perhaps a good idea to look into BIH and BVH trees as well, as they build much more quickly than kd-trees.

Aras Pranckevicius wrote:
> So it's a kd-tree then?
From: Megan F. <sha...@gm...> - 2008-06-04 15:21:21
While this is true, do keep in mind that occluder fusion isn't necessarily going to win you anything you actually want. Consider the case where a collection of trees, viewed just so, occludes 80% of the scene, netting you a significant performance boost. This is a bad thing, not a good thing, because players respond better to a consistently mediocre framerate than to a framerate that spikes massively and inconsistently.

I'm all for occlusion where it makes sense (big city buildings, big houses, large walls), but most of the places where occluders make sense are probably also places where individual objects are big enough to just be modelled with occluders built in - no need for fusion. The usual example of "and we can take this group of people / cars and occlude everything behind it!" tends to be a plain bad idea, unless you're in the unlikely situation of having a game in which you're absolutely always surrounded by a mass of (something indistinct).

> All of those mechanisms can be improved, from a rasterization point of
> view, by adding dynamic occluders (ideally with occluder fusion).

--
Megan Fox
Idyllon, LLC
http://www.shalinor.com/
http://www.idyllon.com/
From: Stefan S. <kef...@gm...> - 2008-06-04 15:49:57
I find that reasoning pretty silly. Wherever you gain performance (or rather, reduce cost), it's a win. Naturally you cap your framerate at something, usually 30-60 fps; that doesn't mean you won't benefit from a solution that is capable of four times that framerate if you let it loose. Just spend that time doing something else. Most graphics artists and designers would massage you daily if you told them you'd figured out a way for them to have twice the amount of <something> in the game...
From: Megan F. <sha...@gm...> - 2008-06-04 16:04:09
That's just the problem, though: it doesn't get you 4x the framerate. It gets you 4x the framerate when you're looking over there at this particular clump of trees (from this particular angle), and normal FPS when you look anywhere else. If you build the scene to that 4x figure, most of it will run like crap; if you build the scene to the normal figure, you'll get (whatever FPS you designed for) everywhere and a spike to obscene framerates for a half second of camera rotation in this particular spot - which isn't particularly nice. If the FPS was already what you wanted elsewhere, the spike doesn't get you anything appreciable, aside from throwing off the consistency of your movement and player feedback.

It only gets you 4x the framerate if your scene happens to be built of uniformly distributed, homogeneous occluders, and in the instances where dPVS-esque tech is often suggested (free-roaming RPGs, sandbox environments, similarly inconsistent and hard-to-manually-occlude regions) you absolutely never have that. There are absolutely scenes where it makes sense (say, a cityscape made up of clumps of distinct buildings, where those buildings uniformly cover a city block), but those same scenes tend to be the sort of thing you'd optimize out - those distinct buildings should become a single large facade per block, lending themselves nicely to a standard large fixed occluder.

... but I admit, I am crusty about this particular tech, and my arguments have a distinct "darn kids playing your nintendoo video machines!" flavour. I'm more than willing to be disproven by a practical example of their implementation, but I'm unconvinced by the dPVS tech demos.

--
Megan Fox
Idyllon, LLC
http://www.shalinor.com/
http://www.idyllon.com/
From: Jamie F. <ja...@qu...> - 2008-06-04 16:27:35
Stefan Sandberg wrote:
> I find that reasoning pretty silly. Wherever you gain performance (or
> rather, reduce cost), it's a win.

Only if it's a consistent win, or it improves your worst case. If you gain performance in a situation which isn't your worst case (e.g. facing a wall), and lose performance in your worst case (e.g. viewing a large open area with many entities), I wouldn't call that a win. All the occluder fusion code I've seen falls into that category in every situation; yours may be different.

Jamie

--
Jamie Fowlston
Program Manager, Tools & Technology
Qube Software
www.qubesoft.com
From: Jon W. <hp...@mi...> - 2008-06-04 17:01:49
Megan Fox wrote:
> I'm all for occlusion where it makes sense (big city buildings, big
> houses, large walls), but most of the places where occluders make
> sense are probably also places where individual objects are big enough
> to just be modelled with occluders built in - no need for fusion.

Think of Manhattan, when your viewing angle isn't exactly 90 degrees to the buildings. There will potentially be significant numbers of people, cars, street signs etc. that are occluded by some combination of buildings but are not occluded fully by any single building. This is especially true when using conservative or "convenient" bounding volumes for occlusion testing (like spheres with a center at the feet :-)

I wasn't suggesting fusion in the sense of trees in a forest; I was thinking of an urban scene. I actually think most tree trunks are either too narrow, or too bent/complex in shape, to make good occluders - only the densest ground-hugging fir trees (a la Christmas trees) would qualify. Your observations are very valuable for calling me on that and making this clear!

Sincerely,

jw
From: Pierre T. <pie...@gr...> - 2008-06-04 15:52:03
> We integrated this into Pierre Terdiman's CDTestBenchmark, and the dynamic
> AABB tree implementation outperforms OPCODE Box Pruning.

Well, it's a bit apples to oranges though - "box pruning" is more a SAP competitor than a BV-tree competitor. But ok, I digress.

- Pierre
From: Pierre T. <pie...@gr...> - 2008-06-04 16:31:14
In other words, "optimize the worst case".

- Pierre
From: Willem H. de B. <wi...@wh...> - 2008-06-05 09:21:39
There was one project I worked on that featured both indoor and outdoor scenes. These scenes were pretty much given to me as 'polygon soup' and then collected together into fairly big 'chunks' of geometry that had no real restrictions on how much volume (or area) they took up in the world. The world was essentially static.

The approach I ended up taking was to create a kD-tree (splitting at the midpoints of the subtrees' AABBs) and essentially subdivide 'into' the chunks until I reached a certain threshold number of triangles, making the rest into a leaf in the tree, preprocessed so the leaves could be fired off to the GPU.

IIRC (and this is 3 years ago), there was a reasonable amount of culling going on, and so the kD-tree approach worked quite well. There is some benefit in balancing the tree. I remember an old Graphics Programming Methods article that goes into some depth on how to balance a kD-tree, which worked rather well.

Then I experimented with occlusion. This is where we ended up not gaining much in terms of performance. Since these were outdoor and indoor scenes, I took a general approach: 1) got artists to place 'occluder meshes' at appropriate parts of the scene, and 2) tried my hand at building occluders from buildings and then fusing them together. I remember this being quite messy, with lots of ugly special cases. Neither of these occluder approaches was really worth the effort in retrospect.

The idea of using AABB trees, and people's positive experience with them, sounds quite interesting - something to try next time. :)

Cheers,
Willem
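A minimal sketch of the midpoint-split build described above (not Willem's actual code; triangle and bounds types are simplified, and real code would handle straddling triangles rather than binning by centroid):

    #include <vector>

    struct Tri { float centroid[3]; /* vertices omitted */ };

    struct KdNode {
        std::vector<Tri> tris;                  // non-empty only at leaves
        KdNode* child[2] = { nullptr, nullptr };
    };

    void Build(KdNode* node, std::vector<Tri> tris,
               const float boxMin[3], const float boxMax[3],
               size_t leafThreshold)
    {
        if (tris.size() <= leafThreshold) {
            node->tris = std::move(tris);       // leaf: ship this chunk to the GPU
            return;
        }
        // Split at the midpoint of the node's longest axis.
        int axis = 0;
        for (int i = 1; i < 3; ++i)
            if (boxMax[i] - boxMin[i] > boxMax[axis] - boxMin[axis]) axis = i;
        float mid = 0.5f * (boxMin[axis] + boxMax[axis]);

        std::vector<Tri> left, right;
        for (const Tri& t : tris)
            (t.centroid[axis] < mid ? left : right).push_back(t);

        if (left.empty() || right.empty()) {    // degenerate split: stop here
            node->tris = std::move(tris);
            return;
        }

        // Child bounds: the parent box clipped at the splitting plane.
        float lMax[3] = { boxMax[0], boxMax[1], boxMax[2] }; lMax[axis] = mid;
        float rMin[3] = { boxMin[0], boxMin[1], boxMin[2] }; rMin[axis] = mid;

        node->child[0] = new KdNode; node->child[1] = new KdNode;
        Build(node->child[0], std::move(left),  boxMin, lMax,  leafThreshold);
        Build(node->child[1], std::move(right), rMin,  boxMax, leafThreshold);
    }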
> 3) Memory locality is a big deal nowadays. Do people have any experience
> with van Emde Boas schemes to localize chunks of the tree, or other
> methods to improve locality (e.g. flattening the tree)? These seem good
> on paper, but I'd like to know before I spend a few days implementing
> and find no gain whatever (or, hopefully, large gains). The problem with
> BSP is that, if you want front-to-back traversal, the order of traversal
> does change depending on the view.
>
> 4) I'm favoring spending time making an efficient BSP because most of my
> data is static, not animated, so the time I put into making a BSP is
> paid back. However, the build cost of a BSP is noticeable. Animated
> objects I'm currently just putting in a list without any efficiency
> structure. For animations involving a lot of objects, I've wondered if
> something like loose octrees or some other "quick to create" efficiency
> structure would pay off in terms of time saved culling vs. time spent
> forming and maintaining them.
>
> Anyway, I'm interested in anyone's practical experiences here. As usual,
> I suspect it's a case of "it all depends", but I'd love to know what
> people have found useful and useless.
>
> The one cute little result I thought I'd share: for ray tracing, the
> surface area heuristic (SAH) gets used in forming efficient BSP trees.
> That is, the efficiency of a split of a box is measured by taking the
> number of objects in a child box times the area of the child's box. The
> lower the summed value for the two children and for objects split by the
> cutting plane, the more likely the split is gaining you efficiency.
> What's interesting for rasterization is that frustum culling is a
> process of planes intersecting boxes, unlike ray tracing, where it's
> rays vs. boxes. When shooting rays, you want to use the surface area of
> the box as the relative probability that a random ray hits the box. When
> testing planes, you want to use the mean width of the box. The mean
> width is proportional to the sum of the lengths of the sides of a box
> (or the radius of a sphere). Using the mean width to decide on the
> efficiency of a box gives me more of what I would expect from a BSP tree
> for rasterization: the leaf nodes tend to hold 10 objects or so, as
> further splitting is less effective than with the traditional SAH. This
> matches my own experience with forming such hierarchies by simple
> median-split schemes: the leaves are less effective if you try to chop
> them too fine.
>
> All for now,
>
> Eric
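Eric's closing observation is compact enough to capture directly: the only thing that changes between the ray-tracing SAH and his frustum-culling variant is the geometric measure used to weight each child's object count. A minimal C++ sketch, with function names of our own choosing:

#include <cstddef>

struct Box { float min[3], max[3]; };

// For rays: the probability that a random ray hits a box is proportional
// to the box's surface area.
float surfaceArea(const Box& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// For planes (frustum culling): the relevant measure is the box's mean
// width, which is proportional to the sum of its side lengths.
float meanWidthMeasure(const Box& b) {
    return (b.max[0] - b.min[0]) + (b.max[1] - b.min[1]) + (b.max[2] - b.min[2]);
}

// Split cost: lower is better. Pass surfaceArea when building for ray
// tracing, meanWidthMeasure when building for rasterization culling.
float splitCost(float (*measure)(const Box&),
                const Box& leftBox,  std::size_t leftCount,
                const Box& rightBox, std::size_t rightCount) {
    return measure(leftBox) * leftCount + measure(rightBox) * rightCount;
}

With the mean-width measure, subdividing stops paying off sooner, which is consistent with Eric's observation that the leaves end up holding around ten objects.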
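And here is a minimal sketch of the midpoint-split build Willem describes at the top of this message: recurse until a node holds few enough triangles to become a leaf batch. The names (Tri, KdNode, kLeafThreshold), the choice of splitting the longest axis, and the degenerate-split guard are our assumptions, not details from his post:

#include <cstddef>
#include <memory>
#include <vector>

struct AABB {
    float min[3], max[3];
    int longestAxis() const {
        float d[3] = { max[0] - min[0], max[1] - min[1], max[2] - min[2] };
        if (d[0] >= d[1] && d[0] >= d[2]) return 0;
        return (d[1] >= d[2]) ? 1 : 2;
    }
};

struct Tri { float centroid[3]; /* plus vertex data */ };

struct KdNode {
    AABB bounds;
    std::unique_ptr<KdNode> child[2];  // both null for a leaf
    std::vector<Tri> tris;             // filled only for leaves
};

static const std::size_t kLeafThreshold = 512;  // tune per platform

std::unique_ptr<KdNode> build(std::vector<Tri> tris, const AABB& bounds) {
    auto node = std::make_unique<KdNode>();
    node->bounds = bounds;
    if (tris.size() <= kLeafThreshold) {
        node->tris = std::move(tris);  // leaf: bake into a GPU batch later
        return node;
    }
    // Split at the midpoint of the node's AABB along its longest axis;
    // assign each triangle to one side by its centroid (no clipping).
    int axis = bounds.longestAxis();
    float mid = 0.5f * (bounds.min[axis] + bounds.max[axis]);
    std::vector<Tri> left, right;
    for (const Tri& t : tris)
        (t.centroid[axis] < mid ? left : right).push_back(t);
    if (left.empty() || right.empty()) {  // degenerate split: stop here
        node->tris = std::move(tris);
        return node;
    }
    AABB lb = bounds, rb = bounds;
    lb.max[axis] = mid;
    rb.min[axis] = mid;
    node->child[0] = build(std::move(left), lb);
    node->child[1] = build(std::move(right), rb);
    return node;
}

Assigning whole triangles by centroid rather than clipping straddlers keeps each leaf self-contained, which is what you want when every leaf becomes a static vertex/index buffer.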
|
From: Sergei M. <se...@ha...> - 2008-06-18 12:15:22
|
Let me guess: your typical camera locations were not at the borders of the level but around its center, and your quarter of the level fits nicely into the GPU bandwidth.

I really can't see why one wouldn't get a benefit from occlusion, unless of course one isn't vertex-processing bound (which tends to be the problem at a distance if there is no level of detail) or can render the whole level at once, as today's hardware does with Q3-style levels.

The 'forget the occlusion culling' attitude is very common when you have that dreaded worst case where you see huge vistas of an outdoor area. If you've got that running, then you have good LOD, and you assume it will be OK to render everything everywhere. But place the camera inside an indoor area with that outdoor area behind it, and the overdraw goes from, say, 4x to 6x - overdraw that occlusion culling could save in these particular cases. And draw calls ('DIPs'), of course.
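The mechanism Sergei has in mind might look like the following under the OpenGL 1.5 query API: draw the big occluders first, then test each coarse node's bounding box before committing to its contents. This is a deliberately naive, blocking sketch just to show the idea - real systems batch queries and read the results a frame late to avoid stalling - and drawBoundingBox() / drawNodeContents() are hypothetical helpers:

#include <GL/gl.h>
// (In practice you'd also need glext.h or an extension loader for the
// query entry points.)

void drawBoundingBox();   // hypothetical: rasterize the node's AABB
void drawNodeContents();  // hypothetical: issue the node's real draw calls

void drawNodeIfVisible(GLuint query /* from glGenQueries */) {
    // Rasterize the proxy box without writing color or depth.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);

    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawBoundingBox();
    glEndQuery(GL_SAMPLES_PASSED);

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);

    GLuint samples = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);  // blocks until ready
    if (samples > 0)  // some fragment of the box passed the depth test
        drawNodeContents();
}

Front-to-back traversal matters here: the more depth is already in the buffer when a box is tested, the more nodes - and their draw calls - get skipped.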
|
From: Willem H. de B. <wi...@wh...> - 2008-06-18 12:24:05
|
> Let me guess: your typical camera locations were not at the borders of
> the level but around its center, and your quarter of the level fits
> nicely into the GPU bandwidth.

The camera locations were pretty evenly scattered; there were places near the border of the level and of course also places near the center.

> I really can't see why one wouldn't get a benefit from occlusion,
> unless of course one isn't vertex-processing bound (which tends

I can't remember why occlusion didn't work in this case. The decision was made to ditch it, since we'd already spent some time on it without really seeing any benefit. This of course is not to say that occlusion culling doesn't work - it just didn't work for us in this particular case.

Cheers,
Willem
|
|
From: Marc B. R. <mar...@or...> - 2008-06-05 11:59:44
|
> 3) Memory locality is a big deal nowadays. Do people have any experience
> with van Emde Boas schemes to localize chunks of the tree, or other
> methods to improve locality (e.g. flattening the tree)? These seem good
> on paper, but I'd like to know before I spend a few days implementing
> and find no gain whatever (or, hopefully, large gains). The problem with
> BSP is that, if you want front-to-back traversal, the order of traversal
> does change depending on the view.

Around 2001 (for a previous employer) I did some toy experiments ordering large octree memory via 3D Hilbert curve indices, and was seeing worthwhile performance gains.
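The idea is simply to sort node storage by a space-filling-curve key so that spatial neighbors end up adjacent in memory. Marc used 3D Hilbert indices; the sketch below substitutes a Morton (Z-order) key, which has somewhat weaker locality but is much simpler to write. All names are illustrative, and the scene bounds are assumed known:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Node { float center[3]; /* payload */ };

// Spread the low 10 bits of x so each bit is followed by two zero bits.
static std::uint32_t spreadBits10(std::uint32_t x) {
    x &= 0x000003ffu;
    x = (x ^ (x << 16)) & 0xff0000ffu;
    x = (x ^ (x << 8))  & 0x0300f00fu;
    x = (x ^ (x << 4))  & 0x030c30c3u;
    x = (x ^ (x << 2))  & 0x09249249u;
    return x;
}

// 30-bit Morton key from three 10-bit quantized coordinates.
static std::uint32_t morton3D(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return spreadBits10(x) | (spreadBits10(y) << 1) | (spreadBits10(z) << 2);
}

static std::uint32_t curveKey(const Node& n,
                              const float worldMin[3], const float worldSize[3]) {
    std::uint32_t q[3];
    for (int i = 0; i < 3; ++i) {
        float t = (n.center[i] - worldMin[i]) / worldSize[i];  // normalize to [0,1]
        t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
        q[i] = static_cast<std::uint32_t>(t * 1023.0f);
    }
    return morton3D(q[0], q[1], q[2]);
}

// Reorder nodes so that spatially nearby nodes sit close in memory.
void sortByCurveIndex(std::vector<Node>& nodes,
                      const float worldMin[3], const float worldSize[3]) {
    std::sort(nodes.begin(), nodes.end(),
              [&](const Node& a, const Node& b) {
                  return curveKey(a, worldMin, worldSize) <
                         curveKey(b, worldMin, worldSize);
              });
}

Sorting leaves or chunks this way before laying out their data means a spatially coherent traversal tends to walk forward through memory, which is where the cache wins come from.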