Thread: RE: [Algorithms] portal engines in outdoor environments
From: Tom F. <to...@mu...> - 2000-08-18 17:45:54
I agree - it's a nightmare case for geometric occlusion. Delayed Z-visibility is our only hope. A shame we've been waiting two DX versions for it, and it's going to miss this one as well. Grrrrrrr. So while we have fancy vertex shaders that nothing on Earth supports, we don't have API support for something that even crusty old hardware like the Permedia2 and Voodoo1 supports. Life... don't talk to me about life. :-)

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.

> -----Original Message-----
> From: Charles Bloom [mailto:cb...@cb...]
> Sent: 18 August 2000 18:26
> To: gda...@li...
> Subject: [Algorithms] portal engines in outdoor environments
>
> ... is a hopeless proposition, I think. I'd just like you guys to check my reasoning before I give up. Consider, for example, a building placed on a landscape. The goal is to construct portals so that when you stand on one side of the building, you can't see the other. If the player is constrained to stay on the ground, this is easy enough - it's really a 2D occlusion problem, and you need 8 portals (two * 4 faces of a square in 2D; two portals cuz you need one on each side of a face of the square, parallel to that face), which splits the universe into 9 zones (8 outside and one inside the occluding cube).
>
> However, if we have a plain cube in 3D and the player can go anywhere around it, we need 48 portals!! (8 for each of the 6 faces of the cube). Now if you have two cubes near each other, you need massive numbers of portals, and they must all avoid intersecting each other or other geometry. This quickly becomes unworkable.
>
> I think something like a "real-time" occluder (shadow volumes, etc.) is the only hope for outdoors.
>
> --------------------------------------
> Charles Bloom  www.cbloom.com
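[Editorial aside: Charles's portal counts can be checked with a line of arithmetic. The helper names below are purely illustrative, not from the post.]

```python
# Worked check of the portal counts argued above.

def portals_around_box(faces, portals_per_face):
    # One portal arrangement per face: the 2D (ground-bound) case needs
    # 2 per face (one on each side, parallel to the face); the free 3D
    # case needs the full 8-portal arrangement around every face.
    return faces * portals_per_face

portals_2d = portals_around_box(4, 2)  # square footprint, player on ground
portals_3d = portals_around_box(6, 8)  # player can go anywhere around a cube
zones_2d = portals_2d + 1              # 8 outside zones + 1 inside the cube
```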
From: Tom F. <to...@mu...> - 2000-08-19 09:57:38
We're talking two or three frames at most, and usually one to two I would expect. If a device is buffering any more than this, then the display is going to be lagging behind the input so much that the game becomes very hard to play (for a good illustration, try running Half-Life in D3D mode on an nVidia card - grrrrr).

And you always have to render _something_ - that's how you get your visibility info back! True, you could possibly render using a ZERO:ONE blend mode, but I don't think it will save you all that much fillrate, and it will pop in a very grim way.

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.

> -----Original Message-----
> From: gl [mailto:gl...@nt...]
> Sent: 18 August 2000 19:28
> To: gda...@li...
> Subject: Re: [Algorithms] portal engines in outdoor environments
>
> As I understand it, the idea with delayed z-testing is that whilst it's horribly slow to get 'did this primitive I just tried to render get totally z-rejected?' feedback from the card instantly (due to deep pipelining going on), it's reasonable to get that info back a few frames late without stalling anything.
>
> The idea is to have that info determine your objects' LOD, i.e. you render your object with a low LOD by default, unless you find out that the thing is now visible, when you can bump up the tri count. As the data is delayed, there will be a change from low to high LOD, but as we're only talking a few frames (I believe), this shouldn't be all that noticeable. Note that you can't simply avoid drawing the object, due to the delay - otherwise, it would suddenly appear, a few frames later than it should.
>
> In fact, my contention is that if it's only as little as two or three frames, the higher the rates you run at, the less significant the delay gets, to the point where you probably don't even see the LOD pop, or you might not even need to render until you're told it's there.
>
> Tom, how many frames exactly would we be talking about?
> --
> gl
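[Editorial aside: the scheme gl describes - render low LOD by default, promote once a stale visibility result arrives, never skip drawing entirely - can be sketched in a few lines. This is a toy model, not any real API: the class, the 2-frame latency, and the way GPU results are fed in are all illustrative assumptions.]

```python
# Toy model of delayed Z-visibility driving LOD selection.
QUERY_LATENCY = 2  # assumed frames before a result can be read back

class DelayedVisibility:
    def __init__(self):
        self.pending = []   # (frame_issued, object_id, result) tuples
        self.visible = {}   # object_id -> last known visibility

    def issue(self, frame, object_id, gpu_says_visible):
        # In real hardware the result would come back from the card;
        # here we just store it alongside the frame it was issued on.
        self.pending.append((frame, object_id, gpu_says_visible))

    def collect(self, frame):
        # Results older than QUERY_LATENCY frames are ready to read.
        ready = [p for p in self.pending if frame - p[0] >= QUERY_LATENCY]
        self.pending = [p for p in self.pending if frame - p[0] < QUERY_LATENCY]
        for _, oid, vis in ready:
            self.visible[oid] = vis

    def lod_for(self, object_id):
        # Unknown or invisible objects get the cheap model; we never
        # skip drawing entirely, or the object would pop in frames late.
        return "high" if self.visible.get(object_id, False) else "low"
```

At 60fps a 2-frame-old answer is only ~33ms stale, which is why the pop is hard to see at high frame rates.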
From: Tom F. <to...@mu...> - 2000-08-19 10:17:57
Though this is synchronous - you cannot queue results up, so you need to wait for the pipe to empty before getting the result, and can't do anything in the meantime. Which is the whole point of _delayed_ Z-visibility - you can draw a frame with hundreds of objects being vis-checked, and then a frame or two later you read all those results (while the next frame, with more vis checks, is being rendered). Unless you can send multiple vis requests down the pipe and read them as they come out, you'll be stalling the hardware and software waiting for your results, which rather removes the point of using it for higher frame rates! Apart from that, it's the same idea.

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.

> -----Original Message-----
> From: Bernd Kreimeier [mailto:bk...@lo...]
> Sent: 18 August 2000 20:31
> To: gda...@li...
> Subject: RE: [Algorithms] portal engines in outdoor environments
>
> John Ratcliff writes:
> > If you had the ability to ask questions of the zbuffer like "is this bounding volume visible?" (yes/no) in an extremely high speed fashion, then you could do gross culling on an object-by-object basis using the zbuffer contents during the render stage. Some of the old spanbuffer software renderers could do this, because it was a fairly cheap question to ask, especially relative to the time it took to software render an entire 3D model.
> >
> > But since you can't ask zbuffers these kinds of questions, it's a moot point.
>
> http://oss.sgi.com/projects/ogl-sample/registry/HP/occlusion_test.txt
>
> Brian Paul mentioned that he is going to add that to Mesa using the Glide GR_STATS counters. I have no idea which Win32 drivers offer this extension.
From: Tom F. <to...@mu...> - 2000-08-19 10:23:27
But _something_ still has to do the readback. The start of the chip's pipeline still needs to know the results at the framebuffer end before it can decide whether or not to start drawing the triangles. It doesn't matter whether the CPU does it or some bit in the T&L section of the chip - it's the same issue - something has to wait for pixels to go most of the way down the rendering pipeline. And that's a performance loss.

Added to which, one of the growing bottlenecks is the AGP bus, and doing it this way doesn't help that at all.

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.

> -----Original Message-----
> From: jason watkins [mailto:jas...@po...]
> Sent: 18 August 2000 21:24
> To: gda...@li...
> Subject: Re: [Algorithms] portal engines in outdoor environments
>
> Better would be a pipeline that allows you to specify some sort of simple bounding volume for each primitive (i.e. each index array, display list, whatever). You could build these volumes with no overhead in most games, and providing the hardware with the hint would allow it to estimate if the entire primitive is obscured, in which case it simply skips the entire primitive.
>
> This would require finer-grained primitives... which I guess is sort of on the way out in these GeForce days... but I do imagine that for arrays of moderate size, in the 100s of tris, the driver buffers and is already looking at the next set. So it's not exactly a perfect solution, but it seems a relatively simple feature, and it avoids the necessity of a readback to application level. [snip]
From: jason w. <jas...@po...> - 2000-08-20 01:54:27
> But _something_ still has to do the readback. The start of the chip's pipeline still needs to know the results at the framebuffer end before it can decide whether or not to start drawing the triangles. It doesn't matter whether the CPU does it or some bit in the T&L section of the chip - it's the same issue - something has to wait for pixels to go most of the way down the rendering pipeline. And that's a performance loss.

Right, of course the rejection logic has to be able to read the relevant z's. But it doesn't have to read the most immediate state of the z... the state a few polygons ago will do fine for a conservative rejection. So I don't see where a stall happens, or where there's a performance loss. Maybe the gain is reduced, when you think about how the rejection means there are skips in the flow from AGP RAM to the card's local storage/instruction bus, but as I understand it, that's all controlled by DMAs from the card anyhow, so not a big deal. It doesn't rely on frame-to-frame coherence (which I feel is often a bad thing). Perhaps it would be best with a hierarchical z system.

> Added to which, one of the growing bottlenecks is the AGP bus, and doing it this way doesn't help that at all.

Not directly, well... not unless your primitives are so large that the rejection happens before the primitive has entirely streamed into the card's local FIFO or whatever. However, done right, it could potentially reduce state changes or texture downloads, alleviating some bus bandwidth. Hardware seems to be moving to complex primitives that take more local processing to reduce to fragments anyhow, compressing the information that goes across the bus, and the more overhead is associated with each member primitive of a primitive array, the more useful a rejection system like this becomes.

So I totally agree that what I suggest isn't a ready-to-implement scheme that should be in nVidia's next card, but I do think that early rejection of polygon or higher-level primitives is a good thing, something that hardware should be moving toward. This is the same principle in use in the PVR series of chips, and it certainly seems to work well enough in the Dreamcast. I don't think you should have to resort to the PVR's odd rendering interface to get the gains of early rejection and deferred texturing tho, and I don't see how hinting for early rejection can be a bad thing. Especially considering that we're quickly reaching the limits of what fill rate can be supported by available memory technologies (at the right price, that is). And tho embedded DRAM seems to answer that, it and other possible technologies haven't exactly materialized.
From: Tom F. <to...@mu...> - 2000-08-20 10:59:05
> From: jason watkins [mailto:jas...@po...]
>
> > But _something_ still has to do the readback. The start of the chip's pipeline still needs to know the results at the framebuffer end before it can decide whether or not to start drawing the triangles. It doesn't matter whether the CPU does it or some bit in the T&L section of the chip - it's the same issue - something has to wait for pixels to go most of the way down the rendering pipeline. And that's a performance loss.
>
> Right, of course the rejection logic has to be able to read the relevant z's. But it doesn't have to read the most immediate state of the z... the state a few polygons ago will do fine for a conservative rejection. So I don't see where a stall happens, or where there's a performance loss.

That's not the performance hit. You are submitting tris like this:

    BeginChunk
    Draw tester tris
    EndChunk
    if ( previous chunk rendered )
    {
        Draw real tris
    }

The problem is that the if() is being done at the very start of the pipeline (i.e. the AGP bus - any later and you lose most of the gain), but it needs to know the results of the rasterisation & Z-test of all the pixels in the tester tris. That rasterisation and Z-test is comparatively late in the pipeline on a T&L device (it's early in the rasterisation, but there is a looooong pipe between the AGP bus and pixel rasterisation). So your pipeline is going to be completely empty between those two points. That is a huge pipeline bubble, and if you're doing it more than once or twice a frame, you are going to lose large amounts of performance.

You may be able to improve things by doing:

    for i = 0 to nObjects
    {
        BeginChunk(i)
        Draw tester tris
        EndChunk
    }
    for i = 0 to nObjects
    {
        if ( chunk(i) rendered )
        {
            Draw object i
        }
    }

But that is a lot of extra hardware to store all that chunk information, retrieve it and so on. Lots of complexity.

There are three very nice things about the frame-to-frame coherency scheme:

(1) No extra fillrate hit. If the object is invisible, it's the same fillrate as your scheme. If the object is visible, then you still only draw it once, not twice as with your scheme.

(2) No extra triangles needed. OK, the bounding box is a pretty small number of tris, but what if you wanted to do this scheme with lots of smallish objects? Might get significant then.

(3) (and this is the biggie) It is already supported by tons of existing, normal, shipped, out-there hardware. Not some mystical future device. Real, existing ones that you have probably used.

> Maybe the gain is reduced, when you think about how the rejection means there are skips in the flow from AGP RAM to the card's local storage/instruction bus, but as I understand it, that's all controlled by DMAs from the card anyhow, so not a big deal.

Huge deal if the delay is longer than a few tens of clock cycles. The AGP FIFOs are not very big, and bubbles of the sort of size you are talking about are not going to be absorbed by them. So for part of your frame, the AGP bus will be sitting idle. And if, as is happening, you are limited by AGP speed, that is going to hurt quite a lot.

> It doesn't rely on frame-to-frame coherence (which I feel is often a bad thing). Perhaps it would be best with a hierarchical z system.

Doesn't help - you still need to rasterise your pixels, which is a long way down the pipe. What's wrong with frame-to-frame coherence? Remember, if there is a camera change or cut, the application can simply discard all the visibility info it has and just draw everything, until it has vis information for the new camera position.

> > Added to which, one of the growing bottlenecks is the AGP bus, and doing it this way doesn't help that at all.
>
> Not directly, well... not unless your primitives are so large that the rejection happens before the primitive has entirely streamed into the card's local FIFO or whatever. However, done right, it could potentially reduce state changes or texture downloads, alleviating some bus bandwidth.

No no no. There is no way you could get the _hardware_ to reject state change info and texture downloads because of some internally fed-back state. Drivers rely very heavily on persistent state, i.e. not having to send state that doesn't change. If the driver now doesn't know whether the state change info it sent actually made it to the chip's registers or not, that's just madness - the driver will go potty trying to figure out what it does and doesn't need to send over the bus. Ditto for downloading textures. Since the driver can't know whether the hardware is going to reject the tris or not, it always has to do the texture downloads anyway. And if the hardware is doing the downloads instead (e.g. AGP or cached-AGP textures), then either solution works - the fast-Z-rejection of pixels means those textures never get fetched.

Far better is the delayed system, where the app can choose, as well as using lower-tri models, to force lower mipmap levels to reduce texture downloads. Or not, if that looks bad. This sort of decision MUST be left up to the app, of course, because there will always be cases where a shortcut looks bad, and only the app can know whether they are acceptable in certain cases or not.

[snip]

> Especially considering that we're quickly reaching the limits of what fill rate can be supported by available memory technologies (at the right price, that is). And tho embedded DRAM seems to answer that, it and other possible technologies haven't exactly materialized.

Right. But I've already pointed out that the delayed version uses _less_ fillrate than what you are proposing, not more, because the "tester" tris are the same as the tris actually drawn.

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.
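[Editorial aside: Tom's bubble argument can be put into rough numbers. The cycle counts below are made-up round figures chosen only to show the shape of the trade-off, not real hardware data.]

```python
# Toy cost model for the synchronous-test pipeline bubble described
# above. All constants are invented round numbers, not hardware specs.

PIPE_DEPTH = 300  # assumed cycles from AGP fetch to first Z-test result

def frame_cycles(num_objects, tris_per_object, cycles_per_tri, sync_tests):
    # Useful rasterisation work, plus one full pipe drain per blocking test.
    draw = num_objects * tris_per_object * cycles_per_tri
    return draw + sync_tests * PIPE_DEPTH

baseline   = frame_cycles(100, 500, 1, 0)    # no tests: 50,000 cycles
few_tests  = frame_cycles(100, 500, 1, 2)    # once or twice a frame: noise
per_object = frame_cycles(100, 500, 1, 100)  # one test per object: +60%
```

With these assumed numbers, a couple of synchronous tests per frame cost about 1%, but one test per object adds 60% - which is the "once or twice a frame" point above.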
From: jason w. <jas...@po...> - 2000-08-20 21:33:09
> That's not the performance hit. You are submitting tris like this:
>
>     BeginChunk
>     Draw tester tris
>     EndChunk
>     if ( previous chunk rendered )
>     {
>         Draw real tris
>     }

Nope, think more like:

    begin(indexed_array);
    setboundshint(my_boundingvolume);
    draw(my_iarray);
    end(indexed_array);

It's handed off as a single, separable transaction... the hint merely allows the hardware to quickly reject the entire array *if* it's obvious it's hidden. I would think this happens fairly often, like when a character model is behind a wall, for example.

So what you're not getting is that the *if* is _not_ a blocking *if*. It's just a hint... the hardware can deal with the hint in many ways. It's true that it would work best in a hierarchical z pipeline, but it should still work in the typical one. How that z information gets relayed back to the rejection block is an open question... but I can think of several ways in a typical architecture. It's caching scanlines anyhow, so it could do something like relay the maximal value for every 4 z's in the scanline being unloaded from cache back to the rejection block, where the rejection block has its own low-res local cache. The details of how this works could take many different forms... the point being that you only need delayed z info, and that having the hint processed on chip means that you can do it inside a single frame instead of relying on a previous frame.

> The problem is that the if() is being done at the very start of the pipeline (i.e. the AGP bus - any later and you lose most of the gain),

Nope... you gain fill rate by reduced depth complexity. You could gain effective polygon bandwidth as well.

> but it needs to know the results of the rasterisation & Z-test of all the pixels in the tester tris.

No... it just quickly needs to know if any z in the bounded region is further than the maximal value of the bounding volume... or some similar conservative heuristic.

> looooong pipe between the AGP bus and pixel rasterisation). So your pipeline is going to be completely empty between those two points. That is a huge

Nope... the pipeline is going to be full with the previous array, unless it was rejected and the chip didn't have the start of the next stream prefetched... depending on how long it takes to get the start of the next stream, that could trigger a stall, but I'm sure the bus access could be designed in a way to cut that down to a few cycles...

> You may be able to improve things by doing:
>
>     for i = 0 to nObjects
>     {
>         BeginChunk(i)
>         Draw tester tris
>         EndChunk
>     }
>     for i = 0 to nObjects
>     {
>         if ( chunk(i) rendered )
>         {
>             Draw object i
>         }
>     }

Right, this is closer... the key being that the if needs to be just a hint instead, and that it's treated as an input to conservative estimation that uses a delayed and lower-detail copy of the z.

Another approach would be a paging/tiling architecture like the Bitboys vapourware (apparently) is. In this case, triangles are rasterized in a checker pattern instead of a scanline pattern, where each page is some reasonable size, like 32x32 or so. It's a no-brainer to maintain range values for each page, enabling very fast rejection of a page as a triangle is scan converted. Better yet, with the rejection in front of some higher-level surface tessellator or the T&L, it can quickly read a few range values and discard an entire array or patch.

> But that is a lot of extra hardware to store all that chunk information, retrieve it and so on. Lots of complexity. There are three very nice things about the frame-to-frame coherency scheme:
>
> (1) No extra fillrate hit. If the object is invisible, it's the same fillrate as your scheme. If the object is visible, then you still only draw it once, not twice as with your scheme.

You misunderstood... I never said anything about drawing anything. Just a bounding volume hint, which is a very different thing. There's plenty of existing work for converting an OBB to exact screen regions *very* quickly without resorting to scan conversion/rasterization. We're only interested in conservative values as well, since it's common for a character model to be completely separate from a set of wall polygons.

> (2) No extra triangles needed. OK, the bounding box is a pretty small number of tris, but what if you wanted to do this scheme with lots of smallish objects? Might get significant then.

Again, it's a hint, not a set of triangles... no added triangles.

> (3) (and this is the biggie) It is already supported by tons of existing, normal, shipped, out-there hardware. Not some mystical future device. Real, existing ones that you have probably used.

*Very* true, very good point.

> > Maybe the gain is reduced, when you think about how the rejection means there are skips in the flow from AGP RAM to the card's local storage/instruction bus, but as I understand it, that's all controlled by DMAs from the card anyhow, so not a big deal.
>
> Huge deal if the delay is longer than a few tens of clock cycles. The AGP FIFOs are not very big, and bubbles of the sort of size you are talking about are not going to be absorbed by them. So for part of your frame, the AGP bus will be sitting idle. And if, as is happening, you are limited by AGP speed, that is going to hurt quite a lot.

As long as the gap in the pipeline as it starts fetching the next stream after a rejection is shorter than the number of cycles it would have taken to finish the rejected stream, you win. Considering that a cycle = 4 rasterized pixels or so, and that triangles are typically 4-8x that, and that arrays are typically 10 tris or more, I think it's not too much of a worry. Unless it really does take 200 cycles of a ~150MHz part to set up/redirect the DMA.

> > It doesn't rely on frame-to-frame coherence (which I feel is often a bad thing). Perhaps it would be best with a hierarchical z system.
>
> Doesn't help - you still need to rasterise your pixels, which is a long way down the pipe. What's wrong with frame-to-frame coherence? Remember, if there is a camera change or cut, the application can simply discard all the visibility info it has and just draw everything, until it has vis information for the new camera position.

A couple of things... originally I didn't think this was a big deal, but later changed my mind... I think making assumptions is bad, and I definitely think that a consistent framerate is more important than a high instantaneous one. Nothing's more annoying than jumping through a portal in a game and getting dropped frames for a few frames before it gets everything sorted out and cached and gets back up to 60fps (or whatever the target is). Making the granularity of rejection sub-frame should help avoid this... Also, when you're using an in-engine cinematic approach, it's really annoying when you get a dropped frame every time the camera cuts.

> No no no. There is no way you could get the _hardware_ to reject state change info and texture downloads because of some internally fed-back state. Drivers rely very heavily on persistent state, i.e. not having to send state that doesn't change. If the driver now doesn't know whether the state change info it sent actually made it to the chip's registers or not, that's just madness - the driver will go potty trying to figure out what it does and doesn't need to send over the bus. Ditto for downloading textures. Since the driver can't know whether the hardware is going to reject the tris or not, it always has to do the texture downloads anyway. And if the hardware is doing the downloads instead (e.g. AGP or cached-AGP textures), then either solution works - the fast-Z-rejection of pixels means those textures never get fetched.

Ouch... hadn't thought much about the driver-related issues. However, *if* state was constant across a primitive, it's not a problem. That would be a big issue, but I don't think it's insurmountable.

So, maybe I'm foggy on some details... but I still think early rejection in rasterization pipes is a *good thing*(tm) :).
From: Iain N. <i.a...@li...> - 2000-08-21 00:11:52
> Nope, think more like:
>
>     begin(indexed_array);
>     setboundshint(my_boundingvolume);
>     draw(my_iarray);
>     end(indexed_array);

I have been looking at something very similar. My version adds functions to allow inscribed volumes to be set for each primitive group. The occlusion list would start empty each frame. Primitive groups would then be tested against the list as they pass through the pipeline, and be added to the list. This means you would really have to be using front-to-back rendering for at least 'major' objects, but it can be implemented totally in the geometry subsystem.
From: <Lea...@en...> - 2000-08-21 00:40:55
> I have been looking at something very similar. My version adds functions to allow inscribed volumes to be set for each primitive group. The occlusion list would start empty each frame. Primitive groups would then be tested against the list as they pass through the pipeline, and be added to the list. This means you would really have to be using front-to-back rendering for at least 'major' objects, but it can be implemented totally in the geometry subsystem.

Some things that I did a few years ago that may interest people with gross culling of objects in screen space that worked quite well in the end, that may interest some people (and may not too... :) It was pretty much a mix of my own algorithms and the Occlusion Using Shadow Frusta paper.

Sphere Shadow - a sphere that is projected into screen space that consists of the minimal volume that will fit entirely within an object... you can transform the centre of the sphere and keep a screen-size radius structure (hardest bit was working out the volume that fits - but this can be done offline).

BBox Shadow - as per sphere, with a BBox in screen space.

I found the screen-space sphere shadow to be extremely fast, easy and accurate, although it is a very limited method with respect to the volume most spheres end up occupying... for complex objects you can have multiple spheres...

The entire cull algorithm for objects is then something like:

    transform object sphere to screen space, getting centre and radius
    for each sphere shadow occluder
        compare object centre + radius with shadow size + radius

Advantages:
* It's not reliant on getting info back from the buffers
* It doesn't take long to check a lot of objects against several occluders
* It's flippingly quick... :)

Disadvantages:
* For mirrored areas, you may need to check and transform twice
* Reflective surfaces such as floors in Unreal have the same problem
* Useless for transparent objects, but you get that anyway...
* Long thin objects suck for shadow sphere occluders

For trees and terrain you need to weigh up how visible objects are behind the vegetation - for example, if a tree has really bushy leaves you can use a distance-based shadow sphere occluder that comes into play at longer distances, and then actually render a billboarded quad where the shadow sphere occluder is, so as not to have objects just disappear behind the vegetation. (You will need to blend this over time for most effective results though.) A simple view-independent distance test does fine...

Leathal.
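[Editorial aside: a minimal sketch of the sphere-shadow test outlined above - transform both spheres to screen space, then compare centre distance against the radii. Everything here (the pinhole projection, the function names, the strictly-behind test) is a simplification invented for illustration, not Leathal's code.]

```python
import math

# Sketch of screen-space sphere-shadow culling. Assumes a pinhole
# camera at the origin looking down +z with focal length 1. The
# occluder sphere is inscribed in the occluding object; the object
# sphere bounds the candidate object.

def to_screen(center, radius, focal=1.0):
    x, y, z = center
    s = focal / z                     # perspective divide
    return x * s, y * s, radius * s   # screen centre and screen radius

def culled_by_shadow(obj_center, obj_radius, occ_center, occ_radius):
    ox, oy, orad = to_screen(obj_center, obj_radius)
    sx, sy, srad = to_screen(occ_center, occ_radius)
    dist = math.hypot(ox - sx, oy - sy)
    # Conservative: the object must be behind the occluder, and its
    # screen circle must lie entirely inside the occluder's shadow circle.
    return obj_center[2] > occ_center[2] and dist + orad <= srad
```

Note that scaling the radius by the perspective divide slightly underestimates the true on-screen extent (Jamie F. raises exactly this point later in the thread), so a careful implementation would pad the object's screen radius.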
From: <Lea...@en...> - 2000-08-21 01:19:02
> Some things that I did a few years ago that may interest people with gross culling of objects in screen space that worked quite well in the end, that may interest some people (and may not too... :) It was pretty much a mix of my own algorithms and the Occlusion Using Shadow Frusta paper.

Of course, it may only interest you once... :)

Leathal.
From: Akbar A. <sye...@ea...> - 2000-08-22 22:51:32
Is the search feature of the list broken? I did a search for barycentric and I found nothing. I am pretty sure that word has been mentioned before ;)

http://www.geocrawler.com/lists/3/SourceForge/4856/0/

peace.
akbar A.

"US4643421: Video game in which a host image repels ravenous images by serving filled vessels"
http://www.patents.ibm.com/details?&pn10=US04643421
this was a fun game though ;)
From: jason w. <jas...@po...> - 2000-08-21 02:22:14
Yeah, it's interesting... Spheres are particularly nice because the sphere radius is the same as the circle's radius after projection (heh... Ron can tell us what that exact property is properly called... I know it's not invariant). But of course, the disadvantage is that they're a poor fit.

Did you ever think of using a sphere tree, somewhat like the QSplat system's structures? I think there's some real hope for very organic, complex outdoor environments to be based around such a system... basically you'd have individual objects represented as in QSplat, and some sort of scene clustering collecting them into a tree. Then, each frame, start at the top of the tree, and check if any of the nodes occlude any of the others... descend down the tree in a similar way until you decide each node must be visible. Obviously, you'd need more smarts than that, since it could potentially be very slow to clip one highly concave object to another, like one tree against another, for example. You also need some form of fusion... storing a coverage ratio for each node in the sphere tree, much like HOM's image pyramid, would at least let you break out at a specific tolerance. The checks would have to be brutally fast tho to get to any really detailed environment.
From: Jamie F. <j.f...@re...> - 2000-08-21 11:05:08
> Spheres are particularly nice because the sphere radius is the same as the circle's radius after projection

Careful. There's a common misconception here (which you may or may not have made :).

Let the sphere have radius r. Let S be the centre of the sphere. Let V be some vector perpendicular to the view vector, of length r. Let P = S + V. Some people claim that projecting point P gives you a point on the edge of the circle which is the rasterisation of the sphere. This is not true.

Demonstration (in 2D, so hopefully it's clearer :) : Take a circle with centre C. Place an arbitrary point P outside the circle (here P plays the role of the eye). The closer it is to the circle, the clearer my point (unintentional... sorry :) should be. Let the 2 tangents to the circle passing through P be T1, T2. Let P1 be the point where T1 touches the circle; define P2 similarly. It should be clear that the projections of P1 and P2 are equivalent to points on the edge of the rasterisation. But (P - C) is not perpendicular to (P1 - C) or (P2 - C). Although as |P - C| approaches infinity, they approach perpendicular. If you can be sure you'll never be close enough to appreciate the error, then you'll be fine :)

Back to the sphere: this means that the true rasterisation of the sphere is larger than the circle calculated by projecting P. I'll expand more if anybody needs it... or gives a monkey's :)

Jamie
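[Editorial aside: Jamie's point can be checked numerically. In 2D, with the eye at the origin, focal length 1, and the circle centre on the view axis at distance d, projecting the perpendicular offset point gives a half-width of r/d, while the true tangent point projects to r/sqrt(d^2 - r^2). A quick sketch under those assumptions; the function names are invented:]

```python
import math

# Numeric check of the tangent-point argument above: eye at the
# origin, focal length 1, circle of radius r centred on the view
# axis at distance d (the 2D case of Jamie's demonstration).

def naive_half_width(d, r):
    # Project P = C + perpendicular offset of length r.
    return r / d

def true_half_width(d, r):
    # The tangent point subtends half-angle asin(r/d); projecting it
    # onto the image plane gives tan(asin(r/d)) = r / sqrt(d^2 - r^2).
    return r / math.sqrt(d * d - r * r)
```

At d = 2r the naive width is about 13% too small; by d = 100r the error is negligible, matching the "approaches perpendicular as |P - C| approaches infinity" remark.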
From: Tom F. <to...@mu...> - 2000-08-20 22:20:52
|
> From: jason watkins [mailto:jas...@po...] > > > That's not the performance hit. You are submitting tris like this: > > > > BeginChunk > > Draw tester tris > > EndChunk > > if ( previous chunk rendered ) > > { > > Draw real tris > > } > > > Nope, think more like: > begin(indexed_array); > setboundshint(my_boundingvolume); > draw(my_iarray); > end(indexed_array); > > it's handed off as a single, seperatable transaction.. the hint merely > allows the hardware to quickly reject the entire array *if* > it's obvious > it's hidden.. I would think this happens fairly often, like > when a character > model is behind a wall, for example. Looks the same to me - the point being that the "hint" is just before the triangles that are gated by it. the hardware can't rearrange triangle order or interleave or anything made like that - that's just not how hardware works (except for wacky stuff like scene-capture architecture, which has some very different problems to cope with, and so far has failed to live up to its claims). > So what you're not getting, is that the *if* is _not_ a > blocking *if*. If it doesn't block, then it's not much good. Not block = does nothing (or very little). > It's > just a hint.. the hardware can deal with the hint in many > ways.. If you want to abstract things this way, then this is very definately an API abstraction. This is not something that can be done directly in hardware. > it's true > that it would work best in a heirarchical z pipeline, but it > should still > work in the typical. How that z information gets relayed back to the > rejection block is an open question.. Erm... faster than light. OK, here is the pipeline of a typical T&L&rasterise chip: -Index AGP read -Vertex cacheing -Vertex AGP read (1) -Transform vertices -Light vertices -Clip vertices -Project vertices -Construct triangle from vertices -Backface cull -Rasteriser setup -Rasterise -Read Z buffer -Test against Z buffer (2) -Etc. (rest of pixel pipeline). 
So what you are asking is for results at (2) to affect what is done at (1) on the very next triangle. The only way this can be done is to hold the "drawn" triangles at (1) until all the "test" tris have passed (2). So the pipeline from (1) to (2) is empty. It has no triangle info in it at all. That is a huge bubble - probably hundreds of clock cycles long. You noticed all that complex floating-point maths in the middle, didn't you? Each floating-point operation has many pipelined clock stages, and there are a lot of operations in that section of the chip. It's a massive bubble, and no AGP FIFO is going to deal with those sorts of delays. > but I can think of > several ways in a > typical architecture.. it's caching scanlines anyhow, Not in a typical architecture it's not. But let's say it was... > so it could do > something like relay the maximal value for every 4 z's in the > scanline being > unloaded from cache back to the rejection block, where the > rejection block > has its own low res local cache. OK, well there is only the Radeon that does this at the moment. It's cool, but it's nowhere near commonplace. And it still requires that the test triangles be rasterised - converted into pixel representation. The details of Z-testing are not important. The fact that you first have to rasterise them is the killer. > The details of how this > works could take > many different forms.. the point being is that you only need > delayed z info, > and that having the hint processed on chip means that you can > do it inside a > single frame instead of relying on a previous frame. I just don't see the problem with relying on the previous frame. There are hundreds of algorithms that we use every day in code that rely on frame-to-frame coherency for speed. One more is not going to drive people bonkers. > > The problem is that the if() is being done at the very start of the > pipeline > > (i.e. the AGP bus - any later and you lose most of the gain), > > Nope..
you gain fill rate by reduced depth complexity. You could gain > effective polygon bandwidth as well. No, you don't. Consider the case where the object is invisible: You: draw & test bounding object. Don't draw real object. Me: test previous frame's object results. Draw this frame's low-rez object. Pretty even - we both draw an object that is roughly the right number of pixels on-screen. Both get fast Z-rejects (by whatever hierarchical Z method you like) and fetch no texels. OK, now a drawn object: You: draw & test bounding object. Draw real object. Me: test previous frame's object results. Draw this frame's low-rez object. Looks like I win. I drew one object, you drew an object & tested a bounding object. True, you didn't need to fetch texels or write out results for your bounding object, but you still rasterised & checked _something_. I didn't. Plus, you also had to send down the polygon data for your bounding object, while I didn't. It's usually small for a bounding object, but it's not zero. [snip stuff that also isn't right, but...] > You misunderstood.. I never said anything about drawing > anything. Just a > bounding volume hint, which is a very different thing. > There's plenty of > existing work for converting an OBB to exact screen regions > *very* quickly > without resorting to scan conversion/rasterization. We're > only interested in > conservative values as well, since it's common for a > character model to be > completely separate from a set of wall polygons. OK, if you did this sort of incredibly conservative test (i.e. add hardware to T&L an OBB in some quick but conservative way, find the enclosing screen BB, test all Z values using some sort of quickie rectangle rasteriser, somehow dodging the bullet of concurrent Z-buffer access with polys that are currently being rasterised), maybe it would work sometimes. But remember that you're finding the screen BB of an OBB. So the area being tested is quite big compared to your original shape.
And that's still a decent chunk of hardware. I _still_ don't see what is so bad about adding zero hardware to existing chips and using some frame-to-frame coherency. > > (2) No extra triangles needed. OK, the bounding box is a > pretty small > number > > of tris, but what if you wanted to do this scheme with lots > of smallish > > objects? Might get significant then. > > again, it's a hint, not a set of triangles.. no added triangles. > > > (3) (and this is the biggie) It is already supported by > tons of existing, > > normal, shipped, out there hardware. Not some mystical > future device. > Real, > > existing ones that you have probably used. > > *very* true, very good point. Indeedy. [snip] > > Huge deal if the delay is longer than a few tens of clock > cycles. The AGP > > FIFOs are not very big, and bubbles of the sort of size you > are talking > > about are not going to be absorbed by them. So for part of > your frame, the > > AGP bus will be sitting idle. And if, as is happening, you > are limited by > > AGP speed, that is going to hurt quite a lot. > > as long as the gap in the pipeline as it starts fetching the > next stream > after a rejection is shorter than the number of cycles it > would have taken > to finish the rejected stream, you win. Considering that a cycle = 4 > rasterized pixels or so, and that triangles typically are > 4-8x that, and > that arrays are typically 10 tris or more, I think it's not > too much of a > worry. Unless it really does take 200 cycles of a ~150MHz part to set > up/redirect the DMA. You're confusing _throughput_ with _latency_. The typical on-screen textured pixel may take thousands of clock cycles from its triangle being read in by the AGP bus, to actually being written onto the screen. However, the next one will be right behind it. So the throughput of chips is massive, but the latency is terrible. What you are relying on is a short latency. And in the graphics chip world, latency is very very expendable.
Huge pipelines, massively parallel, quarter of the chip is FIFOs, multiple stages in even the simplest operations. That is what makes graphics chips fast. Do not stall the pipe, or you're toast. Those are the keys. This technique blows all that out of the water. [snip] > > What's wrong with frame-to-frame coherence? Remember, if > there is a camera > > change or cut, the application can simply discard all the > visibility info > it > > has and just draw everything, until it has vis information > for the new > > camera position. > > A couple things.. originally I didn't think this was a big > deal, but later > changed... I think making assumptions is bad, and I > definitely think that > consistent framerate is more important than a high > instantaneous one. Nothing's > more annoying than jumping through a portal in a game to get > dropped frames > for a few frames before it gets everything sorted out and > cached and gets > back up to 60fps (or whatever the target is). Making the > granularity of > rejection sub-frame should help avoid this... Also, when > you're using an in-engine > cinematic approach, it's really annoying when you get > a dropped frame > every time the camera cuts. This is highly app-specific though - the app can happily modify its interpretation of the results based on the above. Whereas if you leave it all up to the hardware, it can get very hard to get consistent framerates. Your method actually _removes_ control from the app. That is not going to help to get consistent, smooth framerates - if anything, it will give you the opposite. > > No no no. There is no way you could get the _hardware_ to > reject state > > change info and texture downloads because of some > internally fed-back > state. > > Drivers rely very heavily on persistent state, i.e. not > having to send > state > > that doesn't change.
If the driver now doesn't know whether > the state > change > > info it sent actually made it to the chip's registers or > not, that's just > > madness - the driver will go potty trying to figure out > what it does and > > doesn't need to send over the bus. Ditto for downloading > textures. Since > the > > driver can't know whether the hardware is going to reject > the tris or not, > > it always has to do the texture downloads anyway. And if > the hardware is > > doing the downloads instead (e.g. AGP or cached-AGP > textures), then either > > solution works - the fast-Z-rejection of pixels means those > textures never > > get fetched. > > Ouch.. hadn't thought much about the driver-related issues. > However, *if* > state was constant across a primitive, it's not a problem. > That would be a > big issue, but I don't think it's insurmountable. Except that was one of your supposed "plus" points - that state wouldn't have to be changed if the object was rejected! > So, maybe I'm foggy on some details.. but I still think early > rejection in > rasterization pipes is a *good thing*tm :). (1) There _is_ early rejection in rasterisation pipes. Hierarchical Z is massively cool, but relatively conventional. (2) It's not the rasteriser that needs the speeding up. We have some awesomely fast rasterisers at the moment. But the T&L is a bottleneck under some situations (complex lighting, high tessellation), and the AGP bus is the bottleneck under others. Those are the things that need conserving right now. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. |
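The frame-to-frame coherency scheme Tom argues for above can be sketched at the application level. This is a hypothetical illustration, not code from the thread: the type and function names are mine, and it assumes the API can report, a frame or more late, whether a previously drawn proxy passed any Z tests. Whenever no trustworthy result exists (e.g. just after a camera cut), it falls back to drawing at full detail.

```c
#include <stdbool.h>

/* Hypothetical per-object record for frame-to-frame coherent culling.
   The occlusion result for the proxy drawn in frame N only becomes
   readable a frame or more later, so the app acts on stale data and
   draws conservatively. */
typedef struct {
    bool have_result;   /* has any query result come back yet?   */
    bool last_visible;  /* most recent (stale) visibility result */
} OcclusionState;

typedef enum { DRAW_FULL, DRAW_LOW_LOD } DrawDecision;

/* Decide how to draw an object this frame.  A camera cut invalidates
   all old results, so everything is drawn at full detail until fresh
   results trickle back - exactly the app-side fallback Tom describes. */
DrawDecision decide_draw(OcclusionState *s, bool camera_cut)
{
    if (camera_cut)
        s->have_result = false;
    if (!s->have_result || s->last_visible)
        return DRAW_FULL;   /* unknown or visible: be conservative  */
    return DRAW_LOW_LOD;    /* known hidden last frame: cheap proxy */
}
```

Drawing a low-LOD proxy, rather than nothing, in the known-hidden case is gl's refinement from earlier in the thread: an object that becomes visible never pops in late, it just spends a frame or two at low detail.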
From: jason w. <jas...@po...> - 2000-08-20 23:13:23
|
> So what you are asking is for results at (2) to affect what is done at (1) > on the very next triangle. The only way this can be done is to hold the No.. I was very specific that 'primitive' meant a set of triangles, as in an indexed array, or a higher-level surface like a patch (such as the n-patches). Of course trying to make the granularity at the triangle level would fall apart.. there'd be no gain. However, on arrays of triangles, such as my earlier example of a character behind a wall... > OK, well there is only the Radeon that does this at the moment. It's cool, > but it's nowhere near commonplace. And it still requires that the test > triangles be rasterised - converted into pixel representation. The details > of Z-testing are not important. The fact that you first have to rasterise > them is the killer. Right.. but I'm not talking about rasterizing test triangles, I'm talking about using a bounding volume directly. There are *much* faster ways of getting to a conservative test of "will everything within this bounding region be behind everything already rendered to this screen area." Such a conservative test need not know the actual z for the entire region, it merely needs to know the range of z in the bounding primitive's screen region, as well as the range contained in the bounding primitive (after perspective). Although this seems perhaps overly conservative, I think most games exhibit so much spatial locality that such a test will still have good gains. You need not even render front->back, merely render world geometry before character geometry (a typical fps thing anyhow), and you'll get gains. Maybe not worth the gates it would take to implement the feature tho, I'll grant you that. > You: draw & test bounding object. Draw real object. > Me: test previous frame's object results. Draw this frame's low-rez object. nope.. What I'm proposing is a hint that can be used to early-out or reject the current triangle array.
The hint could even be checked in parallel with processing the first few vertices of the array to avoid any gaps in the pipe for when the hint doesn't cull the array. > Plus, you also had to send down the polygon data for your bounding object, > while I didn't. It's usually small for a bounding object, but it's not zero. > > [snip stuff that also isn't right, but...] > > > You misunderstood.. I never said anything about drawing > > anything. Just a > > bounding volume hint, which is a very different thing. > > There's plenty of > > existing work for converting an OBB to exact screen regions > > *very* quickly > > without resorting to scan conversion/rasterization. We're > > only interested in > > conservative values as well, since it's common for a > > character model to be > > completely separate from a set of wall polygons. > > OK, if you did this sort of incredibly conservative (i.e. add hardware to > T&L OBB in some quick but conservative way, find enclosing screen BB, test > all Z values using some sort of quickie rectangle rasteriser, somehow > dodging the bullet of concurrent Z-buffer access with polys that are > currently being rasterised), maybe it would work sometimes. But remember > that you're finding the screen BB of an OBB. So the area being tested is > quite big compared to your original shape. And that's still a decent chunk > of hardware. Yes, a very large chunk of hardware.. but hey, so far 3dfx and nvidia have shown no fear of going to some of the largest gate counts ever attempted. I never said I was sure it was a good tradeoff to implement, but I think that early rejection in rasterization *is* a good thing in the interests of increased scalability. > I _still_ don't see what is so bad about adding zero hardware to existing > chips and using some frame-to-frame coherency. I never said anything was bad about it.. as an application-level scheme where you know about, and plan around, its limits, it should work fine.
But it's not a general-purpose extension, like adding early rejection to the hardware would be. Now, the rejection would have little effect on large classes of scenes, but on the other hand, 90% of all triangles rendered in the universe are from Quake or a similar game... in other words, the relevant class, which has lots of spatial coherency, is definitely dominant. > latency is terrible. What you are relying on is a short latency. And in the no.. I'm relying on a latency of the delayed z bounds being somewhere on the scale of a couple 'primitives'. > This is highly app-specific though - the app can happily modify its > interpretation of the results based on the above. Whereas if you leave it > all up to the hardware, it can get very hard to get consistent framerates. true. > Your method actually _removes_ control from the app. That is not going to > help to get consistent, smooth framerates - if anything, it will give you > the opposite. no.. it can always choose not to give the hints :). However, because the granularity is finer on hardware rejection, moments when you could get close to a dropped frame are not frame-specific so much as action-specific: like a 1000-poly character walks around a corner and becomes visible. But on a camera cut or teleportation, the hardware rejection can still work, and still effectively shorten the rendering time of that specific frame.. the frame-to-frame coherency is powerless to do anything until the next frame. > > Ouch.. hadn't thought much about the driver-related issues. > > However, *if* > > state was constant across a primitive, it's not a problem. > > That would be a > > big issue, but I don't think it's insurmountable. > > Except that was one of your supposed "plus" points - that state wouldn't > have to be changed if the object was rejected! True.. so it probably wouldn't help you reduce texture downloads.
But then again, reducing bus traffic isn't the primary goal of this sort of rejection: it's reducing invisible fill/primitive consumption. > (1) There _is_ early rejection in rasterisation pipes. Hierarchical Z is > massively cool, but relatively conventional. Right... I need to go read up on that (one of those things on the list). And the truly wicked implementation: use both the app-level frame-to-frame coherency and the hardware rejection :). |
From: jason w. <jas...@po...> - 2000-08-20 23:15:24
|
It appears Apple holds a patent on hierarchical z-buffering. However, they seem to have a patent on multi-pass rendering as well, so perhaps it's a non-issue. |
From: Tom F. <to...@mu...> - 2000-08-21 20:49:13
|
You can find it perfectly of course, but if you can't be bothered, or you're doing lots of these (a common case), then a good conservative approximation of the real size is to find the radius of a circle that is r units closer to the viewer than S, i.e. just move P closer by r. It's virtually the same thing for sensible-distance-away spheres, it's only as the sphere gets really very close that it starts to give poor results, and this is usually acceptable in things like hierarchical culling. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. > -----Original Message----- > From: Jamie Fowlston [mailto:j.f...@re...] > Sent: 21 August 2000 12:05 > To: gda...@li... > Subject: Re: [Algorithms] portal engines in outdoor environments > > > > Spheres are particularly nice because the sphere radius is > the same as the > > circle's radius after projection > > Careful. There's a common misconception here (which you may > or may not have made > :). > > Let the sphere have radius r. > Let S be the centre of the sphere. > Let V be some vector perpendicular to the view vector of length r. > > Let P = S + V > > Some people claim that projecting point P gives you a point > on the edge of the > circle which is the rasterisation of the sphere. This is not true. > > Demonstration that this is so (in 2D, so hopefully it's clearer :) : > > Take a circle with centre C. > Place an arbitrary point P outside the circle. The closer it > is to the circle, > the clearer my point (unintentional... sorry:) should be. > Let the 2 tangents to the circle passing through P be T1, T2. > Let P1 be the point of intersection between T1 and C. Define > P2 similarly. > > It should be clear that the projections of P1 and P2 are > equivalent to points on > the edge of the rasterisation. > But (P - C) is not perpendicular to (T1 - C) or (T2 - C). > > Although as | P - C | approaches infinity, they approach > perpendicular.
If you > can be sure you'll never be close enough to appreciate the > error, then you'll be > fine :) > > > Back to the sphere: this means that the true rasterisation of > the sphere is > larger than the circle calculated by projecting P. > > I'll expand more if anybody needs it... or gives a monkey's :) > > Jamie > > > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list > |
From: Ignacio C. <i6...@ho...> - 2000-08-18 18:05:44
|
Tom Forsyth wrote: > I agree - it's a nightmare case for geometric occlusion. Delayed Z-visibility > is our only hope. A shame we've been waiting two DX versions for it, and > it's going to miss this one as well. Grrrrrrr. So while we have fancy > vertex shaders that nothing on Earth supports, we don't have API support for > something that even crusty old hardware like Permedia2 and Voodoo1 supports. > Life... don't talk to me about life. :-) What do you mean by 'delayed Z-visibility'? Could you explain that a bit more? Ignacio Castano ca...@cr... |
From: John R. <jra...@ve...> - 2000-08-18 18:13:32
|
If you had the ability to ask questions of the zbuffer like "is this bounding volume visible?" (yes/no) in an extremely high-speed fashion then you could do gross culling on an object-by-object basis using the zbuffer contents during the render stage. Some of the old spanbuffer software renderers could do this, because it was a fairly cheap question to ask, especially relative to the time it took to software-render an entire 3d model. But, since you can't ask zbuffers these kinds of questions it's a moot point. John -----Original Message----- From: Ignacio Castano [mailto:i6...@ho...] Sent: Friday, August 18, 2000 1:06 PM To: gda...@li... Subject: RE: [Algorithms] portal engines in outdoor environments Tom Forsyth wrote: > I agree - it's a nightmare case for geometric occlusion. Delayed Z-visibility > is our only hope. A shame we've been waiting two DX versions for it, and > it's going to miss this one as well. Grrrrrrr. So while we have fancy > vertex shaders that nothing on Earth supports, we don't have API support for > something that even crusty old hardware like Permedia2 and Voodoo1 supports. > Life... don't talk to me about life. :-) what do you mean with 'delayed Z-visibility'? could you explain that a bit more? Ignacio Castano ca...@cr... |
From: gl <gl...@nt...> - 2000-08-18 18:28:14
|
As I understand it, the idea with delayed z-testing is that whilst it's horribly slow to get 'did this primitive I just tried to render get totally z-rejected?' feedback from the card instantly (due to deep pipelining going on), it's reasonable to get that info back a few frames late without stalling anything. The idea is to have that info determine your objects' LOD, i.e. you render your object with a low LOD by default, unless you find out that thing is now visible, when you can bump up the tri count. As the data is delayed, there will be a change from low to high LOD, but as we're only talking a few frames (I believe), this shouldn't be all that noticeable. Note that you can't simply avoid drawing the object, due to the delay - otherwise, it would suddenly appear, a few frames later than it should. In fact, my contention is that if it's only as little as two or three frames, the higher rates you run at, the less significant the delay gets, to the point where you probably don't even see the LOD pop, or you might not even need to render until you're told it's there. Tom, how many frames exactly would we be talking about? -- gl ----- Original Message ----- From: "John Ratcliff" <jra...@ve...> To: <gda...@li...> Sent: Friday, August 18, 2000 7:18 PM Subject: RE: [Algorithms] portal engines in outdoor environments > If you had the ability to ask questions of the zbuffer like "is this > bounding volume visible?" (yes/no) in an extremely high speed fashion then > you could do gross culling on an object by object basis using the zbuffer > contents during the render stage. Some of the old spanbuffer software > renderers could do this, because it was a fairly cheap question to ask, > especially relative to the time it took to software render an entire 3d > model. > > But, since you can't ask zbuffers these kinds of questions it's a moot > point. > > John > > -----Original Message----- > From: Ignacio Castano [mailto:i6...@ho...]
> Sent: Friday, August 18, 2000 1:06 PM > To: gda...@li... > Subject: RE: [Algorithms] portal engines in outdoor environments > > > Tom Forsyth wrote: > > I agree - it's a nightmare case for geometric occlusion. Delayed > Z-visibility > > is our only hope. A shame we've been waiting two DX versions for it, and > > it's going to miss this one as well. Grrrrrrr. So while we have fancy > > vertex shaders that nothing on Earth supports, we don't have API support > for > > something that even crusty old hardware like Permedia2 and Voodoo1 > supports. > > Life... don't talk to me about life. :-) > > what do you mean with 'delayed Z-visibility'? could you explain that a bit > more? > > > Ignacio Castano > ca...@cr... |
From: Bernd K. <bk...@lo...> - 2000-08-18 19:31:07
|
John Ratcliff writes: > If you had the ability to ask questions of the zbuffer like "is this > bounding volume visable?" (yes/no) in an extremely high speed fashion then > you could do gross culling on an object by object basis using the zbuffer > contents during the render stage. Some of the old spanbuffer software > renderers could do this, because it was a fairly cheap question to ask, > especially relative to the time it took to software render an entire 3d > model. > > But, since you can't ask zbuffers these kinds of questions it's a moot > point. http://oss.sgi.com/projects/ogl-sample/registry/HP/occlusion_test.txt Brian Paul mentioned that he is going to add that to Mesa using the Glide GR_STATS counters. I have no idea which Win32 drivers offer this extension. b. |
From: Brian P. <br...@va...> - 2000-08-18 19:39:20
|
Bernd Kreimeier wrote: > > John Ratcliff writes: > > If you had the ability to ask questions of the zbuffer like "is this > > bounding volume visable?" (yes/no) in an extremely high speed fashion then > > you could do gross culling on an object by object basis using the zbuffer > > contents during the render stage. Some of the old spanbuffer software > > renderers could do this, because it was a fairly cheap question to ask, > > especially relative to the time it took to software render an entire 3d > > model. > > > > But, since you can't ask zbuffers these kinds of questions it's a moot > > point. > > http://oss.sgi.com/projects/ogl-sample/registry/HP/occlusion_test.txt > > Brian Paul mentioned that he is going to add that to Mesa > using the Glide GR_STATS counters. I have no idea which Win32 > drivers offer this extension. The actual value of this extension is questionable. The problem is you have to do a read-back from the hardware to get the occlusion test result and the hit from doing that can be substantial. The extension works now (in a development branch of the 3dfx DRI driver) but I haven't done any performance analysis. -Brian |
From: Bernd K. <bk...@lo...> - 2000-08-18 21:22:12
|
Brian Paul writes: > The actual value of this extension is questionable. The problem > is you have to do a read-back from the hardware to get the > occlusion test result and the hit from doing that can be substantial. Someone here said that the idea is to count on frame coherence and use results from previous frames as an indicator. But yeah, just on a gut level I would not be surprised to see any kind of readback interfere with the performance of pipelined architectures. b. |
From: gl <gl...@nt...> - 2000-08-18 21:31:54
|
I think the idea is that you wait for the data to be spit out at the end of the pipe, so it shouldn't stall. Of course by this time you've queued a few frames and/or are already drawing some of them, hence the delay. -- gl ----- Original Message ----- From: "Bernd Kreimeier" <bk...@lo...> To: "Brian Paul" <br...@va...> Cc: <gda...@li...> Sent: Friday, August 18, 2000 10:22 PM Subject: Re: [Algorithms] portal engines in outdoor environments > Brian Paul writes: > > The actual value of this extension is questionable. The problem > > is you have to do a read-back from the hardware to get the > > occlusion test result and the hit from doing that can be substantial. > > Someone here said that the idea is to count on frame coherence > and use results from previous frames as an indicator. But yeah, > just on a gut level I would not be surprised to see any kind of > readback interfere with the performance of pipelined architectures. > > > b. |
From: jason w. <jas...@po...> - 2000-08-18 20:22:38
|
Better would be a pipeline that allows you to specify some sort of simple bounding volume for each primitive (i.e. each index array, display list, whatever). You could build these volumes with no overhead in most games, and providing hardware with the hint would allow it to estimate if the entire primitive is obscured, in which case it simply skips the entire primitive. This would require finer-grained primitives.. which I guess is sort of on the out in these GeForce days... but I do imagine that for arrays of moderate size, in the 100's of tris, that the driver buffers and is already looking at the next set. So, it's not exactly a perfect solution, but it seems a relatively simple feature, and avoids the necessity of a readback to application level. I spent a while trying to dream up a good occlusion cull for this sort of environment... if you know of the Bath model, that's pretty much my desired environment situation. View would mostly be on ground or on top of buildings.. on top of a building looking across the cityscape is a pathological case that breaks nearly every scheme. Polygon-level analytic solutions aren't even worth thinking about. Some raster-based method like HOM works well in all but the on-top-of-the-building situation.. how can you pick a good set of potential occluders in that case? My last thoughts were of using a hash based on a linear regular octree enumeration, storing a key for each primitive. A skip list would be used to store an index on the keys (if you don't want to support dynamic changes to the environment, a simpler structure would work). For each frame, cull to the view frustum, then walk the skiplist in the view in order from near to far, using a coverage test similar to HOM.. the keys would allow for a fairly simple separation test to be built, but I'm not really sure if that would be faster than just using the key to find screen bounds and check the coverage buffer..
In my case, each primitive was going to be a scalable higher-order surface of some sort, be it SS, Bezier patch or VIPM, so the lowest level of detail would be used to quickly rasterize into the coverage buffer. The coverage buffer would probably be somewhere around 1/4th the display resolution. Although it would be possible for improper rejections to happen in this case due to the resolution and geometric error, I can live with some visibility errors on the order of 4 pixels. Thinking back on it now, I think for my situation moving to a geometry rep that has some explicit idea of what's solid in the environment, like Mathew's sloppy internal octree, would make sense. I've been thinking more about image-based geometry representations as well, and have a fuzzy feeling there's some nice structure out there that lends itself to both. |
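The coverage-buffer test sketched in this last post can be made concrete. The code below is my own illustration, not anything from the thread: a reduced-resolution grid stores, per cell, a depth such that anything farther is known to be hidden (initialised to "infinitely far", meaning nothing is hidden yet); occluders are inserted coarsely using their farthest depth; and an object is rejected only if the nearest depth of its screen rectangle is behind the stored depth in every cell it touches. The grid size and the larger-z-is-farther convention are assumptions.

```c
#include <stdbool.h>

#define CB_W 160   /* e.g. a quarter of a 640x480 display */
#define CB_H 120

/* Per-cell depth threshold: anything with z greater than maxz[y][x]
   is hidden in that cell. */
typedef struct {
    float maxz[CB_H][CB_W];
} CoverageBuffer;

void cb_clear(CoverageBuffer *cb, float far_z)
{
    for (int y = 0; y < CB_H; ++y)
        for (int x = 0; x < CB_W; ++x)
            cb->maxz[y][x] = far_z;   /* nothing hidden yet */
}

/* Coarsely "draw" an occluder over cells [x0..x1]x[y0..y1] using its
   FARTHEST depth zmax, which keeps the buffer conservative.  The rect
   must only include cells the occluder covers completely; partially
   covered border cells must be shrunk away by the caller. */
void cb_insert(CoverageBuffer *cb, int x0, int y0, int x1, int y1,
               float zmax)
{
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (zmax < cb->maxz[y][x])
                cb->maxz[y][x] = zmax;
}

/* Conservative query: the object (screen rect, NEAREST depth zmin) is
   definitely hidden only if it lies behind the recorded occluder depth
   in every cell it touches. */
bool cb_hidden(const CoverageBuffer *cb, int x0, int y0, int x1, int y1,
               float zmin)
{
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (zmin <= cb->maxz[y][x])
                return false;   /* could be in front somewhere */
    return true;
}
```

Rendering occluders near-to-far into `cb_insert` and testing each later object with `cb_hidden` gives exactly the one-sided error the post accepts: a visible object is never culled, while a hidden one may occasionally be drawn due to the coarse resolution.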