RE: [Algorithms] VIPM With T&L - what about roam

Let me be clear - I'm not saying that an app should choose to stream
vertex/index data across the AGP bus. I am saying (a) the app has no choice,
it is determined by the driver - this is true of both OpenGL and D3D, and
(b) this is what real, existing hardware does! So it's a bit of a
non-discussion. :-) Hardware does stream this data across the AGP bus, and
we need to deal with that.

Incidentally, U and V values are floating-point numbers, and so are 4 bytes
long. Yes, we could compress them to 2-byte fixed-point WORDs, but no
hardware does this (sadly).
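As a rough check of what this does to the per-vertex figures in the
quoted message below, here is a minimal sketch. The layouts, the helper
names, and the 16-bit fixed-point packing are illustrative assumptions,
not any real hardware or API format; the 12 bytes of index data per vert
comes from the quoted figures (2 tris per vert, 3 indices per tri, 2
bytes per index).

```python
import struct

# Assumed layouts, for illustration only (little-endian, no padding).
POSITION = "<3f"   # x, y, z as 32-bit floats
UV_FLOAT = "<2f"   # u, v as 32-bit floats (what the hardware takes)
UV_FIXED = "<2H"   # u, v as 16-bit fixed point (hypothetical, unsupported)

# 2 tris per vert * 3 indices per tri * 2 bytes per index.
INDEX_BYTES_PER_VERT = 2 * 3 * 2


def vertex_bytes(uv_format):
    """Size in bytes of one vertex: position plus texture coordinates."""
    return struct.calcsize(POSITION) + struct.calcsize(uv_format)


def bus_bytes_per_vert(uv_format, sends_per_vert):
    """Bytes crossing the bus per vert, including its share of indices."""
    return sends_per_vert * vertex_bytes(uv_format) + INDEX_BYTES_PER_VERT


def pack_uv_fixed(u, v):
    """Quantise u, v in [0, 1] to unsigned 16-bit fixed point."""
    scale = 65535
    return struct.pack(UV_FIXED, round(u * scale), round(v * scale))


if __name__ == "__main__":
    for name, fmt in (("float UVs", UV_FLOAT), ("fixed UVs", UV_FIXED)):
        print(name, vertex_bytes(fmt), "bytes per vert,",
              bus_bytes_per_vert(fmt, 1), "to",
              bus_bytes_per_vert(fmt, 2), "bytes on the bus per vert")
```

With float UVs the vertex is 20 bytes rather than 16, so the per-vert
bus cost becomes 32 to 52 bytes instead of 28 to 44.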

As for skipstrips vs lists (both versions indexed), my feelings about the
two are these:

(a) At full detail, a list will give better vertex cache behaviour (since it
can use fractal-like patterns that scale well to any cache size).

(b) However, the basic list case needs tris to be in discard order, which
almost completely removes this advantage, and in fact makes the vertex
cache performance much worse.

(c) But you could keep the list in vertex-cache-optimal order, and then just
make the binned tris degenerate, instead of dropping them off the end of the
list. This increases index bandwidth, but also improves vertex cache
behaviour, which IMHO is much more important.

(d) Skipstrips are a "no brainer" for some platforms (e.g. GF1, PS2, maybe
some other consoles). They are obviously faster.

(e) Thinking about it, the difference in complexity between the two is
minimal. For skipstrips you need to find optimal strip order, but then for
lists you also need to find an optimal order (you are not as constrained,
but this just makes the combinatorial explosions worse, not better). And for
both, doing the changes is the same code, it's still just changing index
values. So actually I don't think there is any great difference in
complexity.
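
To make (c) and (e) concrete, here is a small sketch of a collapse
applied to an indexed list that is left in its original order, so the
binned tris turn degenerate rather than being dropped. The tiny mesh,
the function names, and the FIFO cache model are assumptions made for
illustration; real hardware caches differ in size and replacement
policy.

```python
from collections import deque


def collapse(indices, binned_vert, kept_vert):
    """Rewrite every reference to binned_vert so it points at kept_vert.

    The list keeps its length and order; only index values change.
    """
    return [kept_vert if i == binned_vert else i for i in indices]


def triangles(indices):
    """Group a flat index list into (a, b, c) triples."""
    return [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]


def is_degenerate(tri):
    """A tri with a repeated index has zero area and draws nothing."""
    a, b, c = tri
    return a == b or b == c or a == c


def cache_misses(indices, cache_size):
    """Count vertex fetches, assuming a simple FIFO vertex cache."""
    cache = deque(maxlen=cache_size)  # oldest entry falls out when full
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
    return misses


if __name__ == "__main__":
    # Hypothetical 3x2 grid of verts, four tris, in a cache-friendly order.
    full = [0, 3, 1,  1, 3, 4,  1, 4, 2,  2, 4, 5]
    reduced = collapse(full, binned_vert=4, kept_vert=1)
    live = [t for t in triangles(reduced) if not is_degenerate(t)]
    print(len(full), len(reduced), len(live))
    print(cache_misses(full, 4), cache_misses(reduced, 4))
```

The index count sent stays at 12 whether four tris or two are live,
which is the extra index bandwidth mentioned in (c), while the order
(and so the cache behaviour) is untouched by the collapse.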

So, skipstrips give better vertex cache behaviour than the basic list
version, but they use the same index bandwidth whatever the actual number of
tris displayed. The modified version of lists, which keeps them in
vertex-cache order, would get better vertex-cache performance, but has the
same poor index bandwidth behaviour.

So when deciding whether to do skipstrips or "skiplists", I guess it's all
down to which one (a) is better supported by hardware and (b) gives the best
vertex-cache performance. Remember that the strip will _not_ have a third
the number of indices of the list - it has to restart strips, turn corners,
etc, and those all bring the number of indices up. Remember too that doubling
the number of indices to get 10% better vertex caching is probably worth it.
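
A quick way to see why the strip does not reach a third of the list's
index count is to count the indices directly. This is only a sketch: the
strip lengths are made up, and the join cost assumes strips are stitched
together with two repeated indices per join, ignoring any extra index
needed to fix winding order.

```python
def list_index_count(num_tris):
    """An indexed list always sends three indices per tri."""
    return 3 * num_tris


def strip_index_count(strip_lengths):
    """Indices for several strips stitched into one with degenerate tris.

    strip_lengths holds the number of tris in each strip. Each strip of
    n tris needs n + 2 indices; each join is assumed to cost 2 more.
    """
    if not strip_lengths:
        return 0
    body = sum(n + 2 for n in strip_lengths)
    joins = 2 * (len(strip_lengths) - 1)
    return body + joins


if __name__ == "__main__":
    tris = 96
    one_strip = strip_index_count([tris])      # idealised single strip
    short_strips = strip_index_count([8] * 12)  # hypothetical 12 strips of 8
    as_list = list_index_count(tris)
    print(one_strip / as_list, short_strips / as_list)
```

With these made-up numbers a single endless strip gets close to a
third, but twelve strips of eight tris already sit at about half the
list's index count.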

I'll have to have a look at the ROAM incremental stripping thing - would be
interesting.

Tom Forsyth - Muckyfoot bloke.
Whizzing and pasting and pooting through the day.

> -----Original Message-----
> From: Mark Duchaineau [mailto:duc...@ll...]
> Sent: 14 September 2000 19:07
> To: gda...@li...
> Subject: Re: [Algorithms] VIPM With T&L - what about roam
> 
> 
> Tom,
> 
> So the picture you and Charles are painting is this (see if
> I get this right):
> 
> 1) send vert attribs (x,y,z,u,v,...) and index arrays
> across the AGP bus each frame, and let the textures and
> frame buffer dominate the on-card memory bandwidth.
> If all your lighting is done in textures/normal maps, and
> if you use tri bintree meshes per surface "patch", then
> the info per vert is (x,y,z): 3 floats=12 bytes, plus
> (u,v): 2 shorts=4 bytes, total=16 bytes.  Since you are
> sending each vert across the AGP bus, and there is only
> a dinky little cache on the other end, you have to be very
> careful to arrange the order the verts are indexed to
> avoid sending them multiple times.  This is of course
> dependent on what the hardware's replacement
> strategy is and what the cache size is.
> 
> 2) In the simple VIPM scheme, just send three indices
> per tri, with verts and tris listed in the arrays in
> progressive split order.  The indices need to be updated
> for triangles whose vertices moved in any splits
> introduced in the frame, which you precompute as a kind
> of table lookup of state-change lists.  These changes
> will generally be scattered through index memory,
> leading to bad PC-memory cache behavior.  But you
> hope that only a tiny fraction of the indices need
> to be updated per frame (this is a very ROAM-like
> hope ;-).  The state-change info takes 14 bytes
> per vert according to Charles' web page, so you
> are almost doubling the mem per vert.  If your
> progressive scheme was tri bintree split-only order,
> then there is no additional storage for index changes,
> you just know what they are based on which diamonds
> (which correspond one-to-one with verts) are split
> (Charles alluded to this on his VIPM page).
> 
> Per frame index transmission across the AGP bus per
> tri is 3 shorts=6 bytes.  So if you are very lucky and
> send each vertex (16 bytes) once, then on average you
> have 2 tris per vert and so 16+12=28 bytes per vert
> including indices.  If you are at "infinite strip optimum"
> you get each vertex sent twice, leading to 2*16+12=44
> bytes sent per vert including indices.  Let's imagine
> you are using a graphics chip capable of 30M tris/sec,
> and you want to actually achieve this (ha!): this
> would mean pumping 840-1320M bytes/sec
> over the bus.  Okay, the bus can handle this
> in theory on AGP4x (1GB/sec) on the wildly
> optimistic side, but not in any real situation.
> Also, this is sucking up a big chunk of your PC
> memory system bandwidth *continuously*,
> so the rest of your app is going to take a
> performance nose-dive.  So...does it make sense
> to put some geometry info into graphics-card mem?
> 
> Of course the optimistic scenario requires extreme
> care in the order the tris are listed and indexed.
> Since this is not a single static mesh, you have to
> come up with index orders that are best for the
> whole range of surfaces you get, not just one.
> If you really want to minimize the number of times
> verts are sent, you need to allow much more
> index manipulation per frame to optimize, whether
> via precomputed state-change lists or through
> some yet unknown on-the-fly technique.
> 
> 3) In the "stripped" version of VIPM, cover the chunk
> with strips (chosen in a particular way?) and fiddle with
> the indices just as in case (2).  But keep drawing the
> same strips.  This means you send the index data
> for the whole chunk at full res.  This limits how
> big a swing in resolution you can have in a chunk
> before this cost dominates.  If you are at full res
> then the index cost is 1/3 of case (2).  If you are
> at 1/3 res then the cost is the same as case (2).
> Since you are trying to force strip order, then you
> will generally do no better than case (2) for
> vertex on-card cache coherence, and probably
> worse.  The card has to expend some effort in
> theory to eliminate the degenerate tris, although
> this could be negligible for a good card/driver.
> I don't see this as either a big win or big loss
> versus the simple scheme, so I would tend to
> go simple.
> 
> Of course, you could use the incremental stripping
> idea from the ROAM paper, which works on any
> locally updating mesh including PM.  Since you
> are clearly hoping for coherence in the index-update
> step, this is a cheap way to make pretty good strips.
> Plus it avoids the issue of losing vertex-cache
> coherence for any but the one mesh you optimized for.
> 
> --Mark D.