Re: [Algorithms] VIPM With T&L - what about roam

Tom,

So the picture you and Charles are painting is this (see if
I get this right):

1) send vertex attribs (x,y,z,u,v,...) and index arrays
across the AGP bus each frame, and let the textures and
frame buffer dominate the on-card memory bandwidth.
If all your lighting is done in textures/normal maps, and
if you use tri bintree meshes per surface "patch", then
the info per vert is (x,y,z): 3 floats=12 bytes, plus
(u,v): 2 shorts=4 bytes, total=16 bytes.  Since you are
sending each vert across the AGP bus, and there is only
a dinky little cache on the other end, you have to be very
careful to arrange the order the verts are indexed to
avoid sending them multiple times.  This is of course
dependent on what the hardware's replacement
strategy is and what the cache size is.
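
A rough way to see how much the index order matters is to
simulate the cache. The sketch below assumes a FIFO
replacement policy, and the function names and the grid
mesh are made up for illustration; a real card may behave
differently.

```python
from collections import deque

def count_vertex_sends(indices, cache_size):
    """Count vertices sent across the bus for an index stream.

    Assumes a FIFO vertex cache (an assumption, not a statement
    about any particular hardware).
    """
    cache = deque(maxlen=cache_size)
    sends = 0
    for i in indices:
        if i not in cache:
            sends += 1
            cache.append(i)  # oldest entry drops out when the cache is full
    return sends

def grid_tris(w, h):
    """Index list for a w x h grid of quads (2 tris each), row-major order."""
    tris = []
    for y in range(h):
        for x in range(w):
            a = y * (w + 1) + x   # top-left vert of this quad
            b = a + 1             # top-right
            c = a + (w + 1)       # bottom-left
            d = c + 1             # bottom-right
            tris += [a, b, c, b, d, c]
    return tris

if __name__ == "__main__":
    idx = grid_tris(32, 32)
    print("unique verts:", len(set(idx)))
    for size in (8, 16, 64, 2048):
        print("cache", size, "-> sends", count_vertex_sends(idx, size))
```

With a cache as large as the mesh each vert is sent once.
With only a few entries, the same row-major order sends
the verts of each shared row again on the next row.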

2) In the simple VIPM scheme, just send three indices
per tri.  Verts and tris are listed in the arrays in
progressive split order, and the indices need to be
updated for triangles whose vertices moved in any splits
introduced in the frame, which you precompute as a kind
of table lookup of state-change lists.  These changes
will generally be scattered through index memory,
leading to bad PC-memory cache behavior.  But you
hope that only a tiny fraction of the indices need
to be updated per frame (this is a very ROAM-like
hope ;-).  The state-change info takes 14 bytes
per vert according to Charles' web page, so you
are almost doubling the mem per vert.  If your
progressive scheme was tri bintree split-only order,
then there is no additional storage for index changes,
you just know what they are based on which diamonds
(which correspond one-to-one with verts) are split
(Charles alluded to this on his VIPM page).
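
To make the state-change-list idea concrete, here is a
minimal sketch. The record layout (SplitRecord, fixups) and
the class name are hypothetical, chosen for readability; this
is not the 14-byte format from Charles' page.

```python
from dataclasses import dataclass, field

@dataclass
class SplitRecord:
    new_tris: int                                # tris that become active
    fixups: list = field(default_factory=list)   # (index_slot, old_vert, new_vert)

class VipmChunk:
    """Index array sized for full res; tris and verts in split order."""

    def __init__(self, indices, records, base_verts, base_tris):
        self.indices = list(indices)
        self.records = records        # one record per vert beyond the base mesh
        self.base_verts = base_verts
        self.num_verts = base_verts
        self.num_tris = base_tris

    def split(self):
        rec = self.records[self.num_verts - self.base_verts]
        for slot, old, new in rec.fixups:
            self.indices[slot] = new  # scattered writes into index memory
        self.num_verts += 1
        self.num_tris += rec.new_tris

    def collapse(self):
        rec = self.records[self.num_verts - 1 - self.base_verts]
        for slot, old, new in rec.fixups:
            self.indices[slot] = old  # undo the split's index changes
        self.num_verts -= 1
        self.num_tris -= rec.new_tris

    def active_indices(self):
        # Only the first num_tris triangles are sent to the card.
        return self.indices[:3 * self.num_tris]

if __name__ == "__main__":
    # Base tri (0,1,2); vert 3 splits edge 1-2 into (0,1,3) and (0,3,2).
    chunk = VipmChunk([0, 1, 2, 0, 3, 2],
                      [SplitRecord(1, [(2, 2, 3)])],
                      base_verts=3, base_tris=1)
    print(chunk.active_indices())
    chunk.split()
    print(chunk.active_indices())
```

The fixup slots are the scattered writes mentioned above. In
a tri bintree split-only order they could be derived from the
diamond instead of stored.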

Per frame index transmission across the AGP bus per
tri is 3 shorts=6 bytes.  So if you are very lucky and
send each vertex (16 bytes) once, then on average you
have 2 tris per vert and so 16+12=28 bytes per vert
including indices.  If you are at "infinite strip optimum"
you get each vertex sent twice, leading to 2*16+12=44
bytes sent per vert including indices.  Let's imagine
you are using a graphics chip capable of 30M tris/sec,
and you want to actually achieve this (ha!): this
would mean pumping 840-1320M bytes/sec
over the bus.  Okay, the bus can handle this
in theory on AGP4x (1GB/sec) on the wildly
optimistic side, but not in any real situation.
Also, this is sucking up a big chunk of your PC
memory system bandwidth *continuously*,
so the rest of your app is going to take a
performance nose-dive.  So...does it make sense
to put some geometry info into graphics-card mem?
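
The arithmetic above can be written out as a small sketch.
The constant and function names are my own; the byte counts
and the 2-tris-per-vert average are the ones used in this
message.

```python
VERT_BYTES = 3 * 4 + 2 * 2     # (x,y,z) as floats + (u,v) as shorts = 16
INDEX_BYTES_PER_TRI = 3 * 2    # three 16-bit indices = 6
TRIS_PER_VERT = 2              # average assumed in the text

def bytes_per_vert(sends_per_vert):
    """Bus bytes per vert, counting resends and the indices of its tris."""
    return sends_per_vert * VERT_BYTES + TRIS_PER_VERT * INDEX_BYTES_PER_TRI

def bus_mbytes_per_sec(verts_per_sec, sends_per_vert):
    """Sustained bus traffic in millions of bytes per second."""
    return verts_per_sec * bytes_per_vert(sends_per_vert) / 1e6

if __name__ == "__main__":
    for sends in (1, 2):
        print(sends, "send(s) per vert:", bytes_per_vert(sends), "bytes;",
              bus_mbytes_per_sec(30e6, sends), "MB/s at 30M verts/sec;",
              bus_mbytes_per_sec(15e6, sends), "MB/s at 15M verts/sec")
```

Note that the 840-1320M figure is what you get from 30M of
these 28-44 byte units per second. If the 30M counts tris
and there really are 2 tris per vert, that is 15M verts/sec
and 420-660M bytes/sec, still a large share of what AGP4x
can deliver in practice.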

Of course the optimistic scenario requires extreme
care in the order the tris are listed and indexed.
Since this is not a single static mesh, you have to
come up with index orders that are best for the
whole range of surfaces you get, not just one.
If you really want to minimize the number of times
verts are sent, you need to allow much more
index manipulation per frame to optimize, whether
via precomputed state-change lists or through
some yet unknown on-the-fly technique.

3) In the "stripped" version of VIPM, cover the chunk
with strips (chosen in a particular way?) and fiddle with
the indices just as in case (2).  But keep drawing the
same strips.  This means you send the index data
for the whole chunk at full res.  This limits how
big a swing in resolution you can have in a chunk
before this cost dominates.  If you are at full res
then the index cost is 1/3 of case (2).  If you are
at 1/3 res then the cost is the same as case (2).
Since you are trying to force strip order, then you
will generally do no better than case (2) for
vertex on-card cache coherence, and probably
worse.  The card has to expend some effort in
theory to eliminate the degenerate tris, although
this could be negligible for a good card/driver.
I don't see this as either a big win or big loss
versus the simple scheme, so I would tend to
go simple.
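
The 1/3 break-even point falls out of the index counts. A
small sketch, assuming one index per tri for the full-res
strips plus 2 extra indices per strip, and 3 indices per
active tri for the list (function and parameter names are
made up):

```python
def list_indices(active_tris):
    """Case (2): three indices for each triangle actually drawn."""
    return 3 * active_tris

def strip_indices(full_res_tris, num_strips=0):
    """Case (3): the full-res strips are always sent; n tris cost n + 2."""
    return full_res_tris + 2 * num_strips

def strip_to_list_ratio(full_res_tris, active_tris, num_strips=0):
    """Index cost of the stripped scheme relative to the simple scheme."""
    return strip_indices(full_res_tris, num_strips) / list_indices(active_tris)

if __name__ == "__main__":
    full = 3000
    for active in (3000, 1500, 1000, 500):
        print(active, "active tris: ratio",
              round(strip_to_list_ratio(full, active), 3))
```

Below 1/3 res the ratio goes above 1, which is the limit on
the resolution swing per chunk mentioned above.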

Of course, you could use the incremental stripping
idea from the ROAM paper, which works on any
locally updating mesh including PM.  Since you
are clearly hoping for coherence in the index-update
step, this is a cheap way to make pretty good strips.
Plus it avoids the issue of losing vertex-cache
coherence for any but the one mesh you optimized for.

--Mark D.

Tom Forsyth wrote:

> Oops. Yes.
>
> Tom Forsyth - Muckyfoot bloke.
> Whizzing and pasting and pooting through the day.
>
> > From: Tony Cox [mailto:to...@mi...]
> >
> > >In practice the extra bandwidth of the indices is pretty tiny.
> > >One DWORD per tri? Peanuts, considering your average fairly
> > >efficient vertex-caching scheme will need to load a whole vertex
> > >(around 32 DWORDS) per tri.
> >
> > You mean 32 BYTEs not DWORDs, right Tom? Still much bigger than
> > the index data though. Your other comments about lists versus
> > strips for VIPM seem right on the money.
> >
> > Tony Cox - DirectX Luminary