RE: [Algorithms] VIPM With T&L - what about roam

> From: Charles Bloom [mailto:cb...@cb...]
> 
[snip]
> >2) In the simple VIPM scheme, just send three indices
> >per tri, and verts and tris are listed in the arrays in
> >progressive split order, the indices need to be updated
> >for triangles whose vertices moved in any splits
> >introduced in the frame, which you precompute as a kind
> >of table lookup of state-change lists.  These changes
> >will generally be scattered through index memory,
> >leading to bad PC-memory cache behavior.  But you
> >hope that only a tiny fraction of the indices need
> >to be updated per frame (this is a very ROAM-like
> >hope ;-).
> 
> This is actually not a problem for the CPU, because you
> never ever read from the index list.  That means all you
> ever do is writes, and you can just fire them off, and
> they get retired asynchronously, and the CPU never stalls.
> If you want to get fancy you can even tell the CPU cache
> not to mirror that memory in cache, just write straight
> to main memory.

In DX8, there are "index buffers" that can be (and usually will be) placed
in AGP memory, so this behaviour will apply automagically - AGP memory is
uncached, and is written to using the writeback queue. I am sure there will
be similar optimisations under OpenGL if possible.

Yes, the data structures being read (the collapse/expand data) are strictly
linear, and so cache well. At the moment, the destination is in system
memory and is accessed fairly randomly, and so caches poorly (sadly, on x86
architectures, the memory is read into the cache even if you only ever
write to it), but this is a feature of the API rather than the hardware, and
once index buffers move to AGP memory (DX8 is released in about a month or
two), this will be sorted.
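
To make the access pattern concrete, here is a minimal sketch of the kind of update loop I mean. The struct layout and all the names (IndexFixup, SplitRecord, VipmMesh, setLevel) are made up for illustration and are not from any real engine or API. It assumes each fixup stores both the collapsed and the expanded index value, so the code reads the fixup table linearly and only ever writes to the index list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one index-list entry that changes when a split is
// applied or undone. Both values are stored so the index list is never read.
struct IndexFixup {
    std::uint32_t offset;          // position in the index list to overwrite
    std::uint16_t collapsedValue;  // value to write when the split is undone
    std::uint16_t expandedValue;   // value to write when the split is applied
};

// Hypothetical record: one vertex split, owning a contiguous run of fixups.
struct SplitRecord {
    std::uint32_t firstFixup;  // first entry in VipmMesh::fixups
    std::uint32_t fixupCount;  // number of entries belonging to this split
    std::uint32_t trisAdded;   // triangles appended to the active prefix
};

struct VipmMesh {
    std::vector<SplitRecord> splits;  // in progressive split order
    std::vector<IndexFixup>  fixups;  // read strictly front to back (or back to front)
    std::uint32_t activeSplits = 0;   // how many splits are currently applied
    std::uint32_t activeTris   = 0;   // triangles to draw from the start of the list
};

// Move the mesh to `target` applied splits and return the number of
// triangles to draw. Reads of splits/fixups are linear; the writes to
// `indices` are scattered, but they are writes only.
std::uint32_t setLevel(VipmMesh& m, std::uint16_t* indices, std::uint32_t target)
{
    if (target > m.splits.size())
        target = static_cast<std::uint32_t>(m.splits.size());

    // Expand: apply splits in order.
    while (m.activeSplits < target) {
        const SplitRecord& s = m.splits[m.activeSplits];
        for (std::uint32_t i = 0; i < s.fixupCount; ++i) {
            const IndexFixup& f = m.fixups[s.firstFixup + i];
            indices[f.offset] = f.expandedValue;
        }
        m.activeTris += s.trisAdded;
        ++m.activeSplits;
    }

    // Collapse: undo splits in reverse order.
    while (m.activeSplits > target) {
        --m.activeSplits;
        const SplitRecord& s = m.splits[m.activeSplits];
        for (std::uint32_t i = 0; i < s.fixupCount; ++i) {
            const IndexFixup& f = m.fixups[s.firstFixup + i];
            indices[f.offset] = f.collapsedValue;
        }
        m.activeTris -= s.trisAdded;
    }
    return m.activeTris;
}
```

Storing both values costs a bit of table space, but it is what keeps the loop write-only on the destination, which matters once the destination is uncached memory.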

> The wonderful Athlon chip can have 32
> outstanding queue'd stores, so this is no problemo.  The
> problem would only arise if you write and read from the same
> data structure, which we would never do.  Note that you must,
> however, wait a while before rendering after you make your
> index changes, or your AGP DMA will stall on the CPU stores
> finishing.  If this were the bottleneck, we'd be golden.

Since drivers typically do around a frame's worth of data buffering anyway,
this is almost never a problem. The writes will be finished well before the
chip needs them.

[snip]

> For example, consider the "skip strips" of El-Sana et. al.

Whoops - just realised that in my previous mails, I've been using
"skipstrips" in an ambiguous way. What I mean is "VIPM skipstrips", i.e.
VIPM strips that just keep the existing strip, but make some tris in the
middle of it degenerate when they collapse an edge. I don't mean that the
V_D_PM part of El-Sana et al. is used, just the trick of using strips.
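
As a toy illustration of the "keep the strip, let tris go degenerate" idea, here is a rough sketch. The function names are invented for this mail. Note that collapseInStrip searches the strip, which a real VIPM implementation would not do (it would use precomputed fixups and never read the index list); it is only here to show what happens to the strip.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustration only: rewrite every reference to vertex `from` as `to`,
// leaving the strip the same length.
void collapseInStrip(std::vector<std::uint16_t>& strip,
                     std::uint16_t from, std::uint16_t to)
{
    for (std::uint16_t& index : strip) {
        if (index == from)
            index = to;
    }
}

// Count the triangles in a strip that still have three distinct indices.
// Triangles with a repeated index are degenerate and cover no area.
std::size_t countLiveTris(const std::vector<std::uint16_t>& strip)
{
    std::size_t live = 0;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        const std::uint16_t a = strip[i];
        const std::uint16_t b = strip[i + 1];
        const std::uint16_t c = strip[i + 2];
        if (a != b && b != c && a != c)
            ++live;
    }
    return live;
}
```

For example, the strip 0,1,2,3,4 holds three triangles; after collapsing vertex 3 onto vertex 2 it is still five indices long, but only (0,1,2) has any area left.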

[snip]

> Charles Bloom           www.cbloom.com

Tom Forsyth - Muckyfoot bloke.
Whizzing and pasting and pooting through the day.