RE: [Algorithms] VIPM With T&L - what about ROAM
From: Tom F. <to...@mu...> - 2000-09-14 11:32:27
> From: Mark Duchaineau [mailto:duc...@ll...]
>
> Hmmm...I just skimmed your web page on VIPM and that explains a lot.
> I am concerned about the three indices per tri rather than what you get
> with strips (for infinitely long strips this gets down to 1). We are talking
> dynamic loads to graphics-card memory in general, as you swap in and
> out chunks, and taking up room on the limited graphics memory, and depending
> on the chip taking additional bandwidth to access those indices. In short,
> I think in general you need to minimize the number of indices that you send
> to the hardware or store there (same for vert coords and texture coords etc).
> The O(number of new verts) applies mainly to PC memory unless you have
> really fine-grained ways to send a few more verts/indices here and there to
> the graphics card. Unless you are assuming that the indices are all getting
> shipped over the AGP bus every frame? Also, how do you manage to
> keep the array contiguous in graphics card memory if you don't allocate
> enough space there to ship the whole thing? But these are quibbles.

In practice the extra bandwidth of the indices is pretty tiny. One DWORD per tri? Peanuts, considering your average fairly efficient vertex-caching scheme will need to load a whole vertex (around 32 DWORDs) per tri.

But versions that use strips have certainly been considered on this list (the "skipstrips" stuff). They in turn suffer because as the number of tris decreases, the number of indices doesn't. With list VIPM, tris that get removed fall off the end of the index list and never get considered; with skipstrips, they are still there - they just become degenerate. The real point of skipstrips is not to save memory/bandwidth, but (a) to increase vertex cache efficiency (which the lists method isn't great at), and (b) to be friendly to bits of hardware (specifically a certain console) that like strips, but not much else.

And yes, on all current implementations of PC hardware, both the indices and the vertices get shipped over the AGP bus every frame. Even the GeForce, which in theory can use video-memory vertex data, in practice doesn't. The reason is that if you put your vertices in video memory, they just fight for precious pixel & texel bandwidth, and your nice wide AGP4x bus sits there idle. There are very few, if any, cases where the D3D driver (which is free to choose where it puts its vertices) decides to put them in video memory. This is also true of the Radeon AFAIK, and likely to stay that way for the foreseeable future.
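To make the "fall off the end" point concrete, the bookkeeping for list VIPM looks something like this. It is a rough sketch with made-up names rather than my actual code: each collapse record knows how many tris it kills (pre-sorted to the end of the active index range) and which surviving index entries need re-pointing at the parent vertex.

    #include <cstdint>
    #include <vector>

    struct Collapse {
        uint16_t kidVertex;              // vertex removed by this collapse
        uint16_t parentVertex;           // vertex it collapses onto
        uint32_t trisRemoved;            // tris pre-sorted to the end of the list
        std::vector<uint32_t> patchList; // index-list positions that used the kid
    };

    struct VipmChunk {
        std::vector<uint16_t> indices;   // full-detail index list, collapse-ordered
        std::vector<Collapse> collapses; // applied in order to reduce detail
        uint32_t numIndices = 0;         // active prefix: draw indices[0..numIndices)
        uint32_t nextCollapse = 0;       // how far down the collapse list we are
    };

    void collapseOne(VipmChunk& c) {
        const Collapse& col = c.collapses[c.nextCollapse++];
        c.numIndices -= col.trisRemoved * 3;   // dead tris just drop off the end
        for (uint32_t pos : col.patchList)
            c.indices[pos] = col.parentVertex; // re-point survivors at the parent
    }

    void splitOne(VipmChunk& c) {              // exact inverse of collapseOne
        const Collapse& col = c.collapses[--c.nextCollapse];
        for (uint32_t pos : col.patchList)
            c.indices[pos] = col.kidVertex;
        c.numIndices += col.trisRemoved * 3;   // revived tris grow the list again
    }

    // Per frame, per chunk: walk towards the target tri count, then submit
    // the active prefix (one DrawIndexedPrimitive-style call per chunk).
    void setDetail(VipmChunk& c, uint32_t targetTris) {
        while (c.numIndices / 3 > targetTris && c.nextCollapse < c.collapses.size())
            collapseOne(c);
        while (c.numIndices / 3 < targetTris && c.nextCollapse > 0)
            splitOne(c);
    }

The per-frame cost is O(collapse records crossed) per chunk, nothing per-tri, which is why the CPU load stays so small.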
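For comparison, the skipstrip approach (after El-Sana et al.'s SkipStrips) keeps the strip at full length at every detail level; a collapse just forwards the dead vertex to its parent, so the affected tris come out with two identical corners and the hardware rejects them as degenerate. Very roughly, again with invented names:

    #include <cstdint>
    #include <vector>

    struct SkipNode {
        uint16_t vertex;    // actual vertex index
        uint16_t parent;    // node this one collapses onto
        bool     collapsed; // has this node been collapsed away?
    };

    // Follow the collapse chain up to a live vertex (the "skip" in skipstrip).
    uint16_t resolve(const std::vector<SkipNode>& nodes, uint16_t n) {
        while (nodes[n].collapsed)
            n = nodes[n].parent;
        return nodes[n].vertex;
    }

    // Rebuild the strip to submit: same number of indices at every detail
    // level - collapsed entries merely produce degenerate tris.
    void buildStrip(const std::vector<SkipNode>& nodes,
                    const std::vector<uint16_t>& stripNodes,
                    std::vector<uint16_t>& out) {
        out.clear();
        out.reserve(stripNodes.size());
        for (uint16_t n : stripNodes)
            out.push_back(resolve(nodes, n));
    }

Note the strip you submit never shrinks - exactly the index-count cost mentioned above - but in exchange you get strip-order vertex reuse and something strip-only hardware is happy with.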
> The term VIPM is misleading and restrictive--the concept deserves better.
> The idea really has to do with sending things *within* chunks in a "static"
> progressive order, taking advantage of caching effects to allow only
> an array-range update or trickle of geometry each frame, and relying on some
> other VD processing at the macro level. The PM is the restrictive
> part--you can do this with *any* progressive stream that doesn't move
> vertices. The results of doing a split-only ROAM without screen-projecting
> the error produce such a progressive sequence, as does the more general
> queue-based refinement of a DAG-style hierarchy. VIPM is misleading in
> that it makes one think "view independent" for the big picture, which is
> not the intention. It is a progressive vertex array, really, and you would
> like to have a few thousand of them active, with a few hundred verts each.
> Ideally. I'd be curious what current graphics hardware can handle.

Indeed - the "VI" bit only refers to a small chunk at a time (I aim for around a thousand tris or so per batch). I am currently getting around 2-3 million tris/sec on a GeForce1 + Athlon 600, though my vertex cache efficiency is currently poor, and the GeForce1 is known to have ... um ... "issues" with indexed lists (I believe the driver has to do some work before sending them down to the chip - this is unique to the GeForce1; it's fixed in the GF2, and indeed in every other chip of the past few years). I really need to try the code on a GF2 or Radeon to see what the real speed is. But I do know that the CPU load due to the processing I have to do because of the VIPM is minuscule - the current bottleneck is the driver working around the hardware's "features".

Using skipstrips would probably be much better for a GeForce1, since it can handle indexed strips natively, but I haven't found the time to do that yet.

> --Mark D.

Tom Forsyth - Muckyfoot bloke.
Whizzing and pasting and pooting through the day.