Re: [PyOpenGL-Users] 2d sprite engine performance.

On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher" wrote:
> Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
> *think* is around the age of a TNT)) was to max out around 1000 to 1500
> textured, lit polygons/second (games use (fairly advanced) culling
> algorithms to reduce the number of polygons on-screen for any given
> rendering pass).  Eliminating lighting should increase that to around 2
> or 3000 polys, but that was with the old (small) textures that were
> heavily reduced (32x32 or 64x64).

I believe the Rage 128 is roughly equivalent to a TNT2.  I am using
mostly 16x16 or 32x32 textures, and the numbers you are quoting are
exactly why I am surprised to be maxing out at 250 polygons.

> I wouldn't be surprised if you're running into texture bandwidth
> problems and maybe even simple fill-rate problems.  You may find that
> the card is extremely sensitive to colour mode for its performance (IIRC
> switching a TNT to 16-bit mode would get close to doubling frame-rates
> on our VR system of the time).

I have heard that older cards don't do 32-bit mode well.  I will look
into 16-bit.

As I said in another message though, I can do 1400 sprites with a
Pentium 4 at 2.6 GHz using a Matrox G400, which I believe is only
slightly faster than a Rage 128.  I don't think the video card is my
limiting factor at the moment.  CPU speed seems to make all the
difference.

> Just to be clear:
> 
>     * You *are* running those glBegin...glEnd blocks as display-lists,
>       not immediate-mode calls, right?
>           o You create a display list holding each sprite to draw (you
>             may only need one, depends on the proportions of the sprites)
>           o Sort the sprites by texture (and by potential overlap
>             (virtual Z ordering))
>           o Load the texture
>           o for (x,y,z),sprite in textureset:
>                 + glTranslated( x,y,z )
>                 + glCallList( sprite )

Interesting, I hadn't thought of this approach.
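
A rough sketch of how that loop might look in Python.  Everything named
here is hypothetical: the sprite tuple layout, batch_sprites,
draw_batches, and the `gl` argument, which is assumed to be any object
exposing the usual PyOpenGL names (for example the OpenGL.GL module).
Sorting by z first and texture second is only one reading of the
suggested ordering, and the push/pop pair is added so that the
translations do not accumulate from sprite to sprite.

```python
from itertools import groupby


def batch_sprites(sprites):
    """Group sprites so each texture is bound once per z layer.

    Each sprite is a hypothetical (texture_id, x, y, z, list_id) tuple.
    Returns a list of (texture_id, [(x, y, z, list_id), ...]) batches,
    ordered back to front by z, then by texture within a layer.
    """
    def key(sprite):
        return (sprite[3], sprite[0])  # (z layer, texture id)

    ordered = sorted(sprites, key=key)
    batches = []
    for (_z, texture_id), group in groupby(ordered, key=key):
        items = [(x, y, z, list_id) for _tex, x, y, z, list_id in group]
        batches.append((texture_id, items))
    return batches


def draw_batches(gl, batches):
    """Issue one texture bind per batch and one list call per sprite."""
    for texture_id, items in batches:
        gl.glBindTexture(gl.GL_TEXTURE_2D, texture_id)
        for x, y, z, list_id in items:
            gl.glPushMatrix()          # keep translations independent
            gl.glTranslated(x, y, z)
            gl.glCallList(list_id)     # precompiled quad
            gl.glPopMatrix()
```

With a live GL context this would be called once per frame, e.g.
draw_batches(OpenGL.GL, batch_sprites(sprites)).  The per-sprite cost
in Python drops to four calls, and the sort only needs redoing when a
sprite changes texture or layer.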

> Python is much slower than equivalent C; to get decent performance you
> do need to use a mechanism that pushes the code down into C.  I normally
> use array geometry myself, but then I normally do 3D work with game-like
> rendering loads.

I understand the principle; I just haven't found a good way to
implement it with large numbers of polys that can move relative to each
other every frame.

> Display lists likely would help if you're currently drawing the polygons
> with run-time calls: create a single "sprite" display list for your
> standard sprite size, call that once for each sprite (after translate
> and texture load) to do the
> glBegin();glVertex(...);glTexCoord(...);glVertex(...);glTexCoord(...);glVertex(...);glTexCoord(...);glVertex(...);glTexCoord(...);glEnd(...)
> and you've just reduced the number of Python calls by a factor of ~10. 
> That *should* have a significant effect on performance.

Yes, I can see this approach helping.  I think it conflicts with my
current scheme, though: I pack my small textures into one big texture
and never bind anything else.  With your approach I would need to keep
the small textures separate and swap between them, but I would greatly
reduce my function call overhead.  I guess this is where sorting by
texture comes in.  I will have to experiment with this and see how it
affects performance.

Would creating a unique display list for every sprite be a viable
option?
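
To make the question concrete, here is one way the big-texture scheme
and display lists might be combined: one list per cell of the big
texture rather than per sprite, with the texture coordinates baked in.
This is only a sketch under assumptions not stated above: the big
texture is a regular grid of equally sized cells, v increases with the
row index, and `gl` is any object exposing the PyOpenGL names (such as
the OpenGL.GL module).  atlas_uv and compile_cell_lists are made-up
names.

```python
def atlas_uv(cell, cols, rows):
    """Return (u0, v0, u1, v1) for a cell of a cols x rows grid atlas."""
    col = cell % cols
    row = cell // cols
    return (col / cols, row / rows, (col + 1) / cols, (row + 1) / rows)


def compile_cell_lists(gl, cols, rows, width, height):
    """Compile one display list per atlas cell; return the base list id.

    The list for cell n is base + n.  Each list draws a width x height
    quad at the origin, so the caller translates before glCallList.
    """
    count = cols * rows
    base = gl.glGenLists(count)
    for cell in range(count):
        u0, v0, u1, v1 = atlas_uv(cell, cols, rows)
        gl.glNewList(base + cell, gl.GL_COMPILE)
        gl.glBegin(gl.GL_QUADS)
        # Each glTexCoord applies to the glVertex that follows it.
        gl.glTexCoord2f(u0, v0)
        gl.glVertex2f(0, 0)
        gl.glTexCoord2f(u1, v0)
        gl.glVertex2f(width, 0)
        gl.glTexCoord2f(u1, v1)
        gl.glVertex2f(width, height)
        gl.glTexCoord2f(u0, v1)
        gl.glVertex2f(0, height)
        gl.glEnd()
        gl.glEndList()
    return base
```

The lists would be compiled once at startup, so the big texture stays
bound for the whole frame and no texture swaps are needed.  Whether
this is actually faster on this hardware is untested.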

> >The other optimizations that I can think to try now are cutting out as
> >many glBegins and glEnds as possible, and do big groups of Vertex calls.
> >Or I can try rendering any sprite that won't move for a few frames to 
> >the background, and work around the problem by cutting down on the
> >number of sprites I have on screen.
>
> Sounds like a lot of extra bitmap bandwidth (re-storing the
> background).

I've been avoiding trying this approach for this exact reason.

> You likely do *not* want to be doing Vertex calls directly
> from Python save to generate a display list (as noted above).  Python
> just isn't the right tool for that kind of low-level operation, it has
> too much per-call overhead.  If you do that kind of thing you should be
> using array geometry (and be sure you use exactly the correct array type
> for the data-type of the calls you're making to avoid extra copying).

The problem I ran into with vertex arrays is that while a single call
to glDrawArrays is faster than all the immediate mode calls, the
overhead of building the needed arrays every frame was large enough to
make things slower overall.
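
One variation that might reduce that overhead is to allocate the array
once and overwrite only the entries for sprites that actually moved,
instead of rebuilding it every frame.  The sketch below is an
assumption-laden illustration, not something measured here: it uses a
ctypes float array as the storage, and make_vertex_buffer, move_sprite
and the 8-floats-per-quad layout are all hypothetical.

```python
import ctypes

FLOATS_PER_SPRITE = 8  # 4 corners, each an (x, y) pair


def make_vertex_buffer(sprite_count):
    """Allocate a zero-filled float buffer for sprite_count quads, once."""
    return (ctypes.c_float * (sprite_count * FLOATS_PER_SPRITE))()


def move_sprite(buf, index, x, y, width, height):
    """Overwrite the four corners of one quad in place."""
    start = index * FLOATS_PER_SPRITE
    buf[start:start + FLOATS_PER_SPRITE] = [
        x, y,                    # bottom-left
        x + width, y,            # bottom-right
        x + width, y + height,   # top-right
        x, y + height,           # top-left
    ]

# With a GL context and the vertex array client state enabled, the
# whole buffer could then be drawn in two calls, roughly:
#   glVertexPointer(2, GL_FLOAT, 0, buf)
#   glDrawArrays(GL_QUADS, 0, 4 * sprite_count)
# Whether PyOpenGL accepts this buffer without copying it is an
# assumption that would need checking.
```

Only sprites that moved pay a Python call per frame; static sprites
cost nothing beyond the single draw call.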

I will try using display lists for individual quads, and hopefully that
will help.

> Python is slower than C, but OpenGL has an enormous amount of room to
> play.  Using higher-level features from the higher-level language can
> make the experience much more rewarding.

I get the impression that OpenGL can deliver all the speed I want; I
just seem to be having problems unlocking that speed.

Thanks for your help,

Erik