Re: [PyOpenGL-Users] 2d sprite engine performance.
From: Mike C. F. <mcf...@ro...> - 2005-04-06 21:55:12
Erik Johnson wrote:

>On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher" wrote:
>
>>Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
>>*think* is around the age of a TNT)) was to max out around 1000 to 1500
>>textured, lit polygons/second (games use (fairly advanced) culling
>>algorithms to reduce the number of polygons on-screen for any given
>>rendering pass). Eliminating lighting should increase that to around
>>2000 or 3000 polys, but that was with the old (small) textures that were
>>heavily reduced (32x32 or 64x64).
>
>I believe the Rage 128 is roughly equivalent to a TNT2. I am using
>mostly 16x16 or 32x32 textures, and the numbers that you are quoting
>are what make me surprised to be maxing out at 250 polygons.

To be clear, that 1000 to 1500 was normally array or display-list
geometry, *not* individual calls to glVertex (even in C). I'm talking
about performance from SGI's (rather well tuned (for a general
scenegraph engine)) CosmoPlayer scenegraph engine. With culling it would
get you down to a few hundred polygons actually getting drawn for a
frame (with some overdraw). Though, again, those were textured, lit and
z-buffered; you'd expect a lot more for unlit and non-z-buffered
rendering, so 250 is probably very low.

>As I said in another message, though, I can do 1400 sprites with a
>Pentium 4 @ 2.6GHz using a Matrox G400, which I believe is only
>slightly faster than a Rage 128. I don't think the video card is my
>limiting factor at the moment. CPU speed seems to make all the
>difference.

A P4 also likely has a faster AGP and/or PCI bridge to the graphics
card, but I'm not a hardware guy, so I can't really comment on their
relative performance. Still, I think you're probably losing most of
your time over in Python just now.

>Yes, I can see this approach helping. I think that it does conflict
>with my current scheme of grouping my small textures onto a big texture
>and never changing away from my big texture.
>With your approach, I would need to keep my small textures and do
>texture swaps, but I would greatly reduce my function call overhead. I
>guess this is where sorting by texture comes in. I will have to
>experiment with this and see how it affects performance.
>
>Would creating a unique display list for every sprite be a viable
>option?

Yes, keeping in mind the memory overhead required. Older cards were
extremely memory-limited.

BTW, you are using texture objects, not doing copies for each frame,
right? Even if you've got more textures than card memory, letting
OpenGL handle the back-and-forth swapping of textures is likely going
to be better for performance than anything you're going to do. I.e. use
glBindTexture and glGenTextures, not just bald glTexImage2D calls...
hmm, you know, it's been so long I'm not even sure you *can* use bald
glTexImage2D in OpenGL... I think you can because of the video-display
cases... you'd have to be able to if my understanding of common
practice there is correct... need to go back to doing raw OpenGL coding
sometime soon :) .

>>You likely do *not* want to be doing Vertex calls directly from Python
>>save to generate a display list (as noted above). Python just isn't
>>the right tool for that kind of low-level operation, it has too much
>>per-call overhead. If you do that kind of thing you should be using
>>array geometry (and be sure you use exactly the correct array type for
>>the data-type of the calls you're making to avoid extra copying).
>
>The problem that I ran into with vertex arrays is that while a single
>call to glDrawArrays is faster than all the immediate-mode calls, the
>overhead of building the needed arrays every frame ended up being too
>great and made things slower.
>
>I will try using display lists for individual quads, and hopefully
>that will help.

Ah, there's a problem.
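(As an aside, the "sorting by texture" idea from Erik's message above could be sketched roughly as follows; `Sprite` and `batch_by_texture` are hypothetical names of my own, not anything from PyOpenGL. The point is just to bind each texture once per frame instead of once per sprite.)

```python
# Sketch: group sprites by texture id so each texture is bound only
# once per frame, minimising glBindTexture calls.  These class/function
# names are illustrative, not from the original post or PyOpenGL.
from itertools import groupby

class Sprite:
    def __init__(self, texture_id, x, y):
        self.texture_id = texture_id
        self.x = x
        self.y = y

def batch_by_texture(sprites):
    """Return a list of (texture_id, [sprites...]) batches."""
    ordered = sorted(sprites, key=lambda s: s.texture_id)
    return [(tex, list(group))
            for tex, group in groupby(ordered, key=lambda s: s.texture_id)]

# With real OpenGL you would then do, per batch:
#   glBindTexture(GL_TEXTURE_2D, tex)
#   ...draw every quad in that batch...
sprites = [Sprite(2, 0, 0), Sprite(1, 5, 5), Sprite(2, 9, 9), Sprite(1, 3, 3)]
batches = batch_by_texture(sprites)
# Four sprites, two textures -> only two texture binds per frame.
```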
You'd want to keep an array handy, with each sprite knowing its index
and updating the array directly, so a sprite's move command would look
like:

    self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

(where delta would be a simple tuple), allowing the array to handle
updates in Numpy code. You'd want to use the contiguous() function from
PyOpenGL whenever you resize the array, hence the need for the
getSpriteVectors level of indirection. The goal there is that you don't
*build* the array for each frame (lots of memory copying), but just
update it in place.

You have to watch out for rotation problems with that approach,
however. You might want special code to watch for and fix skew when
rotations are in play for a given sprite. You still pay the copy
penalty for the array going over the bus to the card, but at least
you're not allocating and de-allocating thousands of Python object
references to rebuild the array each frame. Honestly, though, this kind
of code gets messy fast enough that I'd avoid it until I'd exhausted
the display-list approach.

>>Python is slower than C, but OpenGL has an enormous amount of room to
>>play. Using higher-level features from the higher-level language can
>>make the experience much more rewarding.
>
>I get the impression that OpenGL can deliver all the speed I want, I
>just seem to be having problems unlocking that speed.

Good luck!
Mike

________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com
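(For what it's worth, the in-place update approach described above might look something like this Numpy sketch. `SpriteArray`, `Sprite` and `move` are illustrative names of my own, not PyOpenGL API; the real code would also call contiguous() after any resize and hand the array to glVertexPointer/glDrawArrays.)

```python
# Sketch of the in-place sprite-array update: each sprite knows its
# slice of one big vertex array and mutates it directly, so the array
# is never rebuilt per frame.  All names here are hypothetical.
import numpy as np

class SpriteArray:
    def __init__(self, n_sprites):
        # Four (x, y) corner vertices per quad, one contiguous block.
        self.vectors = np.zeros((n_sprites * 4, 2), dtype=np.float32)

    def getSpriteVectors(self):
        # Level of indirection: always hand back the *current* array,
        # so a resize elsewhere doesn't leave sprites with stale refs.
        return self.vectors

class Sprite:
    def __init__(self, arena, index):
        self.arena = arena
        self.startIndex = index * 4
        self.stopIndex = self.startIndex + 4

    def move(self, delta):
        # In-place update; Numpy broadcasts delta over the four rows.
        self.arena.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

arena = SpriteArray(2)
sprite = Sprite(arena, 1)
sprite.move((3.0, 4.0))   # shifts only this sprite's four vertices
```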