Re: [PyOpenGL-Users] 2d sprite engine performance.

My (much more limited than Mike's) performance experience with PyOpenGL
agrees with what he was suggesting.  I found using non-interleaved arrays
with glDrawArrays was the fastest route from my Python code to the screen
on a number of medium-old machines including my PowerBook G4.  As Mike
says, it's critical to build the array once; then just poke your updated X
and Y value into it and then call .tostring() and glDrawArrays each frame.
To make it practical to keep a single array allocated, remember
that you can ignore certain elements of the array just by moving them far
offscreen and letting them be clipped by the hardware.  Though
counterintuitive from a software point of view, that sort of trick often
works well with hardware.
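
Here is a minimal sketch of that idea, using NumPy in place of Numeric. The pool size, the OFFSCREEN value and the helper names place_sprite and hide_sprite are made up for illustration, and the actual glVertexPointer/glDrawArrays calls are left out because they need a live GL context. In current NumPy, .tobytes() is the equivalent of .tostring().

```python
import numpy as np

MAX_SPRITES = 4          # hypothetical pool size, kept tiny for illustration
VERTS_PER_SPRITE = 4     # one quad per sprite
OFFSCREEN = -1.0e6       # assumed to lie far outside the view volume

# Allocate once: one (x, y) row per vertex, float32 to match GL_FLOAT.
# Every slot starts out parked offscreen, i.e. "unused".
vertices = np.full((MAX_SPRITES * VERTS_PER_SPRITE, 2), OFFSCREEN,
                   dtype=np.float32)

# Corner offsets of a unit quad, relative to the sprite's position.
QUAD = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)


def place_sprite(slot, x, y, size=16.0):
    """Overwrite the four vertices of one slot in place (no reallocation)."""
    start = slot * VERTS_PER_SPRITE
    vertices[start:start + VERTS_PER_SPRITE] = QUAD * size + (x, y)


def hide_sprite(slot):
    """'Delete' a sprite by parking its quad far offscreen to be clipped."""
    start = slot * VERTS_PER_SPRITE
    vertices[start:start + VERTS_PER_SPRITE] = OFFSCREEN


place_sprite(0, 100.0, 50.0)
place_sprite(1, 200.0, 80.0)
hide_sprite(1)           # slot 1 is still in the array, just not visible

# Per frame: serialize the same array and hand it to the vertex-array call.
frame_bytes = vertices.tobytes()
```

The array never changes size, so the draw call always submits MAX_SPRITES quads and the hardware discards the parked ones.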

Also, while Python code is slower than native C code, that's usually one
of the last things to fix.  Modern processors (including even your 500MHz
G3 ;-) can execute a lot of Python statements.   If possible, try to learn
how to use the array operations in Numeric to do parallel assignments --
that can often produce order-of-magnitude speedups.
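
As a rough illustration of what I mean by parallel assignment (again with NumPy standing in for Numeric, and with made-up positions, velocities and dt), the loop version and the array version below compute the same update. The array version does it in a single statement that runs in compiled code.

```python
import numpy as np

# Hypothetical per-sprite state: one (x, y) row per sprite.
positions = np.array([[0.0, 0.0], [10.0, 5.0], [20.0, 10.0]], dtype=np.float32)
velocities = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]], dtype=np.float32)
dt = 0.5


def step_loop(pos, vel, dt):
    """Element-by-element update: one Python-level operation per value."""
    out = pos.copy()
    for i in range(len(out)):
        out[i, 0] += vel[i, 0] * dt
        out[i, 1] += vel[i, 1] * dt
    return out


def step_vectorized(pos, vel, dt):
    """Same update as a single array operation, modifying pos in place."""
    pos += vel * dt
    return pos


expected = step_loop(positions, velocities, dt)   # works on a copy
step_vectorized(positions, velocities, dt)        # updates positions in place
```

How much faster the second form is depends on the number of sprites; with only a handful the difference is negligible, so it is worth timing on your own data.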

Anyway, that's my $0.02!

Leo

> Erik Johnson wrote:
>
>>On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher"
>>
>>
>>>Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
>>>*think* is around the age of a TNT)) was to max out around 1000 to 1500
>>>textured, lit polygons/second (games use (fairly advanced) culling
>>>algorithms to reduce the number of polygons on-screen for any given
>>>rendering pass).  Eliminating lighting should increase that to around 2
>>>or 3000 polys, but that was with the old (small) textures that were
>>>heavily reduced (32x32 or 64x64).
>>>
>>>
>>
>>I believe the Rage 128 is roughly equivalent to a TNT2.  I am using
>>mostly 16 * 16 or 32 * 32 textures, and the numbers that you are quoting
>>are what make me surprised to be maxing out at 250 polygons.
>>
>>
> To be clear, that 1000 to 1500 was normally array or display-list
> geometry, *not* individual calls to glVertex (even in C).  I'm talking
> about performance from SGI's (rather well tuned (for a general
> scenegraph engine)) CosmoPlayer scenegraph engine.  With culling it
> would get you down to a few hundred polygons actually getting drawn for
> a frame (with some overdraw).  Though, again, those were textured, lit
> and z-buffered; you'd expect a lot more for unlit and non-z-buffered, so
> 250 is probably very low.
>
>>As I said in another message though, I can do 1400 sprites with a
>>Pentium 4 @2.6ghz using a Matrox G400, which I believe is only slightly
>>faster than a Rage128.  I don't think the video card is my limiting
>>factor at the moment.  CPU speed seems to make all the difference.
>>
>>
> A P4 also likely has a faster AGP and/or PCI bridge to the graphics
> card, but I'm not a hardware guy, so I can't really comment on their
> relative performance.  Still, I think you're probably losing most of
> your time over in Python just now.
>
>>Yes, I can see this approach helping.  I think that it does conflict
>>with my current scheme of grouping my small textures onto a big texture
>>and never changing away from my big texture.  With your approach, I
>>would need to keep my small textures and do texture swaps, but I would
>>greatly reduce my function call overhead.  I guess this is where sorting
>>by texture comes in.  I will have to experiment with this and see how it
>>affects performance.
>>
>>Would creating a unique display list for every sprite be a viable
>>option?
>>
>>
> Yes, keeping in mind the memory overhead required.  Older cards were
> extremely memory-limited.  BTW, you are using Textures, not doing copies
> for each frame, right?  Even if you've got more textures than card
> memory, letting OpenGL handle the back-and-forth swapping of textures is
> likely going to be better for performance than anything you're going to
> do.  i.e. use glBindTexture and glGenTextures, not just bald
> glTexImage2D calls... hmm, you know, it's been so long I'm not even sure
> you *can* use bald glTexImage2D in OpenGL... I think you can because of
> the video-display cases... you'd have to be able to if my understanding
> is correct of common practice there... need to go back to doing raw
> OpenGL coding sometime soon :) .
>
>>>You likely do *not* want to be doing Vertex calls directly
>>>from Python save to generate a display list (as noted above).  Python
>>>just isn't the right tool for that kind of low-level operation, it has
>>>too much per-call overhead.  If you do that kind of thing you should be
>>>using array geometry (and be sure you use exactly the correct array type
>>>for the data-type of the calls you're making to avoid extra copying).
>>>
>>>
>>
>>The problem that I ran into with vertex arrays is that while a single
>>call to drawarrays is faster than all the immediate mode calls, the
>>overhead of building the needed arrays every frame ended up being too
>>great and making things slower.
>>
>>I will try using display lists for individual quads, and hopefully that
>>will help.
>>
>>
> Ah, there's a problem.  You'd want to keep an array handy, with each
> sprite knowing its index and updating the array directly, so a sprite's
> move command would look like:
>
>     self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta
>
> (where delta would be a simple tuple), allowing the array to handle
> updates in Numpy code.
>
> You'd want to use the contiguous() function from PyOpenGL whenever you
> resize the array, hence the need for the getSpriteVectors level of
> indirection.  Goal there is that you don't *build* the array for each
> frame (lots of memory copying), but just update it in-place.  You have
> to watch out for rotation problems with that approach, however.  Might
> want special code to watch for and fix skew when rotations are in play
> for a given sprite.  You still pay the copy penalty for the array going
> over the bus to the card, but at least you're not allocating and
> de-allocating thousands of Python object references to rebuild the array
> each frame.
>
> Honestly, though, this kind of code gets messy fast enough that I'd
> avoid it until I'd exhausted the display-list approach.
>
>>>Python is slower than C, but OpenGL has an enormous amount of room to
>>>play.  Using higher-level features from the higher-level language can
>>>make the experience much more rewarding.
>>>
>>>
>>
>>I get the impression that OpenGL can deliver all the speed I want, I
>>just seem to be having problems unlocking that speed.
>>
>>
> Good luck!
> Mike
>
> ________________________________________________
>   Mike C. Fletcher
>   Designer, VR Plumber, Coder
>   http://www.vrplumber.com
>   http://blog.vrplumber.com
>