My (much more limited than Mike's) performance experience with PyOpenGL
agrees with what he was suggesting. I found using non-interleaved arrays
with glDrawArrays was the fastest route from my Python code to the screen
on a number of medium-old machines including my PowerBook G4. As Mike
says, it's critical to build the array once; after that, just poke your
updated X and Y values into it and call .tostring() and glDrawArrays each
frame. To make keeping a single array allocated more practical, remember
that you can ignore certain elements of the array just by moving them far
offscreen and letting them be clipped by the hardware. Though
counterintuitive from a software point of view, that sort of trick often
works well with hardware.
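
A rough sketch of the allocate-once idea, with some assumptions: it uses NumPy where this thread uses Numeric, `.tobytes()` where the post says `.tostring()` (that is the current NumPy spelling), and the names `MAX_SPRITES`, `OFFSCREEN`, `place` and `hide` are made up for illustration. The actual GL calls are left as a comment since they need a live context.

```python
import numpy as np

MAX_SPRITES = 4      # hypothetical fixed capacity, chosen up front
OFFSCREEN = 1.0e6    # hypothetical coordinate far outside any viewport

# Allocate once: 4 vertices per quad, 2 floats (x, y) per vertex.
vertices = np.zeros((MAX_SPRITES, 4, 2), dtype=np.float32)

# Corners of a unit quad, relative to the sprite's position.
QUAD = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)

def place(slot, x, y):
    """Poke a sprite's new position into its preallocated slot."""
    vertices[slot] = QUAD + (x, y)

def hide(slot):
    """'Delete' a sprite by parking its quad far offscreen."""
    vertices[slot] = OFFSCREEN

place(0, 10.0, 20.0)
place(1, 30.0, 40.0)
hide(2)
hide(3)

# Each frame: the same array object is serialized and handed to GL.
frame_bytes = vertices.tobytes()
# With a GL context, the vertex pointer setup and
# glDrawArrays(GL_QUADS, 0, MAX_SPRITES * 4) would go here.
```

The array never changes size, so hidden sprites still cost a little bus traffic, but nothing gets reallocated or rebuilt per frame.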
Also, while Python code is slower than native C code, that's usually one
of the last things to fix. Modern processors (including even your 500MHz
G3 ;-) can execute a lot of Python statements. If possible, try to learn
how to use the array operations in Numeric to do parallel assignments --
that can often produce order-of-magnitude speedups.
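
To illustrate what I mean by parallel assignment, here is a small sketch. It assumes NumPy standing in for Numeric, and the `positions`, `velocities` and `dt` names are hypothetical. Both functions compute the same update; the second does it in one array statement instead of a Python loop.

```python
import numpy as np

positions = np.zeros((1000, 2), dtype=np.float32)   # x, y per sprite
velocities = np.ones((1000, 2), dtype=np.float32)   # hypothetical velocities
dt = 0.5

def step_loop(pos, vel, dt):
    """One Python-level statement per coordinate per sprite."""
    for i in range(len(pos)):
        pos[i, 0] += vel[i, 0] * dt
        pos[i, 1] += vel[i, 1] * dt

def step_array(pos, vel, dt):
    """One array statement updates every sprite in place."""
    pos += vel * dt

a = positions.copy()
step_loop(a, velocities, dt)

b = positions.copy()
step_array(b, velocities, dt)
```

How much faster the array version runs depends on the machine and the array size, so it is worth timing on your own data.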
Anyway, that's my $0.02!
> Erik Johnson wrote:
>>On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher"
>>>Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
>>>*think* is around the age of a TNT)) was to max out around 1000 to 1500
>>>textured, lit polygons/second (games use (fairly advanced) culling
>>>algorithms to reduce the number of polygons on-screen for any given
>>>rendering pass). Eliminating lighting should increase that to around 2
>>>or 3000 polys, but that was with the old (small) textures that were
>>>heavily reduced (32x32 or 64x64).
>>I believe the Rage 128 is roughly equivalent to a TNT2. I am using
>>mostly 16 * 16 or 32 * 32 textures, and the numbers that you are quoting
>>are what make me surprised to be maxing out at 250 polygons.
> To be clear, that 1000 to 1500 was normally array or display-list
> geometry, *not* individual calls to glVertex (even in C). I'm talking
> about performance from SGI's (rather well tuned (for a general
> scenegraph engine)) CosmoPlayer scenegraph engine. With culling it
> would get you down to a few hundred polygons actually getting drawn for
> a frame (with some overdraw). Though, again, those were textured, lit
> and z-buffered; you'd expect a lot more for unlit and non-z-buffered, so
> 250 is probably very low.
>>As I said in another message though, I can do 1400 sprites with a
>>Pentium 4 @2.6ghz using a Matrox G400, which I believe is only slightly
>>faster than a Rage128. I don't think the video card is my limiting
>>factor at the moment. CPU speed seems to make all the difference.
> A P4 also likely has a faster AGP and/or PCI bridge to the graphics
> card, but I'm not a hardware guy, so I can't really comment on their
> relative performance. Still, I think you're probably losing most of
> your time over in Python just now.
>>Yes, I can see this approach helping. I think that it does conflict
>>with my current scheme of grouping my small textures onto a big texture
>>and never changing away from my big texture. With your approach, I
>>would need to keep my small textures and do texture swaps, but I would
>>greatly reduce my function call overhead. I guess this is where sorting
>>by texture comes in. I will have to experiment with this and see how it
>>Would creating a unique display list for every sprite be a viable
> Yes, keeping in mind the memory overhead required. Older cards were
> extremely memory-limited. BTW, you are using Textures, not doing copies
> for each frame, right? Even if you've got more textures than card
> memory, letting OpenGL handle the back-and-forth swapping of textures is
> likely going to be better for performance than anything you're going to
> do. i.e. use glBindTexture and glGenTextures, not just bald
> glTexImage2D calls... hmm, you know, it's been so long I'm not even sure
> you *can* use bald glTexImage2D in OpenGL... I think you can because of
> the video-display cases... you'd have to be able to if my understanding
> is correct of common practice there... need to go back to doing raw
> OpenGL coding sometime soon :) .
>>>You likely do *not* want to be doing Vertex calls directly
>>>from Python save to generate a display list (as noted above). Python
>>>just isn't the right tool for that kind of low-level operation, it has
>>>too much per-call overhead. If you do that kind of thing you should be
>>>using array geometry (and be sure you use exactly the correct array type
>>>for the data-type of the calls you're making to avoid extra copying).
>>The problem that I ran into with vertex arrays is that while a single
>>call to drawarrays is faster than all the immediate mode calls, the
>>overhead of building the needed arrays every frame ended up being too
>>great and making things slower.
>>I will try using display lists for individual quads, and hopefully that
> Ah, there's a problem. You'd want to keep an array handy, with each
> sprite knowing its index and updating the array directly, so a sprite's
> move command would look like:
> self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta
> (where delta would be a simple tuple), allowing the array to handle
> updates in Numpy code.
> You'd want to use the contiguous() function from PyOpenGL whenever you
> resize the array, hence the need for the getSpriteVectors level of
> indirection. Goal there is that you don't *build* the array for each
> frame (lots of memory copying), but just update it in-place. You have
> to watch out for rotation problems with that approach, however. Might
> want special code to watch for and fix skew when rotations are in play
> for a given sprite. You still pay the copy penalty for the array going
> over the bus to the card, but at least you're not allocating and
> de-allocating thousands of Python object references to rebuild the array
> each frame.
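
A sketch of how the index-per-sprite scheme above might look. It assumes NumPy in place of Numeric, and the `SpriteBatch` and `Sprite` classes are hypothetical; only `getSpriteVectors`, `startIndex` and `stopIndex` come from the quoted line. The resize and contiguous() handling is left out.

```python
import numpy as np

class SpriteBatch:
    """Owns the one shared vertex array (hypothetical helper class)."""

    def __init__(self, capacity):
        # 4 vertices per sprite, 2 floats (x, y) per vertex.
        self._vectors = np.zeros((capacity * 4, 2), dtype=np.float32)

    def getSpriteVectors(self):
        # Indirection so the array could be swapped out on a resize
        # without every sprite holding a stale reference.
        return self._vectors

class Sprite:
    def __init__(self, batch, slot):
        self.batch = batch
        self.startIndex = slot * 4
        self.stopIndex = self.startIndex + 4

    def move(self, delta):
        # In-place slice update: no per-frame array rebuild.
        self.batch.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

batch = SpriteBatch(capacity=2)
sprite = Sprite(batch, slot=1)
sprite.move((1.0, 2.0))
```

Only the four rows owned by the sprite change; the rest of the array is untouched.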
> Honestly, though, this kind of code gets messy fast enough that I'd
> avoid it until I'd exhausted the display-list approach.
>>>Python is slower than C, but OpenGL has an enormous amount of room to
>>>play. Using higher-level features from the higher-level language can
>>>make the experience much more rewarding.
>>I get the impression that OpenGL can deliver all the speed I want, I
>>just seem to be having problems unlocking that speed.
> Good luck!
> Mike C. Fletcher
> Designer, VR Plumber, Coder