Re: [PyOpenGL-Users] 2d sprite engine performance.
From: <le...@st...> - 2005-04-06 22:27:17
My (much more limited than Mike's) performance experience with PyOpenGL agrees with what he was suggesting. I found that using non-interleaved arrays with glDrawArrays was the fastest route from my Python code to the screen on a number of medium-old machines, including my PowerBook G4.

As Mike says, it's critical to build the array once; then just poke your updated X and Y values into it and call .tostring() and glDrawArrays each frame. To make keeping a single array allocated more practical, remember that you can ignore certain elements of the array just by moving them far offscreen and letting them be clipped by the hardware. Though counterintuitive from a software point of view, that sort of trick often works well with hardware.

Also, while Python code is slower than native C code, that's usually one of the last things to fix. Modern processors (including even your 500MHz G3 ;-) can execute a lot of Python statements. If possible, try to learn how to use the array operations in Numeric to do parallel assignments -- that can often produce order-of-magnitude speedups.

Anyway, that's my $0.02!

Leo

> Erik Johnson wrote:
>
>>On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher"
>>
>>>Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
>>>*think* is around the age of a TNT)) was to max out around 1000 to 1500
>>>textured, lit polygons/second (games use (fairly advanced) culling
>>>algorithms to reduce the number of polygons on-screen for any given
>>>rendering pass). Eliminating lighting should increase that to around 2
>>>or 3000 polys, but that was with the old (small) textures that were
>>>heavily reduced (32x32 or 64x64).
>>>
>>
>>I believe the Rage 128 is roughly equivalent to a TNT2. I am using
>>mostly 16 * 16 or 32 * 32 textures, and the numbers that you are quoting
>>are what make me surprised to be maxing out at 250 polygons.
>>
> To be clear, that 1000 to 1500 was normally array or display-list
> geometry, *not* individual calls to glVertex (even in C). I'm talking
> about performance from SGI's (rather well tuned (for a general
> scenegraph engine)) CosmoPlayer scenegraph engine. With culling it
> would get you down to a few hundred polygons actually getting drawn for
> a frame (with some overdraw). Though, again, those were textured, lit
> and z-buffered; you'd expect a lot more for unlit and non-z-buffered, so
> 250 is probably very low.
>
>>As I said in another message though, I can do 1400 sprites with a
>>Pentium 4 @2.6ghz using a Matrox G400, which I believe is only slightly
>>faster than a Rage128. I don't think the video card is my limiting
>>factor at the moment. CPU speed seems to make all the difference.
>>
> A P4 also likely has a faster AGP and/or PCI bridge to the graphics
> card, but I'm not a hardware guy, so I can't really comment on their
> relative performance. Still, I think you're probably losing most of
> your time over in Python just now.
>
>>Yes, I can see this approach helping. I think that it does conflict
>>with my current scheme of grouping my small textures onto a big texture
>>and never changing away from my big texture. With your approach, I
>>would need to keep my small textures and do texture swaps, but I would
>>greatly reduce my function call overhead. I guess this is where sorting
>>by texture comes in. I will have to experiment with this and see how it
>>affects performance.
>>
>>Would creating a unique display list for every sprite be a viable
>>option?
>>
> Yes, keeping in mind the memory overhead required. Older cards were
> extremely memory-limited. BTW, you are using Textures, not doing copies
> for each frame, right? Even if you've got more textures than card
> memory, letting OpenGL handle the back-and-forth swapping of textures is
> likely going to be better for performance than anything you're going to
> do. i.e.
> use glBindTexture and glGenTextures, not just bald
> glTexImage2D calls... hmm, you know, it's been so long I'm not even sure
> you *can* use bald glTexImage2D in OpenGL... I think you can because of
> the video-display cases... you'd have to be able to if my understanding
> is correct of common practice there... need to go back to doing raw
> OpenGL coding sometime soon :) .
>
>>>You likely do *not* want to be doing Vertex calls directly
>>>from Python save to generate a display list (as noted above). Python
>>>just isn't the right tool for that kind of low-level operation, it has
>>>too much per-call overhead. If you do that kind of thing you should be
>>>using array geometry (and be sure you use exactly the correct array type
>>>for the data-type of the calls you're making to avoid extra copying).
>>>
>>
>>The problem that I ran into with vertex arrays is that while a single
>>call to drawarrays is faster than all the immediate mode calls, the
>>overhead of building the needed arrays every frame ended up being too
>>great and making things slower.
>>
>>I will try using display lists for individual quads, and hopefully that
>>will help.
>>
> Ah, there's a problem. You'd want to keep an array handy, with each
> sprite knowing its index and updating the array directly, so a sprite's
> move command would look like:
>
>     self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta
>
> (where delta would be a simple tuple), allowing the array to handle
> updates in Numpy code.
>
> You'd want to use the contiguous() function from PyOpenGL whenever you
> resize the array, hence the need for the getSpriteVectors level of
> indirection. Goal there is that you don't *build* the array for each
> frame (lots of memory copying), but just update it in-place. You have
> to watch out for rotation problems with that approach, however. Might
> want special code to watch for and fix skew when rotations are in play
> for a given sprite.
> You still pay the copy penalty for the array going
> over the bus to the card, but at least you're not allocating and
> de-allocating thousands of Python object references to rebuild the array
> each frame.
>
> Honestly, though, this kind of code gets messy fast enough that I'd
> avoid it until I'd exhausted the display-list approach.
>
>>>Python is slower than C, but OpenGL has an enormous amount of room to
>>>play. Using higher-level features from the higher-level language can
>>>make the experience much more rewarding.
>>>
>>
>>I get the impression that OpenGL can deliver all the speed I want, I
>>just seem to be having problems unlocking that speed.
>>
> Good luck!
> Mike
>
> ________________________________________________
> Mike C. Fletcher
> Designer, VR Plumber, Coder
> http://www.vrplumber.com
> http://blog.vrplumber.com
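
For reference, here is a rough sketch of the keep-one-array idea discussed in this thread: allocate the vertex array once, let each sprite update its own slice in place, park unused sprites far offscreen, and move everything at once with array operations. It assumes NumPy as a stand-in for Numeric. SpritePool, Sprite, move_all, frame_bytes and OFFSCREEN are hypothetical names made up for illustration, not part of PyOpenGL, and the GL calls are only mentioned in comments because they need a live context. Treat it as one possible layout, not the definitive one.

```python
import numpy as np

OFFSCREEN = -1.0e6      # assumed "far away" coordinate, well outside any viewport
VERTS_PER_SPRITE = 4    # one quad per sprite


class SpritePool:
    """Owns the single vertex array shared by every sprite (hypothetical helper)."""

    def __init__(self, capacity):
        # Allocated once. float32 is chosen to match GL_FLOAT so the data
        # can be handed over without a type-conversion copy.
        self.vertices = np.full(
            (capacity * VERTS_PER_SPRITE, 2), OFFSCREEN, dtype=np.float32)

    def get_sprite_vectors(self):
        # Level of indirection, so the array could be swapped out on resize.
        return self.vertices


class Sprite:
    """Knows only its slot in the shared array and writes to it in place."""

    def __init__(self, pool, slot, x, y, size):
        self.pool = pool
        self.start = slot * VERTS_PER_SPRITE
        self.stop = self.start + VERTS_PER_SPRITE
        pool.get_sprite_vectors()[self.start:self.stop] = [
            (x, y), (x + size, y), (x + size, y + size), (x, y + size)]

    def move(self, delta):
        # delta is a simple (dx, dy) tuple, broadcast over the four corners.
        self.pool.get_sprite_vectors()[self.start:self.stop] += delta

    def hide(self):
        # "Delete" the sprite by parking it offscreen; the array keeps its size.
        self.pool.get_sprite_vectors()[self.start:self.stop] = OFFSCREEN


def move_all(pool, velocities):
    """Move every sprite with one array operation; velocities is (capacity, 2)."""
    per_vertex = np.repeat(
        np.asarray(velocities, dtype=np.float32), VERTS_PER_SPRITE, axis=0)
    pool.get_sprite_vectors()[:] += per_vertex


def frame_bytes(pool):
    # Per-frame step: serialize the array (tobytes() is the current NumPy
    # spelling of the old .tostring()). The result would then go to
    # glVertexPointer and be drawn with glDrawArrays.
    return pool.get_sprite_vectors().tobytes()
```

Typical use would be to create one SpritePool at startup, give each Sprite a fixed slot, call move() or move_all() as the game state changes, and call frame_bytes() once per frame just before drawing. Rotation is not handled here; as noted in the thread, it needs extra care with this layout.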