Re: [PyOpenGL-Users] 2d sprite engine performance.
From: Mike C. F. <mcf...@ro...> - 2005-04-06 21:55:12
Erik Johnson wrote:

>On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher" wrote:
>
>>Btw, IIRC, performance for graphics chips of that age (Rage 128 (which I
>>*think* is around the age of a TNT)) was to max out around 1000 to 1500
>>textured, lit polygons/second (games use (fairly advanced) culling
>>algorithms to reduce the number of polygons on-screen for any given
>>rendering pass). Eliminating lighting should increase that to around
>>2000 or 3000 polys, but that was with the old (small) textures that were
>>heavily reduced (32x32 or 64x64).
>
>I believe the Rage 128 is roughly equivalent to a TNT2. I am using
>mostly 16x16 or 32x32 textures, and the numbers that you are quoting
>are what make me surprised to be maxing out at 250 polygons.

To be clear, that 1000 to 1500 was normally array or display-list
geometry, *not* individual calls to glVertex (even in C). I'm talking
about performance from SGI's (rather well tuned (for a general
scenegraph engine)) CosmoPlayer scenegraph engine. With culling it would
get you down to a few hundred polygons actually getting drawn for a
frame (with some overdraw). Though, again, those were textured, lit and
z-buffered; you'd expect a lot more for unlit and non-z-buffered
rendering, so 250 is probably very low.

>As I said in another message, though, I can do 1400 sprites with a
>Pentium 4 @ 2.6GHz using a Matrox G400, which I believe is only
>slightly faster than a Rage 128. I don't think the video card is my
>limiting factor at the moment. CPU speed seems to make all the
>difference.

A P4 also likely has a faster AGP and/or PCI bridge to the graphics
card, but I'm not a hardware guy, so I can't really comment on their
relative performance. Still, I think you're probably losing most of
your time over in Python just now.

>Yes, I can see this approach helping. I think that it does conflict
>with my current scheme of grouping my small textures onto a big texture
>and never changing away from my big texture.
>With your approach, I would need to keep my small textures and do
>texture swaps, but I would greatly reduce my function call overhead. I
>guess this is where sorting by texture comes in. I will have to
>experiment with this and see how it affects performance.
>
>Would creating a unique display list for every sprite be a viable
>option?

Yes, keeping in mind the memory overhead required. Older cards were
extremely memory-limited.

BTW, you are using texture objects, not doing copies for each frame,
right? Even if you've got more textures than card memory, letting
OpenGL handle the back-and-forth swapping of textures is likely going
to be better for performance than anything you're going to do. I.e. use
glBindTexture and glGenTextures, not just bald glTexImage2D calls...
hmm, you know, it's been so long I'm not even sure you *can* use bald
glTexImage2D in OpenGL... I think you can because of the video-display
cases... you'd have to be able to if my understanding of common
practice there is correct... need to go back to doing raw OpenGL coding
sometime soon :) .

>>You likely do *not* want to be doing Vertex calls directly from Python
>>save to generate a display list (as noted above). Python just isn't
>>the right tool for that kind of low-level operation, it has too much
>>per-call overhead. If you do that kind of thing you should be using
>>array geometry (and be sure you use exactly the correct array type for
>>the data-type of the calls you're making to avoid extra copying).
>
>The problem that I ran into with vertex arrays is that while a single
>call to glDrawArrays is faster than all the immediate-mode calls, the
>overhead of building the needed arrays every frame ended up being too
>great and made things slower.
>
>I will try using display lists for individual quads, and hopefully
>that will help.

Ah, there's a problem.
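(As an aside, the "sorting by texture" idea from Erik's message above could be sketched roughly as follows; `Sprite` and `batch_by_texture` are hypothetical names of my own, not anything from PyOpenGL. The point is just to bind each texture once per frame instead of once per sprite.)

```python
# Sketch: group sprites by texture id so each texture is bound only
# once per frame, minimising glBindTexture calls.  These class/function
# names are illustrative, not from the original post or PyOpenGL.
from itertools import groupby

class Sprite:
    def __init__(self, texture_id, x, y):
        self.texture_id = texture_id
        self.x = x
        self.y = y

def batch_by_texture(sprites):
    """Return a list of (texture_id, [sprites...]) batches."""
    ordered = sorted(sprites, key=lambda s: s.texture_id)
    return [(tex, list(group))
            for tex, group in groupby(ordered, key=lambda s: s.texture_id)]

# With real OpenGL you would then do, per batch:
#   glBindTexture(GL_TEXTURE_2D, tex)
#   ...draw every quad in that batch...
sprites = [Sprite(2, 0, 0), Sprite(1, 5, 5), Sprite(2, 9, 9), Sprite(1, 3, 3)]
batches = batch_by_texture(sprites)
# Four sprites, two textures -> only two texture binds per frame.
```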
You'd want to keep an array handy, with each sprite knowing its index
and updating the array directly, so a sprite's move command would look
like:

    self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

(where delta would be a simple tuple), allowing the array to handle
updates in Numpy code. You'd want to use the contiguous() function from
PyOpenGL whenever you resize the array, hence the need for the
getSpriteVectors level of indirection. The goal there is that you don't
*build* the array for each frame (lots of memory copying), but just
update it in place.

You have to watch out for rotation problems with that approach,
however. You might want special code to watch for and fix skew when
rotations are in play for a given sprite. You still pay the copy
penalty for the array going over the bus to the card, but at least
you're not allocating and de-allocating thousands of Python object
references to rebuild the array each frame. Honestly, though, this kind
of code gets messy fast enough that I'd avoid it until I'd exhausted
the display-list approach.

>>Python is slower than C, but OpenGL has an enormous amount of room to
>>play. Using higher-level features from the higher-level language can
>>make the experience much more rewarding.
>
>I get the impression that OpenGL can deliver all the speed I want, I
>just seem to be having problems unlocking that speed.

Good luck!
Mike

________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com
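(For what it's worth, the in-place update approach described above might look something like this Numpy sketch. `SpriteArray`, `Sprite` and `move` are illustrative names of my own, not PyOpenGL API; the real code would also call contiguous() after any resize and hand the array to glVertexPointer/glDrawArrays.)

```python
# Sketch of the in-place sprite-array update: each sprite knows its
# slice of one big vertex array and mutates it directly, so the array
# is never rebuilt per frame.  All names here are hypothetical.
import numpy as np

class SpriteArray:
    def __init__(self, n_sprites):
        # Four (x, y) corner vertices per quad, one contiguous block.
        self.vectors = np.zeros((n_sprites * 4, 2), dtype=np.float32)

    def getSpriteVectors(self):
        # Level of indirection: always hand back the *current* array,
        # so a resize elsewhere doesn't leave sprites with stale refs.
        return self.vectors

class Sprite:
    def __init__(self, arena, index):
        self.arena = arena
        self.startIndex = index * 4
        self.stopIndex = self.startIndex + 4

    def move(self, delta):
        # In-place update; Numpy broadcasts delta over the four rows.
        self.arena.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

arena = SpriteArray(2)
sprite = Sprite(arena, 1)
sprite.move((3.0, 4.0))   # shifts only this sprite's four vertices
```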