Thread: [PyOpenGL-Users] 2d sprite engine performance.
From: Erik J. <ejo...@fa...> - 2005-04-05 04:55:05
For a few months I have been doing hobby work on a 2D tile-based game using PyOpenGL. I find that I am running into performance limitations on older hardware, and I hope that the list might have some advice.

First let me explain what I am working with. The code runs strictly in ortho mode and doesn't use lighting or the depth buffer. All the graphics work is done with individual texture-mapped quads used as sprites. All the sprites have an alpha channel, and some of them are rotated.

I have set up an experiment using my game engine to draw unmoving sprites in a window and do nothing else. On my 500MHz G3 iMac with a Rage 128 Ultra, I can draw about 250 items before the frame rate starts dropping below 30fps. Is this a reasonable number? It seems low to me. Can this class of hardware only handle 250 textured quads in a frame? Is it unreasonable to expect more?

Let me explain what I have discovered on my quest for better performance.

The profiler indicates that most of the program time is spent in my draw function, and adjusting the number of draw calls I make supports this conclusion. But Apple's OpenGL Profiler tells me that only 10% of the app time is spent in OpenGL, so I think the Apple tool is lying to me. The engine seems to be CPU bound, not GPU bound: a 900MHz CPU with a GeForce 3 runs the game much slower than a 1400MHz CPU with an integrated S3 graphics chipset.

Eliminating texture switches by grouping my textures onto a single large texture has helped noticeably.

Vertex arrays run slower than immediate mode. Because the sprites can move every frame, the vertex array changes every frame, and in my case generating the array and calling glDrawArrays is slower than many glBegin ... glEnd blocks.

I think part of my problem is that I want to draw many quads that are constantly moving relative to each other. The OpenGL performance tips I have found are targeted at groups of polys that are relatively static, so display lists and vertex arrays don't seem to fit my sprite approach.

The other optimizations I can think to try now are cutting out as many glBegins and glEnds as possible and doing big groups of Vertex calls, or rendering any sprite that won't move for a few frames to the background. I could also work around the problem by cutting down on the number of sprites I have on screen.

I would love to hear opinions. Is Python just slow? Should I just give up on older hardware? Does it sound like I am doing something wrong? Are there other optimizations that people can suggest?

Thanks!
Erik
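For context, a minimal sketch of the kind of immediate-mode, ortho-projected textured-quad draw described above; the function name, parameters, and default texture coordinates are illustrative, not taken from the actual engine:

    from OpenGL.GL import *

    def draw_sprite(x, y, w, h, tex_id, u0=0.0, v0=0.0, u1=1.0, v1=1.0):
        # Bind the sprite's texture (or the shared atlas) and emit one quad.
        glBindTexture(GL_TEXTURE_2D, tex_id)
        glBegin(GL_QUADS)
        glTexCoord2f(u0, v0); glVertex2f(x,     y)
        glTexCoord2f(u1, v0); glVertex2f(x + w, y)
        glTexCoord2f(u1, v1); glVertex2f(x + w, y + h)
        glTexCoord2f(u0, v1); glVertex2f(x,     y + h)
        glEnd()

Every sprite drawn this way costs roughly ten Python-level GL calls per frame, which is the per-call overhead discussed later in the thread.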
From: Simon W. <sim...@gm...> - 2005-04-06 01:05:41
On Apr 5, 2005 12:55 PM, Erik Johnson <ejo...@fa...> wrote:
> For a few months I have been doing hobby work on a 2d tile based game
> using PyOpenGL.

It would seem we have been working on similar projects. My project, however, has been much more of an experiment and a learning tool than a serious development effort.

http://developer.berlios.de/projects/lgt/

> I have set up an experiment using my game engine to draw unmoving
> sprites in a window and do nothing else. On my 500mhz G3 iMac with a
> Rage128 Ultra, I can draw about 250 items before the frame rate starts
> dropping below 30fps. Is this a reasonable number?

I don't know much about Mac hardware, but these numbers seem fine to me. Personally, I would be quite impressed if my own 2DGL library had these sorts of figures on similar hardware, but perhaps I have low expectations...

Sw.
From: Erik J. <ejo...@fa...> - 2005-04-06 16:48:14
On Wed, 6 Apr 2005 09:05:32 +0800, "Simon Wittber" said:
> It would seem we have been working on similar projects. My project,
> however, has been much more of an experiment and a learning tool than
> a serious development effort.
>
> http://developer.berlios.de/projects/lgt/

I will have to take a look at your project. I expect a lot of people have made 2D sprite engines with OpenGL, but I wouldn't think that it has been done in Python all that often.

> > I have set up an experiment using my game engine to draw unmoving
> > sprites in a window and do nothing else. On my 500mhz G3 iMac with a
> > Rage128 Ultra, I can draw about 250 items before the frame rate starts
> > dropping below 30fps. Is this a reasonable number?
>
> I don't know much about Mac hardware, but these numbers seem fine to
> me.

I ran my demo code on a 2.6GHz Pentium 4 with a Matrox G400 video card, and that machine can handle about 1400 sprites at 30fps. I guess that is pretty good, but I had hoped that 3D hardware acceleration would have let me do much better on the low-end hardware. Instead it seems that I am very CPU bound. I wonder if I am actually getting anything out of OpenGL, or if I could do better with normal PyGame using SDL surfaces. Perhaps this is a question for the PyGame mailing list.

> Personally, I would be quite impressed if my own 2DGL library had
> these sorts of figures on similar hardware, but perhaps I have low
> expectations...

One thing to keep in mind with the numbers I am quoting is that they are taken from a very simple demo that does nothing but put sprites on the screen and measure fps. The actual game code has more overhead.

When I think of the number of textured polygons that a game like Quake can put on the screen, the 250 that I get on my iMac seems so low. I guess a lot of those polys in 3D games are static, though, and can take advantage of display lists and vertex arrays.

If you are looking to improve performance in your own code, the most obvious win that I found was grouping all of my textures onto a single big texture to avoid texture swaps. Minimizing color changes also helped.

Erik
From: Erik J. <ejo...@fa...> - 2005-04-06 16:57:26
On Tue, 5 Apr 2005 10:00:19 -0700, "Andrew Straw" said:
...
> But, IF python itself IS the
> problem (or to test if this is so), I suggest re-coding your main loop
> using Pyrex -- you'll be able to call OpenGL directly from the C level
> using a syntax that seems very familiar. :)

If I were to use Pyrex to call the C-level OpenGL libs, would I be able to work with the same window with PyOpenGL? Or would it be exclusively one or the other?

Would I need to use the C-level SDL calls as well for creating my window? Or could I still use PyGame to do the setup for OpenGL called from Pyrex?

Thanks,
Erik
From: Mike C. F. <mcf...@ro...> - 2005-04-06 19:58:53
Erik Johnson wrote:
> On Tue, 5 Apr 2005 10:00:19 -0700, "Andrew Straw" said:
> ...
>> But, IF python itself IS the
>> problem (or to test if this is so), I suggest re-coding your main loop
>> using Pyrex -- you'll be able to call OpenGL directly from the C level
>> using a syntax that seems very familiar. :)
>
> If I were to use Pyrex to call the C-level OpenGL libs, would I be able
> to work with the same window with PyOpenGL? Or would it be exclusively
> one or the other?

You should be able to freely mix them as long as you're loading the same OpenGL library from both; in the end Pyrex is just C code, just like PyOpenGL. The only thing to keep in mind is that it likely doesn't handle Numpy arrays in quite the same way.

> Would I need to use the C-level SDL calls as well for creating my
> window? Or could I still use PyGame to do the setup for OpenGL called
> from Pyrex?

Generally speaking you should be able to mix and match with no special considerations; OpenGL is just a big state machine where you trigger events with method calls.

HTH,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
From: Mike C. F. <mcf...@ro...> - 2005-04-06 20:24:39
Erik Johnson wrote:
...
> The profiler indicates that most of the program time is spent in my draw
> function. Adjusting the number of draw calls I make supports this
> conclusion. But Apple's OpenGL profiler tells me that only 10% of the
> app time is spent in OpenGL. I think the Apple tool is lying to me.

Btw, IIRC, performance for graphics chips of that age (the Rage 128, which I *think* is around the age of a TNT) was to max out around 1000 to 1500 textured, lit polygons/second (games use fairly advanced culling algorithms to reduce the number of polygons on-screen for any given rendering pass). Eliminating lighting should increase that to around 2000 or 3000 polys, but that was with the old (small) textures that were heavily reduced (32x32 or 64x64).

I wouldn't be surprised if you're running into texture bandwidth problems and maybe even simple fill-rate problems. You may find that the card is extremely sensitive to colour mode for its performance (IIRC, switching a TNT to 16-bit mode would get close to doubling frame rates on our VR system of the time).

> The engine seems to be CPU bound, not GPU bound: a 900MHz CPU with a
> GeForce 3 runs the game much slower than a 1400MHz CPU with an
> integrated S3 graphics chipset.
>
> Eliminating texture switches by grouping my textures onto a single large
> texture has helped noticeably.
>
> Vertex arrays run slower than immediate mode. Because the sprites can
> move every frame, the vertex array changes every frame. In my case
> generating the array and calling glDrawArrays is slower than many
> glBegin ... glEnd blocks.

Just to be clear:

* You *are* running those glBegin...glEnd blocks as display lists, not immediate-mode calls, right?
  o You create a display list holding each sprite to draw (you may only need one, depending on the proportions of the sprites)
  o Sort the sprites by texture (and by potential overlap (virtual Z ordering))
  o Load the texture
  o for (x,y,z), sprite in textureset:
    + glTranslated( x, y, z )
    + glCallList( sprite )
* If your sprites don't actually move much you could put the glTranslate calls into the lists and then use glCallLists instead (this assumes, however, that you don't have overlap)

Python is much slower than equivalent C; to get decent performance you do need to use a mechanism that pushes the code down into C. I normally use array geometry myself, but then I normally do 3D work with game-like rendering loads.

> I think part of my problem is that I want to draw many quads that are
> constantly moving relative to each other. OpenGL performance tips I
> have found are targeted at groups of polys that are relatively static.
> Display lists and vertex arrays don't seem to fit with my sprite
> approach.

Display lists likely would help if you're currently drawing the polygons with run-time calls: create a single "sprite" display list for your standard sprite size, call that once for each sprite (after the translate and texture load) to do the glBegin(); glTexCoord(...); glVertex(...); glTexCoord(...); glVertex(...); glTexCoord(...); glVertex(...); glTexCoord(...); glVertex(...); glEnd() sequence, and you've just reduced the number of Python calls by a factor of ~10. That *should* have a significant effect on performance.

> The other optimizations that I can think to try now are cutting out as
> many glBegins and glEnds as possible, and do big groups of Vertex calls.
> Or I can try rendering any sprite that won't move for a few frames to
> the background, and work around the problem by cutting down on the
> number of sprites I have on screen.

Sounds like a lot of extra bitmap bandwidth (re-storing the background). You likely do *not* want to be doing Vertex calls directly from Python save to generate a display list (as noted above). Python just isn't the right tool for that kind of low-level operation; it has too much per-call overhead. If you do that kind of thing you should be using array geometry (and be sure you use exactly the correct array type for the data type of the calls you're making, to avoid extra copying).

> I would love to hear opinions. Is Python just slow? Should I just give
> up on older hardware? Does it sound like I am doing something wrong?
> Are there other optimizations that people can suggest?

Python is slower than C, but OpenGL has an enormous amount of room to play. Using higher-level features from the higher-level language can make the experience much more rewarding.

HTH,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
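A minimal Python sketch of the display-list recipe described above, assuming one shared quad-sized list reused for every sprite; the dictionary-based texture bookkeeping and function names are illustrative, not code from the thread:

    from OpenGL.GL import *

    def make_sprite_list(w, h):
        # Compile a single w x h textured quad into a display list.
        lst = glGenLists(1)
        glNewList(lst, GL_COMPILE)
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 0.0); glVertex2f(0.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex2f(w,   0.0)
        glTexCoord2f(1.0, 1.0); glVertex2f(w,   h)
        glTexCoord2f(0.0, 1.0); glVertex2f(0.0, h)
        glEnd()
        glEndList()
        return lst

    def draw_frame(sprites_by_texture, sprite_list):
        # sprites_by_texture: {texture_id: [(x, y, z), ...]}, pre-sorted so
        # each texture is bound exactly once per frame.
        for tex_id, positions in sprites_by_texture.items():
            glBindTexture(GL_TEXTURE_2D, tex_id)
            for x, y, z in positions:
                glPushMatrix()
                glTranslated(x, y, z)
                glCallList(sprite_list)
                glPopMatrix()

Each sprite now costs a handful of Python calls (push, translate, call list, pop) instead of roughly ten glTexCoord/glVertex calls.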
From: Erik J. <ejo...@fa...> - 2005-04-06 21:04:55
On Wed, 06 Apr 2005 16:24:33 -0400, "Mike C. Fletcher" wrote:
> Btw, IIRC, performance for graphics chips of that age (the Rage 128, which I
> *think* is around the age of a TNT) was to max out around 1000 to 1500
> textured, lit polygons/second (games use fairly advanced culling
> algorithms to reduce the number of polygons on-screen for any given
> rendering pass). Eliminating lighting should increase that to around
> 2000 or 3000 polys, but that was with the old (small) textures that were
> heavily reduced (32x32 or 64x64).

I believe the Rage 128 is roughly equivalent to a TNT2. I am using mostly 16x16 or 32x32 textures, and the numbers that you are quoting are what make me surprised to be maxing out at 250 polygons.

> I wouldn't be surprised if you're running into texture bandwidth
> problems and maybe even simple fill-rate problems. You may find that
> the card is extremely sensitive to colour mode for its performance (IIRC,
> switching a TNT to 16-bit mode would get close to doubling frame rates
> on our VR system of the time).

I have heard that older cards don't do 32-bit mode well. I will look into 16-bit. As I said in another message, though, I can do 1400 sprites on a Pentium 4 at 2.6GHz using a Matrox G400, which I believe is only slightly faster than a Rage 128. I don't think the video card is my limiting factor at the moment; CPU speed seems to make all the difference.

> Just to be clear:
>
> * You *are* running those glBegin...glEnd blocks as display lists,
>   not immediate-mode calls, right?
>   o You create a display list holding each sprite to draw (you
>     may only need one, depending on the proportions of the sprites)
>   o Sort the sprites by texture (and by potential overlap
>     (virtual Z ordering))
>   o Load the texture
>   o for (x,y,z), sprite in textureset:
>     + glTranslated( x, y, z )
>     + glCallList( sprite )

Interesting, I hadn't thought of this approach.

> Python is much slower than equivalent C; to get decent performance you
> do need to use a mechanism that pushes the code down into C. I normally
> use array geometry myself, but then I normally do 3D work with game-like
> rendering loads.

I understand this principle, I just haven't found a good way to implement it with large numbers of polys that can move relative to each other every frame.

> Display lists likely would help if you're currently drawing the polygons
> with run-time calls: create a single "sprite" display list for your
> standard sprite size, call that once for each sprite (after the translate
> and texture load) to do the glBegin(); glTexCoord(...); glVertex(...);
> ... glEnd() sequence, and you've just reduced the number of Python calls
> by a factor of ~10. That *should* have a significant effect on
> performance.

Yes, I can see this approach helping. I think that it does conflict with my current scheme of grouping my small textures onto a big texture and never changing away from my big texture. With your approach, I would need to keep my small textures and do texture swaps, but I would greatly reduce my function call overhead. I guess this is where sorting by texture comes in. I will have to experiment with this and see how it affects performance.

Would creating a unique display list for every sprite be a viable option?

> > The other optimizations that I can think to try now are cutting out as
> > many glBegins and glEnds as possible, and do big groups of Vertex calls.
> > Or I can try rendering any sprite that won't move for a few frames to
> > the background, and work around the problem by cutting down on the
> > number of sprites I have on screen.
>
> Sounds like a lot of extra bitmap bandwidth (re-storing the background).

I've been avoiding trying this approach for this exact reason.

> You likely do *not* want to be doing Vertex calls directly
> from Python save to generate a display list (as noted above). Python
> just isn't the right tool for that kind of low-level operation; it has
> too much per-call overhead. If you do that kind of thing you should be
> using array geometry (and be sure you use exactly the correct array type
> for the data type of the calls you're making, to avoid extra copying).

The problem that I ran into with vertex arrays is that while a single call to glDrawArrays is faster than all the immediate-mode calls, the overhead of building the needed arrays every frame ended up being too great and making things slower.

I will try using display lists for individual quads, and hopefully that will help.

> Python is slower than C, but OpenGL has an enormous amount of room to
> play. Using higher-level features from the higher-level language can
> make the experience much more rewarding.

I get the impression that OpenGL can deliver all the speed I want, I just seem to be having problems unlocking that speed.

Thanks for your help,
Erik
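One possible way to reconcile the two schemes, as an assumption on my part rather than something stated in the thread: give each sprite image its own display list with its sub-rectangle of the big atlas texture baked in, so the atlas stays bound for the whole frame and only a translate and a glCallList happen per sprite. The u0/v0/u1/v1 atlas coordinates and the function name below are illustrative:

    from OpenGL.GL import *

    def make_atlas_sprite_list(w, h, u0, v0, u1, v1):
        # One display list per sprite image; the atlas texture is NOT bound
        # here, so the caller can keep the big texture bound all frame.
        lst = glGenLists(1)
        glNewList(lst, GL_COMPILE)
        glBegin(GL_QUADS)
        glTexCoord2f(u0, v0); glVertex2f(0.0, 0.0)
        glTexCoord2f(u1, v0); glVertex2f(w,   0.0)
        glTexCoord2f(u1, v1); glVertex2f(w,   h)
        glTexCoord2f(u0, v1); glVertex2f(0.0, h)
        glEnd()
        glEndList()
        return lst

With lists built this way, no texture swaps are needed at draw time and the per-sprite cost stays at translate-plus-call-list.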
From: Mike C. F. <mcf...@ro...> - 2005-04-06 21:55:12
Erik Johnson wrote:
> I believe the Rage 128 is roughly equivalent to a TNT2. I am using
> mostly 16x16 or 32x32 textures, and the numbers that you are quoting
> are what make me surprised to be maxing out at 250 polygons.

To be clear, that 1000 to 1500 was normally array or display-list geometry, *not* individual calls to glVertex (even in C). I'm talking about performance from SGI's (rather well tuned, for a general scenegraph engine) CosmoPlayer scenegraph engine. With culling it would get you down to a few hundred polygons actually getting drawn for a frame (with some overdraw). Though, again, those were textured, lit and z-buffered; you'd expect a lot more for unlit and non-z-buffered, so 250 is probably very low.

> As I said in another message, though, I can do 1400 sprites on a
> Pentium 4 at 2.6GHz using a Matrox G400, which I believe is only slightly
> faster than a Rage 128. I don't think the video card is my limiting
> factor at the moment; CPU speed seems to make all the difference.

A P4 also likely has a faster AGP and/or PCI bridge to the graphics card, but I'm not a hardware guy, so I can't really comment on their relative performance. Still, I think you're probably losing most of your time over in Python just now.

> Yes, I can see this approach helping. I think that it does conflict
> with my current scheme of grouping my small textures onto a big texture
> and never changing away from my big texture. With your approach, I
> would need to keep my small textures and do texture swaps, but I would
> greatly reduce my function call overhead. I guess this is where sorting
> by texture comes in. I will have to experiment with this and see how it
> affects performance.
>
> Would creating a unique display list for every sprite be a viable
> option?

Yes, keeping in mind the memory overhead required. Older cards were extremely memory-limited.

BTW, you are using textures, not doing copies for each frame, right? Even if you've got more textures than card memory, letting OpenGL handle the back-and-forth swapping of textures is likely going to be better for performance than anything you're going to do. i.e. use glBindTexture and glGenTextures, not just bald glTexImage2D calls... hmm, you know, it's been so long I'm not even sure you *can* use bald glTexImage2D in OpenGL... I think you can because of the video-display cases... you'd have to be able to if my understanding of common practice there is correct... need to go back to doing raw OpenGL coding sometime soon :) .

> > The problem that I ran into with vertex arrays is that while a single
> > call to glDrawArrays is faster than all the immediate-mode calls, the
> > overhead of building the needed arrays every frame ended up being too
> > great and making things slower.
> >
> > I will try using display lists for individual quads, and hopefully that
> > will help.

Ah, there's a problem. You'd want to keep an array handy, with each sprite knowing its index and updating the array directly, so a sprite's move command would look like:

    self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta

(where delta would be a simple tuple), allowing the array to handle updates in Numpy code.

You'd want to use the contiguous() function from PyOpenGL whenever you resize the array, hence the need for the getSpriteVectors level of indirection. The goal there is that you don't *build* the array for each frame (lots of memory copying), but just update it in place. You have to watch out for rotation problems with that approach, however; you might want special code to watch for and fix skew when rotations are in play for a given sprite. You still pay the copy penalty for the array going over the bus to the card, but at least you're not allocating and de-allocating thousands of Python object references to rebuild the array each frame.

Honestly, though, this kind of code gets messy fast enough that I'd avoid it until I'd exhausted the display-list approach.

> I get the impression that OpenGL can deliver all the speed I want, I
> just seem to be having problems unlocking that speed.

Good luck!
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
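A rough sketch of the persistent-array idea above, using the old Numeric module that PyOpenGL 2.x worked with; the class and attribute names are illustrative, the quads are assumed axis-aligned (unrotated), a parallel texture-coordinate array is omitted for brevity, and glVertexPointerf is the typed PyOpenGL wrapper (exact spelling may differ between PyOpenGL releases):

    import Numeric
    from OpenGL.GL import *

    VERTS_PER_QUAD = 4

    class SpriteBatch:
        def __init__(self, max_sprites):
            # Allocate once; sprites update their own slice in place each frame.
            self.verts = Numeric.zeros((max_sprites * VERTS_PER_QUAD, 2), 'f')

        def move_sprite(self, index, dx, dy):
            # In-place Numeric update: all four corners shift without any
            # per-vertex Python loop or per-frame array rebuild.
            start = index * VERTS_PER_QUAD
            self.verts[start:start + VERTS_PER_QUAD] += (dx, dy)

        def draw(self, sprite_count):
            glEnableClientState(GL_VERTEX_ARRAY)
            glVertexPointerf(self.verts)
            glDrawArrays(GL_QUADS, 0, sprite_count * VERTS_PER_QUAD)
            glDisableClientState(GL_VERTEX_ARRAY)

The key point is that the array lives for the life of the batch; Python only touches it through whole-slice operations that Numeric executes in C.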
From: <le...@st...> - 2005-04-06 22:27:17
My (much more limited than Mike's) performance experience with PyOpenGL agrees with what he was suggesting. I found that using non-interleaved arrays with glDrawArrays was the fastest route from my Python code to the screen on a number of medium-old machines, including my PowerBook G4. As Mike says, it's critical to build the array once; then just poke your updated X and Y values into it and call .tostring() and glDrawArrays each frame.

To make it practical to keep a single array allocated, remember that you can ignore certain elements of the array just by moving them far offscreen and letting them be clipped by the hardware. Though counterintuitive from a software point of view, that sort of trick often works well with hardware.

Also, while Python code is slower than native C code, that's usually one of the last things to fix. Modern processors (including even your 500MHz G3 ;-) can execute a lot of Python statements. If possible, try to learn how to use the array operations in Numeric to do parallel assignments -- that can often produce order-of-magnitude speedups.

Anyway, that's my $0.02!

Leo
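A small illustration of the off-screen trick; the OFFSCREEN constant and the four-vertices-per-quad layout are assumptions carried over from the earlier sketch, not details from the thread:

    OFFSCREEN = -1.0e5   # anywhere well outside the ortho view volume

    def hide_sprite(verts, index, verts_per_quad=4):
        # verts is the Numeric vertex array that was built once, as above.
        # Instead of shrinking or rebuilding it, park the quad far off-screen
        # and let the hardware clip it away.
        start = index * verts_per_quad
        verts[start:start + verts_per_quad, 0] = OFFSCREEN
        verts[start:start + verts_per_quad, 1] = OFFSCREEN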
From: Mike C. F. <mcf...@ro...> - 2005-04-07 00:31:14
le...@st... wrote:
...
> on a number of medium-old machines including my PowerBook G4. As Mike
> says, it's critical to build the array once; then just poke your updated X
> and Y value into it and then call .tostring() and glDrawArrays each frame.
...

Just a minor performance hint: using tostring() is likely going to be slower than passing the Numpy array directly to the appropriate function. If the array is a contiguous array of the correct datatype, PyOpenGL can save a full-data-copy operation for the array. You can make an array contiguous by doing:

    myarray = contiguous( myarray )

Most arrays are already contiguous, btw; it's just resizes and the like that create non-contiguous arrays.

The tostring method, by comparison, always has to do the copy, since tostring creates a copy of the array data in a Python string (and then throws it away when finished). String-based array functions should likely be the method of last resort for any performance-critical code.

Have fun,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
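A sketch of the contrast, assuming a Numeric array of 2D single-precision vertices and that GL_VERTEX_ARRAY client state is already enabled; where contiguous() is imported from has varied between PyOpenGL releases, so the fallback below just uses Numeric directly:

    import Numeric
    from OpenGL.GL import *

    def draw_with_tostring(verts, vert_count):
        # Builds a new Python string (a full copy of the data) every call.
        glVertexPointer(2, GL_FLOAT, 0, verts.tostring())
        glDrawArrays(GL_QUADS, 0, vert_count)

    def draw_direct(verts, vert_count):
        # No string copy; PyOpenGL can hand the array's memory to GL as-is
        # when it is contiguous and of the matching type.
        if not verts.iscontiguous():
            verts = Numeric.array(verts)   # one-off contiguous copy
        glVertexPointerf(verts)
        glDrawArrays(GL_QUADS, 0, vert_count)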
From: Erik J. <ejo...@fa...> - 2005-04-06 23:19:45
> > Would creating a unique display list for every sprite be a viable
> > option?
>
> Yes, keeping in mind the memory overhead required. Older cards were
> extremely memory-limited.

I'm thinking that 16MB of VRAM is a reasonable minimum for anything that I would seriously consider playing games on, so I should be OK in this regard. I have small textures.

> BTW, you are using textures, not doing copies for each frame, right?
> Even if you've got more textures than card memory, letting OpenGL handle
> the back-and-forth swapping of textures is likely going to be better for
> performance than anything you're going to do. i.e. use glBindTexture and
> glGenTextures, not just bald glTexImage2D calls...

Yes, I'm using real textures. I have read about the early pre-texture days of OpenGL, and it doesn't sound like fun.

> > The problem that I ran into with vertex arrays is that while a single
> > call to glDrawArrays is faster than all the immediate-mode calls, the
> > overhead of building the needed arrays every frame ended up being too
> > great and making things slower.
>
> Ah, there's a problem. You'd want to keep an array handy, with each
> sprite knowing its index and updating the array directly,

This is more sophisticated than what I have tried so far. I have taken a couple of passes at vertex arrays.

At first, I was just using list.append(vertex) to create a big Python list every frame and passing this as a vertex array. Performance was slower than immediate mode, but not by much.

Then I tried allocating a big Numeric array of zeros and using

    array[index] = vertex  # 4 times per sprite

to add sprites to the list as I needed them. At the end of the frame, after drawing the array, I would reset the index to 0. With this second approach, drawing the array was very fast due to not having to do type conversions, but building the array in this manner was so inefficient that overall it was even slower than my first try.

> so a sprite's move command would look like:
>
>     self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta
>
> (where delta would be a simple tuple), allowing the array to handle
> updates in Numpy code.

I might need to learn more about Numeric. I didn't know that you could do things like the above and have Numeric take care of it without Python doing a bunch of intermediate steps behind the scenes.

> You'd want to use the contiguous() function from PyOpenGL whenever you
> resize the array, hence the need for the getSpriteVectors level of
> indirection. The goal there is that you don't *build* the array for each
> frame (lots of memory copying), but just update it in place.

I thought that contiguous() just checked whether the array in question could be passed to OpenGL directly as a pointer instead of being copied. Am I missing something?

> You have to watch out for rotation problems with that approach, however.
> Might want special code to watch for and fix skew when rotations are in
> play for a given sprite.

With my vertex array approaches I have just bypassed rotated sprites and used immediate mode to draw them. Most of my sprites aren't rotated.

> Honestly, though, this kind of code gets messy fast enough that I'd
> avoid it until I'd exhausted the display-list approach.

I will take your advice and start with the display-list approach.

> Good luck!

Thank you, and thanks again for your help. This has been very informative.

Erik
From: Rich D. <dr...@in...> - 2005-04-06 21:21:22
> > Python is slower than C, but OpenGL has an enormous amount of room to
> > play. Using higher-level features from the higher-level language can
> > make the experience much more rewarding.
>
> I get the impression that OpenGL can deliver all the speed I want, I
> just seem to be having problems unlocking that speed.

It may also be worth trying psyco to speed up the Python portion of your code (http://psyco.sf.net). psyco is an easy install and requires only a line or two at the top of your program to invoke it. Depending on your code, the speedup can be dramatic. I suspect in your case it won't be dramatic, but it might be enough.

The biggest win, in my experience, is putting everything into a display list, but it sounds like that might not be possible for your application.

Rich
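For reference, the "line or two" hookup Rich is describing looks like this; the try/except is just a guard so the program still runs on machines where psyco isn't available:

    try:
        import psyco
        psyco.full()   # let psyco specialise every function it can
    except ImportError:
        pass           # psyco is x86-only; fall back to plain CPython elsewhere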
From: Erik J. <ejo...@fa...> - 2005-04-06 23:25:09
On Wed, 6 Apr 2005 14:21:11 -0700 (PDT), "Rich Drewes" said:
> It may also be worth trying psyco to speed up the Python portion of your
> code (http://psyco.sf.net). psyco is an easy install and requires only a
> line or two at the top of your program to invoke it. Depending on your
> code, speedup can be dramatic. I suspect in your case it won't be
> dramatic, but it might be enough.

Doing a blanket call to Psyco for my whole application gives me a 3-5% performance improvement, which isn't spectacular but is nice to get for free. I haven't looked into hand tuning things for further gains.

The last time I checked, Psyco was x86 only, so unfortunately it doesn't help my iMac.

This is kind of off topic for this list, but while we are talking about Psyco, I might as well ask: does anyone know if you can use Psyco with py2exe?

Thanks,
Erik
From: Mike C. F. <mcf...@ro...> - 2005-04-07 00:41:08
Erik Johnson wrote:
...
> > so a sprite's move command would look like:
> >
> >     self.getSpriteVectors()[self.startIndex:self.stopIndex] += delta
> >
> > (where delta would be a simple tuple), allowing the array to handle
> > updates in Numpy code.
>
> I might need to learn more about Numeric. I didn't know that you could
> do things like the above and have Numeric take care of it without Python
> doing a bunch of intermediate steps behind the scenes.

They are fairly serious about allowing efficient number crunching :) . We just go along for the ride on the backs of their work :) . Particularly, the in-place modifiers are a big part of the Numpy interface. By the way, watch out for them too:

    In [6]: a = arange( 20 )

    In [7]: b = a[5:8]

    In [8]: b += 5

    In [9]: a
    Out[9]:
    array([ 0,  1,  2,  3,  4, 10, 11, 12,  8,  9, 10, 11, 12, 13, 14,
           15, 16, 17, 18, 19])

That's the same feature that allows the += delta, but it can bite you if you don't keep in mind that the sliced array is pointing to the same memory area as the original array.

> I thought that contiguous() just checked if the array in question could
> be passed to opengl directly as a pointer instead of being copied. Am I
> missing something?

It does the check, and if the array *isn't* contiguous, it copies it into a contiguous array and returns that. The effect is accomplished with a fast Numpy flag check, so if the array is contiguous (the normal case), there's little performance impact. OpenGLContext, for instance, has a flag (default on) that just runs contiguous on every array it's going to pass to PyOpenGL for IndexedFaceSets.

Have fun,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
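If the aliasing behaviour above is not what you want, the usual idiom (plain Numeric, nothing PyOpenGL-specific) is to copy the slice explicitly:

    import Numeric

    a = Numeric.arange(20)
    b = Numeric.array(a[5:8])   # independent copy of the slice
    b += 5                      # modifies only b; a is untouched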
From: Erik J. <ejo...@fa...> - 2005-04-07 16:31:13
Last night I removed the immediate-mode calls in my code and replaced them with each sprite having its own display list. With little effort it yielded a very nice performance boost: my iMac went from being able to handle 250 sprites to working well with 370. I have a minor issue to sort out with scaling, but the impact on my code was very minimal.

Interestingly, the P4 that used to do 1400 sprites can now do about 1520 at 30fps. I think this machine might actually be GPU bound, but it is funny to see the exact same increase between the two machines.

Now I have to decide if I want to adapt my old vertex array code to the recommended method of keeping fixed indices for every object. The approach sounds like it could be very good for performance, but changing my code to use it would be a lot of work. I wonder if it would be enough of a win over display lists to be worth the effort.

On Wed, 06 Apr 2005 20:41:00 -0400, "Mike C. Fletcher" said:
> They are fairly serious about allowing efficient number crunching :) .
> We just go along for the ride on the backs of their work :) .
> Particularly, the in-place modifiers are a big part of the Numpy
> interface. By the way, watch out for them too:
>
>     In [6]: a = arange( 20 )
>     In [7]: b = a[5:8]
>     In [8]: b += 5
>     In [9]: a
>     Out[9]:
>     array([ 0,  1,  2,  3,  4, 10, 11, 12,  8,  9, 10, 11, 12, 13, 14,
>            15, 16, 17, 18, 19])
>
> That's the same feature that allows the += delta, but it can bite you if
> you don't keep in mind that the sliced array is pointing to the same
> memory area as the original array.

Very interesting. The Numeric modifiers are much more complex than I thought they were.

Thanks for your help; this whole thread has been very informative.

Erik
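On the scaling issue: one common way to handle per-sprite scale with a shared display list is to push a scale onto the modelview matrix before calling the list. This is a guess at the situation rather than something specified in the thread, and the function name is illustrative:

    from OpenGL.GL import *

    def draw_scaled_sprite(x, y, sx, sy, sprite_list):
        glPushMatrix()
        glTranslated(x, y, 0.0)
        glScalef(sx, sy, 1.0)   # scale the fixed-size quad baked into the list
        glCallList(sprite_list)
        glPopMatrix()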