Thread: [PyOpenGL-Users] Suspiciously slow script
From: Derakon <de...@gm...> - 2010-05-20 04:11:35
I'm looking into replacing the SDL-based rendering in my game with OpenGL-based rendering, so I downloaded the NeHe OpenGL port ( http://www.pygame.org/gamelets/games/nehe1-10.zip ) and started tweaking the lessons to suit my own purposes. I modified one script to draw a 20x20 array of textured quads, with the texture just being a 100x100 PNG with alpha. This script, which isn't doing anything special that I can see, is giving me a whopping 15FPS, which seems horribly slow to me. Unfortunately, cProfile isn't telling me anything useful (at least as far as I can tell).

I don't suppose anyone here could take a look at the script and tell me if I'm being boneheaded somehow? I've put it online here:
http://derakon.dyndns.org/~chriswei/temp/gltest.py
and the textures I'm using are here:
http://derakon.dyndns.org/~chriswei/temp/allway

Any suggestions would be appreciated. I'm a relative newbie when it comes to OpenGL.

-Chris
From: Ian M. <geo...@gm...> - 2010-05-20 05:51:10
Hi,

Two things are making your code slow that I notice immediately:
1) You're using a Python loop to do 400 operations. That's not going to be terribly fast.
2) More importantly, you're using fixed functionality to draw 400 polygons.

You can fix both problems by using a display list, vertex array, or vertex buffer object. I do not recommend the latter two, as they are more difficult (although more flexible, and also not technically deprecated).

To use display list rendering, simply bracket the drawing code (that's everything including the texture binding, the glBegin(...), the loops, and the glEnd()) as follows:

    display_list = glGenLists(1)
    glNewList(display_list,GL_COMPILE)
    ...
    #draw your stuff here
    ...
    glEndList()

...and drop the whole thing *outside* of your main loop (put it with initialization). Then, to render the display list:

    glCallList(display_list)

...and your polygons will magically be drawn. And much faster too!

Another tip: disable vsync to get framerates faster than 60Hz. Simply do the following before creating the window:

    pygame.display.gl_set_attribute(GL_SWAP_CONTROL,0)

Hope this helps, good luck, and welcome to PyOpenGL.
Ian Mallett
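A minimal sketch of the compile-once, call-every-frame pattern described in the message above. The function names `build_display_list` and `render_frame`, and the `draw` callable, are hypothetical; the `gl` argument is assumed to be the `OpenGL.GL` module, passed in only so the sketch can be exercised without a live context. The four GL calls themselves are the ones named in the message.

```python
def build_display_list(gl, draw):
    """Compile the GL calls issued by draw() into a display list, once.

    gl   -- assumed to be the OpenGL.GL module (passed in for testability)
    draw -- hypothetical callable that issues the immediate-mode calls
            (texture binding, glBegin, the loops, glEnd)
    """
    list_id = gl.glGenLists(1)              # reserve one display-list name
    gl.glNewList(list_id, gl.GL_COMPILE)    # start recording; nothing is drawn yet
    try:
        draw()
    finally:
        gl.glEndList()                      # always close the list, even on error
    return list_id


def render_frame(gl, list_id):
    """Per-frame work: replay the precompiled list."""
    gl.glCallList(list_id)
```

In a real program this would presumably be used as `tiles = build_display_list(GL, draw_tiles)` during initialization and `render_frame(GL, tiles)` inside the main loop, with a GL context already created.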
From: Derakon <de...@gm...> - 2010-05-21 04:59:20
Thanks for the advice. I gave it a shot, and while it's an improvement...it's one of 4FPS, from 15 to 19. So clearly something is still wrong. I've uploaded the new script here:
http://derakon.dyndns.org/~chriswei/temp/gltest2.py

I turned off the texture cycling because it was just distracting from the matter at hand, so the program now just creates the display list and then loops, drawing it, as the camera moves about (I note that my framerate is much better when the tiles are further away from the camera).

Any other ideas?

-Chris
From: Ian M. <geo...@gm...> - 2010-05-21 15:21:32
Hi,

Well, your texture loading isn't going to work properly. You need to generate texture IDs for each texture and then bind to that. Currently, you're loading all the images sequentially into the same texture (so the last image will be the one displayed, if it works at all).

I've modified the code:
- Without the texturing, the code runs 430 to 450 fps for me, which is about what it should be.
- With texturing, the code runs at 150-190 fps, again, about what it should be.
- After some deliberation, I've found what might be your problem: you're rebuilding the list every frame. "makeNewList(...)" should not be called inside your loop at all.

Essentially, what display lists do is allocate memory for the geometry, transfer the geometry, then store everything as machine code for optimized delivery. This isn't exactly fast to do, and doing it every frame is going to be slower than just drawing the thing directly; it's designed to be fast later (when you call glCallList(...)).

Bottom lines: texture ids, don't put building-display-lists-operations in the main loop.

Ian
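The texture-ID advice above could be sketched roughly as follows. This is not the modified script from the thread (which is not reproduced here); `load_textures` and the `(width, height, pixels)` tuple shape are hypothetical, and `gl` is again assumed to be the `OpenGL.GL` module. The point is only that every image gets its own ID from `glGenTextures` and is bound before its upload.

```python
def load_textures(gl, images):
    """Upload each image into its own texture object.

    gl     -- assumed to be the OpenGL.GL module
    images -- iterable of (width, height, rgba_bytes) tuples (hypothetical shape)
    Returns a list with one texture ID per image.
    """
    texture_ids = []
    for width, height, pixels in images:
        tex_id = gl.glGenTextures(1)                # a fresh ID for this image
        gl.glBindTexture(gl.GL_TEXTURE_2D, tex_id)  # following calls target this ID
        gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR)
        gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR)
        gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGBA, width, height, 0,
                        gl.GL_RGBA, gl.GL_UNSIGNED_BYTE, pixels)
        texture_ids.append(tex_id)
    return texture_ids
```

At draw time, the matching ID would be bound with `glBindTexture` before the quads that use it are issued.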
From: Derakon <de...@gm...> - 2010-05-22 17:24:42
I'm including the mailing list again, because at this point it looks pretty clear that there's something wrong with my PyOpenGL install, and I have no idea how to figure out what it could be.

To recap, we've tried display lists and VBOs now, and can verify that the same script running on my machine is slow, while running on Ian's machine it is fast. The script can be downloaded from here:
http://derakon.dyndns.org/~chriswei/temp/test3.py
The image being used as a texture is here:
https://jetblade.googlecode.com/hg/data/sprites/terrain/jungle/grass/blocks/allway/1.png

The task being performed (drawing 400 textured quads) does not require significant computing power, so hardware should not be an issue (I have a Radeon X1600 with 256MB of RAM, which is entirely capable of playing modern games).

My PyOpenGL install was created by downloading the PyOpenGL and PyOpenGL-accelerate packages from http://pyopengl.sourceforge.net/documentation/installation.html and doing "python setup.py install". The only problem I ran into there was that PyOpenGL-accelerate was trying to pass -Wno-long-doubles to gcc, which didn't recognize it as a valid commandline option. I told it to use gcc 4.0 instead of gcc 4.2 and it built without complaints.

I thought perhaps the fact that I'm still using Python 2.5 could have been the problem, so I installed numpy/PyOpenGL/PyOpenGL_accelerate with Python 2.6 using those same downloads, and that's also slow. I thought maybe the easy_install instructions could generate a different install, so I tried those, and it's still slow.

If I remove the GL_TEXTURE_MIN_FILTER line then I get an absurdly fast (830FPS) set of white rectangles. As I understand it, doing this causes OpenGL to assume that I'm going to provide mipmaps for the texture, and since I don't it defaults to white. Which is, apparently, much easier to draw than the textured quads.

If I switch to RGB instead of RGBA and turn off blending, then it's still slow.
Switching from an 800x600 window to a 400x300 window gets me 104FPS; likewise, switching to a 1600x1200 window gets me 3FPS.

Here's the output of running PyOpenGL's performance test (from tests/performance.py); I have no idea how to interpret it.

    Count: 256 Total Time for 100 iterations: 0.00929379463196 MTri/s: 2.75452611272
    Count: 512 Total Time for 100 iterations: 0.00796604156494 MTri/s: 6.42728255717
    Count: 1024 Total Time for 100 iterations: 0.00892996788025 MTri/s: 11.4670065305
    Count: 2048 Total Time for 100 iterations: 0.0114290714264 MTri/s: 17.9192160377
    Count: 4096 Total Time for 100 iterations: 0.0237309932709 MTri/s: 17.2601287828
    Count: 8192 Total Time for 100 iterations: 0.0263659954071 MTri/s: 31.070323246
    Count: 16384 Total Time for 100 iterations: 0.0477550029755 MTri/s: 34.3084472394
    Count: 32768 Total Time for 100 iterations: 0.0931870937347 MTri/s: 35.1636677213
    Count: 65536 Total Time for 100 iterations: 0.188548088074 MTri/s: 34.758241608
    Count: 131072 Total Time for 100 iterations: 0.3538210392 MTri/s: 37.0447162488
    Count: 262144 Total Time for 100 iterations: 0.695466995239 MTri/s: 37.6932337256

Any ideas? Any additional information I could provide? I'd love to get this sorted out, as there are things I want to do in this project that SDL really isn't capable of doing in a remotely timely manner.

Thanks in advance!

-Chris

On Sat, May 22, 2010 at 8:44 AM, Ian Mallett <geo...@gm...> wrote:
> Hi,
>
> Well, modern games don't always use display lists. Although display lists
> are easy, they're not *technically* allowed. They're deprecated, but we
> wouldn't be expecting computers to be losing support for them for at least
> another 5 to 10 years. Maybe ATI is jumping ahead, just to be annoying.
>
> VBOs and vertex arrays are supposed to be supported everywhere. For your
> convenience, attached is my VBO version of your code (requires NumPy). If
> anything should make it fast, this should.
>
> Ian
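On reading the performance.py output in the message above: the three columns appear to be related by MTri/s = Count x iterations / total time / 1,000,000. That relationship is inferred from the numbers themselves, not taken from the performance.py source, and `mtri_per_second` is a hypothetical helper. A small sketch that re-derives the last column from the first two:

```python
def mtri_per_second(count, total_seconds, iterations=100):
    """Millions of triangles per second for one line of the output.

    Assumes `count` is the number of triangles drawn per iteration, which is
    an inference from the reported figures rather than a documented fact.
    """
    return count * iterations / total_seconds / 1e6


# (count, total time for 100 iterations, reported MTri/s) copied from the output above
samples = [
    (256, 0.00929379463196, 2.75452611272),
    (16384, 0.0477550029755, 34.3084472394),
    (262144, 0.695466995239, 37.6932337256),
]

for count, seconds, reported in samples:
    computed = mtri_per_second(count, seconds)
    print("count=%d computed=%.4f reported=%.4f" % (count, computed, reported))
```

Read this way, a higher MTri/s figure means more triangles were pushed through per second at that batch size.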
From: Mike C. F. <mcf...@vr...> - 2010-05-22 19:05:07
On 10-05-22 01:24 PM, Derakon wrote:
> I'm including the mailing list again, because at this point it looks
> pretty clear that there's something wrong with my PyOpenGL install,
> and I have no idea how to figure out what it could be.

I'd tend to agree that *something* is wrong with your installation, either PyOpenGL or the OpenGL driver. My laptop gets 1000+fps on the test3 script on a Radeon Mobile HD 3650 under Kubuntu Lucid (64-bit) using bzr head of PyOpenGL on Python 2.6.5. The test2 script gets 990+fps on the same machine. Your hardware is a generation older than mine, with ~= pixel-fill bandwidth and ~2/4 texture-fill bandwidth, so we'd expect to see around 500fps for the same texture-fill-rate-limited code. You're 30x slower than that, so yeah, something isn't configured properly.

> My PyOpenGL install was created by downloading the PyOpenGL and
> PyOpenGL-accelerate packages from
> http://pyopengl.sourceforge.net/documentation/installation.html and
> doing "python setup.py install". The only problem I ran into there was
> that PyOpenGL-accelerate was trying to pass -Wno-long-doubles to gcc,
> which didn't recognize it as a valid commandline option. I told it to
> use gcc 4.0 instead of gcc 4.2 and it built without complaints.

Hmm, sounds like Cython's distutils extension might need to be updated on that system.

> If I remove the GL_TEXTURE_MIN_FILTER line then I get an absurdly fast
> (830FPS) set of white rectangles. As I understand it, doing this
> causes OpenGL to assume that I'm going to provide mipmaps for the
> texture, and since I don't it defaults to white. Which is, apparently,
> much easier to draw than the textured quads.

It is certainly much easier, even software rendering could handle that without blinking (which I'm guessing is what's happening with your system).

> If I switch to RGB instead of RGBA and turn off blending, then it's still slow.
>
> Switching from an 800x600 window to a 400x300 window gets me 104FPS;
> likewise, switching to a 1600x1200 window gets me 3FPS.
>
> Here's the output of running PyOpenGL's performance test (from
> tests/performance.py); I have no idea how to interpret it.

You expect results on the order of a handful of mega-triangles per second on reasonable hardware with the middle array sizes (basically there's a sweet-spot where you're maxing out your hardware's capabilities per-call), on my machine the values for 16,000 and 32,000 are about 9 MTris.

    mcfletch@sturm:~/OpenGL-dev/OpenGL-ctypes/tests$ python performance.py
    Count: 256 Total Time for 100 iterations: 0.0823800563812 MTri/s: 0.310754824949
    Count: 512 Total Time for 100 iterations: 0.0618591308594 MTri/s: 0.82768702516
    Count: 1024 Total Time for 100 iterations: 0.0667719841003 MTri/s: 1.53357731359
    Count: 2048 Total Time for 100 iterations: 0.0698039531708 MTri/s: 2.933931256
    Count: 4096 Total Time for 100 iterations: 0.0796709060669 MTri/s: 5.14114901186
    Count: 8192 Total Time for 100 iterations: 0.113540887833 MTri/s: 7.21502196819
    Count: 16384 Total Time for 100 iterations: 0.1773250103 MTri/s: 9.23953139623
    Count: 32768 Total Time for 100 iterations: 0.360249996185 MTri/s: 9.09590571741
    Count: 65536 Total Time for 100 iterations: 0.921277046204 MTri/s: 7.1136039121
    Count: 131072 Total Time for 100 iterations: 2.15656399727 MTri/s: 6.07781638597
    Count: 262144 Total Time for 100 iterations: 5.70130205154 MTri/s: 4.59796722977

The values you are seeing are extremely small, and would indicate that your hardware isn't being used properly. This may be a PyOpenGL issue, given that you are seeing such slow performance in all PyOpenGL scripts tested so far, but I don't intuitively see what it would be.
To start debugging:

* does disabling OpenGL_accelerate change your performance (on my machine there is no difference, which suggests that OpenGL_accelerate isn't likely to be your problem)
    o import OpenGL
      OpenGL.USE_ACCELERATE = False
* confirm that your machine is using direct rendering (i.e. actually using your hardware driver, not a software renderer)
    o on Linux: glxinfo | grep direct
* confirm that non-Python OpenGL programs are *currently* running reasonably well on this machine
* confirm that you are not using an OpenGL compositing desktop (e.g. compiz on Linux) which may cause indirect rendering of OpenGL windows
* confirm that you do not have system-level anti-aliasing settings enabled (i.e. a 4x or 8x antialiasing specified in ATI's control panel)
* try generating mipmaps and using mipmap-nearest (just for kicks)

Realize that isn't all that much help, but this is looking like a system/config issue.

Good luck,
Mike

--
________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
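One platform-independent way to approach the "is this a software renderer?" item in the checklist above is to look at the strings the current context reports through `glGetString`. The sketch below is only a heuristic: the substrings in `SOFTWARE_HINTS` are assumptions about how some common software rasterizers name themselves, not an exhaustive or authoritative list, and both function names are hypothetical. `gl` is assumed to be the `OpenGL.GL` module, and `describe_context` needs a context to be current when it is called.

```python
# Assumed substrings seen in the GL_RENDERER string of some software rasterizers.
SOFTWARE_HINTS = ("software", "llvmpipe", "softpipe", "swrast", "gdi generic")


def looks_like_software_renderer(renderer):
    """Heuristic: does a GL_RENDERER string name a software rasterizer?"""
    if isinstance(renderer, bytes):
        # PyOpenGL hands back a byte string; decode defensively.
        renderer = renderer.decode("ascii", "replace")
    name = renderer.lower()
    return any(hint in name for hint in SOFTWARE_HINTS)


def describe_context(gl):
    """Collect the identification strings of the current GL context."""
    return {
        "vendor": gl.glGetString(gl.GL_VENDOR),
        "renderer": gl.glGetString(gl.GL_RENDERER),
        "version": gl.glGetString(gl.GL_VERSION),
    }
```

A typical use would be to print `describe_context(GL)` right after the window is created and pass its `"renderer"` entry to `looks_like_software_renderer`. Separately, note that `OpenGL.USE_ACCELERATE = False` only takes effect if it is set before `OpenGL.GL` is first imported.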
From: Derakon <de...@gm...> - 2010-05-22 20:00:22
Responses inline.

On Sat, May 22, 2010 at 12:04 PM, Mike C. Fletcher <mcf...@vr...> wrote:
>
> To start debugging:
>
> does disabling OpenGL_accelerate change your performance (on my machine
> there is no difference, which suggests that OpenGL_accelerate isn't likely
> to be your problem)
>
> import OpenGL
> OpenGL.USE_ACCELERATE = False
>

Tried this; no change.

> confirm that your machine is using direct rendering (i.e. actually using
> your hardware driver, not a software renderer)
>
> on Linux: glxinfo | grep direct
>

No idea how to do this on an OSX box, but given that I'm using a card that shipped with the box, and that games do work properly, I'd be extremely surprised if I were using software rendering.

> confirm that non-Python OpenGL programs are *currently* running reasonably
> well on this machine

They are. I've been playing Torchlight all last week; I assume it's OpenGL because what else would it be on a Mac? DirectX is out of the question and I'm not aware of any other graphics libraries that would work.

> confirm that you are not using an OpenGL compositing desktop (e.g. compiz on
> Linux) which may cause indirect rendering of OpenGL windows

Again, not certain how to do this; however, I tested the script in OSX's built-in X11 system, which (I'm fairly certain) skips most of the pretty-ifying steps that the window manager normally does, and it's still slow.

> confirm that you do not have system-level anti-aliasing settings enabled
> (i.e. a 4x or 8x antialiasing specified in ATIs control panel)

No ATI control panel, but again, something like this would affect games.

> try generating mipmaps and using mipmap-nearest (just for kicks)

Okay, I replaced the glTexImage2D in the script with this:

    GLU.gluBuild2DMipmaps(GL.GL_TEXTURE_2D, GL.GL_RGBA, surface.get_width(),
        surface.get_height(), GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, textureData)

and replaced the GL_TEXTURE_MIN_FILTER line with this:

    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER,
        GL.GL_LINEAR_MIPMAP_LINEAR)

and now I get 333FPS! What the heck?
From: Mike C. F. <mcf...@vr...> - 2010-05-24 13:55:38
On 10-05-22 04:00 PM, Derakon wrote:
> Responses inline.
>
> On Sat, May 22, 2010 at 12:04 PM, Mike C. Fletcher
> <mcf...@vr...> wrote:
>
>> To start debugging:
>>
>> does disabling OpenGL_accelerate change your performance (on my machine
>> there is no difference, which suggests that OpenGL_accelerate isn't likely
>> to be your problem)
>>
>> import OpenGL
>> OpenGL.USE_ACCELERATE = False
>>
> Tried this; no change.

Okay, so not likely an issue with OpenGL_accelerate (good).

...
> No idea how to do this on an OSX box, but given that I'm using a card
> that shipped with the box, and that games do work properly, I'd be
> extremely surprised if I were using software rendering.

Good surmise, I hadn't realized you were on OSX, that should always have DRI available.

...
> They are. I've been playing Torchlight all last week; I assume it's
> OpenGL because what else would it be on a Mac? DirectX is out of the
> question and I'm not aware of any other graphics libraries that would
> work.

Yup, it would have to be OpenGL AFAIU.

>> confirm that you are not using an OpenGL compositing desktop (e.g. compiz on
>> Linux) which may cause indirect rendering of OpenGL windows
>>
> Again, not certain how to do this; however, I tested the script in
> OSX's built-in X11 system, which (I'm fairly certain) skips most of
> the pretty-ifying steps that the window manager normally does, and
> it's still slow.

OSX has compositing by default, but it should work properly (whereas there are some situations on Compiz (Linux) that cause issues).

> No ATI control panel, but again, something like this would affect games.

Yup.

>> try generating mipmaps and using mipmap-nearest (just for kicks)
>>
> Okay, I replaced the glTexImage2D in the script with this:
>
> GLU.gluBuild2DMipmaps(GL.GL_TEXTURE_2D, GL.GL_RGBA, surface.get_width(),
> surface.get_height(), GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, textureData)
>
> and replaced the GL_TEXTURE_MIN_FILTER line with this:
>
> GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER,
> GL.GL_LINEAR_MIPMAP_LINEAR)
>
> and now I get 333FPS! What the heck?

That's what I was half expecting. OpenGL drivers tend to be optimized along certain (common) paths, use of MipMaps is pretty much universal, so they will be very fast. You're scaling the view constantly (IIRC) so you're going to have every pixel doing sampling, and it looks like on your driver the linear bitmap sampler is doing something non-optimal when it's sampling a (large) texture down across a large scale difference. With the MipMap, the textures being sampled are much smaller.

You could ask on the OpenGL.org forums and likely get a definitive answer as to why this particular operation is slow. I normally chalk it up to the old "do what everyone else does and you'll be fast" rule of thumb and move on in my code.

Still a surprisingly low MTri on the performance test. That is, however, likely a different issue from the texture-fill one.

Enjoy,
Mike
From: Greg E. <gre...@ca...> - 2010-05-24 23:57:10
Mike C. Fletcher wrote:
> it looks like on
> your driver the linear bitmap sampler is doing something non-optimal
> when it's sampling a (large) texture down across a large scale
> difference.

I would expect this to be slow using any driver. When there is a large scale reduction, each pixel on the screen projects onto a big block of texels in the texture. Doing linear sampling on that requires scanning all of those texels and averaging them together.

Using mipmaps, on the other hand, it's never necessary to average more than four texels (or possibly eight, if you're also interpolating between mipmap levels) for each screen pixel.

Moral: Mipmaps are good. Use them!

--
Greg
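To put rough numbers on the "big block of texels" in the message above, here is a small pure-Python sketch. `mip_chain` lists the level sizes for a texture using the usual halve-and-round-down rule, and `texels_per_pixel` estimates how many base-level texels fall under one screen pixel when a square texture is drawn at a given on-screen size. Both names are hypothetical, and older GLU implementations may rescale a 100x100 image to a power of two before building levels, so the exact sizes on a given system can differ.

```python
def mip_chain(width, height):
    """Sizes of every mipmap level, halving (rounding down) until 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def texels_per_pixel(texture_size, on_screen_size):
    """Base-level texels covered by one screen pixel for a square texture
    of texture_size texels drawn on_screen_size pixels wide."""
    ratio = texture_size / float(on_screen_size)
    return ratio * ratio


print(mip_chain(100, 100))
print(texels_per_pixel(100, 10))
```

For the 100x100 tile in this thread the chain has seven levels, and a tile shrunk to 10 pixels across covers about 100 base texels per screen pixel, whereas the 12x12 level brings that close to one.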
From: Ian M. <geo...@gm...> - 2010-05-25 00:38:29
Hi,

I was under the impression that (in the absence of bilinear or trilinear filtering) each pixel simply maps to a single texel, no matter how far away you are. This is what causes small (in screenspace) polygons that use a large range of texture coordinates to look static-y as they move. Mipmaps address this by successively averaging the adjacent texels down until you get a teensy texture image (when this is sampled, effectively, the hardware is reading an average of all the texels instead of one texel more-or-less at random from the original image).

The (speed) advantage of mipmapping is that the texture data that's being sampled can be smaller, so the hardware can find the proper values more efficiently.

Of course, in theory, mipmaps ought to be slowest. Bilinear filtering requires 4 samples (hardware-controlled samples, but four nonetheless). Trilinear filtering of course uses 8, and unfortunately, texture samples are one of the slowest processes on any graphics card. A quick benchmark on my computer confirms all this to be true.

'Course, there may be something about well-defined graphics paths on other computers that I don't know about . . .

And I agree--mipmaps are great. Use 'em anyway.

Ian
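The "successively averaging the adjacent texels" step described above can be illustrated with a plain 2x2 box filter on a single-channel image. This is a sketch of the idea only; `downsample_2x` is a hypothetical name, and real drivers or GLU may use a different filter when building levels.

```python
def downsample_2x(level):
    """Average each 2x2 block of texels into one texel of the next level.

    level -- list of rows of numbers; width and height are assumed even.
    """
    out = []
    for y in range(0, len(level), 2):
        row = []
        for x in range(0, len(level[y]), 2):
            total = (level[y][x] + level[y][x + 1] +
                     level[y + 1][x] + level[y + 1][x + 1])
            row.append(total / 4.0)
        out.append(row)
    return out


# A 4x4 black/white checkerboard: picking a single texel gives 0 or 255
# depending on where the sample happens to land.
checker = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
print(downsample_2x(checker))
```

Every texel of the reduced level is the same mid grey, which is why sampling the smaller level avoids the static-y look of picking one original texel more or less at random.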