Thread: [PyOpenGL-Users] Rolling spectrogram with pyopengl
From: Timothée L. <tim...@lp...> - 2009-12-03 14:21:50
Dear pyopengl users,

I am writing (as a hobby) an application that does real-time visualization of audio data. The main widget is a rolling spectrogram, that is, a colored image where the horizontal axis is time, the vertical axis is frequency, and the color of each pixel represents the intensity of the corresponding spectrum component (see http://www.flickr.com/photos/41584197@N03/3832486029/in/set-72157622072708326/ for an example). Each column of the image is computed with an FFT of the audio data every 20 ms or so, and the whole image is displayed on screen and "rolls" as time goes by.

I am trying to use OpenGL to improve the performance of the part of the application that displays the image on the screen, but I can't manage to get it really faster than simple 2D blitting, so I am asking for your help!

Currently, I first set up a texture with the following:

    # Create Texture
    GL.glGenTextures(1, self.texture)  # generate one texture name
    GL.glBindTexture(GL.GL_TEXTURE_2D, self.texture)  # bind a 2d texture to the generated name
    GL.glPixelStorei(GL.GL_UNPACK_ALIGNMENT, 1)
    GL.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, 2*self.canvas_width, height, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, None)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_WRAP_S, GL.GL_CLAMP)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_WRAP_T, GL.GL_CLAMP)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_WRAP_S, GL.GL_REPEAT)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_WRAP_T, GL.GL_REPEAT)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_NEAREST)
    GL.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_NEAREST)
    GL.glTexEnvf(GL.GL_TEXTURE_ENV, GL.GL_TEXTURE_ENV_MODE, GL.GL_DECAL)

Then, every 20 ms or so, I modify two columns of that texture with:

    GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, self.offset, 0, 1, self.height, GL.GL_BGRA, GL.GL_UNSIGNED_BYTE, byteString)
    GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, self.offset + self.canvas_width, 0, 1, self.height, GL.GL_BGRA, GL.GL_UNSIGNED_BYTE, byteString)

And I draw part of that texture to my widget with:

    GL.glLoadIdentity()
    GL.glBegin(GL.GL_QUADS)
    xoff = float(self.offset)/(2*self.canvas_width)
    GL.glTexCoord2f(xoff, 0.)
    GL.glVertex2f(0, 0)
    GL.glTexCoord2f(1.+xoff, 0.)
    GL.glVertex2f(2*self.canvas_width, 0)
    GL.glTexCoord2f(1.+xoff, 1.)
    GL.glVertex2f(2*self.canvas_width, self.height)
    GL.glTexCoord2f(xoff, 1.)
    GL.glVertex2f(0, self.height)
    GL.glEnd()

Profiling shows that the two GL.glTexSubImage2D calls take a lot of time (more than twice as much as the whole drawing part), even though only two single columns of the texture are being updated. What can I do to optimize this? Is there a smarter way to achieve the same result?

Thanks for your help!

Timothée Lecomte
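The two-column update described in this message can be modelled on the CPU without any OpenGL at all. The sketch below is one reading of the scheme, not the poster's actual code: the class name `RollingColumns` and its methods are hypothetical. Each new column is stored twice, `width` slots apart, in a buffer of `2 * width` slots, so that a contiguous window of `width` columns always holds the most recent data in chronological order.

```python
class RollingColumns:
    """CPU-side model of a double-width scrolling buffer (hypothetical helper)."""

    def __init__(self, width, blank=None):
        self.width = width
        self.store = [blank] * (2 * width)  # plays the role of the 2*canvas_width texture
        self.offset = 0                     # next column to overwrite

    def push(self, column):
        # Same idea as the two glTexSubImage2D calls: write the column at
        # offset and at offset + width.
        self.store[self.offset] = column
        self.store[self.offset + self.width] = column
        self.offset = (self.offset + 1) % self.width

    def visible(self):
        # A contiguous slice, oldest column first, newest column last.
        return self.store[self.offset:self.offset + self.width]


buf = RollingColumns(3)
for value in range(5):
    buf.push(value)
print(buf.visible())
```

The window never has to wrap around the end of the buffer, which is what makes a single textured quad sufficient for drawing.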
From: Gijs <in...@bs...> - 2009-12-03 16:10:36
Hello Timothée,

You could try to use two textures instead of one. While you display one, you can write to the other, then swap them and continue (also called the "ping-pong" technique). Another possibility is to use shaders to change your texture, but I'm not too sure whether that is any faster.

Also, since Python is quite slow compared to languages like C, even OpenGL slows down considerably when used from Python. So if possible, push as many commands as possible into display lists, vertex arrays, or VBOs. You can put all the commands you use to draw the quad into a display list, which basically brings the number of calls down to one (since you only need to call the display list).

Regards, Gijs

PS: While it is not necessary, I would supply the glTexImage2D call that you use to create the texture with an array of zeros the size of the texture. This way you know that the texture is zero everywhere.
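The zero-initialization suggested in the PS could look like the following minimal sketch. The helper name `zero_rgba_pixels` is hypothetical, and the sketch assumes four bytes per texel (RGBA with GL_UNSIGNED_BYTE), as in the original glTexImage2D call.

```python
def zero_rgba_pixels(width, height):
    """Return width * height RGBA texels, all bytes zero (hypothetical helper)."""
    return bytes(width * height * 4)


# Assumed usage, replacing the trailing None of the original call:
#   GL.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, 2 * canvas_width, height, 0,
#                   GL.GL_RGBA, GL.GL_UNSIGNED_BYTE,
#                   zero_rgba_pixels(2 * canvas_width, height))
pixels = zero_rgba_pixels(4, 2)
print(len(pixels))
```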
From: Ian M. <geo...@gm...> - 2009-12-03 23:38:19
Hi,

I would definitely use a shader. You can render the screen to a texture every frame, then simply draw the texture offset by one texel in the shader. The new data can just be added on. You could use FBOs for even greater speed, although you'd need ping-ponging.

You could also simply draw the texture offset by one, which I think is what you're doing. It's probably better to use glCopyTexSubImage2D or something.

What sort of speed do you need? All of these should run at least as fast as the refresh rate . . .

Ian
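Drawing the texture "offset by one texel" comes down to shifting the horizontal texture coordinate by a multiple of 1/texture_width. A minimal sketch, assuming GL_REPEAT wrapping so that coordinates above 1.0 wrap around; the function name `scroll_coords` is hypothetical and simply mirrors the `xoff` and `1.+xoff` values used in the original post:

```python
def scroll_coords(offset, texture_width):
    """Left and right s coordinates for a view shifted by `offset` texels."""
    s_left = (offset % texture_width) / texture_width
    return s_left, s_left + 1.0


# One texel of scroll on a 1024-texel-wide texture.
print(scroll_coords(1, 1024))
```

The same value could be passed to a fragment shader as a uniform instead of being baked into the quad's texture coordinates.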
From: Timothée L. <tim...@lp...> - 2009-12-06 12:26:40
Hi Ian,

I need it to run exactly as fast as the refresh rate, and it already is running fast enough, either with OpenGL or with pure 2D. However, it burns too much CPU for my taste, since the drawing part takes more time than the processing part, although the latter is quite heavy (FFT...), so I think there is room for large improvements.

As you rightly say, I am drawing the texture offset by one. As for alternatives to plain glTexSubImage2D, I have considered and tried using a PBO, but it does not decrease the time needed by the two glTexSubImage2D calls. I don't really see how I could use glCopyTexSubImage2D instead of glTexSubImage2D.

Finally, if I use a shader, how can the two columns of data be transferred to it? Will it be faster than a call to glTexSubImage2D, with or without a PBO?

Thanks for your kind help,

Timothée
From: Timothée L. <tim...@lp...> - 2009-12-06 12:35:51
On 3 Dec 2009, at 17:10, Gijs wrote:

> Hello Timothée,
>
> You could try to use two textures instead of one. When you display
> one, you can write to the other and swap them and continue (also
> called the "ping-pong" technique). Another possibility is to use
> shaders to change your texture, but I'm not too sure if that is any
> faster. Also, since Python is actually quite slow when compared to
> languages like C, even OpenGL slows down considerably when used in
> Python. So if possible, push as many commands as possible to display
> lists, vertex arrays, or VBOs. You can push all commands you use to
> draw the quad to a display list, which basically brings down the
> number of calls to one (since you only need to call the display list).

Hello Gijs,

I am very new to both the OpenGL and pyopengl worlds, but I'll try to respond to your suggestions:

Since I'm doing things synchronously in one thread, I don't see how using two textures could make the whole thing faster. Moreover, there's one single bottleneck, which is the writing of the two columns of the texture (that's two times approx. 500 pixels every 20 ms, with the glTexSubImage2D calls). The display itself appears much lower in the profile, so converting it to a display list is not my first priority.

> PS: While it is not necessary, I would supply the glTexImage2D call,
> that you use to create the texture, an array with zeros the size of
> the texture. This way you know that the texture is zero everywhere.

Right, that would make the initialization more deterministic :)

Thanks for your comments!

Timothée
From: Gijs <in...@bs...> - 2009-12-06 14:54:19
Even if it looks to be executing synchronously, underneath, the GPU actually performs commands asynchronously. You provide the GPU with a whole bunch of commands and the GPU itself decides when and how it will execute them. For certain commands like glReadPixels, the CPU waits for the GPU to finish before it continues to read the texture. For optimal GPU performance, you want as few of those commands as possible. This is the very reason why ping-ponging gives you a performance boost. The GPU knows that you don't access texture A, for instance, while you are writing to texture B, so it can schedule the execution of the commands more efficiently (that's the basic idea anyway).

However, after reading Ian Mallett's reply, I would also suggest using a shader. As Ian said, you can use shaders and render-to-texture techniques with FBOs (which are something different from PBOs) to achieve high performance. You can supply the shader with the old data and the new data in the form of textures. You offset everything by one texel, and for the last column of pixels you use the new data. You could use PBOs to transfer the new data to the input texture, which should make it a little bit faster if properly used.

For some good tutorials regarding FBOs, PBOs and other GPGPU stuff, I'd suggest the site of Dominik Göddeke (http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html). Even though it's C code, most of the code is practically the same in PyOpenGL.

Hope this all makes sense :)

Regards, Gijs
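The bookkeeping behind ping-ponging is small: keep two resources, read from one while writing to the other, then swap roles. The sketch below shows only that bookkeeping; the class name `PingPong` is hypothetical, and the two items would be texture names (or FBO attachments) in a real program.

```python
class PingPong:
    """Track which of two resources is being displayed and which is being written."""

    def __init__(self, first, second):
        self._pair = [first, second]
        self._front = 0  # index of the resource currently displayed

    @property
    def front(self):
        return self._pair[self._front]

    @property
    def back(self):
        return self._pair[1 - self._front]

    def swap(self):
        self._front = 1 - self._front


textures = PingPong("texture A", "texture B")
textures.swap()  # after writing to the back texture, make it the displayed one
print(textures.front, "/", textures.back)
```

Whether this actually reduces stalls depends on the driver, so it is worth timing both variants.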
From: Timothée L. <tim...@lp...> - 2009-12-07 08:18:48
On 6 Dec 2009, at 15:54, Gijs wrote:

> Even if it looks to be executing synchronously, underneath, the GPU
> actually performs commands asynchronously. You provide the GPU with
> a whole bunch of commands and the GPU itself decides when and how it
> will execute the commands. For certain commands like glReadPixels,
> the CPU waits for the GPU to finish before it continues to read the
> texture. For optimal GPU performance, you would want as few of
> those commands as possible. This is the very reason why ping-ponging
> gives you a performance boost. The GPU knows that you don't access
> texture A for instance, while you are writing to texture B, so it
> can schedule the execution of the commands more efficiently (that's
> the basic idea anyway).

That's a very interesting candidate to explain why those cheap glTexSubImage2D calls take so much time. So maybe the GPU is still busy drawing to the screen when I'm going from one iteration to the next... I'll take a look at ping-ponging. Thanks.

> However, after reading the reply of Ian Mallett, I would also
> suggest using a shader. As Ian said, you can use shaders and
> render-to-texture techniques with FBOs (that's something different
> than PBOs) to achieve high performance. You can supply the shader with
> the old data and the new data in the form of textures. You offset
> everything by one texel, and for the last column of pixels you use
> the new data. You could use PBOs to transfer the new data to the
> input texture which should make it a little bit faster, if properly
> used. For some good tutorials regarding FBOs, PBOs and other GPGPU
> stuff, I'd suggest the site of Dominik Göddeke
> (http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html).
> Even though it's C code, most of the code is practically the
> same in PyOpenGL.

I'll look at shaders too, thanks!

> Hope this all makes sense :)
>
> Regards, Gijs
From: Ian M. <geo...@gm...> - 2009-12-06 18:18:28
2009/12/6 Timothée Lecomte <tim...@lp...>

> Hi Ian,
>
> I precisely need it to run as fast as the refresh rate, and it already is
> running fast enough, either with opengl or with pure 2D. However, it burns
> too much CPU to my taste, since the drawing part takes more time than the
> processing part, although the latter is quite heavy (FFT...), so I think
> there is room for large improvements.

I'm confused. The CPU doesn't work too hard to coordinate the actions of the GPU--it just takes time to do it, because it must wait, as Gijs explained. If the CPU is working hard, that means something else is going on. You can reduce processing time by optimizing, if applicable, and/or using a JIT compiler (e.g. psyco).

It sounds like you're computing the FFT (which I assume is for signal processing) on the CPU. If you're doing that a thousand or so times every frame, that is likely to be your speed problem. Using a shader would take all that load off the CPU.

> As you rightly say, I am drawing the texture off by one. As far as
> alternatives to pure glTexSubImage2D, I have considered and tried to use
> PBO, but it does not decrease the time needed by the two glTexSubImage2D
> calls. I don't really see how I could use glCopyTexSubImage2D instead of
> glTexSubImage2D.

My mistake: try glCopyTexImage2D:

    # Draw the scene, then . . .
    glBindTexture(GL_TEXTURE_2D, texture)
    glCopyTexImage2D(GL_TEXTURE_2D, 0, format, x, y, width, height, 0)

I can't help you with PBOs, as I don't have a working implementation myself. FBOs should work comparably, though. There's a Python FBO class and tutorial in my latest OpenGL Library, on pygame.org, which may be of use to you.

Ian
From: Greg E. <gre...@ca...> - 2009-12-06 22:49:31
Timothée Lecomte wrote:

> As far as alternatives to pure glTexSubImage2D, I have considered
> and tried to use PBO, but it does not decrease the time needed by
> the two glTexSubImage2D calls.

This is puzzling, because transferring 1000 pixels of data shouldn't take very long, even if you're not doing it the most efficient way.

The only thing I can think of is to try using different pixel formats for the texture data. Fastest CPU->GPU transfers occur when you use a pixel format that matches what the GPU uses, so that no conversions are needed.

--
Greg
From: Timothée L. <tim...@lp...> - 2009-12-07 08:13:39
On 7 Dec 2009, at 00:00, Greg Ewing wrote:

> The only thing I can think of is to try using different
> pixel formats for the texture data. Fastest CPU->GPU transfers
> occur when you use a pixel format that matches what the
> GPU uses, so that no conversions are needed.

That's advice I've seen very often on OpenGL pages on the web. Is there a way to find out at runtime what the native format is?

Thanks Greg.

Timothée
From: René D. <re...@gm...> - 2009-12-08 16:43:05
2009/12/7 Timothée Lecomte <tim...@lp...>

> That's an advice I've seen very often on opengl pages on the web. Is
> there a way to know at runtime what is the native format ?

hi,

Timing the calls is really the best way.

Using two textures has been faster for me on multiple cards. However, switching to the best format gives a much bigger speed boost. The two-texture trick works well because it doesn't stall the card(s) as much as reading from the frame buffer.

There are also extensions on some cards/drivers that get the card to use system memory for specific textures, in which case copying is just a memcpy. However, that can be complicated, and is only faster in some situations.

cu,
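Timing candidate formats could be organised along these lines. This is a minimal sketch, not a tested benchmark: `time_upload` and `fastest` are hypothetical names, each candidate is assumed to be a zero-argument callable that performs one upload (for example one glTexSubImage2D call with a given format and type), and `finish` is assumed to be something like `GL.glFinish`, so that queued GPU work is included in the measurement.

```python
import time


def time_upload(upload, finish, repeats=100):
    """Average wall-clock seconds per call of `upload`, including queued work."""
    finish()                      # drain anything already queued
    start = time.perf_counter()
    for _ in range(repeats):
        upload()
    finish()                      # wait for the uploads themselves to complete
    return (time.perf_counter() - start) / repeats


def fastest(candidates, finish, repeats=100):
    """Return (name of quickest candidate, dict of all timings)."""
    timings = {name: time_upload(fn, finish, repeats)
               for name, fn in candidates.items()}
    return min(timings, key=timings.get), timings


# Stand-in workloads so the sketch runs without a GL context.
demo = {"slow": lambda: time.sleep(0.002), "fast": lambda: None}
print(fastest(demo, finish=lambda: None, repeats=5)[0])
```

Without the final `finish()` call the loop would mostly measure how quickly commands can be queued, not how long the transfers take.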
From: René D. <re...@gm...> - 2009-12-10 10:41:44
2009/12/9 Timothée Lecomte <tim...@lp...>:

> Now this profile brings another question:
> wrapper:__call__ seems to use a lot of slow python calls
> (calculate_pyArgs, calculate_cArguments...) whereas it does not seem to
> spend so much time in the actual openGL (3% if I count correctly). Is
> there a way to improve this ?
>
> I hope this was not too complicated ! Thanks for your help.
>
> Timothée

yes, you can use the raw calls, e.g.:

    import OpenGL.raw.GL
    OpenGL.raw.GL.glTexSubImage2D

They don't have the niceties of the other wrappers, but can work fine... and there are fewer calls.

Otherwise, wrap some calls with a cython or C extension for more speed.

cu,
From: Timothée L. <tim...@lp...> - 2009-12-10 14:48:04
René Dudfield wrote:

> 2009/12/9 Timothée Lecomte <tim...@lp...>:
>
>> Now this profile brings another question:
>> wrapper:__call__ seems to use a lot of slow python calls
>> (calculate_pyArgs, calculate_cArguments...) whereas it does not seem to
>> spend so much time in the actual openGL (3% if I count correctly). Is
>> there a way to improve this ?
>
> <...>
>
> Otherwise wrap some calls with a cython or C extension for more speed.

Hi René,

I have now written a little C extension to wrap them, and the OpenGL calls have almost disappeared from the profile: they went from 11.89 % of the total time in the previous profile down to 1.84 % with the C extension!

I have kept the whole initialization in pyopengl since it's not as performance critical.

Thanks for your sound advice!

Best regards,

Timothée
From: Ian M. <geo...@gm...> - 2009-12-11 03:23:02
Some things definitely change, though--at least they do for me.

Simply replacing "from OpenGL.GL import *" with "from OpenGL.raw.GL import *" gives problems.

For example, glGenTextures(1) whines about needing two arguments. Does it want a numpy array or something?

Ian
From: Mike C. F. <mcf...@vr...> - 2009-12-11 14:50:36
Ian Mallett wrote:

> Some things definitely change, though--at least they do for me.
>
> Simply replacing "from OpenGL.GL import *" with "from OpenGL.raw.GL
> import *" gives problems.
>
> For example, glGenTextures(1) whines about needing two arguments.
> Does it want a numpy array or something?

Any of a numpy array, ctypes array, or ctypes byref( ctypes c_int ) for the single-int case. You'll find that pattern repeating all through your code-base: basically, OpenGL seldom allocates memory, so anywhere something "returns" an array in PyOpenGL you will have to allocate the array yourself and pass it into the GL call for the raw APIs. All glGet* calls, for instance, will need to be altered. You'll also find that many array APIs will require the "full" form where you specify data-types, strides and the like (since you're using VBOs already, that shouldn't be a challenge for you).

Upshot is, after you've done all that, your code should work pretty much unmodified with any raw OpenGL wrapper (SWIG, pyglet, Cython, etc).

HTH,
Mike

--
________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
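The "allocate it yourself" pattern described here might look like this for the ctypes-array case. It is a sketch under assumptions: the helper `gen_texture_name` is hypothetical, and the raw module is passed in as a parameter (in real use presumably `OpenGL.raw.GL`, with a current GL context) so that the example can also run against a stand-in object.

```python
import ctypes


def gen_texture_name(raw_gl):
    """Ask a raw-style glGenTextures for one texture name and return it as an int."""
    names = (ctypes.c_uint * 1)()      # caller-allocated output array of one GLuint
    raw_gl.glGenTextures(1, names)     # the call fills the array in place
    return int(names[0])


class FakeRawGL:
    """Stand-in for the raw module, used only to make the sketch runnable."""

    def glGenTextures(self, n, names):
        for i in range(n):
            names[i] = 42 + i


print(gen_texture_name(FakeRawGL()))
```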
From: René D. <re...@gm...> - 2009-12-11 14:54:55
2009/12/11 Ian Mallett <geo...@gm...>:

> Some things definitely change, though--at least they do for me.
>
> Simply replacing "from OpenGL.GL import *" with "from OpenGL.raw.GL import
> *" gives problems.
>
> For example, glGenTextures(1) whines about needing two arguments. Does it
> want a numpy array or something?
>
> Ian

hi,

yeah, the raw ones don't do all the nice pythony things for you. So using import * from the raw ones is probably not a good idea... instead, use them only for the slow or problematic functions.

In this case glGenTextures is much like the C version:

    void glGenTextures(GLsizei n, GLuint * textures);

So you need to pass it a pointer to some data where it will write your data. You can see the argument types like this:

    >>> OpenGL.raw.GL.glGenTextures._argtypes_
    (<class 'ctypes.c_long'>, <class 'OpenGL.arrays.arraydatatype.GLuintArray'>)

And give it the array to store a GLuint:

    >>> a = numpy.array([1], numpy.uint32)
    >>> OpenGL.raw.GL.glGenTextures(1, a)

cu,
From: Timothée L. <tim...@lp...> - 2009-12-09 09:46:33
Gijs wrote:

> Hello Timothée,
>
> I don't know of any method for determining at runtime which texture
> format transfers the fastest, except benchmarking it and seeing which
> performs the best. However, GL_BGRA with GL_UNSIGNED_INT_8_8_8_8 is
> usually the fastest. If you want to check to make sure, run the code
> I supplied with this email. It transfers a whole bunch of different
> formats and displays which are the fastest pixel-wise and
> megabyte-wise.
>
> Regards, Gijs

Hello Gijs,

Thanks for your help. I found that on my machine, the best pixel rate is achieved with GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV. I get 397.253 Mp/s. Does that look OK (it's an NVidia Quadro NVS 290/PCI/SSE2, on Ubuntu Karmic)?

I have a question now: when glTexSubImage2D is called to update ~1000 pixels on a texture of 1 Mpixel, does the whole texture get transferred again from system memory to video memory?

Best regards,

Timothée
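For reference, with GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV each pixel is one 32-bit word with blue in the lowest 8 bits, then green, then red, and alpha in the highest 8 bits. A minimal sketch of building a column in that layout with the standard library follows; the function names `bgra_word` and `pack_column` are hypothetical.

```python
import struct


def bgra_word(r, g, b, a=255):
    """Pack 8-bit channels into one word: alpha high, then red, green, blue low."""
    return (a << 24) | (r << 16) | (g << 8) | b


def pack_column(pixels):
    """Turn an iterable of (r, g, b[, a]) tuples into native-endian 32-bit words."""
    words = [bgra_word(*p) for p in pixels]
    return struct.pack("=%dI" % len(words), *words)


column = pack_column([(255, 0, 0), (0, 255, 0, 128)])
print(hex(bgra_word(0x11, 0x22, 0x33, 0x44)), len(column))
```

On a little-endian machine the resulting bytes are ordered B, G, R, A in memory, which is the same layout as GL_BGRA with GL_UNSIGNED_BYTE.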
From: Timothée L. <tim...@lp...> - 2009-12-09 10:50:52
|
Ian Mallett a écrit : > 2009/12/6 Timothée Lecomte <tim...@lp... > <mailto:tim...@lp...>> > > Hi Ian, > > I precisely need it to run as fast as the refresh rate, and it > already is running fast enough, either with opengl or with pure > 2D. However, it burns too much CPU to my taste, since the drawing > part takes more time than the processing part, although the latter > is quite heavy (FFT...), so I think there is room for large > improvements. > > I'm confused. The CPU doesn't work too hard to coordinate the actions > of the GPU--it just takes time to do it, because it must wait, as Gijs > explained. If the CPU is working hard, that means something else is > going on. You can reduce processing time by optimizing, if > applicable, and/or using a JIT compiler (e.g. psyco). > > It sounds like you're computing the FFT (which I assume is for signal > processing) on the CPU. If you're doing that for a thousand some > times every frame, that is likely to be your speed problem. Using a > shader would take all that load of the CPU. > <...> > > Ian Hi Ian, Thanks for your comment. I understand your idea about the FFT. However in my case profiling tells that drawing is the bottleneck, not processing. To give you a more precise idea, here is the result of cProfile on my application: http://imgur.com/deMyT.png It's a bit crowded, so I'll try to explain where the relevant pieces of information are (it's also slightly different from my first post in this thread, since I've learnt to use PBO, VBO and a bit of shaders in the mean time): Starting from the top of the profile, we see: <built-in method exec_> 99.49% (73.94%) This is the main loop, which is running 100 % of the total application time (minus 0.51% for initialization). And it is idle 73.94% of the time (which is already quite good!!!). 
Then, we have three leaves below exec_, two of which are prominent:

spectrogram_timer_slot 8.11%
This is the function that retrieves the data from the audio card and does the FFT and other math processing. That's 8% of the total time, and the FFT in particular is only 1% of the total time.

And we have:

paintGL 17.10%
This function again does a little bit of math, plus the OpenGL calls. The math is mostly some interpolation (1.17%) and computation of pixel colors (2.85%) (which could be done in a fragment shader, by the way). Finally, the OpenGL calls are:

send_data (2.14%), which copies my ~1000 pixels to a PBO with:

    GL.glBufferData(GL.GL_PIXEL_UNPACK_BUFFER_ARB, byteString, GL.GL_DYNAMIC_DRAW)

put_data_in_texture (5.23%), which uses the PBO to update the texture with:

    GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, self.offset, 0, 1, height, GL.GL_BGRA, GL.GL_UNSIGNED_INT_8_8_8_8_REV, None)
    GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, self.offset + self.canvas_width, 0, 1, height, GL.GL_BGRA, GL.GL_UNSIGNED_INT_8_8_8_8_REV, None)

realpaint (4.52%), which draws using two VBOs with:

    GL.glBindBuffer(GL.GL_ARRAY_BUFFER, self.vertex_vbo)
    GL.glVertexPointer(2, GL.GL_FLOAT, 0, None)
    GL.glBindBuffer(GL.GL_ARRAY_BUFFER, self.texture_vbo)
    GL.glTexCoordPointer(2, GL.GL_FLOAT, 0, None)
    # self.program contains a pixel shader
    # that draws the texture with a small offset on each frame
    GL.glUseProgram(self.program)
    GL.glUniform1f(self.loc, xoff)
    GL.glDrawArrays(GL.GL_QUADS, 0, 4)

All the OpenGL commands boil down to a single:

wrapper:__call__ (10.45%)

In conclusion, the FFT is only ~1% while OpenGL drawing is ~11%! That's why I want to improve the drawing and not the processing.

Now this profile raises another question: wrapper:__call__ seems to use a lot of slow Python calls (calculate_pyArgs, calculate_cArguments...), whereas it does not seem to spend much time in the actual OpenGL calls (3% if I count correctly). Is there a way to improve this?

I hope this was not too complicated!
Thanks for your help. Timothée |
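The two glTexSubImage2D calls in the message above write each new spectrum into two columns of a texture that is twice as wide as the canvas. The sketch below models that scheme with a plain Python list instead of a texture. How `offset` advances and which window is drawn are assumptions (the thread does not spell them out), and the helper names are hypothetical.

```python
def columns_to_update(offset, canvas_width):
    """The two texture columns that receive the newest spectrum."""
    return (offset, offset + canvas_width)


def visible_window(offset, canvas_width):
    """Column range [start, stop) holding the last canvas_width spectra, oldest first."""
    start = offset + 1
    return (start, start + canvas_width)


def simulate(n_updates, canvas_width):
    """Write n_updates spectra (numbered 0, 1, 2, ...) and return what is visible."""
    texture = [None] * (2 * canvas_width)   # stand-in for the double-width texture
    offset = 0
    window = (0, canvas_width)
    for t in range(n_updates):
        for col in columns_to_update(offset, canvas_width):
            texture[col] = t
        window = visible_window(offset, canvas_width)
        # Assumption: the write position wraps around after canvas_width columns.
        offset = (offset + 1) % canvas_width
    start, stop = window
    return texture[start:stop]


print(simulate(7, 4))   # [3, 4, 5, 6]
```

Because every column exists twice, the visible window is always one contiguous block, so each update costs two single-column uploads and no pixels ever have to be shifted.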
From: Silverstein <her...@sc...> - 2009-12-11 19:02:44
|
> Thanks for your comment. I understand your idea about the FFT. However > in my case profiling tells that drawing is the bottleneck, not > processing. To give you a more precise idea, here is the result > of cProfile on my application: > http://imgur.com/deMyT.png > > Can you say a bit about what tool you are using to do the profiling and where I can get it? It looks very useful. Herc |
From: Timothée L. <tim...@lp...> - 2009-12-15 14:10:52
|
Silverstein wrote:
>
>> Thanks for your comment. I understand your idea about the FFT. However
>> in my case profiling tells that drawing is the bottleneck, not
>> processing. To give you a more precise idea, here is the result
>> of cProfile on my application:
>> http://imgur.com/deMyT.png
>>
>>
> Can you say a bit about what tool you are using to do the profiling
> and where I can get it? It looks very useful.
>
> Herc
>

Hi Herc,

The profile itself is obtained with a standard Python profiling module called cProfile. My main script is called "friture.py", so I run the following command:

    python -m cProfile -o output.pstats ./friture.py

This writes the profile statistics to the "output.pstats" file. A script called "gprof2dot.py" then converts that file to the ".dot" format, which Graphviz processes to produce the output PNG:

    ./gprof2dot.py -f pstats output.pstats -n 0.1 -e 0.02 | dot -Tpng -o output2.png

"dot" is part of Graphviz (www.graphviz.org), and "gprof2dot.py" can be found here: http://code.google.com/p/jrfonseca/wiki/Gprof2Dot

Best regards,
Timothée |
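For readers who only want the numbers and not the graph, the same statistics can be read with the standard-library pstats module. The following is a minimal sketch; `busy_work` is a made-up stand-in for the real application code.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Placeholder workload standing in for the real application."""
    return sum(i * i for i in range(n))


# Profile one call in-process instead of using "python -m cProfile".
profiler = cProfile.Profile()
profiler.enable()
busy_work(100000)
profiler.disable()

# Format the five most expensive entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

A file written with `-o output.pstats` can be loaded the same way by passing its name, as in `pstats.Stats("output.pstats")`.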
From: Frédéric <fre...@gb...> - 2010-01-11 06:26:57
|
On Friday 11 December 2009, Silverstein wrote:
> > Thanks for your comment. I understand your idea about the FFT. However
> > in my case profiling tells that drawing is the bottleneck, not
> > processing. To give you a more precise idea, here is the result
> > of cProfile on my application:
> > http://imgur.com/deMyT.png
>
> Can you say a bit about what tool you are using to do the profiling and
> where I can get it? It looks very useful.

I second this request, for my textures problem...

--
Frédéric
http://www.gbiloba.org |
From: Mike C. F. <mcf...@vr...> - 2009-12-10 15:37:34
|
René Dudfield wrote:
> 2009/12/9 Timothée Lecomte <tim...@lp...>:
>
>> Now this profile brings another question:
>> wrapper:__call__ seems to use a lot of slow python calls
>> (calculate_pyArgs, calculate_cArguments...) whereas it does not seem to
>> spend so much time in the actual openGL (3% if I count correctly). Is
>> there a way to improve this ?
>>
>> I hope this was not too complicated ! Thanks for your help.
>>
>> Timothée
>>
> yes, you can use the raw calls.
>

You could also install the PyOpenGL-accelerate module, which uses Cython to speed up the wrappers (but still uses ctypes for the actual calls). It won't be as fast as the C-coded wrapper you created, though.

HTH,
Mike

--
________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com |
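Besides the two suggestions above, PyOpenGL has module-level flags that reduce per-call wrapper work. The sketch below is not from this thread: it assumes a PyOpenGL version that provides `OpenGL.ERROR_CHECKING` and `OpenGL.ERROR_LOGGING`, and `has_module` is a hypothetical helper. The flags only take effect if set before `OpenGL.GL` is first imported, and whether they help in this application is untested.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if has_module("OpenGL"):
    import OpenGL
    # Assumption: these flags exist in the installed PyOpenGL version.
    # They must be set before the first "from OpenGL import GL".
    OpenGL.ERROR_CHECKING = False   # skip the glGetError check after each call
    OpenGL.ERROR_LOGGING = False    # skip per-call error logging wrappers

# PyOpenGL-accelerate installs as the "OpenGL_accelerate" package.
print("PyOpenGL-accelerate installed:", has_module("OpenGL_accelerate"))
```

Turning off error checking also hides real OpenGL errors, so it is probably best kept for release builds while development runs keep the checks on.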
From: Massimo Di S. <mas...@ya...> - 2009-12-11 01:10:14
|
Hi all,

"I'm not an expert programmer, just a biology student :-/ with self-taught Python experience, so please excuse my questions if they are based on wrong assumptions."

For my studies I need to write code to interface a joystick with an open-source application similar to Google Earth. This application is a 3D globe based on OpenSceneGraph (OSG); it has a "TCP listener interface" that accepts XML messages to move a camera view around the globe. The message to move the globe is something like:

go_to_lat_lon(latitude longitude roll pitch head)

I tried to code this in Python using pygame. The result is (my fault) "ugly" code that is not able to update the position according to heading changes (view direction). Specifically, I have a 3-axis joystick:

axis_x -> +/- longitude
axis_y -> +/- latitude
axis_z -> +/- 0-360° (heading)
axis_v = 0-1 (speed)

The code I wrote uses axis_x and axis_y to change longitude and latitude, and axis_z to change the view direction. But it is totally wrong: if heading = 0 = 360 (that is, looking north) the code works as expected, but if I change direction (the view turns) the axis code is not updated. So if I look east and push the joystick up (go ahead), it does not go east but always goes north.

Moving axis_y up moves the planet north.
Moving axis_y down moves the planet south.
Moving axis_x left and right moves the planet west and east.

But because my code is completely separate from the heading changes, changing the heading leaves the action of the joystick x/y axes unchanged. This is because I connected the x/y action to code that simply increases/decreases the lon-lat values.
increase:

    for i in arange(j, j+1):
        lati = [sum(zlat)]
        a = abs(axis_v) * abs(axis_x)
        lati.insert(i+1, a)
        zlat = array(lati)
        lat = sum(zlat)
        j = 0.1

decrease:

    for i in arange(j, j+1):
        lati = [sum(zlat)]
        a = abs(axis_v) * abs(axis_x)
        lati = [sum(lati) - a]
        zlat = array(lati)
        lat = sum(zlat)
        j = 0.1

I used the increase/decrease code to handle all the available conditions (North, South, East, West, NE, NW, SE, SW):

1) axis_x < 0, axis_y == 0
2) axis_x == 0, axis_y > 0
3) axis_x > 0, axis_y == 0
4) axis_x == 0, axis_y < 0
5) axis_x < 0, axis_y > 0
6) axis_x > 0, axis_y > 0
7) axis_x > 0, axis_y < 0
8) axis_x < 0, axis_y < 0

I hope a function already exists in OpenGL that can help me solve this problem. From a lot of reading on Google, it seems my problem can be solved using complex math like quaternions or rotation around an arbitrary axis.

I can ignore roll and pitch, because I can change them using the joystick hat, and roll and pitch have no influence on the lon-lat position, while the heading is strictly related to the position because it represents the direction of movement.

Please excuse my ugly code again. This is what I am currently using:
http://www.geofemengineering.it/data/epi_joy.py

This is the result:
http://www.geofemengineering.it/Site/Media/ossimplanet_joystick-2.mp4

And here I tried to learn quaternions, but my brain is not able "yet" to understand how to work with quaternions in a "longitude - latitude" space :-(
http://www.geofemengineering.it/data/epiquath.py

I hope someone can help me! I'm Italian, so it is a bit hard to find the right words to describe my problem. If you need a more detailed explanation, please tell me where I should give more precise information.

Thanks for your help!

Ciao,
Massimo. |
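For small steps, the behaviour described above (pushing forward should move along the current heading) can be had from a plain 2D rotation of the joystick vector, without quaternions. The sketch below is one possible approach, not code from the linked scripts. It assumes the heading is measured clockwise from north in degrees, that a positive axis_y means "forward" (pygame often reports forward as negative, so the sign may need flipping), and that steps are in degrees; the function name is hypothetical.

```python
import math


def joystick_to_lat_lon_step(axis_x, axis_y, axis_v, heading_deg, lat_deg):
    """Rotate the joystick vector by the heading and return (d_lat, d_lon) in degrees."""
    h = math.radians(heading_deg)
    # Forward points along the heading; right points along heading + 90 degrees.
    north = axis_y * math.cos(h) - axis_x * math.sin(h)
    east = axis_y * math.sin(h) + axis_x * math.cos(h)
    d_lat = axis_v * north
    # Meridians converge towards the poles, so a given eastward step spans
    # more degrees of longitude at high latitude. Clamp to avoid dividing by zero.
    cos_lat = max(math.cos(math.radians(lat_deg)), 1e-6)
    d_lon = axis_v * east / cos_lat
    return d_lat, d_lon


# Looking east (heading 90) and pushing forward at the equator:
d_lat, d_lon = joystick_to_lat_lon_step(0.0, 1.0, 1.0, 90.0, 0.0)
print(round(d_lat, 6), round(d_lon, 6))
```

With this, the eight sign cases listed above collapse into one formula: the new position is the old latitude plus d_lat and the old longitude plus d_lon, whatever the heading. Near the poles this simple approximation breaks down.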