Re: [PyOpenGL-Users] Rolling spectrogram with pyopengl


Ian Mallett wrote:
> 2009/12/6 Timothée Lecomte <tim...@lp... 
> <mailto:tim...@lp...>>
>
>     Hi Ian,
>
>     I precisely need it to run as fast as the refresh rate, and it
>     already is running fast enough, either with opengl or with pure
>     2D. However, it burns too much CPU to my taste, since the drawing
>     part takes more time than the processing part, although the latter
>     is quite heavy (FFT...), so I think there is room for large
>     improvements.
>
> I'm confused.  The CPU doesn't work too hard to coordinate the actions 
> of the GPU--it just takes time to do it, because it must wait, as Gijs 
> explained.   If the CPU is working hard, that means something else is 
> going on.  You can reduce processing time by optimizing, if 
> applicable, and/or using a JIT compiler (e.g. psyco).
>
> It sounds like you're computing the FFT (which I assume is for signal 
> processing) on the CPU.  If you're doing that a thousand or so 
> times every frame, that is likely to be your speed problem.  Using a 
> shader would take all that load off the CPU. 
> <...>
>
> Ian
Hi Ian,

Thanks for your comment. I understand your idea about the FFT. However,
in my case profiling shows that drawing is the bottleneck, not
processing. To give you a more precise idea, here is the cProfile
output for my application:
http://imgur.com/deMyT.png

It's a bit crowded, so I'll try to explain where the relevant pieces of 
information are (it's also slightly different from my first post in this 
thread, since I've learnt to use PBOs, VBOs and a bit of shaders in the 
meantime):

Starting from the top of the profile, we see:
     <built-in method exec_> 99.49% (73.94%)
This is the main loop, which accounts for 100% of the total application
time (minus 0.51% for initialization). It is idle 73.94% of the time
(which is already quite good!).
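
The two figures on that line are a cumulative share and an own-time share. As a minimal sketch of where such numbers come from, the snippet below profiles a toy loop with cProfile and reads both values back through pstats. The function names and workloads are hypothetical stand-ins, not the application's code.

```python
import cProfile
import pstats


def fft_like_work(n):
    # Hypothetical stand-in for the processing step.
    return sum(i * i for i in range(n))


def draw_like_work(n):
    # Hypothetical stand-in for the drawing step.
    return [str(i) for i in range(n)]


def main_loop():
    for _ in range(20):
        fft_like_work(2000)
        draw_like_work(2000)


profiler = cProfile.Profile()
profiler.enable()
main_loop()
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (file, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
timings = {}
for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items():
    timings[name] = (tt, ct)

# Use the cumulative time of the outermost function as the 100% reference.
total = max(timings["main_loop"][1], 1e-12)
for name in ("main_loop", "fft_like_work", "draw_like_work"):
    own, cumulative = timings[name]
    print("%-16s %6.2f%% (%6.2f%%)" % (name, 100 * cumulative / total,
                                       100 * own / total))
```

The first column is time spent in the function including everything it calls, and the value in parentheses is time spent in the function body itself.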

Then we have three leaves below exec_, two of which are prominent:
     spectrogram_timer_slot 8.11%
This is the function that retrieves the data from the audio card and does
the FFT and other math processing. That's 8% of the total time, and the
FFT in particular is only 1% of the total time.
And we have:
     paintGL 17.10%
This function again does a little bit of math, plus the OpenGL calls.
The math is mostly some interpolation (1.17%) and computation of pixel
colors (2.85%) (which could be done in a fragment shader, by the way).
Finally, the OpenGL calls are:

     send_data (2.14%) which copies my ~1000 pixels to a PBO with:
         GL.glBufferData(GL.GL_PIXEL_UNPACK_BUFFER_ARB, byteString,
                         GL.GL_DYNAMIC_DRAW)

     put_data_in_texture (5.23%) which uses the PBO to update the
     texture with:
         GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, self.offset, 0, 1,
                            height, GL.GL_BGRA,
                            GL.GL_UNSIGNED_INT_8_8_8_8_REV, None)
         GL.glTexSubImage2D(GL.GL_TEXTURE_2D, 0,
                            self.offset + self.canvas_width, 0, 1, height,
                            GL.GL_BGRA, GL.GL_UNSIGNED_INT_8_8_8_8_REV, None)

     realpaint (4.52%) which draws using two VBO with:
         GL.glBindBuffer(GL.GL_ARRAY_BUFFER, self.vertex_vbo)
         GL.glVertexPointer(2, GL.GL_FLOAT, 0, None)
         GL.glBindBuffer(GL.GL_ARRAY_BUFFER, self.texture_vbo)
         GL.glTexCoordPointer(2, GL.GL_FLOAT, 0, None)
         # self.program contains a pixel shader that draws the texture
         # with a small offset on each frame
         GL.glUseProgram(self.program)
         GL.glUniform1f(self.loc, xoff)
         GL.glDrawArrays(GL.GL_QUADS, 0, 4)
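
The two glTexSubImage2D calls and the per-frame offset suggest a ring buffer stored twice side by side, so that a contiguous window of canvas_width columns is always available. The sketch below models only that index arithmetic in plain Python, with no OpenGL involved. It assumes the texture is 2 * canvas_width columns wide and that the offset wraps at canvas_width. The class and method names are hypothetical.

```python
class RollingColumns:
    """CPU-side model of a double-width rolling column buffer (assumed layout)."""

    def __init__(self, canvas_width):
        self.canvas_width = canvas_width
        # Stand-in for a texture that is twice as wide as the canvas.
        self.buffer = [None] * (2 * canvas_width)
        self.offset = 0

    def push(self, column):
        # Write the new column twice, canvas_width apart, mirroring the
        # two glTexSubImage2D calls.
        self.buffer[self.offset] = column
        self.buffer[self.offset + self.canvas_width] = column
        self.offset = (self.offset + 1) % self.canvas_width

    def visible(self):
        # After a push, offset points at the oldest column, so this slice
        # runs from oldest to newest without ever wrapping around.
        return self.buffer[self.offset:self.offset + self.canvas_width]


roll = RollingColumns(canvas_width=4)
for value in range(10):
    roll.push(value)

print(roll.visible())
```

With this layout only one column is written per new spectrum, and scrolling reduces to shifting the window start, which is the role the xoff uniform would play in the shader under the same assumption.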

All of the OpenGL commands boil down to a single:
wrapper:__call__ (10.45%)
In conclusion, the FFT is only ~1% while OpenGL drawing is ~11%! That's
why I want to improve the drawing and not the processing.
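
As long as the pixel colors are computed on the CPU, the byteString handed to glBufferData has to match the GL_BGRA / GL_UNSIGNED_INT_8_8_8_8_REV format used in the texture update, which reads each pixel as one native-endian 32-bit value laid out as 0xAARRGGBB. A minimal sketch of building such a string with the standard struct module follows. The grey color ramp and both helper names are hypothetical, not the application's actual color map.

```python
import struct


def level_to_argb(level):
    """Map a level in [0, 1] to a packed 0xAARRGGBB grey (hypothetical ramp)."""
    clamped = min(max(level, 0.0), 1.0)
    grey = int(round(clamped * 255))
    # Opaque alpha in the top byte, then red, green, blue.
    return (0xFF << 24) | (grey << 16) | (grey << 8) | grey


def pack_column(levels):
    """Pack one column of levels into native-endian 32-bit pixels."""
    values = [level_to_argb(level) for level in levels]
    return struct.pack("=%dI" % len(values), *values)


column = pack_column([0.0, 1.0])
print(len(column))
```

Each pixel takes 4 bytes, so a column of about 1000 pixels is about 4000 bytes per upload.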

Now this profile raises another question:
wrapper:__call__ seems to spend a lot of time in slow Python calls
(calculate_pyArgs, calculate_cArguments...), whereas it does not seem to
spend much time in the actual OpenGL calls (3% if I count correctly). Is
there a way to improve this?

I hope this was not too complicated! Thanks for your help.

Timothée