2009/12/6 Timothée Lecomte <timothee.lecomte@lpa.ens.fr>
Hi Ian,

I need it to run exactly as fast as the refresh rate, and it already runs fast enough, with either OpenGL or pure 2D. However, it burns too much CPU for my taste: the drawing part takes more time than the processing part, even though the latter is quite heavy (FFT...), so I think there is room for large improvements.
I'm confused. The CPU doesn't have to work hard to coordinate the actions of the GPU; it just takes time, because it must wait, as Gijs explained. If the CPU is working hard, that means something else is going on. You can reduce processing time by optimizing your code, where applicable, and/or by using a JIT compiler (e.g. psyco).
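
One way to check whether the drawing code is really burning CPU or merely waiting is to compare wall-clock time with CPU time around the call. The following is only a minimal sketch, assuming Python 3; `measure` and `busy_fraction` are hypothetical helper names, not part of any library.

```python
import time


def measure(fn, *args, **kwargs):
    """Call fn once and return (wall_seconds, cpu_seconds) spent in it."""
    wall_start = time.perf_counter()   # wall-clock time, includes waiting
    cpu_start = time.process_time()    # CPU time used by this process only
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def busy_fraction(wall, cpu):
    """Fraction of the wall time during which the process was computing."""
    return cpu / wall if wall > 0 else 0.0
```

For example, `wall, cpu = measure(draw_frame)` (with `draw_frame` standing in for your own drawing routine) followed by `busy_fraction(wall, cpu)` gives a value near 0 if the call mostly waits and near 1 if it mostly computes. Note that `time.process_time()` counts every thread in the process, so a driver that spins while waiting would also show up as busy.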

It sounds like you're computing the FFT (which I assume is for signal processing) on the CPU. If you're doing that a thousand or so times every frame, that is likely to be your speed problem. Using a shader would take all that load off the CPU.
As you rightly say, I am drawing the texture off by one. As for alternatives to plain glTexSubImage2D, I have considered and tried PBOs, but they do not decrease the time needed by the two glTexSubImage2D calls. I don't really see how I could use glCopyTexSubImage2D instead of glTexSubImage2D.
My mistake: try glCopyTexImage2D:
#Draw the scene, then . . .
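
For illustration only, here is a rough sketch of what such a copy could look like with PyOpenGL. The function name `copy_framebuffer_to_texture` and the choice of `GL_RGB` are assumptions, and the `gl` argument is meant to be the `OpenGL.GL` module, passed in so the function can be exercised without a live context.

```python
def copy_framebuffer_to_texture(gl, texture_id, width, height, x=0, y=0):
    """Copy a width x height region of the current read buffer into a texture.

    gl is assumed to be the OpenGL.GL module from PyOpenGL; texture_id is an
    existing texture name (e.g. from glGenTextures).
    """
    gl.glBindTexture(gl.GL_TEXTURE_2D, texture_id)
    # Arguments: target, mip level, internal format, x, y, width, height, border.
    # (x, y) is the lower-left corner of the region in window coordinates.
    gl.glCopyTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGB, x, y, width, height, 0)
```

It would be called after the scene is drawn, e.g. `from OpenGL import GL` and then `copy_framebuffer_to_texture(GL, tex, 512, 512)`, where `tex` is your own texture. Keep in mind that glCopyTexImage2D redefines the whole texture image on every call, whereas glCopyTexSubImage2D copies into an already allocated one.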

I can't help you with PBOs, as I don't have a working implementation myself.  FBOs should work comparably, though.  There's a Python FBO class and tutorial in my latest OpenGL Library, on pygame.org, which may be of use to you.