Thread: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

[PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

From: Aleksandar B. S. <asa...@gm...> - 2007-06-03 08:20:54

A set of benchmark results was posted recently, comparing the
performance of the Perl OpenGL (POGL) bindings with that of some other
OpenGL bindings; the comparison between the Perl and Python OpenGL
bindings can be found here:
  http://graphcomp.com/pogl.cgi?v=0111s3B2&r=s3m3
PyOpenGL is claimed to be much slower, especially when using vertex
arrays.  The Python benchmark code has been somewhat improved in the
meantime, and PyOpenGL version 3, instead of version 2, was used for
the retest, but the claimed low PyOpenGL performance is still there.  I
was able to compare the PyOpenGL based benchmark results with those of
the Perl SDL based OpenGL binding, and PyOpenGL performance is superior
on my machine (as expected, because PyOpenGL version 3 is, after all, a
really thin layer above the OpenGL C code); however, I'm unable to
compile POGL properly and run the POGL based benchmarks on my machine.
For the reason mentioned above, I do expect PyOpenGL performance to be
at least in the same range as POGL performance, so I'd suggest that
anyone interested try running both benchmarks on their own machine and
send the results to the POGL team; of course, further improvements to
the Python benchmark code would be welcome too.

Regards,
Alex

Re: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

From: Mike C. F. <mcf...@vr...> - 2007-06-04 03:21:25

Aleksandar B. Samardzic wrote:
>    [ snip ]
Actually, I would be *very* surprised if PyOpenGL is anywhere near the 
same speed.

PyOpenGL does a lot more than just wrap the C call.  In particular, it 
adds a glGetError call after *every* call to provide Python-like 
exception-on-error semantics instead of requiring explicit error checks 
in the code.  The current 3.0.0a6 version also does a logging-enabling 
call for every function (I'm thinking of making that feature default to 
off, though it would mean less informative error messages).

That said, I couldn't get their benchmark code to run in PyOpenGL 
("PyBench" doesn't seem to be the pybench module I'm familiar with), and 
didn't have enough time to try fixing it before checking to see if there 
are any gross issues with it.

As for the vertex array functionality: we run a *lot* of Python-side code 
to do the vertex translation.  The PyOpenGL 3.x line has a far more 
flexible engine for the array sources than 2.x did, allowing run-time 
plugging of new array data-types.  The cost of that is some performance 
penalty on passing each array.  Larger arrays average that expense 
out over the whole array, so a 10,000 point array would have almost no 
appreciable overhead (and would likely be faster than PyOpenGL 2.x), 
while passing in hundreds of 3 or 4 element arrays would show a huge 
overhead compared to PyOpenGL 2.x (which did all the array conversions 
in C, but often wound up silently copying the arrays).
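
The amortization argument above can be made concrete with a little arithmetic.  In this Python 3 sketch the 50 microsecond fixed per-call cost is an assumed, purely illustrative figure (not a measurement), and overhead_per_vertex is a hypothetical helper name.

```python
def overhead_per_vertex(fixed_overhead_us, vertices_per_call):
    """Spread a fixed per-call conversion cost over the vertices in one array."""
    return fixed_overhead_us / vertices_per_call


FIXED_OVERHEAD_US = 50.0   # assumed, illustrative figure; not a measurement

# The same fixed cost, paid once per array, for small and large arrays.
for n in (4, 100, 10000):
    print(n, overhead_per_vertex(FIXED_OVERHEAD_US, n))
```

With these assumed numbers a 4-element array pays 12.5 microseconds per vertex, while a 10,000-element array pays 0.005, which is the "almost no appreciable overhead" case.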

Have fun,
Mike

-- 
________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com

Re: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

From: Aleksandar B. S. <asa...@gm...> - 2007-06-04 10:26:57

On 6/4/07, Mike C. Fletcher <mcf...@vr...> wrote:
>    [ snip ]
>
> As for the vertex array functionality: we run a *lot* of Python-side code
> to do the vertex translation.  The PyOpenGL 3.x line has a far more
> flexible engine for the array sources than 2.x did, allowing run-time
> plugging of new array data-types.  The cost of that is some performance
> penalty on passing each array.  Larger arrays average that expense
> out over the whole array, so a 10,000 point array would have almost no
> appreciable overhead (and would likely be faster than PyOpenGL 2.x),
> while passing in hundreds of 3 or 4 element arrays would show a huge
> overhead compared to PyOpenGL 2.x (which did all the array conversions
> in C, but often wound up silently copying the arrays).

Take a look at, say, the benchmark() function in
trislam_pyopengl_ctp.py: the benchmark consists of the same drawing
sequence rendered either 100 times or for 10 seconds, whichever takes
less.  For vertex arrays (for example, the draw_quads_va() function),
the drawing sequence consists practically of just the glVertexPointer()
and glDrawArrays() calls; further, the array of vertex coordinates that
is passed to glVertexPointer() is prepared beforehand (before the
benchmark() function loop is entered).  Now, I guess you, as the primary
PyOpenGL developer, may know better, but if the array of vertex
coordinates is already a ctypes based array (and I sent some tiny
patches, so trislam_pyopengl_ctp.py is now a ctypes based version of the
benchmark), then from what I can see in the PyOpenGL code, the overhead
has to be negligible...
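
The loop structure described above (a fixed number of frames or a time limit, whichever comes first) could look roughly like the following Python 3 sketch.  This is not the trislam code: run_benchmark, draw_nothing and the parameter names are hypothetical, and a real run would pass in a function that issues the OpenGL calls.

```python
import time


def run_benchmark(draw, max_frames=100, max_seconds=10.0):
    """Call draw() until max_frames or max_seconds is reached, whichever is first.

    Returns a tuple (frames_drawn, elapsed_seconds).
    """
    start = time.perf_counter()
    frames = 0
    elapsed = 0.0
    while frames < max_frames and elapsed < max_seconds:
        draw()
        frames += 1
        elapsed = time.perf_counter() - start
    return frames, elapsed


def draw_nothing():
    """Hypothetical placeholder for a drawing sequence such as draw_quads_va()."""
    pass


frames, elapsed = run_benchmark(draw_nothing)
print("%d frames in %.6f s" % (frames, elapsed))
```

Because the input array is built outside the loop, only the cost of the calls made inside draw() is measured.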

On the other hand, I do think that a benchmark like this could be very
useful in pointing out the performance bottlenecks of a wrapper; thus,
while it is indeed a very simple benchmark, I guess PyOpenGL could only
benefit if you find some time, at some point, to play with trislam.

Regards,
Alex

Re: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

From: Mike C. F. <mcf...@vr...> - 2007-06-05 03:02:57

Aleksandar B. Samardzic wrote:
> On 6/4/07, Mike C. Fletcher <mcf...@vr...> wrote:
>   
...
>> PyOpenGL does a lot more than just wrap the C call.  In particular, it
>> adds a glGetError call after *every* call to provide Python-like
>> exception-on-error semantics instead of requiring explicit error checks
>> in the code.  The current 3.0.0a6 version also does a logging-enabling
>> call for every function (I'm thinking of making that feature default to
>> off, though it would mean less informative error messages).
>>     
...
> Take a look at, say, the benchmark() function in
> trislam_pyopengl_ctp.py: the benchmark consists of the same drawing
> sequence rendered either 100 times or for 10 seconds, whichever takes
> less.  For vertex arrays (for example, the draw_quads_va() function),
> the drawing sequence consists practically of just the glVertexPointer()
> and glDrawArrays() calls; further, the array of vertex coordinates that
> is passed to glVertexPointer() is prepared beforehand (before the
> benchmark() function loop is entered).  Now, I guess you, as the primary
> PyOpenGL developer, may know better, but if the array of vertex
> coordinates is already a ctypes based array (and I sent some tiny
> patches, so trislam_pyopengl_ctp.py is now a ctypes based version of the
> benchmark), then from what I can see in the PyOpenGL code, the overhead
> has to be negligible...
>
> On the other hand, I do think that a benchmark like this could be very
> useful in pointing out the performance bottlenecks of a wrapper; thus,
> while it is indeed a very simple benchmark, I guess PyOpenGL could only
> benefit if you find some time, at some point, to play with trislam.
>   
Here's what a cProfile run of their "benchmark" method shows (they had a 
.tar.gz on the site, I'd thought it was just the one file, it includes 
the whole suite):

    * In ~400s, 66s are taken up with the error-checking functionality,
      most of that time is in the individual-vertex versions (array
      versions amortize the cost of the operation over the size of the
      array)
    * 24.1s is due to overhead from the wrapper objects
          o 6.5s is due to direct overhead from the "wrapper" objects
            (mostly bookkeeping (e.g. temporary list creation) and
            function-call overhead)
          o Array conversion overhead
                + 8.4s is due to overhead in determining size of ctypes
                  arrays automatically
                + 7.9s is due to overhead in determining the "stride
                  size" of ctypes arrays automatically

The ctypes arrays are spending so much time in those two functions 
because they don't have a "dims" member, as seen in numpy arrays, so 
they have a little while loop that adds up the length of the 
sub-component arrays, producing lots of intermediate objects as a result.
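
To illustrate why the shape of a ctypes array has to be discovered by walking its type, here is a minimal Python 3 sketch.  It is not PyOpenGL's ctypesarrays code; ctypes_dims and ctypes_unit_size are hypothetical helpers, and the mapping of "unit size" to the number of components per vertex is an assumption.  The sketch relies only on the _length_ and _type_ attributes that ctypes array types provide.

```python
import ctypes


def ctypes_dims(array):
    """Return the dimensions of a (possibly nested) ctypes array as a tuple."""
    dims = []
    array_type = type(array)
    # ctypes array types expose _length_ (element count) and _type_ (element
    # type); scalar types such as c_float have no _length_, which ends the walk.
    while hasattr(array_type, "_length_"):
        dims.append(array_type._length_)
        array_type = array_type._type_
    return tuple(dims)


def ctypes_unit_size(array):
    """Number of scalar components per outermost element (2 for x, y pairs)."""
    size = 1
    for length in ctypes_dims(array)[1:]:
        size *= length
    return size


count = 3
# Same construction as in the benchmark: count * count * 4 vertices of 2 floats.
data = (ctypes.c_float * 2 * (count * count * 4))()
print(ctypes_dims(data))
print(ctypes_unit_size(data))
```

Every call repeats the walk and builds a fresh list and tuple, which is the kind of per-call bookkeeping that a precomputed shape attribute avoids.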

Making the error-checking an optional C module would be doable, but it's 
still going to be a fairly large part of total run-time (we would see a 
maximum of 16% speedup).  Similarly, we might get a 1.5% speedup by 
writing the wrapper function in C.  We might get that to 6% if we were 
to write the whole of the wrapping system in C (which I really do *not* 
want to do).  Altogether we could get maybe 25% from the lower-hanging 
fruit of optimisation.

We also seem to spend 7.9s (2%) in the time.time function and almost 
207s (52% of time) waiting for glFlush calls to complete.  That would 
suggest to me that we're seeing synchronization issues skewing results 
at least part of the time.  I'm running all of this on Python 2.5 on a 
Gentoo Linux box, by the way.
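
As a cross-check, the percentages quoted above can be recomputed from the profile listing in this message.  A small Python 3 sketch follows; the dictionary labels are descriptive only, and treating end_frame as the place where the glFlush wait happens is an assumption based on its cumulative time matching the 207s figure.

```python
# Times in seconds, copied from the cProfile listing in this message.
TOTAL = 398.633   # cumtime of benchmark()

profile = {
    "glCheckError (cumtime)": 66.352,
    "wrapperCall (tottime)": 6.467,
    "wrapperCall (cumtime)": 24.188,
    "time.time": 7.971,
    "end_frame (cumtime, assumed glFlush wait)": 207.370,
}

# Share of the total benchmark time, in percent.
shares = {name: 100.0 * seconds / TOTAL for name, seconds in profile.items()}

for name, share in shares.items():
    print("%-45s %5.1f%%" % (name, share))
```

Each share is an upper bound on the speedup from eliminating that cost entirely, which is how the "maximum of 16%" style estimates should be read.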

Anyway, hope that helps somewhat,
Mike

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      338    0.237    0.001    0.242    0.001 PyBench.py:311(fade_to_white)
   369465    1.329    0.000    1.826    0.000 arraydatatype.py:17(getHandler)
   123155    0.557    0.000    1.278    0.000 arraydatatype.py:45(dataPointer)
   123155    0.542    0.000    1.820    0.000 arraydatatype.py:55(voidDataPointer)
   123155    0.797    0.000    1.649    0.000 arraydatatype.py:59(asArray)
   123155    0.636    0.000    7.987    0.000 arraydatatype.py:71(unitSize)
   123155    0.452    0.000    1.941    0.000 arrayhelpers.py:52(__call__)
   123155    0.449    0.000    8.436    0.000 arrayhelpers.py:78(arraySizeOfFirst)
   123155    0.348    0.000    0.348    0.000 contextdata.py:22(getContext)
   123155    0.815    0.000    1.489    0.000 contextdata.py:35(setValue)
   123155    0.449    0.000    2.097    0.000 converters.py:111(__call__)
   123155    0.185    0.000    0.185    0.000 converters.py:155(__call__)
   123155    0.192    0.000    0.192    0.000 converters.py:169(__call__)
   492620    1.779    0.000    2.866    0.000 ctypesarrays.py:55(types)
   369465    1.727    0.000    5.339    0.000 ctypesarrays.py:63(dims)
   123155    0.169    0.000    0.169    0.000 ctypesarrays.py:69(asArray)
   123155    1.441    0.000    6.781    0.000 ctypesarrays.py:72(unitSize)
 17225384   48.363    0.000   66.352    0.000 error.py:162(glCheckError)
 14887425   17.989    0.000   17.989    0.000 error.py:191(nullGetError)
   125123    0.217    0.000    0.217    0.000 error.py:209(onBegin)
   125123    0.208    0.000    0.208    0.000 error.py:212(onEnd)
   125123    0.875    0.000    1.632    0.000 exceptional.py:44(glBegin)
   125123    1.020    0.000    1.552    0.000 exceptional.py:48(glEnd)
       49    0.000    0.000    0.000    0.000 formathandler.py:90(registerEquivalent)
    49176    0.058    0.000    0.058    0.000 trislam_pyopengl_ctp.py:156(draw_empty)
       26    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:159(stats_empty)
     5583   17.723    0.003   34.072    0.006 trislam_pyopengl_ctp.py:162(draw_quads)
    20661    2.306    0.000    6.676    0.000 trislam_pyopengl_ctp.py:173(draw_quads_va)
       52    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:181(stats_quads)
     6829   12.125    0.002   24.593    0.004 trislam_pyopengl_ctp.py:189(draw_qs)
    18933    4.886    0.000   10.204    0.001 trislam_pyopengl_ctp.py:198(draw_qs_va)
    24497    0.351    0.000    0.409    0.000 trislam_pyopengl_ctp.py:210(draw_qs_dl)
    22370    0.540    0.000    4.787    0.000 trislam_pyopengl_ctp.py:214(draw_qs_va_dl)
      104    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:224(stats_qs)
     4010   22.822    0.006   44.041    0.011 trislam_pyopengl_ctp.py:232(draw_tris)
    20307    2.815    0.000    7.291    0.000 trislam_pyopengl_ctp.py:246(draw_tris_va)
       52    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:256(stats_tris)
     9041   12.119    0.001   24.633    0.003 trislam_pyopengl_ctp.py:264(draw_ts)
    20087    4.529    0.000    9.986    0.000 trislam_pyopengl_ctp.py:273(draw_ts_va)
    29773    0.335    0.000    0.405    0.000 trislam_pyopengl_ctp.py:285(draw_ts_dl)
    20797    0.501    0.000    4.496    0.000 trislam_pyopengl_ctp.py:289(draw_ts_va_dl)
      104    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:299(stats_ts)
   252064    2.706    0.000    3.402    0.000 trislam_pyopengl_ctp.py:699(start_frame)
   252064  205.511    0.001  207.370    0.001 trislam_pyopengl_ctp.py:703(end_frame)
      339    7.990    0.024  398.633    1.176 trislam_pyopengl_ctp.py:707(benchmark)
   123155    6.467    0.000   24.188    0.000 wrapper.py:294(wrapperCall)
   123155    0.148    0.000    0.148    0.000 {_ctypes.addressof}
   492620    0.552    0.000    0.552    0.000 {callable}
   738930    1.246    0.000    1.246    0.000 {getattr}
       49    0.000    0.000    0.000    0.000 {hasattr}
   369465    0.587    0.000    0.587    0.000 {isinstance}
   247663    0.325    0.000    0.325    0.000 {len}
  1108733    1.279    0.000    1.279    0.000 {method 'append' of 'list' objects}
      339    0.001    0.000    0.001    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   615775    0.823    0.000    0.823    0.000 {method 'get' of 'dict' objects}
       98    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}
      338    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
       13    0.000    0.000    0.000    0.000 {method 'write' of 'file' objects}
   246655    0.555    0.000    0.555    0.000 {range}
  3848500    7.971    0.000    7.971    0.000 {time.time}
   246310    0.584    0.000    0.584    0.000 {zip}

For completeness, the diff against the trislam version in their archive, 
this version will run under Python 2.5 (which provides the cProfile module):

mcfletch@raistlin:~/tmp/trislam$ diff trislam_pyopengl_ctp.py trislam_pyopengl_ctp_profile.py
2a3
> from __future__ import division
15a17,18
> ##from OpenGL.error import ErrorChecker
> ##ErrorChecker.registerChecker( ErrorChecker.nullGetError )
20d22
< from __future__ import division
25a28,29
> import cProfile
> PROFILER = cProfile.Profile()
662a667,668
>     PROFILER.print_stats(  )
>     PROFILER.dump_stats( 'test.profile' )
670a677
>
671a679
>     import cProfile
676c684
<         benchmark()
---
>         PROFILER.runcall( benchmark )
914,919c922,928
< init()
< print "Benchmarks:",
< glutDisplayFunc(display)
< glutIdleFunc(display)
< glutKeyboardFunc(keyboard)
< glutMainLoop()
---
> if __name__ == "__main__":
>       init()
>       print "Benchmarks:",
>       glutDisplayFunc(display)
>       glutIdleFunc(display)
>       glutKeyboardFunc(keyboard)
>       glutMainLoop()
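
The profiling pattern used in the diff above (a cProfile.Profile object, runcall on the function of interest, then printing the statistics) can be tried in isolation.  This is a minimal Python 3 sketch; workload is a hypothetical stand-in for benchmark(), and sorting by cumulative time is a choice made here, not something taken from the diff.

```python
import cProfile
import io
import pstats


def workload():
    """Hypothetical stand-in for the benchmark() function being profiled."""
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
result = profiler.runcall(workload)   # profile one call and keep its return value

# Format the collected statistics, sorted by cumulative time, into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

To keep the raw data for later inspection, profiler.dump_stats() with a file name can be called as in the diff, and the resulting file loaded again with pstats.Stats.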

-- 
________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com

Re: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks

From: Aleksandar B. S. <asa...@gm...> - 2007-06-05 10:09:52

On 6/5/07, Mike C. Fletcher <mcf...@vr...> wrote:
> Here's what a cProfile run of their "benchmark" method shows (they had a
> .tar.gz on the site, I'd thought it was just the one file, it includes
> the whole suite):
>
>    [ snip ]
>

Mike,

Thanks for running cProfile on this benchmark, and for providing the
results.  Everything is as expected, I'd say: the overhead of using
Python arrays is not that big after all, error checking does indeed take
a lot of the running time, and most of the running time is spent
actually waiting for OpenGL to complete drawing (which is very
interesting in itself, because it suggests that the wrappers are
probably not the bottleneck at all).  So overall, I think we're good.
I'll try to work with the POGL guys to get the PyOpenGL based benchmark
running properly on their machines, so that they can eventually confirm
what I'm seeing on my machine (that both wrappers have approximately
the same performance), and update the results on their site.

As for passing arrays to glVertexPointer: what would you suggest as
the best approach (if we're talking about large arrays, and if we put
aside that we should be doing all of this in C instead of Python)?  I
tried replacing the ctypes arrays with numpy arrays in
trislam_pyopengl_ctp.py, and using glVertexPointer() (with 4
arguments) instead of glVertexPointerf() (with 1 argument); the diff is
attached (against trislam_pyopengl_ctp.py from the POGL site), and the
benchmark results are practically unchanged...

Regards,
Alex

------------
[alex@r51 trislam]$ diff trislam_pyopengl_ctp.py trislam_pyopengl_num.py
24c24
< import ctypes
---
> import numpy
88c88
<     data = (ctypes.c_float * 2 * (count * count * 4))()
---
>     data = numpy.empty((count * count * 4, 2), numpy.float32)
104c104
<     data = (ctypes.c_float * 2 * (count * (count + 1) * 2))()
---
>     data = numpy.empty((count * (count + 1) * 2, 2), numpy.float32)
116c116
<     data = (ctypes.c_float * 2 * (count * count * 6))()
---
>     data = numpy.empty((count * count * 6, 2), numpy.float32)
137c137
<     data = (ctypes.c_float * 2 * (count * (count + 1) * 2))()
---
>     data = numpy.empty((count * (count + 1) * 2, 2), numpy.float32)
171c171
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)
198c198
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)
213c213
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)
245c245
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)
273c273
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)
288c288
<     glVertexPointerf(va)
---
>     glVertexPointer(2, GL_FLOAT, 0, va)