Thread: [PyOpenGL-Users] Perl vs. Python OpenGL bindings benchmarks
From: Aleksandar B. S. <asa...@gm...> - 2007-06-03 08:20:54
There was a set of benchmark results posted recently comparing the performance of the Perl OpenGL (POGL) bindings with that of some other OpenGL bindings; the comparison between the Perl and Python OpenGL bindings can be found here: http://graphcomp.com/pogl.cgi?v=0111s3B2&r=s3m3

PyOpenGL is claimed to be largely inferior in performance, especially when using vertex array functionality. The Python benchmark code has been somewhat improved in the meantime, and PyOpenGL version 3 instead of version 2 was used for the retest, but the claimed low PyOpenGL performance is still there.

I was able to compare PyOpenGL-based benchmark results with results for the Perl SDL-based OpenGL binding, and PyOpenGL performance is superior on my machine (as expected, because PyOpenGL version 3 is, after all, a really thin layer above the OpenGL C code); however, I'm unable to compile POGL properly and run the POGL-based benchmarks on my machine. For the reason mentioned above, I do expect PyOpenGL performance to be at least in the same range as POGL performance, so I'd suggest that anyone interested try running both benchmarks on their machine and send the results to the POGL team; of course, further improvements to the Python benchmark code would be welcome too.

Regards,
Alex
From: Mike C. F. <mcf...@vr...> - 2007-06-04 03:21:25
Aleksandar B. Samardzic wrote:
> ... I do expect however, because of above mentioned reason, that
> PyOpenGL performance must be at least in range with POGL performance,
> ...
>
Actually, I would be *very* surprised if PyOpenGL is anywhere near the same speed.

PyOpenGL does a lot more than just wrap the C call. In particular, it adds a glGetError call after *every* call to provide Python-like exception-on-error semantics instead of requiring explicit error checks in the code. The current 3.0.0a6 version also makes a logging-enabling call for every function (I'm thinking of making that feature default to off, though that would mean less informative error messages).
That said, I couldn't get their benchmark code to run in PyOpenGL ("PyBench" doesn't seem to be the pybench module I'm familiar with), and didn't have enough time to fix it and check whether there are any gross issues with it.

As for the vertex array functionality: we run a *lot* of Python-side code to do the vertex translation. The PyOpenGL 3.x line has a far more flexible engine for array sources than 2.x did, allowing run-time plugging of new array data types. The cost of that is a performance penalty on passing each array. Larger arrays average that expense out over the whole array, so a 10,000-point array would have almost no appreciable overhead (and would likely be faster than PyOpenGL 2.x), while passing in hundreds of 3- or 4-element arrays would show a huge overhead compared to PyOpenGL 2.x (which did all the array conversions in C, but often wound up silently copying the arrays).

Have fun,
Mike

--
________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com
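Mike's amortization argument above can be made concrete with a back-of-the-envelope cost model. The per-array and per-vertex costs below are invented placeholders (integers in nanoseconds, to keep the arithmetic exact), not measurements of PyOpenGL 2.x or 3.x; only the shape of the result matters.

```python
# HYPOTHETICAL costs in nanoseconds, chosen only to illustrate the trade-off;
# they are not measurements of PyOpenGL 2.x or 3.x.
PER_ARRAY_NS = 20_000  # fixed Python-side setup paid once per array passed
PER_VERTEX_NS = 100    # work that scales with the number of vertices


def submission_cost_ns(num_arrays, vertices_per_array,
                       per_array=PER_ARRAY_NS, per_vertex=PER_VERTEX_NS):
    """Total cost of passing num_arrays arrays of vertices_per_array vertices each."""
    return num_arrays * (per_array + vertices_per_array * per_vertex)


def overhead_fraction(vertices_per_array,
                      per_array=PER_ARRAY_NS, per_vertex=PER_VERTEX_NS):
    """Share of one array submission spent on the fixed per-array overhead."""
    return per_array / (per_array + vertices_per_array * per_vertex)


# The same 10,000 vertices, sent as one array or as 2,500 four-vertex arrays.
one_big = submission_cost_ns(1, 10_000)     # 20,000 + 1,000,000 = 1,020,000 ns
many_small = submission_cost_ns(2_500, 4)   # 2,500 * (20,000 + 400) = 51,000,000 ns
print(many_small // one_big)                # 50
```

Under these made-up numbers the fixed cost is about 98% of each four-vertex submission but only about 2% of a single 10,000-vertex one, which is the pattern Mike describes.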
From: Aleksandar B. S. <asa...@gm...> - 2007-06-04 10:26:57
On 6/4/07, Mike C. Fletcher <mcf...@vr...> wrote:
> ...
> As for the vertex array functionality; we do a *lot* of Python-side code
> to do a vertex translation. ... The cost of that is some performance
> penalty on passing each array. ...
>
Take a look, for example, at the benchmark() function in trislam_pyopengl_ctp.py: the benchmark consists of the same drawing sequence rendered either 100 times or for 10 seconds, whichever comes first. For vertex arrays (for example, the draw_quads_va() function), the drawing sequence consists essentially of glVertexPointer() and glDrawArrays() calls; furthermore, the array of vertex coordinates passed to glVertexPointer() is prepared beforehand (before the benchmark() function loop is entered). Now, you as the primary PyOpenGL developer may know better, but if the array of vertex coordinates is already a ctypes-based array (I sent some tiny patches, so trislam_pyopengl_ctp.py is now the ctypes-based version of the benchmark), then from what I can see in the PyOpenGL code, the overhead has to be negligible...

On the other hand, I do think that such a benchmark could be very useful in pointing out performance bottlenecks in a wrapper; so, while it is indeed a very simple benchmark, I guess PyOpenGL could only benefit if you find some time, at some point, to play with trislam.
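The benchmark structure Alex describes (vertex data prepared once, then the same drawing sequence repeated for 100 frames or 10 seconds, whichever comes first) can be sketched as below. `run_benchmark` and its `clock` parameter are hypothetical; this is not trislam's actual benchmark() function, only the loop shape as described, with a plain Python callable standing in for the glVertexPointer/glDrawArrays pair.

```python
import time


def run_benchmark(draw, max_frames=100, max_seconds=10.0, clock=time.perf_counter):
    """Call draw() until max_frames frames are drawn or max_seconds have
    elapsed, whichever comes first. Returns (frames_drawn, elapsed)."""
    start = clock()
    frames = 0
    elapsed = 0.0
    while frames < max_frames:
        draw()
        frames += 1
        elapsed = clock() - start
        if elapsed >= max_seconds:
            break
    return frames, elapsed


# The data is built once, outside the timed loop; the draw callable only
# reads it (a stand-in for glVertexPointer + glDrawArrays).
vertices = [(float(x), float(y)) for x in range(10) for y in range(10)]
frames, elapsed = run_benchmark(lambda: sum(x for x, _ in vertices))
print(frames, f"{elapsed:.4f}s")
```

Passing the clock in as a parameter is only a convenience for testing the stopping rule with a fake timer; the real benchmark reads the clock directly.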
Regards,
Alex
From: Mike C. F. <mcf...@vr...> - 2007-06-05 03:02:57
Aleksandar B. Samardzic wrote:
> On 6/4/07, Mike C. Fletcher <mcf...@vr...> wrote:
> ...
> Take a look say into benchmark() function in trislam_pyopengl_ctp.py - ...
> ... then from what I can see in PyOpenGL code,
> overhead has to be negligible...
>
Here's what a cProfile run of their "benchmark" method shows (they had a .tar.gz on the site; I'd thought it was just the one file, but it includes the whole suite):

* Of ~400s total, 66s are taken up by the error-checking functionality; most of that time is in the individual-vertex versions (the array versions amortize the cost of the check over the size of the array)
* 24.1s is due to overhead from the wrapper objects
    o 6.5s is due to direct overhead from the "wrapper" objects (mostly bookkeeping, e.g. temporary list creation, and function-call overhead)
    o Array conversion overhead
        + 8.4s is due to overhead in automatically determining the size of ctypes arrays
        + 7.9s is due to overhead in automatically determining the "stride size" of ctypes arrays

The ctypes arrays are spending so much time in those two functions because they don't have a "dims" member, as seen in numpy arrays, so they have a little while loop that adds up the lengths of the sub-component arrays, producing lots of intermediate objects as a result.

Making the error-checking an optional C module would be doable, but error checking is still going to be a fairly large part of total run-time (even eliminating it completely would give at most a 16% speedup). Similarly, we might get a 1.5% speedup by writing the wrapper function in C. We might get that to 6% if we were to write the whole of the wrapping system in C (which I really do *not* want to do). Altogether we could get maybe 25% from the lower-hanging fruit of optimisation.

We also seem to spend 7.9s (2%) in the time.time function and about 207s (52% of the time) waiting for glFlush calls to complete. That suggests to me that we're seeing synchronization issues skewing the results at least part of the time.

I'm running all of this on Python 2.5 on a Gentoo Linux box, by the way.
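The "little while loop" Mike mentions has to walk a nested ctypes array type to discover its shape, since (as he notes) ctypes arrays lack the ready-made dimensions that numpy arrays carry. The sketch below shows that walk using the `_length_` and `_type_` attributes every ctypes array type has, plus one possible mitigation: caching the result per array *type* with functools.lru_cache, so repeated calls with the same type skip the loop. The function names are hypothetical, this is not PyOpenGL's code, and `unit_and_stride` assumes tightly packed data whose last dimension is the per-vertex component count.

```python
import ctypes
import functools


def array_dims_uncached(array_type):
    """Walk a (possibly nested) ctypes array type and return its shape plus
    the scalar element type, e.g. c_float * 2 * 10 -> ((10, 2), c_float)."""
    dims = []
    t = array_type
    # Every ctypes array type has _length_ and _type_ class attributes; a
    # nested array is simply an array whose _type_ is itself an array type.
    while hasattr(t, "_length_"):
        dims.append(t._length_)
        t = t._type_
    return tuple(dims), t


@functools.lru_cache(maxsize=None)
def array_dims(array_type):
    """Same result, computed once per array *type* and then reused."""
    return array_dims_uncached(array_type)


def unit_and_stride(array_type):
    """Hypothetical helper: components per vertex and bytes per vertex,
    assuming the last dimension holds one vertex's components, tightly packed."""
    dims, scalar = array_dims(array_type)
    components = dims[-1]
    return components, components * ctypes.sizeof(scalar)


VertexArray = ctypes.c_float * 2 * 10  # 10 vertices of (x, y) floats
print(array_dims(VertexArray))         # ((10, 2), <class 'ctypes.c_float'>)
print(unit_and_stride(VertexArray))    # (2, 8)
```

Caching by type works because a benchmark like trislam passes the same array types over and over; only the first call per type pays for the walk.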
Anyway, hope that helps somewhat,
Mike

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      338    0.237    0.001    0.242    0.001 PyBench.py:311(fade_to_white)
   369465    1.329    0.000    1.826    0.000 arraydatatype.py:17(getHandler)
   123155    0.557    0.000    1.278    0.000 arraydatatype.py:45(dataPointer)
   123155    0.542    0.000    1.820    0.000 arraydatatype.py:55(voidDataPointer)
   123155    0.797    0.000    1.649    0.000 arraydatatype.py:59(asArray)
   123155    0.636    0.000    7.987    0.000 arraydatatype.py:71(unitSize)
   123155    0.452    0.000    1.941    0.000 arrayhelpers.py:52(__call__)
   123155    0.449    0.000    8.436    0.000 arrayhelpers.py:78(arraySizeOfFirst)
   123155    0.348    0.000    0.348    0.000 contextdata.py:22(getContext)
   123155    0.815    0.000    1.489    0.000 contextdata.py:35(setValue)
   123155    0.449    0.000    2.097    0.000 converters.py:111(__call__)
   123155    0.185    0.000    0.185    0.000 converters.py:155(__call__)
   123155    0.192    0.000    0.192    0.000 converters.py:169(__call__)
   492620    1.779    0.000    2.866    0.000 ctypesarrays.py:55(types)
   369465    1.727    0.000    5.339    0.000 ctypesarrays.py:63(dims)
   123155    0.169    0.000    0.169    0.000 ctypesarrays.py:69(asArray)
   123155    1.441    0.000    6.781    0.000 ctypesarrays.py:72(unitSize)
 17225384   48.363    0.000   66.352    0.000 error.py:162(glCheckError)
 14887425   17.989    0.000   17.989    0.000 error.py:191(nullGetError)
   125123    0.217    0.000    0.217    0.000 error.py:209(onBegin)
   125123    0.208    0.000    0.208    0.000 error.py:212(onEnd)
   125123    0.875    0.000    1.632    0.000 exceptional.py:44(glBegin)
   125123    1.020    0.000    1.552    0.000 exceptional.py:48(glEnd)
       49    0.000    0.000    0.000    0.000 formathandler.py:90(registerEquivalent)
    49176    0.058    0.000    0.058    0.000 trislam_pyopengl_ctp.py:156(draw_empty)
       26    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:159(stats_empty)
     5583   17.723    0.003   34.072    0.006 trislam_pyopengl_ctp.py:162(draw_quads)
    20661    2.306    0.000    6.676    0.000 trislam_pyopengl_ctp.py:173(draw_quads_va)
       52    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:181(stats_quads)
     6829   12.125    0.002   24.593    0.004 trislam_pyopengl_ctp.py:189(draw_qs)
    18933    4.886    0.000   10.204    0.001 trislam_pyopengl_ctp.py:198(draw_qs_va)
    24497    0.351    0.000    0.409    0.000 trislam_pyopengl_ctp.py:210(draw_qs_dl)
    22370    0.540    0.000    4.787    0.000 trislam_pyopengl_ctp.py:214(draw_qs_va_dl)
      104    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:224(stats_qs)
     4010   22.822    0.006   44.041    0.011 trislam_pyopengl_ctp.py:232(draw_tris)
    20307    2.815    0.000    7.291    0.000 trislam_pyopengl_ctp.py:246(draw_tris_va)
       52    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:256(stats_tris)
     9041   12.119    0.001   24.633    0.003 trislam_pyopengl_ctp.py:264(draw_ts)
    20087    4.529    0.000    9.986    0.000 trislam_pyopengl_ctp.py:273(draw_ts_va)
    29773    0.335    0.000    0.405    0.000 trislam_pyopengl_ctp.py:285(draw_ts_dl)
    20797    0.501    0.000    4.496    0.000 trislam_pyopengl_ctp.py:289(draw_ts_va_dl)
      104    0.000    0.000    0.000    0.000 trislam_pyopengl_ctp.py:299(stats_ts)
   252064    2.706    0.000    3.402    0.000 trislam_pyopengl_ctp.py:699(start_frame)
   252064  205.511    0.001  207.370    0.001 trislam_pyopengl_ctp.py:703(end_frame)
      339    7.990    0.024  398.633    1.176 trislam_pyopengl_ctp.py:707(benchmark)
   123155    6.467    0.000   24.188    0.000 wrapper.py:294(wrapperCall)
   123155    0.148    0.000    0.148    0.000 {_ctypes.addressof}
   492620    0.552    0.000    0.552    0.000 {callable}
   738930    1.246    0.000    1.246    0.000 {getattr}
       49    0.000    0.000    0.000    0.000 {hasattr}
   369465    0.587    0.000    0.587    0.000 {isinstance}
   247663    0.325    0.000    0.325    0.000 {len}
  1108733    1.279    0.000    1.279    0.000 {method 'append' of 'list' objects}
      339    0.001    0.000    0.001    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   615775    0.823    0.000    0.823    0.000 {method 'get' of 'dict' objects}
       98    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}
      338    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
       13    0.000    0.000    0.000    0.000 {method 'write' of 'file' objects}
   246655    0.555    0.000    0.555    0.000 {range}
  3848500    7.971    0.000    7.971    0.000 {time.time}
   246310    0.584    0.000    0.584    0.000 {zip}

For completeness, here is the diff against the trislam version in their archive; this version will run under Python 2.5 (which provides the cProfile module):

mcfletch@raistlin:~/tmp/trislam$ diff trislam_pyopengl_ctp.py trislam_pyopengl_ctp_profile.py
2a3
> from __future__ import division
15a17,18
> ##from OpenGL.error import ErrorChecker
> ##ErrorChecker.registerChecker( ErrorChecker.nullGetError )
20d22
< from __future__ import division
25a28,29
> import cProfile
> PROFILER = cProfile.Profile()
662a667,668
> PROFILER.print_stats( )
> PROFILER.dump_stats( 'test.profile' )
670a677
>
671a679
> import cProfile
676c684
< benchmark()
---
> PROFILER.runcall( benchmark )
914,919c922,928
< init()
< print "Benchmarks:",
< glutDisplayFunc(display)
< glutIdleFunc(display)
< glutKeyboardFunc(keyboard)
< glutMainLoop()
---
> if __name__ == "__main__":
>     init()
>     print "Benchmarks:",
>     glutDisplayFunc(display)
>     glutIdleFunc(display)
>     glutKeyboardFunc(keyboard)
>     glutMainLoop()
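The percentages in Mike's analysis come from dividing a function's cumulative time by the roughly 400 s cumulative time of benchmark(): 66.352 / 398.633 ≈ 16.6% for glCheckError, 24.188 / 398.633 ≈ 6.1% for wrapperCall, 7.971 / 398.633 ≈ 2.0% for time.time and 207.370 / 398.633 ≈ 52.0% for end_frame. The sketch below does the same kind of computation programmatically with the standard-library cProfile and pstats modules on a toy workload. `busy_error_check`, `workload` and `profile_fraction` are hypothetical names, the denominator here is the total profiled time (`Stats.total_tt`), and reading `Stats.stats` directly relies on its layout, which maps (filename, line, name) to (primitive calls, calls, tottime, cumtime, callers).

```python
import cProfile
import pstats


def busy_error_check(n):
    """Hypothetical stand-in for a cheap check that is called very often."""
    total = 0
    for i in range(n):
        total += i & 1
    return total


def workload():
    """Hypothetical workload: many calls into the 'check' function."""
    s = 0
    for _ in range(200):
        s += busy_error_check(1_000)
    return s


def profile_fraction(func, target_name):
    """Profile func() and return (its result, share of total profiled time
    spent inside functions named target_name, using cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)  # same call Mike's diff uses on benchmark
    stats = pstats.Stats(profiler)
    target_cumtime = sum(
        cumtime
        for (_filename, _line, name), (_cc, _nc, _tt, cumtime, _callers)
        in stats.stats.items()
        if name == target_name
    )
    return result, target_cumtime / stats.total_tt


result, share = profile_fraction(workload, "busy_error_check")
print(f"busy_error_check: {share:.1%} of profiled time")
```

For a human-readable table like the one above, `pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)` prints the ten most expensive entries.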
From: Aleksandar B. S. <asa...@gm...> - 2007-06-05 10:09:52
On 6/5/07, Mike C. Fletcher <mcf...@vr...> wrote:
> Here's what a cProfile run of their "benchmark" method shows (they had a
> .tar.gz on the site, I'd thought it was just the one file, it includes
> the whole suite):
>
> [ snip ]
>
Mike,

Thanks for running cProfile on this benchmark and for providing the results. Everything is as expected, I'd say: the overhead of using Python arrays is not that big after all, error checking is indeed taking a lot of running time, and most of the running time is spent actually waiting for OpenGL to complete drawing (which is very interesting in itself, because it shows that the wrappers are probably not a bottleneck at all). So overall, I think we're good. I'll try to work with the POGL people to get the PyOpenGL-based benchmark running properly on their machines, so that they can eventually confirm what I'm seeing on my machine (that both wrappers have approximately the same performance) and update the results on their site.

As for passing arrays to glVertexPointer: what would you suggest as the best approach (if we're talking about large arrays, and leaving aside that we should be doing all of this in C instead of Python)? I tried replacing the ctypes arrays with numpy arrays in trislam_pyopengl_ctp.py, and then using glVertexPointer() (with 4 arguments) instead of glVertexPointerf() (with 1 argument); the diff is attached (against trislam_pyopengl_ctp.py from the POGL site), and the benchmark results are practically unchanged...
Regards,
Alex

------------
[alex@r51 trislam]$ diff trislam_pyopengl_ctp.py trislam_pyopengl_num.py
24c24
< import ctypes
---
> import numpy
88c88
< data = (ctypes.c_float * 2 * (count * count * 4))()
---
> data = numpy.empty((count * count * 4, 2), numpy.float32)
104c104
< data = (ctypes.c_float * 2 * (count * (count + 1) * 2))()
---
> data = numpy.empty((count * (count + 1) * 2, 2), numpy.float32)
116c116
< data = (ctypes.c_float * 2 * (count * count * 6))()
---
> data = numpy.empty((count * count * 6, 2), numpy.float32)
137c137
< data = (ctypes.c_float * 2 * (count * (count + 1) * 2))()
---
> data = numpy.empty((count * (count + 1) * 2, 2), numpy.float32)
171c171
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
198c198
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
213c213
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
245c245
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
273c273
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
288c288
< glVertexPointerf(va)
---
> glVertexPointer(2, GL_FLOAT, 0, va)
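One plausible reading of Alex's unchanged numbers is that both versions end up handing glVertexPointer the same thing: one contiguous block of float32 values, two per vertex, built before the timing loop. The stdlib-only sketch below uses array.array in place of numpy and a hypothetical one-quad data set to show that a ctypes array type with the same shape as trislam's `(ctypes.c_float * 2 * N)()` arrays can be laid over existing float storage with `from_buffer`, sharing memory instead of copying it. It illustrates the memory layout only; it does not call OpenGL.

```python
import array
import ctypes

VERTEX_COUNT = 4  # hypothetical data set: one quad, (x, y) per vertex

# Flat float32 storage, filled once before any drawing loop runs.
flat = array.array("f", [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])

# Same type shape as trislam's (ctypes.c_float * 2 * N)() arrays, but laid
# over the existing storage: from_buffer shares memory instead of copying.
VertexArray = ctypes.c_float * 2 * VERTEX_COUNT
view = VertexArray.from_buffer(flat)

# Both objects describe the same 32 bytes: 4 vertices * 2 floats * 4 bytes.
assert ctypes.sizeof(view) == len(flat) * flat.itemsize == 32

# A write through the ctypes view shows up in the flat storage (no copy).
view[2][0] = 5.0  # x coordinate of the third vertex
print(flat[4])    # 5.0
```

While the ctypes view exists, the underlying array.array cannot be resized (an append raises BufferError), which is the usual price of sharing a buffer rather than copying it.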