From: Bruce Hahne <hahne@io...> - 2008-08-31 23:17:22

I'm working on a simple BVH player that presently uses Tk for display and cgkit for the matrix math, but I found that the matrix routines are killing me on performance. My program presently takes over 6 minutes on a modern machine to do some moderately simple joint rotation precomputations on a 2700-frame BVH file, to pre-convert all joint positions for each keyframe to world space.

Wondering if I was doing something wrong, I installed numpy and wrote a few comparison programs. Numpy was a bit tricky because it supports both "array" and "matrix" types, and they each have their own way of invoking matrix multiplication. Here's a mini table of the results I got (context: Core 2 Quad 6600, 2.4 GHz/core, Fedora 8, Python 2.5.1, all scripts used only 1 core):

4x4 matrix creation, all zeroes:
  cgkit mat4():    28000 creations/sec
  numpy "array":   28000 creations/sec
  numpy "matrix":   8800 creations/sec

4x4 identity matrix creation:
  cgkit mat4(1):   19900 creations/sec
  numpy "array":   28000 creations/sec
  numpy "matrix":   8700 creations/sec

4x4 matrix multiplication:
  cgkit mat4():     3100 matrix multiplies/sec (ouch)
  numpy "array":   79300 matrix multiplies/sec
  numpy "matrix":  10800 matrix multiplies/sec

So at least for the test code snippets I used, it seems like numpy's "array" type does matrix multiplication about 25x faster than cgkit. I also re-ran the numpy test using some simple nonzero floating-point entries in the 4x4 arrays, and the results were the same.

It seems like I ought to switch the matrix math to numpy - I can't live with only 3100 matrix multiplies per second when I need to do precomputations for a few thousand keyframes on a 40-joint skeleton. Is this pretty much what I should expect? Anything horribly wrong in my test code? (Of course, when I installed numpy, its installer fired up gcc and built all kinds of crud, so it's probably going down to the bare metal with its matrix-multiply optimizations.)
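For readers puzzled by the "each have their own way of invoking matrix multiplication" remark: with the "array" type, * is elementwise and dot() gives the true matrix product, while the "matrix" type overloads * as the matrix product. A small sketch of my own (not from the benchmark code) illustrating the difference:

```python
import numpy

a = numpy.array([[1.0, 2.0], [3.0, 4.0]])
b = numpy.array([[5.0, 6.0], [7.0, 8.0]])

# "array" type: * is elementwise, dot() is the matrix product
elementwise = a * b               # [[5, 12], [21, 32]]
matrix_product = numpy.dot(a, b)  # [[19, 22], [43, 50]]

# "matrix" type: * is already the matrix product
ma = numpy.matrix(a)
mb = numpy.matrix(b)
assert (ma * mb == matrix_product).all()
```

So a benchmark of "matrix multiplication" has to use dot() for arrays but plain * for matrices, as the samples below do.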
A few code extracts and results are below so that people can reproduce on their own machines as desired.

Bruce Hahne
hahne at io dot com

Disclaimer: programming is not my day job, so I don't know what I'm doing :)

SAMPLE 1: 100,000 4x4 matrix multiplies using cgkit

  #!/usr/bin/python
  import profile
  from cgkit.cgtypes import mat4

  def profile_me():
      mymat1 = mat4()
      mymat2 = mat4()
      multiply_me(mymat1, mymat2)

  def multiply_me(mat1, mat2):
      for x in range(100000):
          out = mat1 * mat2

  profile.run('profile_me()')

RESULT OF SAMPLE 1: (100K multiplies in 32.5 seconds)

  2700012 function calls in 32.504 CPU seconds

  Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   300000    3.443    0.000    3.443    0.000  :0(isinstance)
   500004    2.903    0.000    2.903    0.000  :0(len)
   100000    7.734    0.000   14.905    0.000  :0(map)
        1    0.003    0.003    0.003    0.003  :0(range)
        1    0.000    0.000    0.000    0.000  :0(setprofile)
        1    0.000    0.000   32.504   32.504  <string>:1(<module>)
   100000    5.911    0.000   31.295    0.000  mat4.py:161(__mul__)
   100002    4.134    0.000   21.942    0.000  mat4.py:60(__init__)
  1600000    7.171    0.000    7.171    0.000  mat4.py:97(<lambda>)
        1    1.205    1.205   32.503   32.503  profile3.py:12(multiply_me)
        1    0.000    0.000   32.504   32.504  profile3.py:7(profile_me)
        1    0.000    0.000   32.504   32.504  profile:0(profile_me())
        0    0.000             0.000           profile:0(profiler)

SAMPLE 2: 100,000 4x4 multiplies using numpy "array"

  #!/usr/bin/python
  import profile
  from numpy import *

  def profile_me():
      mymat1 = array([ [0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0] ])
      mymat2 = array([ [0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0] ])
      multiply_me(mymat1, mymat2)

  def multiply_me(mat1, mat2):
      for x in range(100000):
          out = dot(mat1, mat2)

  profile.run('profile_me()')

RESULT OF SAMPLE 2: (100K multiplies in 1.251 sec.)
  100008 function calls in 1.261 CPU seconds

  Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        2    0.000    0.000    0.000    0.000  :0(array)
   100000    0.742    0.000    0.742    0.000  :0(dot)
        1    0.003    0.003    0.003    0.003  :0(range)
        1    0.000    0.000    0.000    0.000  :0(setprofile)
        1    0.000    0.000    1.261    1.261  <string>:1(<module>)
        1    0.516    0.516    1.261    1.261  profile6.py:12(multiply_me)
        1    0.000    0.000    1.261    1.261  profile6.py:7(profile_me)
        1    0.000    0.000    1.261    1.261  profile:0(profile_me())
        0    0.000             0.000           profile:0(profiler)
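One caveat with the numbers above: the pure-Python profile module adds bookkeeping overhead to every call, so the raw rates understate both libraries. A sketch of my own using the standard-library timeit module instead (the n = 100000 count mirrors the samples; absolute rates will of course differ by machine):

```python
import timeit

# Set up two zero-filled 4x4 float arrays, as in sample 2
setup = """
import numpy
mat1 = numpy.zeros((4, 4))
mat2 = numpy.zeros((4, 4))
"""

n = 100000
# Time n matrix products without profiler overhead
seconds = timeit.timeit("numpy.dot(mat1, mat2)", setup=setup, number=n)
print("%d multiplies/sec" % (n / seconds))
```

This measures only the dot() calls themselves, which is closer to the number that matters for the BVH precomputation loop.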
From: Matthias Baas <matthias.baas@gm...> - 2008-09-01 21:42:13

Bruce Hahne wrote:
> So at least for the test code snippets I used, it seems like numpy's
> "array" type does matrix multiplication about 25x faster than cgkit.
> [...]
>  100000    5.911    0.000   31.295    0.000  mat4.py:161(__mul__)
>  100002    4.134    0.000   21.942    0.000  mat4.py:60(__init__)
> 1600000    7.171    0.000    7.171    0.000  mat4.py:97(<lambda>)

These lines refer to a file mat4.py, which indicates that you are using the "light" version of cgkit. In that version, all the vector and matrix types are implemented in pure Python, which is why they are so much slower than numpy. You have to install the full version of cgkit to get implementations that are written in C++. When I run your example scripts with that version, cgkit is about twice as fast as numpy over here.

By the way, you are also comparing different data types. When you construct the numpy arrays, you only pass integers to the array function, which will result in an integer array. You should set the type explicitly to float64, so that the data types are actually the same:

  mymat1 = array([ [0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0] ], dtype=float64)

- Matthias
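The dtype point is easy to verify: integer literals produce an integer array, while dtype=float64 gives the floating-point array that matches what cgkit's mat4 stores. A minimal sketch of my own (using a plain `import numpy` rather than the `from numpy import *` of the samples):

```python
import numpy

# Integer literals -> integer array (the benchmark's accidental case)
int_mat = numpy.array([[0, 0, 0, 0]] * 4)

# Explicit dtype -> 64-bit float array, comparable to cgkit's mat4
float_mat = numpy.array([[0, 0, 0, 0]] * 4, dtype=numpy.float64)

print(int_mat.dtype)    # typically int64 (platform dependent)
print(float_mat.dtype)  # float64
```

Passing float literals (e.g. 0.0) would also yield a float64 array, but the explicit dtype makes the intent unambiguous.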