From: Perry G. <pe...@st...> - 2003-02-05 15:05:44
Tim Hochberg writes:
> I was inspired by Armin's latest Psyco version to try and see how well
> one could do with NumPy/NumArray implemented in Psycotic Python. I wrote
> a bare bones, pure Python, Numeric array class loosely based on Jnumeric
> (which in turn was loosely based on Numeric). The buffer is just
> Python's array.array. At the moment, all that one can do to the arrays
> is add and index them and the code is still a bit of a mess. I plan to
> clean things up over the next week in my copious free time <0.999 wink>
> and at that point it should be easy to add the remaining operations.
>
> I benchmarked this code, which I'm calling Psymeric for the moment,
> against NumPy and Numarray to see how it did. I used a variety of array
> sizes, but mostly relatively large arrays of shape (500,100) and of type
> Float64 and Int32 (mixed and with consistent types) as well as scalar
> values. Looking at the benchmark data one comes to three main conclusions:
>  * For small arrays NumPy always wins. Both Numarray and Psymeric have
>    much larger overhead.
>  * For large, contiguous arrays, Numarray is about twice as fast as
>    either of the other two.
>  * For large, noncontiguous arrays, Psymeric and NumPy are ~20% faster
>    than Numarray.
> The impressive thing is that Psymeric is generally slightly faster than
> NumPy when adding two arrays. It's slightly slower (~10%) when adding an
> array and a scalar although I suspect that could be fixed by some
> special casing a la Numarray. Adding two (500,100) arrays of type
> Float64 together results in the following timings:
>
>               psymeric    numpy       numarray
> contiguous    0.0034 s    0.0038 s    0.0019 s
> stride-2      0.0020 s    0.0023 s    0.0033 s
>
> I'm not sure if this is important, but it is an impressive demonstration
> of Psyco! More later when I get the code a bit more cleaned up.
>
> -tim

The "psymeric" results are indeed interesting. However, I'd like to make some remarks about numarray benchmarks.
At this stage, most of the focus has been on large, contiguous array performance (and, as can be seen, that is where numarray does best). There are a number of other improvements that can and will be made to numarray performance, so some of the other benchmarks are bound to improve (how much is uncertain). For example, the current behavior with strided arrays results in looping over subblocks of the array, and that looping is done on relatively small blocks in Python. We haven't done any tuning yet to see what the optimum block size should be (it may be machine dependent as well), and it is likely that the loop will eventually be moved into C. Small array performance should improve quite a bit; we are looking into how to do that now and should have a better idea soon of whether we can beat Numeric's performance or not.

But the "psymeric" approach raises an obvious question (implied, I guess, but not explicitly stated). With Psyco, is there a need for Numeric or numarray at all? I haven't thought this through in great detail, but at least one issue seems tough to address in this approach, and that is handling numeric types not supported by Python (e.g., Int8, Int16, UInt16, Float32, etc.). Are you offering the possibility of the "psymeric" approach as being the right way to go, and if so, how would you handle this issue?

On the other hand, there are lots of algorithms that cannot be handled well with array manipulations. It would seem that Psyco would be a natural alternative in such cases (as long as one is content to use Float64 or Int32), but it isn't obvious that these require arrays as anything but data structures (e.g., places to obtain and store scalars).

Perry Greenfield
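
P.S. To make the strided-array remark concrete, here is a rough sketch of the looping-over-subblocks idea. It is not numarray's actual code: it uses Python's array.array as the buffer, and the names add_contiguous and add_strided, as well as the block parameter, are made up for illustration. Each pass gathers one strided subblock of each operand into a contiguous temporary and hands it to an inner loop that stands in for the fast contiguous routine.

```python
from array import array


def add_contiguous(x, y):
    # Hypothetical stand-in for a fast inner loop over contiguous buffers.
    return array(x.typecode, [p + q for p, q in zip(x, y)])


def add_strided(a, b, stride, block=256):
    """Add a[::stride] and b[::stride], `block` result elements per pass.

    Assumes a and b have the same length and typecode.
    """
    out = array(a.typecode)
    step = block * stride  # number of source elements spanned by one subblock
    for start in range(0, len(a), step):
        # Gather one strided subblock of each operand into a contiguous temporary.
        xa = a[start:start + step:stride]
        xb = b[start:start + step:stride]
        out.extend(add_contiguous(xa, xb))
    return out


a = array('d', range(10))
b = array('d', range(10))
print(list(add_strided(a, b, 2, block=2)))  # [0.0, 4.0, 8.0, 12.0, 16.0]
```

The outer loop here runs once per subblock in Python, so a small block means many passes and more interpreter overhead, while a large block means bigger temporaries. That trade-off is the block-size tuning mentioned above.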