From: Perry G. <pe...@st...> - 2003-02-11 20:15:22
Tim Hochberg writes:

>              Overhead (c)   Overhead (nc)   TimePerElement (c)   TimePerElement (nc)
> NumPy        10 us          10 us           85 ps                95 ps
> NumArray     200 us         530 us          45 ps                135 ps
> Psymeric     50 us          65 us           80 ps                80 ps
>
> The times shown above are for Float64s and are pretty approximate, and
> they happen to be a particularly favorable array shape for Psymeric. I
> have seen Psymeric as much as 50% slower than NumPy for large arrays of
> certain shapes.
>
> The overhead for NumArray is surprisingly large. After doing this
> experiment I'm certainly more sympathetic to Konrad wanting less
> overhead for NumArray before he adopts it.

Wow! Do you really mean picoseconds? I never suspected that either
Numeric or numarray were that fast. ;-)

Anyway, this issue is timely [Err...]. As it turns out, we started
looking at ways of improving small-array performance a couple of weeks
ago and are getting close to trying out an approach that should reduce
the overhead significantly.

But I have some questions about your benchmarks. Could you show me the
code used to generate the timings above? In particular, I'm interested
in the kinds of arrays being operated on. It turns out that the
numarray overhead depends on more than just contiguity, and it isn't
obvious to me which case you are testing. For example, Todd's
benchmarks indicate that numarray's overhead is about a factor of 5
larger than Numeric's when the input arrays are contiguous and of the
same type. On the other hand, if an array is not contiguous or requires
a type conversion, the overhead is much larger. (Also, these cases
require blocking loops over large arrays; we have done nothing yet to
optimize the block size or the speed of that loop.) If you are
benchmarking contiguous, same-type arrays, I'd like to get a copy of
the benchmark program so I can see where the disagreement arises.

The very preliminary indications are that we should be able to bring
numarray's overhead down to roughly 3 times Numeric's for all ufunc
cases. That's still slower, but not by the factor of 20 shown above
(200 us vs. 10 us). How much work it would take to reduce it further is
unclear (the main bottleneck at that point appears to be how long it
takes to create new output arrays).

We are still mainly in the analysis and design phase of how to improve
performance for small arrays and block looping. We believe that this
first step will not require moving very much of the existing Python
code into C (though some will be). Hopefully we will have some working
code in a couple of weeks.

Thanks, Perry
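
P.S. To make sure we'd be comparing the same thing: here is a minimal
sketch of the kind of harness I have in mind, fitting the model
t(n) ~= overhead + n * time_per_element. The function names, array
sizes, and repeat counts are my own guesses, not your actual benchmark:

import time
import Numeric

def time_add(a, b, repeats):
    # Seconds per call for a + b, averaged over `repeats` evaluations.
    t0 = time.time()
    for i in range(repeats):
        c = a + b
    return (time.time() - t0) / repeats

def overhead_and_per_element(make, small=10, large=1000000):
    # A call on a tiny array is essentially pure overhead; subtracting
    # it from the time for a large array isolates the per-element cost.
    t_small = time_add(make(small), make(small), 10000)
    t_large = time_add(make(large), make(large), 100)
    return t_small, (t_large - t_small) / (large - small)

def make_contiguous(n):
    return Numeric.ones((n,), 'd')   # contiguous Float64 operands

overhead, per_elem = overhead_and_per_element(make_contiguous)
print "overhead: %.1f us, per element: %.1f ns" % (overhead * 1e6,
                                                   per_elem * 1e9)

Substituting numarray for Numeric, or slicing an operand with a[::2] to
make it non-contiguous, would fill in the other columns of your table.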
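
P.P.S. On the blocking point above: very roughly, for non-contiguous or
type-mismatched operands the idea is to gather each operand into a
contiguous scratch buffer one block at a time and run the fast inner
loop on that. The sketch below is purely illustrative, not our actual
implementation; the block size in particular is exactly what we have
not tuned yet:

import Numeric

BLOCK = 1024   # assumed block size; tuning it is still an open question

def blocked_add(a, b, out):
    # Copy each block of the (possibly strided or mismatched-type)
    # operands into contiguous scratch buffers, add the buffers with
    # the fast contiguous loop, then store the result. The per-block
    # bookkeeping is where the extra non-contiguous overhead comes from.
    n = len(a)
    scratch_a = Numeric.zeros((BLOCK,), 'd')
    scratch_b = Numeric.zeros((BLOCK,), 'd')
    for start in range(0, n, BLOCK):
        stop = min(start + BLOCK, n)
        k = stop - start
        scratch_a[:k] = a[start:stop]   # gather; slice assignment
        scratch_b[:k] = b[start:stop]   # absorbs strides and casts
        out[start:stop] = scratch_a[:k] + scratch_b[:k]
    return out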