|
From: Tim H. <tim...@co...> - 2006-02-17 21:18:05
|
David M. Cooke wrote:
>Tim Hochberg <tim...@co...> writes:
>
>
>
>>Here's a little progress report: I now have A**2 running as fast as
>>square(A). This is done by special casing stuff in array_power so that
>>A**2 acutally calls square(A) instead of going through power(A,2).
>>Things still need a bunch of cleaning up (in fact right now A**1
>>segfaults, but I know why and it should be an easy fix). However, I
>>think I've discovered why you couldn't get your special cased power to
>>run as fast for A**2 as square(A) or A*A. It appears that the overhead
>>of creating a new array object from the integer 2 is the bottleneck. I
>>was running into the same mysterious overhead, even when dispatching
>>from array_power, until I special cased on PyInt to avoid the array
>>creation in that case.
>>
>>
>
>Hmm, if that's true about the overhead, that'll hit all computations
>of the type op(x, <some scalar>). Something to look at. That ufunc code
>for casting the arguments is pretty big and hairy, so I'm not going to
>look at right now :-)
>
>
Well, it's just a guess based on the fact that the extra time went away
when I stopped calling PyArray_EnsureArray(o2) for python ints. For what
it's worth, numpy scalars seem to have much less overhead. As indicated
below (note that numpy scalars are not currently special cased like
PyInts are). The overhead from PyInts was closer to 75% versus about 15%
for numpy scalars. Of course, the percentage of overhead is going to go
up for smaller arrays.
>>> Timer('a**2', 'from numpy import arange;a = arange(10000.); b =
a[2]').timeit(10000)
0.28345055027943999
>>> Timer('a**b', 'from numpy import arange;a = arange(10000.); b =
a[2]').timeit(10000)
0.32190487897204889
>>> Timer('a*a', 'from numpy import arange;a = arange(10000.); b =
a[2]').timeit(10000)
0.27305732991204223
>>> Timer('square(a)', 'from numpy import arange, square;a =
arange(10000.); b = a[2]').timeit(10000)
0.27989618792332749
-tim
|