From: eric j. <er...@en...> - 2002-06-11 22:00:09
|
> From: cbarker@localhost.localdomain [mailto:cbarker@localhost.localdomain]
>
> eric jones wrote:
> > The default axis choice influences how people choose to lay out their
> > data in arrays. If the default is to sum down columns, then users lay
> > out their data so that this is the order of computation.
>
> This is absolutely true. I definitely choose my data layout so that the
> various rank-reducing operators do what I want. Another reason to have
> consistency. So I don't really care which way is default, so the default
> might as well be the better performing option.
>
> Of course, compatibility with previous versions is helpful too... arrrgg!
>
> What kind of a performance difference are we talking here anyway?

Guess I ought to test instead of just saying it is so... I ran the following
test of summing 200 sets of 10000 numbers. I expected a speed-up of about 2...
I didn't get it. They are pretty much the same speed on my machine. (more later)

C:\WINDOWS\system32>python
ActivePython 2.2.1 Build 222 (ActiveState Corp.) based on
Python 2.2.1 (#34, Apr 15 2002, 09:51:39) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> import time
>>> a = ones((10000,200),Float) * arange(10000)[:,NewAxis]
>>> b = ones((200,10000),Float) * arange(10000)[NewAxis,:]
>>> t1 = time.clock();x=sum(a,axis=0);t2 = time.clock();print t2-t1
0.0772411018719
>>> t1 = time.clock();x=sum(b,axis=-1);t2 = time.clock();print t2-t1
0.079615705348

I also tried FFT, and did see a difference -- a speed-up of 1.5+:

>>> q = ones((1024,1024),Float)
>>> t1 = time.clock();x = FFT.fft(q,axis=0);t2 = time.clock();print t2-t1
0.907373143793
>>> t1 = time.clock();x= FFT.fft(q,axis=-1);t2 = time.clock();print t2-t1
0.581641800843
>>> .907/.581
1.5611015490533564

Same in scipy:

>>> from scipy import *
>>> a = ones((1024,1024),Float)
>>> import time
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.870259488287
>>> t1 = time.clock(); q = fft(a,axis=-1); t2 = time.clock();print t2-t1
0.489512214541
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.849266317367
>>> .849/.489
1.7361963190184049

So why is sum() the same speed for both cases? I don't know. I wrote a quick C
program that is similar to how Numeric loops work, and I saw about a factor of
4 improvement by summing rows instead of columns:

C:\home\eric\wrk\axis_speed>gcc -O2 axis.c
C:\home\eric\wrk\axis_speed>a
summing rows (sec): 0.040000
summing columns (sec): 0.160000
pass

These numbers are more like what I expected to see in the Numeric tests, but
they are strange when compared to the Numeric timings -- the row sum is twice
as fast as Numeric while the column sum is twice as slow. Because all the work
is done in C and we're summing reasonably long arrays, the Numeric and C
versions should be roughly the same speed. I can understand why summing rows
is twice as fast in my C routine -- the Numeric loop code is not going to win
awards for being optimal. What I don't understand is why the column summation
is twice as slow in my C code as in Numeric. This should not be. I've posted
it below in case someone can enlighten me.

I think in general you should see a speed-up of 1.5+ when summing over the
"faster" axis. This holds true for fft in Python and my sum in C. As to why I
don't in Numeric's sum(), I'm not sure. It is certainly true that non-strided
access makes the best use of cache and *usually* is faster.
eric

--------------------------------------------------------------------------

#include <stdio.h>
#include <malloc.h>
#include <time.h>

int main()
{
    double *a, *sum1, *sum2;
    int i, j, si, sj, ind, I, J;
    int small=200, big=10000;
    time_t t1, t2;

    I = small; J = big;
    si = big; sj = 1;
    a = (double*)malloc(I*J*sizeof(double));
    sum1 = (double*)malloc(small*sizeof(double));
    sum2 = (double*)malloc(small*sizeof(double));

    //set memory
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        sum2[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)j;
            ind += sj;
        }
        ind += si;
    }

    t1 = clock();
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            sum1[i] += a[ind];
            ind += sj;
        }
        ind += si;
    }
    t2 = clock();
    printf("summing rows (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    I = big; J = small;
    sj = big; si = 1;
    t1 = clock();
    //set memory
    for(i = 0; i < I; i++)
    {
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)i;
            ind += sj;
        }
        ind += si;
    }
    for(j = 0; j < J; j++)
    {
        sum2[j] = 0;
        ind = sj * j;
        for(i = 0; i < I; i++)
        {
            sum2[j] += a[ind];
            ind += si;
        }
    }
    t2 = clock();
    printf("summing columns (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    for (i=0; i < small; i++)
    {
        if(sum1[i] != sum2[i])
            printf("failure %d, %f %f\n", i, sum1[i], sum2[i]);
    }
    printf("pass %f\n", sum1[0]);
    return 0;
}
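For readers re-running this kind of comparison today, here is a minimal sketch of the same experiment, assuming a modern NumPy installation (an assumption -- the thread above predates it). As eric found with Numeric's sum(), whether the contiguous axis actually wins depends on how the library implements the reduction, so treat the comments as expectations rather than guarantees.

import time
import numpy as np

# Same data layout as above: 200 sums of 10000 numbers each.
a = np.ones((10000, 200)) * np.arange(10000)[:, np.newaxis]   # sum down columns
b = np.ones((200, 10000)) * np.arange(10000)[np.newaxis, :]   # sum along rows

def best_time(fn, repeat=10):
    # Best wall-clock time over a few repeats, to reduce timer noise.
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

print("axis=0  on (10000, 200):", best_time(lambda: a.sum(axis=0)))
print("axis=-1 on (200, 10000):", best_time(lambda: b.sum(axis=-1)))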
From: Scott R. <ra...@ph...> - 2002-06-11 21:07:34
|
On June 11, 2002 04:56 pm, you wrote: > One can make a case for allowing == and != for complex arrays, but > > just doesn't make sense and should not be allowed. It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable comparison. You _could_ do the same thing with the phases, except you run into the modulo 2pi thing... Scott > > -----Original Message----- > > From: num...@li... > > [mailto:num...@li...] On > > Behalf Of Perry Greenfield > > Sent: Tuesday, June 11, 2002 11:52 AM > > To: eric jones; 'Konrad Hinsen' > > Cc: num...@li... > > Subject: RE: [Numpy-discussion] RE: default axis for numarray > > > > > > <Eric Jones writes>: > > > > <Konrad Hinsen writes>: > > > > What needs to be improved in that area? > > > > > > Comparisons of complex numbers. But lets save that debate > > > > for later. > > > > > > No, no, let's do it now. ;-) We for one would like to know for > > numarray what should be done. > > > > If I might be presumptious enough to anticipate what Eric > > would say, it is that complex comparisons should be allowed, > > and that they use all the information in the complex number > > (real and imaginary) so that they lead to consistent results > > in sorting. > > > > But the purist argues that comparisons for complex numbers > > are meaningless. Well, yes, but there are cases in code where you > > don't which such comparisons to cause an exception. But even > > more important, there is at least one case which is > > practical. It isn't all that uncommon to want to eliminate > > duplicate values from arrays, and one would like to be able > > to do that for > > complex values as well. A common technique is to sort the > > values and then eliminate all identical adjacent values. A > > predictable comparison rule would allow that to be easily implemented. > > > > Eric, am I missing anything in this? It should be obvious > > that we agree with his position, but I am wondering if there > > are any arguments we have not heard yet that outweigh the > > advantages we see. > > > > Perry > > > > _______________________________________________________________ > > > > Multimillion Dollar Computer Inventory > > Live Webcast Auctions Thru Aug. 2002 - > > http://www.cowanalexander.com/calendar > > > > > > > > > > _______________________________________________ > > Numpy-discussion mailing list Num...@li... > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > _______________________________________________________________ > > Multimillion Dollar Computer Inventory > Live Webcast Auctions Thru Aug. 2002 - > http://www.cowanalexander.com/calendar > > > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
From: Paul F D. <pa...@pf...> - 2002-06-11 20:56:43
|
One can make a case for allowing == and != for complex arrays, but < and >
just don't make sense and should not be allowed.

> -----Original Message-----
> From: num...@li...
> [mailto:num...@li...] On Behalf Of Perry Greenfield
> Sent: Tuesday, June 11, 2002 11:52 AM
> To: eric jones; 'Konrad Hinsen'
> Cc: num...@li...
> Subject: RE: [Numpy-discussion] RE: default axis for numarray
>
> <Eric Jones writes>:
> <Konrad Hinsen writes>:
> > > What needs to be improved in that area?
> >
> > Comparisons of complex numbers. But let's save that debate for later.
>
> No, no, let's do it now. ;-) We for one would like to know for numarray
> what should be done.
>
> If I might be presumptuous enough to anticipate what Eric would say, it
> is that complex comparisons should be allowed, and that they use all the
> information in the complex number (real and imaginary) so that they lead
> to consistent results in sorting.
>
> But the purist argues that comparisons for complex numbers are
> meaningless. Well, yes, but there are cases in code where you don't wish
> such comparisons to cause an exception. But even more important, there
> is at least one case which is practical. It isn't all that uncommon to
> want to eliminate duplicate values from arrays, and one would like to be
> able to do that for complex values as well. A common technique is to
> sort the values and then eliminate all identical adjacent values. A
> predictable comparison rule would allow that to be easily implemented.
>
> Eric, am I missing anything in this? It should be obvious that we agree
> with his position, but I am wondering if there are any arguments we have
> not heard yet that outweigh the advantages we see.
>
> Perry
From: Paul F D. <pa...@pf...> - 2002-06-11 20:54:02
|
MA users seem to all be happy with the facility in MA for limiting printing.

>>> x=MA.arange(20)
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,])
>>> MA.set_print_limit(10)
>>> x
array([0,1,2,3,4,5,6,7,8,9,] + 10 more elements)
>>> print x
[0,1,2,3,4,5,6,7,8,9,] + 10 more elements
>>> MA.set_print_limit(0) # no limit
>>> x
array([ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])

> -----Original Message-----
> From: num...@li...
> [mailto:num...@li...] On Behalf Of Tim Hochberg
> Sent: Tuesday, June 11, 2002 12:29 PM
> To: num...@li...
> Subject: Re: [Numpy-discussion] repr for numarray
>
> I would also be inclined toward option 3 with the caveat that
> THRESHOLD=None should print all the values for the purists out there
> (or if you want to use repr to dump the array to some sort of flat file).
>
> -tim
>
> > On June 11, 2002 02:43 pm, Perry Greenfield wrote:
> > > Yet on the other hand, it is undeniably convenient to use repr (by
> > > typing a variable) for small arrays interactively rather than using
> > > a print statement. This leads to 3 possible proposals for handling
> > > repr:
> > >
> > > 1) Do what is done now, always print a string that when eval'ed will
> > > recreate the array.
> > >
> > > 2) Only give summary information for the array regardless of its
> > > size.
> > >
> > > 3) Print the array if it has fewer than THRESHOLD number of
> > > elements, otherwise print a summary. THRESHOLD may be adjusted by
> > > the user.
> > >
> > > The last appears to be the most utilitarian to us, yet 'impure'
> > > somehow. Certainly there are many objects for which
> >
> > I vote for number 3, and have no hang-ups about any real or perceived
> > "impurity". This is an issue that I deal with daily.
> >
> > Scott
> >
> > --
> > Scott M. Ransom              Address: McGill Univ. Physics Dept.
> > Phone: (514) 398-6492                 3600 University St., Rm 338
> > email: ra...@ph...                    Montreal, QC Canada H3A 2T8
> > GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
From: Tim H. <tim...@ie...> - 2002-06-11 19:29:19
|
I would also be inclined toward option 3 with the caveat that THRESHOLD=None should print all the values for the purists out there (or if you want to use repr to dump the array to some sort of flat file). -tim > On June 11, 2002 02:43 pm, Perry Greenfield wrote: > > > Yet on the other hand, it is undeniably convenient to use > > repr (by typing a variable) for small arrays interactively > > rather than using a print statement. This leads to 3 possible > > proposals for handling repr: > > > > 1) Do what is done now, always print a string that when > > eval'ed will recreate the array. > > > > 2) Only give summary information for the array regardless of > > its size. > > > > 3) Print the array if it has fewer than THRESHOLD number of > > elements, otherwise print a summary. THRESHOLD may be adjusted > > by the user. > > > > The last appears to be the most utilitarian to us, yet > > 'impure' somehow. Certainly there are may objects for which > > I vote for number 3, and have no hang-ups about any real or perceived > "impurity". This is an issue that I deal with daily. > > Scott > > > -- > Scott M. Ransom Address: McGill Univ. Physics Dept. > Phone: (514) 398-6492 3600 University St., Rm 338 > email: ra...@ph... Montreal, QC Canada H3A 2T8 > GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 > > _______________________________________________________________ > > Multimillion Dollar Computer Inventory > Live Webcast Auctions Thru Aug. 2002 - http://www.cowanalexander.com/calendar > > > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Konrad H. <hi...@cn...> - 2002-06-11 19:26:33
|
"Perry Greenfield" <pe...@st...> writes: > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which > Python does not attempt to generate a string from repr that > could be used with eval to recreate them. On the other hand, > we are unaware of cases where repr sometimes does and sometimes I don't see the problem. The documented behaviour would be that it doesn't allow reconstruction. If for some arrays that works nevertheless, who is going to complain? BTW, it would be nice if the summary would contain the values of some elements, to allow a quick identification of NaN arrays and similar problems. > does not. For example, strings may also get very large, but > there is no threshold for generating the string. Right. But in practice strings rarely do get that large. Arrays do. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Greg B. <gb...@cf...> - 2002-06-11 19:24:16
|
> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
>
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.

I vote for 3) too.

Especially annoying is when I mistakenly type a.shape instead of a.shape()
interactively. Without the parentheses I get a bound method, the repr of which
includes the repr for the whole array, and when this has > 25 million elements
it really is a drag to wait for it all to finish spewing out...

Getting sidetracked... is this repr of methods a feature?

>>> l = [1,2,3,4]
>>> l.sort
<built-in method sort of list object at 0x402d4eec>
>>> a = numarray.array(l)
>>> a.shape
<bound method NumArray.shape of array([1, 2, 3, 4])>

It would seem more pythonic to get

<bound method NumArray.shape of array object at 0x402d4bcc>

or similar?

-- Greg Ball
From: Konrad H. <hi...@cn...> - 2002-06-11 19:15:19
|
> I think the consistency with Python is less of an issue than it seems. > I wasn't aware that add.reduce(x) would generated the same results as > the Python version of reduce(add,x) until Perry pointed it out to me. It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type. > There are some inconsistencies between Python the language and Numeric > because the needs of the Numeric community. For instance, slices create > views instead of copies as in Python. This was a correct break with True, but this affects much fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter. > I don't see choosing axis=-1 as a break with Python -- multi-dimensional > arrays are inherently different and used differently than lists of lists As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous. > in Python. Further, reduce() is a "corner" of the Python language that > has been superceded by list comprehensions. Choosing an alternative List comprehensions work in exactly the same way, by looping over the outermost index. > > 2) Minimization of code breakage. > > Fixes will be necessary for sure, and I wish that wasn't the case. They > will be necessary if we choose a consistent interface in either case. The current interface is not inconsistent. It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments. > Choosing axis=0 or axis=-1 will not change what needs to be fixed -- > only the function names searched for. I disagree very much here. The fewer calls are concerned, the fewer mistakes are made, and the fewer modules have to be modified at all. Moreover, the functions that currently use axis=1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments. I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc. In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python. Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty. 
> True. But I can tell you that we're definitely doing something wrong > now. We have a superior language that is easier to integrate with > legacy code and less expensive than the best competing alternatives. > And, though I haven't done a serious market survey, I feel safe in > saying we have significantly less than 1% of the potential user base. I agree with that. But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative for anything), the main reason is inertia. Most people don't want to invest any time to learn any new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language). I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Scott R. <ra...@ph...> - 2002-06-11 18:53:50
|
On June 11, 2002 02:43 pm, Perry Greenfield wrote: > Yet on the other hand, it is undeniably convenient to use > repr (by typing a variable) for small arrays interactively > rather than using a print statement. This leads to 3 possible > proposals for handling repr: > > 1) Do what is done now, always print a string that when > eval'ed will recreate the array. > > 2) Only give summary information for the array regardless of > its size. > > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which I vote for number 3, and have no hang-ups about any real or perceived "impurity". This is an issue that I deal with daily. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
From: Perry G. <pe...@st...> - 2002-06-11 18:52:07
|
<Eric Jones writes>:
<Konrad Hinsen writes>:
> > What needs to be improved in that area?
>
> Comparisons of complex numbers. But let's save that debate for later.

No, no, let's do it now. ;-) We for one would like to know for numarray what
should be done.

If I might be presumptuous enough to anticipate what Eric would say, it is
that complex comparisons should be allowed, and that they use all the
information in the complex number (real and imaginary) so that they lead to
consistent results in sorting.

But the purist argues that comparisons for complex numbers are meaningless.
Well, yes, but there are cases in code where you don't wish such comparisons
to cause an exception. But even more important, there is at least one case
which is practical. It isn't all that uncommon to want to eliminate duplicate
values from arrays, and one would like to be able to do that for complex
values as well. A common technique is to sort the values and then eliminate
all identical adjacent values. A predictable comparison rule would allow that
to be easily implemented.

Eric, am I missing anything in this? It should be obvious that we agree with
his position, but I am wondering if there are any arguments we have not heard
yet that outweigh the advantages we see.

Perry
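The sort-then-drop-adjacent-duplicates technique Perry describes needs nothing more than a predictable ordering rule. A hedged sketch in plain Python, using a lexicographic (real, then imaginary) key of my own choosing rather than anything Numeric or numarray actually defined:

# Eliminate duplicates from a sequence of complex values by sorting with an
# explicit, predictable rule (real part first, then imaginary part), then
# dropping identical adjacent values.
def unique_complex(values):
    ordered = sorted(values, key=lambda z: (z.real, z.imag))
    out = []
    for z in ordered:
        if not out or z != out[-1]:   # == and != are well defined for complex
            out.append(z)
    return out

print(unique_complex([1+2j, 3+0j, 1+2j, 0+1j, 3+0j]))
# [1j, (1+2j), (3+0j)]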
From: Perry G. <pe...@st...> - 2002-06-11 18:43:58
|
While I'm flooding the mailing list with interface issues, I thought I would
air another one (again, for numarray only).

We've had some people internally complain that it does not make sense for repr
to always generate a string capable of reconstructing the array. We often
(usually) deal with multi-megabyte arrays. Typing a variable interactively for
one of these arrays is invariably nonsensical. In such cases the user would be
much better served by a message indicating the size, shape, type, etc. of the
array than all of its contents. Yet on the other hand, it is undeniably
convenient to use repr (by typing a variable) for small arrays interactively
rather than using a print statement. This leads to 3 possible proposals for
handling repr:

1) Do what is done now, always print a string that when eval'ed will recreate
the array.

2) Only give summary information for the array regardless of its size.

3) Print the array if it has fewer than THRESHOLD number of elements,
otherwise print a summary. THRESHOLD may be adjusted by the user.

The last appears to be the most utilitarian to us, yet 'impure' somehow.
Certainly there are many objects for which Python does not attempt to generate
a string from repr that could be used with eval to recreate them. On the other
hand, we are unaware of cases where repr sometimes does and sometimes does
not. For example, strings may also get very large, but there is no threshold
for generating the string.

What do people think is the most desirable solution? Keep in mind we intend to
develop very efficient functions that will convert arrays to and from ascii
representations (currently most of that code is in Python and quite slow in
numarray at the moment) so it will not be necessary to use repr for this
purpose.

Only a few more issues to go, hopefully...

Perry
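Proposal 3 is straightforward to prototype. A hedged sketch, assuming a NumPy-style array object purely for illustration -- the threshold handling, summary format, and function name here are not the numarray design:

import numpy as np

THRESHOLD = 1000        # None could mean "never summarize"

def array_repr(arr):
    # Full, eval-style repr below the threshold; a one-line summary above it.
    if THRESHOLD is None or arr.size < THRESHOLD:
        return repr(arr)
    flat = arr.ravel()
    # Show a couple of element values so e.g. NaN-filled arrays are easy to
    # spot, as Konrad suggests elsewhere in the thread.
    return "array(shape=%s, type=%s, first=%r, last=%r)" % (
        arr.shape, arr.dtype.name, flat[0], flat[-1])

print(array_repr(np.arange(10)))          # small array: printed in full
print(array_repr(np.zeros(10_000_000)))   # large array: one-line summary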
From: eric j. <er...@en...> - 2002-06-11 18:37:32
|
> From: Perry Greenfield [mailto:pe...@st...] > <Eric Jones wrote>: > > > Travis seemed to indicate that the Python would convert 0-d arrays to > > Python types correctly for most (all?) cases. Python indexing is a > > little unique because it explicitly requires integers. It's not just 0-d > > arrays that fail as indexes -- Python floats won't work either. > > > That's right, the primary breakage would be downstream use as > indices. That appeared to be the case with the find() method > of strings for example. > > > Yes, this would be required for using them as array indexes. Or > > actually: > > > > >>> a[int(x[2])] > > > Yes, this would be sufficient for use as indices or slices. I'm not > sure if there is any specific code that checks for float but doesn't > invoke automatic conversion. I suspect that floats are much less of > a problem this way, though will one necessarily know whether to use > int(), float(), or scalar()? If one is writing a generic function that > could accept int or float arrays then the generation of a int may > be overpresuming what the result will be used for. (Though I don't > have a particular example to give, I'll think about whether any > exist). If the only type that could possibly cause problems is int, > then int() should be all that would be necessary, but still awkward. If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev. eric |
From: Perry G. <pe...@st...> - 2002-06-11 18:06:20
|
<Eric Jones wrote>: > Travis seemed to indicate that the Python would convert 0-d arrays to > Python types correctly for most (all?) cases. Python indexing is a > little unique because it explicitly requires integers. It's not just 0-d > arrays that fail as indexes -- Python floats won't work either. > That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example. > Yes, this would be required for using them as array indexes. Or > actually: > > >>> a[int(x[2])] > Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays then the generation of a int may be overpresuming what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist). If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but still awkward. Perry |
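A small illustration of the conversion being discussed -- explicitly coercing an array element before handing it to something that insists on a plain integer. The helper name is made up, and modern NumPy is used only to have something concrete to run:

import numpy as np

def as_index(value):
    # Coerce an array scalar / 0-d array to a plain Python int for indexing.
    return int(value)

x = np.array([3, 1, 4, 1, 5])
seq = ["a", "b", "c", "d", "e"]

print(seq[int(x[2])])        # explicit int(), as in the message above -> 'e'
print(seq[as_index(x[0])])   # the same idea wrapped in a helper -> 'd'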
From: eric j. <er...@en...> - 2002-06-11 17:44:04
|
> "eric jones" <er...@en...> writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also between > that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't replace > it by a sum across the last axis just because that is faster. The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc. which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop. Of course, the axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important as future growth will come in this area more than Fortran. > > > >From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > > about 10 functions using axis=-1. To this day, I can't remember which > > If you weight by frequency of usage, the first group gains a lot in > importance. I just scanned through some of my code; almost all of the > calls to Numeric routines are to functions whose default axis > is zero. Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place. > > > code. Unfortunately, many of the Numeric functions that should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I suppose no one > objects to that. Sounds like Travis already did it. Thanks. > > > My vote is for keeping axis defaults as they are, both because the > choices are reasonable (there was a long discussion about them in the > early days of NumPy, and the defaults were chosen based on other array > languages that had already been in use for years) and because any > change would cause most existing NumPy code to break in many places, > often giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for axis=0, > for two reasons: > 1) Consistency with Python usage. I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generated the same results as the Python version of reduce(add,x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a very utilized area of Python because of efficiency. I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superceded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change. 
> 2) Minimization of code breakage. Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for. > > > > We should also strive to make it as easy as possible to write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? Comparisons of complex numbers. But lets save that debate for later. > > > Changes are going to create some backward incompatibilities and that is > > definitely a bummer. But some changes are also necessary before the > > community gets big. I know the community is already reasonable size, > > I'd like to see evidence that changing the current NumPy behaviour > would increase the size of the community. It would first of all split > the current community, because many users (like myself) do not have > enough time to spare to go through their code line by line in order to > check for incompatibilities. That many others would switch to Python > if only some changes were made is merely an hypothesis. True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent like astronomy, I would bet the every-day user base is less than 5% of the whole. There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interfaces problems are GUI and debugger related. Others are API related. Inconsistency in a library interface makes it harder to learn and is a wart. Whether it is as important as a graphics library? Probably not. But while we're building the next generation tool, we should fix things that make people wonder "why did they do this?". It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There are > > > good arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data layout > is not significant for most Python array operations. We might > for example offer a choice of C style and Fortran style data layout, > enabling users to choose according to speed, compatibility, or > just personal preference. In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that returned a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be) it sounds fine. I think the default constructor should return a C layout array though, and will be what 99% of the users will use. eric > > Konrad. 
> -- > ------------------------------------------------------------------------ -- > ----- > Konrad Hinsen | E-Mail: hi...@cn... > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > ------------------------------------------------------------------------ -- > ----- |
From: Paul F D. <pa...@pf...> - 2002-06-11 15:28:03
|
Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandry. I think in my own coding reduction on the first dimension is the most frequent. > -----Original Message----- > From: num...@li... > [mailto:num...@li...] On > Behalf Of Konrad Hinsen > Sent: Tuesday, June 11, 2002 6:12 AM > To: eric jones > Cc: 'Perry Greenfield'; num...@li... > Subject: Re: [Numpy-discussion] RE: default axis for numarray > > > "eric jones" <er...@en...> writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also > between that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't > replace it by a sum across the last axis just because that is faster. > > > >From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, > counting FFT, > > about 10 functions using axis=-1. To this day, I can't > remember which > > If you weight by frequency of usage, the first group gains a > lot in importance. I just scanned through some of my code; > almost all of the calls to Numeric routines are to functions > whose default axis is zero. > > > code. Unfortunately, many of the Numeric functions that > should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I > suppose no one objects to that. > > > My vote is for keeping axis defaults as they are, both > because the choices are reasonable (there was a long > discussion about them in the early days of NumPy, and the > defaults were chosen based on other array languages that had > already been in use for years) and because any change would > cause most existing NumPy code to break in many places, often > giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for > axis=0, for two reasons: > 1) Consistency with Python usage. > 2) Minimization of code breakage. > > > > We should also strive to make it as easy as possible to > write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? > > > Changes are going to create some backward incompatibilities > and that > > is definitely a bummer. But some changes are also necessary before > > the community gets big. I know the community is already reasonable > > size, > > I'd like to see evidence that changing the current NumPy > behaviour would increase the size of the community. It would > first of all split the current community, because many users > (like myself) do not have enough time to spare to go through > their code line by line in order to check for > incompatibilities. That many others would switch to Python if > only some changes were made is merely an hypothesis. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There > are good > > > arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data > layout is not significant for most Python array operations. 
> We might for example offer a choice of C style and Fortran > style data layout, enabling users to choose according to > speed, compatibility, or just personal preference. > > Konrad. > -- > -------------------------------------------------------------- > ----------------- > Konrad Hinsen | E-Mail: > hi...@cn... > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > -------------------------------------------------------------- > ----------------- > > _______________________________________________________________ > > Don't miss the 2002 Sprint PCS Application Developer's > Conference August 25-28 in Las Vegas - > http://devcon.sprintpcs.com/adp/index.cfm?> source=osdntextlink > > > _______________________________________________ > Numpy-discussion mailing list Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Konrad H. <hi...@cn...> - 2002-06-11 13:16:46
|
"eric jones" <er...@en...> writes: > The issue here is both consistency across a library and speed. Consistency, fine. But not just within one package, also between that package and the language it is implemented in. Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster. > >From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero. > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the That is certainly something that should be fixed, and I suppose no one objects to that. My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message. If a uniformization of the default is desired, I vote for axis=0, for two reasons: 1) Consistency with Python usage. 2) Minimization of code breakage. > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. What needs to be improved in that area? > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely an hypothesis. > > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Konrad H. <hi...@cn...> - 2002-06-11 12:56:26
|
Travis Oliphant <oli...@ie...> writes: > Actually, the code in PyArg_ParseTuple asks the object it gets if it > knows how to be a float. 0-d arrays for some time have known how to be > Python floats. So, I do not think this error occurs as you've > described. Could you demonstrate this error? No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago. > The only exception to this that I've seen is the list indexing code > (probably for optimization purposes). There could be more places, but > I have not found them or heard of them. Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all. > have now. I'm quite supportive of never returning Python scalars from > Numeric array operations unless specifically requested (e.g. the > toscalar method). I suppose this would be easy to implement, right? Then why not do it in a test release and find out empirically how much code it breaks. > presumption based? If I encounter a Python object that I'm unfamiliar > with, I don't presume to know how it will define multiplication. But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Paul F D. <pa...@pf...> - 2002-06-11 03:56:12
|
I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net, you can change my mind). Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility, which defines aliases to everything in Numarray that is the same as Numeric and then for the rest defines functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working or close to working again. Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen asap and eliminated eventually or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing. |
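Paul's NumericCompatibility idea amounts to a thin shim. A heavily simplified, hedged sketch -- it assumes a numarray-like package whose functions accept an axis keyword, and covers only a couple of names to show the shape of the thing:

# NumericCompatibility.py (illustrative only)
# Alias everything that behaves identically, then re-wrap just the functions
# whose default axis changed, restoring the old Numeric defaults.
import numarray
from numarray import *              # everything unchanged comes through as-is

def sum(a, axis=0):                 # old Numeric default
    return numarray.sum(a, axis=axis)

def take(a, indices, axis=0):       # old Numeric default
    return numarray.take(a, indices, axis=axis)

# ...and so on for the rest of the changed defaults.
# Old scripts then only need:  import NumericCompatibility as Numeric

As Konrad notes elsewhere in the thread, the cost of this approach is an extra Python call layer on every wrapped function.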
From: Travis O. <oli...@ie...> - 2002-06-11 03:51:25
|
On Mon, 2002-06-10 at 19:55, Scott Ransom wrote:
> I have to admit that I agree with all of what Eric has to say here -- even
> if it does cause some code breakage (I'm certainly willing to do some
> maintenance on my code/modules that are floating here and there so long as
> things continue to improve with the language as a whole).

I'm generally of the same opinion.

> I do think consistency is a very important aspect of getting
> Numeric/Numarray accepted by a larger user base (and believe me, my
> collaborators are probably sick of my Numeric Python evangelism (but I
> like to think also a bit jealous of my NumPy usage as they continue
> struggling with one-off C and Fortran routines...)).

Another important factor is the support libraries. I know that something like
Simulink (Matlab) is important to many of my colleagues in engineering.
Simulink is the MathWorks version of visual programming which lets the user
create a circuit visually which is then processed. I believe there was a good
start to this sort of thing presented at the last Python Conference which was
very encouraging. Other colleagues require something like a compiler to get
C code which will compile on a DSP board from a script and/or design session.
I believe something like this would be very beneficial.

> Another example of a glaring inconsistency in the current implementation
> is this little number that has been bugging me for awhile:
>
> >>> arange(10, typecode='d')
> array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
> >>> ones(10, typecode='d')
> array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
> >>> zeros(10, typecode='d')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: an integer is required
> >>> zeros(10, 'd')
> array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This is now fixed in cvs, along with other keyword problems. The ufunc methods
reduce and accumulate also now take a keyword argument in CVS.

-Travis
From: Paul F D. <pa...@pf...> - 2002-06-11 03:19:42
|
It is time to choose the next "head nummie", the chair of the set of sourceforge developers for Numerical Python. Now is an apt time since I will be changing assignments at LLNL in August to one which has less daily use of numpy. We have no procedure for doing this other than for us nummies to come to a consensus amongst ourselves, with the input of the Numpy community. After I return from Europython I hope we can make a selection during the first two weeks of July. |
From: Scott R. <ra...@ph...> - 2002-06-11 01:56:00
|
I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole). I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my colaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)). Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for awhile: >>> arange(10, typecode='d') array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) >>> ones(10, typecode='d') array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) >>> zeros(10, typecode='d') Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: an integer is required >>> zeros(10, 'd') array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look. On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation. Cheers, Scott On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote: > So one contentious issue a day isn't enough, huh? :-) > > > An issue that has been raised by scipy (most notably Eric Jones > > and Travis Oliphant) has been whether the default axis used by > > various functions should be changed from the current Numeric > > default. This message is not directed at determining whether we > > should change the current Numeric behavior for Numeric, but whether > > numarray should adopt the same behavior as the current Numeric. > > > > To be more specific, certain functions and methods, such as > > add.reduce(), operate by default on the first axis. For example, > > if x is a 2 x 10 array, then add.reduce(x) results in a > > 10 element array, where elements in the first dimension has > > been summed over rather than the most rapidly varying dimension. > > > > >>> x = arange(20) > > >>> x.shape = (2,10) > > >>> x > > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > > [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > > >>> add.reduce(x) > > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) > > The issue here is both consistency across a library and speed. > > >From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which > functions use which and have resorted to explicitly using axis=-1 in my > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the > argument list (but this is a different issue -- it just needs to be > fixed). > > SciPy always uses axis=-1 for operations. There are 60+ functions with > this convention. Choosing -1 offers the best cache use and therefore > should be more efficient. 
Defaulting to the fastest behavior is > convenient because new users don't need any special knowledge of > Numeric's implementation to get near peak performance. Also, there is > never a question about which axis is used for calculations. > > When using SciPy and Numeric, their function sets are completely > co-mingled. When adding SciPy and Numeric's function counts together, > it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a > standard, it is impossible for the interface to become intuitive because > of the exceptions to the rule from Numeric. > > So here what I think. All functions should default to the same axis so > that the interface to common functions can become second nature for new > users and experts alike. Further, the chosen axis should be the most > efficient for the most cases. > > There are actually a few functions that, taken in isolation, I think > should have axis=0. take() is an example. But, for the sake of > consistency, it too should use axis=-1. > > It has been suggested to recommend that new users always specify axis=? > as a keyword in functions that require an axis argument. This might be > fine when writing modules, but always having to type: > > >>> sum(a,axis=-1) > > in command line mode is a real pain. > > Just a point about the larger picture here... The changes we're > discussing are intended to clean up the warts on Numeric -- and, as good > as it is overall, these are warts in terms of usability. Interfaces > should be consistent across a library. The return types from functions > should be consistent regardless of input type (or shape). Default > arguments to the same keyword should also be consistent across > functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 > as default, returning arrays or scalars from Numeric functions and > indexing), but the choice made should be applied as consistently as > possible. > > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. > > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, > but I also believe, based on the strength of Python, Numeric, and > libraries such as Scientific and SciPy, the community can grow by 2 > orders of magnitude over the next five years. This kind of growth can't > occur if only savvy developers see the benefits of the elegant language. > It can only occur if the general scientist see Python as a compelling > alternative to Matlab (and IDL) as their day-in/day-out command line > environment for scientific/engineering analysis. Making the interface > consistent is one of several steps to making Python more attractive to > this community. > > Whether the changes made for numarray should be migrated back into > Numeric is an open question. I think they should, but see Konrad's > counterpoint. I'm willing for SciPy to be the intermediate step in the > migration between the two, but also think that is sub-optimal. > > > > > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has > > argued that the current behavior is most compatible for behavior > > of other Python sequences. 
For example, > > > > >>> sum = 0 > > >>> for subarr in x: > > sum += subarr > > > > acts on the first axis in effect. Likewise > > > > >>> reduce(add, x) > > > > does likewise. In this sense, Numeric is currently more consistent > > with Python behavior. However, there are other functions that > > operate on the most rapidly varying dimension. Unfortunately > > I cannot currently access my old mail, but I think the rule > > that was proposed under this argument was that if the 'reduction' > > operation was of a structural kind, the first dimension is used. > > If the reduction or processing step is 'time-series' oriented > > (e.g., FFT, convolve) then the last dimension is the default. > > On the other hand, some feel it would be much simpler to understand > > if the last axis was the default always. > > > > The question is whether there is a consensus for one approach or > > the other. We raised this issue at a scientific Birds-of-a-Feather > > session at the last Python Conference. The sense I got there was > > that most were for the status quo, keeping the behavior as it is > > now. Is the same true here? In the absence of consensus or a > > convincing majority, we will keep the behavior the same for backward > > compatibility purposes. > > Obviously, I'm more opinionated about this now than I was then. I > really urge you to consider using axis=-1 everywhere. SciPy is not the > only scientific library, but I think it adds the most functions with a > similar signature (the stats module is full of them). I very much hope > for a consistent interface across all of Python's scientific functions > because command line users aren't going to care whether sum() and > kurtosis() come from different libraries; they just want them to behave > consistently. > > eric > > > > > Perry -- -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
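A note on the arange/ones/zeros inconsistency shown at the top of this message: until the keyword handling is made uniform, a thin wrapper is enough to paper over it. This is a minimal sketch assuming the Numeric behavior quoted above (zeros() accepts a typecode positionally but rejects it as a keyword); the name zeros_kw and its default typecode are made up for illustration, not part of any library.

    from Numeric import zeros

    def zeros_kw(shape, typecode='l'):
        # zeros() rejects typecode= as a keyword but accepts it positionally,
        # so forward it positionally here.  'l' (Int) is an assumed default.
        return zeros(shape, typecode)

    print zeros_kw(10, typecode='d')   # array([ 0., 0., ..., 0.])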
From: eric j. <er...@en...> - 2002-06-10 23:15:41
|
So one contentious issue a day isn't enough, huh? :-) > An issue that has been raised by scipy (most notably Eric Jones > and Travis Oliphant) has been whether the default axis used by > various functions should be changed from the current Numeric > default. This message is not directed at determining whether we > should change the current Numeric behavior for Numeric, but whether > numarray should adopt the same behavior as the current Numeric. > > To be more specific, certain functions and methods, such as > add.reduce(), operate by default on the first axis. For example, > if x is a 2 x 10 array, then add.reduce(x) results in a > 10 element array, where elements in the first dimension have > been summed over rather than the most rapidly varying dimension. > > >>> x = arange(20) > >>> x.shape = (2,10) > >>> x > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > >>> add.reduce(x) > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) The issue here is both consistency across a library and speed. From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed). SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations. When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric. So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for most cases. There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1. It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type: >>> sum(a,axis=-1) in command line mode is a real pain. Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions and indexing), but the choice made should be applied as consistently as possible. 
We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come. Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, the community can grow by 2 orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if the general scientist sees Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community. Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal. > > Some feel it is contrary to expectations that the least rapidly > varying dimension should be operated on by default. There are > good arguments for both sides. For example, Konrad Hinsen has > argued that the current behavior is most compatible with the behavior > of other Python sequences. For example, > > >>> sum = 0 > >>> for subarr in x: > sum += subarr > > acts on the first axis in effect. Likewise > > >>> reduce(add, x) > > does likewise. In this sense, Numeric is currently more consistent > with Python behavior. However, there are other functions that > operate on the most rapidly varying dimension. Unfortunately > I cannot currently access my old mail, but I think the rule > that was proposed under this argument was that if the 'reduction' > operation was of a structural kind, the first dimension is used. > If the reduction or processing step is 'time-series' oriented > (e.g., FFT, convolve) then the last dimension is the default. > On the other hand, some feel it would be much simpler to understand > if the last axis was the default always. > > The question is whether there is a consensus for one approach or > the other. We raised this issue at a scientific Birds-of-a-Feather > session at the last Python Conference. The sense I got there was > that most were for the status quo, keeping the behavior as it is > now. Is the same true here? In the absence of consensus or a > convincing majority, we will keep the behavior the same for backward > compatibility purposes. Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions because command line users aren't going to care whether sum() and kurtosis() come from different libraries; they just want them to behave consistently. eric > > Perry |
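One practical aside on the default-axis question: whichever default wins, code that spells the axis out explicitly reads the same under either convention. A minimal sketch, assuming only add.reduce's documented axis argument; the example array is made up.

    from Numeric import array, add

    x = array([[0, 1, 2],
               [3, 4, 5]])

    # Passing the axis explicitly removes any dependence on a library's default.
    print add.reduce(x, 0)     # over the first axis          -> [3 5 7]
    print add.reduce(x, -1)    # over the last (fastest) axis -> [ 3 12]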
From: Paul F D. <pa...@pf...> - 2002-06-10 23:05:14
|
> Konrad mentioned the tuple parsing issue in some > extension libraries that expect floats, but it sounds like > Travis thinks this is no longer an issue. Are there others? > > eric > Lots of code tries to distinguish cases using isinstance, and these tests will fail if given an array instance when they are testing for a float. |
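To make the isinstance breakage concrete, here is a small sketch; the function names are invented for illustration, and the float() coercion of a rank-0 array is assumed to work as Travis suggests elsewhere in this thread.

    import types
    from Numeric import array

    def halve_strict(x):
        # The common existing pattern: a rank-0 Float array fails this test.
        if not isinstance(x, types.FloatType):
            raise TypeError("expected a float")
        return x / 2.0

    def halve_tolerant(x):
        # Coercing through float() accepts Python ints and floats, and
        # (assuming rank-0 arrays convert cleanly) array(3.0) as well.
        return float(x) / 2.0

    print halve_strict(3.0)            # 1.5
    print halve_tolerant(array(3.0))   # 1.5
    print halve_strict(array(3.0))     # raises TypeError with today's check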
From: eric j. <er...@en...> - 2002-06-10 21:26:45
|
> <Eric Jones writes>: > > I further believe that all Numeric functions (sum, product, etc.) should > > return arrays all the time instead of implicitly converting > > them to Python scalars in special cases such as reductions of 1d arrays. > > I think the only reason for the silent conversion is that Python lists > > only allow integer values for use in indexing so that: > > > > >>> a = [1,2,3,4] > > >>> a[array(0)] > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > TypeError: sequence index must be integer > > > > Numeric arrays don't have this problem: > > > > >>> a = array([1,2,3,4]) > > >>> a[array(0)] > > 1 > > > > I don't think this alone is a strong enough reason for the conversion. > > Getting rid of special cases is more important because it makes behavior > > predictable to the novice (and expert), and it is easier to write > > generic functions and be sure they will not break a year from now when > > one of the special cases occurs. > > > > Are there other reasons why scalars are returned? > > > Well, sure. It isn't just indexing lists directly, it would be > anywhere in Python that you would use a number. Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly for most (all?) cases. Python indexing is a little unique because it explicitly requires integers. It's not just 0-d arrays that fail as indexes -- Python floats won't work either. As for passing arrays to functions expecting numbers, is it that much different than passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types. > In some contexts, > the right thing may happen (where the function knows to try to obtain > a simple number from an object), but then again, it may not (if calling > a function where the number is used directly to index or slice). > > Here is another case where good arguments can be made for both > sides. It really isn't an issue of functionality (one can write > methods or functions to do what is needed), it's what the convenient > syntax does. For example, if we really want a Python scalar but > rank-0 arrays are always returned then something like this may > be required: > > >>> x = arange(10) > >>> a = range(10) > >>> a[scalar(x[2])] # instead of a[x[2]] Yes, this would be required for using them as array indexes. Or actually: >>> a[int(x[2])] > > Whereas if simple indexing returns a Python scalar and consistency > is desired in always having arrays returned one may have to do > something like this > > >>> y = x.indexAsArray(2) # instead of y = x[2] > > or perhaps > > >>> y = x[ArrayAlwaysAsResultIndexObject(2)] > # :-) with better name, of course > > One context or the other is going to be inconvenienced, but not > prevented from doing what is needed. Right. > > As long as Python scalars are the 'biggest' type of their kind, we > strongly lean towards single elements being converted into Python > scalars. It's our feeling that there are more surprises and gotchas, > particularly for more casual users, on this side than on the uncertainty > of an index returning an array or scalar. People writing code that > expects to deal with uncertain dimensionality (the only place that > this occurs) should be the ones to go the extra distance in more > awkward syntax. 
Well, I guess I'd like to figure out exactly what breaks before ruling it out because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this either. A search of my site-packages turned up IntType tests quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric |
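For the indexing case conceded above, the int() idiom works whether x[2] comes back as a Python integer (today's behavior) or as a rank-0 array (under the proposal), assuming rank-0 integer arrays convert via int() as discussed. A short sketch:

    from Numeric import arange, array

    x = arange(10)
    a = range(10)

    i = x[2]           # today a Python int; under the proposal, a rank-0 array
    print a[int(i)]    # int() is a no-op on an int and extracts a rank-0 array

    # The same idiom covers an explicitly constructed rank-0 index:
    print a[int(array(2))]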
From: Perry G. <pe...@st...> - 2002-06-10 20:36:12
|
An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric. To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where elements in the first dimension have been summed over rather than the most rapidly varying dimension. >>> x = arange(20) >>> x.shape = (2,10) >>> x array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) >>> add.reduce(x) array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example, >>> sum = 0 >>> for subarr in x: sum += subarr acts on the first axis in effect. Likewise >>> reduce(add, x) does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation was of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve) then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis was the default always. The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes. Perry |
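For reference, a minimal check of the consistency argument above, using only Numeric's add and Python's builtin reduce; the plain for-loop, the builtin reduce, and add.reduce's current default all operate over the first axis and agree (the variable is named total here to avoid shadowing sum).

    from Numeric import arange, add

    x = arange(20)
    x.shape = (2, 10)

    total = 0
    for subarr in x:          # iteration walks the first axis
        total = total + subarr

    print add.reduce(x)       # current default: reduce over the first axis
    print reduce(add, x)      # builtin reduce over the same axis
    print total               # all three print [10 12 14 16 18 20 22 24 26 28]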