From: Travis O. <oli...@ie...> - 2006-02-10 06:27:59
Sasha wrote:

>Well, my results are different.
>
>SVN r2087:
>
>> python -m timeit -s "from numpy import arange" "arange(10000.0)"
>10000 loops, best of 3: 21.1 usec per loop
>
>SVN r2088:
>
>> python -m timeit -s "from numpy import arange" "arange(10000.0)"
>10000 loops, best of 3: 25.6 usec per loop
>
>I am using gcc version 3.3.4 with the following flags: -msse2
>-mfpmath=sse -fno-strict-aliasing -DNDEBUG -g -O3.
>
>The timing is consistent with the change in the DOUBLE_fill loop:
>
>r2087:
>   1b8f0:  f2 0f 11 08    movsd    %xmm1,(%eax)
>   1b8f4:  f2 0f 58 ca    addsd    %xmm2,%xmm1
>   1b8f8:  83 c0 08       add      $0x8,%eax
>   1b8fb:  39 c8          cmp      %ecx,%eax
>   1b8fd:  72 f1          jb       1b8f0 <DOUBLE_fill+0x30>
>
>r2088:
>   1b9d0:  f2 0f 2a c2    cvtsi2sd %edx,%xmm0
>   1b9d4:  42             inc      %edx
>   1b9d5:  f2 0f 59 c1    mulsd    %xmm1,%xmm0
>   1b9d9:  f2 0f 58 c2    addsd    %xmm2,%xmm0
>   1b9dd:  f2 0f 11 00    movsd    %xmm0,(%eax)
>   1b9e1:  83 c0 08       add      $0x8,%eax
>   1b9e4:  39 ca          cmp      %ecx,%edx
>   1b9e6:  7c e8          jl       1b9d0 <DOUBLE_fill+0x20>
>

Nice to see some real hacking on this list :-)

>My change may be worth committing because C code is shorter and
>arguably more understandable (at least by Fortran addicts :-).
>Travis?
>

Yes, I think it's worth submitting. Most of the suggestions for
pointer arithmetic in fast C code were developed when processors spent
more time computing than fetching memory. Now it seems it's all about
fetching memory intelligently. The buffer[i]= style is even
recommended according to the AMD optimization book Sasha pointed out.

So, I say go ahead unless somebody can point out something we are
missing...

-Travis
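
For anyone following along without the source handy, the two loop shapes in the disassembly correspond roughly to the C below. This is only a minimal sketch and not the actual NumPy code: the names fill_accumulate and fill_indexed, and the explicit start/delta parameters, are made up for illustration. The first keeps a running value and bumps a pointer (store, addsd, advance), the second computes each element from its index (convert, multiply, add, store), which is the buffer[i]= style discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the pointer-increment style:
   store the running value, then add delta to it. */
void fill_accumulate(double *buffer, size_t length, double start, double delta)
{
    assert(buffer != NULL || length == 0);
    double *end = buffer + length;
    double value = start;
    while (buffer < end) {
        *buffer++ = value;   /* store, then advance the pointer */
        value += delta;      /* running sum carried between iterations */
    }
}

/* Hypothetical sketch of the indexed style:
   each element is computed independently from its index. */
void fill_indexed(double *buffer, size_t length, double start, double delta)
{
    assert(buffer != NULL || length == 0);
    for (size_t i = 0; i < length; i++) {
        buffer[i] = start + (double)i * delta;  /* convert, multiply, add, store */
    }
}
```

Both give identical results when start and delta are exactly representable and the sums stay exact (e.g. integer-valued steps). For a general delta the accumulating version carries its rounding error from one element to the next, while the indexed version rounds each element independently, so the two can differ in the last bits.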