Re: [Numpy-discussion] Vectorizing code, for loops, and all that


On 02/10/06, Travis Oliphant <oli...@ee...> wrote:

> Perhaps those inner 1-d loops could be optimized (using prefetch or
> something) to reduce the number of cache misses on the inner
> computation, and the concept of looping over the largest dimension
> (instead of the last dimension) should be re-considered.

Cache control seems to be the main factor deciding the speed of many
algorithms. Prefetching could make a huge difference, particularly on
NUMA machines (like a dual Opteron). I think GCC has a moderately
portable way to request it (though it may be only in beta versions as
yet).
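
For illustration, here is a rough sketch of what such a request can look
like in C, using GCC's __builtin_prefetch(addr, rw, locality) builtin. The
function name strided_sum and the PREFETCH_DISTANCE value are made up for
this example and are not taken from the NumPy sources; the useful distance
depends on the machine and would have to be measured.

```c
#include <stddef.h>

/* Assumed look-ahead, in elements; a tuning knob, not a recommended value. */
#define PREFETCH_DISTANCE 16

/* Hypothetical inner 1-d loop: sum n doubles spaced `stride` elements apart. */
double strided_sum(const double *data, size_t n, size_t stride)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Hint that an element a few iterations ahead will be read soon.
           rw = 0 means "read", locality = 1 means "low temporal locality".
           A prefetch is only a hint and never changes the result. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[(i + PREFETCH_DISTANCE) * stride], 0, 1);
#endif
        total += data[i * stride];
    }
    return total;
}
```

Because the builtin is only a hint, the loop computes the same sum with or
without it; whether it actually helps is something only timing can tell.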

More generally, all the tricks that ATLAS uses to accelerate BLAS
routines would (in principle) be applicable here. The implementation
would be extremely difficult, though, even if all the basic loops
could be expressed in a few primitives.
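
To give a flavour of one such trick, below is a minimal sketch of cache
blocking (tiling) applied to a square matrix product. It is not ATLAS
code; the name blocked_matmul and the BLOCK size are assumptions made for
the example, and tuned libraries pick tile sizes per machine and do a good
deal more than this.

```c
#include <stddef.h>

/* Assumed tile edge; real libraries choose this by timing on the target CPU. */
#define BLOCK 32

/* Hypothetical helper: c = a * b for n-by-n row-major matrices. */
void blocked_matmul(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    /* Walk the matrices tile by tile so each tile is reused while cached. */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t imax = ii + BLOCK < n ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t kmax = kk + BLOCK < n ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t jmax = jj + BLOCK < n ? jj + BLOCK : n;
                /* Plain triple loop restricted to the current tiles. */
                for (size_t i = ii; i < imax; i++) {
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Even for this one loop nest the bookkeeping is noticeable, and it would
have to be repeated for every basic loop and every array layout.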

A. M. Archibald
