From: Eric E. <ems...@ob...> - 2006-07-18 14:25:44
|
Hi,

I have a specific quantity to derive from an array, and at the moment I
am unable to do it for large arrays because it just takes too long! So
I am looking for advice on how to compute such a quantity efficiently.

I have 3 arrays of N floats (x[...], y[...], z[...]) and I wish to do:

    result = 0.
    for i in range(N):
        for j in range(i+1, N):
            result += 1. / sqrt((x[j] - x[i])**2 + (y[j] - y[i])**2
                                + (z[j] - z[i])**2)

Of course the procedure written above is very inefficient, so I thought
of doing:

    result = 0.
    for i in range(N):
        result += 1. / sqrt((x[i+1:] - x[i])**2 + (y[i+1:] - y[i])**2
                            + (z[i+1:] - z[i])**2)

Still, this is quite slow and not workable for very large arrays
(> 10^6 floats per array).

Any hint on how to speed things up here?

Thanks in advance!

Eric
From: Bill B. <wb...@gm...> - 2006-07-18 14:45:38
|
Maybe this will help -- it computes the squared distances between a bunch
of points:

    def dist2(x, c):
        """Calculate the squared distances between two sets of points.

        n2 = dist2(x, c) takes two matrices of vectors and calculates
        the squared Euclidean distances between them.  Both matrices
        must have the same column dimension.  If x has M rows and N
        columns, and c has L rows and N columns, then the result has
        M rows and L columns.  The (i, j)th entry is the squared
        distance from the ith row of x to the jth row of c.
        """
        ndata, dimx = x.shape
        ncentres, dimc = c.shape
        if dimx != dimc:
            raise ValueError('Data dimension does not match dimension of centres')
        n2 = (x*x).sum(1)[:, numpy.newaxis] + (c*c).sum(1) - 2*numpy.dot(x, c.T)
        # Rounding errors occasionally cause negative entries in n2
        #if (n2 < 0).any():
        #    n2[n2 < 0] = 0
        return n2

Take 1.0/numpy.sqrt(dist2(V, V)) and from there you can probably get the
sum with sum() calls.  It's obviously not as efficient as it could be,
since it computes both halves of a symmetric matrix, but it might be
faster than nested Python loops.  (The above was adapted from a routine
in Netlab.)

--bb

On 7/18/06, Eric Emsellem <ems...@ob...> wrote:
[...]
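[Editor's note: a minimal sketch of how dist2 might be wired up to the
original problem. The helper name is made up, and this keeps the full
N x N distance matrix in memory, so it is only workable for moderate N,
not the 10^6 points discussed below:]

    import numpy

    def pairwise_inverse_distance_sum(x, y, z):
        # Stack the three coordinate arrays into an (N, 3) matrix and
        # feed it to Bill's dist2 against itself.
        v = numpy.column_stack((x, y, z))
        n2 = dist2(v, v)
        # The diagonal of dist2(v, v) is zero (self-distances), so keep
        # only the strict upper triangle -- the matrix is symmetric, so
        # this also avoids double-counting pairs.
        mask = numpy.triu(numpy.ones(n2.shape, dtype=bool), k=1)
        return (1.0 / numpy.sqrt(n2[mask])).sum()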
From: Perry G. <pe...@st...> - 2006-07-18 15:23:44
|
On Jul 18, 2006, at 10:23 AM, Eric Emsellem wrote:
[...]
> Still, this is quite slow and not workable for very large arrays
> (> 10^6 floats per array).
>
> Any hint on how to speed things up here?

Perhaps I'm misunderstanding the last variant, but don't you want
something like:

    result = 0.
    for i in range(N):
        result += add.reduce(1. / sqrt((x[i+1:] - x[i])**2
                                       + (y[i+1:] - y[i])**2
                                       + (z[i+1:] - z[i])**2))

instead, since the expression yields an array of decreasing size each
iteration?

But besides that, it seems you are asking to do roughly 10^12 of these
computations for 10^6 points.  I don't see any way to avoid that given
what you are computing.  The solution Bill Baxter gives is fine (I
think; I haven't looked at it closely), but the usual problem with
doing it without any looping is that it requires an enormous amount of
memory (~10^12-element arrays) if I'm not mistaken.  Since your second
example iterates over large arrays (most of the time, except near the
end), I'd be surprised if you can do much better than that (the looping
overhead should be negligible for such large arrays).  Do you have
examples written in other languages that run much faster?  I guess I
would be surprised if it were possible to do more than a few times
better in any language without some very clever optimizations.

Perry
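[Editor's note: for reference, a self-contained version of that
corrected loop -- a sketch, assuming x, y and z are 1-D float arrays;
the function name is made up:]

    import numpy

    def inverse_distance_sum(x, y, z):
        # Each pass handles all pairs (i, j) with j > i at once and
        # reduces them to a scalar immediately, so memory use stays
        # O(N) instead of O(N**2).
        result = 0.
        for i in range(len(x) - 1):
            d2 = ((x[i+1:] - x[i])**2 + (y[i+1:] - y[i])**2
                  + (z[i+1:] - z[i])**2)
            result += (1.0 / numpy.sqrt(d2)).sum()
        return result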
From: Eric E. <ems...@ob...> - 2006-07-18 15:37:54
|
Thanks for the tips. (Indeed your "add.reduce" is correct: I just wrote
this down too quickly; in the script I have a "sum" included.)

And yes, you are right about the memory issue, so I may just keep the
loop in and try to make it work on a fast PC... (or use parallel
processes).

(Is "sum" different than "add.reduce"?)

Thanks again to both Bill Baxter and Perry Greenfield for their fast
(and helpful!) answers.

cheers

Eric

Perry Greenfield wrote:
[...]
From: Sasha <nd...@ma...> - 2006-07-18 16:34:30
|
On 7/18/06, Eric Emsellem <ems...@ob...> wrote:
[...]
> (is "sum" different than "add.reduce"?)

"sum" is a wrapper around the ndarray.sum method, while add.reduce is a
ufunc method.  At the C level they are the same, but "sum" may be a
little slower for small arrays:

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10)" "sum(x)"
    100000 loops, best of 3: 5.53 usec per loop

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10)" "add.reduce(x)"
    100000 loops, best of 3: 2.71 usec per loop

In the new code, I would recommend using the ndarray.sum method instead
of either of the two.

BTW, is this an optimization opportunity?  Compare

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10); r=add.reduce" "r(x)"
    100000 loops, best of 3: 2.55 usec per loop

with

    > python -m timeit -s "from numpy import zeros; x = zeros(10)" "x.sum()"
    100000 loops, best of 3: 3.88 usec per loop
From: Tim H. <tim...@co...> - 2006-07-18 16:13:42
|
Eric Emsellem wrote:
> thanks for the tips. (indeed your "add.reduce" is correct: I just wrote
> this down too quickly, in the script I have a "sum" included).
[...]

I just wanted to add that there are faster, but considerably more
complicated, ways to attack this class of problems.  The one I've looked
at in the past was the fast multipole method, and I believe there are
others.  I'm not sure whether these can be implemented efficiently in
numpy, but you may want to look into this kind of more
sophisticated/complicated approach if brute-forcing the calculation
doesn't work.

-tim
From: Robert K. <rob...@gm...> - 2006-07-18 17:02:34
|
Tim Hochberg wrote:
> I just wanted to add that there are faster, but considerably more
> complicated, ways to attack this class of problems.
[...]

<teaser>
Idesbald Van den Bosch will be giving a talk at SciPy'06 on implementing
FMM and the multilevel fast multipole algorithm (MLFMA) using SciPy and
weave.

http://www.scipy.org/SciPy2006
</teaser>

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."  -- Umberto Eco
From: Fernando P. <fpe...@gm...> - 2006-07-18 17:25:39
|
On 7/18/06, Tim Hochberg <tim...@co...> wrote:
[...]
> I just wanted to add that there are faster, but considerably more
> complicated, ways to attack this class of problems.  The one I've
> looked at in the past was the fast multipole method, and I believe
> there are others.

Indeed, FMM is the best-known method that can turn this O(n^2) problem
into an O(n*log(n)) one.  As Tim indicates, there is no easy way out of
this one.  Incidentally, there is a talk about FMM on the scipy'06
schedule, in case you are going to attend.

An alternative approach to FMM (conceptually similar in some ways) is
described here:

http://amath.colorado.edu/faculty/fperez/talks/0604_sanum_mwadap.pdf

Unfortunately this isn't exactly a half-hour optimization effort, as
the required machinery is pretty heavy-duty.  And yes, this has all
been done in python and runs on current numpy/scipy (though it has
Fortran, C and C++ sprinkled as needed).

Cheers,

f
From: Tim H. <tim...@co...> - 2006-07-20 23:35:47
|
Fernando Perez wrote:
[...]
> Unfortunately this isn't exactly a half-hour optimization effort, as
> the required machinery is pretty heavy-duty.  And yes, this has all
> been done in python and runs on current numpy/scipy (though it has
> Fortran, C and C++ sprinkled as needed).

I'm interested in hearing about the situations in which people end up
dropping into C/C++ or Fortran.  It would be interesting to see whether
general solutions could be found for making some of them fast enough at
the Python level.  Here are some situations I've run into where numpy
has trouble computing results efficiently.

1. The value at a given point is computed in one of two (or more) ways
depending on some condition.  The subcomputations are expensive, so one
does not want to compute them both and then choose a result using
'where'.  I think I have a decent way to attack this at the Python
level using argsort: basically, sort all of the operands, find the
boundaries between the regions where different computations are done,
compute the expensive subcomputations on the appropriate blocks, and
then unsort the data (see the sketch after this message).  It would be
interesting to see if this could be automated in a clean way, perhaps
within numexpr.

2. Generalized reduce.  We want to reduce an array using an arbitrary
function f(x, y) that is not a standard ufunc.  In the particular case
I've run into, x and y would be vectors or perhaps record arrays, and
the function would couple the various records when producing the
output.  I don't see how this case can be handled efficiently at the
Python level, short of using something like Weave to produce a custom
ufunc and calling reduce on it.  I don't see how something like numexpr
could handle it, unfortunately, since it gains its speed by operating
on blocks of data.

I'm sure there are other cases that I've run into, but that's all I'm
coming up with right now.

-tim
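[Editor's note: a minimal sketch of the argsort approach from
situation 1 above -- hypothetical names throughout; it assumes the
condition is a simple threshold on a 1-D array and that f_low/f_high
are the two expensive subcomputations:]

    import numpy

    def piecewise_apply(x, threshold, f_low, f_high):
        # Sort the operands so each regime occupies a contiguous block.
        order = numpy.argsort(x)
        xs = x[order]
        # Find the boundary between the two regions.
        k = numpy.searchsorted(xs, threshold)
        # Run each expensive subcomputation only on its own block.
        out = numpy.empty_like(xs)
        out[:k] = f_low(xs[:k])
        out[k:] = f_high(xs[k:])
        # Unsort: invert the permutation so results line up with x.
        result = numpy.empty_like(out)
        result[order] = out
        return result

[Each expensive function is evaluated only on the elements that need it,
at the cost of two permutations; whether that beats computing both
branches and selecting with 'where' depends on how costly f_low and
f_high really are.]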