From: Eric E. <ems...@ob...> - 2006-07-18 14:25:44
|
Hi,

I have a specific quantity to derive from an array, and at the moment I
am unable to do it for large arrays because it just takes too long! So
I am looking for advice on how to compute such a quantity efficiently.

I have 3 arrays of N floats (x[...], y[...], z[...]) and I wish to do:

    result = 0.
    for i in range(N):
        for j in range(i+1, N):
            result += 1. / sqrt((x[j] - x[i])**2 + (y[j] - y[i])**2
                                + (z[j] - z[i])**2)

Of course the procedure written above is very inefficient, so I thought
of doing:

    result = 0.
    for i in range(N):
        result += 1. / sqrt((x[i+1:] - x[i])**2 + (y[i+1:] - y[i])**2
                            + (z[i+1:] - z[i])**2)

Still, this is quite slow and not workable for very large arrays
(> 10^6 floats per array).

Any hint on how to speed things up here?

Thanks in advance!

Eric
From: Bill B. <wb...@gm...> - 2006-07-18 14:45:38
|
Maybe this will help -- it computes the squared distances between a bunch
of points:

    def dist2(x, c):
        """Calculate the squared distances between two sets of points.

        n2 = dist2(x, c) takes two matrices of vectors and calculates
        the squared Euclidean distances between them.  Both matrices
        must have the same column dimension.  If x has M rows and N
        columns, and c has L rows and N columns, then the result has
        M rows and L columns.  The (i, j)th entry is the squared
        distance from the ith row of x to the jth row of c.
        """
        ndata, dimx = x.shape
        ncentres, dimc = c.shape
        if dimx != dimc:
            raise ValueError('Data dimension does not match dimension of centres')
        n2 = (x*x).sum(1)[:, numpy.newaxis] + (c*c).sum(1) - 2*numpy.dot(x, c.T)
        # Rounding errors occasionally cause negative entries in n2
        #if (n2 < 0).any():
        #    n2[n2 < 0] = 0
        return n2

Take 1.0/numpy.sqrt(dist2(V, V)) and from there you can probably get the
sum with sum() calls.  It's obviously not as efficient as it could be,
since it computes both halves of a symmetric matrix, but it might be
faster than nested Python loops.  (The above was adapted from a routine
in Netlab.)

--bb

On 7/18/06, Eric Emsellem <ems...@ob...> wrote:
[...]
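[Editor's note: a minimal sketch of how dist2 might be wired up to the
original problem. The helper name is made up, and this keeps the full
N x N distance matrix in memory, so it is only workable for moderate N,
not the 10^6 points discussed below:]

    import numpy

    def pairwise_inverse_distance_sum(x, y, z):
        # Stack the three coordinate arrays into an (N, 3) matrix and
        # feed it to Bill's dist2 against itself.
        v = numpy.column_stack((x, y, z))
        n2 = dist2(v, v)
        # The diagonal of dist2(v, v) is zero (self-distances), so keep
        # only the strict upper triangle -- the matrix is symmetric, so
        # this also avoids double-counting pairs.
        mask = numpy.triu(numpy.ones(n2.shape, dtype=bool), k=1)
        return (1.0 / numpy.sqrt(n2[mask])).sum()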
From: Perry G. <pe...@st...> - 2006-07-18 15:23:44
|
On Jul 18, 2006, at 10:23 AM, Eric Emsellem wrote:
[...]
> Still, this is quite slow and not workable for very large arrays
> (> 10^6 floats per array).
>
> Any hint on how to speed things up here?

Perhaps I'm misunderstanding the last variant, but don't you want
something like:

    result = 0.
    for i in range(N):
        result += add.reduce(1. / sqrt((x[i+1:] - x[i])**2
                                       + (y[i+1:] - y[i])**2
                                       + (z[i+1:] - z[i])**2))

instead, since the expression yields an array of decreasing size each
iteration?

But besides that, it seems you are asking to do roughly 10^12 of these
computations for 10^6 points.  I don't see any way to avoid that given
what you are computing.  The solution Bill Baxter gives is fine (I
think; I haven't looked at it closely), but the usual problem with
doing it without any looping is that it requires an enormous amount of
memory (~10^12-element arrays) if I'm not mistaken.  Since your second
example iterates over large arrays (most of the time, except near the
end), I'd be surprised if you can do much better than that (the looping
overhead should be negligible for such large arrays).  Do you have
examples written in other languages that run much faster?  I guess I
would be surprised if it were possible to do more than a few times
better in any language without some very clever optimizations.

Perry
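[Editor's note: for reference, a self-contained version of that
corrected loop -- a sketch, assuming x, y and z are 1-D float arrays;
the function name is made up:]

    import numpy

    def inverse_distance_sum(x, y, z):
        # Each pass handles all pairs (i, j) with j > i at once and
        # reduces them to a scalar immediately, so memory use stays
        # O(N) instead of O(N**2).
        result = 0.
        for i in range(len(x) - 1):
            d2 = ((x[i+1:] - x[i])**2 + (y[i+1:] - y[i])**2
                  + (z[i+1:] - z[i])**2)
            result += (1.0 / numpy.sqrt(d2)).sum()
        return result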
From: Eric E. <ems...@ob...> - 2006-07-18 15:37:54
|
Thanks for the tips. (Indeed your "add.reduce" is correct: I just wrote
this down too quickly; in the script I have a "sum" included.)

And yes, you are right about the memory issue, so I may just keep the
loop in and try to make it work on a fast PC... (or use parallel
processes).

(Is "sum" different than "add.reduce"?)

Thanks again to both Bill Baxter and Perry Greenfield for their fast
(and helpful!) answers.

cheers

Eric

Perry Greenfield wrote:
[...]
From: Sasha <nd...@ma...> - 2006-07-18 16:34:30
|
On 7/18/06, Eric Emsellem <ems...@ob...> wrote:
[...]
> (is "sum" different than "add.reduce"?)

"sum" is a wrapper around the ndarray.sum method, while add.reduce is a
ufunc method.  At the C level they are the same, but "sum" may be a
little slower for small arrays:

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10)" "sum(x)"
    100000 loops, best of 3: 5.53 usec per loop

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10)" "add.reduce(x)"
    100000 loops, best of 3: 2.71 usec per loop

In the new code, I would recommend using the ndarray.sum method instead
of either of the two.

BTW, is this an optimization opportunity?  Compare

    > python -m timeit -s "from numpy import add, zeros, sum; x = zeros(10); r=add.reduce" "r(x)"
    100000 loops, best of 3: 2.55 usec per loop

with

    > python -m timeit -s "from numpy import zeros; x = zeros(10)" "x.sum()"
    100000 loops, best of 3: 3.88 usec per loop
From: Tim H. <tim...@co...> - 2006-07-18 16:13:42
|
Eric Emsellem wrote:
> thanks for the tips. (indeed your "add.reduce" is correct: I just wrote
> this down too quickly, in the script I have a "sum" included).
[...]

I just wanted to add that there are faster, but considerably more
complicated, ways to attack this class of problems.  The one I've looked
at in the past was the fast multipole method, and I believe there are
others.  I'm not sure whether these can be implemented efficiently in
numpy, but you may want to look into this kind of more
sophisticated/complicated approach if brute-forcing the calculation
doesn't work.

-tim
From: Robert K. <rob...@gm...> - 2006-07-18 17:02:34
|
Tim Hochberg wrote:
> I just wanted to add that there are faster, but considerably more
> complicated, ways to attack this class of problems.
[...]

<teaser>
Idesbald Van den Bosch will be giving a talk at SciPy'06 on implementing
FMM and the multilevel fast multipole algorithm (MLFMA) using SciPy and
weave.

http://www.scipy.org/SciPy2006
</teaser>

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."  -- Umberto Eco
From: Fernando P. <fpe...@gm...> - 2006-07-18 17:25:39
|
On 7/18/06, Tim Hochberg <tim...@co...> wrote:
[...]
> I just wanted to add that there are faster, but considerably more
> complicated, ways to attack this class of problems.  The one I've
> looked at in the past was the fast multipole method, and I believe
> there are others.

Indeed, FMM is the best-known method that can turn this O(n^2) problem
into an O(n*log(n)) one.  As Tim indicates, there is no easy way out of
this one.  Incidentally, there is a talk about FMM on the scipy'06
schedule, in case you are going to attend.

An alternative approach to FMM (conceptually similar in some ways) is
described here:

http://amath.colorado.edu/faculty/fperez/talks/0604_sanum_mwadap.pdf

Unfortunately this isn't exactly a half-hour optimization effort, as
the required machinery is pretty heavy-duty.  And yes, this has all
been done in python and runs on current numpy/scipy (though it has
Fortran, C and C++ sprinkled as needed).

Cheers,

f
From: Tim H. <tim...@co...> - 2006-07-20 23:35:47
|
Fernando Perez wrote:
[...]
> Unfortunately this isn't exactly a half-hour optimization effort, as
> the required machinery is pretty heavy-duty.  And yes, this has all
> been done in python and runs on current numpy/scipy (though it has
> Fortran, C and C++ sprinkled as needed).

I'm interested in hearing about the situations in which people end up
dropping into C/C++ or Fortran.  It would be interesting to see whether
general solutions could be found for making some of them fast enough at
the Python level.  Here are some situations I've run into where numpy
has trouble computing results efficiently.

1. The value at a given point is computed in one of two (or more) ways
depending on some condition.  The subcomputations are expensive, so one
does not want to compute them both and then choose a result using
'where'.  I think I have a decent way to attack this at the Python
level using argsort: basically, sort all of the operands, find the
boundaries between the regions where different computations are done,
compute the expensive subcomputations on the appropriate blocks, and
then unsort the data (see the sketch after this message).  It would be
interesting to see if this could be automated in a clean way, perhaps
within numexpr.

2. Generalized reduce.  We want to reduce an array using an arbitrary
function f(x, y) that is not a standard ufunc.  In the particular case
I've run into, x and y would be vectors or perhaps record arrays, and
the function would couple the various records when producing the
output.  I don't see how this case can be handled efficiently at the
Python level, short of using something like Weave to produce a custom
ufunc and calling reduce on it.  I don't see how something like numexpr
could handle it, unfortunately, since it gains its speed by operating
on blocks of data.

I'm sure there are other cases that I've run into, but that's all I'm
coming up with right now.

-tim
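[Editor's note: a minimal sketch of the argsort approach from
situation 1 above -- hypothetical names throughout; it assumes the
condition is a simple threshold on a 1-D array and that f_low/f_high
are the two expensive subcomputations:]

    import numpy

    def piecewise_apply(x, threshold, f_low, f_high):
        # Sort the operands so each regime occupies a contiguous block.
        order = numpy.argsort(x)
        xs = x[order]
        # Find the boundary between the two regions.
        k = numpy.searchsorted(xs, threshold)
        # Run each expensive subcomputation only on its own block.
        out = numpy.empty_like(xs)
        out[:k] = f_low(xs[:k])
        out[k:] = f_high(xs[k:])
        # Unsort: invert the permutation so results line up with x.
        result = numpy.empty_like(out)
        result[order] = out
        return result

[Each expensive function is evaluated only on the elements that need it,
at the cost of two permutations; whether that beats computing both
branches and selecting with 'where' depends on how costly f_low and
f_high really are.]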