From: Sebastian B. <seb...@gm...> - 2006-06-16 08:56:24

Hi,
I'm working with NumPy/SciPy on some algorithms and I've run into some
important speed differences with respect to Matlab 7. I've narrowed the
main speed problem down to the operation of finding the Euclidean distance
between two matrices that share one dimension (dist in Matlab):

Python:

def dtest():
    A = random( [4,2])
    B = random( [1000,2])

    d = zeros([4, 1000], dtype='f')
    for i in range(4):
        for j in range(1000):
            d[i, j] = sqrt( sum( (A[i] - B[j])**2 ) )
    return d

Matlab:

A = rand( [4,2])
B = rand( [1000,2])
d = dist(A, B')

Running both of these 100 times, I've found the Python version to run
between 10 and 20 times slower. My question is whether there is a faster
way to do this. Perhaps I'm not using the correct functions/structures?
Or is this as good as it gets?

Thanks in advance,

Sebastian Beca
Department of Computer Science Engineering
University of Chile

PS: I'm using NumPy 0.9.8, SciPy 0.4.8. I also understand I have ATLAS,
BLAS and LAPACK all installed, but I haven't confirmed that.

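A self-contained sketch of the timing described above (it uses numpy and
the standard timeit module; the shapes and the 100-repetition count are
the ones from the post, the rest is illustrative):

import timeit

setup = """
import numpy as np

A = np.random.random((4, 2))
B = np.random.random((1000, 2))

def dtest():
    d = np.zeros((4, 1000), dtype='f')
    for i in range(4):
        for j in range(1000):
            d[i, j] = np.sqrt(np.sum((A[i] - B[j])**2))
    return d
"""

# 100 repetitions of the double loop, as in the Matlab comparison above
print(timeit.timeit('dtest()', setup=setup, number=100))
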
From: Bruce S. <bso...@gm...> - 2006-06-16 14:20:45

Hi,
Please run the exact same code in Matlab that you are running in NumPy.
Many Matlab functions are very highly optimized and are provided as
compiled binary functions; I think that is what you are running into, so
you are not doing a like-for-like comparison. The ways around it are to
write an extension in C or Fortran, use Psyco etc. if possible, and to
vectorize your algorithm to remove the loops (especially the inner one).

Bruce

On 6/14/06, Sebastian Beca <seb...@gm...> wrote:
> Hi,
> I'm working with NumPy/SciPy on some algorithms and i've run into some
> important speed differences wrt Matlab 7. I've narrowed the main speed
> problem down to the operation of finding the euclidean distance
> between two matrices that share one dimension rank (dist in Matlab):
> [...]
> Running both of these 100 times, I've found the python version to run
> between 10-20 times slower. My question is if there is a faster way to
> do this? Perhaps I'm not using the correct functions/structures? Or
> this is as good as it gets?
> [...]

From: Christopher B. <Chr...@no...> - 2006-06-16 17:05:42

Bruce Southey wrote:
> Please run the exact same code in Matlab that you are running in
> NumPy. Many of Matlab functions are very highly optimized so these are
> provided as binary functions. I think that you are running into this
> so you are not doing the correct comparison

He is doing the correct comparison: if Matlab has some built-in compiled
utility functions that numpy doesn't -- it really is faster! It looks like
others' suggestions show that well-written numpy code is plenty fast,
however.

One more suggestion I don't think I've seen: numpy provides a built-in
compiled utility function, hypot():

 >>> x = N.arange(5)
 >>> y = N.arange(5)
 >>> N.hypot(x,y)
 array([ 0.        ,  1.41421356,  2.82842712,  4.24264069,  5.65685425])
 >>> N.sqrt(x**2 + y**2)
 array([ 0.        ,  1.41421356,  2.82842712,  4.24264069,  5.65685425])

Timings:

 >>> timeit.Timer('N.sqrt(x**2 + y**2)',
 ...     'import numpy as N; x = N.arange(5000); y = N.arange(5000)').timeit(100)
 0.49785208702087402
 >>> timeit.Timer('N.hypot(x,y)',
 ...     'import numpy as N; x = N.arange(5000); y = N.arange(5000)').timeit(100)
 0.081479072570800781

A factor of 6 improvement.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chr...@no...

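For the two-column case in the original post, hypot can also build the
whole 4x1000 distance matrix in one step via broadcasting. A sketch (this
only works for K=2, which is all hypot handles):

import numpy as np

A = np.random.random((4, 2))
B = np.random.random((1000, 2))

# hypot(dx, dy) == sqrt(dx**2 + dy**2), evaluated in a single compiled loop
dx = A[:, 0][:, np.newaxis] - B[:, 0]   # shape (4, 1000)
dy = A[:, 1][:, np.newaxis] - B[:, 1]   # shape (4, 1000)
d = np.hypot(dx, dy)                    # Euclidean distances, shape (4, 1000)
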
From: Tim H. <tim...@co...> - 2006-06-16 17:49:07

Christopher Barker wrote:
> He is doing the correct comparison: if Matlab has some built-in compiled
> utility functions that numpy doesn't -- it really is faster!
>
> It looks like other's suggestions show that well written numpy code is
> plenty fast, however.
>
> One more suggestion I don't think I've seen: numpy provides a built-in
> compiled utility function: hypot()
> [...]
> A factor of 6 improvement.

Here's another thing to note: much of the time distance**2 works as well
as distance (for instance if you are looking for the nearest point). If
you're in that situation, computing the square of the distance is much
cheaper:

def d_2():
    d = zeros([4, 10000], dtype=float)
    for i in range(4):
        xy = A[i] - B
        d[i] = xy[:,0]**2 + xy[:,1]**2
    return d

This is something like 250 times as fast as the naive Python solution, and
another five times faster than the fastest distance-computing version that
I could come up with (using hypot).

-tim

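A runnable version of the squared-distance idea, used here to pick the
nearest row of A for every row of B; the shapes and the argmin step are
illustrative, not from the post:

import numpy as np

A = np.random.random((4, 2))
B = np.random.random((10000, 2))

def d_2(A, B):
    # squared Euclidean distances; monotone in the true distance, so
    # comparisons and argmin give the same answer without the sqrt
    d = np.zeros((len(A), len(B)), dtype=float)
    for i in range(len(A)):
        xy = A[i] - B
        d[i] = xy[:, 0]**2 + xy[:, 1]**2
    return d

nearest = d_2(A, B).argmin(axis=0)   # index of the closest row of A for each row of B
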
From: Sebastian B. <seb...@gm...> - 2006-06-17 03:06:35

Thanks! Avoiding the inner loop is MUCH faster (~20-300 times faster than
the original). Nevertheless I don't think I can use hypot, as it only works
for two dimensions. The general problem I have is:

A = random( [C, K] )
B = random( [N, K] )

C ~ 1-10
N ~ Large (thousands, millions... i.e. my dataset)
K ~ 2-100 (dimensions of my problem, i.e. not fixed a priori)

I adapted your proposed version to this for K dimensions:

def d4():
    d = zeros([4, 1000], dtype=float)
    for i in range(4):
        xy = A[i] - B
        d[i] = sqrt( sum(xy**2, axis=1) )
    return d

Maybe there's another alternative to d4?
Thanks again,

Sebastian.

> def d_2():
>     d = zeros([4, 10000], dtype=float)
>     for i in range(4):
>         xy = A[i] - B
>         d[i] = xy[:,0]**2 + xy[:,1]**2
>     return d
>
> This is something like 250 times as fast as the naive Python solution;
> another five times faster than the fastest distance computing version
> that I could come up with (using hypot).
>
> -tim

From: Sebastian B. <seb...@gm...> - 2006-06-17 04:38:14

Please replace:

C = 4
N = 1000

> d = zeros([C, N], dtype=float)

BK.

From: Johannes L. <a.u...@gm...> - 2006-06-17 06:47:32

Hi,

> def d4():
>     d = zeros([4, 1000], dtype=float)
>     for i in range(4):
>         xy = A[i] - B
>         d[i] = sqrt( sum(xy**2, axis=1) )
>     return d
>
> Maybe there's another alternative to d4?
> Thanks again,

I think this is the fastest you can get. Maybe it would be nicer to use
the .sum() method instead of the sum function, but that is just my
personal opinion.

I am curious how this compares to the matlab version. :)

Johannes

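One concrete argument for the method form, beyond taste (my reading, not
spelled out above): the name sum can refer either to Python's builtin or
to numpy's function depending on what has been imported, while the .sum()
method always dispatches to the array. A small sketch:

import numpy as np

xy = np.random.random((1000, 10))

a = np.sum(xy**2, axis=1)   # explicit module reference, unambiguous
b = (xy**2).sum(axis=1)     # method form, independent of what the name 'sum' is bound to

# The builtin sum() is a different function and does not accept axis=;
# the method form avoids picking it up by accident when imports change.
assert np.allclose(a, b)
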
From: Sebastian B. <seb...@gm...> - 2006-06-18 22:49:29

I checked the Matlab version's code and it does the same as discussed
here. The only thing to check is to make sure you loop over the shorter
dimension of the output array. Speed-wise the Matlab code still runs about
twice as fast for large sets of data (just taking time by hand and
comparing); nevertheless the improvement over calculating each value
separately, as in d1, is significant (10-300 times) and enough for my
needs.

Thanks to all.

Sebastian Beca

PS: I also tried the d5 version Alex sent, but the results are not the
same so I couldn't compare. My final version was:

K = 10
C = 3
N = 2500 # One could switch around C and N now.
A = random.random( [N, K])
B = random.random( [C, K])

def dist():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

On 6/17/06, Johannes Loehnert <a.u...@gm...> wrote:
> I think this is the fastest you can get. Maybe it would be nicer to use
> the .sum() method instead of sum function, but that is just my personal
> opinion.
>
> I am curious how this compares to the matlab version. :)
>
> Johannes

From: Alan G I. <ai...@am...> - 2006-06-19 01:58:23

On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:

> def dist():
>     d = zeros([N, C], dtype=float)
>     if N < C:
>         for i in range(N):
>             xy = A[i] - B
>             d[i,:] = sqrt(sum(xy**2, axis=1))
>         return d
>     else:
>         for j in range(C):
>             xy = A - B[j]
>             d[:,j] = sqrt(sum(xy**2, axis=1))
>         return d

But that is 50% slower than Johannes's version:

def dist_loehner1():
    d = A[:, newaxis, :] - B[newaxis, :, :]
    d = sqrt((d**2).sum(axis=2))
    return d

Cheers,
Alan Isaac

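Filling in imports and sizes so the broadcasting version runs on its own
(the sizes are illustrative), with the intermediate (N, C, K) array made
explicit, since it matters for the scaling discussion that follows:

import numpy as np

N, C, K = 2500, 3, 10                # illustrative sizes
A = np.random.random((N, K))
B = np.random.random((C, K))

def dist_loehner1(A, B):
    # broadcasting builds an (N, C, K) array of differences before reducing,
    # so peak memory grows with all three sizes at once
    d = A[:, np.newaxis, :] - B[np.newaxis, :, :]
    return np.sqrt((d**2).sum(axis=2))

print(dist_loehner1(A, B).shape)     # (2500, 3)
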
From: Tim H. <tim...@co...> - 2006-06-19 03:18:40

Alan G Isaac wrote:
> On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>> def dist():
>> [...]
>
> But that is 50% slower than Johannes's version:
>
> def dist_loehner1():
>     d = A[:, newaxis, :] - B[newaxis, :, :]
>     d = sqrt((d**2).sum(axis=2))
>     return d

Are you sure about that? I just ran it through timeit, using Sebastian's
array sizes, and I get Sebastian's version being 150% *faster*. This could
well be cache-size dependent, so it may vary from box to box, but I'd
expect Sebastian's current version to scale better in general.

-tim

From: Alan G I. <ai...@am...> - 2006-06-19 04:22:36

On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:
> Are you sure about that? I just ran it through timeit, using Sebastian's
> array sizes and I get Sebastian's version being 150% faster. This
> could well be cache size dependant, so may vary from box to box, but I'd
> expect Sebastian's current version to scale better in general.

No, I'm not sure.
Script attached at bottom.
Most recent output follows: for reasons I have not determined, it doesn't
match my previous runs ...

Alan

>>> execfile(r'c:\temp\temp.py')
dist_beca    :   3.042277
dist_loehner1:   3.170026


#################################
# THE SCRIPT
import sys
sys.path.append("c:\\temp")
import numpy
from numpy import *
import timeit


K = 10
C = 2500
N = 3 # One could switch around C and N now.
A = numpy.random.random( [N, K] )
B = numpy.random.random( [C, K] )

# beca
def dist_beca():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

# loehnert
def dist_loehner1():
    # drawback: memory usage temporarily doubled
    # solution see below
    d = A[:, newaxis, :] - B[newaxis, :, :]
    # written as 3 expressions for more clarity
    d = sqrt((d**2).sum(axis=2))
    return d


if __name__ == "__main__":
    t1 = timeit.Timer('dist_beca()', 'from temp import dist_beca').timeit(100)
    t8 = timeit.Timer('dist_loehner1()', 'from temp import dist_loehner1').timeit(100)
    fmt = "%-10s:\t" + "%10.6f"
    print fmt%('dist_beca', t1)
    print fmt%('dist_loehner1', t8)

From: Sebastian B. <seb...@gm...> - 2006-06-19 20:04:36

I just ran Alan's script and I don't get consistent results for 100
repetitions. I boosted it to 1000 and ran it several times. The faster one
varied a lot, but the two always came within about +-1.5% of each other.

When it comes to scaling, for my problem (fuzzy clustering), N is the size
of the dataset, which should span from thousands to millions. C is the
number of clusters, usually less than 10, and K the number of features
(the dimension I want to sum over), also usually less than 100. So mainly
I'm concerned with scaling across N. I tried C=3, K=4, and N=1000, 2500,
5000, 7500, 10000. Also using 1000 runs, the results were:

dist_beca:     1.1, 4.5, 16, 28, 37
dist_loehner1: 1.7, 6.5, 22, 35, 47

I also tried scaling across K, with C=3, N=2500, and K=5-50. I couldn't
get any consistent results for small K, but both tend to perform equally
well (+-2%) for large K (K>15).

I'm not sure how these work in the backend, so I can't argue as to why one
should scale better than the other.

Regards,

Sebastian.

On 6/19/06, Alan G Isaac <ai...@am...> wrote:
> No, I'm not sure.
> Script attached bottom.
> Most recent output follows:
> for reasons I have not determined,
> it doesn't match my previous runs ...
> Alan
>
> >>> execfile(r'c:\temp\temp.py')
> dist_beca    :   3.042277
> dist_loehner1:   3.170026
> [...]

From: Tim H. <tim...@co...> - 2006-06-19 20:29:10

Sebastian Beca wrote:
> When it comes to scaling, for my problem (fuzzy clustering), N is the
> size of the dataset, which should span from thousands to millions. C
> is the amount of clusters, usually less than 10, and K the amount of
> features (the dimension I want to sum over) is also usually less than
> 100. So mainly I'm concerned with scaling across N.
> [...]
> I'm not sure how these work in the backend so I can't argument as to
> why one should scale better than the other.

The reason I suspect that dist_beca should scale better is that
dist_loehner1 generates an intermediate array of size NxCxK, while
dist_beca produces intermediate matrices that are only NxK or CxK. For
large problems, allocating that extra memory and fetching it into and out
of the cache can be a bottleneck.

Here's another version that allocates even less in the way of temporaries,
at the expense of being borderline incomprehensible. It still allocates an
NxK temporary array, but it allocates it once ahead of time and then
reuses it for all subsequent calculations. You're welcome to use it, but
I'm not sure I'd recommend it unless this function really is a speed
bottleneck, as it could end up being hard to read later (I left
implementing the N < C case as an exercise for the reader....).

I have another idea that might reduce the memory overhead still further;
if I get a chance I'll try it out and let you know if it results in a
further speed up.

-tim

def dist2(A, B):
    d = zeros([N, C], dtype=float)
    if N < C:
        raise NotImplementedError
    else:
        tmp = empty([N, K], float)
        tmp0 = tmp[:,0]
        rangek = range(1, K)
        for j in range(C):
            subtract(A, B[j], tmp)
            tmp *= tmp
            for k in rangek:
                tmp0 += tmp[:,k]
            sqrt(tmp0, d[:,j])
        return d

From: Tim H. <tim...@co...> - 2006-06-19 21:39:32

Tim Hochberg wrote:
> The reason I suspect that dist_beca should scale better is that
> dist_loehner1 generates an intermediate array of size NxCxK, while
> dist_beca produces intermediate matrices that are only NxK or CxK. For
> large problems, allocating that extra memory and fetching it into and
> out of the cache can be a bottleneck.
>
> Here's another version that allocates even less in the way of
> temporaries at the expenses of being borderline incomprehensible. It
> still allocates an NxK temporary array, but it allocates it once ahead
> of time and then reuses it for all subsequent calculations.
> [...]

Speaking of scaling: I tried this with K=25000 (10x greater than
Sebastian's original numbers). Much to my surprise it performed somewhat
worse than Sebastian's dist() with large K. Below is a modified dist2 that
performs about the same (marginally better here) for large K, as well as a
dist3 that performs about 50% better at both K=2500 and K=25000.

-tim

def dist2(A, B):
    d = empty([N, C], dtype=float)
    if N < C:
        raise NotImplementedError
    else:
        tmp = empty([N, K], float)
        for j in range(C):
            subtract(A, B[j], tmp)
            tmp **= 2
            d[:,j] = sum(tmp, axis=1)
            sqrt(d[:,j], d[:,j])
        return d

def dist3(A, B):
    d = zeros([N, C], dtype=float)
    rangek = range(K)
    if N < C:
        raise NotImplementedError
    else:
        tmp = empty([N], float)
        for j in range(C):
            for k in rangek:
                subtract(A[:,k], B[j,k], tmp)
                tmp **= 2
                d[:,j] += tmp
            sqrt(d[:,j], d[:,j])
        return d

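A self-contained sketch of the dist3 idea above, with imports and sizes
filled in (the sizes are illustrative; only the N >= C case is shown, as
in the post):

import numpy as np

N, C, K = 2500, 3, 10
A = np.random.random((N, K))
B = np.random.random((C, K))

def dist3(A, B):
    N, K = A.shape
    C = B.shape[0]
    d = np.zeros((N, C), dtype=float)
    tmp = np.empty(N, dtype=float)              # one reusable length-N scratch buffer
    for j in range(C):
        for k in range(K):
            np.subtract(A[:, k], B[j, k], tmp)  # differences along one feature
            tmp **= 2
            d[:, j] += tmp                      # accumulate squared differences
        np.sqrt(d[:, j], d[:, j])               # in-place square root of the column
    return d

print(dist3(A, B).shape)                        # (2500, 3)
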
From: Alan G I. <ai...@am...> - 2006-06-20 07:08:01

I think the distance matrix version below is about as good as it gets with
these basic strategies.

fwiw,
Alan Isaac

def dist(A, B):
    rowsA, rowsB = A.shape[0], B.shape[0]
    distanceAB = empty( [rowsA, rowsB], dtype=float)
    if rowsA <= rowsB:
        temp = empty_like(B)
        for i in range(rowsA):
            # store A[i]-B in temp
            subtract( A[i], B, temp )
            temp *= temp
            sqrt( temp.sum(axis=1), distanceAB[i,:] )
    else:
        temp = empty_like(A)
        for j in range(rowsB):
            # store A-B[j] in temp
            temp = subtract( A, B[j], temp )
            temp *= temp
            sqrt( temp.sum(axis=1), distanceAB[:,j] )
    return distanceAB

From: Alex C. <ac...@gm...> - 2006-06-17 21:41:22

How about this?

def d5():
    return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
           2.*dot(A, transpose(B))

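The algebra behind d5 is the expansion ||a - b||^2 = ||a||^2 + ||b||^2 -
2 a.b, so the whole matrix of squared distances comes from one matrix
product plus two vectors of row norms. Note that, as posted, d5 returns
squared distances, which may be why the results reported elsewhere in the
thread did not match. A sketch of a version that also takes the square
root (the function name and the clamp against round-off are my additions):

import numpy as np

N, C, K = 2500, 3, 10                  # illustrative sizes
A = np.random.random((N, K))
B = np.random.random((C, K))

def d5_full(A, B):
    # squared distances via the expansion above: one dot product, two row-norm vectors
    sq = (np.add.outer((A * A).sum(axis=1), (B * B).sum(axis=1))
          - 2.0 * np.dot(A, B.T))
    # round-off can push tiny squared distances slightly negative; clamp before sqrt
    return np.sqrt(np.maximum(sq, 0.0))

print(d5_full(A, B).shape)             # (2500, 3)
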
From: Robert K. <rob...@gm...> - 2006-06-17 21:49:37

Alex Cannon wrote:
> How about this?
>
> def d5():
>     return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
>            2.*dot(A, transpose(B))

You might lose some precision with that approach, so the OP should compare
results and timings to look at the tradeoffs.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

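A small illustration of the precision issue (the numbers are chosen to
exaggerate it and are not from the thread): for two nearly identical
points far from the origin, the dot-product expansion cancels
catastrophically, while the direct difference stays accurate.

import numpy as np

a = np.array([1e8, 1e8])
b = a + 1e-4                       # a nearly identical point

direct = np.sqrt(((a - b)**2).sum())                        # ~1.41e-4, accurate
via_dot = np.dot(a, a) + np.dot(b, b) - 2.0 * np.dot(a, b)  # squared distance via the expansion

print(direct**2)    # ~2e-8
print(via_dot)      # dominated by round-off error; can even come out negative
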