From: Sebastian B. <seb...@gm...> - 2006-06-16 06:09:04
|
Hi, I'm working with NumPy/SciPy on some algorithms and i've run into some important speed differences wrt Matlab 7. I've narrowed the main speed problem down to the operation of finding the euclidean distance between two matrices that share one dimension rank (dist in Matlab): Python: def dtest(): A = random( [4,2]) B = random( [1000,2]) d = zeros([4, 1000], dtype='f') for i in range(4): for j in range(1000): d[i, j] = sqrt( sum( (A[i] - B[j])**2 ) ) return d Matlab: A = rand( [4,2]) B = rand( [1000,2]) d = dist(A, B') Running both of these 100 times, I've found the python version to run between 10-20 times slower. My question is if there is a faster way to do this? Perhaps I'm not using the correct functions/structures? Or this is as good as it gets? Thanks on beforehand, Sebastian Beca Department of Computer Science Engineering University of Chile PD: I'm using NumPy 0.9.8, SciPy 0.4.8. I also understand I have ATLAS, BLAS and LAPACK all installed, but I havn't confirmed that. |
From: Michael S. <mic...@gm...> - 2006-06-16 06:26:39
|
Hi Sebastian, I am not sure if there is a function already defined in numpy, but something like this may be what you are after def distance(a1, a2): return sqrt(sum((a1[:,newaxis,:] - a2[newaxis,:,:])**2, axis=2)) The general idea is to avoid loops if you want the code to execute fast. I hope this helps. Mike On 6/16/06, Sebastian Beca <seb...@gm...> wrote: > Hi, > I'm working with NumPy/SciPy on some algorithms and i've run into some > important speed differences wrt Matlab 7. I've narrowed the main speed > problem down to the operation of finding the euclidean distance > between two matrices that share one dimension rank (dist in Matlab): > > Python: > def dtest(): > A = random( [4,2]) > B = random( [1000,2]) > > d = zeros([4, 1000], dtype='f') > for i in range(4): > for j in range(1000): > d[i, j] = sqrt( sum( (A[i] - B[j])**2 ) ) > return d > > Matlab: > A = rand( [4,2]) > B = rand( [1000,2]) > d = dist(A, B') > > Running both of these 100 times, I've found the python version to run > between 10-20 times slower. My question is if there is a faster way to > do this? Perhaps I'm not using the correct functions/structures? Or > this is as good as it gets? > > Thanks on beforehand, > > Sebastian Beca > Department of Computer Science Engineering > University of Chile > > PD: I'm using NumPy 0.9.8, SciPy 0.4.8. I also understand I have > ATLAS, BLAS and LAPACK all installed, but I havn't confirmed that. > > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Johannes L. <a.u...@gm...> - 2006-06-16 06:28:31
|
Hi, def dtest(): =A0 =A0 A =3D random( [4,2]) =A0 =A0 B =3D random( [1000,2]) # drawback: memory usage temporarily doubled # solution see below d =3D A[:, newaxis, :] - B[newaxis, :, :] # written as 3 expressions for more clarity d =3D sqrt((d**2).sum(axis=3D2)) return d def dtest_lowmem(): A =3D random( [4,2]) B =3D random( [1000,2]) d =3D zeros([4, 1000], dtype=3D'f') # stores result for i in range(len(A)): # the loop should not impose much loss in speed dtemp =3D A[i, newaxis, :] - B[:, :] dtemp =3D sqrt((dtemp**2).sum(axis=3D1)) d[i] =3D dtemp return d (both functions untested....) HTH, Johannes |
From: David D. <dav...@lo...> - 2006-06-16 07:53:51
|
Hi, On Fri, Jun 16, 2006 at 08:28:18AM +0200, Johannes Loehnert wrote: > Hi, >=20 > def dtest(): > =A0 =A0 A =3D random( [4,2]) > =A0 =A0 B =3D random( [1000,2]) >=20 > # drawback: memory usage temporarily doubled > # solution see below > d =3D A[:, newaxis, :] - B[newaxis, :, :] Unless I'm wrong, one can simplify a (very) little bit this line:=20 d =3D A[:, newaxis, :] - B > # written as 3 expressions for more clarity > d =3D sqrt((d**2).sum(axis=3D2)) > return d >=20 --=20 David Douard LOGILAB, Paris (France) Formations Python, Zope, Plone, Debian : http://www.logilab.fr/formations D=E9veloppement logiciel sur mesure : http://www.logilab.fr/services Informatique scientifique : http://www.logilab.fr/science |
From: Tim H. <tim...@co...> - 2006-06-16 13:18:08
|
Sebastian Beca wrote: >Hi, >I'm working with NumPy/SciPy on some algorithms and i've run into some >important speed differences wrt Matlab 7. I've narrowed the main speed >problem down to the operation of finding the euclidean distance >between two matrices that share one dimension rank (dist in Matlab): > >Python: >def dtest(): > A = random( [4,2]) > B = random( [1000,2]) > > d = zeros([4, 1000], dtype='f') > for i in range(4): > for j in range(1000): > d[i, j] = sqrt( sum( (A[i] - B[j])**2 ) ) > return d > >Matlab: > A = rand( [4,2]) > B = rand( [1000,2]) > d = dist(A, B') > >Running both of these 100 times, I've found the python version to run >between 10-20 times slower. My question is if there is a faster way to >do this? Perhaps I'm not using the correct functions/structures? Or >this is as good as it gets? > > Here's one faster way. from numpy import * import timeit A = random.random( [4,2]) B = random.random( [1000,2]) def d1(): d = zeros([4, 1000], dtype=float) for i in range(4): for j in range(1000): d[i, j] = sqrt( sum( (A[i] - B[j])**2 ) ) return d def d2(): d = zeros([4, 1000], dtype=float) for i in range(4): xy = A[i] - B d[i] = hypot(xy[:,0], xy[:,1]) return d if __name__ == "__main__": t1 = timeit.Timer('d1()', 'from scratch import d1').timeit(100) t2 =timeit.Timer('d2()', 'from scratch import d2').timeit(100) print t1, t2, t1 / t2 In this case, d2 is 50x faster than d1 on my box. Making some extremely dubious assumptions about transitivity of measurements, that would implt that d2 is twice as fast as matlab. Oh, and I didn't actually test that the output is correct.... -tim >Thanks on beforehand, > >Sebastian Beca >Department of Computer Science Engineering >University of Chile > >PD: I'm using NumPy 0.9.8, SciPy 0.4.8. I also understand I have >ATLAS, BLAS and LAPACK all installed, but I havn't confirmed that. > > >_______________________________________________ >Numpy-discussion mailing list >Num...@li... >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > |