From: Tim H. <tim...@ie...> - 2006-10-03 00:17:58
Tim Hochberg wrote:
> Travis Oliphant wrote:
>
>> Albert Strasheim wrote:
>>
>>> In [571]: x1 = N.random.randn(2000,39)
>>>
>>> In [572]: y1 = N.random.randn(64,39)
>>>
>>> In [574]: %timeit z1=x1[...,N.newaxis,...]-y1
>>> 10 loops, best of 3: 703 ms per loop
>>>
>>> In [575]: z1.shape
>>> Out[575]: (2000, 64, 39)
>>>
>>> As far as I can figure, this operation is doing 2000*64*39 subtractions.
>>> Doing this straight up yields the following:
>>>
>>> In [576]: x2 = N.random.randn(2000,64,39)
>>>
>>> In [577]: y2 = N.random.randn(2000,64,39)
>>>
>>> In [578]: %timeit z2 = x2-y2
>>> 10 loops, best of 3: 108 ms per loop
>>>
>>> Does anybody have any ideas on why this is so much faster? Hopefully I
>>> didn't mess up somewhere...
>>>
[SNIP]
>
> I just spent a while playing with this, and assuming I've correctly
> translated your original intent, I've come up with two alternative,
> looping versions that run, respectively, 2 and 3 times faster. I've a
> feeling that kmean3, the fastest one, still has a little more room to be
> sped up, but I'm out of time now. Code is below.
>
> -tim
>

One more iteration, this time with a small algorithmic improvement, and we're
up to 4x as fast as the original code. Here we take advantage of the fact that
the di**2 term is constant across the axis we are minimizing over, so it can
be dropped entirely; the computation of ci**2 can be hoisted out of the loop;
and the -2*di*ci term can be rejiggered to di*ci by appropriate rescaling.
This reduces the computation in the inner loop from a subtract and a multiply
to just a multiply.

def kmean4(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[:nclusters]
    transdata = data.transpose().copy()
    # One row of (rescaled) scores per cluster, one column per data point.
    totals = N.empty([nclusters, len(data)], float)
    code2 = (code**2).sum(-1)
    code2 *= -0.5
    totals[:] = code2[:, N.newaxis]
    for cluster, tot in zip(code, totals):
        for di, ci in zip(transdata, cluster):
            tot += di*ci
    # di**2 was dropped and -2*di*ci rescaled to di*ci, so the nearest
    # cluster is the one with the largest score.
    return totals.argmax(axis=0)

[SNIP CODE]
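
The rewrite above rests on the identity argmin_c ||d - c||**2 =
argmax_c (d.c - 0.5*||c||**2): the d**2 term is the same for every cluster, so
it drops out of the comparison. A minimal sketch checking that both criteria
pick the same clusters, assuming numpy is imported as N as in the post (the
array sizes here are made up purely for the check):

import numpy as N

data = N.random.randn(200, 39)          # made-up sizes, just for the check
code = data[:8]                          # 8 stand-in "cluster centers"

# Direct criterion: argmin over squared distances, via broadcasting.
dist2 = ((data[:, N.newaxis, :] - code)**2).sum(-1)
direct = dist2.argmin(axis=1)

# Rescaled criterion: d**2 is constant across the cluster axis, so
# argmin(d**2 - 2*d.c + c**2) reduces to argmax(d.c - 0.5*c**2).
scores = N.dot(data, code.transpose()) - 0.5*(code**2).sum(-1)
rescaled = scores.argmax(axis=1)

assert (direct == rescaled).all()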
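
On the same assumption (numpy imported as N), the double loop in kmean4 is
just a matrix product of the cluster centers with the transposed data, so a
vectorized variant might look like the sketch below. kmean_dot is a
hypothetical name, not part of the original code, and how it times against
the looping versions isn't shown here.

def kmean_dot(data, nclusters=64):
    # Same scoring as kmean4, with the two loops folded into one N.dot call.
    code = data[:nclusters]
    code2 = -0.5*(code**2).sum(-1)
    totals = N.dot(code, data.transpose()) + code2[:, N.newaxis]
    return totals.argmax(axis=0)

The code2 rescaling plays exactly the same role as in kmean4: it turns the
nearest-center search into an argmax over dot products.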