From: David C. <da...@ar...> - 2006-10-03 10:54:16
Hi,

I was wondering if there was any way to speed up the following code:

    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = gauss_den(data, mu[i, :], va[i, :])

where K is of order 1e1 and n of order 1e5. Normally, gauss_den is quite
an expensive function, but the profiler tells me that the indexing
y[:, i] takes almost as much time as the gauss_den computation (which
computes n exponentials!). To check whether the profiler was right, I
replaced the loop body with the following (non-valid) version:

    y = N.zeros((n, K))
    for i in range(K):
        yt = gauss_den(data, mu[i, :], va[i, :])
    return y

and then more than 99% of the time is spent inside gauss_den.

I guess the problem comes from the fact that, y being C order, y[:, i]
accesses the data in a non-'linear' way. Is there a way to speed this
up? I did something like this:

    y = N.zeros((K, n))
    for i in range(K):
        y[i] = gauss_den(data, mu[i, :], va[i, :])
    return y.T

which works, but I don't like it very much. Isn't there any other way?

David
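A minimal, self-contained sketch of the access-pattern effect David is
describing (the sizes are arbitrary and none of this code is from his
project; it only illustrates strided vs. contiguous stores):

    import time
    import numpy as N

    n, K = 100000, 10
    v = N.random.randn(n)

    yc = N.zeros((n, K))        # C order: rows contiguous, columns strided
    yr = N.zeros((K, n))        # flipped shape: each yr[i] is contiguous

    t0 = time.time()
    for rep in range(100):
        for i in range(K):
            yc[:, i] = v        # strided store: elements K*8 bytes apart
    print "column assignment:", time.time() - t0

    t0 = time.time()
    for rep in range(100):
        for i in range(K):
            yr[i] = v           # contiguous store: one memcpy-like block
    print "row assignment:   ", time.time() - t0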
From: Tim H. <tim...@ie...> - 2006-10-03 16:24:26
David Cournapeau wrote:
> Hi,
>
> I was wondering if there was any way to speed up the following code:
>
>     y = N.zeros((n, K))
>     for i in range(K):
>         y[:, i] = gauss_den(data, mu[i, :], va[i, :])
>
> where K is of order 1e1 and n of order 1e5. Normally, gauss_den is
> quite an expensive function, but the profiler tells me that the
> indexing y[:, i] takes almost as much time as the gauss_den
> computation (which computes n exponentials!). To check whether the
> profiler was right, I replaced the loop body with the following
> (non-valid) version:
>
>     y = N.zeros((n, K))
>     for i in range(K):
>         yt = gauss_den(data, mu[i, :], va[i, :])
>     return y
>
> and then more than 99% of the time is spent inside gauss_den.
>
> I guess the problem comes from the fact that, y being C order,
> y[:, i] accesses the data in a non-'linear' way. Is there a way to
> speed this up? I did something like this:
>
>     y = N.zeros((K, n))
>     for i in range(K):
>         y[i] = gauss_den(data, mu[i, :], va[i, :])
>     return y.T
>
> which works, but I don't like it very much.

Why not?

> Isn't there any other way?

That depends on the details of gauss_den.

A general note: for this kind of microoptimization puzzle, it's much
easier to help if you can post a self-contained example, preferably
something fairly simple that still illustrates the speed issue, that we
can experiment with.

-tim
From: Bruce S. <bso...@gm...> - 2006-10-04 15:36:22
Hi,

I think what you are after is the multivariate normal distribution.
Assuming I have it correctly, it is clearer to see (and probably more
accurate to compute) in the log form:

    -(d/2)*log(2*pi) - (1/2)*log(det(V)) - (1/2)*(x-mu)'*inv(V)*(x-mu)

where d is the dimension of x, V is the known variance-covariance
matrix, x is the vector of values, and mu is the known mean.

If so, then you can vectorize the density calculation. Of course, if V
has a simple structure (it is unclear from your code whether it does)
and can be factored out, then the only 'hard' part to compute is the
product of the vector transpose with the vector. If mu and va are
scalars for each calculation, then you can factor them out so that the
only multidimensional calculations are squares and sums of each frame.

Regards
Bruce

On 10/4/06, David Cournapeau <da...@ar...> wrote:
> Here we are (the difference may not seem that big between the two
> multiple_gauss_den functions, but in the real code _diag_gauss_den is
> an internal function implemented in C, and is much faster... By
> writing this example, I've just realized that _diag_gauss_den itself
> may be slow for exactly the same reason):
> [... full benchmark script trimmed; it appears in David's message
> below ...]
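As a sketch of the log form Bruce describes, here is one way it might be
written for a full covariance matrix, using a Cholesky factor for both
the determinant and the quadratic form. This is illustrative only:
mvn_logpdf is a hypothetical helper, not from either poster's code.

    import numpy as N

    def mvn_logpdf(x, mu, V):
        """Log-density of N(mu, V) at each row of x, shape (n, d).
        Hypothetical helper sketching the log form above."""
        d = mu.size
        L = N.linalg.cholesky(V)                # V = L * L'
        # Solve L z = (x - mu)' so that sum(z**2) per column is the
        # quadratic form (x-mu)' inv(V) (x-mu)
        z = N.linalg.solve(L, (x - mu).T)       # shape (d, n)
        logdet = 2 * N.sum(N.log(N.diag(L)))    # log(det(V))
        return -0.5 * d * N.log(2 * N.pi) - 0.5 * logdet \
               - 0.5 * N.sum(z ** 2, axis=0)

Working with the Cholesky factor avoids forming inv(V) explicitly, which
is both cheaper and numerically safer than the textbook inverse.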
From: David C. <da...@ar...> - 2006-10-05 06:07:30
Bruce Southey wrote:
> Hi,
> I think what you are after is the multivariate normal distribution.

Indeed.

> Assuming I have it correctly, it is clearer to see (and probably more
> accurate to compute) in the log form [...]

Sure, but I need the exponential form at the end (actually, in the real
code, you can choose between the log form and the 'standard' form, but
I removed this to keep the example as simple as possible).

> If so, then you can vectorize the density calculation.

I am not sure I understand what you mean: the computation is already
vectorized in _diag_gauss_den; there is no loop there, and the function
expects x to be of shape (n, d), where d is the dimension and n the
number of samples. The loop in multiple_gauss_den is not over samples
but over densities; that is, I need to compute the normal multivariate
densities on the same data but with different (vector) means and
(diagonal) variances, and I don't see any easy way to vectorize that
without huge memory usage (using rank-3 arrays).

Anyway, the problem I am trying to understand here is not related to
the Gaussian kernel computation, but rather to the cost difference
between accessing rows and columns depending on the underlying storage
(C or Fortran). Don't try to find flaws in _diag_gauss_den, as it is
just a toy example to make my point clearer.

David
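For reference, the fully vectorized rank-3 version David alludes to
would look something like the broadcasting sketch below (illustrative
only, not from the thread); the comment spells out the memory cost that
makes him rule it out:

    import numpy as N

    # data: (n, d) samples, mu: (K, d) means, va: (K, d) diagonal
    # variances. The broadcast temporary diff has shape (n, K, d):
    # with n = 1e5, K = 30, d = 15 that is 4.5e7 doubles, i.e. ~360 MB
    # for a single temporary -- the "huge memory usage" in question.
    def multiple_gauss_den_rank3(data, mu, va):
        d = data.shape[1]
        diff = data[:, N.newaxis, :] - mu[N.newaxis, :, :]   # (n, K, d)
        logy = -0.5 * N.sum(diff ** 2 / va, axis=-1) \
               - 0.5 * N.sum(N.log(va), axis=-1) \
               - 0.5 * d * N.log(2 * N.pi)
        return N.exp(logy)                                   # (n, K)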
From: David C. <da...@ar...> - 2006-10-04 10:57:21
Tim Hochberg wrote:
>> I guess the problem comes from the fact that, y being C order,
>> y[:, i] accesses the data in a non-'linear' way. Is there a way to
>> speed this up? I did something like this:
>>
>>     y = N.zeros((K, n))
>>     for i in range(K):
>>         y[i] = gauss_den(data, mu[i, :], va[i, :])
>>     return y.T
>>
>> which works, but I don't like it very much.
> Why not?

Mainly because those transposes do not really reflect the intention,
and it does not feel natural.

>> Isn't there any other way?
> That depends on the details of gauss_den.
>
> A general note: for this kind of microoptimization puzzle, it's much
> easier to help if you can post a self-contained example, preferably
> something fairly simple that still illustrates the speed issue, that
> we can experiment with.

Here we are (the difference may not seem that big between the two
multiple_gauss_den functions, but in the real code _diag_gauss_den is
an internal function implemented in C, and is much faster... By writing
this example, I've just realized that _diag_gauss_den itself may be
slow for exactly the same reason):

#! /usr/bin/env python
# Last Change: Wed Oct 04 07:00 PM 2006 J

import numpy as N
from numpy.random import randn

def _diag_gauss_den(x, mu, va):
    """ This function is the actual implementation
    of the gaussian pdf in the diagonal case.

    It assumes all args are conformant, so it should
    not be used directly. Call gauss_den instead."""
    # Diagonal covariance matrix case
    d = mu.size
    inva = 1/va[0]
    fac = (2*N.pi) ** (-d/2.0) * N.sqrt(inva)
    y = (x[:, 0] - mu[0]) ** 2 * inva * -0.5
    for i in range(1, d):
        inva = 1/va[i]
        fac *= N.sqrt(inva)
        y += (x[:, i] - mu[i]) ** 2 * inva * -0.5
    return fac * N.exp(y)

def multiple_gauss_den1(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data:
    unoptimized version"""
    K = mu.shape[0]
    n = data.shape[0]

    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y

def multiple_gauss_den2(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data:
    optimized version"""
    K = mu.shape[0]
    n = data.shape[0]

    y = N.zeros((K, n))
    for i in range(K):
        y[i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y.T

def bench():
    #===========================================
    # GMM of 30 comp, 15 dimensions, 1e4 frames
    #===========================================
    d = 15
    k = 30
    nframes = int(1e4)
    niter = 10

    mu = randn(k, d)
    va = randn(k, d) ** 2
    X = randn(nframes, d)

    print "============================================================="
    print "(%d dim, %d components) GMM with %d iterations, for %d frames" \
            % (d, k, niter, nframes)

    for i in range(niter):
        y1 = multiple_gauss_den1(X, mu, va)

    for i in range(niter):
        y2 = multiple_gauss_den2(X, mu, va)

    print N.sum(y1 - y2)

if __name__ == '__main__':
    import hotshot, hotshot.stats
    profile_file = 'foo.prof'
    prof = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()

I am a bit puzzled by all this C vs. F storage business, though. In
Matlab, where the storage is always Fortran order as far as I know, I
never encountered such differences (e.g. between y(:, i) and y(i, :));
I don't know if this is because I was doing it badly, because Matlab is
much more clever than numpy at handling those cases, or if this is the
price to pay for the added flexibility of numpy...

David
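One other possibility, not raised in the thread: numpy's zeros accepts
an order argument, so the output can be allocated in Fortran order
directly, making y[:, i] the contiguous axis with no transpose trick.
A sketch under that assumption, with made-up sizes:

    import numpy as N

    n, K = 100000, 10
    v = N.random.randn(n)

    # Fortran order: columns, not rows, are contiguous in memory,
    # so the natural y[:, i] assignment is a contiguous store.
    y = N.zeros((n, K), order='F')
    for i in range(K):
        y[:, i] = v

    # y still indexes as y[sample, component]; no .T needed.

This keeps the code reading the way the math reads, at the cost of
handing Fortran-ordered data to any downstream code that expects
C-ordered arrays.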
From: David C. <da...@ar...> - 2006-10-04 11:12:21
David Cournapeau wrote:
> Here we are (the difference may not seem that big between the two
> multiple_gauss_den functions, but in the real code _diag_gauss_den is
> an internal function implemented in C, and is much faster... By
> writing this example, I've just realized that _diag_gauss_den itself
> may be slow for exactly the same reason):

I checked my assumption that the _diag_gauss_den function is slow
because of the same problem, and this is indeed the case: if I replace

    X = randn(nframes, d)

by

    X = randn(d, nframes); X = X.T

the function is more than twice as fast! That seems way too much of a
difference to me....

David
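A sketch of why the transposed allocation helps (illustrative only,
not from the post): randn(d, nframes).T is a view whose columns are
contiguous in memory, so the per-dimension reads x[:, i] inside
_diag_gauss_den become contiguous instead of strided. The layout can
be inspected directly:

    import numpy as N
    from numpy.random import randn

    d, nframes = 15, 10000

    X1 = randn(nframes, d)       # C order: X1[:, i] strides d*8 bytes
    X2 = randn(d, nframes).T     # F-ordered view: X2[:, i] contiguous

    print X1.flags['C_CONTIGUOUS'], X1.flags['F_CONTIGUOUS']   # True False
    print X2.flags['C_CONTIGUOUS'], X2.flags['F_CONTIGUOUS']   # False True

    # X2[:, i] is one contiguous block of nframes doubles, while
    # X1[:, i] touches every d-th element -- the same row/column
    # asymmetry as with the output array y.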