From: David C. <da...@ar...> - 2006-10-03 10:54:16
Hi,

I was wondering if there was any way to speed up the following code:

    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = gauss_den(data, mu[i, :], va[i, :])

where K is of order 1e1 and n of order 1e5. Normally, gauss_den is quite
an expensive function, but the profiler tells me that the indexing
y[:, i] takes almost as much time as the gauss_den computation (which
computes n exponentials!). To check whether the profiler was right, I
replaced the loop body with the following (non-valid) version:

    y = N.zeros((n, K))
    for i in range(K):
        yt = gauss_den(data, mu[i, :], va[i, :])
    return y

and then more than 99% of the time is spent inside gauss_den.

I guess the problem comes from the fact that, y being C order, y[:, i]
accesses the data in a non-'linear' way. Is there a way to speed this
up? I did something like this:

    y = N.zeros((K, n))
    for i in range(K):
        y[i] = gauss_den(data, mu[i, :], va[i, :])
    return y.T

which works, but I don't like it very much. Isn't there any other way?

David
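A minimal, self-contained sketch of the access-pattern effect David is
describing (the sizes are arbitrary and none of this code is from his
project; it only illustrates strided vs. contiguous stores):

    import time
    import numpy as N

    n, K = 100000, 10
    v = N.random.randn(n)

    yc = N.zeros((n, K))        # C order: rows contiguous, columns strided
    yr = N.zeros((K, n))        # flipped shape: each yr[i] is contiguous

    t0 = time.time()
    for rep in range(100):
        for i in range(K):
            yc[:, i] = v        # strided store: elements K*8 bytes apart
    print "column assignment:", time.time() - t0

    t0 = time.time()
    for rep in range(100):
        for i in range(K):
            yr[i] = v           # contiguous store: one memcpy-like block
    print "row assignment:   ", time.time() - t0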
From: Tim H. <tim...@ie...> - 2006-10-03 16:24:26
David Cournapeau wrote:
> Hi,
>
> I was wondering if there was any way to speed up the following code:
>
>     y = N.zeros((n, K))
>     for i in range(K):
>         y[:, i] = gauss_den(data, mu[i, :], va[i, :])
>
> where K is of order 1e1 and n of order 1e5. Normally, gauss_den is
> quite an expensive function, but the profiler tells me that the
> indexing y[:, i] takes almost as much time as the gauss_den
> computation (which computes n exponentials!). To check whether the
> profiler was right, I replaced the loop body with the following
> (non-valid) version:
>
>     y = N.zeros((n, K))
>     for i in range(K):
>         yt = gauss_den(data, mu[i, :], va[i, :])
>     return y
>
> and then more than 99% of the time is spent inside gauss_den.
>
> I guess the problem comes from the fact that, y being C order,
> y[:, i] accesses the data in a non-'linear' way. Is there a way to
> speed this up? I did something like this:
>
>     y = N.zeros((K, n))
>     for i in range(K):
>         y[i] = gauss_den(data, mu[i, :], va[i, :])
>     return y.T
>
> which works, but I don't like it very much.

Why not?

> Isn't there any other way?

That depends on the details of gauss_den.

A general note: for this kind of microoptimization puzzle, it's much
easier to help if you can post a self-contained example, preferably
something fairly simple that still illustrates the speed issue, that we
can experiment with.

-tim
From: Bruce S. <bso...@gm...> - 2006-10-04 15:36:22
Hi,

I think what you are after is the multivariate normal distribution.
Assuming I have it correctly, it is clearer to see (and probably more
accurate to compute) in the log form:

    -(d/2)*log(2*pi) - (1/2)*log(det(V)) - (1/2)*(x-mu)'*inv(V)*(x-mu)

where d is the dimension of x, V is the known variance-covariance
matrix, x is the vector of values, and mu is the known mean.

If so, then you can vectorize the density calculation. Of course, if V
has a simple structure (it is unclear from your code whether it does)
and can be factored out, then the only 'hard' part to compute is the
product of the vector transpose with the vector. If mu and va are
scalars for each calculation, then you can factor them out so that the
only multidimensional calculations are squares and sums of each frame.

Regards
Bruce

On 10/4/06, David Cournapeau <da...@ar...> wrote:
> Here we are (the difference may not seem that big between the two
> multiple_gauss_den functions, but in the real code _diag_gauss_den is
> an internal function implemented in C, and is much faster... By
> writing this example, I've just realized that _diag_gauss_den itself
> may be slow for exactly the same reason):
> [... full benchmark script trimmed; it appears in David's message
> below ...]
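As a sketch of the log form Bruce describes, here is one way it might be
written for a full covariance matrix, using a Cholesky factor for both
the determinant and the quadratic form. This is illustrative only:
mvn_logpdf is a hypothetical helper, not from either poster's code.

    import numpy as N

    def mvn_logpdf(x, mu, V):
        """Log-density of N(mu, V) at each row of x, shape (n, d).
        Hypothetical helper sketching the log form above."""
        d = mu.size
        L = N.linalg.cholesky(V)                # V = L * L'
        # Solve L z = (x - mu)' so that sum(z**2) per column is the
        # quadratic form (x-mu)' inv(V) (x-mu)
        z = N.linalg.solve(L, (x - mu).T)       # shape (d, n)
        logdet = 2 * N.sum(N.log(N.diag(L)))    # log(det(V))
        return -0.5 * d * N.log(2 * N.pi) - 0.5 * logdet \
               - 0.5 * N.sum(z ** 2, axis=0)

Working with the Cholesky factor avoids forming inv(V) explicitly, which
is both cheaper and numerically safer than the textbook inverse.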
From: David C. <da...@ar...> - 2006-10-05 06:07:30
Bruce Southey wrote:
> Hi,
> I think what you are after is the multivariate normal distribution.

Indeed.

> Assuming I have it correctly, it is clearer to see (and probably more
> accurate to compute) in the log form [...]

Sure, but I need the exponential form at the end (actually, in the real
code, you can choose between the log form and the 'standard' form, but
I removed this to keep the example as simple as possible).

> If so, then you can vectorize the density calculation.

I am not sure I understand what you mean: the computation is already
vectorized in _diag_gauss_den; there is no loop there, and the function
expects x to be of shape (n, d), where d is the dimension and n the
number of samples. The loop in multiple_gauss_den is not over samples
but over densities; that is, I need to compute the normal multivariate
densities on the same data but with different (vector) means and
(diagonal) variances, and I don't see any easy way to vectorize that
without huge memory usage (using rank-3 arrays).

Anyway, the problem I am trying to understand here is not related to
the Gaussian kernel computation, but rather to the cost difference
between accessing rows and columns depending on the underlying storage
(C or Fortran). Don't try to find flaws in _diag_gauss_den, as it is
just a toy example to make my point clearer.

David
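For reference, the fully vectorized rank-3 version David alludes to
would look something like the broadcasting sketch below (illustrative
only, not from the thread); the comment spells out the memory cost that
makes him rule it out:

    import numpy as N

    # data: (n, d) samples, mu: (K, d) means, va: (K, d) diagonal
    # variances. The broadcast temporary diff has shape (n, K, d):
    # with n = 1e5, K = 30, d = 15 that is 4.5e7 doubles, i.e. ~360 MB
    # for a single temporary -- the "huge memory usage" in question.
    def multiple_gauss_den_rank3(data, mu, va):
        d = data.shape[1]
        diff = data[:, N.newaxis, :] - mu[N.newaxis, :, :]   # (n, K, d)
        logy = -0.5 * N.sum(diff ** 2 / va, axis=-1) \
               - 0.5 * N.sum(N.log(va), axis=-1) \
               - 0.5 * d * N.log(2 * N.pi)
        return N.exp(logy)                                   # (n, K)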
From: David C. <da...@ar...> - 2006-10-04 10:57:21
Tim Hochberg wrote:
>> I guess the problem comes from the fact that, y being C order,
>> y[:, i] accesses the data in a non-'linear' way. Is there a way to
>> speed this up? I did something like this:
>>
>>     y = N.zeros((K, n))
>>     for i in range(K):
>>         y[i] = gauss_den(data, mu[i, :], va[i, :])
>>     return y.T
>>
>> which works, but I don't like it very much.
> Why not?

Mainly because those transposes do not really reflect the intention,
and it does not feel natural.

>> Isn't there any other way?
> That depends on the details of gauss_den.
>
> A general note: for this kind of microoptimization puzzle, it's much
> easier to help if you can post a self-contained example, preferably
> something fairly simple that still illustrates the speed issue, that
> we can experiment with.

Here we are (the difference may not seem that big between the two
multiple_gauss_den functions, but in the real code _diag_gauss_den is
an internal function implemented in C, and is much faster... By writing
this example, I've just realized that _diag_gauss_den itself may be
slow for exactly the same reason):

#! /usr/bin/env python
# Last Change: Wed Oct 04 07:00 PM 2006 J

import numpy as N
from numpy.random import randn

def _diag_gauss_den(x, mu, va):
    """ This function is the actual implementation
    of the gaussian pdf in the diagonal case.

    It assumes all args are conformant, so it should
    not be used directly. Call gauss_den instead."""
    # Diagonal covariance matrix case
    d = mu.size
    inva = 1/va[0]
    fac = (2*N.pi) ** (-d/2.0) * N.sqrt(inva)
    y = (x[:, 0] - mu[0]) ** 2 * inva * -0.5
    for i in range(1, d):
        inva = 1/va[i]
        fac *= N.sqrt(inva)
        y += (x[:, i] - mu[i]) ** 2 * inva * -0.5
    return fac * N.exp(y)

def multiple_gauss_den1(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data:
    unoptimized version"""
    K = mu.shape[0]
    n = data.shape[0]

    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y

def multiple_gauss_den2(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data:
    optimized version"""
    K = mu.shape[0]
    n = data.shape[0]

    y = N.zeros((K, n))
    for i in range(K):
        y[i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y.T

def bench():
    #===========================================
    # GMM of 30 comp, 15 dimensions, 1e4 frames
    #===========================================
    d = 15
    k = 30
    nframes = int(1e4)
    niter = 10

    mu = randn(k, d)
    va = randn(k, d) ** 2
    X = randn(nframes, d)

    print "============================================================="
    print "(%d dim, %d components) GMM with %d iterations, for %d frames" \
            % (d, k, niter, nframes)

    for i in range(niter):
        y1 = multiple_gauss_den1(X, mu, va)

    for i in range(niter):
        y2 = multiple_gauss_den2(X, mu, va)

    print N.sum(y1 - y2)

if __name__ == '__main__':
    import hotshot, hotshot.stats
    profile_file = 'foo.prof'
    prof = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()

I am a bit puzzled by all this C vs. F storage business, though. In
Matlab, where the storage is always Fortran order as far as I know, I
never encountered such differences (e.g. between y(:, i) and y(i, :));
I don't know if this is because I was doing it badly, because Matlab is
much more clever than numpy at handling those cases, or if this is the
price to pay for the added flexibility of numpy...

David
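One other possibility, not raised in the thread: numpy's zeros accepts
an order argument, so the output can be allocated in Fortran order
directly, making y[:, i] the contiguous axis with no transpose trick.
A sketch under that assumption, with made-up sizes:

    import numpy as N

    n, K = 100000, 10
    v = N.random.randn(n)

    # Fortran order: columns, not rows, are contiguous in memory,
    # so the natural y[:, i] assignment is a contiguous store.
    y = N.zeros((n, K), order='F')
    for i in range(K):
        y[:, i] = v

    # y still indexes as y[sample, component]; no .T needed.

This keeps the code reading the way the math reads, at the cost of
handing Fortran-ordered data to any downstream code that expects
C-ordered arrays.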
From: David C. <da...@ar...> - 2006-10-04 11:12:21
David Cournapeau wrote:
> Here we are (the difference may not seem that big between the two
> multiple_gauss_den functions, but in the real code _diag_gauss_den is
> an internal function implemented in C, and is much faster... By
> writing this example, I've just realized that _diag_gauss_den itself
> may be slow for exactly the same reason):

I checked my assumption that the _diag_gauss_den function is slow
because of the same problem, and this is indeed the case: if I replace

    X = randn(nframes, d)

by

    X = randn(d, nframes); X = X.T

the function is more than twice as fast! That seems way too much of a
difference to me....

David
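A sketch of why the transposed allocation helps (illustrative only,
not from the post): randn(d, nframes).T is a view whose columns are
contiguous in memory, so the per-dimension reads x[:, i] inside
_diag_gauss_den become contiguous instead of strided. The layout can
be inspected directly:

    import numpy as N
    from numpy.random import randn

    d, nframes = 15, 10000

    X1 = randn(nframes, d)       # C order: X1[:, i] strides d*8 bytes
    X2 = randn(d, nframes).T     # F-ordered view: X2[:, i] contiguous

    print X1.flags['C_CONTIGUOUS'], X1.flags['F_CONTIGUOUS']   # True False
    print X2.flags['C_CONTIGUOUS'], X2.flags['F_CONTIGUOUS']   # False True

    # X2[:, i] is one contiguous block of nframes doubles, while
    # X1[:, i] touches every d-th element -- the same row/column
    # asymmetry as with the output array y.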