From: Torgil S. <tor...@gm...> - 2006-08-26 17:02:56
Hi,

ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I first thought I had a performance issue, but discovered that std() used lots of memory and therefore caused lots of swapping.

I want to get an array where element i is the standard deviation of row i in the 2D array. Using valgrind on the std() function ...

$ valgrind --tool=massif python -c "from numpy import *;
a=reshape(arange(100000*100),(100000,100)).std(axis=1)"

... showed me a peak of 200 MB of memory, while iterating line by line ...

$ valgrind --tool=massif python -c "from numpy import *;
a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"

... got a peak of 40 MB of memory.

This seems unnecessary, since we know the output shape before the calculation starts and should therefore be able to preallocate memory.

My original problem was to get a moving average and a moving standard deviation (120k rows and N=1000). For the average I guess convolve should perform well, but is there anything smart for std()? For now I use

>>> moving_std = array([a[i:i+n].std() for i in range(len(a)-n)])

which seems to perform quite well.

BR,
//Torgil
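A minimal sketch of the preallocation idea described above; the variable names and the float dtype are chosen here for illustration and are not taken from the thread:

    import numpy as np

    a = np.arange(100000 * 100, dtype=float).reshape(100000, 100)

    # Preallocate the 1-D result, then fill it one row at a time so that
    # no temporary larger than a single row is ever needed.
    row_std = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        row_std[i] = a[i].std()

This gives the same values as a.std(axis=1) while keeping the peak memory close to the size of the input plus a single row of temporaries.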
From: Charles R H. <cha...@gm...> - 2006-08-26 17:49:35
On 8/26/06, Torgil Svensson <tor...@gm...> wrote:
>
> Hi
>
> ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I
> first thought I had a performance issue but discovered that std() used
> lots of memory and therefore caused lots of swapping.
>
> I want to get an array where element i is the standard deviation of row
> i in the 2D array. Using valgrind on the std() function...
>
> $ valgrind --tool=massif python -c "from numpy import *;
> a=reshape(arange(100000*100),(100000,100)).std(axis=1)"
>
> ... showed me a peak of 200 MB of memory, while iterating line by line...
>
> $ valgrind --tool=massif python -c "from numpy import *;
> a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"
>
> ... got a peak of 40 MB of memory.
>
> This seems unnecessary since we know before the calculation what the
> output shape will be and should therefore be able to preallocate
> memory.
>
> My original problem was to get a moving average and a moving standard
> deviation (120k rows and N=1000). For the average I guess convolve
> should perform well, but is there anything smart for std()? For now I
> use ...

Why not use convolve for the std also? You can't subtract the average first, but you could convolve the square of the vector and then use some variant of std = sqrt((convsqrs - n*avg**2)/(n-1)). There are possible precision problems, but they may not matter for your application, especially if the moving window isn't really big.

Chuck
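A sketch of the convolve-based moving standard deviation Chuck outlines; the sample data and variable names are placeholders, and note that ndarray.std() divides by n whereas this formula divides by n-1:

    import numpy as np

    a = np.random.rand(120000)   # stand-in for the real 120k-row data
    n = 1000

    window = np.ones(n)
    moving_sum = np.convolve(a, window, mode='valid')        # sum over each window
    moving_sumsq = np.convolve(a * a, window, mode='valid')  # sum of squares over each window

    avg = moving_sum / n
    # Chuck's variant: sqrt((sum of squares - n*mean**2)/(n-1)).  Cancellation
    # can cost precision when the values are much larger than their spread.
    moving_std = np.sqrt((moving_sumsq - n * avg**2) / (n - 1))

With mode='valid' this produces len(a) - n + 1 values, one per fully overlapping window, which is one more than the list comprehension earlier in the thread.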
From: Rob H. <ro...@ho...> - 2006-08-27 06:46:49
Torgil Svensson wrote:
> My original problem was to get a moving average and a moving standard
> deviation (120k rows and N=1000). For the average I guess convolve
> should perform well, but is there anything smart for std()? For now I
> use ...
>
> >>> moving_std = array([a[i:i+n].std() for i in range(len(a)-n)])
>
> which seems to perform quite well.

You can always look for fancier and less readable solutions, but since this one has the inner loop, over a reasonable vector length (1000), coded in C, one can guess that the performance will be reasonable. I would start looking for alternatives only if N drops significantly, say to <50.

Rob

--
Rob W.W. Hooft || ro...@ho... || http://www.hooft.net/people/rob/
From: Travis O. <oli...@ie...> - 2006-08-27 06:49:52
Torgil Svensson wrote:
> Hi
>
> ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I
> first thought I had a performance issue but discovered that std() used
> lots of memory and therefore caused lots of swapping.

There are certainly lots of intermediate arrays created as the calculation proceeds. The calculation is not particularly "smart." It just does the basic averaging and multiplication needed.

> I want to get an array where element i is the standard deviation of row
> i in the 2D array. Using valgrind on the std() function...
>
> $ valgrind --tool=massif python -c "from numpy import *;
> a=reshape(arange(100000*100),(100000,100)).std(axis=1)"
>
> ... showed me a peak of 200 MB of memory, while iterating line by line...

The C code is basically a direct "translation" of the original Python code. There are lots of temporaries created (apparently 5 at one point :-). I did this before I had the _internal.py code in place, where I put Python functions that need to be accessed from C. If I had to do it over again, I would place the std implementation there, where it could be appropriately optimized.

-Travis
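For illustration, the sequence of operations Travis describes corresponds roughly to the following NumPy code (a sketch, not the actual C source); each full-size intermediate is as large as the input itself, which is where the extra memory goes:

    import numpy as np

    a = np.arange(100000 * 100, dtype=float).reshape(100000, 100)

    mean = a.sum(axis=1) / a.shape[1]       # one value per row
    deviation = a - mean[:, np.newaxis]     # full-size temporary
    squared = deviation * deviation         # another full-size temporary
    variance = squared.sum(axis=1) / a.shape[1]
    std = np.sqrt(variance)                 # matches a.std(axis=1)

With a couple of full-size intermediates alive at once on top of the input, a peak of a few times the input size, as in the 200 MB measurement above, is about what one would expect.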
From: Torgil S. <tor...@gm...> - 2006-08-29 00:31:07
> The C code is basically a direct "translation" of the original Python
> code. ...
> If I had to do it over again, I would place the std implementation
> there, where it could be appropriately optimized.

Isn't C code a good place for optimizations?

//Torgil

On 8/27/06, Travis Oliphant <oli...@ie...> wrote:
> Torgil Svensson wrote:
> > Hi
> >
> > ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I
> > first thought I had a performance issue but discovered that std() used
> > lots of memory and therefore caused lots of swapping.
> >
> There are certainly lots of intermediate arrays created as the
> calculation proceeds. The calculation is not particularly "smart." It
> just does the basic averaging and multiplication needed.
>
> > I want to get an array where element i is the standard deviation of row
> > i in the 2D array. Using valgrind on the std() function...
> >
> > $ valgrind --tool=massif python -c "from numpy import *;
> > a=reshape(arange(100000*100),(100000,100)).std(axis=1)"
> >
> > ... showed me a peak of 200 MB of memory, while iterating line by line...
> >
> The C code is basically a direct "translation" of the original Python
> code. There are lots of temporaries created (apparently 5 at one point
> :-). I did this before I had the _internal.py code in place, where I put
> Python functions that need to be accessed from C. If I had to do it over
> again, I would place the std implementation there, where it could be
> appropriately optimized.
>
> -Travis