From: Martin S. <nu...@ms...> - 2006-08-26 10:06:57
Hello,

I'm a bit ignorant of optimization in numpy. I have a movie with 65535
32x32 frames stored in a 3D array of uint8 with shape (65535, 32, 32).
I load it from an open file f like this:

>>> import numpy as np
>>> data = np.fromfile(f, np.uint8, count=65535*32*32)
>>> data = data.reshape(65535, 32, 32)

I'm picking several thousand frames more or less randomly from
throughout the movie and finding the mean frame over those frames:

>>> meanframe = data[frameis].mean(axis=0)

frameis is a 1D array of frame indices with no repeated values in it.
If it has, say, 4000 indices in it, then the above line takes about
0.5 sec to complete on my system. I'm doing this for a large number of
different frameis, some of which can have many more indices in them.
All this takes many minutes to complete, so I'm looking for ways to
speed it up.

If I divide it into 2 steps:

>>> temp = data[frameis]
>>> meanframe = temp.mean(axis=0)

and time it, I find the first step takes about 0.2 sec and the second
takes about 0.3 sec. So it's not just the mean() step but also the
indexing step that's taking some time.

If I flatten with ravel:

>>> temp = data[frameis].ravel()
>>> meanframe = temp.mean(axis=0)

then the first step still takes about 0.2 sec, but the mean() step
drops to about 0.1 sec. But of course this is taking a flat average
across all pixels in the movie, which isn't what I want to do.

I have a feeling that the culprit is the non-contiguity of the movie
frames being averaged, but I don't know how to proceed.

Any ideas? Could reshaping the data somehow speed things up? Would
weave.blitz or weave.inline or Pyrex help?

I'm running numpy 0.9.8.

Thanks,

Martin
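[Editor's note: a minimal standalone sketch of the timing experiment
described above, for readers who want to reproduce it. The synthetic
data and the 4000-element frameis are stand-ins for Martin's movie
file; absolute times will differ by machine and numpy version.]

# Time the indexing step and the full indexing + mean pipeline.
import timeit
import numpy as np

data = np.random.randint(0, 256, size=(65535, 32, 32)).astype(np.uint8)
frameis = np.random.permutation(65535)[:4000]  # unique random indices

t_index = timeit.timeit(lambda: data[frameis], number=10) / 10
t_full = timeit.timeit(lambda: data[frameis].mean(axis=0), number=10) / 10
print("indexing: %.3f s   indexing + mean: %.3f s" % (t_index, t_full))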
From: Travis O. <oli...@ie...> - 2006-08-26 10:26:35
Martin Spacek wrote:
> I'm picking several thousand frames more or less randomly from
> throughout the movie and finding the mean frame over those frames:
>
> >>> meanframe = data[frameis].mean(axis=0)
> --snip--
> and time it, I find the first step takes about 0.2 sec and the second
> takes about 0.3 sec. So it's not just the mean() step but also the
> indexing step that's taking some time.

If frameis is 1-D, then you should be able to use

temp = data.take(frameis, axis=0)

for the first step. This can be quite a bit faster (and is a big reason
why take is still around). There are several reasons for this (one of
which is that index checking is done over the entire list when using
fancy indexing).

-Travis
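[Editor's note: a quick way to check Travis's suggestion on your own
machine is to time the two gathering forms side by side. This sketch
reuses the synthetic data shape from the first message.]

# Fancy indexing vs. take() for the row-gathering step only.
import timeit
import numpy as np

data = np.random.randint(0, 256, size=(65535, 32, 32)).astype(np.uint8)
frameis = np.random.permutation(65535)[:4000]

t_fancy = timeit.timeit(lambda: data[frameis], number=10) / 10
t_take = timeit.timeit(lambda: data.take(frameis, axis=0), number=10) / 10
print("fancy indexing: %.3f s   take: %.3f s" % (t_fancy, t_take))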
From: Francesc A. <fa...@ca...> - 2006-08-26 21:22:42
On Saturday 26 August 2006 12:26, Travis Oliphant wrote:
> If frameis is 1-D, then you should be able to use
>
> temp = data.take(frameis, axis=0)
>
> for the first step. This can be quite a bit faster (and is a big
> reason why take is still around). There are several reasons for this
> (one of which is that index checking is done over the entire list when
> using indexing).

Well, I stumbled on this some days ago as well. The NumPy manual says
that .take() is usually faster than fancy indexing, but my timings show
that this is no longer true in recent versions of NumPy:

In [56]: Timer("b.take(a)", "import numpy; a=numpy.arange(999,-1,-1, dtype='l'); b=a[:]").repeat(3, 1000)
Out[56]: [0.28740906715393066, 0.20345211029052734, 0.20371079444885254]

In [57]: Timer("b[a]", "import numpy; a=numpy.arange(999,-1,-1, dtype='l'); b=a[:]").repeat(3, 1000)
Out[57]: [0.20807695388793945, 0.11684703826904297, 0.11686491966247559]

I've done some profiling on this, and it seems that take uses a C
memmove call to copy the data, which is *very* slow, at least on my
platform (Linux on Intel). Fancy indexing, for its part, seems to use
an iterator, and copying the elements one by one seems faster. I'd say
that replacing memmove with memcpy would make .take() much faster.

Regards,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
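[Editor's note: for anyone without IPython handy, the same comparison
can be run as a plain script; this is just Francesc's two Timer calls
transcribed, nothing more.]

# Standalone version of the In[56]/In[57] timings above.
from timeit import Timer

setup = "import numpy; a=numpy.arange(999,-1,-1, dtype='l'); b=a[:]"
print(min(Timer("b.take(a)", setup).repeat(3, 1000)))  # .take()
print(min(Timer("b[a]", setup).repeat(3, 1000)))       # fancy indexing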
From: Martin S. <nu...@ms...> - 2006-08-27 12:05:33
Travis Oliphant wrote:
> If frameis is 1-D, then you should be able to use
>
> temp = data.take(frameis, axis=0)
>
> for the first step. This can be quite a bit faster (and is a big
> reason why take is still around). There are several reasons for this
> (one of which is that index checking is done over the entire list when
> using indexing).

Yup, that dropped the indexing step down to essentially 0 seconds.
Thanks Travis!

Martin
From: Tim H. <tim...@ie...> - 2006-08-26 18:00:11
Martin Spacek wrote:
> I'm picking several thousand frames more or less randomly from
> throughout the movie and finding the mean frame over those frames:
>
> >>> meanframe = data[frameis].mean(axis=0)
> --snip--
> I have a feeling that the culprit is the non-contiguity of the movie
> frames being averaged, but I don't know how to proceed.

Martin,

Here's an approach (mean_accumulate) that avoids making any copies of
the data. It runs almost 4x as fast as your approach (called baseline
here) on my box. Perhaps this will be useful:

import random
import numpy as np

# Save this module as scratch.py so the timeit setup below can import it.
frames = 65535
samples = 4000
data = (256 * np.random.random((frames, 32, 32))).astype(np.uint8)
indices = np.arange(frames)
random.shuffle(indices)
indices = indices[:samples]

def mean_baseline(data, indices):
    return data[indices].mean(axis=0)

def mean_accumulate(data, indices):
    # Accumulate frame by frame into one float buffer; no big temporary.
    result = np.zeros([32, 32], float)
    for i in indices:
        result += data[i]
    result /= len(indices)
    return result

if __name__ == "__main__":
    import timeit
    print mean_baseline(data, indices)[0, :8]
    print timeit.Timer("s.mean_baseline(s.data, s.indices)",
                       "import scratch as s").timeit(10)
    print mean_accumulate(data, indices)[0, :8]
    print timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                       "import scratch as s").timeit(10)

This gives:

[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
  128.5352   127.6205 ]
3.95907664242
[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
  128.53525  127.6205 ]
1.06913644053

I also wondered if sorting indices would help, since it would improve
locality of reference, but when I measured that, it appeared not to
help at all.

regards,

-tim
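[Editor's note: a sketch of the locality experiment Tim mentions in his
last paragraph -- timing mean_accumulate with the shuffled indices
versus a sorted copy. It assumes the code above has been saved as
scratch.py; the sorted_indices attribute is set up here for the test.]

# Shuffled vs. sorted indices for mean_accumulate (locality check).
import timeit

shuffled = timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                        "import scratch as s").timeit(10)
in_order = timeit.Timer("s.mean_accumulate(s.data, s.sorted_indices)",
                        "import scratch as s; import numpy as np; "
                        "s.sorted_indices = np.sort(s.indices)").timeit(10)
print("shuffled: %.3f s   sorted: %.3f s" % (shuffled, in_order))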
From: Martin S. <nu...@ms...> - 2006-08-27 12:28:14
Tim Hochberg wrote:
> Here's an approach (mean_accumulate) that avoids making any copies of
> the data. It runs almost 4x as fast as your approach (called baseline
> here) on my box. Perhaps this will be useful:
> --snip--
> def mean_accumulate(data, indices):
>     result = np.zeros([32, 32], float)
>     for i in indices:
>         result += data[i]
>     result /= len(indices)
>     return result

Great! I got a roughly 9x speed improvement using take() in combination
with this approach. Thanks Tim!

Here's what my code looks like now:

def mean_accum(data):
    result = np.zeros(data[0].shape, np.float64)
    for dataslice in data:
        result += dataslice
    result /= len(data)
    return result

# frameis are int64
frames = data.take(frameis.astype(np.int32), axis=0)
meanframe = mean_accum(frames)

I'm surprised that using a Python for loop is faster than the built-in
mean method. I suppose mean() can't perform the same in-place
operations because in certain cases doing so would fail?

Martin
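[Editor's note: putting the pieces together, the whole pipeline can be
benchmarked against the original one-liner. A sketch with synthetic
data; the ~9x figure above is from Martin's machine and will vary.]

# Built-in mean over fancy-indexed frames vs. take() + accumulation loop.
import timeit
import numpy as np

data = np.random.randint(0, 256, size=(65535, 32, 32)).astype(np.uint8)
frameis = np.random.permutation(65535)[:4000].astype(np.int64)

def mean_builtin():
    return data[frameis].mean(axis=0)

def mean_take_accum():
    frames = data.take(frameis.astype(np.int32), axis=0)
    result = np.zeros(frames[0].shape, np.float64)
    for frame in frames:
        result += frame
    return result / len(frames)

for f in (mean_builtin, mean_take_accum):
    print("%s: %.3f s" % (f.__name__, timeit.timeit(f, number=10) / 10))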
From: Tim H. <tim...@ie...> - 2006-08-27 15:37:27
Martin Spacek wrote:
> Great! I got a roughly 9x speed improvement using take() in combination
> with this approach. Thanks Tim!
> --snip--
> I'm surprised that using a Python for loop is faster than the built-in
> mean method. I suppose mean() can't perform the same in-place
> operations because in certain cases doing so would fail?

I'm not sure why mean is slow here, although possibly it's a locality
issue -- mean likely computes along axis zero each time, which means
it's killing the cache -- while the accumulate version is cache
friendly.

One thing to keep in mind about Python for loops is that they are slow
if you are doing a simple computation inside (a single add, for
instance). IIRC, they are tens of times slower. Here, however, each
pass through the inner loop adds a whole 32x32 (~1000-element) frame,
so the loop overhead is minimal.

(What would be perfect here is something just like take, but that
returned an iterator instead of a new array, as that could be done with
no copying -- unfortunately, such a beast does not exist as far as I
know.)

I'm actually surprised that the take version is faster than my original
version, since it makes a big ol' copy. I guess this is an indication
that indexing is more expensive than I realize. That's why nothing
beats measuring!

I also ran an experiment that reshaped your data to be friendly to mean
(assuming it really does operate on axis zero). However, this turned
out to be a huge pessimization, mostly because take + transpose is
pretty slow.

-tim
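[Editor's note: the "take that returns an iterator" idea Tim mentions
can at least be sketched in pure Python -- a generator yielding one
frame view at a time, so no large copy is ever made. This is just
mean_accumulate restructured around a generator, not an existing numpy
API, and a pure-Python version cannot match a C implementation.]

import numpy as np

def iter_take(data, indices):
    # Yield one frame at a time; data[i] is a view, so nothing is copied.
    for i in indices:
        yield data[i]

def mean_iter(data, indices):
    result = np.zeros(data.shape[1:], np.float64)
    for frame in iter_take(data, indices):
        result += frame
    return result / len(indices)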
From: Martin S. <nu...@ms...> - 2006-08-28 07:02:07
Tim Hochberg wrote:
> I'm actually surprised that the take version is faster than my original
> version, since it makes a big ol' copy. I guess this is an indication
> that indexing is more expensive than I realize. That's why nothing
> beats measuring!

Actually, your original version is just as fast as the take() version.
Both are about 9x faster than numpy.mean() on my system. I prefer the
take() version because you only have to pass a single argument to
mean_accum().

Martin
From: Martin S. <nu...@ms...> - 2006-08-28 07:13:18
Martin Spacek wrote:
> Actually, your original version is just as fast as the take() version.
> Both are about 9x faster than numpy.mean() on my system. I prefer the
> take() version because you only have to pass a single argument to
> mean_accum().

I forgot to mention that all my indices are, for now, sorted. I just
tried shuffling them (as you did), but I still get the same 9x
improvement in speed, so I don't know why you only get a 4x improvement
on your system.

Martin