From: Kenny O. <kor...@id...> - 2006-10-03 20:02:33
|
Excuse my laziness for not looking this up; I googled it but could not find a solution. Matlab has an isreal(array) that returns 1 if the array is full of real numbers. I'm translating Matlab code and ran across: if ~isreal(array), array = abs(array). Is there a way to check whether a number is real or complex? And if so, is there a way to extract the (a + ib) parts, since the absolute value of a complex number is like the Pythagorean theorem on a and b? Thanks for your help, Kenny |
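For reference, a minimal sketch of one way to express that Matlab idiom with NumPy's type-checking helpers (an illustration for context, not an answer from the thread; the array name is made up):

import numpy as N

arr = N.array([3.0, 4.0 + 2.0j])

# Rough analogue of Matlab's isreal(): isrealobj() is True only when the
# array's dtype has no complex part; iscomplexobj() is its negation.
if not N.isrealobj(arr):
    a = arr.real      # the "a" in a + ib
    b = arr.imag      # the "b" in a + ib
    # abs() of a complex number is sqrt(a**2 + b**2), i.e. the
    # Pythagorean theorem applied to the real and imaginary parts.
    arr = N.abs(arr)  # same result as N.sqrt(a**2 + b**2)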
From: Travis O. <oli...@ee...> - 2006-10-03 17:47:55
|
Tim Hochberg wrote:
> Francesc Altet wrote:
>> Hi,
>> I thought that numpy.isscalar was a good way of distinguishing a numpy scalar from a python scalar, but it seems not:
>> >>> numpy.isscalar(numpy.string_('3'))
>> True
>> >>> numpy.isscalar('3')
>> True
>> Is there an easy (and fast, if possible) way to check whether an object is a numpy scalar or a python one?
>
> It looks like isinstance(x, numpy.generic) works, but I didn't test it extensively.

That should definitely work. All the array scalars are in a hierarchy inheriting from numpy.generic. There are also sub-levels (numpy.integer, numpy.inexact, etc...)

-Travis |
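A small sketch of what that check looks like in practice (the helper name here is made up for illustration):

import numpy

def is_numpy_scalar(x):
    # Array scalars (numpy.float64, numpy.int32, numpy.string_, ...) all
    # inherit from numpy.generic; plain Python scalars do not.
    return isinstance(x, numpy.generic)

print is_numpy_scalar(numpy.string_('3'))   # True
print is_numpy_scalar('3')                  # False
print is_numpy_scalar(numpy.float64(1.0))   # True
print is_numpy_scalar(1.0)                  # False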
From: Travis O. <oli...@ee...> - 2006-10-03 17:47:09
|
Sebastian Haase wrote:
> Hi,
> I noticed the underscore in "numpy.string_" ...
> I thought the underscores were removed in favour of handling the "from numpy import *" separately via the __all__ variable.
> Is this done only for *some* members of numpy ?

Yeah. The underscores are still on the conflicting type names.

-Travis |
From: Russell E. O. <ro...@ce...> - 2006-10-03 17:35:49
|
In article <452...@st...>, Todd Miller <jm...@st...> wrote:
> Russell E. Owen wrote:
>> Has anyone tried to build numarray or Numeric with python 2.5? Since the
>
> numarray builds and runs fine for Python-2.5, but only as 32-bit.
>
>> type of array index was changed, I'm wondering if it works (and if so,
>
> NOTE: numarray does not exploit the new ssize_t index in Python-2.5 so numarray is not 64-bit enabled.

Thank you very much. That's great. It means my current code is compatible with Python 2.5 (assuming Numeric works the same way, which seems likely). I am really looking forward to switching to numpy, but now is just a bit too early.

-- Russell |
From: Tim H. <tim...@ie...> - 2006-10-03 17:27:27
|
Francesc Altet wrote:
> Hi,
> I thought that numpy.isscalar was a good way of distinguishing a numpy scalar from a python scalar, but it seems not:
> >>> numpy.isscalar(numpy.string_('3'))
> True
> >>> numpy.isscalar('3')
> True
> Is there an easy (and fast, if possible) way to check whether an object is a numpy scalar or a python one?

It looks like isinstance(x, numpy.generic) works, but I didn't test it extensively.

-tim |
From: Sebastian H. <ha...@ms...> - 2006-10-03 17:24:31
|
Hi,
I noticed the underscore in "numpy.string_" ... I thought the underscores were removed in favour of handling the "from numpy import *" separately via the __all__ variable. Is this done only for *some* members of numpy ?
(sorry for hijacking your thread ...)

-Sebastian Haase

On Tuesday 03 October 2006 09:55, Francesc Altet wrote:
> Hi,
> I thought that numpy.isscalar was a good way of distinguishing a numpy scalar from a python scalar, but it seems not:
> >>> numpy.isscalar(numpy.string_('3'))
> True
> >>> numpy.isscalar('3')
> True
> Is there an easy (and fast, if possible) way to check whether an object is a numpy scalar or a python one?
> Thanks, |
From: Kenny O. <kor...@id...> - 2006-10-03 17:03:17
|
Yeah, thanks guys. I printed out the sizes and realized what it was doing, then used a for loop to put it into the 1D array. But thanks Travis, the way you said makes it easier than the for loop. -Kenny |
From: Francesc A. <fa...@ca...> - 2006-10-03 16:55:40
|
Hi,

I thought that numpy.isscalar was a good way of distinguishing a numpy scalar from a python scalar, but it seems not:

>>> numpy.isscalar(numpy.string_('3'))
True
>>> numpy.isscalar('3')
True

Is there an easy (and fast, if possible) way to check whether an object is a numpy scalar or a python one?

Thanks,

--
>0,0<  Francesc Altet     http://www.carabos.com/
 V V   Cárabos Coop. V.   Enjoy Data
  "-" |
From: Tim H. <tim...@ie...> - 2006-10-03 16:24:26
|
David Cournapeau wrote: > Hi, > > I was wondering if there was any way to speed up the following code: > > y = N.zeros((n, K)) > for i in range(K): > y[:, i] = gauss_den(data, mu[i, :], va[i, :]) > > Where K is of order 1e1, n of order 1e5. Normally, gauss_den is a quite > expensive function, but the profiler tells me that the indexing y[:,i] > takes almost as much time as the gauss_den computation (which computes n > exp !). To see if the profiler is "right", i replaces with the (non > valid) following function: > > y = N.zeros((n, K)) > for i in range(K): > yt = gauss_den(data, mu[i, :], va[i, :]) > return y > > Where more than 99% of the code is spent inside gauss_den. > > I guess the problem is coming from the fact that y being C order, y[:, > i] needs accessing data in a non 'linear' way. Is there a way to speed > this up ? I did something like this: > > y = N.zeros((K, n)) > for i in range(K): > y[i] = gauss_den(data, mu[i, :], va[i, :]) > return y.T > > which works, but I don't like it very much. Why not? > Isn't there any other way That depends on the details of gauss_den. A general note: for this kind of microoptimization puzzle, it's much easier to help if you can post a self contained example, preferably something fairly simple that still illustrates the speed issue, that we can experiment with. -tim |
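In that spirit, a self-contained sketch of the indexing question (gauss_den is replaced by a cheap stand-in here, so only the assignment cost is being compared; treat the numbers as illustrative):

import timeit
import numpy as N

n, K = 100000, 10

def fake_den(data, i):
    # Stand-in for gauss_den: deliberately cheap so that the y[:, i] vs.
    # y[i] assignment cost dominates the timing.
    return data * i

def fill_columns(data):
    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = fake_den(data, i)   # strided (non-contiguous) writes
    return y

def fill_rows(data):
    y = N.zeros((K, n))
    for i in range(K):
        y[i] = fake_den(data, i)      # contiguous writes
    return y.T                        # the transpose is just a view

data = N.random.rand(n)
print timeit.Timer('fill_columns(data)', 'from __main__ import fill_columns, data').timeit(10)
print timeit.Timer('fill_rows(data)', 'from __main__ import fill_rows, data').timeit(10)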
From: Tim H. <tim...@ie...> - 2006-10-03 16:16:12
|
kor...@id... wrote: > Microsoft Windows XP [Version 5.1.2600] > (C) Copyright 1985-2001 Microsoft Corp. > > C:\Documents and Settings\kenny>cd c:\lameness > > C:\lameness>c:\Python24\python.exe templatewindow.py > a = [[ 1.00013175e+00 2.63483019e-05 1.00006740e+00 5.22246363e-05 > 1.00008735e+00 -2.77694969e-07 -1.30756273e-07 1.03202730e-06] > [ 2.63483019e-05 6.95644927e-05 -7.15426839e-07 1.99534228e-05 > 5.29400631e-05 -3.07638369e-09 -5.52729618e-06 -7.61431767e-06] > [ 1.00006740e+00 -7.15426839e-07 1.00011917e+00 2.50407751e-05 > 1.00006219e+00 -2.77757947e-07 -1.30856101e-07 1.23058301e-07] > [ 5.22246363e-05 1.99534228e-05 2.50407751e-05 8.21505582e-05 > 5.26966037e-05 -6.13563429e-09 -2.76420755e-06 -3.80791858e-06] > [ 1.00008735e+00 5.29400631e-05 1.00006219e+00 5.26966037e-05 > 1.00020132e+00 5.01389982e-05 3.45959412e-05 3.17129503e-05] > [ -2.77694969e-07 -3.07638369e-09 -2.77757947e-07 -6.13563429e-09 > 5.01389982e-05 3.45959412e-05 3.17129503e-05 3.27035490e-05] > [ -1.30756273e-07 -5.52729618e-06 -1.30856101e-07 -2.76420755e-06 > 3.45959412e-05 3.17129503e-05 3.27035490e-05 3.59732704e-05] > [ 1.03202730e-06 -7.61431767e-06 1.23058301e-07 -3.80791858e-06 > 3.17129503e-05 3.27035490e-05 3.59732704e-05 4.12184645e-05]] > F = [[ -1.44231014e+03] > [ -7.54006917e-02] > [ -1.44227222e+03] > [ -7.49199956e-02] > [ -1.44242446e+03] > [ -4.24684780e-02] > [ -2.49566072e-02] > [ -2.16978637e-02]] > Traceback (most recent call last): > File "C:\lameness\PlotPanel.py", line 476, in OnNo > self.parent.peaktime() > File "C:\lameness\PlotPanel.py", line 64, in peaktime > self._replot() > File "C:\lameness\PlotPanel.py", line 159, in _replot > self.drawpeak() > File "C:\lameness\PlotPanel.py", line 233, in drawpeak > COE[:,j1-1] = linalg.solve(A,F) > ValueError: object too deep for desired array > > Anyone know what this error could be coming from? Is it because the > floating point numbers are too large, memory wise?\ > No, this error has to do with the number of dimensions being wrong somewhere. Try printing out the shapes of COE[:,j-1] and linalg.solve(A,F); you'll probably see that they don't match. -tim |
From: Travis O. <oli...@ee...> - 2006-10-03 16:14:33
|
kor...@id... wrote: >Microsoft Windows XP [Version 5.1.2600] >(C) Copyright 1985-2001 Microsoft Corp. > >C:\Documents and Settings\kenny>cd c:\lameness > >C:\lameness>c:\Python24\python.exe templatewindow.py >a = [[ 1.00013175e+00 2.63483019e-05 1.00006740e+00 5.22246363e-05 > 1.00008735e+00 -2.77694969e-07 -1.30756273e-07 1.03202730e-06] > [ 2.63483019e-05 6.95644927e-05 -7.15426839e-07 1.99534228e-05 > 5.29400631e-05 -3.07638369e-09 -5.52729618e-06 -7.61431767e-06] > [ 1.00006740e+00 -7.15426839e-07 1.00011917e+00 2.50407751e-05 > 1.00006219e+00 -2.77757947e-07 -1.30856101e-07 1.23058301e-07] > [ 5.22246363e-05 1.99534228e-05 2.50407751e-05 8.21505582e-05 > 5.26966037e-05 -6.13563429e-09 -2.76420755e-06 -3.80791858e-06] > [ 1.00008735e+00 5.29400631e-05 1.00006219e+00 5.26966037e-05 > 1.00020132e+00 5.01389982e-05 3.45959412e-05 3.17129503e-05] > [ -2.77694969e-07 -3.07638369e-09 -2.77757947e-07 -6.13563429e-09 > 5.01389982e-05 3.45959412e-05 3.17129503e-05 3.27035490e-05] > [ -1.30756273e-07 -5.52729618e-06 -1.30856101e-07 -2.76420755e-06 > 3.45959412e-05 3.17129503e-05 3.27035490e-05 3.59732704e-05] > [ 1.03202730e-06 -7.61431767e-06 1.23058301e-07 -3.80791858e-06 > 3.17129503e-05 3.27035490e-05 3.59732704e-05 4.12184645e-05]] >F = [[ -1.44231014e+03] > [ -7.54006917e-02] > [ -1.44227222e+03] > [ -7.49199956e-02] > [ -1.44242446e+03] > [ -4.24684780e-02] > [ -2.49566072e-02] > [ -2.16978637e-02]] >Traceback (most recent call last): > File "C:\lameness\PlotPanel.py", line 476, in OnNo > self.parent.peaktime() > File "C:\lameness\PlotPanel.py", line 64, in peaktime > self._replot() > File "C:\lameness\PlotPanel.py", line 159, in _replot > self.drawpeak() > File "C:\lameness\PlotPanel.py", line 233, in drawpeak > COE[:,j1-1] = linalg.solve(A,F) >ValueError: object too deep for desired array > >Anyone know what this error could be coming from? Is it because the >floating point numbers are too large, memory wise? > > No, it's because you are trying to fit a 2-d array into a 1-d position. linalg.solve(A,F) produces a 2-d array in this instance (because F is a 2-d array). COE[:,j1-1:] =linalg.solve(A,F) should work or squeezing the result of the solution to a 1-d array would also work. -Travis |
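A minimal reproduction of that shape mismatch and the two fixes (the array names and sizes here are made up):

import numpy as N
from numpy import linalg

A = N.random.rand(8, 8)
F = N.random.rand(8, 1)      # 2-d right-hand side, shape (8, 1)
COE = N.zeros((8, 5))

x = linalg.solve(A, F)       # because F is 2-d, x is 2-d too: shape (8, 1)

# COE[:, 0] = x              # fails: an (8, 1) array doesn't fit an (8,) slot
COE[:, 0:1] = x              # fix 1: assign into a 2-d column slice
COE[:, 1] = x.squeeze()      # fix 2: squeeze the solution down to 1-d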
From: Tim H. <tim...@ie...> - 2006-10-03 16:12:54
|
Travis Oliphant wrote: > Albert Strasheim wrote: > >>>> [1] 12.97% of function time >>>> [2] 8.65% of functiont ime >>>> [3] 62.14% of function time >>>> >>>> If statistics from elsewhere in the code would be helpful, let me >>>> know, >>>> >>>> >>> and >>> >>> >>>> I'll see if I can convince Quantify to cough it up. >>>> >>>> >>>> >>> Please run the same test but using >>> >>> x1 = N.random.rand(39,2000) >>> x2 = N.random.rand(39,64,1) >>> >>> z1 = x1[:,N.newaxis,:] - x2 >>> >>> >> Very similar results to what I had previously: >> >> [1] 10.88% >> [2] 7.25% >> [3] 68.25% >> >> >> > Thanks, > > I've got some ideas about how to speed this up by eliminating some of > the unnecessary calculations going on outside of the function loop, but > there will still be some speed issues depending on how the array is > traversed once you get above a certain size. I'm not sure there anyway > around that, ultimately, due to memory access being slow on most hardware. > > If anyone has any ideas, I'd love to hear them. I won't be able to > get to implementing my ideas until at least Friday (also when rc2 will > be released). > > I had an idea regarding which axis to operate on first. Rather than operate on strictly the longest axis or strictly the innermost axis, a hybrid approach could be used. We would operate on the longest axis that would not result in the inner loop overflowing the cache. The idea is minimize the loop overhead as we do now by choosing the largest axis, while at the same time attempting to maintain cache friendliness. Let's suppose that it's safe to stuff 2000 array elements into the cache[1], and lets consider two arrays where the lengths of the axes[2], from innermost to outermost are [10,100,1000] and [100,10,1000]. Under this scheme, we would loop over the length 100 axis in both cases. Looping over the length-1000 axis pull 1e6 elements into the cache. If the innermost axis alone exceeded our cache imposed limit, we would of course just loop over that axis first. -tim [1] This would really be specified in bytes and thus the number of allowable elements would vary by data type. It might also need to be tuned based on architecture. [2] I believe that there's some consolidation of adjacent axes that goes on where possible, let's assume that's already happened at this point. |
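A rough Python sketch of that selection rule (the 2000-element budget is just the number from the example; a real version would work in bytes inside the C ufunc machinery, so this is only an illustration):

def pick_loop_axis(shape_inner_to_outer, cache_elements=2000):
    # The block of memory swept by looping along axis k covers that axis
    # times every axis inside it, so track that running product and keep
    # the longest axis whose block still fits in the cache budget.
    best_axis, best_len = 0, shape_inner_to_outer[0]
    span = 1
    for k, length in enumerate(shape_inner_to_outer):
        span *= length
        if span > cache_elements and k > 0:
            break                      # looping here would overflow the cache
        if length > best_len:
            best_axis, best_len = k, length
    return best_axis

print pick_loop_axis([10, 100, 1000])   # -> 1, the length-100 axis
print pick_loop_axis([100, 10, 1000])   # -> 0, the length-100 axis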
From: <kor...@id...> - 2006-10-03 16:03:36
|
Microsoft Windows XP [Version 5.1.2600] (C) Copyright 1985-2001 Microsoft Corp. C:\Documents and Settings\kenny>cd c:\lameness C:\lameness>c:\Python24\python.exe templatewindow.py a = [[ 1.00013175e+00 2.63483019e-05 1.00006740e+00 5.22246363e-05 1.00008735e+00 -2.77694969e-07 -1.30756273e-07 1.03202730e-06] [ 2.63483019e-05 6.95644927e-05 -7.15426839e-07 1.99534228e-05 5.29400631e-05 -3.07638369e-09 -5.52729618e-06 -7.61431767e-06] [ 1.00006740e+00 -7.15426839e-07 1.00011917e+00 2.50407751e-05 1.00006219e+00 -2.77757947e-07 -1.30856101e-07 1.23058301e-07] [ 5.22246363e-05 1.99534228e-05 2.50407751e-05 8.21505582e-05 5.26966037e-05 -6.13563429e-09 -2.76420755e-06 -3.80791858e-06] [ 1.00008735e+00 5.29400631e-05 1.00006219e+00 5.26966037e-05 1.00020132e+00 5.01389982e-05 3.45959412e-05 3.17129503e-05] [ -2.77694969e-07 -3.07638369e-09 -2.77757947e-07 -6.13563429e-09 5.01389982e-05 3.45959412e-05 3.17129503e-05 3.27035490e-05] [ -1.30756273e-07 -5.52729618e-06 -1.30856101e-07 -2.76420755e-06 3.45959412e-05 3.17129503e-05 3.27035490e-05 3.59732704e-05] [ 1.03202730e-06 -7.61431767e-06 1.23058301e-07 -3.80791858e-06 3.17129503e-05 3.27035490e-05 3.59732704e-05 4.12184645e-05]] F = [[ -1.44231014e+03] [ -7.54006917e-02] [ -1.44227222e+03] [ -7.49199956e-02] [ -1.44242446e+03] [ -4.24684780e-02] [ -2.49566072e-02] [ -2.16978637e-02]] Traceback (most recent call last): File "C:\lameness\PlotPanel.py", line 476, in OnNo self.parent.peaktime() File "C:\lameness\PlotPanel.py", line 64, in peaktime self._replot() File "C:\lameness\PlotPanel.py", line 159, in _replot self.drawpeak() File "C:\lameness\PlotPanel.py", line 233, in drawpeak COE[:,j1-1] = linalg.solve(A,F) ValueError: object too deep for desired array Anyone know what this error could be coming from? Is it because the floating point numbers are too large, memory wise? -Kenny |
From: Travis O. <oli...@ie...> - 2006-10-03 15:03:49
|
I'm going to have to put off release of rc2 for Friday. I'm just too busy right now. That might help us get some speed-ups into the NOBUFFER_UFUNCLOOP code as well. My speed-up ideas are: 1) Only keep track of 1 set of coordinates instead of self->nargs sets (1 for each iterator). 2) Keep track of the dataptr for each iterator in the bufptr array (so a copy isn't necessary) 3) Not increment the index on each iterator separately. All of these changes will be made directly in the NOBUFFER_UFUNCLOOP code. More generally, it would be nice to take these ideas and push them into other areas of the code --- perhaps through the multi-iterator that is already present. Probably, this will have to wait until 1.0.1 though. -Travis |
From: Travis O. <oli...@ie...> - 2006-10-03 14:58:35
|
Albert Strasheim wrote: >>> [1] 12.97% of function time >>> [2] 8.65% of functiont ime >>> [3] 62.14% of function time >>> >>> If statistics from elsewhere in the code would be helpful, let me >>> know, >>> >> and >> >>> I'll see if I can convince Quantify to cough it up. >>> >>> >> Please run the same test but using >> >> x1 = N.random.rand(39,2000) >> x2 = N.random.rand(39,64,1) >> >> z1 = x1[:,N.newaxis,:] - x2 >> > > Very similar results to what I had previously: > > [1] 10.88% > [2] 7.25% > [3] 68.25% > > Thanks, I've got some ideas about how to speed this up by eliminating some of the unnecessary calculations going on outside of the function loop, but there will still be some speed issues depending on how the array is traversed once you get above a certain size. I'm not sure there anyway around that, ultimately, due to memory access being slow on most hardware. If anyone has any ideas, I'd love to hear them. I won't be able to get to implementing my ideas until at least Friday (also when rc2 will be released). -Travis |
From: David C. <da...@ar...> - 2006-10-03 10:54:16
|
Hi,

I was wondering if there was any way to speed up the following code:

y = N.zeros((n, K))
for i in range(K):
    y[:, i] = gauss_den(data, mu[i, :], va[i, :])

where K is of order 1e1 and n of order 1e5. Normally, gauss_den is a quite expensive function, but the profiler tells me that the indexing y[:,i] takes almost as much time as the gauss_den computation (which computes n exp !). To see if the profiler is "right", I replaced it with the following (non-valid) function:

y = N.zeros((n, K))
for i in range(K):
    yt = gauss_den(data, mu[i, :], va[i, :])
return y

where more than 99% of the time is spent inside gauss_den. I guess the problem is coming from the fact that, y being C order, y[:, i] needs to access data in a non-'linear' way. Is there a way to speed this up? I did something like this:

y = N.zeros((K, n))
for i in range(K):
    y[i] = gauss_den(data, mu[i, :], va[i, :])
return y.T

which works, but I don't like it very much. Isn't there any other way?

David |
From: Albert S. <136...@su...> - 2006-10-03 09:12:38
|
Hello all > -----Original Message----- > From: Travis Oliphant [mailto:oli...@ee...] > Sent: 03 October 2006 03:51 > To: Discussion of Numerical Python; Albert Strasheim > Subject: Re: ***[Possible UCE]*** Re: [Numpy-discussion] Vectorizing > code, for loops, and all that > > <snip> > >Meanwhile, I can confirm that the NOBUFFER_UFUNCLOOP case in > >PyUFunc_GenericFunction is getting exercised in the slower case. > >Here's > some > >info on what's happening, courtesy of Rational Quantify: > > > >case NOBUFFER_UFUNCLOOP: > >while (loop->index < loop->size) { > > for (i=0; i<self->nargs; i++) > > loop->bufptr[i] = loop->iters[i]->dataptr; [1] > > > > loop->function((char **)loop->bufptr, &(loop->bufcnt), > > loop->steps, loop->funcdata); [2] > > UFUNC_CHECK_ERROR(loop); > > > > for (i=0; i<self->nargs; i++) { > > PyArray_ITER_NEXT(loop->iters[i]); [3] > > } > > loop->index++; > >} > >break; > > > >[1] 12.97% of function time > >[2] 8.65% of functiont ime > >[3] 62.14% of function time > > > >If statistics from elsewhere in the code would be helpful, let me > >know, > and > >I'll see if I can convince Quantify to cough it up. > > > Please run the same test but using > > x1 = N.random.rand(39,2000) > x2 = N.random.rand(39,64,1) > > z1 = x1[:,N.newaxis,:] - x2 Very similar results to what I had previously: [1] 10.88% [2] 7.25% [3] 68.25% Cheers, Albert |
From: David C. <da...@ar...> - 2006-10-03 07:31:47
|
Albert Strasheim wrote: > Hello all > > I recently started looking at David Cournapeau's PyEM package, specifically > his implementation of the K-Means algorithm. He implemented part of this > algorithm with in pure Python version and also provided a Pyrex alternative > that is significantly faster (about 10 times with the data I tested with). I > tried to figure out why this is so. > > For what it worths, PyEM has been quietly updated in the past weeks: most responses I got initially were people reporting errors almost always related to change in numpy API changes, so I decided to keep quiet for a while, waiting for an API freeze on numpy side (last versions use vq from scipy.cluster, for example, the plotting have been much improved, a fast Ctype version for diagonal gaussian kernel enables to run EM on hundred of thousand frames of several dozens dimensions with several dozens of gaussian in the mixture in reasonable times) > The job of this part of the algorithm is pretty simple: from a bunch of > cluster means (the codebook) find the nearest cluster mean for each data > point. The first attempt at implementing this algorithm might use two for > loops, one over the data points and one over the cluster means, computing a > Euclidean distance at each step. Slightly better is to use one loop and some > broadcasting. > > By randomly trying some code, I arrived at the following one-liner: > > N.sum((data[...,N.newaxis,...]-code)**2, 2).argmin(axis=1) > > where > > data = N.random.randn(2000, 39) > nclusters = 64 > code = data[0:nclusters,:] > > This code was still slow, so I mangled it a bit so that I could profile it > using cProfile under Python 2.5 (use profile if you have an older version of > Python): > > def kmean(): > data = N.random.randn(2000, 39) > nclusters = 64 > code = data[0:nclusters,:] > for i in xrange(10): > def a(): return data[...,N.newaxis,...] > z = a() > def b(): return z-code > z = b() > def c(): return z**2 > z = c() > def d(): return N.sum(z, 2) > z = d() > def e(): return z.argmin(axis=1) > z = e() > > if __name__ == '__main__': > import cProfile > cProfile.run("kmean()") > > Profiler output: > I got exactly the same kind of weird behaviour in other places of PyEM (apparently simple substractions taking a lot of time compared to other parts although it should theoratically be negligeable: eg X - mu where X is (n, d) and mu (d, 1) takeing almost as much time as exp( X**2) !). It would be great to at least know the origin of this non-intuitive result, David |
From: Travis O. <oli...@ee...> - 2006-10-03 01:50:36
|
Albert Strasheim wrote: >Hello all > > > >>-----Original Message----- >>From: num...@li... [mailto:numpy- >>dis...@li...] On Behalf Of Travis Oliphant >>Sent: 03 October 2006 02:32 >>To: Discussion of Numerical Python >>Subject: Re: [Numpy-discussion] Vectorizing code, for loops, and all that >> >>Travis Oliphant wrote: >> >> >> >>>I suspect I know why, although the difference seems rather large. >>> >>> >>> >>[snip] >> >> >> >>>I'm surprised the overhead of adjusting pointers is so high, but then >>>again you are probably getting a lot of cache misses in the first case >>>so there is more to it than that, the loops may run more slowly too. >>> >>> >>> >>> >>I'm personally bothered that this example runs so much more slowly. I >>don't think it should. Perhaps it is unavoidable because of the >>memory-layout issues. It is just hard to believe that the overhead for >>calling into the loop and adjusting the pointers is so much higher. >> >> > >Firstly, thanks to Tim... I'll try his functions tomorrow. > >Meanwhile, I can confirm that the NOBUFFER_UFUNCLOOP case in >PyUFunc_GenericFunction is getting exercised in the slower case. Here's some >info on what's happening, courtesy of Rational Quantify: > >case NOBUFFER_UFUNCLOOP: >while (loop->index < loop->size) { > for (i=0; i<self->nargs; i++) > loop->bufptr[i] = loop->iters[i]->dataptr; [1] > > loop->function((char **)loop->bufptr, &(loop->bufcnt), > loop->steps, loop->funcdata); [2] > UFUNC_CHECK_ERROR(loop); > > for (i=0; i<self->nargs; i++) { > PyArray_ITER_NEXT(loop->iters[i]); [3] > } > loop->index++; >} >break; > >[1] 12.97% of function time >[2] 8.65% of functiont ime >[3] 62.14% of function time > >If statistics from elsewhere in the code would be helpful, let me know, and >I'll see if I can convince Quantify to cough it up. > > > Please run the same test but using x1 = N.random.rand(39,2000) x2 = N.random.rand(39,64,1) z1 = x1[:,N.newaxis,:] - x2 Thanks, -Travis |
From: Albert S. <136...@su...> - 2006-10-03 00:42:21
|
Hello all > -----Original Message----- > From: num...@li... [mailto:numpy- > dis...@li...] On Behalf Of Travis Oliphant > Sent: 03 October 2006 02:32 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Vectorizing code, for loops, and all that > > Travis Oliphant wrote: > > > > >I suspect I know why, although the difference seems rather large. > > > [snip] > > >I'm surprised the overhead of adjusting pointers is so high, but then > >again you are probably getting a lot of cache misses in the first case > >so there is more to it than that, the loops may run more slowly too. > > > > > > I'm personally bothered that this example runs so much more slowly. I > don't think it should. Perhaps it is unavoidable because of the > memory-layout issues. It is just hard to believe that the overhead for > calling into the loop and adjusting the pointers is so much higher. Firstly, thanks to Tim... I'll try his functions tomorrow. Meanwhile, I can confirm that the NOBUFFER_UFUNCLOOP case in PyUFunc_GenericFunction is getting exercised in the slower case. Here's some info on what's happening, courtesy of Rational Quantify: case NOBUFFER_UFUNCLOOP: while (loop->index < loop->size) { for (i=0; i<self->nargs; i++) loop->bufptr[i] = loop->iters[i]->dataptr; [1] loop->function((char **)loop->bufptr, &(loop->bufcnt), loop->steps, loop->funcdata); [2] UFUNC_CHECK_ERROR(loop); for (i=0; i<self->nargs; i++) { PyArray_ITER_NEXT(loop->iters[i]); [3] } loop->index++; } break; [1] 12.97% of function time [2] 8.65% of functiont ime [3] 62.14% of function time If statistics from elsewhere in the code would be helpful, let me know, and I'll see if I can convince Quantify to cough it up. > <snip> Thanks for all code and suggestions. Cheers, Albert |
From: Travis O. <oli...@ee...> - 2006-10-03 00:32:26
|
Travis Oliphant wrote: > >I suspect I know why, although the difference seems rather large. > [snip] >I'm surprised the overhead of adjusting pointers is so high, but then >again you are probably getting a lot of cache misses in the first case >so there is more to it than that, the loops may run more slowly too. > > I'm personally bothered that this example runs so much more slowly. I don't think it should. Perhaps it is unavoidable because of the memory-layout issues. It is just hard to believe that the overhead for calling into the loop and adjusting the pointers is so much higher. But, that isn't the problem, here. Notice the following: x3 = N.random.rand(39,2000) x4 = N.random.rand(39,64,1) %timeit z3 = x3[:,None,:] - x4 10 loops, best of 3: 76.4 ms per loop Hmm... It looks like cache misses are a lot more important than making sure the inner loop is taken over the largest number of variables (that's the current way ufuncs decide which axis ought to be used as the 1-d loop). Perhaps those inner 1-d loops could be optimized (using prefetch or something) to reduce the number of cache misses on the inner computation, and the concept of looping over the largest dimension (instead of the last dimension) should be re-considered. Ideas, -Travis |
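For anyone who wants to reproduce that comparison outside of ipython, a small sketch of the two layouts side by side (timings will of course vary by machine):

import timeit
import numpy as N

setup = '''
import numpy as N
x1 = N.random.rand(2000, 39)
x2 = N.random.rand(64, 39)
x3 = N.random.rand(39, 2000)
x4 = N.random.rand(39, 64, 1)
'''

# Long axis last (result shape (2000, 64, 39)) vs. long axis first
# (result shape (39, 64, 2000)); same number of subtractions either way.
print timeit.Timer('x1[:, N.newaxis, :] - x2', setup).timeit(10)
print timeit.Timer('x3[:, N.newaxis, :] - x4', setup).timeit(10)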
From: Tim H. <tim...@ie...> - 2006-10-03 00:17:58
|
Tim Hochberg wrote:
> Travis Oliphant wrote:
>> Albert Strasheim wrote:
>>> In [571]: x1 = N.random.randn(2000,39)
>>> In [572]: y1 = N.random.randn(64,39)
>>> In [574]: %timeit z1=x1[...,N.newaxis,...]-y1
>>> 10 loops, best of 3: 703 ms per loop
>>> In [575]: z1.shape
>>> Out[575]: (2000, 64, 39)
>>> As far as I can figure, this operation is doing 2000*64*39 subtractions. Doing this straight up yields the following:
>>> In [576]: x2 = N.random.randn(2000,64,39)
>>> In [577]: y2 = N.random.randn(2000,64,39)
>>> In [578]: %timeit z2 = x2-y2
>>> 10 loops, best of 3: 108 ms per loop
>>> Does anybody have any ideas on why this is so much faster? Hopefully I didn't mess up somewhere...
> [SNIP]
> I just spent a while playing with this, and assuming I've correctly translated your original intent I've come up with two alternative, looping versions that run, respectively, 2 and 3 times faster. I've a feeling that kmean3, the fastest one, still has a little more room to be sped up, but I'm out of time now. Code is below
>
> -tim

One more iteration, this time a little algorithmic improvement, and we're up to 4x as fast as the original code. Here we take advantage of the fact that the term di**2 is constant across the axis we are minimizing on, so the computation of ci**2 can be hoisted out of the loop and the -2*di*ci term can be rejiggered to di*ci by appropriate rescaling. This reduces the computation in the inner loop from a subtract and a multiply to just a multiply.

def kmean4(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[:nclusters]
    transdata = data.transpose().copy()
    totals = N.empty([nclusters, len(data)], float)
    code2 = (code**2).sum(-1)
    code2 *= -0.5
    totals[:] = code2[:, N.newaxis]
    for cluster, tot in zip(code, totals):
        for di, ci in zip(transdata, cluster):
            tot += di*ci
    return totals.argmax(axis=0)

[SNIP CODE] |
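For completeness, the identity that rescaling relies on, written as one vectorized expression (an added illustration, not code from the thread): for each data point d and each cluster mean c, the argmin over c of ||d - c||**2 equals the argmax over c of (d.c - 0.5*||c||**2), because the ||d||**2 term is the same for every candidate c.

import numpy as N

def kmean_dot(data, nclusters=64):
    code = data[:nclusters]
    # scores[i, j] = dot(data[i], code[j]) - 0.5*||code[j]||**2
    scores = N.dot(data, code.transpose()) - 0.5 * (code**2).sum(-1)
    return scores.argmax(axis=1)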
From: Tim H. <tim...@ie...> - 2006-10-02 23:50:06
|
Travis Oliphant wrote: > Albert Strasheim wrote: > > >> In [571]: x1 = N.random.randn(2000,39) >> >> In [572]: y1 = N.random.randn(64,39) >> >> In [574]: %timeit z1=x1[...,N.newaxis,...]-y1 10 loops, best of 3: 703 ms >> per loop >> >> In [575]: z1.shape >> Out[575]: (2000, 64, 39) >> >> As far as I can figure, this operation is doing 2000*64*39 subtractions. >> Doing this straight up yields the following: >> >> In [576]: x2 = N.random.randn(2000,64,39) >> >> In [577]: y2 = N.random.randn(2000,64,39) >> >> In [578]: %timeit z2 = x2-y2 >> 10 loops, best of 3: 108 ms per loop >> >> Does anybody have any ideas on why this is so much faster? Hopefully I >> didn't mess up somewhere... >> >> >> > > I suspect I know why, although the difference seems rather large. There > is code optimization that is being taken advantage of in the second > case. If you have contiguous arrays (no broadcasting needed), then 1 > C-loop is used for the subtraction (your second case). > > In the first case you are using broadcasting to generate the larger > array. This requires more complicated looping constructs under the > covers which causes your overhead. Bascially, you will have 64*39 1-d > loops of 2000 elements each in the first example with a bit of > calculation over-head to reset the pointers before each loop. > > In the ufunc code, compare the ONE_UFUNCLOOP case with the > NOBUFFER_UFUNCLOOP case. If you want to be sure what is running > un-comment the fprintf statements so you can tell. > > I'm surprised the overhead of adjusting pointers is so high, but then > again you are probably getting a lot of cache misses in the first case > so there is more to it than that, the loops may run more slowly too. > > I suspect that Travis is basically right about why your simple subtraction runs much faster than your test case. However, that doesn't mean you can't do better than at present. When dealing with large, multidimensional arrays my experience has been that unrolling all of the for loops is frequently counterproductive. I chalk this up to two factors: first that you tend to end up generating large temporary arrays and this in turn leads to cache misses. Second, you loose flexibility in how you perform the calculation, which in turns limits other possible explanations. I just spent a while playing with this, and assuming I've correctly translated your original intent I've come up with two alternative, looping versions that run, respectively 2 and 3 times faster. I've a feeling that kmean3, the fastest one, still has a little more room to be sped up, but I'm out of time now. 
Code is below

-tim

import numpy as N

data = N.random.randn(2000, 39)

def kmean0(data):
    nclusters = 64
    code = data[0:nclusters,:]
    return N.sum((data[...,N.newaxis,...]-code)**2, 2).argmin(axis=1)

def kmean1(data):
    nclusters = 64
    code = data[0:nclusters,:]
    z = data[:,N.newaxis,:]
    z = z-code
    z = z**2
    z = N.sum(z, 2)
    return z.argmin(axis=1)

def kmean2(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[0:nclusters,:]
    data = data[:, N.newaxis]
    allz = N.zeros([len(data)])
    for i, x in enumerate(data):
        z = (x - code)
        z **= 2
        allz[i] = z.sum(-1).argmin(0)
    return allz

def kmean3(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[0:nclusters]
    totals = N.zeros([nclusters, len(data)], float)
    transdata = data.transpose().copy()
    for cluster, tot in zip(code, totals):
        for di, ci in zip(transdata, cluster):
            delta = di - ci
            delta **= 2
            tot += delta
    return totals.argmin(axis=0)

if __name__ == '__main__':
    assert N.alltrue(kmean0(data) == kmean1(data))
    assert N.alltrue(kmean0(data) == kmean2(data))
    assert N.alltrue(kmean0(data) == kmean3(data))

    from timeit import Timer
    print Timer('kmean0(data)', 'from scratch import kmean0, data').timeit(3)
    print Timer('kmean1(data)', 'from scratch import kmean1, data').timeit(3)
    print Timer('kmean2(data)', 'from scratch import kmean2, data').timeit(3)
    print Timer('kmean3(data)', 'from scratch import kmean3, data').timeit(3) |
From: Travis O. <oli...@ee...> - 2006-10-02 23:04:02
|
Albert Strasheim wrote: >In [571]: x1 = N.random.randn(2000,39) > >In [572]: y1 = N.random.randn(64,39) > >In [574]: %timeit z1=x1[...,N.newaxis,...]-y1 10 loops, best of 3: 703 ms >per loop > >In [575]: z1.shape >Out[575]: (2000, 64, 39) > >As far as I can figure, this operation is doing 2000*64*39 subtractions. >Doing this straight up yields the following: > >In [576]: x2 = N.random.randn(2000,64,39) > >In [577]: y2 = N.random.randn(2000,64,39) > >In [578]: %timeit z2 = x2-y2 >10 loops, best of 3: 108 ms per loop > >Does anybody have any ideas on why this is so much faster? Hopefully I >didn't mess up somewhere... > > I suspect I know why, although the difference seems rather large. There is code optimization that is being taken advantage of in the second case. If you have contiguous arrays (no broadcasting needed), then 1 C-loop is used for the subtraction (your second case). In the first case you are using broadcasting to generate the larger array. This requires more complicated looping constructs under the covers which causes your overhead. Bascially, you will have 64*39 1-d loops of 2000 elements each in the first example with a bit of calculation over-head to reset the pointers before each loop. In the ufunc code, compare the ONE_UFUNCLOOP case with the NOBUFFER_UFUNCLOOP case. If you want to be sure what is running un-comment the fprintf statements so you can tell. I'm surprised the overhead of adjusting pointers is so high, but then again you are probably getting a lot of cache misses in the first case so there is more to it than that, the loops may run more slowly too. -Travis |
From: Albert S. <136...@su...> - 2006-10-02 22:10:39
|
Hello all I recently started looking at David Cournapeau's PyEM package, specifically his implementation of the K-Means algorithm. He implemented part of this algorithm with in pure Python version and also provided a Pyrex alternative that is significantly faster (about 10 times with the data I tested with). I tried to figure out why this is so. The job of this part of the algorithm is pretty simple: from a bunch of cluster means (the codebook) find the nearest cluster mean for each data point. The first attempt at implementing this algorithm might use two for loops, one over the data points and one over the cluster means, computing a Euclidean distance at each step. Slightly better is to use one loop and some broadcasting. By randomly trying some code, I arrived at the following one-liner: N.sum((data[...,N.newaxis,...]-code)**2, 2).argmin(axis=1) where data = N.random.randn(2000, 39) nclusters = 64 code = data[0:nclusters,:] This code was still slow, so I mangled it a bit so that I could profile it using cProfile under Python 2.5 (use profile if you have an older version of Python): def kmean(): data = N.random.randn(2000, 39) nclusters = 64 code = data[0:nclusters,:] for i in xrange(10): def a(): return data[...,N.newaxis,...] z = a() def b(): return z-code z = b() def c(): return z**2 z = c() def d(): return N.sum(z, 2) z = d() def e(): return z.argmin(axis=1) z = e() if __name__ == '__main__': import cProfile cProfile.run("kmean()") Profiler output: In [570]: %run -t -e kmean2.py 84 function calls in 8.567 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 8.567 8.567 <string>:1(<module>) 10 0.000 0.000 0.835 0.084 fromnumeric.py:375(sum) 10 0.000 0.000 0.000 0.000 kmean2.py:10(a) 10 6.837 0.684 6.837 0.684 kmean2.py:12(b) 10 0.623 0.062 0.623 0.062 kmean2.py:14(c) 10 0.000 0.000 0.835 0.084 kmean2.py:16(d) 10 0.080 0.008 0.080 0.008 kmean2.py:18(e) 1 0.180 0.180 8.567 8.567 kmean2.py:5(kmean2) 10 0.000 0.000 0.000 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.012 0.012 0.012 0.012 {method 'randn' of 'mtrand.RandomState' objects} 10 0.835 0.083 0.835 0.083 {method 'sum' of 'numpy.ndarray' objects} It seems the b part of the computation is taking most of the time. I tried to simulate this separately: In [571]: x1 = N.random.randn(2000,39) In [572]: y1 = N.random.randn(64,39) In [574]: %timeit z1=x1[...,N.newaxis,...]-y1 10 loops, best of 3: 703 ms per loop In [575]: z1.shape Out[575]: (2000, 64, 39) As far as I can figure, this operation is doing 2000*64*39 subtractions. Doing this straight up yields the following: In [576]: x2 = N.random.randn(2000,64,39) In [577]: y2 = N.random.randn(2000,64,39) In [578]: %timeit z2 = x2-y2 10 loops, best of 3: 108 ms per loop Does anybody have any ideas on why this is so much faster? Hopefully I didn't mess up somewhere... Thanks! Regards, Albert P.S. I'm used NumPy 1.0rc1 with Python 2.5 on Windows for these tests. I get similar results with 1.0.dev3239 with Python 2.4.3 on Linux. P.P.S. SourceForge and GMail *really* don't like each other right now. |