numpy-discussion Mailing List for Numerical Python (Page 231)

A package for scientific computing with Python

Brought to you by: charris208, jarrodmillman, kern, rgommers, teoliphant

numpy-discussion — Discussion list for all users of Numerical Python

You can subscribe to this list here.

2000	Jan (8)	Feb (49)	Mar (48)	Apr (28)	May (37)	Jun (28)	Jul (16)	Aug (16)	Sep (44)	Oct (61)	Nov (31)	Dec (24)
2001	Jan (56)	Feb (54)	Mar (41)	Apr (71)	May (48)	Jun (32)	Jul (53)	Aug (91)	Sep (56)	Oct (33)	Nov (81)	Dec (54)
2002	Jan (72)	Feb (37)	Mar (126)	Apr (62)	May (34)	Jun (124)	Jul (36)	Aug (34)	Sep (60)	Oct (37)	Nov (23)	Dec (104)
2003	Jan (110)	Feb (73)	Mar (42)	Apr (8)	May (76)	Jun (14)	Jul (52)	Aug (26)	Sep (108)	Oct (82)	Nov (89)	Dec (94)
2004	Jan (117)	Feb (86)	Mar (75)	Apr (55)	May (75)	Jun (160)	Jul (152)	Aug (86)	Sep (75)	Oct (134)	Nov (62)	Dec (60)
2005	Jan (187)	Feb (318)	Mar (296)	Apr (205)	May (84)	Jun (63)	Jul (122)	Aug (59)	Sep (66)	Oct (148)	Nov (120)	Dec (70)
2006	Jan (460)	Feb (683)	Mar (589)	Apr (559)	May (445)	Jun (712)	Jul (815)	Aug (663)	Sep (559)	Oct (930)	Nov (373)	Dec

Flat | Threaded

<< < 1 .. 229 230 231 232 233 .. 480 > >> (Page 231 of 480)

Re[2]: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Alan G I. <ai...@am...> - 2006-02-21 21:42:13

On Tue, 21 Feb 2006, (CET) Mads Ipsen apparently wrote: 
> My system consists of N particles, whose coordinates in 
> the xy-plane is given by the two vectors x and y. I need 
> to calculate the distance between all particle pairs 

Of possible interest?
http://www.cs.umd.edu/~mount/ANN/

Cheers,
Alan Isaac

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Mads I. <mp...@os...> - 2006-02-21 21:04:31

On Tue, 21 Feb 2006, Tim Hochberg wrote:

> Can you explain a little more about what you are trying to calculate?
> The bit about subtracting off  box*rint(dx/box) is a little odd. It
> almost seems like you should be able to do something with fmod, but I
> admit that I'm not sure how.
>
> If I had to guess as to source of the relative slowness I'd say it's
> because you are creating a lot of temporary matrices. There are ways to
> avoid this, but when taken to the extreme, they make your code look
> ugly. You might try the following, untested, code or some variation and
> see if it speeds things up. This makes extensive use of the little known
> optional destination argument for ufuncs. I only tend to do this sort of
> stuff where it's very critical since, as you can see, it makes things
> quite ugly.
>
>      dx_space = x.copy()
>      dy_space = y.copy()
>      scratch_space = x.copy()
>      for i in range(n-1):
>         dx = dx_space[i+1:]
>         dy = dy_space[i+1:]
>         scratch = scratch_space[i+1:]
>         subtract(x[i+1:], x[i], dx)
>         subtract(y[i+1:], y[i], dy)
>         #   dx -= box*rint(dx/box)
>         divide(dx, box, scratch)
>         rint(scratch, scratch)
>         scratch *= box
>         dx -= scratch
>         # dy -= box*rint(dy/box)
>         divide(dy, box, scratch)
>         rint(scratch, scratch)
>         scratch *= box
>         dy -= scratch
>         r2 = dx**2 + dy**2      # square of dist. between points
>
>
>
> Hope that helps:
>
> -tim

Here's what I am trying to do:

My system consists of N particles, whose coordinates in the xy-plane is
given by the two vectors x and y. I need to calculate the distance between
all particle pairs, which goes like this:

I pick particle 1 and calculate its distance to the N-1 other points. Then
I pick particle 2. Since its distance to particle 1 was found in the
previuos step, I only have to find its distance to the N-2 remaining
points. In the i'th step, I therefore only have to consider particle i+1
up to particle N. That explains the loop structure, where

  dx = x[i+1:] - x[i]
  dy = y[i+1:] - y[i]

the resulting vectors dx and dy will contain the x-distances from x[i] to
the proceeding points from i+1 up to N. The square of the distance r2 is
the given by

  r2 = dx**2 + dy**2

Another approach would be to use

  dx = subtract.outer(x,x)
  dy = subtract.outer(y,y)

but that will be overkill, since all distances are counted twice, and
also, the storage requirements grow rapidly if you have more than 1000
particles (app. 10^6 particle pairs).

Thanks for your code feedback, which I'll have a closer look at.  But I
try to believe, that numpy/Numeric/Python was invented with the one
purpose of avoiding coding like this - I think this is also a point you
already made. But thanks again.

// Mads

Re: [Numpy-discussion] mixing arrays and matrices: squeeze yes, flattened no?

From: Christopher B. <Chr...@no...> - 2006-02-21 20:48:18

Travis Oliphant wrote:

> You can mix arrays and matrices just fine if you remember that 1d arrays 
> are equivalent to row-vectors.

and you can easily get a column vector out of an array, if you remember 
that you want to keep it 2-d. i.e. use a slice rather than an index:

 >>> import numpy as N
 >>> a = N.ones((5,10))
 >>> a[:,1].shape # an index: it reduces the rank
(5,)
 >>> a[:,1:2].shape # a slice: it keeps the rank
(5, 1)

-Chris







-- 
Christopher Barker, Ph.D.
Oceanographer
                                     		
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no...

[Numpy-discussion] Problems building numpy w/ ATLAS on Solaris 8

From: <sk...@po...> - 2006-02-21 20:20:00

After a brief hiatus I'm back to trying to build numpy.  Last time I checked
in (on the scipy list) I had successfully built ATLAS and created this
simple site.cfg file in .../numpy/distutils/site.cfg:

    [atlas]
    library_dirs = /home/titan/skipm/src/ATLAS/lib/SunOS_Babe
    include_dirs = /home/titan/skipm/src/ATLAS/include/SunOS_Babe
    # for overriding the names of the atlas libraries
    atlas_libs = lapack, f77blas, cblas, atlas

I svn up'd (now at rev 2138), zapped my build directory, then executed
"python setup.py build".  Just in case it matters, I'm using Python 2.4.2
built with GCC 3.4.1 on Solaris 8.  Here's the output of my build attempt:

    Running from numpy source directory.
    No module named __svn_version__
    F2PY Version 2_2138
    blas_opt_info:
    blas_mkl_info:
    /home/ink/skipm/src/numpy/numpy/distutils/system_info.py:531: UserWarning: Library error: libs=['mkl', 'vml', 'guide'] found_libs=[]
      warnings.warn("Library error: libs=%s found_libs=%s" % \
      NOT AVAILABLE
    
    atlas_blas_threads_info:
    Setting PTATLAS=ATLAS
    /home/ink/skipm/src/numpy/numpy/distutils/system_info.py:531: UserWarning: Library error: libs=['lapack', 'f77blas', 'cblas', 'atlas'] found_libs=[]
      warnings.warn("Library error: libs=%s found_libs=%s" % \
    Setting PTATLAS=ATLAS
    Setting PTATLAS=ATLAS
      FOUND:
        libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
        library_dirs = ['/home/titan/skipm/src/ATLAS/lib/SunOS_Babe']
        language = c
        include_dirs = ['/opt/include']
    ...

See my site.cfg file?  Why does it affect library_dirs but not include_dirs?

    running build_src
    building extension "atlas_version" sources
      adding 'build/src/atlas_version_0x33c6fa32.c' to sources.
    running build_ext
    customize UnixCCompiler
    customize UnixCCompiler using build_ext
    building 'atlas_version' extension
    compiling C sources
    gcc options: '-fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC'
    compile options: '-I/opt/include -Inumpy/core/include -I/opt/app/g++lib6/python-2.4/include/python2.4 -c'
    /opt/lang/gcc-3.4/bin/gcc -shared build/temp.solaris-2.8-i86pc-2.4/build/src/atlas_version_0x33c6fa32.o -L/home/titan/skipm/src/ATLAS/lib/SunOS_Babe -llapack -lf77blas -lcblas -latlas -o build/temp.solaris-2.8-i86pc-2.4/atlas_version.so
    Text relocation remains                         referenced
        against symbol                  offset      in file
    <unknown>                           0x7         /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    <unknown>                           0xc         /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    ... bunch of missing <unknown>s elided ...
    printf                              0x1b        /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    printf                              0x2d        /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    printf                              0x3f        /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    printf                              0x51        /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    ... what's this? can't find printf??? ...
    ld: fatal: relocations remain against allocatable but non-writable sections
    collect2: ld returned 1 exit status
    Text relocation remains                         referenced
        against symbol                  offset      in file
    <unknown>                           0x7         /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    ... more eliding ...
    printf                              0x108       /home/titan/skipm/src/ATLAS/lib/SunOS_Babe/libatlas.a(ATL_buildinfo.o)
    ld: fatal: relocations remain against allocatable but non-writable sections
    collect2: ld returned 1 exit status
    ##### msg: error: Command "/opt/lang/gcc-3.4/bin/gcc -shared build/temp.solaris-2.8-i86pc-2.4/build/src/atlas_version_0x33c6fa32.o -L/home/titan/skipm/src/ATLAS/lib/SunOS_Babe -llapack -lf77blas -lcblas -latlas -o build/temp.solaris-2.8-i86pc-2.4/atlas_version.so" failed with exit status 1
    error: Command "/opt/lang/gcc-3.4/bin/gcc -shared build/temp.solaris-2.8-i86pc-2.4/build/src/atlas_version_0x33c6fa32.o -L/home/titan/skipm/src/ATLAS/lib/SunOS_Babe -llapack -lf77blas -lcblas -latlas -o build/temp.solaris-2.8-i86pc-2.4/atlas_version.so" failed with exit status 1
      FOUND:
        libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
        library_dirs = ['/home/titan/skipm/src/ATLAS/lib/SunOS_Babe']
        language = c
        define_macros = [('NO_ATLAS_INFO', 2)]
        include_dirs = ['/opt/include']
    
    Warning: distutils distribution has been initialized, it may be too late to add an extension _dotblas
    ...

How can I initialize things earlier? Does it matter?

    Traceback (most recent call last):
      File "setup.py", line 76, in ?
        setup_package()
      File "setup.py", line 63, in setup_package
        config.add_subpackage('numpy')
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 592, in add_subpackage
        config_list = self.get_subpackage(subpackage_name,subpackage_path)
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 582, in get_subpackage
        subpackage_path)
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 539, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "/home/ink/skipm/src/numpy/numpy/setup.py", line 10, in configuration
        config.add_subpackage('core')
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 592, in add_subpackage
        config_list = self.get_subpackage(subpackage_name,subpackage_path)
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 582, in get_subpackage
        subpackage_path)
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 539, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "numpy/core/setup.py", line 215, in configuration
        config.add_data_dir('tests')
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 636, in add_data_dir
        self.add_data_files((ds,filenames))
      File "/home/ink/skipm/src/numpy/numpy/distutils/misc_util.py", line 702, in add_data_files
        dist.data_files.extend(data_dict.items())
    AttributeError: 'NoneType' object has no attribute 'extend'

And finally, a traceback.  What's up with that?

In parallel with trying to build with ATLAS I'm also trying Travis's
suggestion of explicitly setting PTATLAS, ATLAS and BLAS to "None".  Numpy
builds when I do that.

-- 
Skip Montanaro - sk...@po...
      "The values  to which people cling  most stubbornly  under inappropriate
      conditions are those  values that were  previously  the source  of their
      greatest triumphs over adversity." -- Jared Diamond in "Collapse"

Re: [Numpy-discussion] mixing arrays and matrices: squeeze yes, flattened no?

From: Travis O. <oli...@ie...> - 2006-02-21 19:12:47

Sven Schreiber wrote:

>Hi, sometimes I'm still struggling with peculiarities of numpy-arrays
>vs. numpy-matrices; my latest story goes like this:
>
>I first slice out a column of a 2d-numpy-array (a = somearray[:,1]). I
>can just manage to understand the resulting shape ( == (112,) ).
>
>Then I slice a column from a numpy-matrix b = somematrix[:,1] and get
>the expected (112,1) shape.
>
>Then I do what I thought was the easiest thing in the world, I subtract
>the two vectors: c = a - b
>I was very surprised by the bug that showed up due to the fact that
>c.shape == (112,112) !!
>  
>
As you know this isn't a bug, but very expected behavior.  I don't see 
this changing any time soon.  Arrays are different than matrices.  
Matrices are always 2-d arrays while arrays can have any number of 
dimensions. 

The default relationship between arrays and matrices is that 1-d arrays 
get converted to row-matrices (1,N).   Regardless of which convention is 
chosen somebody will be bitten by that conversion if they think in terms 
of the other default.   I don't see a way around that except to be 
careful when you mix arrays and matrices.

>Next, I try to workaround by b.squeeze(). That seems to work, but why is
>b.squeeze().shape == (1, 112) instead of (112,)?
>  
>
Again the same reason as before.  A matrix is returned from b.squeeze() 
and there are no 1-d matrices.   Thus, you get a row-vector.   Use .T if 
you want a column vector.

>Then I thought maybe b.flattened() does the job, but then I get an error
>(matrix has no attr flattened). Again, I'm baffled.
>  
>
The correct spelling is b.flatten()

And again you are going to get a (1,N) matrix out because of how 1d 
arrays are interpreted as matrices.

In short, there is no way to get a 1-d matrix because that doesn't make 
sense.  You can get a 1-d array using

b.A.squeeze()

>Second (preliminary) conclusion: I will paranoically use even more
>asmatrix()-conversions in my code to avoid dealing with those
>array-beasts ;-) and get column vectors I can trust...
>
>  
>
>Is there a better general advice than to say: "numpy-matrices and
>numpy-arrays are best kept in separated worlds" ?
>  
>
You can mix arrays and matrices just fine if you remember that 1d arrays 
are equivalent to row-vectors. 

-Travis

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Tim H. <tim...@co...> - 2006-02-21 19:11:38

Mads Ipsen wrote:

>On Tue, 21 Feb 2006, Zachary Pincus wrote:
>
>  
>
>>Mads,
>>
>>The game with numpy, just as it is with Matlab or any other
>>interpreted numeric environment, is to try push as much of the
>>looping down into the C code as you can. This is because, as you now
>>know, compiled C can loop much faster than interpreted python.
>>
>>A simple example for averaging 1000 (x,y,z) points:
>>
>>print data.shape
>>(1000, 3)
>># bad: explicit for loop in python
>>avg = numpy.zeros(3, numpy.float_)
>>for i in data: avg += i
>>avg /= 1000.0
>>
>># good: implicit for loop in C
>>avg = numpy.add.reduce(data, axis = 0)
>>avg /= 1000.0
>>
>>In your case, instead of explicitly looping through each point, why
>>not do the calculations in parallel, operating on entire vectors of
>>points at one time? Then the looping is "pushed down" into compiled C
>>code. Or if you're really lucky, it's pushed all the way down to the
>>vector math units on your cpu if you have a good BLAS or whatever
>>installed.
>>
>>Zach Pincus
>>
>>Program in Biomedical Informatics and Department of Biochemistry
>>Stanford University School of Medicine
>>
>>
>>
>>    
>>
>>>2. Here is the main loop for finding all possible pair distances,
>>>   which corresponds to a loop over the upper triangular part of a
>>>   square matrix
>>>
>>>     # Loop over all particles
>>>     for i in range(n-1):
>>>        dx = x[i+1:] - x[i]
>>>        dy = y[i+1:] - y[i]
>>>
>>>        dx -= box*rint(dx/box)
>>>        dy -= box*rint(dy/box)
>>>
>>>        r2 = dx**2 + dy**2      # square of dist. between points
>>>
>>>where x and y contain the positions of the particles. A naive
>>>implementation in C is
>>>
>>>
>>>    // loop over all particles
>>>    for (int i=0; i<n-1; i++) {
>>>        for (int j=i+1; j<n; j++) {
>>>            dx = x[j] - x[i];
>>>            dy = y[j] - y[i];
>>>
>>>            dx -= box*rint(dx/box);
>>>            dy -= box*rint(dy/box);
>>>
>>>            r2 = dx*dx + dy*dy;
>>>        }
>>>    }
>>>
>>>For n = 2500 particles, i.e. 3123750 particle pairs, the C loop is
>>>app. 10 times faster than the Python/Numeric counterpart. This is of
>>>course not satisfactory.
>>>
>>>Are there any things I am doing completely wrong here, basic
>>>approaches completely misunderstood, misuses etc?
>>>
>>>Any suggestions, guidelines, hints are most welcome.
>>>
>>>Best regards,
>>>
>>>Mads Ipsen
>>>
>>>
>>>+---------------------------------+-------------------------+
>>>| Mads Ipsen                      |                         |
>>>| Dept. of Chemistry              | phone:     +45-35320220 |
>>>| H.C.Ørsted Institute            | fax:       +45-35320322 |
>>>| Universitetsparken 5            |                         |
>>>| DK-2100 Copenhagen Ø, Denmark   | mp...@os...         |
>>>+---------------------------------+-------------------------+
>>>
>>>
>>>-------------------------------------------------------
>>>This SF.net email is sponsored by: Splunk Inc. Do you grep through
>>>log files
>>>for problems?  Stop!  Download the new AJAX search engine that makes
>>>searching your log files as easy as surfing the  web.  DOWNLOAD
>>>SPLUNK!
>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid3432&bid#0486&dat1642
>>>_______________________________________________
>>>Numpy-discussion mailing list
>>>Nump
>>>      
>>>
>
>I agree completely with your comments. But, as you can, the innermost part
>of the loop has been removed in the code, and replaced with numpy slices.
>It's hard for me to see how to compress the outer loop as well, since it
>determines the ranges for the inner loop. Unless there is some fancy slice
>notation, that allows you to loop over a triangular part of a matrix, ie.
>
>    x[i] = sum(A[i+1,i])
>
>meaning x[i] = sum of elements in i'th row of A using only elements from
>position i+1 up to n.
>
>Of course, there is the possibility of hardcoding this in C and then make
>it available as a Python module. But I don't want to do this before I am
>sure there isn't a numpy way out this.
>
>Let me know, if you have any suggestions.
>

Can you explain a little more about what you are trying to calculate? 
The bit about subtracting off  box*rint(dx/box) is a little odd. It 
almost seems like you should be able to do something with fmod, but I 
admit that I'm not sure how.

If I had to guess as to source of the relative slowness I'd say it's 
because you are creating a lot of temporary matrices. There are ways to 
avoid this, but when taken to the extreme, they make your code look 
ugly. You might try the following, untested, code or some variation and 
see if it speeds things up. This makes extensive use of the little known 
optional destination argument for ufuncs. I only tend to do this sort of 
stuff where it's very critical since, as you can see, it makes things 
quite ugly.

     dx_space = x.copy()
     dy_space = y.copy()
     scratch_space = x.copy()
     for i in range(n-1):
        dx = dx_space[i+1:]
        dy = dy_space[i+1:]
        scratch = scratch_space[i+1:]
        subtract(x[i+1:], x[i], dx)
        subtract(y[i+1:], y[i], dy) 
        #   dx -= box*rint(dx/box)
        divide(dx, box, scratch)
        rint(scratch, scratch)
        scratch *= box
        dx -= scratch
        # dy -= box*rint(dy/box)
        divide(dy, box, scratch)
        rint(scratch, scratch)
        scratch *= box
        dy -= scratch
        r2 = dx**2 + dy**2      # square of dist. between points



Hope that helps:

-tim

Re: [Numpy-discussion] optimizing power() for complex and real cases

From: Travis O. <oli...@ie...> - 2006-02-21 18:57:55

Tim Hochberg wrote:

> Yeah, sort of. I meant that the little helper functions that ufuncs 
> call, such as DOUBLE_multiply,  take the same types of arguments. 
> However, I just realized that I'm not certain that's true -- I just 
> assumed it because all the one's I've ever seen do. Also, this isn't 
> really a problem anyway -- the real problem is the slow conversion of 
> Python scalars to arrays in ufuncs.

Yes, that is true.  We have only defined multiplication for same-types.  
But, I just wanted to clarify that the ufunc machinery is more general 
than that, because others have been confused in the past. 

> I took a look at this earlier and it appears that the reason that 
> conversion of Python scalars are slow is that FromAny trys every other 
> conversion first. The check for Python scalars looks pretty cheap, so 
> it seems reasonable to check for them and do the appropriate 
> conversion early. Do the ufunc's call EnsureArray or FromAny? If the 
> former it would seem pretty straighforward to just stick another check 
> in there. Then David's original strategy of optimizing in DOUBLE_pow 
> should be close to as fast as what I'm doing.

Yes, I suspect the biggest slow-downs are the two attribute lookups 
which allow anything with __array__ or the array interface defined to be 
used. 

I think we could special-case Python scalars in that code.

-Travis

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Mads I. <mp...@os...> - 2006-02-21 18:24:10

On Tue, 21 Feb 2006, Zachary Pincus wrote:

> Mads,
>
> The game with numpy, just as it is with Matlab or any other
> interpreted numeric environment, is to try push as much of the
> looping down into the C code as you can. This is because, as you now
> know, compiled C can loop much faster than interpreted python.
>
> A simple example for averaging 1000 (x,y,z) points:
>
> print data.shape
> (1000, 3)
> # bad: explicit for loop in python
> avg =3D numpy.zeros(3, numpy.float_)
> for i in data: avg +=3D i
> avg /=3D 1000.0
>
> # good: implicit for loop in C
> avg =3D numpy.add.reduce(data, axis =3D 0)
> avg /=3D 1000.0
>
> In your case, instead of explicitly looping through each point, why
> not do the calculations in parallel, operating on entire vectors of
> points at one time? Then the looping is "pushed down" into compiled C
> code. Or if you're really lucky, it's pushed all the way down to the
> vector math units on your cpu if you have a good BLAS or whatever
> installed.
>
> Zach Pincus
>
> Program in Biomedical Informatics and Department of Biochemistry
> Stanford University School of Medicine
>
>
>
> > 2. Here is the main loop for finding all possible pair distances,
> >    which corresponds to a loop over the upper triangular part of a
> >    square matrix
> >
> >      # Loop over all particles
> >      for i in range(n-1):
> >         dx =3D x[i+1:] - x[i]
> >         dy =3D y[i+1:] - y[i]
> >
> >         dx -=3D box*rint(dx/box)
> >         dy -=3D box*rint(dy/box)
> >
> >         r2 =3D dx**2 + dy**2      # square of dist. between points
> >
> > where x and y contain the positions of the particles. A naive
> > implementation in C is
> >
> >
> >     // loop over all particles
> >     for (int i=3D0; i<n-1; i++) {
> >         for (int j=3Di+1; j<n; j++) {
> >             dx =3D x[j] - x[i];
> >             dy =3D y[j] - y[i];
> >
> >             dx -=3D box*rint(dx/box);
> >             dy -=3D box*rint(dy/box);
> >
> >             r2 =3D dx*dx + dy*dy;
> >         }
> >     }
> >
> > For n =3D 2500 particles, i.e. 3123750 particle pairs, the C loop is
> > app. 10 times faster than the Python/Numeric counterpart. This is of
> > course not satisfactory.
> >
> > Are there any things I am doing completely wrong here, basic
> > approaches completely misunderstood, misuses etc?
> >
> > Any suggestions, guidelines, hints are most welcome.
> >
> > Best regards,
> >
> > Mads Ipsen
> >
> >
> > +---------------------------------+-------------------------+
> > | Mads Ipsen                      |                         |
> > | Dept. of Chemistry              | phone:     +45-35320220 |
> > | H.C.=D8rsted Institute            | fax:       +45-35320322 |
> > | Universitetsparken 5            |                         |
> > | DK-2100 Copenhagen =D8, Denmark   | mp...@os...         |
> > +---------------------------------+-------------------------+
> >
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc. Do you grep through
> > log files
> > for problems?  Stop!  Download the new AJAX search engine that makes
> > searching your log files as easy as surfing the  web.  DOWNLOAD
> > SPLUNK!
> > http://sel.as-us.falkag.net/sel?cmd=3Dlnk&kid=103432&bid#0486&dat=12164=
2
> > _______________________________________________
> > Numpy-discussion mailing list
> > Nump

I agree completely with your comments. But, as you can, the innermost part
of the loop has been removed in the code, and replaced with numpy slices.
It's hard for me to see how to compress the outer loop as well, since it
determines the ranges for the inner loop. Unless there is some fancy slice
notation, that allows you to loop over a triangular part of a matrix, ie.

    x[i] =3D sum(A[i+1,i])

meaning x[i] =3D sum of elements in i'th row of A using only elements from
position i+1 up to n.

Of course, there is the possibility of hardcoding this in C and then make
it available as a Python module. But I don't want to do this before I am
sure there isn't a numpy way out this.

Let me know, if you have any suggestions.

// Mads

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Zachary P. <zp...@st...> - 2006-02-21 17:18:13

Mads,

The game with numpy, just as it is with Matlab or any other =20
interpreted numeric environment, is to try push as much of the =20
looping down into the C code as you can. This is because, as you now =20
know, compiled C can loop much faster than interpreted python.

A simple example for averaging 1000 (x,y,z) points:

print data.shape
(1000, 3)
# bad: explicit for loop in python
avg =3D numpy.zeros(3, numpy.float_)
for i in data: avg +=3D i
avg /=3D 1000.0

# good: implicit for loop in C
avg =3D numpy.add.reduce(data, axis =3D 0)
avg /=3D 1000.0

In your case, instead of explicitly looping through each point, why =20
not do the calculations in parallel, operating on entire vectors of =20
points at one time? Then the looping is "pushed down" into compiled C =20=

code. Or if you're really lucky, it's pushed all the way down to the =20
vector math units on your cpu if you have a good BLAS or whatever =20
installed.

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine



> 2. Here is the main loop for finding all possible pair distances,
>    which corresponds to a loop over the upper triangular part of a
>    square matrix
>
>      # Loop over all particles
>      for i in range(n-1):
>         dx =3D x[i+1:] - x[i]
>         dy =3D y[i+1:] - y[i]
>
>         dx -=3D box*rint(dx/box)
>         dy -=3D box*rint(dy/box)
>
>         r2 =3D dx**2 + dy**2      # square of dist. between points
>
> where x and y contain the positions of the particles. A naive
> implementation in C is
>
>
>     // loop over all particles
>     for (int i=3D0; i<n-1; i++) {
>         for (int j=3Di+1; j<n; j++) {
>             dx =3D x[j] - x[i];
>             dy =3D y[j] - y[i];
>
>             dx -=3D box*rint(dx/box);
>             dy -=3D box*rint(dy/box);
>
>             r2 =3D dx*dx + dy*dy;
>         }
>     }
>
> For n =3D 2500 particles, i.e. 3123750 particle pairs, the C loop is
> app. 10 times faster than the Python/Numeric counterpart. This is of
> course not satisfactory.
>
> Are there any things I am doing completely wrong here, basic
> approaches completely misunderstood, misuses etc?
>
> Any suggestions, guidelines, hints are most welcome.
>
> Best regards,
>
> Mads Ipsen
>
>
> +---------------------------------+-------------------------+
> | Mads Ipsen                      |                         |
> | Dept. of Chemistry              | phone:     +45-35320220 |
> | H.C.=D8rsted Institute            | fax:       +45-35320322 |
> | Universitetsparken 5            |                         |
> | DK-2100 Copenhagen =D8, Denmark   | mp...@os...         |
> +---------------------------------+-------------------------+
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through =20
> log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD =20
> SPLUNK!
> http://sel.as-us.falkag.net/sel?cmd=3Dlnk&kid=103432&bid#0486&dat=121642=

> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Pau G. <pau...@gm...> - 2006-02-21 16:01:16

> the closest integer, i.e. for x =3D array([1.1, 1.8])
>
>   around(x) =3D [1.0, 2.0]
>
> whereas
>
>   x.astype(int).astype(float) =3D [1.0, 1.0]
>
(x+0.5).astype(int).astype(float) =3D [1.0, 2.0]

i hope it helps,
pau

Re: [Numpy-discussion] wiki page for record arrays

From: Stefan v. d. W. <st...@su...> - 2006-02-21 15:55:44

Hi Gary

Thanks for your suggestions.  I incorporated them.

St=E9fan

On Wed, Feb 22, 2006 at 01:24:24AM +1100, Gary Ruben wrote:
> 1. I'd put 'assumes from numpy import *' in the preamble.

> 2. Is it possible to change the formatting to make it more obvious what=
=20
> is input and what is output? I think it is better to show the input and=
=20
> output with a standard Python prompt a'la idle or possibly ipython.

> 3. I think it might be worth pointing out that
>=20
> img =3D array([(0,0,0), (1,0,0), (0,1,0), (0,0,1)], [('r',Float32),('g'=
,F
> loat32),('b',Float32)])
>=20
> is valid syntax that can be replaced by the 2-line version you present.=
=20

> 4. Can you explain dtype=3D(void,12)?

> 5. When the page's name is changed, a link should be put to it in the=20
> 'Getting Started and Tutorial' section of the Documentation page.

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Sasha <nd...@ma...> - 2006-02-21 15:31:49

On 2/21/06, Mads Ipsen <mp...@os...> wrote:
> Maybe I am wrong here, but around() and rint() is supposed to round to
> the closest integer, i.e. for x =3D array([1.1, 1.8])

You are right. In the follow-up I've suggested to speed-up the case
decimals=3D0 in around in around instead of adding another function.  I
think that would be a more "pythonic" solution.

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Mads I. <mp...@os...> - 2006-02-21 15:23:01

On Tue, 21 Feb 2006, Alexander Belopolsky wrote:

> On 2/21/06, Mads Ipsen <mp...@os...> wrote:
> > I suggest that rint() is added as a ufunc or is there any concerns
> > here that I am not aware of?
>
> You might want to use astype(int).  On my system it is much faster than around:
>
> > python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "around(x)"
> 10000 loops, best of 3: 176 usec per loop
> > python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "x.astype(int)"
> 100000 loops, best of 3: 3.2 usec per loop
>
> the difference is too big to be explained by the fact that around
> allocates twice as much memory for the result.  In fact the following
> equivalent of rint is still very fast:
>
> > python -m timeit -s "from numpy import array, around;

Maybe I am wrong here, but around() and rint() is supposed to round to
the closest integer, i.e. for x = array([1.1, 1.8])

  around(x) = [1.0, 2.0]

whereas

  x.astype(int).astype(float) = [1.0, 1.0]

This particular property of around() as well as rint() is crucial for my
application.

// Mads

Re: [Numpy-discussion] algorithm, optimization, or other problem?

From: Bruce S. <bso...@gm...> - 2006-02-21 15:15:20

Hi,
In the current version, note that Y is scalar so replace the squaring
(Y**2) with Y*Y as you do in the dohebb function.  On my system
without blas etc removing the squaring removes a few seconds (16.28 to
12.4). It did not seem to help factorizing Y.

Also, eta and tau are constants so define them only once as scalars
outside the loops and do the division outside the loop. It only saves
about 0.2 seconds but these add up.

The inner loop probably can be vectorized because it is just vector
operations on a matrix. You are just computing over the ith dimension
of X.  I think that you could be able to find the matrix version on
the net.

Regards
Bruce



On 2/21/06, Brian Blais <bb...@br...> wrote:
> Hello,
>
> I am trying to translate some Matlab/mex code to Python, for doing neural
> simulations.  This application is definitely computing-time limited, and =
I need to
> optimize at least one inner loop of the code, or perhaps even rethink the=
 algorithm.
>    The procedure is very simple, after initializing any variables:
>
> 1) select a random input vector, which I will call "x".  right now I have=
 it as an
> array, and I choose columns from that array randomly.  in other cases, I =
may need to
> take an image, select a patch, and then make that a column vector.
>
> 2) calculate an output value, which is the dot product of the "x" and a w=
eight
> vector, "w", so
>
>         y=3Ddot(x,w)
>
> 3) modify the weight vector based on a matrix equation, like:
>
>         w=3Dw+ eta * (y*x - y**2*w)
>                ^
>                |
>                +---- learning rate constant
>
> 4) repeat steps 1-3 many times
>
> I've organized it like:
>
> for e in 100:   # outer loop
>      for i in 1000:  # inner loop
>          (steps 1-3)
>
>      display things.
>
> so that the bulk of the computation is in the inner loop, and is amenable=
 to
> converting to a faster language.  This is my issue:
>
> straight python, in the example posted below for 250000 inner-loop steps,=
 takes 20
> seconds for each outer-loop step.  I tried Pyrex, which should work very =
fast on such
> a problem, takes about 8.5 seconds per outer-loop step.  The same code as=
 a C-mex
> file in matlab takes 1.5 seconds per outer-loop step.
>
> Given the huge difference between the Pyrex and the Mex, I feel that ther=
e is
> something I am doing wrong, because the C-code for both should run compar=
ably.
> Perhaps the approach is wrong?  I'm willing to take any suggestions!  I d=
on't mind
> coding some in C, but the Python API seemed a bit challenging to me.
>
> One note: I am using the Numeric package, not numpy, only because I want =
to be able
> to use the Enthought version for Windows.  I develop on Linux, and haven'=
t had a
> chance to see if I can compile numpy using the Enthought Python for Windo=
ws.
>
> If there is anything else anyone needs to know, I'll post it.  I put the =
main script,
> and a dohebb.pyx code below.
>
>
>                         thanks!
>
>                                         Brian Blais
>
> --
> -----------------
>
>              bb...@br...
>              http://web.bryant.edu/~bblais
>
>
>
>
> # Main script:
>
> from dohebb import *
> import pylab as p
> from Numeric import *
> from RandomArray import *
> import time
>
> x=3Drandom((100,1000))    # 1000 input vectors
>
> numpats=3Dx.shape[0]
> w=3Drandom((numpats,1));
>
> th=3Drandom((1,1))
>
> params=3D{}
> params['eta']=3D0.001;
> params['tau']=3D100.0;
> old_mx=3D0;
> for e in range(100):
>
>      rnd=3Drandint(0,numpats,250000)
>      t1=3Dtime.time()
>      if 0:  # straight python
>          for i in range(len(rnd)):
>              pat=3Drnd[i]
>              xx=3Dreshape(x[:,pat],(1,-1))
>              y=3Dmatrixmultiply(xx,w)
>              w=3Dw+params['eta']*(y*transpose(xx)-y**2*w);
>              th=3Dth+(1.0/params['tau'])*(y**2-th);
>      else: # pyrex
>          dohebb(params,w,th,x,rnd)
>      print time.time()-t1
>
>
> p.plot(w,'o-')
> p.xlabel('weights')
> p.show()
>
>
> #=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>
> #         dohebb.pyx
>
> cdef extern from "Numeric/arrayobject.h":
>
>    struct PyArray_Descr:
>      int type_num, elsize
>      char type
>
>    ctypedef class Numeric.ArrayType [object PyArrayObject]:
>      cdef char *data
>      cdef int nd
>      cdef int *dimensions, *strides
>      cdef object base
>      cdef PyArray_Descr *descr
>      cdef int flags
>
>
> def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd):
>
>
>      cdef int num_iterations
>      cdef int num_inputs
>      cdef int offset
>      cdef double *wp,*xp,*thp
>      cdef int *rndp
>      cdef double eta,tau
>
>      eta=3Dparams['eta']  # learning rate
>      tau=3Dparams['tau']  # used for variance estimate
>
>      cdef double y
>      num_iterations=3Drnd.dimensions[0]
>      num_inputs=3Dw.dimensions[0]
>
>      # get the pointers
>      wp=3D<double *>w.data
>      xp=3D<double *>X.data
>      rndp=3D<int *>rnd.data
>      thp=3D<double *>th.data
>
>      for it from 0 <=3D it < num_iterations:
>
>          offset=3Drndp[it]*num_inputs
>
>          # calculate the output
>          y=3D0.0
>          for i from 0 <=3D i < num_inputs:
>              y=3Dy+wp[i]*xp[i+offset]
>
>          # change in the weights
>          for i from 0 <=3D i < num_inputs:
>              wp[i]=3Dwp[i]+eta*(y*xp[i+offset] - y*y*wp[i])
>
>          # estimate the variance
>          thp[0]=3Dthp[0]+(1.0/tau)*(y**2-thp[0])
>
>
>
>
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log fi=
les
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://sel.as-us.falkag.net/sel?cmd=3Dlnk&kid=3D103432&bid=3D230486&dat=
=3D121642
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>

Re: [Numpy-discussion] wiki page for record arrays

From: Travis N. V. <tr...@en...> - 2006-02-21 15:11:42

Travis N. Vaught wrote:
>  
> I've renamed it.  Now the page is at:
>
> http://www.scipy.org/RecordArray
>
Doh!  That should have been http://www.scipy.org/RecordArrays .

Re: [Numpy-discussion] wiki page for record arrays

From: Travis N. V. <tr...@en...> - 2006-02-21 15:05:10

Stefan van der Walt wrote:
> I wrote a short tutorial on using record arrays, which can be found at
>
> http://www.scipy.org/ArrayRecords
>
> The page is named ArrayRecords instead of RecordArrays, so I'd be glad
> if someone with priviledges could rename it.  Also, please fix any
> mistakes I might have made.
>   
...
I've renamed it.  Now the page is at:

http://www.scipy.org/RecordArray

Travis

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Sasha <nd...@ma...> - 2006-02-21 14:59:07

On the second thought, the difference between around and astype is not
surprising because around operates in terms of decimals.  Rather than
adding rint, I would suggest to make a special case decimals=3D0 use C
rint.

> On 2/21/06, Mads Ipsen <mp...@os...> wrote:
> > I suggest that rint() is added as a ufunc or is there any concerns
> > here that I am not aware of?
>
> You might want to use astype(int).  On my system it is much faster than a=
round:
>
> > python -m timeit -s "from numpy import array, around; x =3D array([1.5]=
*1000)" "around(x)"
> 10000 loops, best of 3: 176 usec per loop
> > python -m timeit -s "from numpy import array, around; x =3D array([1.5]=
*1000)" "x.astype(int)"
> 100000 loops, best of 3: 3.2 usec per loop
>
> the difference is too big to be explained by the fact that around
> allocates twice as much memory for the result.  In fact the following
> equivalent of rint is still very fast:
>
> > python -m timeit -s "from numpy import array, around; x =3D array([1.5]=
*1000)" "x.astype(int).astype(float)"
> 100000 loops, best of 3: 6.48 usec per loop
>

Re: [Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Alexander B. <ale...@gm...> - 2006-02-21 14:51:31

On 2/21/06, Mads Ipsen <mp...@os...> wrote:
> I suggest that rint() is added as a ufunc or is there any concerns
> here that I am not aware of?

You might want to use astype(int).  On my system it is much faster than aro=
und:

> python -m timeit -s "from numpy import array, around; x =3D array([1.5]*1=
000)" "around(x)"
10000 loops, best of 3: 176 usec per loop
> python -m timeit -s "from numpy import array, around; x =3D array([1.5]*1=
000)" "x.astype(int)"
100000 loops, best of 3: 3.2 usec per loop

the difference is too big to be explained by the fact that around
allocates twice as much memory for the result.  In fact the following
equivalent of rint is still very fast:

> python -m timeit -s "from numpy import array, around; x =3D array([1.5]*1=
000)" "x.astype(int).astype(float)"
100000 loops, best of 3: 6.48 usec per loop

Re: [Numpy-discussion] wiki page for record arrays

From: Gary R. <gr...@bi...> - 2006-02-21 14:24:36

Thanks for this Stéfan,

Can I make some observations. I don't want to just change your 
formatting. I think it would be good to have some discussion about the 
formatting used in tutorials like this, because all should probably 
follow a standard presentation style.
I like the usage summary at the end.

1. I'd put 'assumes from numpy import *' in the preamble.
2. Is it possible to change the formatting to make it more obvious what 
is input and what is output? I think it is better to show the input and 
output with a standard Python prompt a'la idle or possibly ipython.

A couple of things specific to your examples:
3. I think it might be worth pointing out that

img = array([(0,0,0), (1,0,0), (0,1,0), (0,0,1)], [('r',Float32),('g',F
loat32),('b',Float32)])

is valid syntax that can be replaced by the 2-line version you present. 
Should the valid syntax for creating a record array be presented in EBNF 
format?

4. Can you explain dtype=(void,12)?
5. When the page's name is changed, a link should be put to it in the 
'Getting Started and Tutorial' section of the Documentation page.

What do you and others think?

Gary R.

Stefan van der Walt wrote:
> I wrote a short tutorial on using record arrays, which can be found at
> 
> http://www.scipy.org/ArrayRecords
> 
> The page is named ArrayRecords instead of RecordArrays, so I'd be glad
> if someone with priviledges could rename it.  Also, please fix any
> mistakes I might have made.
> 
> Regards
> Stéfan

[Numpy-discussion] Rookie problems - Why is C-code much faster?

From: Mads I. <mp...@os...> - 2006-02-21 14:17:15

Hey,

I am relatively new to python and Numeric, and am currently involved
in a project of developing molecular dynamics code written in python
and using Numeric for number crunching. During this, I've run into
some problems, that I hope I can get some assistance with here. My
main issues here are:

1. around() appears to be slow
2. C code appears to be much faster


1. One of the bottle necks in MD is the calculation of the distances
between all particle pairs.

In MD simulations with periodic boundary conditions, you have to
estimate the shortest distance between all the particle pairs in your
system. Fx. on a line of length box =3D 10, the distance dx between two
points x0 =3D 1 and x1 =3D 9, will be dx =3D -2 (and NOT dx =3D 9). One way=
 to
do this in numpy is

    dx  =3D x1 - x0
    dx -=3D around(dx/box)

My first observation here, is that around() seems to be very slow. So
I looked in umathmodule.c and implemented rint() from the C math
library and made my own custom Numeric module. This gives a speed-up
by a factor of app. 4 compared to around().

I suggest that rint() is added as a ufunc or is there any concerns
here that I am not aware of?

2. Here is the main loop for finding all possible pair distances,
   which corresponds to a loop over the upper triangular part of a
   square matrix

     # Loop over all particles
     for i in range(n-1):
        dx =3D x[i+1:] - x[i]
        dy =3D y[i+1:] - y[i]

        dx -=3D box*rint(dx/box)
        dy -=3D box*rint(dy/box)

        r2 =3D dx**2 + dy**2      # square of dist. between points

where x and y contain the positions of the particles. A naive
implementation in C is


    // loop over all particles
    for (int i=3D0; i<n-1; i++) {
        for (int j=3Di+1; j<n; j++) {
            dx =3D x[j] - x[i];
            dy =3D y[j] - y[i];

            dx -=3D box*rint(dx/box);
            dy -=3D box*rint(dy/box);

            r2 =3D dx*dx + dy*dy;
        }
    }

For n =3D 2500 particles, i.e. 3123750 particle pairs, the C loop is
app. 10 times faster than the Python/Numeric counterpart. This is of
course not satisfactory.

Are there any things I am doing completely wrong here, basic
approaches completely misunderstood, misuses etc?

Any suggestions, guidelines, hints are most welcome.

Best regards,

Mads Ipsen


+---------------------------------+-------------------------+
| Mads Ipsen                      |                         |
| Dept. of Chemistry              | phone:     +45-35320220 |
| H.C.=D8rsted Institute            | fax:       +45-35320322 |
| Universitetsparken 5            |                         |
| DK-2100 Copenhagen =D8, Denmark   | mp...@os...         |
+---------------------------------+-------------------------+

[Numpy-discussion] mixing arrays and matrices: squeeze yes, flattened no?

From: Sven S. <sve...@gm...> - 2006-02-21 13:42:10

Hi, sometimes I'm still struggling with peculiarities of numpy-arrays
vs. numpy-matrices; my latest story goes like this:

I first slice out a column of a 2d-numpy-array (a = somearray[:,1]). I
can just manage to understand the resulting shape ( == (112,) ).

Then I slice a column from a numpy-matrix b = somematrix[:,1] and get
the expected (112,1) shape.

Then I do what I thought was the easiest thing in the world, I subtract
the two vectors: c = a - b
I was very surprised by the bug that showed up due to the fact that
c.shape == (112,112) !!

First conclusion: broadcasting is nice and everything, but here I
somehow think that it shouldn't be like this. I like numpy, but this is
frustrating.

Next, I try to workaround by b.squeeze(). That seems to work, but why is
b.squeeze().shape == (1, 112) instead of (112,)?

Then I thought maybe b.flattened() does the job, but then I get an error
(matrix has no attr flattened). Again, I'm baffled.

Could someone please explain? I already own the numpy-book, otherwise I
wouldn't even have thought of using those methods, but here it hasn't
enlightened me.

Second (preliminary) conclusion: I will paranoically use even more
asmatrix()-conversions in my code to avoid dealing with those
array-beasts ;-) and get column vectors I can trust...

Is there a better general advice than to say: "numpy-matrices and
numpy-arrays are best kept in separated worlds" ?

Thanks for any insights,
Sven

[Numpy-discussion] wiki page for record arrays

From: Stefan v. d. W. <st...@su...> - 2006-02-21 12:28:12

I wrote a short tutorial on using record arrays, which can be found at

http://www.scipy.org/ArrayRecords

The page is named ArrayRecords instead of RecordArrays, so I'd be glad
if someone with priviledges could rename it.  Also, please fix any
mistakes I might have made.

Regards
St=E9fan

[Numpy-discussion] algorithm, optimization, or other problem?

From: Brian B. <bb...@br...> - 2006-02-21 12:26:01

Hello,

I am trying to translate some Matlab/mex code to Python, for doing neural 
simulations.  This application is definitely computing-time limited, and I need to 
optimize at least one inner loop of the code, or perhaps even rethink the algorithm. 
   The procedure is very simple, after initializing any variables:

1) select a random input vector, which I will call "x".  right now I have it as an 
array, and I choose columns from that array randomly.  in other cases, I may need to 
take an image, select a patch, and then make that a column vector.

2) calculate an output value, which is the dot product of the "x" and a weight 
vector, "w", so

	y=dot(x,w)

3) modify the weight vector based on a matrix equation, like:

	w=w+ eta * (y*x - y**2*w)
               ^
               |
               +---- learning rate constant

4) repeat steps 1-3 many times

I've organized it like:

for e in 100:   # outer loop
     for i in 1000:  # inner loop
         (steps 1-3)

     display things.

so that the bulk of the computation is in the inner loop, and is amenable to 
converting to a faster language.  This is my issue:

straight python, in the example posted below for 250000 inner-loop steps, takes 20 
seconds for each outer-loop step.  I tried Pyrex, which should work very fast on such 
a problem, takes about 8.5 seconds per outer-loop step.  The same code as a C-mex 
file in matlab takes 1.5 seconds per outer-loop step.

Given the huge difference between the Pyrex and the Mex, I feel that there is 
something I am doing wrong, because the C-code for both should run comparably. 
Perhaps the approach is wrong?  I'm willing to take any suggestions!  I don't mind 
coding some in C, but the Python API seemed a bit challenging to me.

One note: I am using the Numeric package, not numpy, only because I want to be able 
to use the Enthought version for Windows.  I develop on Linux, and haven't had a 
chance to see if I can compile numpy using the Enthought Python for Windows.

If there is anything else anyone needs to know, I'll post it.  I put the main script, 
and a dohebb.pyx code below.


			thanks!

					Brian Blais

-- 
-----------------

             bb...@br...
             http://web.bryant.edu/~bblais




# Main script:

from dohebb import *
import pylab as p
from Numeric import *
from RandomArray import *
import time

x=random((100,1000))    # 1000 input vectors

numpats=x.shape[0]
w=random((numpats,1));

th=random((1,1))

params={}
params['eta']=0.001;
params['tau']=100.0;
old_mx=0;
for e in range(100):

     rnd=randint(0,numpats,250000)
     t1=time.time()
     if 0:  # straight python
         for i in range(len(rnd)):
             pat=rnd[i]
             xx=reshape(x[:,pat],(1,-1))
             y=matrixmultiply(xx,w)
             w=w+params['eta']*(y*transpose(xx)-y**2*w);
             th=th+(1.0/params['tau'])*(y**2-th);
     else: # pyrex
         dohebb(params,w,th,x,rnd)
     print time.time()-t1


p.plot(w,'o-')
p.xlabel('weights')
p.show()


#=============================================

#         dohebb.pyx

cdef extern from "Numeric/arrayobject.h":

   struct PyArray_Descr:
     int type_num, elsize
     char type

   ctypedef class Numeric.ArrayType [object PyArrayObject]:
     cdef char *data
     cdef int nd
     cdef int *dimensions, *strides
     cdef object base
     cdef PyArray_Descr *descr
     cdef int flags


def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd):


     cdef int num_iterations
     cdef int num_inputs
     cdef int offset
     cdef double *wp,*xp,*thp
     cdef int *rndp
     cdef double eta,tau

     eta=params['eta']  # learning rate
     tau=params['tau']  # used for variance estimate

     cdef double y
     num_iterations=rnd.dimensions[0]
     num_inputs=w.dimensions[0]

     # get the pointers
     wp=<double *>w.data
     xp=<double *>X.data
     rndp=<int *>rnd.data
     thp=<double *>th.data

     for it from 0 <= it < num_iterations:

         offset=rndp[it]*num_inputs

         # calculate the output
         y=0.0
         for i from 0 <= i < num_inputs:
             y=y+wp[i]*xp[i+offset]

         # change in the weights
         for i from 0 <= i < num_inputs:
             wp[i]=wp[i]+eta*(y*xp[i+offset] - y*y*wp[i])

         # estimate the variance
         thp[0]=thp[0]+(1.0/tau)*(y**2-thp[0])

[Numpy-discussion] comparing container objects with arrays

From: Pearu P. <pe...@sc...> - 2006-02-21 08:54:27

Hi,

Question: what is the recommended way to compare two array objects? And 
when they are contained in a tuple or list or dictinary or etc.?

I ask because I found that arr1.__eq__(arr2) can return either bool or an 
array of bools when shape(arr1)!=shape(arr2) or shape(arr1)==shape(arr2), 
respectively:

>>> array([1,2])==array([1,0,0])
False
>>> array([1,2])==array([1,0])
array([True, False], dtype=bool)

I wonder if numpy users are happy with that? Shouldn't arr1==arr2 return 
bool as well because current __eq__ behaviour is handled by equal() 
function when the shapes are equal?

Note that if __eq__ would always return bool then the following codes
would work as I would expect:

>>> (1,array([1,2]))==(1,array([1,2]))
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

>>> # I would expect True, compare with
>>> (1,[1,2])==(1,[1,2])
True

>>> (1,array([1,2]))==(1,array([1,2,0]))
False

I started to write this message because

   object1 == object2

returns boolean for (all?) Python builtin objects but as soon
as these objects contain arrays, the test will fail with an exception.

May be numpy needs equalobjs(obj1,obj2) that always returns boolean
and can handle comparing objects like {1:array([1,2])}, [3,[array([2,2])],
etc.

Pearu

Re: [Numpy-discussion] optimizing power() for complex and real cases

From: Travis O. <oli...@ie...> - 2006-02-21 03:44:35

Tim Hochberg wrote:

>> Hmm, ufuncs are passed a void* argument for passing info to them. Now,
>> what that argument is defined when the ufunc is created, but maybe
>> there's a way to piggy-back on it.
>>  
>>
> Yeah, I really felt like I was fighting the ufuncs when I was playing 
> with this. On the one hand, you really want to use the ufunc 
> machinery. On the other hand that forces you into using the same types 
> for both arguments.


This is not true.  Ufuncs can have different types for their 
arguments.   Perhaps you meant something else?

> Just to add a little more confusion to the mix. I did a little testing 
> to see how close pow(x,n) and x*x*... actually are. They are slightly 
> less close for small values of N and slightly closer for large values 
> of N than I would have expected. The upshot of this is that integer 
> powers between  -2 and +4 all seem to vary by the same amount when 
> computed using pow(x,n) versus multiplies. I'm including the test code 
> at the end. Assuming that this result is not a fluke that expands the 
> noncontroversial set by at least 3 more values. That's starting to 
> strain the ufunc aproach, so perhaps optimizing in @TYP@_power is the 
> way to go after all. Or,  more likely, adding @TYP@_int_power or maybe 
> @TYP@_fast_power (so as to be able to include some half integer 
> powers) and dispatching appropriately from array_power.
>
> The problem here, of course, is the overhead that PyArray_EnsureArray 
> runs into. I'm not sure if the ufuncs actually call that, but I was 
> using that to convert things to arrays at one point and I saw the 
> slowdown, so I suspect that the slowdown is in something 
> PyArray_EnsureArray calls if not in that routine itself. I'm afraid to 
> dig into that stuff though.. On the other hand, it would probably 
> speed up all kinds of stuff if that was sped up.

EnsureArray simply has some short cuts and then calls PyArray_FromAny.  
PyArray_FromAny is the big array conversion code.  It converts anything 
(it can) to an array.

>> Too bad we couldn't make a function generator :-) [Well, we could using
>> weave...]\
>>  
>>
> Yaigh!

That's actually an interesting approach that could use some attention..

-Travis

258 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 .. 229 230 231 232 233 .. 480 > >> (Page 231 of 480)