[Numpy-discussion] C API questions and tricks.


Hi all,

I recently discovered the clip() function and thought it was just
what I'd been wanting: I need to do that a lot, it provides a nice
notation, and I was hoping for a speed improvement over a couple of
where() statements.

I did some experiments, and discovered that it was, in fact, slower than
the where statements, and that the fastest way to do it is to use two
putmask() calls. (note: this changes the array in place, rather than
creating a new one)

The reason it is not faster is that it is written in Python and calls
choose(), much as where() does. I decided that I could use a fast
version, so I set out to write one in C, which led to the following
questions:

A) How do I loop through all the elements of a discontiguous array of
arbitrary dimensions? If I knew the rank ahead of time, I could just
nest some for loops and use the strides[] values. Not knowing it
beforehand, it seems that I should be able to do some nesting of loops using
nd and dimensions[], but I can't seem to work it out. Someone must have
come up with a nifty way to do this. Is there an existing macro or
function to do it?
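
One common way to handle an unknown rank is to replace the nested loops
with a single loop that keeps an index counter, odometer style, and
carries from the last axis to the first. The sketch below is not taken
from the Numeric source; it is pure Python so the logic is easy to
follow, the name strided_offsets is made up for illustration, and the
strides are counted in elements (in C, the strides[] values are in
bytes and would be added to the char *data pointer instead).

```python
def strided_offsets(shape, strides):
    """Yield the flat offset of every element, last index varying fastest.

    shape and strides play the role of dimensions[] and strides[];
    here strides are in elements, not bytes (an assumption of this sketch).
    """
    nd = len(shape)
    if any(dim == 0 for dim in shape):
        return  # an empty array has nothing to visit
    index = [0] * nd   # current N-d index, like an odometer
    offset = 0         # flat offset that corresponds to `index`
    while True:
        yield offset
        # Advance the odometer: bump the last axis, carrying leftwards.
        axis = nd - 1
        while axis >= 0:
            index[axis] += 1
            offset += strides[axis]
            if index[axis] < shape[axis]:
                break
            # This axis overflowed: rewind it to 0 and carry to the next one.
            offset -= strides[axis] * shape[axis]
            index[axis] = 0
            axis -= 1
        if axis < 0:
            return  # every axis overflowed, so all elements were visited


# A 3x2 contiguous buffer viewed as its 2x3 transpose (a discontiguous view).
buffer = [0, 1, 2, 3, 4, 5]
transposed = [buffer[off] for off in strided_offsets((2, 3), (1, 2))]
```

The same counter-and-carry loop translates directly to C with an int
array of length nd for the index, so no rank-specific nesting is needed.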

B) How can I write a function that can act on any of the NumPy data
types? I currently have it written to only work with contiguous Float
arrays, which is what I need at the moment, but I'd much rather have one
for the general case.

C) I'd also like any feedback on other elements of my code (enclosed
with this email). A few comments on what the function (I've called it
fastclip) is supposed to do:

"""
fastclip(A,min,max)

changes the array, A, in place, so that all the elements less than min
are replaced by min, and all the elements greater than max are replaced
by max.

min and max can be either scalars, or anything that can be converted to
an array with the same number of elements as A (using
PyArray_ContiguousFromObject() ). If min and/or max is an array, then
the corresponding elements are used. This allows, among other things, a
way to clip to just a min or max value by calling it as:
fastclip(A,A,max) or fastclip(A,min,A).

"""
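
To make the intended behaviour concrete, here is a slow pure-Python
reference for the semantics described in the docstring above. It is only
a sketch for checking results against, not the enclosed C code; the
names fastclip_reference and _as_bounds are hypothetical, and plain
lists stand in for Numeric arrays.

```python
def _as_bounds(bound, n):
    """Expand a scalar to n copies, or check that a sequence has n elements."""
    if isinstance(bound, (int, float)):
        return [bound] * n
    values = list(bound)  # copying also makes fastclip(A, A, max) safe
    if len(values) != n:
        raise ValueError("bound must be a scalar or have %d elements" % n)
    return values


def fastclip_reference(a, lo, hi):
    """Clip the list `a` in place to [lo, hi]; lo and hi may be per-element."""
    n = len(a)
    lows = _as_bounds(lo, n)
    highs = _as_bounds(hi, n)
    for i in range(n):
        if a[i] < lows[i]:
            a[i] = lows[i]      # below the minimum: replace by min
        elif a[i] > highs[i]:
            a[i] = highs[i]     # above the maximum: replace by max


values = [5.0, 25.0, 95.0]
fastclip_reference(values, 20.0, 80.0)          # clip on both sides

upper_only = [5.0, 95.0]
fastclip_reference(upper_only, upper_only, 80.0)  # fastclip(A, A, max)
```

Passing the array itself as one bound leaves that side untouched, since
no element is ever less than (or greater than) itself.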

I wrote a little test script to benchmark the function, and it is much
faster than the alternatives that I have thought of:

#!/usr/bin/env python
# testing speed of where vs clip vs fastclip

from Numeric import *
from RandomArray import uniform
from NumericExtras import fastclip
import time

n = 5000

a = uniform(0,100,(n,))
b = uniform(0,100,(n,))
c = uniform(0,100,(n,))

min = 20.0
max = 80.0

print "n = %i"%(n,)

start = time.clock()
for i in range(100):
    a = clip(a,min,max)
print "clip took %f seconds"%(time.clock()-start)

start = time.clock()
for i in range(100):
    putmask(a,a < min,min)
    putmask(a,a > max,max)
print "putmask took %f seconds"%(time.clock()-start)

start = time.clock()
for i in range(100):
    fastclip(a,min,max)
print "fastclip took %f seconds"%(time.clock()-start)

Here are some results:

n = 50
clip took 0.020000 seconds
putmask took 0.050000 seconds
fastclip took 0.010000 seconds

n = 5000
clip took 0.300000 seconds
putmask took 0.230000 seconds
fastclip took 0.030000 seconds

As expected, the larger the array is, the better the improvement:
fastclip is about 2x faster than clip at n = 50 and about 10x faster at
n = 5000.

I'd love to hear any feedback you can give me: I'm new to writing Python
extensions, using the Numeric API, and C itself, for that matter.

-thanks, Chris

-- 
Christopher Barker, Ph.D.
Chr...@ho...                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
