From: <jd...@us...> - 2007-12-07 04:48:32
Revision: 4663
          http://matplotlib.svn.sourceforge.net/matplotlib/?rev=4663&view=rev
Author:   jdh2358
Date:     2007-12-06 20:48:30 -0800 (Thu, 06 Dec 2007)

Log Message:
-----------
updated pyrex workbook

Modified Paths:
--------------
    trunk/py4science/workbook/main.tex

Added Paths:
-----------
    trunk/py4science/workbook/intro_pyrex.tex
    trunk/py4science/workbook/pyrex_ringbuf.tex

Added: trunk/py4science/workbook/intro_pyrex.tex
===================================================================
--- trunk/py4science/workbook/intro_pyrex.tex    (rev 0)
+++ trunk/py4science/workbook/intro_pyrex.tex    2007-12-07 04:48:30 UTC (rev 4663)
@@ -0,0 +1,167 @@
+\texttt{pyrex} is a pure python package that uses a custom
+language, a hybrid of C and python, for writing code that looks
+like python but is converted by \texttt{pyrex} into python C
+extension code. It can be used to write custom C extension modules in
+a python-like language to remove performance bottlenecks in code, as
+well as to wrap an existing C API with a python binding. \texttt{pyrex}
+generates C code, so you can ship the generated C
+extensions with your code, and users can build your
+code without \texttt{pyrex} installed.
+
+\section{Writing C extensions with \texttt{pyrex}}
+
+The canonical \texttt{pyrex} example generates a list of the first
+\texttt{kmax} prime numbers, and illustrates the hybrid nature of
+\texttt{pyrex} syntax.
+
+\begin{lstlisting}
+
+# name this file with the pyx extension for pyrex, rather than the py
+# extension for python, eg primes.pyx
+def primes(int kmax):
+    # pyrex uses cdef to declare a c type
+    cdef int n, k, i
+    cdef int p[1000]
+
+    # you can use normal python too, eg a python list
+    result = []
+    if kmax > 1000:
+        kmax = 1000
+    k = 0
+    n = 2
+    while k < kmax:
+        # n is prime if none of the k primes found so far divides it
+        i = 0
+        while i < k and n % p[i] != 0:
+            i = i + 1
+        if i == k:
+            p[k] = n
+            k = k + 1
+            result.append(n)
+        n = n + 1
+    return result
+
+\end{lstlisting}
+
+To build our python extension, we will use the \texttt{Pyrex.Distutils}
+extensions. Here is a typical \texttt{setup.py}:
+
+\begin{lstlisting}
+from distutils.core import setup
+
+# we use the Pyrex distutils Extension class rather than the standard
+# python one
+#from distutils.extension import Extension
+
+from Pyrex.Distutils.extension import Extension
+from Pyrex.Distutils import build_ext
+
+setup(
+    name = 'Demos',
+    ext_modules=[
+        Extension("primes", ["primes.pyx"]),
+        ],
+    cmdclass = {'build_ext': build_ext}
+)
+
+\end{lstlisting}
+
+and we can build it in place using
+
+\begin{lstlisting}
+python setup.py build_ext --inplace
+\end{lstlisting}
+
+This creates \texttt{primes.c}, the generated C code that we
+can ship with our python code to users who may not have \texttt{pyrex}
+installed, and \texttt{primes.so}, the python shared library
+extension. We can now fire up ipython, import primes, and call our
+compiled function, whose inner loops now run as C code.
+Here is an example shell session in
+which we build and test our new extension code:
+
+\begin{lstlisting}
+# our single pyx file from above
+pyrex_demos> ls primes*
+primes.pyx
+
+# build the module in place
+pyrex_demos> python setup.py build_ext --inplace
+running build_ext
+pyrexc primes.pyx --> primes.c
+building 'primes' extension
+creating build
+creating build/temp.macosx-10.3-fat-2.5
+gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c primes.c -o build/temp.macosx-10.3-fat-2.5/primes.o
+gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-fat-2.5/primes.o -o primes.so
+
+# now we have the original pyx and also the autogenerated C file and
+# the extension module
+pyrex_demos> ls primes*
+primes.c    primes.pyx    primes.so
+
+# let's test drive this in ipython
+pyrex_demos> ipython
+IPython 0.8.3.svn.r2876 -- An enhanced Interactive Python.
+
+In [1]: import primes
+
+In [2]: dir(primes)
+Out[2]: ['__builtins__', '__doc__', '__file__', '__name__', 'primes']
+
+In [3]: print primes.primes(20)
+[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
+
+
+\end{lstlisting}
+
+
+\section{Working with \texttt{numpy} arrays}
+
+\texttt{numpy} arrays are the core of high performance computing in
+python, and one of the most common data formats for passing large data
+sets around between python code and other wrappers. There are many
+things that arrays do very well and practically as fast as a
+native C or Fortran implementation, eg convolutions and FFTs. But
+there are some things that can be painfully slow in python when working
+with arrays, for example iterative algorithms that must visit the values one at a time.
+For these cases, it is nice to be able to quickly generate some python
+extension code for working with \texttt{numpy} array data.
+
+\texttt{numpy} provides a file that exposes its C API for use in
+\texttt{pyrex} extension code. You can find it, along with another file
+that \texttt{numpy} uses to expose the requisite bits of the Python C API
+it needs, in the \texttt{numpy} source code directory
+\texttt{numpy/doc/pyrex}. These files are \texttt{c\_numpy.pxd} and
+\texttt{c\_python.pxd}. In addition, \texttt{numpy} provides an
+example file \texttt{numpyx.pyx} that shows you how to build a pyx
+extension file for multi-dimensional arrays of different data types
+(eg int, float, python object). Here we will be a little less
+ambitious for starters, and write a simple toy function that sums a 1D
+array of floats.
+
+\begin{lstlisting}
+
+# import the numpy c API (you need to have c_python.pxd and
+# c_numpy.pxd from the numpy source directory in your build directory)
+cimport c_numpy
+
+# since this is pyrex, we can import normal python modules too
+import numpy
+
+# numpy must be initialized -- don't forget to do this when writing
+# numpy extension code. It's a common gotcha
+c_numpy.import_array()
+
+# assumes arr is a 1D array of C doubles (numpy float64); no type check is done
+def sum_elements(c_numpy.ndarray arr):
+    cdef int i
+    cdef double x, val
+
+    x = 0.
+    val = 0.
+    # strides[0] is the byte distance between consecutive elements, so
+    # this also works for non-contiguous views such as arr[::2]
+    for i from 0<=i<arr.dimensions[0]:
+        val = (<double*>(arr.data + i*arr.strides[0]))[0]
+        x = x + val
+
+    return x
+
+\end{lstlisting}
+

Modified: trunk/py4science/workbook/main.tex
===================================================================
--- trunk/py4science/workbook/main.tex    2007-12-07 04:43:36 UTC (rev 4662)
+++ trunk/py4science/workbook/main.tex    2007-12-07 04:48:30 UTC (rev 4663)
@@ -50,12 +50,12 @@
 
 \begin{document}
 
-\title{ \vspace{3cm}
+\title{ \vspace{3cm}
 Practical Scientific Computing\\
 in Python}
 
-\author{ \vspace{1cm}
+\author{ \vspace{1cm}
 Editors:\\
 John D. Hunter\\
 Fernando Pérez
@@ -136,6 +136,11 @@
 \chapter{Plotting on maps}
 \input{basemap.tex}
 
+\chapter{Performance python: interfacing with other languages}
+\input{intro_pyrex.tex}
+\input{pyrex_ringbuf.tex}
+
+
 %%% Bibliography section
 \bibliographystyle{plain}

Added: trunk/py4science/workbook/pyrex_ringbuf.tex
===================================================================
--- trunk/py4science/workbook/pyrex_ringbuf.tex    (rev 0)
+++ trunk/py4science/workbook/pyrex_ringbuf.tex    2007-12-07 04:48:30 UTC (rev 4663)
@@ -0,0 +1,43 @@
+This exercise introduces \texttt{pyrex} to wrap a C library for
+trailing statistics.
+
+Computation of trailing windowed statistics is common in many
+quantitative data driven disciplines, particularly where there is
+noisy data. Common uses of windowed statistics are the trailing
+moving average, standard deviation, minimum and maximum. There are two
+common use cases that pose computational challenges for python: real time
+updating of trailing statistics as live data comes in, and post hoc
+computation of trailing statistics over a large data array. In the
+second case, for some statistics we can use convolution and related
+techniques for efficient computation, eg of the trailing 30 sample
+average
+
+\begin{lstlisting}
+  numpy.convolve(x, numpy.ones(30)/30., mode='valid')
+\end{lstlisting}
+
+but for other statistics like the trailing 30 sample maximum at each
+point, efficient routines like convolution are of no help.
+
+This exercise introduces \texttt{pyrex} to efficiently solve the problem of
+trailing statistics over arrays as well as for a live, incoming data
+stream. A pure C library, \texttt{ringbuf}, defines a circular C
+buffer and attached methods for efficiently computing trailing
+averages, and \texttt{pyrex} is used to provide a pythonic API on top of this
+extension code.
+The rigid segregation between the C library and the
+python wrappers ensures that the C code can be used in other projects,
+be it a matlab (TM) extension or some other C library. The goal of
+the exercise is to compute the trailing statistics \textit{mean},
+\textit{median}, \textit{stddev}, \textit{min} and \textit{max} using
+three approaches:
+
+\begin{itemize}
+  \item with brute force using \texttt{numpy} arrays, slices and methods
+
+  \item with python bindings to the \texttt{ringbuf} code
+  \texttt{ringbuf.Ringbuf}
+
+  \item using a \texttt{pyrex} extension to the
+  \texttt{ringbuf.runstats} code
+
+\end{itemize}
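For the primes example in intro_pyrex.tex, a pure-python version of the same trial-division loop gives a baseline for checking the compiled module's output and for measuring the speedup. This is a minimal sketch. The name `primes_py` is hypothetical, and the 1000-prime cap only mirrors the fixed-size `cdef int p[1000]` array in the pyrex code.

```python
import timeit


def primes_py(kmax):
    """Return the first kmax primes by trial division (hypothetical helper).

    kmax is capped at 1000 to mirror the fixed-size p[1000] C array
    used by the pyrex primes() function.
    """
    kmax = min(kmax, 1000)
    p = []          # primes found so far, in increasing order
    n = 2           # current candidate
    while len(p) < kmax:
        # n is prime if none of the primes found so far divides it
        i = 0
        while i < len(p) and n % p[i] != 0:
            i += 1
        if i == len(p):
            p.append(n)
        n += 1
    return p


if __name__ == "__main__":
    print(primes_py(20))
    # baseline timing to compare against the compiled primes.primes(1000)
    print(timeit.timeit(lambda: primes_py(1000), number=3), "seconds for 3 calls")
```

With the extension built in place, you can check that `primes.primes(1000) == primes_py(1000)` and time both with `timeit`. The size of the speedup depends on the machine and compiler, so no figure is claimed here.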
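The pointer arithmetic in `sum_elements`, `(<double*>(arr.data + i*arr.strides[0]))[0]`, can be mimicked from pure python with `ctypes`. This shows why the loop steps by the stride rather than by `sizeof(double)`: in a view like `a[::2]`, consecutive elements are 16 bytes apart. The sketch assumes a 1D float64 array, and `sum_elements_strided` is a hypothetical name. Unlike the pyrex version, it checks the dtype before touching raw memory. It is much slower than the compiled code and is meant only to illustrate the addressing.

```python
import ctypes

import numpy as np


def sum_elements_strided(arr):
    """Sum a 1D float64 array by reading raw memory one stride at a time.

    Hypothetical illustration of the addressing used in the pyrex
    sum_elements; not meant as a fast implementation.
    """
    if arr.ndim != 1 or arr.dtype != np.float64:
        raise TypeError("expected a 1D native float64 array")
    base = arr.ctypes.data      # address of arr[0], like arr.data in pyrex
    step = arr.strides[0]       # bytes between consecutive elements (may be negative)
    x = 0.0
    for i in range(arr.shape[0]):
        # equivalent of (<double*>(arr.data + i*arr.strides[0]))[0]
        x += ctypes.c_double.from_address(base + i * step).value
    return x


if __name__ == "__main__":
    a = np.arange(10, dtype=np.float64)
    print(sum_elements_strided(a), sum_elements_strided(a[::2]))
```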
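The first approach in the ringbuf exercise, brute force with `numpy` slices and methods, might look like the following sketch. It assumes statistics are taken over full windows only, so the output has `len(x) - n + 1` entries, like the `mode='valid'` convolution. It uses numpy's default population standard deviation (`ddof=0`). The function name `trailing_stats` is hypothetical. Each window costs O(n) work, which is the cost the ringbuf approach is meant to avoid.

```python
import numpy as np


def trailing_stats(x, n):
    """Brute-force trailing statistics over windows of n samples.

    Returns a dict of arrays of length len(x) - n + 1; element k
    summarizes x[k:k+n], i.e. the n samples ending at x[k+n-1].
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= n <= len(x):
        raise ValueError("window length must be between 1 and len(x)")
    m = len(x) - n + 1
    out = {name: np.empty(m) for name in ("mean", "median", "stddev", "min", "max")}
    for k in range(m):
        w = x[k:k + n]              # a slice is a view, so no copy is made
        out["mean"][k] = w.mean()
        out["median"][k] = np.median(w)
        out["stddev"][k] = w.std()  # population std (ddof=0)
        out["min"][k] = w.min()
        out["max"][k] = w.max()
    return out


if __name__ == "__main__":
    x = np.random.randn(1000)
    s = trailing_stats(x, 30)
    # the mean should agree with the convolution shortcut
    print(np.allclose(s["mean"], np.convolve(x, np.ones(30) / 30., mode="valid")))
```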
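For the live-data case, the idea of a circular buffer with trailing statistics can be sketched in pure python. This is a stand-in under stated assumptions, not the `ringbuf` library or its pyrex wrapper, and `TrailingWindow` and its methods are hypothetical. The mean uses a running sum. The trailing maximum uses a monotonic deque, a standard technique (not necessarily what `ringbuf` does) that keeps each update O(1) amortized, where convolution cannot help.

```python
from collections import deque


class TrailingWindow:
    """Hypothetical pure-python circular buffer with trailing mean and max.

    This is NOT the ringbuf API. Call add() at least once before
    querying mean() or max().
    """

    def __init__(self, n):
        if n < 1:
            raise ValueError("window length must be positive")
        self.n = n
        self.buf = deque(maxlen=n)   # the last n samples (circular buffer)
        self.total = 0.0             # running sum of the samples in buf
        self.maxq = deque()          # (index, value) pairs, values decreasing
        self.count = 0               # number of samples seen so far

    def add(self, value):
        if len(self.buf) == self.n:
            # the oldest sample is about to be dropped by the deque
            self.total -= self.buf[0]
        self.buf.append(value)
        self.total += value

        # candidates no larger than the new value can never be the max again
        while self.maxq and self.maxq[-1][1] <= value:
            self.maxq.pop()
        self.maxq.append((self.count, value))
        # the front candidate expires once it falls out of the window
        if self.maxq[0][0] <= self.count - self.n:
            self.maxq.popleft()
        self.count += 1

    def mean(self):
        return self.total / len(self.buf)

    def max(self):
        return self.maxq[0][1]


if __name__ == "__main__":
    w = TrailingWindow(3)
    for v in [3, 1, 4, 1, 5, 9, 2, 6]:
        w.add(v)
        print(v, w.mean(), w.max())
```

A trailing minimum works the same way with the comparison flipped. Over very long streams the running sum can accumulate floating point error, so a real implementation may want to recompute it from the buffer now and then.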