From: <jd...@us...> - 2007-10-27 04:01:05
Revision: 4032
          http://matplotlib.svn.sourceforge.net/matplotlib/?rev=4032&view=rev
Author:   jdh2358
Date:     2007-10-26 21:00:59 -0700 (Fri, 26 Oct 2007)

Log Message:
-----------
added a few more workbook examples

Modified Paths:
--------------
    trunk/py4science/examples/skel/stats_descriptives_skel.py
    trunk/py4science/examples/stats_descriptives.py
    trunk/py4science/workbook/convolution.tex
    trunk/py4science/workbook/intro_stats.tex
    trunk/py4science/workbook/stats_descriptives.tex
    trunk/py4science/workbook/stats_distributions.tex

Modified: trunk/py4science/examples/skel/stats_descriptives_skel.py
===================================================================
--- trunk/py4science/examples/skel/stats_descriptives_skel.py  2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/examples/skel/stats_descriptives_skel.py  2007-10-27 04:00:59 UTC (rev 4032)
@@ -56,14 +56,27 @@
 
       maxlags : max number of lags for the autocorr
 
-      detrend : a function used to detrend the data for the correlation and spectral functions
+      detrend : a function used to detrend the data for the
+      correlation and spectral functions
 
       fmt : the plot format string
     """
     data = self.samples
 
-        class Bunch: pass
-        c = Bunch()
+        # Here we use a rather strange idiom: we create an empty,
+        # do-nothing class C and simply attach attributes to it for
+        # the return value (which we carefully describe in the
+        # docstring).  The alternative is to return either a tuple
+        # a,b,c,d or a dictionary {'a':someval, 'b':someotherval},
+        # but both of these approaches have problems.  If you return
+        # a tuple and later want to return something new, you have to
+        # change all the code that calls this function.  Dictionaries
+        # work fine, but I find the client code harder to read:
+        # d['a'] versus d.a.  The final alternative, most suitable
+        # for production code, is to define a custom class to store
+        # (and pretty print) your return object.
+        class C: pass
+        c = C()
         N = 5
         fig = c.fig = figfunc()
         ax = c.ax1 = fig.add_subplot(N,1,1)

Modified: trunk/py4science/examples/stats_descriptives.py
===================================================================
--- trunk/py4science/examples/stats_descriptives.py     2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/examples/stats_descriptives.py     2007-10-27 04:00:59 UTC (rev 4032)
@@ -72,6 +72,18 @@
     """
     data = self.samples
+        # Here we use a rather strange idiom: we create an empty,
+        # do-nothing class C and simply attach attributes to it for
+        # the return value (which we carefully describe in the
+        # docstring).  The alternative is to return either a tuple
+        # a,b,c,d or a dictionary {'a':someval, 'b':someotherval},
+        # but both of these approaches have problems.  If you return
+        # a tuple and later want to return something new, you have to
+        # change all the code that calls this function.  Dictionaries
+        # work fine, but I find the client code harder to read:
+        # d['a'] versus d.a.  The final alternative, most suitable
+        # for production code, is to define a custom class to store
+        # (and pretty print) your return object.
         class C: pass
         c = C()
         N = 5
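The comment block added to both files argues for the attribute-bunch idiom in the abstract.  As a hedged aside, not part of this commit, a minimal self-contained sketch of how the idiom behaves in practice (the \texttt{describe} helper and its fields are illustrative names only):

\begin{lstlisting}
import numpy as np

class Bunch: pass

def describe(samples):
    # attach results as attributes rather than returning a tuple;
    # callers are unaffected if more attributes are added later
    b = Bunch()
    b.mean = samples.mean()
    b.std = samples.std()
    return b

b = describe(np.random.rand(100))
print(b.mean, b.std)
\end{lstlisting}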
Modified: trunk/py4science/workbook/convolution.tex
===================================================================
--- trunk/py4science/workbook/convolution.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/convolution.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,14 +1,61 @@
 \section{Convolution}
 \label{sec:convolution}
 
+The output of a linear system is given by the convolution of its
+impulse response function with the input.  Mathematically,
+\[
+y(t) = \int_0^t x(\tau)r(t-\tau)\,d\tau
+\]
+This fundamental relationship lies at the heart of linear systems
+analysis.  It is used to model the dynamics of calcium buffers in
+neuronal synapses, where incoming action potentials are represented as
+Dirac $\delta$-functions and the calcium stores are represented by a
+response function with multiple exponential time constants.  It is
+used in microscopy, in which the image distortions introduced by the
+lenses are \textit{deconvolved} out using a measured point spread
+function to provide a better picture of the true image.  It is
+essential in structural engineering to determine how materials respond
+to shocks.
+
+The impulse response function $r$ is the system response to a
+pulsatile input.  For example, in Figure~\ref{fig:convolve_explain}
+below, the response function is the sum of two exponentials with
+different time constants and signs.  This is a typical function used
+to model synaptic current following a neuronal action potential.  The
+figure shows three $\delta$ inputs at different times and with
+different amplitudes.  The corresponding impulse response for each
+input is shown following it, color coded to match its impulse input.
+If the system is linear, then by definition the response to a sum of
+inputs is the sum of the responses to the individual inputs, and the
+lower panel shows the sum of the responses, or equivalently, the
+convolution of the impulse response function with the input function.
+
 \begin{center}%
 \begin{figure}
 \begin{centering}\includegraphics[width=4in]{fig/convolve_explain}\par\end{centering}
-\caption{\label{fig:convolve_explain}The output of a linear system to a series of impulse inputs is equal to the sum of the scaled and time shifted impulse response functions.}
+\caption{\label{fig:convolve_explain}The output of a linear system to
+a series of impulse inputs is equal to the sum of the scaled and time
+shifted impulse response functions.}
 \end{figure}
 \par\end{center}
 
+In Figure~\ref{fig:convolve_explain}, the summing of the impulse
+response function over the three inputs is conceptually and visually
+easy to understand.  Some find the concept of a convolution of an
+impulse response function with a continuous time function, such as a
+sinusoid or a noise process, conceptually more difficult.  It
+shouldn't be.  By the \textit{sampling theorem}, we can represent any
+finite bandwidth continuous time signal as a sum of Dirac
+$\delta$-functions, where the height of the $\delta$ function at each
+time point is simply the amplitude of the signal at that time point.
+The only requirement is that the sampling frequency be at least the
+Nyquist rate, defined as twice the highest spectral frequency in the
+signal.  See Figure~\ref{fig:convolve_deltas} for a representation of
+a delta function sampling of a damped, oscillatory exponential
+function.
+
 \begin{center}%
 \begin{figure}
 \begin{centering}\includegraphics[width=4in]{fig/convolve_deltas}\par\end{centering}
@@ -17,6 +64,29 @@
 \par\end{center}
 
+In the exercise below, we will convolve a sample from the normal
+distribution (white noise) with a double exponential impulse response
+function.  Such a function acts as a low pass filter, so the resultant
+output will look considerably smoother than the input.  You can use
+\texttt{numpy.convolve} to perform the convolution numerically.
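A minimal sketch of the computation just described, assuming an illustrative time grid and time constants (this is not the workbook's solution in \texttt{convolution_demo_skel.py}):

\begin{lstlisting}
import numpy as np

dt = 0.01                        # sample interval in seconds (assumed)
t = np.arange(0.0, 10.0, dt)     # time grid
x = np.random.randn(len(t))      # white noise input

# double exponential impulse response: fast rise, slower decay
tau1, tau2 = 0.05, 0.5           # illustrative time constants
r = np.exp(-t/tau2) - np.exp(-t/tau1)

# discrete approximation of the convolution integral, trimmed to the
# input length; the result is visibly smoother than the input
y = np.convolve(x, r)[:len(t)] * dt
\end{lstlisting}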
+
+We also explore the important relationship that a convolution in the
+temporal (or spatial) domain becomes a multiplication in the spectral
+domain, which is mathematically much easier to work with,
+\[
+Y = R \cdot X
+\]
+where $Y$, $X$, and $R$ are the Fourier transforms of the respective
+variables in the temporal convolution equation above.  The Fourier
+transform of the impulse response function serves as an amplitude
+weighting and phase shifting operator for each frequency component.
+Thus, we can get deeper insight into the effects of the impulse
+response function $r$ by studying the amplitude and phase spectrum of
+its transform $R$.  In the example below, however, we simply use the
+multiplication property to perform the same convolution in Fourier
+space to confirm the numerical result from \texttt{numpy.convolve}.
+
 \lstinputlisting[label=code:convolution_demo,caption={IGNORED}]{skel/convolution_demo_skel.py}
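Continuing the illustrative sketch from above (same assumed variables \texttt{x}, \texttt{r}, \texttt{t}, \texttt{dt}, and \texttt{y}), the Fourier-space confirmation might be checked along these lines:

\begin{lstlisting}
# zero-pad both signals to the full convolution length, multiply
# their transforms, and invert; for these inputs this reproduces
# the linear convolution computed by np.convolve
n = len(x) + len(r) - 1
Y = np.fft.fft(x, n) * np.fft.fft(r, n)
y_fourier = np.fft.ifft(Y).real[:len(t)] * dt

# agreement with numpy.convolve up to floating point error
assert np.allclose(y, y_fourier)
\end{lstlisting}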
Modified: trunk/py4science/workbook/intro_stats.tex
===================================================================
--- trunk/py4science/workbook/intro_stats.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/intro_stats.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1 +1,31 @@
-TODO
\ No newline at end of file
+\texttt{R}, a statistical package based on \texttt{S}, is viewed by
+some as the best statistical software on the planet, and in the open
+source world it is the clear choice for sophisticated statistical
+analysis.  Like python, \texttt{R} is an interpreted language written
+in C with an interactive shell.  Unlike python, which is a general
+purpose programming language, \texttt{R} is a specialized statistical
+language.  Since python is an excellent glue language, with facilities
+for providing a transparent interface to FORTRAN, C, C++ and other
+languages, it should come as no surprise that you can harness
+\texttt{R}'s immense statistical power from python, through the
+\texttt{rpy} third-party extension library.
+
+However, \texttt{R} is not without its warts.  As a language, it lacks
+python's elegance and advanced programming constructs and idioms.  It
+is also GPL, which means you cannot distribute code based upon it
+unhindered: the code you distribute must be GPL as well (python, and
+the core scientific extension libraries, carry a more permissive
+license which supports distribution in closed source, proprietary
+applications).
+
+Fortunately, the core scientific libraries for python (primarily
+\texttt{numpy} and \texttt{scipy.stats}) provide a wide array of
+statistical tools, from basic descriptive statistics (mean, variance,
+skew, kurtosis, correlation, \dots) to hypothesis testing (t-tests,
+$\chi^2$ tests, analysis of variance, general linear models, \dots) to
+analytical and numerical tools for working with almost every discrete
+and continuous statistical distribution you can think of (normal,
+gamma, Poisson, Weibull, lognormal, L\'evy stable, \dots).

Modified: trunk/py4science/workbook/stats_descriptives.tex
===================================================================
--- trunk/py4science/workbook/stats_descriptives.tex    2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/stats_descriptives.tex    2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,7 +1,49 @@
 \section{Descriptive statistics}
 \label{sec:stats_descriptives}
 
+The first step in any statistical analysis should be to describe,
+characterize and, importantly, visualize your data.  The normal
+distribution (aka Gaussian or bell curve) lies at the heart of much of
+formal statistical analysis, and normal distributions have the tidy
+property that they are completely characterized by their mean and
+variance.  As you may have observed in your interactions with family
+and friends, most of the world is not normal, and many statistical
+analyses are flawed by summarizing data with just the mean and
+standard deviation (the square root of the variance) and associated
+significance tests (e.g.\ the t-test) as if the data were normally
+distributed.
+
+In the exercise below, we write a class to provide descriptive
+statistics of a data set passed into the constructor, with class
+methods to pretty print the results and to create a battery of
+standard plots which may show structure missing in a casual analysis.
+Many new programmers, or even experienced programmers used to a
+procedural environment, are uncomfortable with the idea of classes,
+having heard their geekier programmer friends talk about them but not
+really being sure what to do with them.  There are many interesting
+things one can do with classes (aka object oriented programming), but
+at their heart they are a way of bundling data with methods that
+operate on that data.  The \texttt{self} variable is special in python
+and is how the class refers to its own data and methods.  Here is a
+toy example:
+
+\begin{lstlisting}
+In [115]: class MyData:
+   .....:     def __init__(self, x):
+   .....:         self.x = x
+   .....:     def sumsquare(self):
+   .....:         return (self.x**2).sum()
+   .....:
+
+In [116]: nse = npy.random.rand(100)
+
+In [117]: mydata = MyData(nse)
+
+In [118]: mydata.sumsquare()
+Out[118]: 29.6851135284
+\end{lstlisting}
+
 \lstinputlisting[label=code:stats_descriptives_skel,caption={IGNORED}]{skel/stats_descriptives_skel.py}
 
 \begin{figure}

Modified: trunk/py4science/workbook/stats_distributions.tex
===================================================================
--- trunk/py4science/workbook/stats_distributions.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/stats_distributions.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,9 +1,83 @@
 \section{Statistical distributions}
 \label{sec:stats_distributions}
 
+We explore a handful of the statistical distributions in the
+\texttt{scipy.stats} module and the connections between them.  The
+organization of the distribution functions in \texttt{scipy.stats} is
+quite elegant, with each distribution providing random variates
+(\texttt{rvs}), analytical moments (mean, variance, skew, kurtosis),
+analytic density and cumulative distribution functions (\texttt{pdf},
+\texttt{cdf}), survival functions and their inverses (\texttt{sf},
+\texttt{isf}) where available, and tools for fitting empirical
+distributions to the analytic distributions (\texttt{fit}).
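The organization described above is easiest to see in a short, hedged sketch (the exponential distribution and the particular numbers are arbitrary choices, not from the workbook):

\begin{lstlisting}
import scipy.stats

alpha = 20.0                                 # emission rate in Hz
expon = scipy.stats.expon(scale=1.0/alpha)   # frozen distribution

waits = expon.rvs(10000)     # random variates
mean, var = expon.stats()    # analytic mean and variance
p = expon.pdf(0.05)          # density at t = 0.05 s
tail = expon.sf(0.05)        # survival function, 1 - cdf
\end{lstlisting}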
+
+In the exercise below, we will simulate a radioactive particle
+emitter and look at the empirical distribution of waiting times
+compared with the expected analytical distributions.  Our radioactive
+particle emitter has an equal likelihood of emitting a particle in any
+equal time interval, and emits particles at a rate of 20~Hz.  We will
+discretely sample time at a high frequency, record a 1 if a particle
+is emitted and a 0 otherwise, and then look at the distribution of
+waiting times between emissions.  The probability of a particle
+emission in one of our sample intervals (assumed to be very small
+compared to the average interval between emissions) is proportional to
+the rate and the sample interval $\Delta t$, i.e.\ $p(\Delta t) =
+\alpha \Delta t$, where $\alpha$ is the emission rate in particles per
+second.
+
+\begin{lstlisting}
+# a uniform distribution on [0..1]
+In [62]: uniform = scipy.stats.uniform()
+
+# our sample interval in seconds
+In [63]: deltat = 0.001
+
+# the emission rate, 20Hz
+In [65]: alpha = 20
+
+# 1000 random numbers
+In [66]: rvs = uniform.rvs(1000)
+
+# a look at the first 20 random variates
+In [67]: rvs[:20]
+Out[67]:
+array([ 0.71167172,  0.01723161,  0.25849255,  0.00599207,  0.58656146,
+        0.12765225,  0.17898621,  0.77724693,  0.18042977,  0.91935639,
+        0.97659579,  0.59045477,  0.94730366,  0.00764026,  0.12153159,
+        0.82286929,  0.18990484,  0.34608396,  0.63931108,  0.57199175])
+
+# we simulate an emission when the random number is less than
+# p(Delta t) = alpha * deltat
+In [84]: emit = rvs < (alpha * deltat)
+
+# there were 3 emissions in the first 20 observations
+In [85]: emit[:20]
+Out[85]:
+array([False,  True, False,  True, False, False, False, False, False,
+       False, False, False, False,  True, False, False, False, False,
+       False, False], dtype=bool)
+\end{lstlisting}
+
+The waiting times between the emissions should follow an exponential
+distribution (see \texttt{scipy.stats.expon}) with a mean of
+$1/\alpha$.  In the exercise below, you will generate a long array of
+emissions and compute the waiting times between successive emissions,
+between every 2 emissions, and between every 10 emissions.  These
+should approach a 1st order gamma (aka exponential) distribution, a
+2nd order gamma, and a 10th order gamma, respectively (see
+\texttt{scipy.stats.gamma}).  Use the probability density functions
+for these distributions in \texttt{scipy.stats} to compare your
+simulated distributions and moments with the analytic versions
+provided by \texttt{scipy.stats}.  With 10 waiting times we should be
+approaching a normal distribution, since we are summing 10 waiting
+times and, under the central limit theorem, the sum of independent
+samples from a finite variance process approaches the normal
+distribution (see \texttt{scipy.stats.norm}).  In the final part of
+the exercise below, you will be asked to approximate the 10th order
+gamma distribution with a normal distribution.  The results should
+look something like those in Figure~\ref{fig:stats_distributions}.
+
 \lstinputlisting[label=code:stats_distributions_skel,caption={IGNORED}]{skel/stats_distributions_skel.py}
 
 \begin{figure}

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.