From: <jd...@us...> - 2007-10-27 04:01:05
Revision: 4032
          http://matplotlib.svn.sourceforge.net/matplotlib/?rev=4032&view=rev
Author:   jdh2358
Date:     2007-10-26 21:00:59 -0700 (Fri, 26 Oct 2007)

Log Message:
-----------
added a few more workbook examples

Modified Paths:
--------------
    trunk/py4science/examples/skel/stats_descriptives_skel.py
    trunk/py4science/examples/stats_descriptives.py
    trunk/py4science/workbook/convolution.tex
    trunk/py4science/workbook/intro_stats.tex
    trunk/py4science/workbook/stats_descriptives.tex
    trunk/py4science/workbook/stats_distributions.tex

Modified: trunk/py4science/examples/skel/stats_descriptives_skel.py
===================================================================
--- trunk/py4science/examples/skel/stats_descriptives_skel.py  2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/examples/skel/stats_descriptives_skel.py  2007-10-27 04:00:59 UTC (rev 4032)
@@ -56,14 +56,27 @@
 
       maxlags : max number of lags for the autocorr
 
-      detrend : a function used to detrend the data for the correlation and spectral functions
+      detrend : a function used to detrend the data for the
+      correlation and spectral functions
 
       fmt : the plot format string
     """
     data = self.samples
 
-        class Bunch: pass
-        c = Bunch()
+        # Here we use a rather strange idiom: we create an empty,
+        # do-nothing class C and simply attach attributes to it for
+        # the return value (which we carefully describe in the
+        # docstring).  The alternative is to return either a tuple
+        # a,b,c,d or a dictionary {'a':someval, 'b':someotherval},
+        # but both of these approaches have problems.  If you return
+        # a tuple and later want to return something new, you have to
+        # change all the code that calls this function.  Dictionaries
+        # work fine, but I find the client code harder to read:
+        # d['a'] versus d.a.  The final alternative, most suitable
+        # for production code, is to define a custom class to store
+        # (and pretty print) your return object.
+        class C: pass
+        c = C()
         N = 5
         fig = c.fig = figfunc()
         ax = c.ax1 = fig.add_subplot(N,1,1)

Modified: trunk/py4science/examples/stats_descriptives.py
===================================================================
--- trunk/py4science/examples/stats_descriptives.py     2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/examples/stats_descriptives.py     2007-10-27 04:00:59 UTC (rev 4032)
@@ -72,6 +72,18 @@
     """
     data = self.samples
+        # Here we use a rather strange idiom: we create an empty,
+        # do-nothing class C and simply attach attributes to it for
+        # the return value (which we carefully describe in the
+        # docstring).  The alternative is to return either a tuple
+        # a,b,c,d or a dictionary {'a':someval, 'b':someotherval},
+        # but both of these approaches have problems.  If you return
+        # a tuple and later want to return something new, you have to
+        # change all the code that calls this function.  Dictionaries
+        # work fine, but I find the client code harder to read:
+        # d['a'] versus d.a.  The final alternative, most suitable
+        # for production code, is to define a custom class to store
+        # (and pretty print) your return object.
         class C: pass
         c = C()
         N = 5
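The comment block added to both files argues for the attribute-bunch idiom in the abstract.  As a hedged aside, not part of this commit, a minimal self-contained sketch of how the idiom behaves in practice (the \texttt{describe} helper and its fields are illustrative names only):

\begin{lstlisting}
import numpy as np

class Bunch: pass

def describe(samples):
    # attach results as attributes rather than returning a tuple;
    # callers are unaffected if more attributes are added later
    b = Bunch()
    b.mean = samples.mean()
    b.std = samples.std()
    return b

b = describe(np.random.rand(100))
print(b.mean, b.std)
\end{lstlisting}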
Modified: trunk/py4science/workbook/convolution.tex
===================================================================
--- trunk/py4science/workbook/convolution.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/convolution.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,14 +1,61 @@
 \section{Convolution}
 \label{sec:convolution}
 
+The output of a linear system is given by the convolution of its
+impulse response function with the input.  Mathematically,
+\[
+y(t) = \int_0^t x(\tau)r(t-\tau)\,d\tau
+\]
+This fundamental relationship lies at the heart of linear systems
+analysis.  It is used to model the dynamics of calcium buffers in
+neuronal synapses, where incoming action potentials are represented as
+Dirac $\delta$-functions and the calcium stores are represented by a
+response function with multiple exponential time constants.  It is
+used in microscopy, in which the image distortions introduced by the
+lenses are \textit{deconvolved} out using a measured point spread
+function to provide a better picture of the true image.  It is
+essential in structural engineering to determine how materials respond
+to shocks.
+
+The impulse response function $r$ is the system response to a
+pulsatile input.  For example, in Figure~\ref{fig:convolve_explain}
+below, the response function is the sum of two exponentials with
+different time constants and signs.  This is a typical function used
+to model synaptic current following a neuronal action potential.  The
+figure shows three $\delta$ inputs at different times and with
+different amplitudes.  The corresponding impulse response for each
+input is shown following it, color coded to match its impulse input.
+If the system is linear, then by definition the response to a sum of
+inputs is the sum of the responses to the individual inputs, and the
+lower panel shows the sum of the responses, or equivalently, the
+convolution of the impulse response function with the input function.
+
 \begin{center}%
 \begin{figure}
 \begin{centering}\includegraphics[width=4in]{fig/convolve_explain}\par\end{centering}
-\caption{\label{fig:convolve_explain}The output of a linear system to a series of impulse inputs is equal to the sum of the scaled and time shifted impulse response functions.}
+\caption{\label{fig:convolve_explain}The output of a linear system to
+a series of impulse inputs is equal to the sum of the scaled and time
+shifted impulse response functions.}
 \end{figure}
 \par\end{center}
 
+In Figure~\ref{fig:convolve_explain}, the summing of the impulse
+response function over the three inputs is conceptually and visually
+easy to understand.  Some find the concept of a convolution of an
+impulse response function with a continuous time function, such as a
+sinusoid or a noise process, conceptually more difficult.  It
+shouldn't be.  By the \textit{sampling theorem}, we can represent any
+finite bandwidth continuous time signal as a sum of Dirac
+$\delta$-functions, where the height of the $\delta$ function at each
+time point is simply the amplitude of the signal at that time point.
+The only requirement is that the sampling frequency be at least the
+Nyquist rate, defined as twice the highest spectral frequency in the
+signal.  See Figure~\ref{fig:convolve_deltas} for a representation of
+a delta function sampling of a damped, oscillatory exponential
+function.
+
 \begin{center}%
 \begin{figure}
 \begin{centering}\includegraphics[width=4in]{fig/convolve_deltas}\par\end{centering}
@@ -17,6 +64,29 @@
 \par\end{center}
 
+In the exercise below, we will convolve a sample from the normal
+distribution (white noise) with a double exponential impulse response
+function.  Such a function acts as a low pass filter, so the resultant
+output will look considerably smoother than the input.  You can use
+\texttt{numpy.convolve} to perform the convolution numerically.
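A minimal sketch of the computation just described, assuming an illustrative time grid and time constants (this is not the workbook's solution in \texttt{convolution_demo_skel.py}):

\begin{lstlisting}
import numpy as np

dt = 0.01                        # sample interval in seconds (assumed)
t = np.arange(0.0, 10.0, dt)     # time grid
x = np.random.randn(len(t))      # white noise input

# double exponential impulse response: fast rise, slower decay
tau1, tau2 = 0.05, 0.5           # illustrative time constants
r = np.exp(-t/tau2) - np.exp(-t/tau1)

# discrete approximation of the convolution integral, trimmed to the
# input length; the result is visibly smoother than the input
y = np.convolve(x, r)[:len(t)] * dt
\end{lstlisting}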
+
+We also explore the important relationship that a convolution in the
+temporal (or spatial) domain becomes a multiplication in the spectral
+domain, which is mathematically much easier to work with,
+\[
+Y = R \cdot X
+\]
+where $Y$, $X$, and $R$ are the Fourier transforms of the respective
+variables in the temporal convolution equation above.  The Fourier
+transform of the impulse response function serves as an amplitude
+weighting and phase shifting operator for each frequency component.
+Thus, we can get deeper insight into the effects of the impulse
+response function $r$ by studying the amplitude and phase spectrum of
+its transform $R$.  In the example below, however, we simply use the
+multiplication property to perform the same convolution in Fourier
+space to confirm the numerical result from \texttt{numpy.convolve}.
+
 \lstinputlisting[label=code:convolution_demo,caption={IGNORED}]{skel/convolution_demo_skel.py}
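Continuing the illustrative sketch from above (same assumed variables \texttt{x}, \texttt{r}, \texttt{t}, \texttt{dt}, and \texttt{y}), the Fourier-space confirmation might be checked along these lines:

\begin{lstlisting}
# zero-pad both signals to the full convolution length, multiply
# their transforms, and invert; for these inputs this reproduces
# the linear convolution computed by np.convolve
n = len(x) + len(r) - 1
Y = np.fft.fft(x, n) * np.fft.fft(r, n)
y_fourier = np.fft.ifft(Y).real[:len(t)] * dt

# agreement with numpy.convolve up to floating point error
assert np.allclose(y, y_fourier)
\end{lstlisting}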
Modified: trunk/py4science/workbook/intro_stats.tex
===================================================================
--- trunk/py4science/workbook/intro_stats.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/intro_stats.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1 +1,31 @@
-TODO
\ No newline at end of file
+\texttt{R}, a statistical package based on \texttt{S}, is viewed by
+some as the best statistical software on the planet, and in the open
+source world it is the clear choice for sophisticated statistical
+analysis.  Like python, \texttt{R} is an interpreted language written
+in C with an interactive shell.  Unlike python, which is a general
+purpose programming language, \texttt{R} is a specialized statistical
+language.  Since python is an excellent glue language, with facilities
+for providing a transparent interface to FORTRAN, C, C++ and other
+languages, it should come as no surprise that you can harness
+\texttt{R}'s immense statistical power from python, through the
+\texttt{rpy} third-party extension library.
+
+However, \texttt{R} is not without its warts.  As a language, it lacks
+python's elegance and advanced programming constructs and idioms.  It
+is also GPL, which means you cannot distribute code based upon it
+unhindered: the code you distribute must be GPL as well (python, and
+the core scientific extension libraries, carry a more permissive
+license which supports distribution in closed source, proprietary
+applications).
+
+Fortunately, the core scientific libraries for python (primarily
+\texttt{numpy} and \texttt{scipy.stats}) provide a wide array of
+statistical tools, from basic descriptive statistics (mean, variance,
+skew, kurtosis, correlation, \dots) to hypothesis testing (t-tests,
+$\chi^2$ tests, analysis of variance, general linear models, \dots) to
+analytical and numerical tools for working with almost every discrete
+and continuous statistical distribution you can think of (normal,
+gamma, Poisson, Weibull, lognormal, L\'evy stable, \dots).

Modified: trunk/py4science/workbook/stats_descriptives.tex
===================================================================
--- trunk/py4science/workbook/stats_descriptives.tex    2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/stats_descriptives.tex    2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,7 +1,49 @@
 \section{Descriptive statistics}
 \label{sec:stats_descriptives}
 
+The first step in any statistical analysis should be to describe,
+characterize and, importantly, visualize your data.  The normal
+distribution (aka Gaussian or bell curve) lies at the heart of much of
+formal statistical analysis, and normal distributions have the tidy
+property that they are completely characterized by their mean and
+variance.  As you may have observed in your interactions with family
+and friends, most of the world is not normal, and many statistical
+analyses are flawed by summarizing data with just the mean and
+standard deviation (the square root of the variance) and associated
+significance tests (e.g.\ the t-test) as if the data were normally
+distributed.
+
+In the exercise below, we write a class to provide descriptive
+statistics of a data set passed into the constructor, with class
+methods to pretty print the results and to create a battery of
+standard plots which may show structure missing in a casual analysis.
+Many new programmers, or even experienced programmers used to a
+procedural environment, are uncomfortable with the idea of classes,
+having heard their geekier programmer friends talk about them but not
+really being sure what to do with them.  There are many interesting
+things one can do with classes (aka object oriented programming), but
+at their heart they are a way of bundling data with methods that
+operate on that data.  The \texttt{self} variable is special in python
+and is how the class refers to its own data and methods.  Here is a
+toy example:
+
+\begin{lstlisting}
+In [115]: class MyData:
+   .....:     def __init__(self, x):
+   .....:         self.x = x
+   .....:     def sumsquare(self):
+   .....:         return (self.x**2).sum()
+   .....:
+
+In [116]: nse = npy.random.rand(100)
+
+In [117]: mydata = MyData(nse)
+
+In [118]: mydata.sumsquare()
+Out[118]: 29.6851135284
+\end{lstlisting}
+
 \lstinputlisting[label=code:stats_descriptives_skel,caption={IGNORED}]{skel/stats_descriptives_skel.py}
 
 \begin{figure}

Modified: trunk/py4science/workbook/stats_distributions.tex
===================================================================
--- trunk/py4science/workbook/stats_distributions.tex   2007-10-26 20:21:23 UTC (rev 4031)
+++ trunk/py4science/workbook/stats_distributions.tex   2007-10-27 04:00:59 UTC (rev 4032)
@@ -1,9 +1,83 @@
 \section{Statistical distributions}
 \label{sec:stats_distributions}
 
+We explore a handful of the statistical distributions in the
+\texttt{scipy.stats} module and the connections between them.  The
+organization of the distribution functions in \texttt{scipy.stats} is
+quite elegant, with each distribution providing random variates
+(\texttt{rvs}), analytical moments (mean, variance, skew, kurtosis),
+analytic density and cumulative distribution functions (\texttt{pdf},
+\texttt{cdf}), survival functions and their inverses (\texttt{sf},
+\texttt{isf}) where available, and tools for fitting empirical
+distributions to the analytic distributions (\texttt{fit}).
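The organization described above is easiest to see in a short, hedged sketch (the exponential distribution and the particular numbers are arbitrary choices, not from the workbook):

\begin{lstlisting}
import scipy.stats

alpha = 20.0                                 # emission rate in Hz
expon = scipy.stats.expon(scale=1.0/alpha)   # frozen distribution

waits = expon.rvs(10000)     # random variates
mean, var = expon.stats()    # analytic mean and variance
p = expon.pdf(0.05)          # density at t = 0.05 s
tail = expon.sf(0.05)        # survival function, 1 - cdf
\end{lstlisting}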
+
+In the exercise below, we will simulate a radioactive particle
+emitter and look at the empirical distribution of waiting times
+compared with the expected analytical distributions.  Our radioactive
+particle emitter has an equal likelihood of emitting a particle in any
+equal time interval, and emits particles at a rate of 20~Hz.  We will
+discretely sample time at a high frequency, record a 1 if a particle
+is emitted and a 0 otherwise, and then look at the distribution of
+waiting times between emissions.  The probability of a particle
+emission in one of our sample intervals (assumed to be very small
+compared to the average interval between emissions) is proportional to
+the rate and the sample interval $\Delta t$, i.e.\ $p(\Delta t) =
+\alpha \Delta t$, where $\alpha$ is the emission rate in particles per
+second.
+
+\begin{lstlisting}
+# a uniform distribution on [0..1]
+In [62]: uniform = scipy.stats.uniform()
+
+# our sample interval in seconds
+In [63]: deltat = 0.001
+
+# the emission rate, 20Hz
+In [65]: alpha = 20
+
+# 1000 random numbers
+In [66]: rvs = uniform.rvs(1000)
+
+# a look at the first 20 random variates
+In [67]: rvs[:20]
+Out[67]:
+array([ 0.71167172,  0.01723161,  0.25849255,  0.00599207,  0.58656146,
+        0.12765225,  0.17898621,  0.77724693,  0.18042977,  0.91935639,
+        0.97659579,  0.59045477,  0.94730366,  0.00764026,  0.12153159,
+        0.82286929,  0.18990484,  0.34608396,  0.63931108,  0.57199175])
+
+# we simulate an emission when the random number is less than
+# p(Delta t) = alpha * deltat
+In [84]: emit = rvs < (alpha * deltat)
+
+# there were 3 emissions in the first 20 observations
+In [85]: emit[:20]
+Out[85]:
+array([False,  True, False,  True, False, False, False, False, False,
+       False, False, False, False,  True, False, False, False, False,
+       False, False], dtype=bool)
+\end{lstlisting}
+
+The waiting times between the emissions should follow an exponential
+distribution (see \texttt{scipy.stats.expon}) with a mean of
+$1/\alpha$.  In the exercise below, you will generate a long array of
+emissions and compute the waiting times between successive emissions,
+between every 2 emissions, and between every 10 emissions.  These
+should approach a 1st order gamma (aka exponential) distribution, a
+2nd order gamma, and a 10th order gamma, respectively (see
+\texttt{scipy.stats.gamma}).  Use the probability density functions
+for these distributions in \texttt{scipy.stats} to compare your
+simulated distributions and moments with the analytic versions
+provided by \texttt{scipy.stats}.  With 10 waiting times we should be
+approaching a normal distribution, since we are summing 10 waiting
+times and, under the central limit theorem, the sum of independent
+samples from a finite variance process approaches the normal
+distribution (see \texttt{scipy.stats.norm}).  In the final part of
+the exercise below, you will be asked to approximate the 10th order
+gamma distribution with a normal distribution.  The results should
+look something like those in Figure~\ref{fig:stats_distributions}.
+
 \lstinputlisting[label=code:stats_distributions_skel,caption={IGNORED}]{skel/stats_distributions_skel.py}
 
 \begin{figure}

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.