From: Václav Šmilauer <eudoxos@ar...>  20091004 15:11:39

Hi, about a year ago I developed for my own purposes a routine for averaging irregularlysampled data using gaussian average. I would like to contribute it to matplotlib, after a cleanup, if there is interest. First it stores all data points in a (regular) grid. One can ask for value at arbitrary point within the grid range: based on stDev parameter, all data points within some distance (usersettable) are weighted using the normal distribution function and then the average is returned. Storing data in grid avoids searching all data points every time, only those within certain gridrange will be used. It would be relatively easy to extend to another distribution types (lines 120130), such as ellipticnormal distribution, with different stDev in each direction. The code is here: http://bazaar.launchpad.net/~yadedev/yade/trunk/annotate/head% 3A/lib/smoothing/WeightedAverage2d.hpp I should be able to convert it to accept numpy arrays, wrap using Python API instead of boost::python and write some documentation. Let me know if matplotlibdevel list would be more appropriate for this discussion. Cheers, Vaclav 
From: Christopher Barker <Chris.B<arker@no...>  20091004 20:27:42

Václav Šmilauer wrote: > about a year ago I developed for my own purposes a routine for averaging > irregularlysampled data using gaussian average. is this similar to Kernel Density estimation? http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html In any case, it sounds more like an appropriate contribution to scipy than MPL. > First it stores all data points in a (regular) grid. What if your data points are not on a regular grid? Wouldn't this be a way to interpolate onto such a grid? > Storing data in grid avoids searching all data points every > time, only those within certain gridrange will be used. another option would be to store the points in some sort of spatial index, like an rtree. > I should be able to convert it to accept numpy arrays, key. > wrap using Python > API instead of boost::python I'd consider Cython for that. sounds like a useful module, Chris  Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 5266959 voice 7600 Sand Point Way NE (206) 5266329 fax Seattle, WA 98115 (206) 5266317 main reception Chris.Barker@... 
From: Robert Kern <robert.kern@gm...>  20091004 21:25:20

On 20091004 15:27 PM, Christopher Barker wrote: > Václav Šmilauer wrote: > >> about a year ago I developed for my own purposes a routine for averaging >> irregularlysampled data using gaussian average. > > is this similar to Kernel Density estimation? > > http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html No. It is probably closer to radial basis function interpolation (in fact, it almost certainly is a form of RBFs): http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1  Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."  Umberto Eco 
From: Ryan May <rmay31@gm...>  20091005 02:18:35

On Sun, Oct 4, 2009 at 4:21 PM, Robert Kern <robert.kern@...> wrote: > On 20091004 15:27 PM, Christopher Barker wrote: >> Václav Šmilauer wrote: >> >>> about a year ago I developed for my own purposes a routine for averaging >>> irregularlysampled data using gaussian average. >> >> is this similar to Kernel Density estimation? >> >> http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html > > No. It is probably closer to radial basis function interpolation (in fact, it > almost certainly is a form of RBFs): > > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1 Except in radial basis function interpolation, you solve for the weights that give the original values at the original data points. Here, it's just a inversedistance weighted average, where the weights are chosen using an exp(x^2/A) relation. There's a huge difference between the two when you're dealing with data with noise. Ryan  Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma 
From: Robert Kern <robert.kern@gm...>  20091005 02:20:53

Ryan May wrote: > On Sun, Oct 4, 2009 at 4:21 PM, Robert Kern <robert.kern@...> wrote: >> On 20091004 15:27 PM, Christopher Barker wrote: >>> Václav Šmilauer wrote: >>> >>>> about a year ago I developed for my own purposes a routine for averaging >>>> irregularlysampled data using gaussian average. >>> is this similar to Kernel Density estimation? >>> >>> http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html >> No. It is probably closer to radial basis function interpolation (in fact, it >> almost certainly is a form of RBFs): >> >> http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1 > > Except in radial basis function interpolation, you solve for the > weights that give the original values at the original data points. > Here, it's just a inversedistance weighted average, where the weights > are chosen using an exp(x^2/A) relation. There's a huge difference > between the two when you're dealing with data with noise. Fair point.  Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."  Umberto Eco 
From: Christopher Barker <Chris.B<arker@no...>  20091005 21:26:22

VáclavŠmilauer wrote: > I checked the official terminology, it is a kernel average smoother > (in the sense of [1]) with special weight function exp((xx0)^2/const), > operating on irregularlyspaced data in 2d. > > I am not sure if that is the same as what scipy.stats.kde.gaussian_kde does, > the documentation is terse. Can I be enlightened here? It looks like they are both part of a similar class of methods: http://en.wikipedia.org/wiki/Kernel_%28statistics%29 As such, it makes sense to me to have them all in SciPy, sharing code and API. Chris  Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 5266959 voice 7600 Sand Point Way NE (206) 5266329 fax Seattle, WA 98115 (206) 5266317 main reception Chris.Barker@... 
From: Eric Firing <efiring@ha...>  20091004 23:51:18

Václav Šmilauer wrote: > Hi, > > about a year ago I developed for my own purposes a routine for averaging > irregularlysampled data using gaussian average. I would like to > contribute it to matplotlib, after a cleanup, if there is interest. This sounds like a nice thing to have, but I am wondering whether it should go into scipy instead of mpl. On the other hand, I do see the rationale for simply putting it in mpl so it can be used for contouring etc. without adding an external dependency. > > First it stores all data points in a (regular) grid. One can ask for > value at arbitrary point within the grid range: based on stDev > parameter, all data points within some distance (usersettable) are > weighted using the normal distribution function and then the average is > returned. Storing data in grid avoids searching all data points every > time, only those within certain gridrange will be used. > Good idea. > It would be relatively easy to extend to another distribution types > (lines 120130), such as ellipticnormal distribution, with different > stDev in each direction. This sounds a lot like "optimal interpolation". Comments? Certainly, in many fields it is essential to be able to specify different std for x and y. > > The code is here: > http://bazaar.launchpad.net/~yadedev/yade/trunk/annotate/head% > 3A/lib/smoothing/WeightedAverage2d.hpp > > I should be able to convert it to accept numpy arrays, wrap using Python > API instead of boost::python and write some documentation. Let me know > if matplotlibdevel list would be more appropriate for this discussion. matplotlibusers is fine for checking for interest; it may make sense to move more technical discussion to devel. But I will ignore that for now. Is wrapping with cython a reasonable option? I don't think we have any cython in mpl at present, but it seems to me like the way to go for many things, especially new things. It is readable, flexible, and promises to make the eventual transition to Python 3.1 painless for the components that use it. I realize it is Coriented, not C++ oriented, but my understanding is that the wrapping of C++ is still reasonable in many cases. (I have not tried it.) Eric > > Cheers, Vaclav > > >  > Come build with us! The BlackBerry® Developer Conference in SF, CA > is the only developer event you need to attend this year. Jumpstart your > developing skills, take BlackBerry mobile applications to market and stay > ahead of the curve. Join us from November 912, 2009. Register now! > http://p.sf.net/sfu/devconf > _______________________________________________ > Matplotlibusers mailing list > Matplotlibusers@... > https://lists.sourceforge.net/lists/listinfo/matplotlibusers 
From: VáclavŠmilauer <eudoxos@ar...>  20091005 21:00:25

> >> about a year ago I developed for my own purposes a routine for averaging > >> irregularlysampled data using gaussian average. > > > > is this similar to Kernel Density estimation? > > > > http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html > > No. It is probably closer to radial basis function interpolation (in fact, it > almost certainly is a form of RBFs): > > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1 I checked the official terminology, it is a kernel average smoother (in the sense of [1]) with special weight function exp((xx0)^2/const), operating on irregularlyspaced data in 2d. I am not sure if that is the same as what scipy.stats.kde.gaussian_kde does, the documentation is terse. Can I be enlightened here? Cheers, Vaclav [1] http://en.wikipedia.org/wiki/Kernel_smoothing 
From: Robert Kern <robert.kern@gm...>  20091005 21:25:17

On 20091005 15:54 PM, VáclavŠmilauer wrote: > I am not sure if that is the same as what scipy.stats.kde.gaussian_kde does, > the documentation is terse. Can I be enlightened here? gaussian_kde does kernel density estimation. While many of the intermediate computations are similar, they have entirely different purposes. http://en.wikipedia.org/wiki/Kernel_density_estimation  Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."  Umberto Eco 