From: Chris Withers <chris@si...> - 2008-03-14 17:13:23

Hi All,

Say I have data that looks like:

date       x   y   z
2008-01-01 10
2008-01-02 21  11
2008-01-02 32  15  5

How can I plot it such that all three lines are plotted but it's apparent that two of them are missing some data? (I know I could just sub in zeros for the missing values, but I'd like the point not to be there, not just down the bottom of the graph...)

cheers,

Chris

Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
From: Eric Firing <efiring@ha...> - 2008-03-14 17:45:38

Chris, Use masked arrays. See masked_demo.py in the mpl examples subdirectory. Eric Chris Withers wrote: > Hi All, > > Say I have data that looks like: > > date x y z > 20080101 10 > 20080102 21 11 > 20080102 32 15 5 > > How can I plot it such that all three lines are plotted by that it's > apparent two of them are missing some data? > (I know I could just sub in zeros for the missing values, but I'd like > the point not to be there, not just down the bottom of the graph...) > > cheers, > > Chris > 
From: Chris Withers <chris@si...> - 2008-03-17 10:36:37

Eric Firing wrote: > Chris, > > Use masked arrays. See masked_demo.py in the mpl examples subdirectory. Hi Eric, I took a look at that, but it uses: import matplotlib.numerix.npyma as ma ...and matplotlib.numerix isn't listed in the API reference. Where are the docs for this? Specifically, what I have is an array like so: ['','','',1.1,2.2] I want to mask the strings out so I don't get ValueErrors raised when I call plot functions with that array. How should I do that? cheers, Chris  Simplistix  Content Management, Zope & Python Consulting  http://www.simplistix.co.uk 
From: Eric Firing <efiring@ha...> - 2008-03-17 18:14:18

Chris Withers wrote:
> Eric Firing wrote:
>> Chris,
>>
>> Use masked arrays. See masked_demo.py in the mpl examples subdirectory.
>
> Hi Eric,
>
> I took a look at that, but it uses:
>
> import matplotlib.numerix.npyma as ma
>
> ...and matplotlib.numerix isn't listed in the API reference. Where are
> the docs for this?

numerix is obsolete, and numerix.npyma was a temporary method to provide access to either of two masked array implementations. It is probably time for me to remove it from the examples. Substitute

import numpy.ma as ma

The ma module is documented as part of numpy.

> Specifically, what I have is an array like so:
>
> ['','','',1.1,2.2]

Try something like this:

import numpy.ma as ma
from pylab import *

aa = [3.4, 2.5, '','','',1.1,2.2]
def to_num(arg):
    if arg == '':
        return 9999.0
    return arg

aanum = array([to_num(arg) for arg in aa])
aamasked = ma.masked_where(aanum==9999.0, aanum)
plot(aamasked)
show()

Eric

> I want to mask the strings out so I don't get ValueErrors raised when I
> call plot functions with that array.
>
> How should I do that?
>
> cheers,
>
> Chris
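A variant of the sentinel approach above, sketched under a few assumptions: it builds the mask directly from the empty strings instead of going through a magic value such as 9999.0, so a genuine data value can never be mistaken for a gap. The names `raw`, `mask`, `data` and `masked` are illustrative only, and the placeholder 0.0 is arbitrary because those entries are hidden by the mask.

```python
import numpy as np

raw = [3.4, 2.5, '', '', '', 1.1, 2.2]

# True marks an entry that should be hidden from plotting and arithmetic.
mask = np.array([value == '' for value in raw])

# Swap the empty strings for a numeric placeholder so the data array has a
# float dtype; the placeholder is never seen because it sits under the mask.
data = np.array([0.0 if value == '' else float(value) for value in raw])

masked = np.ma.masked_array(data, mask=mask)
print(masked)
```

The resulting `masked` array can be handed to `plot()` in the same way as `aamasked` in the message above.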
From: Eric Firing <efiring@ha...> - 2008-03-18 20:17:42

Chris Withers wrote:
> Eric Firing wrote:
>>> Specifically, what I have is an array like so:
>>>
>>> ['','','',1.1,2.2]
>>
>> Try something like this:
>>
>> import numpy.ma as ma
>> from pylab import *
>>
>> aa = [3.4, 2.5, '','','',1.1,2.2]
>> def to_num(arg):
>>     if arg == '':
>>         return 9999.0
>>     return arg
>>
>> aanum = array([to_num(arg) for arg in aa])
>> aamasked = ma.masked_where(aanum==9999.0, aanum)
>> plot(aamasked)
>> show()
>
> What I ended up doing was getting my array to look like:
>
> from numpy import nan
> aa = [3.4,2.5,nan,nan,nan,1.1,2.2]
> values = numpy.array(aa)
> values = numpy.ma.masked_equal(values,nan)

This is not doing what you think it is, because any logical operation with a Nan returns False:

In [4]:nan == nan
Out[4]:False

You should use numpy.masked_where(numpy.isnan(aa), aa). In some places in mpl, nans are treated as missing values, but this is not uniformly true, so it is better not to count on it.

Your values array is not actually getting masked at the nans:

In [7]:aa = array([1,nan,2])

In [8]:aa
Out[8]:array([ 1., NaN, 2.])

In [9]:values = ma.masked_equal(aa, nan)

In [10]:values
Out[10]:
masked_array(data = [1.0 nan 2.0],
      mask = [False False False],
      fill_value=1e+20)

Eric

> I only wish that masked_equal didn't blow up when aa contains datetime
> objects :(
>
> cheers,
>
> Chris
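The point about NaN comparisons can be checked with a short script. This is a minimal sketch, assuming a NumPy version where the masked array module lives at `numpy.ma`; it uses an explicit `aa == np.nan` condition to stand in for the equality test that `masked_equal` relies on, and the variable names are illustrative.

```python
import numpy as np

aa = np.array([1.0, np.nan, 2.0])

# NaN never compares equal to anything, including itself.
print(np.nan == np.nan)  # False

# An equality test therefore finds nothing to mask ...
by_equality = np.ma.masked_where(aa == np.nan, aa)

# ... whereas isnan picks out the missing entry.
by_isnan = np.ma.masked_where(np.isnan(aa), aa)

print(by_equality.count())  # number of unmasked entries
print(by_isnan.count())
```

Comparing the two counts shows that only the `isnan` version hides the missing value.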
From: Giorgio F. Gilestro <giorgio@gi...> - 2008-03-19 17:11:37

import numpy as np
a = ['','','',1.1,2.2]
mask_a = [i == '' for i in a]
b = np.ma.MaskedArray(a, mask=mask_a)

Chris Withers wrote:
> Eric Firing wrote:
>
>> Chris,
>>
>> Use masked arrays. See masked_demo.py in the mpl examples subdirectory.
>>
>
> Hi Eric,
>
> I took a look at that, but it uses:
>
> import matplotlib.numerix.npyma as ma
>
> ...and matplotlib.numerix isn't listed in the API reference. Where are
> the docs for this?
>
> Specifically, what I have is an array like so:
>
> ['','','',1.1,2.2]
>
> I want to mask the strings out so I don't get ValueErrors raised when I
> call plot functions with that array.
>
> How should I do that?
>
> cheers,
>
> Chris
>

giorgio@...
http://www.cafelamarck.it
From: Chris Withers <chris@si...> - 2008-03-20 22:53:57

Giorgio F. Gilestro wrote: > > import numpy as np > a = ['','','',1.1,2.2] > mask_a = [i == '' for i in a] > b = np.ma.MaskedArray(a, mask=mask_a) Not very efficient, though, is it? cheers, Chris  Simplistix  Content Management, Zope & Python Consulting  http://www.simplistix.co.uk 
From: Chris Withers <chris@si...> - 2008-03-18 09:34:08

Eric Firing wrote:
>> Specifically, what I have is an array like so:
>>
>> ['','','',1.1,2.2]
>
> Try something like this:
>
> import numpy.ma as ma
> from pylab import *
>
> aa = [3.4, 2.5, '','','',1.1,2.2]
> def to_num(arg):
>     if arg == '':
>         return 9999.0
>     return arg
>
> aanum = array([to_num(arg) for arg in aa])
> aamasked = ma.masked_where(aanum==9999.0, aanum)
> plot(aamasked)
> show()

What I ended up doing was getting my array to look like:

from numpy import nan
aa = [3.4,2.5,nan,nan,nan,1.1,2.2]
values = numpy.array(aa)
values = numpy.ma.masked_equal(values,nan)

I only wish that masked_equal didn't blow up when aa contains datetime objects :(

cheers,

Chris

Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
From: Pierre GM <pgmdevlist@gm...> - 2008-03-18 20:31:46

On Tuesday 18 March 2008 16:17:08 Eric Firing wrote:
> Chris Withers wrote:
> > Eric Firing wrote:
> You should use numpy.masked_where(numpy.isnan(aa), aa).

or use masked_invalid directly (shortcut to masked_where(isnan(aa) | isinf(aa), aa))

> > I only wish that masked_equal didn't blow up when aa contains datetime
> > objects :(

Could you send me an example of the kind of data you're using? As it seems you're dealing with series indexed in time, you may want to try scikits.timeseries, a package Matt Knox and myself implemented for that very reason.
From: Chris Withers <chris@si...> - 2008-03-19 10:42:35

Pierre GM wrote:
> Could you send me an example of the kind of data you're using ?

It's basically performance and volume data for a high-volume website. Unfortunately, the data is gappy in places due to data collection errors in the past... (it's important the gaps are shown, rather than trying to interpolate them away, however)

> As it seems you're dealing with series indexed in time, you may want to try
> scikits.timeseries, a package Matt Knox and myself implemented for that very
> reason.

How would this help me here and where can I find out about it?

cheers,

Chris

Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
From: Eric Firing <efiring@ha...> - 2008-03-18 21:18:36

Pierre GM wrote:
> On Tuesday 18 March 2008 16:17:08 Eric Firing wrote:
>> Chris Withers wrote:
>>> Eric Firing wrote:
>> You should use numpy.masked_where(numpy.isnan(aa), aa).

(I meant numpy.ma.masked_where(...))

> or use masked_invalid directly (shortcut to masked_where((isnan(aa) |
> isinf(aa))

I don't see it in numpy.ma, with numpy from svn. In any case, the fastest method is masked_where(~numpy.isfinite(aa), aa):

In [1]:import numpy
In [2]:xx = numpy.random.rand(10000)
In [3]:xx[xx>0.8] = numpy.nan
In [6]:timeit numpy.ma.masked_where(~numpy.isfinite(xx), xx)
10000 loops, best of 3: 83.9 µs per loop
In [7]:timeit numpy.ma.masked_where(numpy.isnan(xx), xx)
10000 loops, best of 3: 119 µs per loop
In [9]:timeit numpy.ma.masked_where((numpy.isnan(xx)|numpy.isinf(xx)), xx)
1000 loops, best of 3: 260 µs per loop

So, wherever you do have masked_invalid defined, you might want to use the faster implementation with ~isfinite.

Eric

>>> I only wish that masked_equal didn't blow up when aa contains datetime
>>> objects :(
>
> Could you send me an example of the kind of data you're using ?
> As it seems you're dealing with series indexed in time, you may want to try
> scikits.timeseries, a package Matt Knox and myself implemented for that very
> reason.
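The three ways of masking non-finite values discussed above can be compared side by side. This is a small sketch that assumes a NumPy release in which `numpy.ma.masked_invalid` is available (it was not in the svn checkout mentioned above); it checks only that the masks agree and makes no claim about relative speed on current versions.

```python
import numpy as np

xx = np.array([0.5, np.nan, 1.5, np.inf, -np.inf, 2.0])

# Mask everything that is not a finite number.
via_isfinite = np.ma.masked_where(~np.isfinite(xx), xx)

# The same condition spelled out as "NaN or infinite".
via_or = np.ma.masked_where(np.isnan(xx) | np.isinf(xx), xx)

# The convenience function that wraps this up.
via_invalid = np.ma.masked_invalid(xx)

for name, arr in [("isfinite", via_isfinite), ("or", via_or), ("invalid", via_invalid)]:
    print(name, np.ma.getmaskarray(arr))
```

Timing the variants on your own data, as in the session above, is the reliable way to pick between them.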
From: Chris Withers <chris@si...> - 2008-03-19 11:14:24

Eric Firing wrote:
> This is not doing what you think it is,

Indeed, I guess I was seeing nans being treated as missing values rather than being masked...

> You should use numpy.masked_where(numpy.isnan(aa), aa).

I am now ;)

However, I'm still running into problems when I try and plot the gappy data on a filled line as follows:

dates = *an array of datetimes*
values = *an array containing data values and a few nans*
values = numpy.ma.masked_where(numpy.isnan(values),values)
xs,ys = mlab.poly_between(dates,0,values)
pylab.fill(xs,ys,'r')

For starters, I get this warning:

numpy\core\ma.py:609: UserWarning: Cannot automatically convert masked array to numeric because data is masked in one or more locations.

...and wherever a NaN occurs in the data, the line is plotted off the top of the axes. I want it to appear at 0 if there's no data. Well, ideally just not appear at all, but I'd settle for appearing at 0...

Any ideas?

cheers,

Chris

Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
From: Pierre GM <pgmdevlist@gm...> - 2008-03-21 00:46:29

Chris,

My 2c: Your data is indexed in time, right? Your x-axis is a date object? Then use scikits.timeseries

http://scipy.org/scipy/scikits/wiki/TimeSeries

That package was designed to take missing dates/data into account. That way, you can plot your data with the gaps already taken into account: we have written a specific matplotlib interface, you'll find the details following the link above.

I must admit we didn't implement poly_between for timeseries. Most likely, we'd have to implement it for regular masked arrays first, as mentioned by Eric. What you could do is to fill your array with some kind of baseline, such as 0, or your minimum data, or whatever. That's just a quick trick and no fix.
From: C M <cmpython@gm...> - 2008-03-21 03:39:56

Pierre,

I was interested in learning more about TimeSeries, and had a few questions...

> Your data is indexed in time, right ? Your x-axis is a date object ?

Just to be clear on the language: "indexed in time" means data for which the x-axis is a series of dates, correct? But I am not sure what is meant by the "x-axis being a date object" - wouldn't it be an axis object with the values comprising it being date objects? I'm not trying to split hairs, I'm just unclear about the way this is typically described and it would be useful for me to be clear about it.

> Then use
> scikits.timeseries
> http://scipy.org/scipy/scikits/wiki/TimeSeries
> That package was designed to take missing dates/data into account. That
> way,
> you can plot your data with the gaps already taken into account: we have
> written a specific matplotlib interface, you'll find the details following
> the link above.

I've looked at the link. Could you explain what TimeSeries does that the mpl modules dates and dateutil don't do, or when one would use one versus the other? For my part, I need to simply plot values with dates (and yes with some dates missing no doubt) as the x-axis and am looking for various ways to do it well.

Thank you.
From: Chris Withers <chris@si...> - 2008-03-21 17:01:49

Pierre GM wrote: > Your data is indexed in time, right ? Your xaxis is a date object ? Then use > scikits.timeseries > http://scipy.org/scipy/scikits/wiki/TimeSeries I'm not sure what this is giving me. The dates are all python datetimes in a list already. The missing values started off as '', I turned those into nan and then created a ma with the nan's masked. What more would TimeSeries give me? > the link above. I must admit we didn't implement poly_between for timeseries. > Most likely, we'd have to implement it for regular masked arrays first, as > mentioned by Eric. OK. > What you could do is to fill your array with some kind of baseline, such as 0, > or your minimum data, or wtvr. That's just a quick trick and no fix. Indeed, that's what I had to do. I have to admit, I see some interesting things while scanning that wiki page, but nothing that would have helped me... cheers, Chris (who might well be missing something...)  Simplistix  Content Management, Zope & Python Consulting  http://www.simplistix.co.uk 
From: Eric Firing <efiring@ha...> - 2008-03-19 18:37:04

Chris,

Both with respect to documentation and functionality, what you are encountering is the historical aspect of masked arrays as a tacked-on part of python numeric packages, and of matplotlib. Support and integration are improving, but still far from perfect. A largely new, and substantially different, implementation of masked arrays has been transplanted into numpy since the last release. Similarly, mpl got a heart transplant since the last release, and it has some implications for the way nans and masked arrays are handled. There is lots more room for fundamental work on both numpy masked arrays (e.g., moving core code to pyrex/cython or C to speed them up) and on mpl.

Now with respect to your particular case here, trying to plot a filled line with gaps: poly_between has no notion of masked arrays at present. If it did, how should it behave? At the very least, additional arguments are needed to specify what should happen for fill-type plotting with missing values. If we can come up with a clear description of the behaviors that should be available, then maybe we can provide them in mpl. I would be happy to fix this gap in mpl's handling of gappy data, but I can't make it a priority use of my time right now.

For a quick fix, it sounds like what you need is either a function to break up your data set into gapless chunks, each of which could be plotted by a call to fill, or a function (a variant of poly_between) that would replace the gap regions with top and bottom lines at the same place (the bottom level? the x-axis?) so the whole thing could be plotted in one call to fill, provided the patch outline is suppressed. I seem to recall someone else with a similar need in the past few months, so maybe someone on the list has a ready-made solution for you.

Eric

Chris Withers wrote:
> Eric Firing wrote:
>> This is not doing what you think it is,
>
> Indeed, I guess I was seeing nans being treated as missing values rather
> than being masked...
> >> You should use numpy.masked_where(numpy.isnan(aa), aa). > > I am now ;) > > However, I'm still running into problems when I try and plot the gappy > data on a filled line as follows: > > dates = *an array of datetimes* > values = *an array containing data values and a few nans* > values = numpy.ma.masked_where(numpy.isnan(values),values) > xs,ys = mlab.poly_between(dates,0,values) > pylab.fill(xs,ys,'r') > > For starters, I get this warning: > > numpy\core\ma.py:609: UserWarning: Cannot automatically convert masked > array to numeric because data is masked in one or more locations. > > ...and wherever a NaN occurs in the data, the line is plotted off the > top of the axes. I want it to appear at 0 if there's no data. Well, > ideally just not appear at all, but I'd settle for appearing at 0... > > Any ideas? > > cheers, > > Chris > 
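The first quick fix suggested above, breaking the data into gapless chunks and filling each one separately, could be sketched roughly as follows. This is not an mpl function: `gapless_polygons` and its `baseline` parameter are hypothetical names, and the sketch assumes `numpy.ma.clump_unmasked` and `numpy.ma.masked_invalid` are available in the installed NumPy.

```python
import numpy as np

def gapless_polygons(x, y, baseline=0.0):
    """Return one (xs, ys) polygon outline per contiguous run of valid y."""
    x = np.asarray(x)
    # Turn any existing mask into NaN, then mask every non-finite entry.
    y = np.ma.masked_invalid(np.ma.filled(np.ma.asarray(y, dtype=float), np.nan))
    polygons = []
    # clump_unmasked yields one slice per contiguous unmasked stretch.
    for run in np.ma.clump_unmasked(y):
        xr = x[run]
        yr = y[run].filled(baseline)  # nothing is masked inside a run
        # Walk along the data, then back along the baseline.
        xs = np.concatenate([xr, xr[::-1]])
        ys = np.concatenate([yr, np.full(len(xr), baseline)])
        polygons.append((xs, ys))
    return polygons

polys = gapless_polygons([0, 1, 2, 3, 4, 5], [1.0, 2.0, np.nan, np.nan, 3.0, 4.0])
print(len(polys))
```

Each returned pair can then be passed to a separate `pylab.fill(xs, ys, 'r')` call, which leaves a real gap where the data is missing. A run consisting of a single point produces a zero-width polygon, so isolated points may need separate handling.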
From: Chris Withers <chris@si...> - 2008-03-20 22:53:29

Eric Firing wrote:
> Both with respect to documentation and functionality, what you are
> encountering is the historical aspect of masked arrays as a tacked-on
> part of python numeric packages, and of matplotlib.

*sigh* I feel lucky ;)

> Support and
> integration are improving, but still far from perfect.

I wish I could help, but my knowledge is lacking...

> Now with respect to your particular case here, trying to plot a filled
> line with gaps: poly_between has no notion of masked arrays at present.
> If it did, how should it behave?

Well, what I actually settled on was just using:

my_masked_array.filled(0)

...to plot with.

> At the very least, additional
> arguments are needed to specify what should happen for fill-type
> plotting with missing values.

Indeed, what I personally would have liked was a complete gap where the data is missing, but I guess that would have to return multiple polygons, and I don't know how that would work?

> provide them in mpl. I would be happy to fix this gap in mpl's handling
> of gappy data,

...heh ;)

> but I can't make it a priority use of my time right now.

No, I understand :)

cheers,

Chris

Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
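For reference, the `filled(0)` workaround mentioned above amounts to the following. It is a minimal sketch with made-up sample values; `values`, `masked` and `plottable` are illustrative names.

```python
import numpy as np

values = np.array([3.0, np.nan, np.nan, 4.0, 5.0])
masked = np.ma.masked_where(np.isnan(values), values)

# filled() returns a plain ndarray in which every masked entry has been
# replaced by the given value, so poly_between/fill see ordinary numbers.
plottable = masked.filled(0)
print(plottable)
```

The trade-off is the one noted earlier in the thread: the gaps are drawn at zero rather than left empty.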
From: C M <cmpython@gm...> - 2008-03-23 04:12:52

On Fri, Mar 21, 2008 at 9:39 AM, Pierre GM <pgmdevlist@...> wrote: > On Thursday 20 March 2008 23:39:53 you wrote: > > Pierre, > > > > I was interested in learning more about TimeSeries, and had a few > > questions... > > > > Your data is indexed in time, right ? Your xaxis is a date object ? > > > > > > Just to be clear on the language: "indexed in time" means data for > which > > the xaxis is a series of dates, correct? But I am not sure what is > meant > > by the "xaxis being a date object" wouldn't it be a axis object with > the > > values comprising it being date objects? I'm not trying to split hairs, > > I'm just unclear about the way this is typically described and it would > be > > useful for me to be clear about it. > > Sorry, I wasn't clear enough: by xaxis, I was not referring to any python > object, but generic abscissae, as in "plot rain vs time". > By indexed in time, I mean that you would have something like: > yourdata[one_date] = some_value > > That's what scikits.timeseries was designed to do: handle data indexed in > time, giving the possibility to access the data directly by dates (instead > of > using an index in an array). We made sure we could handle gaps in your > data > (viz, data not regularly spaced in time...) > > > I've looked at the link. Could you explain what TimeSeries does that > the > > mpl modules dates and dateutil don't do, or when one would use one > versus > > the other? > > Not so much useful for plotting (even if there are some cool tricks) than > for > simplifying the analysis of your data: getting for example monthly > averages > from daily data is a breeze > > > For my part, I need to simply plot values with dates (and yes with some > > dates missing no doubt) as the xaxis and am looking for various ways to > do > > it well. > > You can just stick to mpl, using plot_dates instead of plot. But you may > want > to give timeseries a try. > > > Thank you. 
> I will certainly give it a try, it sounds like it could really add to what I want to do. Thanks! I'll be in touch if I have questions about timeseries. 