From: John H. <jdh...@ac...> - 2004-09-16 15:17:30

First, a general note about python2.2.  It is becoming difficult to
maintain adequate support for python2.2.  The pyparsing module, on which
mathtext relies, currently requires 2.3.  Handling dates properly requires
the datetime module (or mx.DateTime, but I'm not inclined to impose an
external dependency).  I am inclined to gently drop support for 2.2.  By
gently, I mean that some features will no longer work (mathtext and dates)
but the core should, at least for the near future.  How many people would
this adversely affect?

The dates module, aside from a bug that I patched yesterday in response to
a post by Jim Boyle, has two fundamental problems: no timezone support,
and the date range supported by the built-in time functions (the 1970
epoch) is too narrow.  Both of these limitations are imposed by trying to
support python2.2.

I would like to rewrite the dates module, and the ticker functions for
dates, to use the python datetime module.  Getting dates, timezones, and
daylight savings time right is non-trivial, and I think the cleanest
approach is to require python2.3 and datetime.  Ie, I would jettison
support for epoch times and mx datetimes, as well as the converter stuff.

The new plot_date signature would be

    def plot_date(self, d, y, fmt='bo', **kwargs):

where d would be an array of floats (no converter) and the floats would be
the number of days since 1,1,1 (Gregorian calendar).  The supported date
range would be datetime.min to datetime.max (years 0001 - 9999).

The dates module would provide some helper functions that you could use to
build date arrays from datetime and timedelta instances.  It would not be
too hard to add some helper functions to convert existing epoch, mx, or
datetime arrays to the required array of days floats.

Timezones, including timezones other than the local one, would be
supported.  Ie, if you are a financial guru in California, you could work
with Eastern time zone stock quotes or Central time zone pork belly
quotes.  Daylight savings time, etc, would be handled by the datetime
module.

The datetime module has functions toordinal and fromordinal to convert to
an integer number of days since the start of the Gregorian calendar, but
not floating point; ie, hours, minutes, seconds, etc are lost.  My guess
is that it is done this way to avoid imprecision in floating point, but I
am not sure.  I have implemented to_ordinalf and from_ordinalf to do these
conversions preserving the hours, etc.  They seem to work.  I occasionally
get rounding error on the order of a couple of microseconds, which I think
should be tolerable for the vast majority of cases.  If you need
microsecond precision, you can use plot and not plot_date in any case.

Below, I'm including some prototype code which does these conversions - if
you have interest or experience with dates and timezones, please look it
over to see if I'm making any fundamental or conceptual errors.  There is
also a function drange, which can be used to construct the floating point
days arrays plot_date would require.

Any other suggestions for improvement or changes to date handling are
welcome.  Speak now, or forever hold your peace!

JDH

import sys, datetime
from matplotlib.numerix import arange
from matplotlib.dates import Central, Pacific, Eastern, UTC

HOURS_PER_DAY = 24.
MINUTES_PER_DAY = 60.*HOURS_PER_DAY
SECONDS_PER_DAY = 60.*MINUTES_PER_DAY
MUSECONDS_PER_DAY = 1e6*SECONDS_PER_DAY

#tz = None
tz = Pacific
#tz = UTC

def close_to_dt(d1, d2, epsilon=5):
    'assert that datetimes d1 and d2 are within epsilon microseconds'
    delta = d2 - d1
    mus = abs(delta.days*MUSECONDS_PER_DAY + delta.seconds*1e6 +
              delta.microseconds)
    assert(mus < epsilon)

def close_to_ordinalf(o1, o2, epsilon=5):
    'assert that float ordinals o1 and o2 are within epsilon microseconds'
    delta = abs((o2 - o1)*MUSECONDS_PER_DAY)
    assert(delta < epsilon)

def to_ordinalf(dt):
    """
    Convert a datetime to the Gregorian date as a UTC float of days,
    preserving hours, minutes, seconds and microseconds.  Return value
    is a float.
    """
    if dt.tzinfo is not None:
        delta = dt.tzinfo.utcoffset(dt)
        if delta is not None:
            dt -= delta
    base = dt.toordinal()
    return (base +
            dt.hour/HOURS_PER_DAY +
            dt.minute/MINUTES_PER_DAY +
            dt.second/SECONDS_PER_DAY +
            dt.microsecond/MUSECONDS_PER_DAY)

def from_ordinalf(x, tz=None):
    """
    Convert a Gregorian float of the date back to a datetime, preserving
    hours, minutes, seconds and microseconds.  Return value is a datetime.
    """
    ix = int(x)
    dt = datetime.datetime.fromordinal(ix)
    remainder = x - ix
    hour, remainder = divmod(24*remainder, 1)
    minute, remainder = divmod(60*remainder, 1)
    second, remainder = divmod(60*remainder, 1)
    microsecond = int(1e6*remainder)
    dt = datetime.datetime(dt.year, dt.month, dt.day, int(hour),
                           int(minute), int(second), microsecond,
                           tzinfo=UTC())
    if tz is not None:
        return dt.astimezone(tz)
    else:
        return dt

def drange(dstart, dend, delta):
    """
    Return a date range as float Gregorian ordinals.  dstart and dend
    are datetime instances; delta is a datetime.timedelta instance.
    """
    step = (delta.days + delta.seconds/SECONDS_PER_DAY +
            delta.microseconds/MUSECONDS_PER_DAY)
    f1 = to_ordinalf(dstart)
    f2 = to_ordinalf(dend)
    return arange(f1, f2, step)

# round trip a timezone-aware datetime through the float representation
dt = datetime.datetime(1011, 10, 9, 13, 44, 22, 101010, tzinfo=tz)
x = to_ordinalf(dt)
newdt = from_ordinalf(x, tz)
close_to_dt(dt, newdt)

# build a range of float days at 8 hour intervals
date1 = datetime.datetime(2000, 3, 2, tzinfo=tz)
date2 = datetime.datetime(2000, 3, 5, tzinfo=tz)
delta = datetime.timedelta(hours=8)
print drange(date1, date2, delta)

# the same instant expressed in two timezones maps to the same ordinal
d1 = datetime.datetime(2000, 3, 2, 4, tzinfo=tz)
d2 = datetime.datetime(2000, 3, 2, 12, tzinfo=UTC())
o1 = to_ordinalf(d1)
o2 = to_ordinalf(d2)
close_to_ordinalf(o1, o2)

print 'all tests passed'
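
[Editorial sketch: the message above says helper functions to convert
existing epoch arrays to the days-float representation would not be hard
to add.  Here is a minimal illustration of one such helper; the names
epoch2num and num2epoch are only illustrative and are not part of the
proposal.  It assumes epoch times are seconds since the Unix epoch,
1970-01-01.]

import datetime

SECONDS_PER_DAY = 24.*3600.

# the Gregorian ordinal of the Unix epoch, 1970-01-01
EPOCH_OFFSET = float(datetime.datetime(1970, 1, 1).toordinal())

def epoch2num(e):
    'convert epoch seconds (scalar or numeric array) to float Gregorian ordinals'
    return EPOCH_OFFSET + e/SECONDS_PER_DAY

def num2epoch(d):
    'convert float Gregorian ordinals (scalar or numeric array) back to epoch seconds'
    return (d - EPOCH_OFFSET)*SECONDS_PER_DAY

Because the arithmetic is elementwise, the same two functions work on
scalars and on Numeric/numarray arrays alike.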
From: Shin, D. <sd...@em...> - 2004-09-16 16:35:57

> I would like to rewrite the dates module, and the ticker functions for
> dates, to use the python datetime module.  Getting dates, timezones,
> and daylight savings time right is non-trivial, and I think the
> cleanest approach is to require python2.3 and datetime.  Ie, I would
> jettison support for epoch times and mx datetimes, as well as the
> converter stuff.

I agree that datetime is the cleanest approach.  It is always better to
choose standard libraries over 3rd party ones.

> The new plot_date signature would be
>
>     def plot_date(self, d, y, fmt='bo', **kwargs):
>
> where d would be an array of floats (no converter) and the floats
> would be the number of days since 1,1,1 (Gregorian calendar).  The
> supported date range would be datetime.min to datetime.max (years 0001
> - 9999).

Actually, MATLAB adopts the same approach, using floats for date and time.
There, the epoch is 0000/00/00.  The datenum and datestr functions provide
the conversion between floats and strings.

> The dates module would provide some helper functions that you could
> use to build date arrays from datetime and timedelta instances.  It
> would not be too hard to add some helper functions to convert existing
> epoch, mx, or datetime arrays to the required array of days floats.
>
> Timezones, including timezones other than the local one, would be
> supported.  Ie, if you are a financial guru in California, you could
> work with Eastern time zone stock quotes or Central time zone pork
> belly quotes.  Daylight savings time, etc, would be handled by the
> datetime module.

I have developed similar functions to convert an array of date or datetime
objects to an array of numbers.  To avoid floating point imprecision, I
used seconds since 1900/1/1, without considering time zones.  For my own
domain, hydrologic modeling, I am rarely concerned about time zones.
Based on my experience, I noticed the following things.

The function names to_ordinalf and from_ordinalf are difficult to
remember.  How about just time2num and num2time?  In addition, how about
modifying the functions to handle an array of numbers or datetimes
directly, using the map function?  Also, the from_ordinalf function will
raise an error if a date object, rather than a datetime object, is passed
in.  I also recommend providing strftime and isoformat functions that
handle arrays of floating point values directly.

Thanks for your effort.

Daehyok Shin
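
[Editorial sketch: a minimal illustration of the array-aware wrappers
being suggested here.  It assumes the to_ordinalf and from_ordinalf
functions from the prototype earlier in the thread are in scope; the names
time2num, num2time and num2strftime follow the suggestion above and are
only illustrative.]

import datetime

def _as_datetime(d):
    'promote a datetime.date to a datetime.datetime at midnight'
    if isinstance(d, datetime.datetime):
        return d
    return datetime.datetime(d.year, d.month, d.day)

def time2num(dates):
    'convert a sequence of date/datetime instances to a list of float ordinals'
    return map(to_ordinalf, map(_as_datetime, dates))

def num2time(nums, tz=None):
    'convert a sequence of float ordinals back to datetime instances'
    return [from_ordinalf(x, tz) for x in nums]

def num2strftime(nums, fmt='%Y-%m-%d %H:%M:%S', tz=None):
    'format a sequence of float ordinals as strings'
    return [from_ordinalf(x, tz).strftime(fmt) for x in nums]

The _as_datetime step also covers the date-versus-datetime issue mentioned
above, and an isoformat variant would follow the same pattern as
num2strftime.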
From: Chris B. <Chr...@no...> - 2004-09-16 17:03:23

>> I think the cleanest approach is to require python2.3 and datetime.
>> Ie, I would jettison support for epoch times and mx datetimes, as
>> well as the converter stuff.

+1 on this.

>>     def plot_date(self, d, y, fmt='bo', **kwargs):
>>
>> where d would be an array of floats (no converter) and the floats
>> would be the number of days since 1,1,1 (Gregorian calendar).  The
>> supported date range would be datetime.min to datetime.max (years
>> 0001 - 9999).
>
> Actually, MATLAB adopts the same approach, using floats for date and
> time.

I think you should generally not blindly apply the MATLAB approach.
Python is a more powerful language than MATLAB, and support for more than
just doubles is one of its features!

How does the datetime module store the values in a datetime object?  My
first inclination would be to follow its approach, but it may not be
suited to arrays ( :-( )

A couple of projects you might want to make use of:

http://pytz.sourceforge.net/

and:

https://moin.conectiva.com.br/DateUtil

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT            (206) 526-6959   voice
7600 Sand Point Way NE      (206) 526-6329   fax
Seattle, WA  98115          (206) 526-6317   main reception

Chr...@no...
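
[Editorial sketch: a small illustration of how the two packages pointed to
above might plug into the proposed float-days scheme.  The timezone name
and the rrule parameters are arbitrary examples, and to_ordinalf is
assumed from the prototype earlier in the thread.]

import datetime
import pytz
from dateutil import rrule

# a named timezone other than the local one, per the pork-belly use case
eastern = pytz.timezone('US/Eastern')

# a concrete instant entered as UTC and viewed in Eastern time
quote_time = datetime.datetime(2004, 9, 16, 14, 30, tzinfo=pytz.utc)
print quote_time.astimezone(eastern)

# dateutil's rrule generates calendar-aware tick candidates, e.g. the
# first day of each month over a two year span
ticks = rrule.rrule(rrule.MONTHLY, bymonthday=1,
                    dtstart=datetime.datetime(2003, 1, 1),
                    until=datetime.datetime(2004, 12, 31))
print [to_ordinalf(d) for d in ticks]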
From: John H. <jdh...@ac...> - 2004-09-16 18:40:56

>>>>> "Chris" == Chris Barker <Chr...@no...> writes:

    Chris> I think you should generally not blindly apply the MATLAB
    Chris> approach.  Python is a more powerful language than MATLAB,
    Chris> and support for more than just doubles is one of its
    Chris> features!

Agreed.  That's why we're here, after all.

    Chris> How does the datetime module store the values in a datetime
    Chris> object?  My first inclination would be to follow its
    Chris> approach, but it may not be suited to arrays ( :-( )

From the header of datetime.h

/* Fields are packed into successive bytes, each viewed as unsigned and
 * big-endian, unless otherwise noted:
 *
 * byte offset
 *  0   year      2 bytes, 1-9999
 *  2   month     1 byte,  1-12
 *  3   day       1 byte,  1-31
 *  4   hour      1 byte,  0-23
 *  5   minute    1 byte,  0-59
 *  6   second    1 byte,  0-59
 *  7   usecond   3 bytes, 0-999999
 * 10
 */

I am concerned about the performance and memory hit of using, for example,
a list of a list of datetime objects rather than a numeric array of
floats.  In the float representation, you can efficiently create a large
date range array with, for example, the drange code I posted.

Another concern is implementation: I would have to special case setting
the tick limits and locations, eg calls to set_xlim and set_xticks, to
check for datetime instances.  I'm not totally wed to the float array
approach, however, and it is possible to handle datetime conversions as
special cases in the axis functions if the consensus is that this would be
better.

    Chris> A couple of projects you might want to make use of:

    Chris> http://pytz.sourceforge.net/

    Chris> and:

    Chris> https://moin.conectiva.com.br/DateUtil

These look very nice and there is a good likelihood I'll use both.
DateUtil can extend and improve the date tickers considerably, and the
pytz classes should keep users happy around the world!

Thanks,
JDH
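
[Editorial sketch: a rough, free-standing illustration of the kind of
special-casing described above for the axis functions.  The function names
are hypothetical, not existing matplotlib API, and to_ordinalf is assumed
from the prototype earlier in the thread.]

import datetime

def _to_float_if_date(val):
    'convert a datetime instance to a float Gregorian ordinal; pass floats through'
    if isinstance(val, datetime.datetime):
        return to_ordinalf(val)
    return val

def set_xlim_with_dates(xmin, xmax):
    'accept either floats or datetime instances for the x axis limits'
    xmin = _to_float_if_date(xmin)
    xmax = _to_float_if_date(xmax)
    # ... continue with the existing float-based limit handling ...
    return xmin, xmax

# either style of call would then work:
#   set_xlim_with_dates(xmin_float, xmax_float)
#   set_xlim_with_dates(datetime.datetime(2004, 1, 1),
#                       datetime.datetime(2004, 1, 4))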
From: Shin, D. <sd...@em...> - 2004-09-19 20:54:32

> I am concerned about the performance and memory hit of using, for
> example, a list of a list of datetime objects rather than a numeric
> array of floats.  In the float representation, you can efficiently
> create a large date range array with, for example, the drange code I
> posted.

Yes.  For intensive operations on time series, a list of datetime objects
will slow overall performance down significantly.

Sometimes I dream of a sort of time array in numarray, like the string
array or the object array.  It could be implemented by holding a datetime
object indicating an epoch and an array of floating point values
indicating offsets from that epoch.

Daehyok Shin
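
[Editorial sketch: a minimal illustration of the epoch-plus-offsets time
array imagined above.  The class name and its methods are purely
illustrative; it assumes numerix-style elementwise array arithmetic.]

import datetime
from matplotlib.numerix import arange

SECONDS_PER_DAY = 24.*3600.

class TimeArray:
    """
    Illustrative container: a single epoch datetime plus a numeric array
    of offsets in seconds from that epoch.
    """
    def __init__(self, epoch, offset_seconds):
        self.epoch = epoch              # a datetime.datetime instance
        self.offsets = offset_seconds   # a numerix array of floats

    def to_datetimes(self):
        'expand to a list of datetime instances (the slow path)'
        return [self.epoch + datetime.timedelta(seconds=float(s))
                for s in self.offsets]

    def to_ordinalf(self):
        'convert to the proposed float days-since-0001-01-01 representation'
        seconds_into_day = (self.epoch.hour*3600. + self.epoch.minute*60. +
                            self.epoch.second)
        base = self.epoch.toordinal() + seconds_into_day/SECONDS_PER_DAY
        return base + self.offsets/SECONDS_PER_DAY

# hourly samples over ten days relative to a 1900 epoch, as described above
t = TimeArray(datetime.datetime(1900, 1, 1),
              arange(0.0, 10*SECONDS_PER_DAY, 3600.0))
print t.to_ordinalf()[:5]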