From: Adam S. <as...@ad...> - 2003-04-12 15:12:33
(warning: newbie posting ahead)

Sorry if this was covered already, but I couldn't find a discussion about timezone handling in the Date field spec, nor in the archives of the mailing list.

Some yucky issues...

1) what are the legal timezones & legal tz representations? (secret issue: timezones change definition)
2) what about timezone conversion? and timezone printing?
3) daylight savings time?

and if it would help...

- probably a good idea to support unknown timezones, i.e. newly added ones -- but what happens then?
- annoyingly, governments change timezone definitions from time to time, so it's important to have a little flexibility here. As you can imagine, this creates portability (OS,PL) nightmares, with some upgraded and others not.
- daylight savings time is truly a mess: treat it as a timezone issue, i.e. the tz name encodes whether you're in DST or not. e.g. the "PDT" timezone means it's on DST, PST is not on DST

in other words, there's a catch-22:

- if you punt on timezone handling, i.e. the tz part of Date fields is a non-whitespace string of any length, and we don't parse it, then users will seek out other libraries (e.g. Date::Manip) to get their work done, which will result in portability and correctness issues.
- if you add timezone parsing, e.g. conversion to UTC/GMT, then you have to deal with conversion issues, which are yucky.

a modest proposal:

- support parsing/conversion from a minimum set of timezones, ones that are unlikely to ever change because companies would revolt against their governments due to conversion costs, e.g. EST,CST,MST,PST,etc.
- support parsing/conversion from +/-NNNN timezone format.
- allow implementations to have mechanism(s) for adding new timezones, incl. the ones provided by the OS
- when parsing, allow an option to barf on unknown timezone as well as a flag to silently treat it as UTC/GMT. By default, it barfs.
- add Unix date output as a supported format, so users can "cut and paste"
- add seconds-since-the-epoch (GMT only) as a supported format.

hope this helps,
adam

ps. hi Brian-- you may remember me as the guy whose startup makes the high-volume logging software for companies like yahoo! (which uses Inline.pm to reuse their C libraries inside the LMS)

pps. not that we've been through hell over timezone issues in the random logs customers want us to support... no sirree... ;-)
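A rough Python sketch of the "modest proposal" above: a small fixed table of abbreviations, the +/-NNNN numeric form, a hook for extra zones, and a strict mode that fails on unknown names by default. The function name `parse_tz`, the `extra_zones` and `strict` parameters, and the exact contents of the table are assumptions made for illustration; nothing here comes from an actual YAML implementation.

```python
import re
from datetime import timedelta

# Hypothetical minimal table (hours from UTC). The standard-time names are
# the ones listed in the post; the DST names follow the "PDT" example.
KNOWN_ZONES = {
    "UTC": 0, "GMT": 0,
    "EST": -5, "CST": -6, "MST": -7, "PST": -8,
    "EDT": -4, "CDT": -5, "MDT": -6, "PDT": -7,
}

_NUMERIC = re.compile(r"([+-])(\d{2})(\d{2})")


def parse_tz(token, extra_zones=None, strict=True):
    """Return the UTC offset named by a timezone token as a timedelta."""
    m = _NUMERIC.fullmatch(token)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
    zones = dict(KNOWN_ZONES)
    zones.update(extra_zones or {})  # caller-supplied additions, e.g. from the OS
    if token in zones:
        return timedelta(hours=zones[token])
    if strict:
        raise ValueError("unknown timezone: %r" % token)
    return timedelta(0)  # lenient mode: silently treat as UTC/GMT
```

With this sketch, `parse_tz("-0800")` and `parse_tz("PST")` give the same offset, and an unknown name raises unless `strict=False` is passed.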
From: Mike O. <ms...@oz...> - 2003-04-12 17:31:11
On Sat, Apr 12, 2003 at 08:04:27AM -0700, Adam Sah wrote:
> Sorry if this was covered already, but I couldn't find a discussion
> about timezone handling in the Date field spec, nor in the archives
> of the mailing list.
>
> - support parsing/conversion from +/-NNNN timezone format.

Where are you finding a Date field? The !timestamp field follows ISO 8601, so it uses +/-NN(:NN)? only, with a special variant "Z" for UTC. So it has no knowledge of word abbreviations.
http://yaml.org/type/timestamp/

There was a long discussion about !timestamp around December. It should be in the archive, possibly under a different subject. Or look through all the threads started by ir...@ms... (my old address).

The summary is:

Types like !timestamp used to be in the core spec, the time+TZ portion was required, and loaders automatically converted anything that looked like a time specification to a native DateTime object ("implicit typing"). That led to multiple problems.

- What if you want it left as a string?
- What if you want only the date portion?
- What if you want to keep the date and time in separate fields?
- What if you want to leave off the timezone (implying local time)?
- Does a date-only representation imply midnight or noon?
- Do absolute times, relative times and intervals require different types?
- What if you want to specify only the year? Or the year-month? Do these need special types? Etc.
- Do we need a date/time type at all, or is it too application-specific?

The problem of "false positives" in implicit typing proved so great that all the specialized types were moved out of the spec to http://yaml.org/types/ . Now only !str, !seq and !map remain.

The problem with not specifying the time zone is the data becomes invalid if it's later moved to a different timezone. Personally I don't think that's a problem because many applications spend their entire lifetime in one time zone, and for others there's a location field that implies the time zone (customer lives in Virginia, band plays in Washington, etc), but you have to compromise somewhere...

The compromise with !timestamp was to reaffirm its inclusion, not have other date/time types (at least not right now), allow a date-only representation (implying noon UTC), require the numeric time zone, and say "this type doesn't do everything. If you need something this type doesn't provide, you'll have to define a private type or parse the string yourself." In particular that means if you choose the date-only feature, there is no time-only type so you have to handle that field on your own.

The biggest inconvenience is configuration files, where users don't want to type "-08:00" all the time.

The date-only representation implies noon UTC, whose proponents claim "allows the time point to retain its date, regardless of the time zone used." (I would have preferred midnight, but...)

The !timestamp page links to a W3 note on ISO 8601
http://www.w3.org/TR/NOTE-datetime
which links to a summary of date/time issues
http://www.cl.cam.ac.uk/~mgk25/iso-time.html
there in the Time zone section it says, "There exists no international standard that specifies abbreviations for civil time zones like CET, EST, etc. and sometimes the same abbreviation is even used for two very different time zones. In addition, politicians enjoy modifying the rules for civil time zones, especially for daylight saving times, every few years, so the only really reliable way of describing a local time zone is to specify numerically the difference of local time to UTC."

If you really want to use timezone abbreviations anyway, the best reference is the Unix timezone library. The maintainers have spent years cataloging reasonable timezones for every locale, it ships with every Unix system, and it automatically adjusts the local time when daylight savings begins and ends. But even that is not necessarily 100% foolproof, since the choice of timezone and when DST starts/ends is arbitrary. I read that Indiana has a funny situation where a small part of the state follows DST but the rest of the state doesn't, so it *changes time zone* twice a year to remain at -06:00.

--
-Mike (Iron) Orr, ms...@oz... (ir...@se...)
English * Esperanto * Russkiy * Deutsch * Espan~ol
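To make the compromise described in this message concrete, here is a simplified Python sketch of a loader that accepts a full timestamp with a numeric zone (+/-NN, +/-NN:NN, or "Z") and a date-only form that is taken to mean noon UTC. The name `load_timestamp` and the regular expression are assumptions for illustration; the pattern is deliberately narrower than whatever the !timestamp page actually specifies (no fractional seconds, no alternative separators).

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified pattern: YYYY-MM-DD, optionally followed by THH:MM:SS and a
# mandatory zone of the form Z, +NN, -NN, +NN:NN or -NN:NN.
_STAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(Z|[+-]\d{2}(?::\d{2})?))?"
)


def load_timestamp(text):
    """Return a timezone-aware datetime for a (simplified) timestamp string."""
    m = _STAMP.fullmatch(text)
    if m is None:
        # Word abbreviations ("PST") and missing zones both end up here.
        raise ValueError("not a timestamp: %r" % text)
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    if m.group(4) is None:
        # Date-only form: interpreted as noon UTC, per the compromise.
        return datetime(year, month, day, 12, 0, 0, tzinfo=timezone.utc)
    hour, minute, second = (int(g) for g in m.group(4, 5, 6))
    zone = m.group(7)
    if zone == "Z":
        offset = timedelta(0)
    else:
        sign = -1 if zone[0] == "-" else 1
        minutes = int(zone[4:6]) if len(zone) > 3 else 0
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=minutes)
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone(offset))
```

In this sketch, "2003-04-12T07:55:00-08:00" and "2003-04-12T15:55:00Z" load as the same instant, while "2003-04-12T07:55:00" (no zone) is rejected.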
From: Oren Ben-K. <or...@be...> - 2003-04-12 21:37:32
Mike Orr wrote:
> The compromise with !timestamp was to reaffirm its inclusion,
> not have other date/time types (at least not right now),
> allow a date-only representation (implying noon UTC), require
> the numeric time zone,
> and say "this type doesn't do everything. If you need
> something this type doesn't provide, you'll have to define a
> private type or parse the string yourself."

+1

> In particular
> that means if you choose the date-only feature, there is no
> time-only type so you have to handle that field on your own.

There's a feature I'm promoting that handles this; it suggests allowing the use of ':' to indicate base 60 in numbers (integers and floats). This is useful for writing 08:00 for time, 90:00:00.00 for angles, etc. The idea is that time-only is just a number, in some particular units (e.g., seconds).

The problem of numbers with units is a general one; it applies for time, currencies, distances, angles, densities... We really don't have a good solution for this problem. I doubt that there is such a solution...

I was once in charge of developing a whole set of meshing applications and frameworks that dealt with graphic, geographical and time data. We set down an iron rule that *all* time is in seconds, *all* lengths are in meters, and *all* angles are in seconds. While specifying font size as 0.005 seemed weird at first, being very strict about this saved us untold grief when data was moved around (e.g., there were arithmetic operations mixing font size with geographical feature size when drawing maps - at a given scale). Alas, such an approach can only work in a tightly controlled system. Even NASA is "too big" to be able to get away with this approach :-)

Another solution that doesn't work is demanding that all units be explicit. When units are explicit, they follow the number, e.g. "10.51 cm" or "2.99 USD". Now, there's a standard set of unit names for physical units (SI - http://physics.nist.gov/cuu/Units/units.html). However, even for physics, there are also imperial units. And physics is the least of our problems - there are computer units (K bit vs. K byte is fun)... The various ways angles can be measured... A horde of currency codes, with their added twist that the conversion rate depends on the date, and the set of currencies is dynamic... Application specific units... Historical units... The list is endless. Since there's only one physical namespace (all three-letter shorthands) collisions simply can't be avoided.

It is therefore inevitable that the application must have *some* notion of the expected units of each and every number it reads, if only to decide which "domain" of possible unit names to use for it. Given the application must make such a decision anyway, most people just take the path of least resistance and have the application demand the use of a particular unit for each field.

Back to time, it follows that there's nothing special in demanding that a "time" field be expressed, say, in minutes. All that's left is a convenient way for the author to express "eight and a half hours". That's what the ':' proposal allows; instead of writing this field as "510", it allows writing it in base 60 as "8:30". If the application interprets the field in, say, seconds, the same value must be written as "30600" or "8:30:00". This is completely equivalent to having a length field that may be written as "01" or "1" if the application reads it in meters and "0144" or "100" if it reads it in centimeters.

> The biggest inconvenience is configuration files, where users
> don't want to type "-08:00" all the time.

That's admittedly a problem. However if we define the omission of a time zone to mean "the local time zone", we make YAML documents change their semantics whenever they get moved around the globe.

Clark suggested we allow the omission of the time zone, and take it to implicitly mean UTC; this would allow most people to just ignore time zones, as long as coders ensure all I/O is done in UTC. True, the "real" semantics of the file isn't what they'd expect, but this wouldn't "normally" hurt anyone.

> The date-only representation implies noon UTC, whose
> proponents claim "allows the time point to retain its date,
> regardless of the time zone used." (I would have preferred
> midnight, but...)

This is my proposal. It is still somewhat controversial, because it makes conversions to/from a full timestamp slightly more complex. I still think it is the best compromise, but if someone has a good argument against it I'm willing to be convinced. BTW, midnight is ambiguous - do you mean 00:00 or 24:00? :-)

Have fun,

Oren Ben-Kiki
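The ':' base-60 idea from this message can be sketched in a few lines of Python. The function name `parse_base60` is made up for illustration, and the handling of signs and of a fractional last field is an assumption, since the thread does not spell out the proposal's exact syntax.

```python
def parse_base60(text):
    """Read colon-separated fields as base-60 digits, most significant first.

    A plain number without ':' is returned unchanged, so "510" and "8:30"
    denote the same value.
    """
    sign = 1
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    fields = text.split(":")
    value = 0
    for field in fields[:-1]:
        value = value * 60 + int(field)
    last = fields[-1]
    # Only the last field may carry a fraction (assumption), e.g. "90:00:00.00".
    value = value * 60 + (float(last) if "." in last else int(last))
    return sign * value


# The examples from the message: a field read in minutes, then in seconds.
print(parse_base60("8:30"))     # 510
print(parse_base60("8:30:00"))  # 30600
```

The unit is still whatever the application expects for that field; the notation only changes how the author writes the number.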
From: Mike O. <ms...@oz...> - 2003-04-13 03:28:37
On Sun, Apr 13, 2003 at 12:37:23AM +0300, Oren Ben-Kiki wrote:
> There's a feature I'm promoting that handles this; it suggests allowing
> the use of ':' to indicate base 60 in numbers (integers and floats).
> This is useful for writing 08:00 for time, 90:00:00.00 for angles, etc.
> The idea is that time-only is just a number, in some particular units
> (e.g., seconds).

That's not a bad idea. I guess '08:00' can be just as validly represented by 480 (= 8 * 60) as by '08:00' or Time(8, 0, 0). As long as the program is expecting "# minutes past midnight".

However, realistically any application that needs to do something with the time will need a time object, and most time objects do not have constructors for "# minutes past midnight" or "# seconds past midnight".

> > The biggest inconvenience is configuration files, where users
> > don't want to type "-08:00" all the time.
>
> That's admittedly a problem. However if we define the omission of a time
> zone to mean "the local time zone", we make YAML documents change their
> semantics whenever they get moved around the globe.

For many applications, this is a theoretical problem that doesn't make any difference. If I say '2003-04-12T07:55:00', I want it to store that number, period. It's none of YAML's concern whether it's this time zone or that time zone, so the tz is really "unknown" (null). That of course prevents it from being compared with a tz-fixed time, and it means all !timestamps in the world can't be compared with each other, but so what? I don't see anything in the YAML API that ever compares two !timestamps anyway.

> Clark suggested we allow the omission of the time zone, and take it to
> implicitly mean UTC; this would allow most people to just ignore time
> zones, as long as coders ensure all I/O is done in UTC. True, the
> "real" semantics of the file isn't what they'd expect, but this
> wouldn't "normally" hurt anyone.

This sounds like a theoretical solution to the theoretical problem. If YAML insists on tz-tagging all !timestamps, zulu time is better than anything else. However, I think I'd rather have my tz-blind time load into a tz-blind object, and my tz-fixed time load into a tz-fixed object, rather than having the time silently changed. (Python 2.3 will offer both, and the older 'time' module likewise works with both.)

For an example of tz-blind times, consider airline schedules. Departure and arrival times are given in the respective local times. It's your responsibility to determine which zones those are and what effect that has on the flight time.

> > The date-only representation implies noon UTC, whose
> > proponents claim "allows the time point to retain its date,
> > regardless of the time zone used." (I would have preferred
> > midnight, but...)
>
> This is my proposal. It is still somewhat controversial, because it
> makes conversions to/from a full timestamp slightly more complex. I
> still think it is the best compromise, but if someone has a good
> argument against it I'm willing to be convinced. BTW, midnight is
> ambiguous - do you mean 00:00 or 24:00? :-)

Both arguments are the same. All digital clocks I have ever seen count the day as 00:00 - 23:59. There is no 24:00. The second W3C article I cited also brings up the "two midnights", but when in practice does that ever happen? Society seems to have determined that 12:00am is midnight (the first minute in the day), 12:00pm is noon, and 11:59pm is the last minute. Again, all the schedules I have ever seen use that system.

Having a date-only field be midnight (00:00) makes it trivial to treat a companion time field as an offset and to add the two. Having the date field be noon means you have to inexplicably subtract twelve hours.

--
-Mike (Iron) Orr, ms...@oz... (ir...@se...)
English * Esperanto * Russkiy * Deutsch * Espan~ol
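The tz-blind versus tz-fixed distinction, and the midnight-versus-noon arithmetic, can both be illustrated with Python's standard `datetime` module (the module the message refers to as arriving in Python 2.3; the sketch below uses present-day Python 3 spellings such as `timezone`). The helper names `comparable` and `combine_with_offset` are invented for this example.

```python
from datetime import date, datetime, time, timedelta, timezone

# A tz-blind ("naive") value stores the numbers as written, nothing more.
blind = datetime(2003, 4, 12, 7, 55, 0)
# A tz-fixed ("aware") value carries its UTC offset.
fixed = datetime(2003, 4, 12, 7, 55, 0, tzinfo=timezone(timedelta(hours=-8)))


def comparable(a, b):
    """Return True if Python is willing to order the two datetimes."""
    try:
        a < b
        return True
    except TypeError:  # raised when mixing naive and aware values
        return False


def combine_with_offset(day, date_hour, minutes_past_midnight):
    """Add a 'minutes past midnight' field to a date anchored at date_hour."""
    anchor = datetime.combine(day, time(date_hour, 0))
    # If the date is anchored at noon, twelve hours must be subtracted first.
    return anchor - timedelta(hours=date_hour) + timedelta(minutes=minutes_past_midnight)


day = date(2003, 4, 12)
print(comparable(blind, fixed))           # False
print(combine_with_offset(day, 0, 510))   # 2003-04-12 08:30:00
print(combine_with_offset(day, 12, 510))  # 2003-04-12 08:30:00
```

Both anchors give the same result once the correction is applied; the difference is only that the midnight anchor needs no correction at all.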
From: Oren Ben-K. <or...@be...> - 2003-04-14 06:47:51
> That's not a bad idea. I guess '08:00' can be just as validly
> represented by 480 (= 8 * 60) as by '08:00' or Time(8, 0, 0).
> As long as the program is expecting "# minutes past midnight".
>
> However, realistically any application that needs to do
> something with the time will need a time object, and most
> time objects do not have constructors for "# minutes past
> midnight" or "# seconds past midnight"

AFAIK using the # of seconds is a common way to construct a time object saying "08:30" (it's definitely the only way to handle a time_t, for example). It is also trivial to use the # of seconds for constructors using any other convention. At any rate I don't think this is a problem in practice; no single method will work for every system anyway.

> This sounds like a theoretical solution to the theoretical
> problem. If YAML insists on tz-tagging all !timestamps, zulu
> time is better than anything else. However, I think I'd
> rather have my tz-blind time load into a tz-blind object, and
> my tz-fixed time load into a tz-fixed object, rather than
> having the time silently changed. (Python 2.3 will offer
> both, and the older 'time' module likewise works with both.)

I'm uncomfortable with requiring the notion of a time-zone blind timestamp (date + time). Usually when a library offers a date/timestamp data type it is always in the context of a specific time zone. It is nice that Python has it, but AFAIK Java doesn't. It seems Ruby doesn't either. Perl probably has a date module that does anything you want, but the built-in date/time mechanism in POSIX (or any other OS I know the details of) doesn't have the notion of a timezone-blind timestamp.

> For an example of tz-blind times, consider airline schedules.
> Departure and arrival times are given in the respective local
> times.

Exactly - *not* in a magical "blind" time zone.

> It's your responsibility to determine which zones
> those are and what effect that has on the flight time.

That is acceptable in a human-readable document, but not in a computer-readable document.

> Having a date-only field be midnight (00:00) makes it trivial
> to treat a companion time field as an offset and to add the
> two. Having the date field be noon means you have to
> inexplicably subtract twelve hours.

Like I said, the "noon" is controversial - it isn't set in stone (yet). And admittedly subtracting half a day in the above case is magical. Of course if you want a timestamp you might as well just use one (date + time) rather than splitting it in two...

The idea of making the date be the UTC noon timestamp is an attempt to provide the *effect* of a time-zone-blind date object in implementations that mandate a time zone for every timestamp. No matter what time zone is attached to it, the date part remains the same. Of course, if you never convert the date to a timestamp (because your library has a pure date object - Java doesn't, but I think that Ruby does), the question doesn't even arise.

Have fun,

Oren Ben-Kiki
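The effect described above, a noon-UTC timestamp keeping its calendar date when viewed in other zones, can be checked with a short Python sketch. The helper names `local_date` and `offsets_keeping_date` are invented for this example, and only whole-hour offsets from -12 to +14 are tried, which is an assumption about the range worth testing.

```python
from datetime import date, datetime, timedelta, timezone


def local_date(utc_hour, offset_hours, day):
    """Calendar date seen at a fixed UTC offset for `day` at utc_hour:00 UTC."""
    stamp = datetime(day.year, day.month, day.day, utc_hour, tzinfo=timezone.utc)
    return stamp.astimezone(timezone(timedelta(hours=offset_hours))).date()


def offsets_keeping_date(utc_hour, day=date(2003, 4, 12)):
    """Whole-hour offsets in -12..+14 at which the local date still equals `day`."""
    return [h for h in range(-12, 15) if local_date(utc_hour, h, day) == day]


noon = offsets_keeping_date(12)
midnight = offsets_keeping_date(0)
print(noon[0], noon[-1])          # -12 11
print(midnight[0], midnight[-1])  # 0 14
```

Under these assumptions the noon anchor keeps its date for every offset from -12:00 up to, but not including, +12:00, whereas a midnight-UTC anchor slips to the previous day for every negative offset. Offsets of +12:00 and beyond would move the noon anchor to the next day, so the "no matter what time zone" claim holds only inside that range.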