|
From: <pl...@pi...> - 2010-03-27 10:55:29
|
Hi,

I have just tried a quick plot of some historical C14 data and got the
plot split in two.

The data format is:

68 5 31 31-May-68 -24.8 560.5 3.9 NZ2206

I'm reading it with:
set timefmt "%d-%b-%y"

The data goes from 1957 to 1990 but gets parsed incorrectly, such that
any dates prior to '70 get plotted as dates up to 2070.

This presumably has something to do with unix year zero. However, I
don't see why underlying system internals should be exposed to the
interpretation of user data.

Is this some cunning feature or a bug?

TIA, Peter.
|
|
From: Jonathan T. <jt...@as...> - 2010-03-27 15:31:15
|
On Sat, 27 Mar 2010, pl...@pi... wrote:
> I have just tried a quick plot of some historical C14 data and got the
> the plot split in two.
>
> the data format is:
>
> 68 5 31 31-May-68 -24.8 560.5 3.9 NZ2206
>
> I'm reading it with :
> set timefmt "%d-%b-%y"
>
>
> the data goes from 1957 to 1990 but gets parsed incorrectly such that
> any dates prior to '70 gets plotted as dates upto 2070
The key is your date "31-May-68". Gnuplot has to guess whether this
means 31 May 1968 or 31 May 2068... and evidently it guessed wrong.
What happens if you change the data format to be unambiguous, i.e. a
date "31-May-1968" with set timefmt "%d-%b-%Y" . Does that parse
correctly?
> This presumably has something to do with unix year zero. However, I
> don't see why underlying system internals should be exposed to the
> interpretation of user data.
>
>
> Is this some cunning feature or a bug?
2-digit years are fundamentally ambiguous, so some (arbitrary) decision
must be made to resolve that ambiguity. I'm not sure how gnuplot implements
this internally, but the X/Open standard documents a function strptime()
whose man page on my computer says
| %y the year within the current century. When a century is not other-
| wise specified, values in the range 69-99 refer to years in the
| twentieth century (1969 to 1999 inclusive); values in the range
| 00-68 refer to years in the twenty-first century (2000 to 2068 in-
| clusive). Leading zeros are permitted but not required.
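
The rule in the man page excerpt above is easy to check outside gnuplot. The following is a minimal sketch, not part of the original exchange: it relies on Python's datetime.strptime, which documents the same %y convention. The helper name parse_two_digit is made up for illustration, and a numeric month is used only so the result does not depend on locale month names.

```python
from datetime import datetime

def parse_two_digit(text, fmt="%d-%m-%y"):
    """Parse a date with a two-digit year using the library's default pivot."""
    return datetime.strptime(text, fmt).date()

# 68 lies in the 00-68 range, so the library reads it as a 21st-century year.
print(parse_two_digit("31-05-68").year)
# 69 lies in the 69-99 range, so the library reads it as a 20th-century year.
print(parse_two_digit("31-05-69").year)
```

Data running from 1957 to 1990 straddles that boundary, which is consistent with the plot being split in two.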
ciao,
--
-- "Jonathan Thornburg [remove -animal to reply]" <jt...@as...>
Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam
|
|
From: <pl...@pi...> - 2010-03-27 18:58:06
|
On 03/27/10 16:31, Jonathan Thornburg wrote:
> On Sat, 27 Mar 2010, pl...@pi... wrote:
>> I have just tried a quick plot of some historical C14 data and got the
>> the plot split in two.
>>
>> the data format is:
>>
>> 68 5 31 31-May-68 -24.8 560.5 3.9 NZ2206
>>
>> I'm reading it with :
>> set timefmt "%d-%b-%y"
>>
>>
>> the data goes from 1957 to 1990 but gets parsed incorrectly such that
>> any dates prior to '70 gets plotted as dates upto 2070
>
> The key is your date "31-May-68". Gnuplot has to guess whether this
> means 31 May 1968 or 31 May 2068... and evidently it guessed wrong.
> What happens if you change the data format to be unambiguous, i.e. a
> date "31-May-1968" with set timefmt "%d-%b-%Y" . Does that parse
> correctly?
>
Short of editing the whole text file, that is not too practical. Well, I
suppose I could start scratching my head trying to make an awk script to
pre-process the data but....
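
A pre-processing step of the kind mentioned here does not have to be awk. Below is a rough sketch in Python, under stated assumptions: the date sits in the fourth whitespace-separated column in dd-Mon-yy form, every date belongs to the 1900s, and collapsing the column spacing to single spaces is acceptable. The function name expand_year and its parameters are hypothetical.

```python
def expand_year(line, column=4, century=1900):
    """Rewrite a dd-Mon-yy field in the given 1-based column as dd-Mon-yyyy."""
    fields = line.split()
    if len(fields) < column:
        return line.rstrip("\n")          # short or blank line: leave as is
    parts = fields[column - 1].split("-")
    if len(parts) != 3 or not parts[2].isdigit():
        return line.rstrip("\n")          # not a dd-Mon-yy field: leave as is
    day, month, yy = parts
    fields[column - 1] = f"{day}-{month}-{century + int(yy)}"
    return " ".join(fields)

sample = "68 5 31 31-May-68 -24.8 560.5 3.9 NZ2206"
print(expand_year(sample))
```

The rewritten file can then be read with set timefmt "%d-%b-%Y", as suggested earlier in the thread.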
>
>> This presumably has something to do with unix year zero. However, I
>> don't see why underlying system internals should be exposed to the
>> interpretation of user data.
>>
>>
>> Is this some cunning feature or a bug?
>
> 2-digit years are fundamentally ambiguous, so some (arbitrary) decision
> must be made to resolve that ambiguity. I'm not sure how gnuplot implements
> this internally, but The X/Open standard documents a function strptime()
> whose man page on my computer says
> | %y the year within the current century. When a century is not other-
> | wise specified, values in the range 69-99 refer to years in the
> | twentieth century (1969 to 1999 inclusive); values in the range
> | 00-68 refer to years in the twenty-first century (2000 to 2068 in-
> | clusive). Leading zeros are permitted but not required.
>
> ciao,
>
OK, that is certainly the source of what is happening. That seems a
pretty arbitrary and dumb way to define behaviour that for no good
reason is based on the otherwise irrelevant unix year dot. Still that's
the way it is and it's outside of gnuplot. I'm curious as to whether
this works the same on Windows, which thinks the world was created in
1980, not 1970.
This dependency on "current century" is a beauty. Any program dependent
on this behaviour would have gone tits-up in y2k roll-over. No wonder
they grounded all aircraft !
However, gnuplot's help timefmt tells me:
Format Explanation
%d day of the month, 1--31
%m month of the year, 1--12
%y year, 0--99
%Y year, 4-digit
This is what I referred to when I hit this issue and it indicated I had
done the right thing. This is wrong and hence a bug. There is no mention
of 00-68 meaning this century and 69-99 meaning the previous century.
I find this behaviour so inherently stupid that maybe gnuplot should be
coding around it.
This must be a pretty typical situation to hit since the turn of the
century; is there no better solution than pre-parsing all my data files
with awk?!
Thanks for pointing out the root of the problem .
best,
Peter.
|
|
From: sfeam (E. Merritt) <eam...@gm...> - 2010-03-27 19:27:11
|
On Saturday 27 March 2010, pl...@pi... wrote:
> On 03/27/10 16:31, Jonathan Thornburg wrote:
> > 2-digit years are fundamentally ambiguous, so some (arbitrary) decision
> > must be made to resolve that ambiguity. I'm not sure how gnuplot implements
> > this internally, but The X/Open standard documents a function strptime()
> > whose man page on my computer says
> > | %y the year within the current century. When a century is not other-
> > | wise specified, values in the range 69-99 refer to years in the
> > | twentieth century (1969 to 1999 inclusive); values in the range
> > | 00-68 refer to years in the twenty-first century (2000 to 2068 in-
> > | clusive). Leading zeros are permitted but not required.
From the gnuplot source file time.c:

    case 'y':   /* year number */
        s = read_int(s, 2, &tm->tm_year);
        /* In line with the current UNIX98 specification by
         * The Open Group and major Unix vendors,
         * two-digit years 69-99 refer to the 20th century, and
         * values in the range 00-68 refer to the 21st century.
         */
        if (tm->tm_year <= 68)
            tm->tm_year += 100;
        date++;
        tm->tm_year += 1900;
        break;
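
To make the arithmetic in that branch explicit, here is a line-by-line port to Python. It is only an illustration of the snippet quoted above, not gnuplot code; the function name gnuplot_two_digit_year is invented for this sketch.

```python
def gnuplot_two_digit_year(yy):
    """Mirror the 'y' branch quoted above: two-digit year -> full year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    year = yy
    if year <= 68:       # 00-68 are pushed into the 21st century
        year += 100
    year += 1900         # 69-99 become 19xx, 100-168 become 20xx
    return year

# Sweep every two-digit value to see the window of years that can result.
years = [gnuplot_two_digit_year(yy) for yy in range(100)]
print(min(years), max(years))
```

The sweep shows the full window the branch can produce: one fixed 100-year span, with no way to shift it from a script.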
|
|
From: <pl...@pi...> - 2010-03-27 19:47:50
|
On 03/27/10 20:27, sfeam (Ethan Merritt) wrote:
> On Saturday 27 March 2010, pl...@pi... wrote:
>> On 03/27/10 16:31, Jonathan Thornburg wrote:
>
>>> 2-digit years are fundamentally ambiguous, so some (arbitrary) decision
>>> must be made to resolve that ambiguity. I'm not sure how gnuplot implements
>>> this internally, but The X/Open standard documents a function strptime()
>>> whose man page on my computer says
>>> | %y the year within the current century. When a century is not other-
>>> | wise specified, values in the range 69-99 refer to years in the
>>> | twentieth century (1969 to 1999 inclusive); values in the range
>>> | 00-68 refer to years in the twenty-first century (2000 to 2068 in-
>>> | clusive). Leading zeros are permitted but not required.
>
> From the gnuplot source file time.c:
>
> case 'y': /* year number */
> s = read_int(s, 2, &tm->tm_year);
> /* In line with the current UNIX98 specification by
> * The Open Group and major Unix vendors,
> * two-digit years 69-99 refer to the 20th century, and
> * values in the range 00-68 refer to the 21st century.
> */
> if (tm->tm_year <= 68)
> tm->tm_year += 100;
> date++;
> tm->tm_year += 1900;
> break;
>
>
>
Thanks Ethan,

shouldn't the doc as well as the source indicate this oddity?

regards.
|
|
From: Ethan M. <merritt@u.washington.edu> - 2010-03-27 22:15:01
|
On Saturday 27 March 2010, pl...@pi... wrote:
>> /* In line with the current UNIX98 specification by
>> * The Open Group and major Unix vendors,
>> * two-digit years 69-99 refer to the 20th century, and
>> * values in the range 00-68 refer to the 21st century.
>> */
>>
> OK, that is certainly the source of what is happening. That seems a
> pretty arbitrary and dumb way to define behaviour that for no good
> reason is based on the otherwise irrelevant unix year dot. Still that's
> the way it is and it's outside of gnuplot. I'm curious as to whether
> this works that same on windows which thinks the world was created in
> 1980, not 1970.

Apparently MSWin (or at least Excel) uses a rule that
00-29 is a 21st century date, while 30-99 is a 20th century date.
I have no idea what they are expecting to happen in 2030.
|
|
From: Jonathan T. <jt...@as...> - 2010-03-27 22:45:48
|
On Sat, 27 Mar 2010, Ethan Merritt quoted from the gnuplot source code:
# /* In line with the current UNIX98 specification by
# * The Open Group and major Unix vendors,
# * two-digit years 69-99 refer to the 20th century, and
# * values in the range 00-68 refer to the 21st century.
# */
On Saturday 27 March 2010, pl...@pi... wrote:
> OK, that is certainly the source of what is happening. That seems a
> pretty arbitrary and dumb way to define behaviour that for no good
> reason is based on the otherwise irrelevant unix year dot.
It's not based on the Unix time epoch -- the gnuplot %y-interpretation
switch is between 31 Dec 1968 and 1 Jan 1969, while the Unix time epoch
is a year later, on 1 Jan 1970.
> Still that's
> the way it is and it's outside of gnuplot. I'm curious as to whether
> this works that same on windows which thinks the world was created in
> 1980, not 1970.
On Sat, 27 Mar 2010, Ethan Merritt wrote:
# Apparently MSWin (or at least Excel) uses a rule that
# 00-29 is a 21st century date, while 30-99 is a 20th century date.
# I have no idea what they are expecting to happen in 2030.
If I write the 2-digit year "50", does it mean 1950 or 2050?
[Let's leave aside the possibility of other centuries.]
In the original poster's case, I gather that 1950 would have been the
correct interpretation. But if I were plotting future values of a
retirement portfolio, 2050 would probably be the better choice.
Fundamentally, there is *no* *way* for gnuplot to know the "correct"
choice, so it has to make an *arbitrary* choice (which will be wrong
a fair number of times).
IMHO gnuplot's current choice is a reasonable one (it's certainly the
only one I've ever seen in other software, albeit I've never used
MS-Excel). So, IMHO this is a feature, not a bug.
But it would be useful to document this feature. Here's a suggested
patch ('diff -c' format) to gnuplot.doc:
*** gnuplot.doc.orig Sun May 17 22:47:35 2009
--- gnuplot.doc Sat Mar 27 18:42:43 2010
***************
*** 9687,9696 ****
--- 9687,9700 ----
it can still be printed with the "%a", "%A", "%b", or "%B" specifier:
see `set format` for more details about these and other options for printing
timedata. (`gnuplot` will determine the proper month and weekday from the
numerical values.)
+ In line with the current UNIX98 specification by The Open Group and major
+ Unix vendors, when reading two-digit years with %y, values 69-99 refer to
+ the 20th century, while values 00-68 refer to the 21st century.
+
See also `set xdata` and `Time/date` for more information.
Example:
set timefmt "%d/%m/%Y\t%H:%M"
tells `gnuplot` to read date and time separated by tab. (But look closely at
--
-- "Jonathan Thornburg [remove -animal to reply]" <jt...@as...>
Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam
|
|
From: <pl...@pi...> - 2010-03-27 23:54:07
|
On 03/27/10 23:45, Jonathan Thornburg wrote:
> On Sat, 27 Mar 2010, Ethan Merritt quoted from the gnuplot source code:
> # /* In line with the current UNIX98 specification by
> # * The Open Group and major Unix vendors,
> # * two-digit years 69-99 refer to the 20th century, and
> # * values in the range 00-68 refer to the 21st century.
> # */
>
> On Saturday 27 March 2010, pl...@pi... wrote:
>> OK, that is certainly the source of what is happening. That seems a
>> pretty arbitrary and dumb way to define behaviour that for no good
>> reason is based on the otherwise irrelevant unix year dot.
>
> It's not based on the Unix time epoch -- the gnuplot %y-interpretation
> switch is between 31 Dec 1968 and 1 Jan 1969, while the Unix time epoch
> is a year later, on 1 Jan 1970.
>
>
>> Still that's
>> the way it is and it's outside of gnuplot. I'm curious as to whether
>> this works that same on windows which thinks the world was created in
>> 1980, not 1970.
>
> On Sat, 27 Mar 2010, Ethan Merritt wrote:
> # Apparently MSWin (or at least Excel) uses a rule that
> # 00-29 is a 21st century date, while 30-99 is a 20th century date.
> # I have no idea what they are expecting to happen in 2030.
>
>
> If I write the 2-digit year "50", does it mean 1950 or 2050?
> [Let's leave aside the possibility of other centuries.]
> In the original poster's case, I gather that 1950 would have been the
> correct interpretation. But if I were plotting future values of a
> retirement portfolio, 2050 would probably be the better choice.
> Fundamentally, there is *no* *way* for gnuplot to know the "correct"
> choice, so it has to make an *arbitrary* choice (which will be wrong
> a fair number of times).
That is correct , so maybe that defines the need for a way to specify
the correct breakpoint.
As an arbitrary rule the MS choice probably gives the most useful range
(for the foreseeable future) since it allows data from 1930 to 2029 in
two-digit format.
It seems a bit mad that with data all within a 50 year period in the
same century I can't deal with this without 'awk'ward hacking of the
data files.
This isn't a bug as such but a better mechanism could be provided. My
previous comment was that this was a documentation bug , so thanks for
providing a patch to clarify the help.
It should be pretty simple to provide a variable to override the
rollover date for this sort of case.
CENTURY_ROLLOVER=1969
This would mimic current behaviour and provide backwards compatibility.
Setting such a variable could then cope with any data set that spans
less than 100 years, in the two digit format.
The current functionality can only handle this if all data is pre 1969
or post 1969. This being far from a typical situation, it would seem to
be a bit restrictive.
Does that seem like a useful solution to a valid short-coming?
Regards. Peter.
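
The proposed override can be sketched in a few lines of Python to show what it would mean. Note that CENTURY_ROLLOVER is only a proposal in this message, not an existing gnuplot variable; the function name expand_with_rollover and its rollover parameter are likewise hypothetical. The idea is that the rollover year marks the start of a 100-year window, and each two-digit value maps to the single year inside that window.

```python
def expand_with_rollover(yy, rollover=1969):
    """Return the year in [rollover, rollover + 99] whose last two digits are yy."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    base = rollover - rollover % 100   # first year of the rollover's century
    year = base + yy
    if year < rollover:                # before the window start: next century
        year += 100
    return year

# The default reproduces the 1969-2068 window discussed in this thread.
print(expand_with_rollover(68), expand_with_rollover(69))
# A window starting in 1950 keeps data from 1957 to 1990 in one century.
print(expand_with_rollover(57, rollover=1950), expand_with_rollover(90, rollover=1950))
```

With the default the sketch reproduces the 69-99 / 00-68 split, and a rollover of 1930 gives the 00-29 / 30-99 rule attributed to Excel above, so any data set spanning less than 100 years can be covered by choosing the window start.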
>
> IMHO gnuplot's current choice is a reasonable one (it's certainly the
> only one I've ever seen in other software, albeit I've never used
> MS-Excel). So, IMHO this is a feature, not a bug.
>
> But it would be useful to document this feature. Here's a suggested
> patch ('diff -u' format) to gnuplot.doc:
>
> *** gnuplot.doc.orig Sun May 17 22:47:35 2009
> --- gnuplot.doc Sat Mar 27 18:42:43 2010
> ***************
> *** 9687,9696 ****
> --- 9687,9700 ----
> it can still be printed with the "%a", "%A", "%b", or "%B" specifier:
> see `set format` for more details about these and other options for printing
> timedata. (`gnuplot` will determine the proper month and weekday from the
> numerical values.)
>
> + In line with the current UNIX98 specification by The Open Group and major
> + Unix vendors, when reading two-digit years with %y, values 69-99 refer to
> + the 20th century, while values 00-68 refer to the 21st century.
> +
> See also `set xdata` and `Time/date` for more information.
>
> Example:
> set timefmt "%d/%m/%Y\t%H:%M"
> tells `gnuplot` to read date and time separated by tab. (But look closely at
>
>
|
|
From: Ethan M. <merritt@u.washington.edu> - 2010-03-28 00:18:23
|
On Saturday 27 March 2010, Jonathan Thornburg wrote:
> If I write the 2-digit year "50", does it mean 1950 or 2050?
> [Let's leave aside the possibility of other centuries.]
> In the original poster's case, I gather that 1950 would have been the
> correct interpretation. But if I were plotting future values of a
> retirement portfolio, 2050 would probably be the better choice.
> Fundamentally, there is no way for gnuplot to know the "correct"
> choice, so it has to make an arbitrary choice (which will be wrong
> a fair number of times).

Indeed.
And if you are plotting a timeline of Roman emperors, "68" really
does mean 68, the year of Nero's death.

There is no end to this quagmire. The behaviour of a program may even
depend on which library version it is linked against. Consider this
lovely bit from the man page for strptime:

The 'y' (year in century) specification is taken to specify a year in
the 20th century by libc4 and libc5. It is taken to be a year in the
range 1950-2049 by glibc 2.0. It is taken to be a year in 1969-2068
since glibc 2.1.

So yes, I'll add a note in the documentation. But better by far not to
use two-digit dates.

Ethan
|
|
From: <pl...@pi...> - 2010-03-28 00:38:40
|
On 03/28/10 01:18, Ethan Merritt wrote:
> On Saturday 27 March 2010, Jonathan Thornburg wrote:
>> If I write the 2-digit year "50", does it mean 1950 or 2050?
>> [Let's leave aside the possibility of other centuries.]
>> In the original poster's case, I gather that 1950 would have been the
>> correct interpretation. But if I were plotting future values of a
>> retirement portfolio, 2050 would probably be the better choice.
>> Fundamentally, there is no way for gnuplot to know the "correct"
>> choice, so it has to make an arbitrary choice (which will be wrong
>> a fair number of times).
>
> Indeed.
> And if you are plotting a timeline of Roman emperors, "68" really
> does mean 68, the year of Nero's death.
>
> There is no end to this quagmire. The behaviour of a program may even
> depend on which library version it is linked against. Consider this
> lovely bit from the man page for strptime:
> The 'y' (year in century) specification is taken to specify a year in
> the 20th century by libc4 and libc5. It is taken to be a year in the
> range 1950-2049 by glibc 2.0. It is taken to be a year in 1969-2068
> since glibc 2.1.
>
Argh, what a mess. Can someone explain the meaning of backwards
compatible to gnu.org?!

In view of the ugly truth, would it be better to use, or make available,
a different way to parse this data rather than relying on the moving
sand algorithm?

In order to get stable behaviour this type of data could be read in as a
string and parsed by gnuplot rather than strptime.

I recall quite a while back you commented this area was a bit hairy;
I'm starting to see what you were referring to.

> So yes, I'll add a note in the documentation. But better by far not to
> use two-digit dates.

I agree, but I did not design the data format. I suppose awk may be the
short answer, though it would be good to have a more stable
implementation that would give some insurance against the next arbitrary
change to glibc et al.

regards.

>
> Ethan
>
>
|
|
From: <pl...@pi...> - 2010-03-28 00:03:22
|
Hi,

I have an additional problem/suggestion relating to a similar set of data

F-00238 378 700323-700330 511.0 -28.8 516.6 11

the date (range) is contained in $3 this one being 23rd March 1970 -
30th March 1970

Unless I have missed a trick it does not seem possible to read this in
with gnuplot (more awking required?)

What would be useful is a dead text specifier. The useful data matches
%y%m%d but I can't tell it to ignore the rest.

A new specifier like %i to ignore the rest of the string would be useful
(and easy to add).

"%y%m%d-%i"

Have I missed an easy way to handle this?

TIA, Peter.
|
|
From: <pl...@pi...> - 2010-03-28 00:27:51
|
On 03/28/10 01:03, pl...@pi... wrote:
> Hi,
>
> I have an additional problem/suggestion relating to a similar set of data
>
> F-00238 378 700323-700330 511.0 -28.8 516.6 11
>
> the date (range) is contained in $3 this one being 23rd March 1970 -
> 30th March 1970
>
> Unless I have missed a trick it does not seem possible to read this in
> with gnuplot (more awking required?)
>
> What would be useful is a dead text specifier. The useful data matches
> %y%m%d but I can't tell it to ignore the rest.
>
> A new specifier like %i to ignore the rest of the string would be useful
> (and easy to add ).
>
> "%y%m%d-%i"
>
> Have I missed an easy way to handle this?
>
> TIA, Peter.
>
Oops, the error was not what I thought. This is readable.

Sorry for the noise.

/P/
|
|
From: Ethan M. <merritt@u.washington.edu> - 2010-03-28 00:35:36
|
On Saturday 27 March 2010, pl...@pi... wrote:
> Hi,
>
> I have an additional problem/suggestion relating to a similar set of data
>
> F-00238 378 700323-700330 511.0 -28.8 516.6 11
>
> the date (range) is contained in $3 this one being 23rd March 1970 -
> 30th March 1970
>
> Unless I have missed a trick it does not seem possible to read this in
> with gnuplot (more awking required?)

I have never figured out how to use gnuplot's builtin time/date routines
effectively. I suggest you read in the relevant field of the input file
as a string or a pure integer and do the date formatting yourself using
strptime()/strftime().

> What would be useful is a dead text specifier. The useful data matches
> %y%m%d but I can't tell it to ignore the rest.
>
> A new specifier like %i to ignore the rest of the string would be useful
> (and easy to add ).
>
> "%y%m%d-%i"
>
> Have I missed an easy way to handle this?

See above. Read it as a string and chop it into the pieces you need.

    mystartdate(i) = strcol(i)[5:6]."-".strcol(i)[3:4]."-19".strcol(i)[1:2]
    plot foo using (mystartdate(3)): ....

will turn your 700323-700330 into "23-03-1970", which you can now do with
whatever you like. Of course this particular function forces all your
dates into C20. You'd need a more complicated function to match a
different convention of input dates.

Ethan
|
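
For anyone doing the same chopping in a pre-processing script rather than inside gnuplot, the string slicing above translates directly. This is a hedged sketch in Python; the helper names reformat_yymmdd and range_dates are invented here, and like the gnuplot function it assumes every date belongs to the 1900s. Note that gnuplot substring ranges are 1-based and inclusive, while Python slices are 0-based and exclude the end index.

```python
def reformat_yymmdd(text, century="19"):
    """Turn 'yymmdd' into 'dd-mm-ccyy', assuming a fixed century prefix."""
    yy, mm, dd = text[0:2], text[2:4], text[4:6]
    return f"{dd}-{mm}-{century}{yy}"

def range_dates(field, century="19"):
    """Split a 'yymmdd-yymmdd' range and reformat both ends."""
    start, end = field.split("-")
    return reformat_yymmdd(start, century), reformat_yymmdd(end, century)

print(range_dates("700323-700330"))
```

Both ends of the range come back in dd-mm-yyyy form, which matches a timefmt of "%d-%m-%Y".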
|
From: Tait <gnu...@t4...> - 2010-03-29 09:10:26
|
> I have just tried a quick plot of some historical C14 data and got the
> the plot split in two.
> ...
> I'm reading it with :
> set timefmt "%d-%b-%y"
> ...
> This presumably has something to do with unix year zero. ...

Perhaps the most sensible thing to do -- both for gnuplot and for
Peter -- is to simply change the format to %Y. Gnuplot will, I assume,
interpret this as the literal year 68 (as in the first century),
and the user can add 1900 years (or 2000 years, or 1800, or whatever)
in the using statement. For data sets spanning a century division, a
(condition? true-cond : false-cond) will work in the using statement to
make the division.

Tait
|
|
From: wino <wi...@pi...> - 2010-03-29 16:41:20
|
On 03/29/10 11:10, Tait wrote:
>> I have just tried a quick plot of some historical C14 data and got the
>> the plot split in two.
>> ...
>> I'm reading it with :
>> set timefmt "%d-%b-%y"
>> ...
>> This presumably has something to do with unix year zero. ...
>
> Perhaps the most sensible thing to do -- both for gnuplot and for
> Peter -- is to simply change the format to %Y. Gnuplot will, I assume,
> interpret this as the literal year 68 (as in the first century),
> and the user can add 1900 years (or 2000 years, or 1800, or whatever)
> in the using statement. For data sets spanning a century division, a
> (condition? true-cond : false-cond) will work in the using statement to
> make the division.
>
> Tait
>
>
Excellent idea, I'll give it a try. ;)

In the context maybe just letting gnuplot label the axis in two digits
would be fine.

Thanks for the bright idea.

Peter.
|