#2390 Time formats %W %U broken (was: Request to consider Epidemiological Weeks - epiweeks)

None
closed-fixed
nobody
2021-06-02
2020-12-24
Roge
No

Fiddling around with bins, I was able to generate the number of COVID cases by calendar week (at least that is what I think). However, after reading a little about epidemiology, I learned that an epidemiological week is different from a regular calendar week. How difficult would it be to get gnuplot to understand epidemiological weeks, or epiweeks? For example, this year there are 53 epidemiological weeks, and November has 5 epidemiological weeks. Most likely gnuplot can already do this with some creative coding, but perhaps a specific mode or option for this purpose would be viable and useful?

Thanks.

The cleaned up "DB of covid cases" is two columns where each positive case is a line with the symptoms date and a 1.

2020-12-13 1
2020-12-14 1
2020-12-14 1
2020-12-14 1

Main Code used to generate the graph

 plot ["\"2019-12-31\"":"\"2021-01-01\""][0:*] \
 "< /usr/bin/gawk -F',' '$36 ~ /[1238]/ && $8 ~ /32/ {print $12, 1}' \
 /mnt/home/Downloads/covid-mx/BDSS/20201223-COVIDMEXICO.csv" u 1:2 bins=52 binrange ["2020-01-01":"2020-12-31"] with boxes
1 Attachments

Discussion

  • Hans-Bernhard Broeker

    gnuplot already supports an "epiweek": the ISO standard one. Just because some countries choose to ignore international standards doesn't mean everybody else should.

     
  • Ethan Merritt

    Ethan Merritt - 2021-01-03

    I am moving this to Bugs because whatever it is that gnuplot is doing right now is clearly wrong for both formats %U and %W. The documentation says

           %U       week of the year (week starts on Sunday)
           %W       week of the year (week starts on Monday) (ignored on input)
    

    Neither format is working as documented. Here is the output from a test script:

          date   %a  %w  %d   %j  %W  %U
    ====================================
    27.12.2020  Sun  00  27  362  52  53
    28.12.2020  Mon  01  28  363  53  53
    29.12.2020  Tue  02  29  364  53  53
    30.12.2020  Wed  03  30  365  53  53
    31.12.2020  Thu  04  31  366  53  53
    01.01.2021  Fri  05  01  001  01  01   ?!
    02.01.2021  Sat  06  02  002  01  01   ?!
    03.01.2021  Sun  00  03  003  00  01   ?!
    04.01.2021  Mon  01  04  004  01  01
    05.01.2021  Tue  02  05  005  01  01
    06.01.2021  Wed  03  06  006  01  01
    07.01.2021  Thu  04  07  007  01  01
    08.01.2021  Fri  05  08  008  01  01
    09.01.2021  Sat  06  09  009  01  01
    10.01.2021  Sun  00  10  010  01  02
    11.01.2021  Mon  01  11  011  02  02
    12.01.2021  Tue  02  12  012  02  02
    13.01.2021  Wed  03  13  013  02  02
    14.01.2021  Thu  04  14  014  02  02
    15.01.2021  Fri  05  15  015  02  02
    

    Both %U and %W report an incorrect week number for the first few days of 2021.
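
    The test script itself is not shown above. A minimal sketch that prints the same columns might look like this; the start date and the 20-day span are read off the table, the column spacing is only approximate, and this is not necessarily the script that was actually used.

    ```gnuplot
    # Print gnuplot's own strftime() output for each day across the 2020/2021 boundary.
    fmt = "%d.%m.%Y"
    t0  = strptime(fmt, "27.12.2020")
    SecondsPerDay = 24*3600

    print "      date   %a  %w  %d   %j  %W  %U"
    print "===================================="
    do for [i=0:19] {
        # %W and %U are the two formats under test
        print strftime(fmt."  %a  %w  %d  %j  %W  %U", t0 + i*SecondsPerDay)
    }
    ```

    Whatever convention is chosen, the week columns should never drop and then rise again within a single week, so a run like this is a quick check for the non-monotonic behaviour.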

    Notes

    The time format specs and the functions tm_wday() and friends were introduced into gnuplot somewhere between version 3.5 (1993) and 3.7.1 (1999). So far as I can tell the current strange behaviour has been there ever since.

    The ISO 8601 standard week starts on a Monday and counts "week 1" as the week that contains 4 January. Any days of January that fall before the Monday of week 1 are counted as belonging to the final week of the previous year, which would give the first three days of 2021 a week number of 53. I would be OK with fixing gnuplot's %W either to follow the ISO 8601 standard and report "week 53" for these days, or to follow what seems to be the original intent of the gnuplot code and report "week 0". But a non-monotonic output is clearly wrong.

    The ISO standard week starts on Monday. The CDC "EPI week" starts on Sunday. That may have been the intent of gnuplot's %U format but the CDC standard does not put 9 days in the first week of 2021.
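
    Both conventions can be written as a "mid-week day" rule: a week belongs to the year that contains its Thursday (ISO, Monday-based weeks) or its Wednesday (CDC, Sunday-based weeks, assuming the usual MMWR definition that week 1 is the first week with at least four days in the new year). The following user-level sketch assumes that rule. All function names are made up for illustration, and this is not the code in time.c.

    ```gnuplot
    SecondsPerDay = 24*3600

    # tm_wday() uses 0=Sun .. 6=Sat
    iso_dow(t) = (int(tm_wday(t)) + 6) % 7 + 1                  # 1=Mon .. 7=Sun
    iso_mid(t) = t + (4 - iso_dow(t)) * SecondsPerDay           # Thursday of the Mon..Sun week
    cdc_mid(t) = t + (3 - int(tm_wday(t))) * SecondsPerDay      # Wednesday of the Sun..Sat week

    # tm_yday() is 0-based, so integer division by 7 counts completed weeks
    iso_week(t) = int(tm_yday(iso_mid(t))) / 7 + 1
    iso_year(t) = int(tm_year(iso_mid(t)))
    cdc_week(t) = int(tm_yday(cdc_mid(t))) / 7 + 1
    cdc_year(t) = int(tm_year(cdc_mid(t)))

    fmt = "%d.%m.%Y"
    do for [d in "27.12.2020 31.12.2020 01.01.2021 03.01.2021 04.01.2021"] {
        t = strptime(fmt, d)
        print sprintf("%s  ISO %04d-W%02d  CDC %04d-W%02d", \
                      d, iso_year(t), iso_week(t), cdc_year(t), cdc_week(t))
    }
    ```

    Because the week number is taken from the mid-week day rather than from the date itself, this approach never needs to know whether the previous year had 52 or 53 weeks.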

    The ISO C time structure (struct tm, declared in <time.h>) reports day-of-week as [0..6] to represent [Sun..Sat]. So ISO C and ISO 8601 disagree about which day is the start of the week. That may or may not explain the confusion in the original gnuplot code.

     
  • Ethan Merritt

    Ethan Merritt - 2021-01-03

    Ticket moved from /p/gnuplot/feature-requests/519/

    Can't be converted:

    • _milestone:
    • _priority: 5
     
  • theozh

    theozh - 2021-01-04

    The following suggestion should do the fix... well, in gnuplot code, not in C-code ...
    Maybe there are shorter and smarter solutions...

    Code:

    ### Bug fix week number ISO
    reset session
    
    dow(t)   = int(tm_wday(t)) ? tm_wday(t) : 7              # day of week 1=Mon, ..., 7=Sun
    week(t)  = int((11 + tm_yday(t) - dow(t))/7)             # "raw"week of year
    jan01(t) = tm_wday(strptime("%Y",strftime("%Y",t)))                 # dow of Jan 1st
    dec31(t) = tm_wday(strptime("%d.%m.%Y","31.12.".strftime("%Y",t)))  # dow of Dec 31
    wpy(t)   = 52 + ((jan01(t)==4 || dec31(t)==4) ? 1 : 0)   # weeks per year
    woy(t)   = week(t) < 1 ? wpy(tm_year(t)-1) : \
               week(t) > wpy(tm_year(t)) ? 1 : week(t)       # week of year
    yow(t)   = int(week(t) < 1 ? tm_year(t)-1 : \
               week(t) > wpy(tm_year(t)) ? \
               tm_year(t)+1 : tm_year(t))  # year of week (could be previous, current or next)
    
    StartDate = "24.12.2020"
    myTimeFmt = "%d.%m.%Y"
    SecondsPerDay = 3600*24
    
    print "      date   %a DoW  %d   %j   YoW WoY"
    print "======================================"
    do for [i=0:20] {
        t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
        myDate = strftime(myTimeFmt."  %a", t)
        myDate2 = strftime("%d  %j", t)
        print sprintf("%s  %02d  %s  %04d-W%02d", myDate, dow(t), myDate2, yow(t), woy(t))
    }
    ### end of code
    

    Result:

          date   %a DoW  %d   %j   YoW WoY
    ======================================
    24.12.2020  Thu  04  24  359  2020-W52
    25.12.2020  Fri  05  25  360  2020-W52
    26.12.2020  Sat  06  26  361  2020-W52
    27.12.2020  Sun  07  27  362  2020-W52
    28.12.2020  Mon  01  28  363  2020-W53
    29.12.2020  Tue  02  29  364  2020-W53
    30.12.2020  Wed  03  30  365  2020-W53
    31.12.2020  Thu  04  31  366  2020-W53
    01.01.2021  Fri  05  01  001  2020-W53
    02.01.2021  Sat  06  02  002  2020-W53
    03.01.2021  Sun  07  03  003  2020-W53
    04.01.2021  Mon  01  04  004  2021-W01
    05.01.2021  Tue  02  05  005  2021-W01
    06.01.2021  Wed  03  06  006  2021-W01
    07.01.2021  Thu  04  07  007  2021-W01
    08.01.2021  Fri  05  08  008  2021-W01
    09.01.2021  Sat  06  09  009  2021-W01
    10.01.2021  Sun  07  10  010  2021-W01
    11.01.2021  Mon  01  11  011  2021-W02
    12.01.2021  Tue  02  12  012  2021-W02
    13.01.2021  Wed  03  13  013  2021-W02
    
     
    • Ethan Merritt

      Ethan Merritt - 2021-01-04

      The final days of the year are another corner case, not handled well by either the current gnuplot code or your algorithm above. Consider for example the 2007->2008 transition. According to the ISO standard Monday 31.12.2007 is in week 1 of 2008 (2008-W01).

      Anyhow, we need an internal fix for the code in time.c, not a user-level script. Code welcome!
      I have a preliminary fix but am currently missing a simple but 100% correct way to determine if the previous year has 52 or 53 weeks.
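
      One candidate for the 52-or-53 question: 28 December always falls in the last ISO week of its year, so the week number of that date is the number of weeks in the year. The sketch below is written as a gnuplot script only to illustrate the arithmetic; the helper names are hypothetical and a C version for time.c would need its own date handling.

      ```gnuplot
      SecondsPerDay = 24*3600
      dow(t)      = (int(tm_wday(t)) + 6) % 7 + 1                     # 1=Mon .. 7=Sun
      dec28(y)    = strptime("%d.%m.%Y", sprintf("28.12.%04d", y))
      # Thursday of the week containing 28 Dec; it always lies in year y
      last_thu(y) = dec28(y) + (4 - dow(dec28(y))) * SecondsPerDay
      # tm_yday() is 0-based, so this is the 1-based week index of that Thursday
      weeks_in_year(y) = int(tm_yday(last_thu(y))) / 7 + 1

      do for [y=2018:2022] {
          print sprintf("%04d has %d ISO weeks", y, weeks_in_year(y))
      }
      ```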

       
  • theozh

    theozh - 2021-01-04

    Oops, a bug in the bug fix. Here is a version which should be correct now also for 2007/2008.

    A year has 53 weeks if either Jan 01 or Dec 31 is a Thursday.

    Sorry, I cannot help with C-code.

    Edit: made the gnuplot-code a bit "nicer".

    Code:

    ### Bug fix V1.2 for week number according to ISO 8601
    reset session
    
    dow(t)      = int(tm_wday(t)) ? tm_wday(t) : 7                 # day of week 1=Mon, ..., 7=Sun
    week(t)     = int((11 + tm_yday(t) - dow(t))/7)                # "raw"week of year
    wday(d,m,y) = tm_wday(strptime("%d.%m.%Y",sprintf("%02d.%02d.%04d",d,m,y)))  # week day of certain date
    wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52     # weeks per year
    woy(t)      = week(t) < 1 ? wpy(tm_year(t)-1) : \
                  week(t) > wpy(tm_year(t)) ? 1 : week(t)                   # week of year
    yow(t)      = int(week(t) < 1 ? tm_year(t)-1 : week(t) > wpy(tm_year(t)) ? \
                  tm_year(t)+1 : tm_year(t))          # year of week (could be previous, current or next)
    
    StartDate = "24.12.2007"
    myTimeFmt = "%d.%m.%Y"
    SecondsPerDay = 3600*24
    
    print "      date   %a DoW  %d   %j   YoW WoY"
    print "======================================"
    do for [i=0:20] {
        t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
        myDate = strftime(myTimeFmt."  %a", t)
        myDate2 = strftime("%d  %j", t)
        print sprintf("%s  %02d  %s  %04d-W%02d", myDate, dow(t), myDate2, yow(t), woy(t))
    }
    ### end of code
    

    Result:

          date   %a DoW  %d   %j   YoW WoY
    ======================================
    24.12.2007  Mon  01  24  358  2007-W52
    25.12.2007  Tue  02  25  359  2007-W52
    26.12.2007  Wed  03  26  360  2007-W52
    27.12.2007  Thu  04  27  361  2007-W52
    28.12.2007  Fri  05  28  362  2007-W52
    29.12.2007  Sat  06  29  363  2007-W52
    30.12.2007  Sun  07  30  364  2007-W52
    31.12.2007  Mon  01  31  365  2008-W01
    01.01.2008  Tue  02  01  001  2008-W01
    02.01.2008  Wed  03  02  002  2008-W01
    03.01.2008  Thu  04  03  003  2008-W01
    04.01.2008  Fri  05  04  004  2008-W01
    05.01.2008  Sat  06  05  005  2008-W01
    06.01.2008  Sun  07  06  006  2008-W01
    07.01.2008  Mon  01  07  007  2008-W02
    08.01.2008  Tue  02  08  008  2008-W02
    09.01.2008  Wed  03  09  009  2008-W02
    10.01.2008  Thu  04  10  010  2008-W02
    11.01.2008  Fri  05  11  011  2008-W02
    12.01.2008  Sat  06  12  012  2008-W02
    13.01.2008  Sun  07  13  013  2008-W02
    
     

    Last edit: theozh 2021-01-05
  • Ethan Merritt

    Ethan Merritt - 2021-01-05
    • summary: Request to consider Epidemiological Weeks - epiweeks --> Time formats %W %U broken (was: Request to consider Epidemiological Weeks - epiweeks)
    • Priority: 5 -->
     
  • Ethan Merritt

    Ethan Merritt - 2021-01-05

    A fix for %W is now in the development branch.
    It adds a user-callable function tm_week(time) that returns the week number in accord with the ISO 8601 "week date" standard. The same new underlying routine is used to re-implement %W.

    TODO:

    1) It would be nice to have a unit test / demo that exercises %W across various corner case year transitions.

    2) It would also be nice to have a demo script showing how to construct or interpret week dates in the ISO system. This is tricky to do with the current input formats. For example, to generate the calendar date 30 Dec 2008 from the ISO week date "2009-W01-2" requires changing the year.

    3) The %U format is similarly broken but there is no documentation of what output is expected other than it uses a week starting with Sunday rather than Monday. We could deprecate it, or we could bring it into accord with the CDC MMWR "epi week" as in the original feature request.
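
    For item 2, one possible user-level construction starts from 4 January, which by the ISO definition always lies in week 1. Stepping back to the Monday of that week gives the start of week 1, and the rest is day arithmetic. This is a sketch with hypothetical function names, not a proposal for the final interface.

    ```gnuplot
    SecondsPerDay = 24*3600
    dow(t)          = (int(tm_wday(t)) + 6) % 7 + 1                   # 1=Mon .. 7=Sun
    jan4(y)         = strptime("%d.%m.%Y", sprintf("04.01.%04d", y))
    week1_monday(y) = jan4(y) - (dow(jan4(y)) - 1) * SecondsPerDay    # Monday of ISO week 1
    # y = nominal ISO year, w = week 1..53, d = day 1 (Mon) .. 7 (Sun)
    from_isoweek(y, w, d) = week1_monday(y) + ((w - 1)*7 + (d - 1)) * SecondsPerDay

    # ISO week date 2009-W01-2, the example from item 2
    print strftime("%a %d %b %Y", from_isoweek(2009, 1, 2))
    ```

    If the construction is right, the printed date should agree with the 30 Dec 2008 example in item 2, which shows the year change happening without any special-case code.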

     
  • Ethan Merritt

    Ethan Merritt - 2021-01-09

    Thank you for pointing out problems with gnuplot's existing %U and %W formats and the difficulty of plotting data for which the date is encoded in a "week date" convention rather than as a calendar date.

    Fixes for the formats and new time functions to support plotting epi week data are now in the development version.

    Demo here: epi_data.dem

    Version 5.5 documentation updated: Gnuplot_5_5.pdf

    gnuplot> help tm_week

    The tm_week(t, standard) function interprets its first argument t as a time
    in seconds from 1 Jan 1970. Despite the name of this function it does not
    report a field from the POSIX tm structure.

    If standard = 0 it returns the week number in the ISO 8601 "week date" system.
    This corresponds to gnuplot's %W time format.
    If standard = 1 it returns the CDC epidemiological week number ("epi week").
    This corresponds to gnuplot's %U time format.
    For corresponding inverse functions that convert week dates to calendar time
    see weekdate_iso, weekdate_cdc.

    In brief, ISO Week 1 of year YYYY begins on the Monday closest to 1 Jan YYYY.
    This may place it in the previous calendar year. For example Tue 30 Dec 2008
    has ISO week date 2009-W01-2 (2nd day of week 1 of 2009). Up to three days
    at the start of January may come before the Monday of ISO week 1;
    these days are assigned to the final week of the previous calendar year.
    E.g. Fri 1 Jan 2021 has ISO week date 2020-W53-5.

    The US Center for Disease Control (CDC) epidemiological week is a similar
    week date convention that differs from the ISO standard by defining a week as
    starting on Sunday, rather than on Monday.
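
    As a sketch of how tm_week might be applied to the original request (one line per case, date in column 1 and a count of 1 in column 2), the cases can be summed per epi week with smooth frequency. The file name cases.dat is hypothetical, and the example assumes all dates belong to a single epi year, since week numbers from different years would otherwise be added together.

    ```gnuplot
    # Hypothetical input file: one line per case, "YYYY-MM-DD 1"
    FILE = "cases.dat"

    set style fill solid 0.5
    set boxwidth 0.9
    set xlabel "CDC epi week"
    set ylabel "cases"
    set yrange [0:*]

    # standard = 1 selects the CDC epi week;
    # smooth frequency sums column 2 over all lines that share a week number
    plot FILE using (tm_week(timecolumn(1, "%Y-%m-%d"), 1)):2 smooth frequency with boxes notitle
    ```

    Passing 0 instead of 1 as the second argument would bin by ISO week instead.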

    gnuplot> help weekdate_iso

    Syntax:
    time = weekdate_iso( year, week [, day] )

    This function converts from the year, week, day components of a date in
    ISO 8601 "week date" format to the calendar date as a time in seconds since
    the epoch date 1 Jan 1970. Note that the nominal year in the week date
    system is not necessarily the same as the calendar year. The week is an
    integer from 1 to 53. The day parameter is optional. If it is omitted
    or equal to 0 the time returned is the start of the week. Otherwise day
    is an integer from 1 (Monday) to 7 (Sunday).
    See tm_week for additional information on an inverse function that converts
    from calendar date to week number in the ISO standard convention.

    Example:
    # Plot data from a file with column 1 containing ISO weeks
    # Week cases deaths
    # 2020-05 432 1
    calendar_date(w) = weekdate_iso( int(w[1:4]), int(w[6:7]) )
    set xtics time format "%b\n%Y"
    plot FILE using (calendar_date(strcol(1))) : 2 title columnhead
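
    A quick round trip between the two functions can serve as a sanity check. This sketch uses only the calls documented above, with the week date taken from the tm_week help text.

    ```gnuplot
    t = weekdate_iso(2009, 1, 2)              # ISO week date 2009-W01-2
    print strftime("%a %d %b %Y", t)          # calendar date of that day
    print tm_week(t, 0)                       # ISO week number recovered from the calendar time

    # Omitting the day gives the start of the week
    print strftime("%a %d %b %Y", weekdate_iso(2020, 53))
    ```

    According to the help text, the first line should print 30 Dec 2008 and the second should recover week 1.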

     
  • Roge

    Roge - 2021-01-11

    Hi,

    One small question: what does the w[ ] mean?
    I can infer that w[1:4] and w[6:7] extract the date characters from the date column. However, is this an array or a function? I failed to find the documentation for it.

    Thanks for fixing the weeks parameters.

     

    Last edit: Roge 2021-01-11
    • Ethan Merritt

      Ethan Merritt - 2021-01-11

      It is neither an array nor a function. [] is the substring operator.

      For string s="ABCDE"
      s[1:1] is "A"
      s[1:3] is "ABC"
      s[4:5] is "DE"
      s[3:] is "CDE"

      and so on.

      The function strcol(N) returns the content of column N as a string. Since it returns a string, you can apply the substring operator to it.
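
      The same operator can split a full week date string into its parts. A small sketch, assuming the string always has the fixed layout YYYY-Www-D; the variable names are just for illustration.

      ```gnuplot
      w = "2020-W53-5"            # ISO week date string: year, week, day
      year = int(w[1:4])          # characters 1..4
      week = int(w[7:8])          # characters 7..8, skipping the "-W"
      day  = int(w[10:10])        # single character 10
      print year, week, day
      print w[7:]                 # open-ended range: everything from character 7 on
      ```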

       
  • Ethan Merritt

    Ethan Merritt - 2021-01-21
    • status: open --> pending
     
  • Ethan Merritt

    Ethan Merritt - 2021-04-09
    • status: pending --> pending-fixed
     
  • Ethan Merritt

    Ethan Merritt - 2021-06-02
    • Status: pending-fixed --> closed-fixed
     
