Fiddling around with bins, I was able to plot the number of COVID cases by calendar week (at least that is what I think). However, after reading a little about epidemiology, I learned that an epidemiological week is different from a regular calendar week. How difficult would it be to get gnuplot to understand epidemiological weeks, or epiweeks? For example, this year there are 53 epidemiological weeks, and November has 5 epidemiological weeks. Most likely gnuplot can already do this with some creative coding, but perhaps a specific mode/option for this purpose is viable and useful?
Thanks.
The cleaned up "DB of covid cases" is two columns where each positive case is a line with the symptoms date and a 1.
2020-12-13 1
2020-12-14 1
2020-12-14 1
2020-12-14 1
Main Code used to generate the graph
plot ["\"2019-12-31\"":"\"2021-01-01\""][0:*] \
  "< /usr/bin/gawk -F',' '$36 ~ /[1238]/ && $8 ~ /32/ {print $12, 1}'\
  /mnt/home/Downloads/covid-mx/BDSS/20201223-COVIDMEXICO.csv" u 1:2 bins=52 binrange ["2020-01-01":"2020-12-31"] with boxes
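As a cross-check on the weekly totals, the same two-column data can be grouped by week outside gnuplot. This is only a minimal sketch in Python, not gnuplot code. It assumes the "date 1" layout shown above and uses ISO 8601 weeks (Monday start) rather than epi weeks. The function name cases_per_iso_week is made up for illustration.

```python
from collections import Counter
from datetime import date

def cases_per_iso_week(lines):
    """Count cases per (ISO year, ISO week) from 'YYYY-MM-DD 1' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        day = date.fromisoformat(fields[0])
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week)] += int(fields[1])
    return counts

# The four sample rows from the post above.
sample = ["2020-12-13 1", "2020-12-14 1", "2020-12-14 1", "2020-12-14 1"]
weekly = cases_per_iso_week(sample)
for (year, week), n in sorted(weekly.items()):
    print(f"{year}-W{week:02d} {n}")
```

With the four sample rows, Sunday 2020-12-13 lands in 2020-W50 and the three cases on Monday 2020-12-14 land in 2020-W51, so a week boundary falls between them.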
gnuplot already supports an "epiweek": the ISO standard one. Just because some countries choose to ignore international standards doesn't mean everybody else should.
I am moving this to Bugs because whatever it is that gnuplot is doing right now is clearly wrong for both formats %U and %W. The documentation says
Neither format is working as documented. Here is the output from a test script:
Both %U and %W report an incorrect week number for the first few days of 2021.
Notes
The time format specs and the functions tm_wday() and friends were introduced into gnuplot somewhere between version 3.5 (1993) and 3.7.1 (1999). So far as I can tell the current strange behaviour has been there ever since.
The ISO 8601 standard week starts on a Monday and counts "week 1" as the week that contains 4 January. Any days before the first Monday are counted as falling in the final week of the previous year, which would give the first three days of 2021 a week number of 53. I would be OK with fixing gnuplot's %W either to follow the ISO 8601 standard and report "week 53" for these days or follow what seems to be the original intent of the gnuplot code and report "week 0". But a non-monotonic output is clearly wrong.
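For reference, the ISO 8601 rule described above can be sketched in a few lines. This is Python rather than gnuplot or C, and it is an illustration of the rule, not the code that went into time.c. It uses the equivalent formulation that a week belongs to the year containing its Thursday. The function name iso_week is hypothetical.

```python
from datetime import date, timedelta

def iso_week(d):
    """Return (ISO year, ISO week) for a date, via the Thursday of its week."""
    # weekday(): Mon=0 .. Sun=6, so this moves to the Thursday of the same
    # Monday-based week. That Thursday always lies in the year owning the week.
    thursday = d + timedelta(days=3 - d.weekday())
    week = (thursday.timetuple().tm_yday - 1) // 7 + 1
    return thursday.year, week

# First days of 2021, plus the 2007->2008 transition.
for d in (date(2021, 1, 1), date(2021, 1, 3), date(2021, 1, 4), date(2007, 12, 31)):
    print(d, iso_week(d))
```

The first three days of 2021 come out as week 53 of 2020 and Monday 4 January starts week 1, so the week number within a given ISO year never goes backwards.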
The ISO standard week starts on Monday. The CDC "EPI week" starts on Sunday. That may have been the intent of gnuplot's %U format but the CDC standard does not put 9 days in the first week of 2021.
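A comparable sketch for the CDC convention, again in Python and only as an illustration. It assumes the usual MMWR definition: weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year. Under that assumption the Wednesday of a week decides which year the week belongs to. The function name mmwr_week is hypothetical.

```python
from datetime import date, timedelta

def mmwr_week(d):
    """Return (epi year, epi week) for a date under the assumed MMWR rule."""
    # weekday(): Mon=0 .. Sun=6, so Sunday maps to 0 days since Sunday.
    days_since_sunday = (d.weekday() + 1) % 7
    sunday = d - timedelta(days=days_since_sunday)
    # The midweek day (Wednesday) lies in the year that owns the week.
    wednesday = sunday + timedelta(days=3)
    week = (wednesday.timetuple().tm_yday - 1) // 7 + 1
    return wednesday.year, week

for d in (date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3), date(2019, 12, 29)):
    print(d, mmwr_week(d))
```

Under this rule 1 and 2 January 2021 still belong to week 53 of 2020, and week 1 of 2021 is the seven days starting Sunday 3 January.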
The ISO C time structure defined in <struct_tm.h> reports day-of-week as [0..6] to represent [Sun..Sat]. So ISO C and ISO 8601 disagree about which day is the start of the week. That may or may not explain the confusion in the original gnuplot code.
Ticket moved from /p/gnuplot/feature-requests/519/
The following suggestion should do the fix... well, in gnuplot code, not in C-code ...
Maybe there are shorter and smarter solutions...
Code:
Result:
The final days of the year are another corner case, not handled well by either the current gnuplot code or your algorithm above. Consider for example the 2007->2008 transition. According to the ISO standard Monday 31.12.2007 is in week 1 of 2008 (2008-W01).
Anyhow, we need an internal fix for the code in time.c, not a user-level script. Code welcome!
I have a preliminary fix but am currently missing a simple but 100% correct way to determine if the previous year has 52 or 53 weeks.
Oops, a bug in the bug fix. Here is a version which should be correct now also for 2007/2008.
A year has 53 weeks if either Jan 01 or Dec 31 is a Thursday.
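The Thursday rule is easy to check by brute force. Here is a small sketch in Python (not gnuplot or C) that compares it with the week number of 28 December, a date that always falls in the last ISO week of its year. Both function names are made up for this check.

```python
from datetime import date

THURSDAY = 3  # date.weekday(): Mon=0 .. Sun=6

def has_53_weeks(year):
    """The rule above: 53 weeks if 1 Jan or 31 Dec is a Thursday."""
    return THURSDAY in (date(year, 1, 1).weekday(), date(year, 12, 31).weekday())

def weeks_in_iso_year(year):
    """Number of ISO weeks, read off 28 December (always in the last week)."""
    return date(year, 12, 28).isocalendar()[1]

# Years where the rule and the calendar disagree; none are expected.
mismatches = [y for y in range(1900, 2101)
              if has_53_weeks(y) != (weeks_in_iso_year(y) == 53)]
print(mismatches)
print(has_53_weeks(2020), has_53_weeks(2021))
```

For 2020 the rule holds through 31 December, which was a Thursday, while 2021 has neither date on a Thursday and so has 52 weeks.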
Sorry, I cannot help with C-code.
Edit: made the gnuplot-code a bit "nicer".
Code:
Result:
Last edit: theozh 2021-01-05
A fix for %W is now in the development branch.
It adds a user-callable function tm_week(time) that returns the week number in accord with the ISO 8601 "week date" standard. The same new underlying routine is used to re-implement %W.
TODO:
1) It would be nice to have a unit test / demo that exercises %W across various corner case year transitions.
2) It would also be nice to have a demo script showing how to construct or interpret week dates in the ISO system. This is tricky to do with the current input formats. For example, to generate the calendar date 30 Dec 2008 from the ISO week date "2009-W01-2" requires changing the year.
3) The %U format is similarly broken but there is no documentation of what output is expected other than it uses a week starting with Sunday rather than Monday. We could deprecate it, or we could bring it into accord with the CDC MMWR "epi week" as in the original feature request.
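To illustrate item 2, here is a sketch of the week-date to calendar-date conversion, in Python rather than gnuplot. It anchors on 4 January, which is always in ISO week 1, steps back to that week's Monday, and then counts forward. The function name iso_week_date_to_calendar is hypothetical, and this is not how the gnuplot routines are implemented.

```python
from datetime import date, timedelta

def iso_week_date_to_calendar(year, week, day):
    """Convert an ISO week date (day: 1=Mon .. 7=Sun) to a calendar date."""
    jan4 = date(year, 1, 4)                                 # always in week 1
    week1_monday = jan4 - timedelta(days=jan4.weekday())    # weekday(): Mon=0
    return week1_monday + timedelta(weeks=week - 1, days=day - 1)

# The example from item 2: ISO week date 2009-W01-2.
print(iso_week_date_to_calendar(2009, 1, 2))
```

For "2009-W01-2" this gives 30 December 2008, so the calendar year differs from the ISO year, which is exactly the awkward case mentioned in item 2.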
Thank you for pointing out problems with gnuplot's existing %U and %W formats and the difficulty of plotting data for which the date is encoded in a "week date" convention rather than a calendar date.
Fixes for the formats and new time functions to support plotting epi week data are now in the development version.
Demo here: epi_data.dem
Version 5.5 documentation updated: Gnuplot_5_5.pdf
Hi,
One small question, what does the w [ ] mean?
I can infer that w[1:4] and w[6:7] extract the date characters from the date column; however, is this an array or a function? I failed to find the documentation for it.
Thanks for fixing the week parameters.
Last edit: Roge 2021-01-11
It is neither an array nor a function. [] is the substring operator.
For string s="ABCDE"
s[1:1] is "A"
s[1:3] is "ABC"
s[4:5] is "DE"
s[3:] is "CDE"
and so on.
The function strcol(N) returns the content of column N as a string. Since it returns a string, you can apply the substring operator to it.