From: Hans-Bernhard B. <br...@ph...> - 2004-06-10 20:52:30
On Wed, 9 Jun 2004, Ethan Merritt wrote:

> Here is the entirety of the change at issue:
>
>   /*
>    * EAM - Oct 2002 Distinguish between DF_MISSING and DF_BAD.
>    * Previous versions would never notify caller of either case.
>    * Now missing data will be noted. Bad data should arguably be
>    * noted also, but that would change existing default behavior.
>    */
>   else if ((column <= df_no_cols) && (df_column[column - 1].good == DF_MISSING))
>       return DF_UNDEFINED;
>
> The comment was correct. Without this change the code was IMHO broken,
> and worked as it did only by accident.

It did match the documentation, though, which I don't think was an
accident.

> With the change, the missing data is explicitly reported as such by
> the code in datafile.c.

Not quite. df_tokenize() reports DF_MISSING, but df_readline() now
translates that into DF_UNDEFINED instead of 'no data'. The effect is
that of an empty line, not a missing line, in the case of 'with lines'.

> As I recall, the reason I noticed the original problem was that with the
> version 3.7 code it was not possible to guarantee reasonable behavior
> from missing data for box plots or candlesticks. I don't know that I can
> reproduce an example of the failure quickly, but I can try if needed.

Please do.

> The other problem was that you could not tell gnuplot that '0' or 'NaN'
> or some other numerically-parsable flag indicated missing data.

NaN, if scanf() supports it at all, will quite certainly do that in 4.0,
because there's now an additional sanity check between datafile reading
and datapoints being used. And if you want to use '0' for that, I don't
agree that 'set missing' is the way to do it. That's what constructs like
'using 1:($3==0?0/0:$3)' are for.

> This was a problem for tabular data that used 0 as a place-holder.

Such tabular data is IMHO broken by design.

> > So: which do we change: the code or the docs?
>
> I think the docs should be changed. I am open to the suggestion that
> 'plot with lines' in particular should be modified so that missing data
> does not produce breaks in the line.

As long as missing data output DF_UNDEFINED from datafile.c, it must.
Otherwise we'd be changing the behaviour of piecewise functions in a
seriously broken manner, e.g.

    plot [-10:10] abs(x)>5?abs(x):0/0 w l

would suddenly have a connection between points (-5,5) and (5,5), which
we quite definitely don't want.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.
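To make the "empty line versus missing line" distinction concrete, here is a minimal C sketch. It is a simplified model under stated assumptions, not gnuplot's datafile.c or plot2d.c: the names row_status, point_type, store_points() and count_pieces() are hypothetical. Following the description above, a 3.7-style reader drops a record flagged as missing entirely, while the Oct 2002 behaviour stores it as an UNDEFINED point, which breaks a 'with lines' plot just as a blank line would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status of one input record after tokenizing
 * (illustrative only; not gnuplot's actual DF_* codes). */
enum row_status { ROW_OK, ROW_MISSING };

/* Hypothetical type of a stored point in a 'with lines' plot. */
enum point_type { PT_INRANGE, PT_UNDEFINED };

/* Copy records into the point list.
 * keep_missing == 0: a missing record leaves no trace ("missing line").
 * keep_missing != 0: it is stored as an UNDEFINED point ("empty line").
 * Returns the number of points written to out. */
static size_t store_points(const enum row_status *rows, size_t n,
                           int keep_missing, enum point_type *out)
{
    size_t count = 0;
    assert(n == 0 || (rows != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (rows[i] == ROW_OK)
            out[count++] = PT_INRANGE;
        else if (keep_missing)
            out[count++] = PT_UNDEFINED;
        /* otherwise the record is dropped entirely */
    }
    return count;
}

/* Count runs of consecutive defined points.  Each run is drawn as a
 * separate piece of the polyline, because an UNDEFINED point breaks it. */
static size_t count_pieces(const enum point_type *pts, size_t n)
{
    size_t pieces = 0;
    int in_piece = 0;
    for (size_t i = 0; i < n; i++) {
        if (pts[i] == PT_UNDEFINED) {
            in_piece = 0;
        } else if (!in_piece) {
            pieces++;
            in_piece = 1;
        }
    }
    return pieces;
}
```

With the records {OK, OK, MISSING, OK, OK}, dropping the missing record yields four points in one piece, while keeping it as UNDEFINED yields five points in two pieces, i.e. a visible break in the line.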
From: Hans-Bernhard B. <br...@ph...> - 2004-06-13 13:20:39
On Sat, 12 Jun 2004, Ethan A Merritt wrote:

> I think the documentation is wrong, and always was wrong, because
> the behavior it describes has nothing to do with whether the data field
> contains the "missing" flag or not. It simply describes what happens if the
> field is *illegal*, which to my mind is a different thing from being
> "missing".

Intent is hard to be certain about, this far into the project's lifetime...

> By contrast to version 3.7, the test for missing data in 4.0 actually
> does something useful.

I'm not convinced that this is true. Note that there are a total of 12
distinct cases to consider (four kinds of data entry times three kinds
of using specification): the data entry in question can be a number,
unparseable garbage, something contained in quotes, or the marker
defined by 'set missing', and the plot can have no using specification
at all, a simple one ("using 1:2" style) or an extended one
("using ($1):($2)") for the column in question.

In version 3.7, it seems the main effect of 'set missing' was to change
the behaviour in the case of non-numeric input in the face of simple
using specs. Without it, such entries would be treated as undefined data
(--> break in 'with lines' plots), but if they matched a defined
'set missing' string, they would be handled as in the case of no using
specification: ignored (--> no break, because there's no trace left of
that entire data point).

We may have overdone the matter in the 3.8 series, rendering
'set missing' useless, or at least giving it an entirely different
meaning than it used to have. E.g., the whole idea that quoted entries
in a datafile get special treatment is new.

> As the documentation says, the first plot will incorrectly draw
> a line through (2,3) because the 2nd field is an illegal numerical
> value. With "*" set as a missing data flag, however, the second
> plot is drawn correctly.
To be precise, it's drawn in one of two possible ways that could be
called "correct" here: it has a break in it, because the datapoint was
kept in the internal lists, but flagged as DF_UNDEFINED.

> This didn't use to happen.

For the case of no using spec at all, it didn't. I suspect 'set missing'
simply never had any effect on such plots in 3.7, so arguing over their
behaviour may be pointless. The compatibility we should worry about is
what happens in those cases where 3.7 did behave at least marginally
sensibly, i.e. the 'using 1:2' and 'using ($1):($2)' ones.

> > > I am open to the suggestion that 'plot with lines' in particular
> > > should be modified so that missing data does not produce
> > > breaks in the line.
> >
> > As long as missing data output DF_UNDEFINED from datafile.c, it must.
> > Otherwise we'd be changing the behaviour of piecewise functions
> > in a seriously broken manner, e.g.
> >
> >     plot [-10:10] abs(x)>5?abs(x):0/0 w l
> >
> > would suddenly have a connection between points (-5,5) and (5,5), which
> > we quite definitely don't want.
>
> I don't quite follow you here. There was no such connection before the
> change (in 3.7) and no such connection after the change (in 4.0).

Exactly. But as of your October 2002 change, flagged "missing" input
ends up as a datapoint with DF_UNDEFINED in the internal point lists,
just like the points in the gap of the above function do. So a change
to let 'plot with lines' continue across DF_UNDEFINED points would
change the function plot's behaviour, and we don't want that.

> I have just modified the code in cvs so that df_readline() passes the
> DF_MISSING up to the callers.

I have serious doubts about that being the right idea, but don't have
the time right now to investigate it fully, sorry.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.
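The argument above (a missing datafile entry and the 0/0 gap of a function both end up as UNDEFINED points, so "skip UNDEFINED" cannot be applied to one without the other) can be checked with a small C sketch. This is a hypothetical model, not gnuplot source: struct point, sample_piecewise() and widest_join() are illustrative names, the 0/0 branch is modelled as a C NAN, and sampling at 21 evenly spaced points on [-10,10] is an assumption chosen so that x takes integer values.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical point representation (illustrative only). */
enum point_type { PT_INRANGE, PT_UNDEFINED };

struct point {
    double x, y;
    enum point_type type;
};

/* Sample f(x) = abs(x) > 5 ? abs(x) : 0/0 at n evenly spaced x values in
 * [lo, hi].  The 0/0 branch is modelled as NaN and stored as UNDEFINED. */
static size_t sample_piecewise(double lo, double hi, size_t n, struct point *out)
{
    assert(n >= 2 && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double x = lo + (hi - lo) * (double)i / (double)(n - 1);
        double ax = x < 0.0 ? -x : x;
        double y = ax > 5.0 ? ax : NAN;
        out[i].x = x;
        out[i].y = y;
        out[i].type = isnan(y) ? PT_UNDEFINED : PT_INRANGE;
    }
    return n;
}

/* Widest x-distance between two points that get joined by a segment.
 * skip_undefined == 0: an UNDEFINED point breaks the line.
 * skip_undefined != 0: UNDEFINED points are ignored and their neighbours
 * are joined (the hypothetical "continue across gaps" rule). */
static double widest_join(const struct point *pts, size_t n, int skip_undefined)
{
    double widest = 0.0;
    const struct point *prev = NULL;
    for (size_t i = 0; i < n; i++) {
        if (pts[i].type == PT_UNDEFINED) {
            if (!skip_undefined)
                prev = NULL;   /* break: nothing connects across this point */
            continue;
        }
        if (prev != NULL && pts[i].x - prev->x > widest)
            widest = pts[i].x - prev->x;
        prev = &pts[i];
    }
    return widest;
}
```

With the breaking rule, no segment spans more than one sample step (1.0). With the skipping rule, a segment from (-6,6) to (6,6) bridges the whole gap (width 12.0), which is the unwanted connection described above.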
From: Ethan A M. <merritt@u.washington.edu> - 2004-06-13 01:22:47
On Thursday 10 June 2004 01:50 pm, Hans-Bernhard Broeker wrote:

I think the documentation is wrong, and always was wrong, because the
behavior it describes has nothing to do with whether the data field
contains the "missing" flag or not. It simply describes what happens if
the field is *illegal*, which to my mind is a different thing from being
"missing".

By contrast to version 3.7, the test for missing data in 4.0 actually
does something useful. Consider the following example (modified from
the docs):

    set style data lines
    plot '-'
    1 10
    2 20
    3 *
    4 40
    5 50
    e
    set datafile missing "*"
    plot '-'
    1 10
    2 20
    3 *
    4 40
    5 50
    e

As the documentation says, the first plot will incorrectly draw a line
through (2,3) because the 2nd field is an illegal numerical value. With
"*" set as a missing data flag, however, the second plot is drawn
correctly. This didn't use to happen.

> > I am open to the suggestion that 'plot with lines' in particular
> > should be modified so that missing data does not produce
> > breaks in the line.
>
> As long as missing data output DF_UNDEFINED from datafile.c, it must.
> Otherwise we'd be changing the behaviour of piecewise functions
> in a seriously broken manner, e.g.
>
>     plot [-10:10] abs(x)>5?abs(x):0/0 w l
>
> would suddenly have a connection between points (-5,5) and (5,5), which
> we quite definitely don't want.

I don't quite follow you here. There was no such connection before the
change (in 3.7) and no such connection after the change (in 4.0). If I
put in an explicit test for DF_MISSING it won't add a connection then
either.

I have just modified the code in cvs so that df_readline() passes the
DF_MISSING up to the callers. There are only three callers (in plot2d.c,
plot3d.c and fit.c), and all three continue to treat this case as they
have been by falling through to the DF_UNDEFINED handler.
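The example above hinges on telling a declared missing-data flag apart from a field that simply fails to parse as a number. A minimal C sketch of that classification follows; it is a hypothetical model, not gnuplot's df_tokenize(). The names field_status and classify_field() are illustrative, and comparing the whole field against the flag with strcmp() is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-field result (illustrative only). */
enum field_status { FIELD_GOOD, FIELD_MISSING, FIELD_BAD };

/* Classify one whitespace-separated field.
 * missing_flag may be NULL, meaning no 'set datafile missing' is active.
 * On FIELD_GOOD the parsed number is stored in *value. */
static enum field_status classify_field(const char *field,
                                        const char *missing_flag,
                                        double *value)
{
    char *end;
    assert(field != NULL && value != NULL);
    if (missing_flag != NULL && strcmp(field, missing_flag) == 0)
        return FIELD_MISSING;   /* user-declared placeholder */
    *value = strtod(field, &end);
    if (end == field || *end != '\0')
        return FIELD_BAD;       /* not a parseable number */
    return FIELD_GOOD;
}
```

Without a flag, "*" comes back as FIELD_BAD; with the flag set to "*", it comes back as FIELD_MISSING. Note that C99 strtod() accepts "nan", so in this model a NaN field parses as a number rather than being flagged.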
But if people want 'plot with lines' to draw lines through missing data,
then I can add a simple test so that the code section in plot2d.c looks
like this:

    case DF_MISSING:
        if (current_plot->plot_style == LINES)
            continue;
        /* Otherwise missing data is treated the same as undefined */
    case DF_UNDEFINED:
        /* bad result from extended using expression */
        current_plot->points[i].type = UNDEFINED;
        i++;
        continue;

That is OK with me, or at least I don't object very strongly. I suppose
logically we would also test for LINESPOINTS.

I was just trying to point out that this returns us to the case (for
"plot with lines") that there is no difference between setting or not
setting the missing data flag. Why have a flag if it doesn't make any
difference which way it is set? If the user doesn't want to make a
distinction between missing and undefined points, then he doesn't have
to use the "set datafile missing <char>" option at all. If he wants to
indicate missing data, then he can set the flag, and the resulting plot
will show a break in the line where there is missing data. That seems
better to me, but either way it is up to the user to choose which he
wants.

I hope I have explained my thoughts better this time.

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle
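The proposed plot2d.c change can be exercised in isolation with a simplified, self-contained C loop. This is a hypothetical sketch, not the real plot2d.c: the enum values, store_records() and the flat array of point types are illustrative assumptions, and the LINESPOINTS test is added only because the message suggests it. It shows that DF_MISSING records are dropped for line styles, while DF_UNDEFINED records (such as a function's 0/0 gap) are still stored and still break the line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codes modelled on the names used in the thread. */
enum df_result { DF_GOOD, DF_MISSING, DF_UNDEFINED };
enum plot_style { LINES, LINESPOINTS, POINTS };
enum point_type { INRANGE, UNDEFINED };

/* Store one point per input record, except that missing records are
 * skipped for line styles.  Returns the number of points stored. */
static size_t store_records(const enum df_result *recs, size_t n,
                            enum plot_style style, enum point_type *points)
{
    size_t i = 0;
    for (size_t r = 0; r < n; r++) {
        switch (recs[r]) {
        case DF_MISSING:
            if (style == LINES || style == LINESPOINTS)
                continue;   /* next record: leaves no trace, so no break */
            /* Otherwise missing data is treated the same as undefined */
            /* fall through */
        case DF_UNDEFINED:
            points[i++] = UNDEFINED;   /* still breaks the line */
            break;
        case DF_GOOD:
            points[i++] = INRANGE;
            break;
        }
    }
    return i;
}
```

For the records {GOOD, GOOD, MISSING, GOOD, UNDEFINED, GOOD}, a LINES plot stores five points with the only break at the DF_UNDEFINED record, while a POINTS plot stores all six, with the missing record marked UNDEFINED.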