From: Hans-Bernhard B. <br...@ph...> - 2004-06-13 13:20:39
On Sat, 12 Jun 2004, Ethan A Merritt wrote:

> I think the documentation is wrong, and always was wrong, because
> the behavior it describes has nothing to do with whether the data field
> contains the "missing" flag or not. It simply describes what happens if the
> field is *illegal*, which to my mind is a different thing from being
> "missing".

Intent is hard to be certain about, this far into the project's
lifetime...

> By contrast to version 3.7, the test for missing data in 4.0 actually
> does something useful.

I'm not convinced that this is true. Note that there are a total of
about 12 distinct cases to consider: the data entry in question can be
a number, unparseable garbage, something contained in quotes, or the
marker defined by 'set missing', and the plot can have no using
specification at all, a simple one ("using 1:2" style) or an extended
one ("using ($1):($2)") for the column in question.

In version 3.7, it seems the main effect of 'set missing' was to change
the behaviour in the case of non-numeric input in the face of simple
using specs. Without it, such entries would be treated as undefined
data (--> break in 'with lines' plots), but if they matched a defined
'set missing' string, they would be treated as in the case of no using
specification: ignored (--> no break, because there's no trace left of
that entire data point).

We may have over-done the matter in the 3.8 series, rendering 'set
missing' useless, or at least giving it an entirely different meaning
than it used to have. E.g., the whole idea that quoted entries in a
datafile get special treatment is new.

> As the documentation says, the first plot will incorrectly draw
> a line through (2,3) because the 2nd field is an illegal numerical
> value. With "*" set as a missing data flag, however, the second
> plot is drawn correctly.

To be precise, it's drawn in one of two possible ways that could be
called "correct" here: it has a break in it, because the datapoint was
kept in the internal lists, but flagged as DF_UNDEFINED.

> This didn't used to happen.

For the case of no using spec at all, it didn't. I suspect 'set
missing' simply never had any effect on such plots in 3.7, so arguing
over their behaviour may be pointless. The compatibility we should
worry about is what happens in those cases where 3.7 did behave at
least marginally sensibly, i.e. the 'using 1:2' and 'using ($1):($2)'
ones.

> > > I am open to the suggestion that 'plot with lines' in particular
> > > should be modified so that missing data does not produce
> > > breaks in the line.
> >
> > As long as missing data output DF_UNDEFINED from datafile.c, it must.
> > Otherwise we'ld be changing the behaviour of piecewise functions
> > in a seriously broken manner, e.g.
> >
> >     plot [-10:10] abs(x)>5?abs(x):0/0 w l
> >
> > would suddenly have a connection between points (-5,-5) and (5,5), which
> > we quite definitely don't want.
>
> I don't quite follow you here. There was no such connection before the
> change (in 3.7) and no such connection after the change (in 4.0).

Exactly. But as of your October 2002 change, flagged "missing" input
ends up as a datapoint with DF_UNDEFINED in the internal point lists,
just like the points in the gap of the above function do. So a change
to let 'plot with lines' continue across DF_UNDEFINED points would
change the function plot's behaviour, and we don't want that.

> I have just modified the code in cvs so that df_readline() passes the
> DF_MISSING up to the callers.

I have serious doubts about that being the right idea, but don't have
the time right now to investigate it fully, sorry.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.
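
The distinction the message turns on (a point dropped on input versus a
point stored and flagged as undefined) can be sketched in a few lines.
This is a simplified Python model written for illustration only, not
gnuplot's C code: the names `Status` and `line_segments` are invented
here, and the real DF_UNDEFINED handling may differ in detail.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"            # ordinary, plottable point
    UNDEFINED = "undef"  # stored in the point list, but not plottable


def line_segments(points):
    """Split a point list into the runs a 'with lines' plot would connect.

    `points` is a list of (x, y, status) tuples. An UNDEFINED point ends
    the current run. A record that was never stored cannot end anything.
    """
    runs, current = [], []
    for x, y, status in points:
        if status is Status.OK:
            current.append((x, y))
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Case 1: the bad record was dropped on input, so no trace of it remains.
dropped = [(1, 1, Status.OK), (3, 3, Status.OK)]

# Case 2: the bad record was stored and flagged as undefined.
flagged = [(1, 1, Status.OK), (2, None, Status.UNDEFINED), (3, 3, Status.OK)]

# Case 3: the piecewise function from the thread, sampled at integer x.
piecewise = [
    (x, abs(x), Status.OK) if abs(x) > 5 else (x, None, Status.UNDEFINED)
    for x in range(-10, 11)
]

print(len(line_segments(dropped)))    # one connected run
print(len(line_segments(flagged)))    # two runs, i.e. a break in the line
print(len(line_segments(piecewise)))  # two runs, one on each side of the gap
```

In this model, letting lines continue across undefined points would
merge the two runs in case 3 as well as in case 2, which is the side
effect on function plots that the message objects to.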