gnuplot / Bugs / #2666 'sample' keyword is not tolerated after a `plot for[]` iterator

Ethan Merritt - 2023-10-28

I think there may be a misunderstanding here. It may be that you do understand what the sample keyword means and I am misunderstanding your description of a problem, or it may be that you misunderstand what the keyword means.

1) There is only one x-axis, one x2-axis, etc., per plot, no matter how many lines or boxes or whatever are drawn in that plot command. So the entire plot has only one xrange. You can let it default or you can set it beforehand with set xrange. For historical reasons you can also provide it as the very first thing in a plot or splot command. That historical option was IMHO a big mistake that has caused much grief over the past 20 years or so, as it was preserved in the name of backwards compatibility. Be that as it may, it cannot appear more than once. The same applies to the x2, y, y2, and z axis ranges.

2) However, all components of a plot command are free to generate samples for plotting. The problem is that defining the sample set uses a syntax that looks like a range. So if you put it at the beginning of a plot command, the program cannot tell whether you are providing an axis range or a sample condition. This ambiguity cannot arise anywhere else in the plot command, because an axis range is not possible anywhere else in a plot command.

With those points in mind, I don't think your messiness bullet points correctly express what is going on:

(a). required when the <axis-range-list> is incomplete

Complete or incomplete is not relevant. The sample keyword is required at the beginning of a plot command because otherwise the [beg:end] would be mistaken as an axis range when it is really something else.

(b). tolerated when the <axis-range-list> is complete

No. There is never a case where sample is "tolerated". It is either required or it is incorrect. You may have a point that there are places where it would be harmless to accept it even when not needed, and we can get back to that in a minute....

(c). forbidden when the prior token is a 'for' iterator

No. This is point (1) above. It is never possible to have more than one x-axis range in a plot, so an axis range can never appear inside a for iteration because that would attempt to define it multiple times. Therefore the syntax you show is not possible. However it is possible to define a new sampling rule in each iteration, so in a command like this
plot for [i=1:N] [q=10:20:2] '+' using ($1+i):($1) with points
the section beginning q= is unambiguously a sampling rule, since it cannot possibly be an axis range. Also the presence of a third item inside the square brackets is another giveaway.

If it were entirely up to me and if backwards-compatibility were not a requirement, I would solve all of this confusion by forbidding axis ranges inside a plot command. Then the sample keyword would not be needed because all the square-bracket thingies [something:something:foo] in a plot command would unambiguously refer to sampling anyhow. Alas that's not where we are ;-/

So anyhow, yes I suppose it would be possible to silently accept an unnecessary sample keyword inside a plot iteration. In that case I suppose it would logically also ignore it as an unnecessary keyword when it appears multiple times in a plot command. But does it really reduce user confusion? I imagine users would then be wondering "what is this keyword that seems to be accepted but has no visible effect on the plot?".

Last edit: Ethan Merritt 2023-10-28


Ethan Merritt - 2023-10-28

Re commas:

The reason a comma may or may not appear after a definition is simply that the plot may consist of <definition> <first plot component>, <second plot component>
or
<definition> <empty plot component>, <first plot component>, ...

I.e. it's not that the comma is optional, it's just that the plot component immediately following the definition may be <null>. Am I making sense?


Ethan Merritt - 2023-10-28

So I had another thought....

Perhaps the keyword sample should instead have been something meaning end_of_axis_ranges. That's a terrible keyword, but maybe it makes it more obvious where it must go. If nothing else, maybe the documentation could take this approach to explaining it.

- najevi - 2023-10-29
  
  Oh yes! That is exactly what I've been struggling to articulate these past couple of days.
  end_of_axis_ranges is equivalent to my Strawman 2.
  Read my reply below and I think you'll see that Strawman 1 is even better than Strawman 2.
  
  Last edit: najevi 2023-10-29
  

najevi - 2023-10-29

To be honest I lost the plot (sic!) while writing this up! Preparing my reply to your responses helped bring clarity.

Please consider the difference between what a keyword descriptively MEANS and the effect that a keyword CAUSES. In the case of this keyword sample I think it is fair to say that it is a misnomer.

This particular keyword causes the gnuplot parser to stop waiting for [{<dum>=}<beg>:<end>] lexical constructs/elements that are to be interpreted as axis-range specifiers. This sample keyword is not the only lexical element that causes the parser to stop waiting for axis-range specifiers. The script attached to this bug report demonstrates that both a <definition> and the for keyword of an iterator also cause the gnuplot parser to "stop waiting" for axis-range specifiers.

The current description for the axis-range-list provides for a so-called placeholder in the form of the empty axis-range specifier. viz.

Use an empty range [] as a placeholder if necessary.

So this sample keyword (that undeniably carries semantic baggage or implicit description) is causing an effect (on the parser) that 3, 4 or 5 empty ranges could equally well achieve.

In this type of situation I find it helpful to exhaustively enumerate all possible (regardless of how probable) lexical element sequences that may give rise to the kind of ambiguity that sample was apparently designed to resolve, viz.

Within each of the following four groupings of (s)plot command line fragments, each line is equivalent to the other lines in that one group:

# Table 1
splot sample [dum=beg:end:int] { [dum=beg:end:int] }
splot []sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][]sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][] [dum=beg:end:int] { [dum=beg:end:int] }

plot sample [dum=beg:end:int]
plot []sample [dum=beg:end:int]
plot [][]sample [dum=beg:end:int]
plot [][][]sample [dum=beg:end:int]
plot [][][][] [dum=beg:end:int]

set parametric
splot sample [dum=beg:end:int] { [dum=beg:end:int] }
splot []sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][]sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][]sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][][]sample [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][][][] [dum=beg:end:int] { [dum=beg:end:int] }

plot sample [dum=beg:end:int]
plot []sample [dum=beg:end:int]
plot [][]sample [dum=beg:end:int]
plot [][][]sample [dum=beg:end:int]
plot [][][][]sample [dum=beg:end:int]
plot [][][][][] [dum=beg:end:int]

I did not think it relevant to enumerate those cases where an intervening for or <definition> appears, since those cases give rise to no ambiguity and so the sample keyword is not required in those cases.
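The rules described so far in this thread can be condensed into a small state machine. The sketch below is a hypothetical model, not gnuplot's parser. It takes pre-split tokens and labels each leading bracket group. The number of axis slots per mode is read off Table 1. The behaviour of `sample` follows Strawman 0 and the condition discussed in this thread: it is accepted only while the axis-range list is still open. The names `classify` and `AXIS_SLOTS` are invented for illustration.

```python
# Hypothetical model of how the leading bracket groups of a (s)plot command
# are classified, following Table 1 and the rules stated in this thread.
# Tokens are supplied pre-split; this is NOT gnuplot's actual parser.

AXIS_SLOTS = {
    ("plot", False): 4,   # x, y, x2, y2
    ("splot", False): 3,  # x, y, z
    ("plot", True): 5,    # t, x, y, x2, y2
    ("splot", True): 5,   # u, v, x, y, z
}


def classify(tokens, parametric=False):
    """Return (token, role) pairs for the tokens before the first plot element."""
    slots_left = AXIS_SLOTS[(tokens[0], parametric)]
    expecting_axis = True  # brackets are axis ranges until something ends the list
    labels = []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "sample":
            # 'sample' only ends an axis-range list that is still open
            if not expecting_axis or slots_left == 0:
                raise SyntaxError(f"'sample' not accepted at token {i}")
            expecting_axis = False
        elif tok == "for":
            # the bracket right after 'for' is the iteration specifier
            labels.append((tokens[i + 1], "iteration"))
            i += 1
            expecting_axis = False
        elif tok.startswith("["):
            if expecting_axis and slots_left > 0:
                labels.append((tok, "axis range"))
                slots_left -= 1
            else:
                labels.append((tok, "sampling range"))
        elif "=" in tok:
            labels.append((tok, "definition"))
            expecting_axis = False
        else:
            break  # first plot element (function or data source) reached
        i += 1
    return labels
```

For example, `classify(["plot", "for", "[i=1:3]", "[q=10:20:2]", "'+'"])` labels the second bracket as a sampling range. Inserting `"sample"` after the iterator raises `SyntaxError`, which mirrors the behaviour reported in this ticket.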

As a human parsing Table 1, I cannot avoid arriving at the conclusion that sample is compensating for a "badly formed" (i.e. incomplete) axis-range-list.
It really has nothing to do with announcing a sampling-range-list.
If its "practical cause" or "primary role" were to announce a sampling-range-list, then that same announcement should not throw an error when used in other scenarios (other sequences of lexical elements) that permit a sampling-range-list to follow. The script attached to this bug report identifies those scenarios using ## comments.
(Please don't misunderstand the primary thrust of my proposal. I do not recommend tolerating the sample keyword in places where it is redundant. I use that demonstrable fact to highlight the misnomer that is the sample keyword.)

Also the presence of a third item inside the square brackets is another giveaway.

This sentence of your reply was the most reassuring for me to read.
I understand your remark alludes to the following paragraph from the manual:

The range specifiers for sampling on u and v can include an explicit sampling interval to control the number and spacing of samples:
splot sample [u=30:70:1][v=0:50:5] '++' using 1:2:(func($1,$2))

Noting that x is the independent variable when not in parametric mode, do you agree that the very same can be said of:
"The range specifier (singular) for sampling on either t or x can include an explicit sampling interval to control the number and spacing of samples?"

Incidentally that language: "range specifiers for sampling" very closely matches the terminology that I am advocating: "sampling-range specifier" and "sampling-range-list".

Based on the above common understanding, I think we can agree that the gnuplot parser cannot possibly mistake a [dum=beg:end:int] construct for an axis-range specifier.
If we do agree on the above, then it should be easy to recognize that an equally effective resolution to the ambiguity arising from (what I persistently argue is) an incomplete axis-range-list situation is to ensure that all sampling-range specifiers adhere to a two-colon syntax: [{<dum>=}{<beg>}:{<end>}:{<int>}].

Said differently and, I would suggest, far more succinctly than the existing "messiness" (see Strawman 0, later) :-

# Strawman 1
sampling-range-list:
    [{<dum>=}{<beg>}:{<end>}:{<int>}] { [{<dum>=}{<beg>}:{<end>}:{<int>}] }

I observe the following of the gnuplot parser:
1. While waiting for axis-range specifiers, the gnuplot parser of today throws an error when that second colon is encountered. (Error: ']' expected)
2. While waiting for a sample-range specifier, the gnuplot parser of today throws an error when that second colon is not followed by an expression. (Error: invalid expression)

So if the gnuplot parser can throw the first error, then it ought to be able to use that same event just as easily to "stop waiting" for axis-range specifiers ... n'est-ce pas?

Assuming you buy into that proposition then we still need to consider the situation where the interval, <int>, is not specified (i.e. is blank) for a sampling-range specifier.
Is it any more complex to have the parser recognize a required second colon and treat the absence of an expression before the closing ] bracket in just the same way as it currently treats the absence of a second colon before that closing bracket?
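As a rough illustration of the Strawman 1 rule, the sketch below splits the text inside a sampling-range specifier into its four parts. An empty field comes back as `None`, meaning "use the default". This is a hypothetical helper, not gnuplot code. It assumes the expressions themselves contain no colons, so ternary operators are not handled.

```python
def parse_sampling_range(spec):
    """Split the text inside '[...]' into (dummy, beg, end, interval).

    Strawman 1 requires exactly two colons. Empty fields are returned as
    None ("use the default"). Naive: assumes no ':' inside the expressions.
    """
    dummy = None
    head, sep, rest = spec.partition("=")
    if sep and head.strip().isidentifier():
        # optional leading '<dum>=' renames the dummy variable
        dummy, spec = head.strip(), rest
    fields = [field.strip() or None for field in spec.split(":")]
    if len(fields) != 3:
        raise ValueError(f"expected 2 colons, found {len(fields) - 1}")
    beg, end, interval = fields
    return dummy, beg, end, interval
```

With this rule, `parse_sampling_range("t=0:10:")` yields an empty interval (`None`) rather than an error. That is the "treat a missing expression like a missing colon" behaviour asked about above.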

I respectfully submit that it may be worthwhile considering if mandating such a humble second colon in a relevant lexical element does a less confusing job of resolving ambiguity than the misnamed sample keyword in a non-relevant lexical element.

In this "alternate syntax universe" the formerly ambiguous scenarios described in Table 1 now look as follows:

# Table 2
splot [dum=beg:end:int] { [dum=beg:end:int] }
splot [] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][] [dum=beg:end:int] { [dum=beg:end:int] }

plot [dum=beg:end:int]
plot [] [dum=beg:end:int]
plot [][] [dum=beg:end:int]
plot [][][] [dum=beg:end:int]
plot [][][][] [dum=beg:end:int]

set parametric
splot [dum=beg:end:int] { [dum=beg:end:int] }
splot [] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][][] [dum=beg:end:int] { [dum=beg:end:int] }
splot [][][][][] [dum=beg:end:int] { [dum=beg:end:int] }

plot [dum=beg:end:int]
plot [] [dum=beg:end:int]
plot [][] [dum=beg:end:int]
plot [][][] [dum=beg:end:int]
plot [][][][] [dum=beg:end:int]
plot [][][][][] [dum=beg:end:int]

Isn't that a whole lot less confusing than before?

To reiterate the key point of my proposal:
At the same point when the current gnuplot parser throws an error at the second colon the alternate universe parser recognizes that second colon as the signal to "stop waiting" for more axis-range specifiers.
If at some future release the axis-range-list is (we can only hope!) deprecated then the (albeit one character longer) mandatory components of the sampling-range-list syntax need not be changed.

Next I should probably ask, "Which derivative of Backus-Naur Form does the gnuplot manual aspire to adhere to?"
I struggle to recognize the abbreviated form but I think I have the gist of using the limited syntactic toolbox described in Part I of the manual.

I ask you to consider how a derivative B-N form of syntax definition might best document the current situation with the sample keyword.
I've given this considerable thought and there is no way I can come up with a description that involves the syntax definition for the sampling-range-list.
The difficulty I keep running into is finding some precedent for what I can best describe as a "conditionally required" keyword.
Usually keywords are either mandatory or optional.
Conditionally mandatory keywords are not something I can remember encountering.
In this case the condition is:
(!expected_size_axis_range_list_parsed && !for_keyword_parsed && !definition_parsed)
or the logically equivalent condition:
!(expected_size_axis_range_list_parsed || for_keyword_parsed || definition_parsed)
I've seen some XML-based examples that handle this type of "stateful" situation but nothing that resembles what I see being used in the gnuplot manual.
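The two forms of the condition given above are equivalent by De Morgan's law. The sketch below checks this over all eight combinations of the three flags, which reuse the names from the condition. It is only a truth-table check of the stated logic, not a claim about how gnuplot tracks parser state.

```python
from itertools import product


def sample_needed(expected_size_axis_range_list_parsed, for_keyword_parsed, definition_parsed):
    """First form: none of the three list-ending events has happened."""
    return (not expected_size_axis_range_list_parsed
            and not for_keyword_parsed
            and not definition_parsed)


def sample_needed_negated_or(expected_size_axis_range_list_parsed, for_keyword_parsed, definition_parsed):
    """Second form: negation of 'any list-ending event has happened'."""
    return not (expected_size_axis_range_list_parsed
                or for_keyword_parsed
                or definition_parsed)


# Truth table over all 2**3 parser states
table = {state: sample_needed(*state) for state in product([False, True], repeat=3)}
```

The table shows that the "conditionally required" keyword is needed in exactly one of the eight states: when all three flags are false.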

The best I can come up with using just the syntactic toolbox described in Part I gnuplot - Syntax, involves what is currently called {ranges} and what I prefer to call axis-range-list.
It is truly messy ... I hesitate to even share it here ... however doing so helps me take a step toward proffering Strawman 2.

# Strawman 0 (close to -- but not quite exactly -- the current parser behavior)
<axis-range-list>: (splot not in parametric mode)
    [<xrange>] {sample | {[<yrange>] {sample | {[<zrange>]}}}}
<axis-range-list>: (plot not in parametric mode)
    [<xrange>] {sample | {[<yrange>] {sample | {[<x2range>] {sample | {[<y2range>]}}}}}}
<axis-range-list>: (splot in parametric mode)
    [<urange>] {sample | {[<vrange>] {sample | {[<xrange>] {sample | {[<yrange>] {sample | {[<zrange>]}}}}}}}}
<axis-range-list>: (plot in parametric mode)
    [<trange>] {sample | {[<xrange>] {sample | {[<yrange>] {sample | {[<x2range>] {sample | {[<y2range>]}}}}}}}}
<trange>, <urange>, <vrange>, <xrange>, <yrange>, <zrange>, <x2range>, <y2range> :
    {<dummy-var> =} {{<min>} : {<max>}}

The "but not quite exactly" remark refers to that situation when an intervening for keyword or an intervening <definition> is discovered by the parser.
Please indulge me this minor point since this is really only a stepping stone to Strawman 2. (Eyes on the prize: Strawman 1 is what I am advocating.)

From the above messy syntax I think it is a very small step to the proposition that the axis-range-list needs its own terminating token.

Such a token ought to be similar in form to an axis-range specifier yet different in form from a sampling-range specifier.
It could be a 'smiley-face emoji' if you have a sense of humor, or it could be a mundane PERIOD, aka full stop!
So, for the sake of argument, I'm proffering this alternate, Strawman 2, for comparison.

Let [.] be the terminating token at the end of an axis-range-list. (No, I am not serious but please, go with this idea just for now if only to appreciate the comparison.) Feel free to replace the period . character with your favorite punctuation mark!
With that choice the above Strawman 0 syntax can be greatly simplified to just:

# Strawman 2
<axis-range-list>: (splot not in parametric mode)
    [<xrange>] {[<yrange>] {[<zrange>]}} [.]
<axis-range-list>: (plot not in parametric mode)
    [<xrange>] {[<yrange>] {[<x2range>] {[<y2range>]}}} [.]
<axis-range-list>: (splot in parametric mode)
    [<urange>] {[<vrange>] {[<xrange>] {[<yrange>] {[<zrange>]}}}} [.]
<axis-range-list>: (plot in parametric mode)
    [<trange>] {[<xrange>] {[<yrange>] {[<x2range>] {[<y2range>]}}}} [.]
<trange>, <urange>, <vrange>, <xrange>, <yrange>, <zrange>, <x2range>, <y2range> :
    {<dummy-var> =} {{<min>} : {<max>}}

Then the exhaustive list (Table 1 above) of previously ambiguous scenarios now looks like:

# Table 3
splot [.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][][.] [dum=beg:end:int] {[dum=beg:end:int]}

plot [.] [dum=beg:end:int]
plot [][.] [dum=beg:end:int]
plot [][][.] [dum=beg:end:int]
plot [][][][.] [dum=beg:end:int]
plot [][][][][.] [dum=beg:end:int]

set parametric
splot [.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][][][.] [dum=beg:end:int] {[dum=beg:end:int]}
splot [][][][][][.] [dum=beg:end:int] {[dum=beg:end:int]}

plot [.] [dum=beg:end:int]
plot [][.] [dum=beg:end:int]
plot [][][.] [dum=beg:end:int]
plot [][][][.] [dum=beg:end:int]
plot [][][][][.] [dum=beg:end:int]
plot [][][][][][.] [dum=beg:end:int]

Personally, if given the choice between Table 3 and Table 2, I'd choose Table 2 in a heartbeat!
The "range-ified" [.] period looks like a clumsy appendage just as the sample keyword looks like a messy misnomer!!

So please consider the virtues of Strawman 1 :
1. It is brief/succinct to describe/learn and to script.
2. It shifts burden away from the script writer and toward the parser.
3. It does not ask the parser to keep track of state any more than the parser does today.
4. It reinforces the visible difference between an axis-range specifier and a sampling-range specifier.
5. It does not need to change if/when ever the problematic axis-range-list is deprecated.

What I've done in this reply is offered up a couple of strawmen.
This method of argument can sometimes carry the risk of the proffered strawman overlooking some important premise. As I am new to gnuplot that is a risk I am keenly aware of.
Please feel free to set me straight if I've overlooked an important premise.
I never mind wearing egg on my face ... when it's warranted! ;-)

Last edit: najevi 2023-10-29

Ethan Merritt - 2023-11-01

I wish you had been around to contribute to this discussion ten years ago when the mechanism of giving sampling ranges for '+' '++' or autogeneration of samples inside the plot command was first introduced for version 5.

Anything we do or change now is constrained by the requirement not to have version 6 break existing scripts written for version 5. It's OK to mark a bit of pre-version 6 syntax "deprecated" if it has been replaced by a better alternative, but the old deprecated syntax still has to be accepted. That means we can't just require that all sampling specifiers contain three colon-separated values, even though in retrospect that would have avoided this mess.

I do like the idea of recognizing a 2-colon sampling specifier as such if it is encountered at the start of a plot command, rather than issuing the error message Error: ']' expected. Unfortunately, after a first look, I don't see an easy way to do that without adding a lot of code. The issue is that parsing and storing the first two fields can have side-effects (changing axis ranges and autoscale settings). If a second colon is then encountered, those side-effects would have to be reverted before continuing. A dumb-but-simple look-ahead to see if there is another colon coming up would work most of the time, but would fail in corner cases like
plot [x=(foo?min1:min2) : (foo?max1:max2)] f(x)
I will take a closer look later; maybe the code can be refactored to make backing out not so painful.
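To show why the naive look-ahead fails on the example above, and what a less naive one would have to do, here is a hypothetical sketch. Per the later update in this thread, gnuplot itself took a different route: it tries the parse once and reverts if it fails. This depth-aware scanner is therefore an alternative for illustration only. It counts only the colons that separate range fields. It skips colons nested in brackets, colons inside quoted strings, and colons that close a top-level `?:` ternary. It does not handle backslash escapes in strings.

```python
def naive_colons(spec):
    """Count every ':' between the brackets (the dumb look-ahead)."""
    return spec.count(":")


def top_level_colons(spec):
    """Count only the colons that separate range fields.

    Skips colons nested inside (), [] or {}, colons inside quoted strings,
    and a colon that closes a top-level ternary 'cond ? a : b'.
    Backslash escapes inside strings are not handled.
    """
    depth = 0
    open_ternaries = 0
    quote = None
    count = 0
    for ch in spec:
        if quote:
            if ch == quote:  # closing quote ends the string literal
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0 and ch == "?":
            open_ternaries += 1
        elif depth == 0 and ch == ":":
            if open_ternaries:
                open_ternaries -= 1  # this ':' belongs to a ternary
            else:
                count += 1
    return count


def is_sampling_range(spec):
    """Two separating colons means [min:max:increment]."""
    return top_level_colons(spec) == 2


corner = "x=(foo?min1:min2) : (foo?max1:max2)"
print(naive_colons(corner), top_level_colons(corner))  # 3 1
```

The naive count sees three colons in the corner case and would wrongly treat it as a sampling range. The depth-aware count finds only the one separator.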

And yes, probably an empty expression after the second colon could be treated the same as not having a colon at all (defaults to either 1 or to range/samples depending on the context). I'll need to test this carefully.


najevi - 2023-11-01

I understand the importance of backward compatibility in the context of a ubiquitous phenomenon like x86 assembly code. I am not as clear about the virtue of maintaining backward compatibility for a tool such as gnuplot. However, I do understand that "it is what it is" so ... I'll just move on!

I reckon we might already have a suitable end_of_axis_ranges token!

You wrote that a <definition> (especially one without a following comma) is a legitimate lexical element between the <axis-range-list> and the first <plot-element>. (That fact is now explicitly documented via my recent edits of the syntax definition for (s)plot.) So it seems to me that absolutely any dummy definition will serve as an effective end_of_axis_ranges token. Can you think of any reason why this is not so?

a=b is 3 characters shorter than sample and exactly the same length as the ugly appendage [.] that I proffered for Straw man 2. So in my head I am running with that idea for now. It is not as elegant as Straw man 1, but it comes at zero cost.
Now, this might be one of those lingering handy uses of string macro expansion. Consider:
r="xyzzy=1" ... so @r is now a convenient end_of_axis_ranges token and is just two chars to type! (At first I did try a Greek letter like Ω, but the parser balked at @Ω.)

Have you (or anyone else) taken the time to write out a (quite possibly very lengthy) BNF-like syntax description for the various gnuplot commands?

Ethan Merritt - 2023-11-02

I did try a greek letter like Ω but the parser balked at @Ω

Heh, I consider that a bug, if rather low priority. The @-as-macro code is old and almost certainly predates the utf8-ification of strings and variable names. Care to file it as a separate tracker item?

Have you (or anyone else) taken the time to write out a (quite possibly very lengthy) BNF-like syntax description for the various gnuplot commands?

The manual descriptions are intended to be BNF-like, but they are certainly not very strict about it.
One of the oldest TODO suggestions in the code repository is this one "from way back when":

completely rewrite the parser. Use yacc/bison if possible.

maybe rewrite the scanner in lex? The benefits are not so obvious, because the current scanner is extremely simple. This is probably only worthwhile if the parser is rewritten at the same time.

No one ever took that on as a project. The program has grown tremendously since then, so the project would be much bigger now.
- najevi - 2023-11-02
  
  I am nowhere near ready to "look under the hood" at code for gnuplot but, ... never say never ... maybe someday!
  
  It would be foolhardy for anybody to pick up the lexical scanner task you mentioned without first having a comprehensive gnuplot syntax chart to work from.
  
  An "Appendix" of BNF-style syntax definitions is something I would like to contribute (really for my own understanding but also for the above); however, I won't make in-roads on that until after Valentine's Day. For now the best I can manage is to incorporate as much as I can glean from the various main paragraph text into the most relevant snippets of BNF-like Syntax definitions that are in place for each subsection. This will necessarily grow the number of lines in those "Syntax:" headed subsections, so if that verbosity is undesirable then please advise and I'll start some Appendix-like repository for the longer-winded syntax definitions and leave the more casual "syntax-by-example" sections as they currently are.
  
  It progresses in fits and starts because I cannot feel confident about my edits until I've hands-on tested the various commands and features that I'm writing about seven ways from Sunday! Fortunately the plethora of demo files makes for a great bootstrap in doing just that.
  
  5 Dec is a hard cut-off for my availability, so for the time being I am focused on making as many (hopefully valued) edits as I can to the existing documentation.
  

Ethan Merritt - 2023-11-05

Update:

The latest commit series in 6.1 takes a couple of steps in the direction you are advocating for. When the plot command is parsed, if a range specification is seen to contain three fields [min:max:increment] rather than only two, it is recognized as a sampling range rather than an axis range. This removes the need for the sample keyword in many cases, but it does require that you use the three-field form. To make this easier, an empty third field can be used to indicate that the default sampling increment should be used. This is either 1 or (max-min)/samples depending on context. I have modified the relevant documentation sections to describe the new behaviour.

Thus all three of the plot commands below are now equivalent:

set samples 100
plot sample [t=0:10] f(t)
plot [t=0:10:0.1] f(t)
plot [t=0:10:] f(t)
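The arithmetic behind that equivalence can be checked with a simplified model. The hypothetical `resolve` helper below treats three fields, or two fields after `sample`, as a sampling range. It fills an empty or missing increment with (max-min)/samples, taking that formula from the description above. The alternative default of 1 that applies in some contexts is not modelled. Only numeric literals are supported, and the `=` split is naive.

```python
def resolve(spec, samples, sample_keyword=False):
    """Return (dummy, beg, end, increment) for the text inside '[...]'.

    Simplified model of the 6.1 behaviour described above; numeric
    literals only, and the '(max-min)/samples' default is assumed.
    """
    dummy, _, body = spec.rpartition("=")
    fields = body.split(":")
    if len(fields) == 2 and sample_keyword:
        fields.append("")  # 'sample [beg:end]' behaves like '[beg:end:]'
    if len(fields) != 3:
        raise ValueError("not a sampling range")  # two fields = axis range
    beg, end = float(fields[0]), float(fields[1])
    step = float(fields[2]) if fields[2].strip() else (end - beg) / samples
    return dummy or None, beg, end, step


forms = [
    resolve("t=0:10", 100, sample_keyword=True),  # plot sample [t=0:10] f(t)
    resolve("t=0:10:0.1", 100),                   # plot [t=0:10:0.1] f(t)
    resolve("t=0:10:", 100),                      # plot [t=0:10:] f(t)
]
```

With `set samples 100` and the range 0:10, the default increment is (10-0)/100 = 0.1, so all three forms resolve to the same specification.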

I am not 100% certain there are no unintended side-effects of this change since it involves a try-once-and-revert-if-it-fails step in the parsing code. There may also be some odd cases where it makes a difference whether you do or do not include a variable name at the beginning of the range. And perhaps there is some way to further obviate the need for a separate sample keyword.

For these reasons I think it is not ready for inclusion in the initial 6.0 release. Let it cook in the development branch for a while.

Last edit: Ethan Merritt 2023-12-07
