As Brian pointed out a few years back, tabs are a devilish thing. Here's
a new twist: how do tabs relate to comments?
Here's an example (^ stands for the start of the line, a series of ```
stands for a tab):
^# comment 1
^```# comment 2
^literal:```|```# comment 3
^ # content
^``` # TWISTED 1
^ # comment 4
^flow:```[ foo,```# comment 5
^ # comment 6
^ ```# comment 7
^# ERROR 1
^``` # TWISTED 2
^ bar ]
Under the current spec, a comment needs to have some minimal indentation
(depending on the context). This indentation must consist of only
spaces (0 spaces in case of leading document comments, at least one
space inside the flow collection, etc.). Following the minimal required
indentation, the comment may contain leading discarded white space
(including tab characters).
- Comments 1, 4, and 6 are obviously valid (no tabs, sufficient indentation).
- Comments 3 and 5 are also valid - a comment trailing a line is, by
definition, sufficiently indented, so using tab characters is not a
problem. It would make little sense to forbid the use of tabs in front
of the '#' when it is allowed elsewhere in the line (e.g., after the
- Comments 2 and 7 are also valid - they have sufficient spaces before
the tab character (0 spaces for comment 2 and 1 space for comment 7).
- ERROR 1 is invalid because it is not sufficiently indented.
So far, so good. However...
- TWISTED 1 is a valid comment that terminates the literal scalar,
because it is "not indented".
- TWISTED 2 is an error, because (again) it is "not indented".
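To check the readings above mechanically, here is a minimal Python sketch of the rule as I described it: indentation counts leading spaces only and stops at the first tab, after which any mix of spaces and tabs is discarded before the '#'. The function names are mine, not the spec's. The minimum indents are assumptions taken from the example: 0 at the top level, 1 inside the flow collection, and a content indentation of 1 for the literal scalar. In the strings below, "\t" is a real tab.

```python
def leading_spaces(line: str) -> int:
    """Indentation as counted here: leading spaces only, stopping at the first tab."""
    return len(line) - len(line.lstrip(" "))


def is_comment_line(line: str, min_indent: int) -> bool:
    """True if the line is a comment line with at least min_indent leading spaces."""
    indent = leading_spaces(line)
    rest = line[indent:].lstrip(" \t")  # discarded white space, tabs allowed
    return rest.startswith("#") and indent >= min_indent


def ends_literal_scalar(line: str, content_indent: int) -> bool:
    """True if the line is less indented than the scalar content and is a comment.

    Assumes a top-level literal scalar, so the following comment needs 0 spaces.
    """
    return leading_spaces(line) < content_indent and is_comment_line(line, 0)


# The interesting lines from the example.
print(is_comment_line("\t# comment 2", 0))        # top level
print(is_comment_line(" \t# comment 7", 1))       # inside the flow collection
print(is_comment_line("# ERROR 1", 1))            # inside the flow collection
print(is_comment_line("\t # TWISTED 2", 1))       # inside the flow collection
print(ends_literal_scalar("\t # TWISTED 1", 1))   # after the literal scalar content
```

The point of the sketch is that TWISTED 1 and TWISTED 2 both have zero leading spaces under this counting, which is exactly why one silently ends the scalar and the other is rejected.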
While this is consistent with the rules, I'm uncomfortable with it.
- I see little point in requiring minimal indentation from comments. If
we remove this whole notion, the spec would become simpler, and ERROR 1
and TWISTED 2 would become legal - which I think is more intuitive.
Note that removing the notion of minimal indentation doesn't cause any
ambiguity; in particular, it still allows the first line of a literal
scalar to begin with '#' without having to specify an explicit
indentation indicator.
- The TWISTED 1 case, however, is not that easily solved. I would dearly
love to make it an error, but:
1. Forbidding tabs from being used in comment lines is overkill,
given we must allow tab characters in comments ending a line and that
in 99% of the cases tabs are harmless.
2. Production-wise, it is trivial to forbid the first comment line
following block scalar content from using any tab characters. That
seems rather hackish, though...
3. Saying the indentation is the number of spaces up to the first tab
leads to TWISTED 1 being a valid comment line, which would be rather
surprising to the reader.
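To make option 2 concrete, here is a rough Python sketch of the check it implies. It is only an illustration under my own assumptions: the function name is hypothetical, and the caller is assumed to have already decided that the line is the first comment line following block scalar content. In the strings, "\t" is a real tab.

```python
def violates_option_2(first_comment_line: str) -> bool:
    """True if any tab appears before the '#' on the given comment line.

    Only meant to be applied to the first comment line that follows
    block scalar content; tabs inside the comment text are ignored.
    """
    before_hash = first_comment_line.partition("#")[0]
    return "\t" in before_hash


print(violates_option_2("\t # TWISTED 1"))   # tab before the '#'
print(violates_option_2(" # comment 4"))     # spaces only
```

Under this check TWISTED 1 becomes an error, while every other comment line keeps its freedom to use tabs.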
A dilemma. As Brian said, tabs are the invention of the devil (BTW, the
original Bible was written without any white space :-).
If forced to choose, I'd go with option 2 (the first comment line to
follow a block scalar must not contain leading tabs). But I'm far from
thrilled by it. Does anyone have a better idea?
Debugging the productions is such a fun job ;-)