From: Osamu T. <os...@bi...> - 2009-09-02 18:52:00
|
Dear YAML spec writers,

I am now testing my YAML 1.2 parser, written in C#, against the YAML 1.2 specification dated 2009-07-21. In the process I have collected some questions and comments on the specification. Please allow me to post them in several independent emails.

First, I noticed that Example 7.17 is reproduced neither by my parser nor by ypaste. To reproduce the result, it is required to replace ( ":" followed by an ns-char ) in the definition of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).

ns-plain-char(c) ::= ns-plain-safe(c)
                   | ( /* An ns-char preceding */ “#” )
                   | ( “:” /* Followed by an ns-plain-safe(c) */ )
                                             ~~~~~~~~~~~~~~~~

I feel that the result shown in this example is more desirable than accepting "omitted value:" as the key for the entry. What is the official way to accept the source?

Osamu Takeuchi

PS The following are the mechanical defects I found in the spec, in addition to the ones indicated by Brad on 2009-07-25.

- Example 6.4: A line break in literal text is missing in the result.
- Example 8.21 shows a completely different result.
- Example 9.5: Map node does not seem to exist in the source.
|
From: Oren Ben-K. <or...@be...> - 2009-09-30 05:34:01
|
On Thu, 2009-09-03 at 03:34 +0900, Osamu TAKEUCHI wrote:
> I noticed the example 7.17. is not reproduced by my
> parser nor ypaste. To reproduce the result, it is required to
> replace ( ":" followed by an ns-char ) in the definition
> of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).

I don't see how. The example does work in ypaste. "followed by ns-char" obviously allows _more_ than "followed by ns-plain-safe(c)". This does not matter, since the following character is _not_ consumed by this production, and is left to be consumed by a normal ns-plain-char production later.

> PS
> - Example 8.21. shows a completely different result.

How so?

literal: |2
  value
folded:
   !foo
  >1
 value

Vs.:

%YAML 1.2
---
!!map {
  ? !!str "literal"
  : !!str "value",
  ? !!str "folded"
  : !<!foo> "value",
}

Seems the same to me... If the !<!foo> is throwing you off, see "Verbatim Tags".

Have fun,

Oren Ben-Kiki
|
From: Osamu T. <os...@bi...> - 2009-09-30 06:20:33
|
Oren,

> On Thu, 2009-09-03 at 03:34 +0900, Osamu TAKEUCHI wrote:
>> I noticed the example 7.17. is not reproduced by my
>> parser nor ypaste. To reproduce the result, it is required to
>> replace ( ":" followed by an ns-char ) in the definition
>> of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).
>
> I don't see how. The example does work in ypaste. "followed by ns-char"
> obviously allows _more_ than "followed by ns-plain-safe(c)". This does
> not matter, since the following character is _not_ consumed by this
> production, and is left to be consumed by a normal ns-plain-char
> production later.

ypaste seems to accept "omitted value:" as the key, instead of "omitted value".

>> PS
>> - Example 8.21. shows a completely different result.
>
> How so?

Sorry, I no longer see what I thought was wrong. I think I was careless.

Best,
Osamu TAKEUCHI
|
From: Oren Ben-K. <or...@be...> - 2009-09-30 08:45:28
|
On Wed, 2009-09-30 at 15:20 +0900, Osamu TAKEUCHI wrote:
> > On Thu, 2009-09-03 at 03:34 +0900, Osamu TAKEUCHI wrote:
> >> I noticed the example 7.17. is not reproduced by my
> >> parser nor ypaste. To reproduce the result, it is required to
> >> replace ( ":" followed by an ns-char ) in the definition
> >> of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).
> >
> > I don't see how. The example does work in ypaste. "followed by ns-char"
> > obviously allows _more_ than "followed by ns-plain-safe(c)". This does
> > not matter, since the following character is _not_ consumed by this
> > production, and is left to be consumed by a normal ns-plain-char
> > production later.
>
> ypaste seems to accept "omitted value:" as the key, instead of
> "omitted value".

Yikes! You are right. Two nits though:

First, it should be followed by ns-plain-char and not ns-plain-safe.

Second, in c-ns-flow-map-separate-value, c-mapping-value must not be followed by an ns-plain-char.

Nice catch! I've fixed this immediately in the 2009-09-29 patched version.

Thanks!

Oren Ben-Kiki
|
From: Osamu T. <os...@bi...> - 2009-09-30 13:05:29
|
Oren,

>> On Wed, 2009-09-30 at 15:20 +0900, Osamu TAKEUCHI wrote:
>>>> On Thu, 2009-09-03 at 03:34 +0900, Osamu TAKEUCHI wrote:
>>>>> I noticed the example 7.17. is not reproduced by my
>>>>> parser nor ypaste. To reproduce the result, it is required to
>>>>> replace ( ":" followed by an ns-char ) in the definition
>>>>> of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).
>>>> I don't see how. The example does work in ypaste. "followed by ns-char"
>>>> obviously allows _more_ than "followed by ns-plain-safe(c)". This does
>>>> not matter, since the following character is _not_ consumed by this
>>>> production, and is left to be consumed by a normal ns-plain-char
>>>> production later.
>>> ypaste seems to accept "omitted value:" as the key, instead of
>>> "omitted value".
>> Yikes! You are right. Two nits though:
>> First, it should be followed by ns-plain-char and not ns-plain-safe.
>
> Yes, this makes more sense.

Sorry, I had not thought this issue through. Imagine an input:

  {::::}

With the new rule set, I at first expected the following:

  !!map
  ? !!str ":::"
  : !!null

But actually, this input became invalid under the new rule set. Note that since the last ":" is not an ns-plain-char(c), the one before it is not an ns-plain-char(c) either, nor is the one before that, and so the first ":" is not an ns-plain-first(c).

On the other hand, when we have ( ns-plain-safe(c) | "#" ) at the end,

  {::::a}

this will be successfully parsed as follows:

  !!map
  ? !!str "::::a"
  : !!null

This is confusing. The answer seems to be the following:

  ( ":" /* Followed by ( ns-plain-safe(c) | "#" | ":" ) */ )

Best,
Osamu TAKEUCHI
|
From: Oren Ben-K. <or...@be...> - 2009-09-30 14:29:44
|
On Wed, 2009-09-30 at 22:05 +0900, Osamu TAKEUCHI wrote:
> Imagine an input:
>
> {::::}
>
> With the new rule set, I at first expected the next.
>
> !!map
> ? !!str ":::"
> : !!null
>
> But actually, this input became invalid under the new rule set.

I must say I _like_ this being an error.
The idea is that ':' is treated as a plain character if it is followed by something that makes it _clear_ it is not a key:value separator.
Obviously in this case, there's no such indication.
IMO _however_ you interpret it, some people will be confused.

Given our #1 goal is readability, I think treating this as an error is a _good thing_.

> The answer seems to be the next.
>
> ( ":" /* Followed by ( ns-plain-safe(c) | "#" | ":" ) */ )

There's no need to add '#' since it will be caught by ns-plain-char(c) since it is not preceded by a space.

Nope, I think the rule as it is now actually does the right thing...

Have fun,

Oren Ben-Kiki

P.S. Your post did remind me I forgot to flip the order of goals #2 and #3, I just did that. Let's consider this a loooong September 29th ;-)

Oren.
|
From: Osamu T. <os...@bi...> - 2009-10-01 00:02:36
|
Oren,

> On Wed, 2009-09-30 at 22:05 +0900, Osamu TAKEUCHI wrote:
>> Imagine an input:
>>
>> {::::}
>>
>> With the new rule set, I at first expected the next.
>>
>> !!map
>> ? !!str ":::"
>> : !!null
>>
>> But actually, this input became invalid under the new rule set.
>
> I must say I _like_ this being an error.
> The idea is that ':' is treated as a plain character if it is
> followed by something that makes it _clear_ it is not a key:value
> separator.
> Obviously in this case, there's no such indication.
> IMO _however_ you interpret it, some people will be confused.
>
> Given our #1 goal is readability, I think treating this as an error is a
> _good thing_.

Your definition does not seem to increase readability. Let me show some examples.

Your opinion:

  {a}         => { "a" : null }
  {a:}        => { "a" : null }
  {a: b}      => { "a" : "b" }
  {a::}       => error
  {a:: b}     => error
  {a::a}      => { "a::a" : null }
  {a::a:}     => { "a::a" : null }
  {a::a: b}   => { "a::a" : "b" }
  {a::a::}    => error
  {a::a:: b}  => error

My opinion:

  {a}         => { "a" : null }
  {a:}        => { "a" : null }
  {a: b}      => { "a" : "b" }
  {a::}       => { "a:" : null }
  {a:: b}     => { "a:" : "b" }
  {a::a}      => { "a::a" : null }
  {a::a:}     => { "a::a" : null }
  {a::a: b}   => { "a::a" : "b" }
  {a::a::}    => { "a::a:" : null }
  {a::a:: b}  => { "a::a:" : "b" }

The idea is that ':' is treated as part of the plain text unless it is followed by a delimiter in the context, which includes a space and a line break.

My definition will be more understandable because we do not have to think about what counts as "something that makes it _clear_ it is not a key:value separator."

In addition, we have to look ahead only one character. Your rule requires looking ahead many characters.

  {a:::::::::::b} => { "a:::::::::::b" : null }
  {a::::::::::::} => error

Note that, in this case, we have to look ahead more than ten characters in order to determine whether or not the first ":" is an ns-plain-char.

>> The answer seems to be the next.
>>
>> ( ":" /* Followed by ( ns-plain-safe(c) | "#" | ":" ) */ )
>
> There's no need to add '#' since it will be caught by ns-plain-char(c)
> since it is not preceded by a space.

There is a need to add '#', since it will not be caught by ns-plain-safe(c) even though it is not preceded by a space or a line break. ;)

Best,
Osamu TAKEUCHI
|
From: Oren Ben-K. <or...@be...> - 2009-10-01 02:59:22
|
On Thu, 2009-10-01 at 09:02 +0900, Osamu TAKEUCHI wrote:
> Your definition does not seem to increase readability.
> Let me show some examples...

I'm fine with them. IMO interpreting a::::: as either "a::::": or "a:::::" is hard to understand. Some people will "obviously" see it as one, some will "obviously" see it as the other, both groups with the same level of conviction. That's not readable; a readable construct has only one "obvious" interpretation. And readability is our #1 goal.

> The idea is that ':' is treated as part of the plain text
> unless it is followed by a delimiter in the context, which
> includes a space and a line break.

You make a good case and I can see how this would seem consistent on its own. But - I feel it is inconsistent with the spirit of other YAML indicator rules. In general we tend to interpret indicators as indicators by default, unless there's some good reason not to - as opposed to the other way around. E.g., we decided to disallow the oh-so-attractive 1,000,000.00 format for numbers - we always treat an unquoted ',' as a separator. The winning argument there was again "unsurprising" readability: what does {1,234} mean?

> In addition, we have to look ahead
> only one character. Your rule requires to look ahead many
> characters.
>
> {a:::::::::::b} => { "a:::::::::::b" : null }
> {a::::::::::::} => error

True, but since we have a limited 1K lookahead in such cases anyway, this is not really an issue. Especially since trailing ':' characters are a rare use case (embedded ':' and prefix ':' are much more common - URLs, C++ namespaces, etc.). We never hesitated to sacrifice parser simplicity to gain readability (the plain scalar is hellish to parse, we know).

Finally, there's also the question of giving ourselves a way to back out of bad decisions (which you may feel this one to be :-). I'd much rather play it safe and make trailing ':' characters an error for now. If this turns out to be a mistake, it is easy to change the rules to allow them, while maintaining compatibility with all existing files. If on the other hand we start allowing them now, then decide this was a mistake, it would be harder to switch back - as this would break once-valid files.

Have fun,

Oren Ben-Kiki
|
From: Osamu T. <os...@bi...> - 2009-10-01 15:15:15
|
Oren,

> On Thu, 2009-10-01 at 09:02 +0900, Osamu TAKEUCHI wrote:
>> Your definition does not seem to increase readability.
>> Let me show some examples...
>
> I'm fine with them. IMO interpreting a::::: as either "a::::": or
> "a:::::" is hard to understand. Some people will "obviously" see it as
> one, some will "obviously" see it as the other, both groups with the
> same level of conviction. That's not readable; a readable construct has
> only one "obvious" interpretation. And readability is our #1 goal.
>
>> The idea is that ':' is treated as part of the plain text
>> unless it is followed by a delimiter in the context, which
>> includes a space and a line break.
>
> You make a good case and I can see how this would seem consistent on its
> own. But - I feel it is inconsistent with the spirit of other YAML
> indicator rules. In general we tend to interpret indicators as
> indicators by default, unless there's some good reason not to - as
> opposed to the other way around. E.g., we decided to disallow the
> oh-so-attractive 1,000,000.00 format for numbers - we always treat an
> unquoted ',' as a separator. The winning argument there was again
> "unsurprising" readability: what does {1,234} mean?

Do you really think the following?

  {abc}     is obviously { "abc" : null }
  {abc:}    is obviously { "abc" : null }
  {abc::}   might be read as { "abc::" : null }
  {abc:: a} might be read as { "abc::" <missing> "a" }

I do not see how you got this idea. :(
Could you please explain why the last ":" character in "abc::" can be read as a non-indicator character?

My reason why it must be read as an indicator character is very simple: it is followed by the delimiter "}", just as in the case of "abc:".

> Finally, there's also the question of giving ourself a way to back out
> of bad decisions (which you may feel this one to be :-). I'd much rather
> play it safe and make trailing ':' characters an error for now. If this
> turns out to be a mistake, it is easy to change the rules to allow them,
> while maintaining compatibility with all existing files. If on the other
> hand we start allowing them now, then decide this was a mistake, it
> would be harder to switch back - as this would break once-valid files.

Compatibility across revisions does not support your proposal. Let's study some examples.

+---+----------+--------------------+-------------------+-------------------+
| # | Input    | old rule           | my rule           | your rule         |
+---+----------+--------------------+-------------------+-------------------+
|(1)| {abc: }  | { "abc" : null }   | { "abc" : null }  | { "abc" : null }  |
|(2)| {abc:}   | { "abc:" : null }  | { "abc" : null }  | { "abc" : null }  |
|(3)| {abc:: } | { "abc:" : null }  | { "abc:" : null } | error!            |
|(4)| {abc::}  | { "abc::" : null } | { "abc:" : null } | error!            |
+---+----------+--------------------+-------------------+-------------------+

Note that these four inputs had been valid YAML 1.2 documents before the patch. This behavior was explained, and is still explained, as follows.

>> Normally, YAML insists the “:” mapping value indicator be separated from
>> the value by white space. A benefit of this restriction is that the “:”
>> character can be used inside plain scalars, as long as it is not followed by
>> white space. This allows for unquoted URLs and timestamps. It is also a
>> potential source for confusion as “a:1” is a plain scalar and not a key: value
>> pair.

It clearly says that the ":" character can be used inside plain scalars, as long as it is not followed by white space (or a line break, in reality). This statement is almost the same as my proposal. I wonder whether you think this is inconsistent with the spirit of other YAML indicator rules.

So, the following inputs had also been allowed without any uncertainty.

  - abc:: def
  - ? abc:: def
  - abc:: def
  - [ abc:: def ]

The only problem was the inconsistency between (1) and (2), and between (3) and (4). Since the ":" in (2) and (4) is not followed by white space, it was included in the plain text under the old rule. Such a case only occurs when ":" is followed by an indicator character, namely one of "{", "}", "[", "]" and "," in the flow context. Although this behavior was not really incompatible with the description in the spec, the result was confusing.

Since Example 7.17 suggested that the spec writer (you?) thought {abc:} should have been accepted as { "abc" : null }, contrary to the old BNF syntax, I proposed a new rule:

  The ':' character can be used inside plain scalars, as long as it is not followed by a delimiter in the context, which includes a space, a line break and other indicators depending on the context.

So, it was very surprising to me that you suddenly decided not to accept {abc:: } any more. It had nothing to do with the problem I pointed out.

At the least, you cannot justify your proposal by possible future compatibility, because your proposal is itself too incompatible with the old rule, and even with the description in the current spec. Your patch already broke once-valid files and probably still-valid files. My current proposal merely revives such files.

If you still think your new rule is superior, let's discuss how we should explain its behavior, its benefit and its compatibility, instead of suddenly introducing it in the spec.

Best,
Osamu TAKEUCHI
|
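The comparison table above, and the longer example lists earlier in the thread, can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions, not the spec's grammar: it handles a single-pair flow mapping whose scalars contain no inner spaces, it treats ns-plain-safe(c) in the flow context as excluding ":" and "#" (as this thread does), and it applies the same test to the first character of a scalar as to later ones. The rule names "old", "one_char" and "recursive" are hypothetical labels for "followed by ns-char", "followed by ( ns-plain-safe(c) | "#" | ":" )" and "followed by ns-plain-char(c)" respectively.

```python
FLOW_INDICATORS = frozenset(",[]{}")


def is_ns_char(ch):
    """True for any non-space character (None marks end of input)."""
    return ch is not None and not ch.isspace()


def is_plain_safe(ch):
    """ns-plain-safe(flow) as used in this thread: no ':', '#' or flow indicator."""
    return is_ns_char(ch) and ch not in FLOW_INDICATORS and ch not in ":#"


def is_plain_char(text, i, rule):
    """Decide whether text[i] may be part of a plain scalar under `rule`."""
    ch = text[i] if i < len(text) else None
    if ch == ":":
        nxt = text[i + 1] if i + 1 < len(text) else None
        if rule == "old":        # ':' followed by ns-char
            return is_ns_char(nxt)
        if rule == "one_char":   # ':' followed by ( ns-plain-safe(c) | '#' | ':' )
            return is_plain_safe(nxt) or nxt in ("#", ":")
        if rule == "recursive":  # ':' followed by ns-plain-char(c)
            return is_plain_char(text, i + 1, rule)
        raise ValueError(f"unknown rule: {rule}")
    if ch == "#":                # '#' must be preceded by an ns-char
        return i > 0 and is_ns_char(text[i - 1])
    return is_plain_safe(ch)


def scan_plain(text, i, rule):
    """Consume consecutive plain characters; return (scalar, next index)."""
    start = i
    while is_plain_char(text, i, rule):
        i += 1
    return text[start:i], i


def skip_spaces(text, i):
    while i < len(text) and text[i] == " ":
        i += 1
    return i


def parse_pair(text, rule):
    """Parse '{key}', '{key:}' or '{key: value}'; raise ValueError otherwise."""
    if len(text) < 2 or text[0] != "{" or text[-1] != "}":
        raise ValueError("expected a flow mapping")
    key, i = scan_plain(text, skip_spaces(text, 1), rule)
    i = skip_spaces(text, i)
    value = None
    if text[i] == ":":           # a ':' that ended the scalar acts as the separator
        scalar, i = scan_plain(text, skip_spaces(text, i + 1), rule)
        value = scalar or None
        i = skip_spaces(text, i)
    if i != len(text) - 1:       # anything left before the closing '}' is an error
        raise ValueError(f"unexpected {text[i]!r} at offset {i}")
    return {key: value}


def outcome(text, rule):
    """Return the parsed mapping, or the string 'error' if parsing fails."""
    try:
        return parse_pair(text, rule)
    except ValueError:
        return "error"


for sample in ("{abc: }", "{abc:}", "{abc:: }", "{abc::}"):
    print(sample, [outcome(sample, r) for r in ("old", "one_char", "recursive")])
```

Under these assumptions the loop reproduces the four rows of the table, and the "recursive" rule has to walk the whole run of colons in {a::::::::::::} before it can reject the first one, which is the lookahead cost discussed earlier.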
From: Oren Ben-K. <or...@be...> - 2009-10-01 14:22:06
|
On Thu, 2009-10-01 at 22:15 +0900, Osamu TAKEUCHI wrote:
> Do you really think the following?
>
> {abc} is obviously { "abc" : null }
> {abc:} is obviously { "abc" : null }

Yes.

> {abc::} might be read as { "abc::" : null }
> {abc:: a} might be read as { "abc::" <missing> "a" }

No. In my rule this would be an error. I never suggested this would be what you show above.

> I do not see how you got this idea. :(
> Could you please explain why the last ":" character in "abc::" can
> be read as a non indicator character?

No, I can't, because I think it should be an error.

> So, it was very surprizing for me that you suddenly decided not to
> accept {abc:: } any more.

Yes, you got me there. I guess I'll have to back off from it, to maintain compatibility :-(

> If you still think your new rule is superior,

I do, but not enough to break compatibility at this point :-(

So basically the new rule should be:

  /* Followed by (ns-char - c-flow-indicator) */

Instead of:

  /* Followed by ns-char */ (originally)
  /* Followed by ns-plain-char(c) */ (my rule)

This should be compatible with today's rule and do the (almost :-) right thing. Sigh.

Have fun,

Oren Ben-Kiki
|
From: Oren Ben-K. <or...@be...> - 2009-10-01 15:05:57
|
On Thu, 2009-10-01 at 07:12 -0700, Oren Ben-Kiki wrote:
> So basically the new rule should be:
>
> /* Followed by (ns-char - c-flow-indicator) */
>

I updated the spec accordingly.

Have fun,

Oren Ben-Kiki
|
From: Osamu T. <os...@bi...> - 2009-10-01 15:20:35
|
Oren,

> On Thu, 2009-10-01 at 22:15 +0900, Osamu TAKEUCHI wrote:
>> Do you really think the following?
>>
>> {abc} is obviously { "abc" : null }
>> {abc:} is obviously { "abc" : null }
>
> Yes.
>
>> {abc::} might be read as { "abc::" : null }
>> {abc:: a} might be read as { "abc::" <missing> "a" }
>
> No. In my rule this would be an error. I never suggested this would be
> what you show above.
>
>> I do not see how you got this idea. :(
>> Could you please explain why the last ":" character in "abc::" can
>> be read as a non indicator character?
>
> No, I can't, because I think it should be an error.

Yes, you did, to justify the error.

>> I'm fine with them. IMO interpreting a::::: as either "a::::": or
>> "a:::::" is hard to understand. Some people will "obviously" see it as
>> one, some will "obviously" see it as the other, both groups with the
>> same level of conviction. That's not readable; a readable construct has
>> only one "obvious" interpretation. And readability is our #1 goal.

I understood you to mean that {a:::::} could be interpreted as either { "a::::" : null } or { "a:::::" : null }, and that both interpretations come with the same level of conviction. If you do not think so, I do not see why you want it to be an error.

> So basically the new rule should be:
>
> /* Followed by (ns-char - c-flow-indicator) */
>
> Instead of:
>
> /* Followed by ns-char */ (originally)
> /* Followed by ns-plain-char(c) */ (my rule)

No, that is not good enough if you really want compatibility.

Note that, outside a flow context, it has been allowed to include indicators in a plain scalar.

  - a[b}c:{:,def:]

It has to be:

  /* Followed by ( ns-plain-safe(c) | "#" | ":" ) */ (my rule)

Best,
Osamu TAKEUCHI

P.S. It was not me but BlueG who pointed out the order of goals. Please revise the errata.

> P.S. Your post did remind me I forgot to flip the order of goals #2 and
> #3, I just did that. Lets consider this a loooong September 29th ;-)

>> 1.1. Goals
>> Switched the order between goals 2 and 3 (pointed out by Osamu
>> Takeuchi).
|
From: Oren Ben-K. <or...@be...> - 2009-10-01 17:00:27
|
On Fri, 2009-10-02 at 00:20 +0900, Osamu TAKEUCHI wrote:
> > So basically the new rule should be:
> >
> > /* Followed by (ns-char - c-flow-indicator) */
> >
> > Instead of:
> >
> > /* Followed by ns-char */ (originally)
> > /* Followed by ns-plain-char(c) */ (my rule)
>
> No, it is not good enough, if you really want to have the
> compatibility.
>
> Note that, out of a flow context, it has been allowed to
> include indicators in a plain scalar.

Damn. You're right... wishful thinking on my part - I was for not allowing that in the 1st place. I _knew_ it would get us into trouble :-)

> It has to be:
>
> /* Followed by ( ns-plain-safe(c) | "#" | ":" ) */ (my rule)

That's getting too complex. Better to define ns-plain-follow and reuse that (in three different places). Sigh.

> Best,
> Osamu TAKEUCHI
>
>
> P.S. It was not me but BlueG who pointed out the order of goals.
> Please revise the errata.

Oh - right. Sorry, BlueGM! I'll fix that as well.

Thanks,

Oren Ben-Kiki.
|
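To make the block-context objection concrete, here is a small Python sketch of the lookahead test alone. It is an illustration under stated assumptions rather than the spec's productions: the rule names "original", "minus_flow_indicators" and "context_dependent" are hypothetical labels for "followed by ns-char", "followed by (ns-char - c-flow-indicator)" and "followed by ( ns-plain-safe(c) | "#" | ":" )", and the last rule is modelled by assuming that ns-plain-safe(c) excludes flow indicators only in the flow context.

```python
FLOW_INDICATORS = frozenset(",[]{}")


def colon_stays_in_scalar(next_char, context, rule):
    """Return True if a ':' followed by next_char remains part of a plain scalar."""
    is_ns = next_char is not None and not next_char.isspace()
    if rule == "original":               # followed by ns-char
        return is_ns
    if rule == "minus_flow_indicators":  # followed by (ns-char - c-flow-indicator)
        return is_ns and next_char not in FLOW_INDICATORS
    if rule == "context_dependent":      # followed by ( ns-plain-safe(c) | '#' | ':' )
        if context == "flow":
            return is_ns and next_char not in FLOW_INDICATORS
        return is_ns                     # block context: flow indicators are ordinary
    raise ValueError(f"unknown rule: {rule}")


def colon_decisions(scalar, context, rule):
    """Apply the test to every ':' in `scalar`, in order of appearance."""
    decisions = []
    for i, ch in enumerate(scalar):
        if ch == ":":
            nxt = scalar[i + 1] if i + 1 < len(scalar) else None
            decisions.append(colon_stays_in_scalar(nxt, context, rule))
    return decisions


# The block-context scalar from earlier in the thread (sequence entry "- " stripped).
scalar = "a[b}c:{:,def:]"
for rule in ("original", "minus_flow_indicators", "context_dependent"):
    print(rule, colon_decisions(scalar, "block", rule))
```

Each of the three colons in that scalar is followed by a flow indicator, so the "minus_flow_indicators" rule rejects all of them even in a block context, while the context-dependent rule keeps the behavior of the original one there.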
From: Oren Ben-K. <or...@be...> - 2009-10-01 18:09:05
|
Ok, I uploaded all the fixes. The latest version is now 2009-10-01. Oren. |
From: Osamu T. <os...@bi...> - 2009-09-30 09:15:51
|
Oren,

> On Wed, 2009-09-30 at 15:20 +0900, Osamu TAKEUCHI wrote:
>>> On Thu, 2009-09-03 at 03:34 +0900, Osamu TAKEUCHI wrote:
>>>> I noticed the example 7.17. is not reproduced by my
>>>> parser nor ypaste. To reproduce the result, it is required to
>>>> replace ( ":" followed by an ns-char ) in the definition
>>>> of ns-plain-char(c) by ( ":" followed by an ns-plain-safe(c) ).
>>> I don't see how. The example does work in ypaste. "followed by ns-char"
>>> obviously allows _more_ than "followed by ns-plain-safe(c)". This does
>>> not matter, since the following character is _not_ consumed by this
>>> production, and is left to be consumed by a normal ns-plain-char
>>> production later.
>> ypaste seems to accept "omitted value:" as the key, instead of
>> "omitted value".
>
> Yikes! You are right. Two nits though:
> First, it should be followed by ns-plain-char and not ns-plain-safe.

Yes, this makes more sense.

> Second, in c-ns-flow-map-separate-value, c-mapping-value must not be
> followed by an ns-plain-char.

Hmm, I do not see why this is needed. Could you please give an example that produces an unwanted result with the old rule?

Thanks,
Osamu TAKEUCHI
|