From: William S. <sp...@rh...> - 2009-11-20 21:49:25
|
libyaml treats a file containing nothing but zero or more newlines as being completely empty and parses nothing out of it. However this appears to be inconsistent:

The text "---\nx\n..." and "x" produce the same result: a start, "x" scalar, and an end token.

But the text "---\n\n..." and "" produce different results. The first produces start, a "" scalar, and an end. The second produces nothing.

In addition writing a single "" scalar using libyaml without the document start/end produces a file containing only a newline and thus is not inverted by the parser.

This may seem trivial but I am using yaml internally to store serialized values of small pieces of data, and blank strings were breaking because of this. I had to modify libyaml to quote zero-length strings but I do not think that is the desired solution.

My recommendation is that a completely blank yaml file be parsed as a single "" scalar. This may seem dangerous, because this file will "vanish" if concatenated with other data, but concatenation is not supposed to work anyway if the surrounding "---" and "..." are missing.

Any opinions or have I missed anything?

Thanks
Bill Spitzak
Rhythm & Hues software department.
|
From: Ingy d. N. <in...@in...> - 2009-11-21 03:21:10
|
On Fri, Nov 20, 2009 at 1:11 PM, William Spitzak <sp...@rh...> wrote:

> libyaml treats a file containing nothing but zero or more newlines as being completely empty and parses nothing out of it. However this appears to be inconsistent:
>
> The text "---\nx\n..." and "x" produce the same result: a start, "x" scalar, and an end token.
>
> But the text "---\n\n..." and "" produce different results. The first produces start, a "" scalar, and an end. The second produces nothing.
>
> In addition writing a single "" scalar using libyaml without the document start/end produces a file containing only a newline and thus is not inverted by the parser.
>
> This may seem trivial but I am using yaml internally to store serialized values of small pieces of data, and blank strings were breaking because of this. I had to modify libyaml to quote zero-length strings but I do not think that is the desired solution.
>
> My recommendation is that a completely blank yaml file be parsed as a single "" scalar. This may seem dangerous, because this file will "vanish" if concatenated with other data, but concatenation is not supposed to work anyway if the surrounding "---" and "..." are missing.
>
> Any opinions or have I missed anything?

It has always been the intent of YAML that a Stream may contain zero or more Documents. This implies that there needs to be a way to serialize zero objects.

BTW, a stream containing just comments is another example of something that will parse as having zero documents.

It may not be written like this in the spec, but one way to think of it is as follows:

1. Every YAML Document in a YAML Stream begins with '---'.
2. If the first thing a parser sees in a stream, after skipping all ignorable whitespace (including comments), is neither '---' nor a YAML directive, the parser should assume it saw "---\n".
3. If, after skipping all ignorable whitespace, the parser reaches the End Of Stream, it should report a STREAM_END event, without reporting any documents.

Being able to serialize zero documents is important. For a real use case, imagine a service listening for YAML documents over a web socket. If the socket is closed after receiving no data, the service should *not* parse that as one document containing an empty string.

After testing all these cases against libyaml, IMHO it gets things right every time.

Cheers, Ingy
|
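As an illustration of the three rules above, here is a minimal Python sketch. It is a toy model, not libyaml's algorithm and not a YAML parser: the names `is_document_marker` and `count_documents` are made up for this example, and it only looks at whole lines, so it ignores block scalars, the "..." terminator, and every other piece of real YAML syntax.

```python
def is_document_marker(line: str) -> bool:
    """True for a line that is '---' alone or '---' followed by content."""
    return line == "---" or line.startswith("--- ")


def count_documents(stream: str) -> int:
    """Toy model of the three rules above; not a real YAML parser."""
    # Skip ignorable lines: blank lines and comment-only lines.
    significant = [
        line.rstrip()
        for line in stream.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not significant:
        # Rule 3: end of stream reached with nothing to report.
        return 0
    first = significant[0]
    # Rule 2: content that opens with neither '---' nor a directive
    # is treated as if an implied '---' preceded it.
    implied = 0 if (is_document_marker(first) or first.startswith("%")) else 1
    # Rule 1: every explicit '---' begins one document.
    explicit = sum(1 for line in significant if is_document_marker(line))
    return implied + explicit
```

Under this model an empty stream, a stream of newlines, and a comment-only stream all hold zero documents, while "x", "---\nx\n..." and "---\n\n..." each hold one, which matches the behaviour reported for libyaml in this thread.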
From: William S. <sp...@rh...> - 2009-11-21 03:01:54
|
Ok, sounds reasonable.

In that case the bug is that libyaml, when told to write a single "" scalar, does not write the correct thing; it instead writes text that looks the same as an empty document.

It is true that asking for document-start to be written fixes it, but there does not seem to be a requirement for that, and in fact I would prefer not to, as the documents I am writing are so short that the extra 8 bytes are a significant fraction.

> It has always been the intent of YAML, that a Stream may contain zero or more Documents. This implies that there needs to be a way to serialize zero objects.
|
From: Osamu T. <os...@bi...> - 2009-11-21 04:51:57
|
Bill and Ingy,

Thanks Ingy. I finally understood the importance of providing a way to serialize zero objects by reading your post. Before that, I was thinking the same as Bill.

Then, according to the spec, a YAML serializer can encode an empty string (in a YAML 1.1 document) or a null object (in a YAML 1.2 document) as an empty node in all cases except the one where the whole YAML stream becomes empty. I think an explicit warning should be given in the spec to clarify this point, with some explanation of the importance of providing a way to serialize zero objects. It would be very helpful.

In addition, I point out that an empty string should never be encoded as an empty node in a YAML 1.1 document, because that will not be compatible with YAML 1.2. In the YAML 1.2 spec, an empty node represents a null object. So, in any case, all YAML 1.1 serializers should be patched to encode all empty strings in the quoted styles "" or ''.

BTW, I do not think a YAML 1.1 library will accept "x" as a valid YAML stream if the library strictly implements the specification. In YAML 1.1, a plain scalar node must be followed by s-l-comments, which contains at least one line break. So "x\n" is a valid document but "x" is not.

This is the same problem pointed out for the YAML 1.2 spec, first by Joshua Choi,

http://sourceforge.net/mailarchive/forum.php?thread_name=200e06280904252219i1da33755h200bdac14058ba36%40mail.gmail.com&forum_name=yaml-core

and then by me.

http://sourceforge.net/mailarchive/forum.php?thread_name=1254356424.5043.224.camel%40nero&forum_name=yaml-core

Probably most real implementations ignore the spec here, to be compatible with hand-written documents that may lack a line break at the end. But, at least, a YAML serializer should not output a YAML document without a line break at the end.

Best,
Osamu Takeuchi
|
From: Ingy d. N. <in...@in...> - 2009-11-21 06:38:58
|
On Fri, Nov 20, 2009 at 8:22 PM, Osamu TAKEUCHI <os...@bi...> wrote:

> In addition, I point out that any empty string should not be encoded into empty nodes in YAML 1.1 document in any sense because it will not be compatible with YAML 1.2. In YAML 1.2 spec, an empty node represents a null object.

To put a finer point on it, an empty node parses as a plain (unquoted) empty-string scalar event. YAML loaders are encouraged (but not required) to construct a null object from this event.

To be honest, I was not aware that this changed between 1.1 and 1.2. Can you point me to the appropriate part of the spec that says this?

--Ingy
|
From: Osamu T. <os...@bi...> - 2009-11-21 09:36:08
|
Ingy,

>> In addition, I point out that any empty string should not be encoded into empty nodes in YAML 1.1 document in any sense because it will not be compatible with YAML 1.2. In YAML 1.2 spec, an empty node represents a null object.
>
> To put a finer point on it, an empty node parses as a plain, (unquoted) empty string, scalar event. YAML loaders are encouraged (but not required) to construct a null object from this event.
>
> To be honest, I was not aware that this changed between 1.1 and 1.2. Can you point me to the appropriate part of the spec that says this?

According to the spec, this change was made for JSON compatibility. A YAML 1.2 processor seems to be required to construct a null object from an empty node.

On this point, YAML 1.2 is completely incompatible with YAML 1.1. Please compare example 7.3 in the YAML 1.2 spec with example 8.13 in the YAML 1.1 spec.

*** YAML 1.2 spec

Status of this Document
http://www.yaml.org/spec/1.2/spec.html
>>> this is a minor revision.
>>> ...
>>>
>>> We have removed unique implicit typing rules and have updated
>>> these rules to align them with JSON's productions.

7.2. Empty Nodes
http://www.yaml.org/spec/1.2/spec.html#id2786563
>>> Nodes with empty content are interpreted as if they were plain
>>> scalars with an empty value. Such nodes are commonly resolved
>>> to a "null" value.

Example 7.3. Completely Empty Flow Nodes
>>> {
>>> ? foo :,
>>> : bar,
>>> }
>>> %YAML 1.2
>>> ---
>>> !!map {
>>> ? !!str "foo" : !!null "",
>>> ? !!null "" : !!str "bar",
>>> }

10.2.1.1. Null
http://www.yaml.org/spec/1.2/spec.html#id2803362
>>> Note that a null is different from an empty string.

*** YAML 1.1 spec

8.5.1. Flow Nodes
http://yaml.org/spec/1.1/#id902924
>>> A node with empty content is considered to be an empty plain scalar.

Example 8.13. Completely Empty Flow Nodes
>>> {
>>> ? foo :,
>>> ? : bar,
>>> }
>>> %YAML 1.1
>>> ---
>>> !!map {
>>> ? !!str "foo"
>>> : !!str "",
>>> ? !!str "",
>>> : !!str "bar",
>>> }

Both specs say that an empty node is considered to be an empty plain scalar. This seems to be the justification for the spec describing YAML 1.2 as almost always compatible with YAML 1.1. However, an empty plain scalar is considered an empty string in YAML 1.1 but a null object in YAML 1.2.

Actually, the type repository for _YAML 1.1_ also specifies that an empty plain scalar value is to be resolved as the !!null type. This seems incompatible with the YAML 1.1 spec. I imagine this is due to some historical reason.

*** Null Language-Independent Type for YAML Version 1.1
http://yaml.org/type/null.html
>>> Shorthand:
>>>
>>> !!null
>>>
>>> Regexp:
>>>
>>> ~ # (canonical)
>>> |null|Null|NULL # (English)
>>> | # (Empty)

This point was stated more clearly on this mailing list.

http://sourceforge.net/mailarchive/forum.php?thread_name=1248226963.15646.1326179237%40webmail.messagingengine.com&forum_name=yaml-core

I think it's better to point out this incompatibility more clearly in the YAML 1.2 spec as well.

Best,
Osamu Takeuchi
|
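The two readings compared above can be sketched in a few lines of Python. The pattern is transcribed from the YAML 1.1 null type page quoted in this message; the function name `resolve_plain_scalar` and its `empty_is_null` switch are hypothetical and exist only to contrast the readings, they are not part of any YAML library.

```python
import re

# Transcribed from the quoted "Null Language-Independent Type" regexp:
# '~', 'null', 'Null', 'NULL', or the empty string.
NULL_PATTERN = re.compile(r"~|null|Null|NULL|")


def resolve_plain_scalar(text: str, empty_is_null: bool = True):
    """Return None for null-like plain scalars, else the text unchanged.

    empty_is_null=True follows the type repository and YAML 1.2 example 7.3;
    empty_is_null=False follows YAML 1.1 example 8.13, where an empty node
    is shown as !!str "".
    """
    if text == "" and not empty_is_null:
        return ""
    if NULL_PATTERN.fullmatch(text):
        return None
    return text
```

The same input, an empty plain scalar, therefore loads as either `None` or `""` depending on which reading a processor adopts, which is the incompatibility this message is concerned with.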
From: Osamu T. <os...@bi...> - 2009-11-26 03:05:02
|
Hi Kirill,

>> BTW, I do not think a YAML 1.1 library will accept "x" as a valid YAML stream if the library strictly implements the specification. In YAML 1.1, a plain scalar node must be followed by s-l-comments that contains at least one line break. So, "x\n" is a valid document but "x" is not.
>
> I don't think any pre-YAML 1.2 processors use the grammar described in the spec as the basis for the parser.

I read that Oren modified the spec in YAML 1.2 to accept an input without a line break at the end of the stream merely for JSON compatibility. I thought that the YAML 1.1 spec rejected such an input not accidentally but intentionally.

http://sourceforge.net/mailarchive/forum.php?thread_name=200e06280904252219i1da33755h200bdac14058ba36%40mail.gmail.com&forum_name=yaml-core

Anyway, as I wrote, I don't think so either, except for the reference parser. ;p

>> According to the spec, this change was done for JSON compatibility. A YAML 1.2 processor seems to be required to construct a null object from an empty node.
>>
>> In this point, YAML 1.2 is completely incompatible from YAML 1.1. Please compare example 7.3 in YAML 1.2 spec with example 8.13 in YAML 1.1 spec.
>
> I don't think it's the correct interpretation of the spec. I'd say that YAML 1.2 provide a recommended scheme for tag resolution, while YAML 1.1 doesn't, leaving the decision of choosing the default scheme to the processor authors. I believe all existing YAML producers interpret an empty plain scalar as a null value unless instructed otherwise.

Hmm, I didn't know that.

I don't understand how the YAML 1.1 spec can be interpreted as you wrote, given examples 8.13 and 8.15, but if I merely misinterpreted the spec, I'm happy to hear that. I was afraid of a possible incompatibility between YAML 1.1 and 1.2.

Best,
Osamu Takeuchi
|
From: William S. <sp...@rh...> - 2009-12-01 21:47:18
|
Note that my original comments are still true. The parser does not care whether there is a newline at the end, and "fails" even if the newline is there:

libyaml, if told to write only a scalar of "", will produce a file consisting of a single newline. If told to read this, it will return nothing. Removing the newline (or adding any number of extra newlines) does not make a difference.

The recommended fix is to add logic so that if the very last thing in the file is a "" scalar, libyaml somehow modifies the output, such as by forcing the --- trailing text.

I absolutely agree with the trailing newline being optional!

Osamu TAKEUCHI wrote:

> I read that Oren modified the spec in YAML 1.2 to accept an input without a line break at the end of the stream merely for JSON compatibility. I thought that YAML 1.1 spec refused such an input not accidentally but intentionally.
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=200e06280904252219i1da33755h200bdac14058ba36%40mail.gmail.com&forum_name=yaml-core
|
From: Kirill S. <xi...@ga...> - 2009-11-21 17:34:07
|
Hi Osamu,

> Then, according to the spec, a YAML serializer can encode an empty string (in YAML 1.1 document) or a null object (in YAML 1.2 document) as an empty node in all the cases except for the one where the whole YAML stream becomes empty. I think an explicit warning should be given in the spec to clarify this point with some explanation of the importance of providing a way to serialize zero objects. It will be very much helpful.

I believe all existing yaml processors interpret an empty plain scalar as the null value by default.

> In addition, I point out that any empty string should not be encoded into empty nodes in YAML 1.1 document in any sense because it will not be compatible with YAML 1.2. In YAML 1.2 spec, an empty node represents a null object. So, anyways, all the YAML 1.1 serializers should be patched to encode all the empty strings in quoted styles "" or ''.
>
> BTW, I do not think a YAML 1.1 library will accept "x" as a valid YAML stream if the library strictly implements the specification. In YAML 1.1, a plain scalar node must be followed by s-l-comments that contains at least one line break. So, "x\n" is a valid document but "x" is not.

I don't think any pre-YAML 1.2 processors use the grammar described in the spec as the basis for the parser.

Thanks,
Kirill
|
From: Ingy d. N. <in...@in...> - 2009-11-21 06:20:23
|
On Fri, Nov 20, 2009 at 7:01 PM, William Spitzak <sp...@rh...> wrote:

> Ok, sounds reasonable.
>
> In that case the bug is that libyaml output when told to write a single "" scalar, does not write the correct thing, instead writing text that looks the same as an empty document.

That sounds like a bug. A YAML processor should not output invalid YAML (or in this case, valid YAML that means the wrong thing).

> It is true that if you ask for document-start to be written it fixes it but there does not seem to be a requirement for that, and in fact I would prefer not to as the documents I am writing are so short that the extra 8 bytes are a significant fraction.

By 8, you mean 4 ("---\n") + 4 ("...\n")?

If you have more than one document in a stream, then you need "---\n" anyway for all but the first document. You really don't need "...\n" at all, unless timing is an issue.

If you are writing single-document streams to disk, then you are certainly consuming more space than that for the inode, file name, etc. I don't know what your exact use case is, though.

The shortest YAML serialization for an empty string is 2 single quotes, and libyaml should support that. (Actually it probably adds a newline as well, which is a sane thing for a general-purpose library.)

--Ingy
|
From: Kirill S. <xi...@ga...> - 2009-11-21 17:41:33
|
Hi Osamu,

> According to the spec, this change was done for JSON compatibility. A YAML 1.2 processor seems to be required to construct a null object from an empty node.
>
> In this point, YAML 1.2 is completely incompatible from YAML 1.1. Please compare example 7.3 in YAML 1.2 spec with example 8.13 in YAML 1.1 spec.

I don't think it's the correct interpretation of the spec. I'd say that YAML 1.2 provides a recommended scheme for tag resolution, while YAML 1.1 doesn't, leaving the decision of choosing the default scheme to the processor authors. I believe all existing YAML producers interpret an empty plain scalar as a null value unless instructed otherwise.

Thanks,
Kirill
|
From: Kirill S. <xi...@ga...> - 2009-11-21 17:34:07
|
William Spitzak wrote:

> libyaml treats a file containing nothing but zero or more newlines as being completely empty and parses nothing out of it. However this appears to be inconsistent:
>
> The text "---\nx\n..." and "x" produce the same result: a start, "x" scalar, and an end token.
>
> But the text "---\n\n..." and "" produce different results. The first produces start, a "" scalar, and an end. The second produces nothing.
>
> In addition writing a single "" scalar using libyaml without the document start/end produces a file containing only a newline and thus is not inverted by the parser.
>
> This may seem trivial but I am using yaml internally to store serialized values of small pieces of data, and blank strings were breaking because of this. I had to modify libyaml to quote zero-length strings but I do not think that is the desired solution.

You could ask libyaml to always produce the '---' indicator by setting the flag 'implicit' to 0 when calling 'yaml_document_start_event_initialize()' (a value of 1 marks the indicator as implicit, so it may be omitted).

You could also ask it to produce a quoted string by setting the parameter 'style' to 'YAML_SINGLE_QUOTED_SCALAR_STYLE' or 'YAML_DOUBLE_QUOTED_SCALAR_STYLE' when calling 'yaml_scalar_event_initialize()'.

Thanks,
Kirill
|
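The quoting workaround discussed here can be pictured with a small Python sketch. It does not call libyaml; `emit_string_scalar` is a hypothetical helper that models only the decision "quote the scalar when it is empty" for a one-scalar stream written without '---', and it assumes every non-empty value is safe to write as a plain scalar, which is not true of real YAML.

```python
def emit_string_scalar(value: str) -> str:
    """Hypothetical emitter for a single-scalar stream without '---'."""
    if value == "":
        # A plain empty scalar would leave only a newline on disk, which
        # reads back as zero documents, so fall back to single quotes.
        return "''\n"
    # Toy assumption: any other value can be written as a plain scalar.
    return value + "\n"
```

With this rule the empty string costs two extra bytes rather than the four of a leading "---\n", and the output no longer looks like an empty stream.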
From: Ingy d. N. <in...@in...> - 2009-11-22 21:45:18
|
On Sat, Nov 21, 2009 at 10:30 AM, Bill Spitzak <sp...@gm...> wrote:

> Okay I agree that a blank should be a "null" and not a zero-length scalar.
>
> It does appear that my solution for scalars (which is to force the quoting on for a zero-length one) matches the proposed solution.
>
> However it still means that a blank file is different than one that has a "---" in it. A blank file is nothing, but a single "---" is a single null. So I think the problem still exists: libyaml on output, if told to write nothing other than a null, will have to write it somehow other than a blank file.

I agree with Bill. No matter how the libyaml options are set, libyaml (or any other implementation for that matter) should not produce an incorrect YAML representation. So it's a bug in libyaml.

Kirill, I'll try to patch libyaml and pyyaml.

Cheers, Ingy
|