From: Oren Ben-K. <or...@be...> - 2008-04-06 11:22:28
Is done and has been uploaded! It has cost me the spring vacation but it has been three years of "not getting around to doing it", so...

Clark will need to install it in the http://yaml.org/spec/cvs directory; currently it is available in http://ben-kiki.org/oren/YamlReference/spec/current{.html,.ps,.pdf}.

It is accompanied by version 0.9 of the YamlReference implementation, available at http://ben-kiki.org/oren/YamlReference and of course the http://hackage.haskell.org/cgi-bin/hackage-scripts/package/YamlReference package database entry. It is also running the current http://dev.yaml.org/ypaste version so feel free to play with it.

I have tried to incorporate all the spec fixes people have sent to the list in the last three years. Kirill's site was also very helpful (thanks Kirill!). I have lost track at some point of all the people and corrections (there were a lot of e-mails, on and off the list). My apologies to everyone for taking so long to do this, and my thanks for your effort in reviewing the old spec!

The new spec uses the improved YamlReference productions, which are somewhat simpler than the old set. They also have the advantage of being testable using YamlReference and ypaste, so I am much more confident they are "correct". I have also revamped the structure (and to some degree the wording) of the syntax chapters. I believe the result is (at least somewhat) less opaque than the previous spec. That said, I found Tim Hochberg's observation to be very apt:

> YAML has always seemed precariously balanced between user friendliness
> and chaos.

I think this *perfectly* expresses YAML and the inherent complexity of the spec. We really should work this sentence into the spec somewhere :-)

So, whether you are a YAML library developer, casual user, or a would-be language lawyer - please give this a look, try stuff out in ypaste, and post your comments to this list!

My intention is to do a second pass of corrections by the end of the month and, assuming nothing major comes up, finalize the result as "the" YAML 1.1 spec. We could then focus our efforts on bringing the implementations to conformance, the type repository (which also needs a lot of work), and so on. This is all subject to Clark's OK of course.

There are several categories of comments people can make, and it would be helpful if they would clarify which one they have in mind:

Omission: The spec does not cover some edge case. I doubt we'll see any of those. Then again these are the hardest issues to find... so most cookies for these ones :-)

Language: There is a better way to express some point (or I made a typo). English isn't my native language and I am not a professional writer so I have no illusions about the quality of my writing :-) If you have a better idea on how to convey something, I'll be more than happy to incorporate it.

Bug: The spec contradicts itself. I hope I have eliminated all(most) of these. Note the productions are technically ambiguous. The syntax "works" by relying on productions being "greedy". This is still a valid formal definition... just barely so :-)

Error: The spec describes behavior which is different than the consensus we have reached on this list. The best way to demonstrate this is using ypaste - executable formal specifications are fun!

Feature: The spec describes behavior which is different than the consensus we "should" have reached on this list :-) That is, a change request. In general these will be rejected unless they have a very good use case they resolve, have low implementation cost, do not harm existing documents, have negligible downside, and are in general considered to be a good idea by almost everyone. We had 3 years to collect such ideas after all...

Here is a list of the "Features" that were added to this spec (compared to the 3 year old version):

- JSON compatibility. {"a":12} is a mapping and http://example.com is a plain scalar. This is the main new feature in this spec.

- Tag resolution. Applying a ! tag to any node "disables" the "implicit typing". This works for plain scalars, as well as for all mappings and sequences (so ! { x: 1, y: 2 } is a vanilla mapping/hash and not a "Point"). Quoted/block scalars are already "disabled" (are interpreted as strings). Note this is all by convention and under complete application control.

- Anchor names are restricted not to contain , [ ] { } characters to avoid confusion inside flow collections. That is, [&a,1] is the same as [ &a , 1 ] as people expect.

- The stream format is "sloppy", allowing for end markers to be repeated etc., for easy appending and concatenation of streams.

Here are examples of features that we already discussed ad nauseam and did not make the cut. These are huge potential black holes of conversations, so please let us not re-open them:

- http://example.com is a plain scalar and not a key: value pair. So {a:12} is a set with one entry 'a:12'. Write {a: 12} or {"a":12}.

- Plain scalars allow , [ ] { } in block collections but disallow them in flow collections (and keys) to avoid ambiguity.

- Types other than string, mapping and sequence are out of scope of the spec. Yes, the repository is woefully out of date. Let's get the spec out the door and then fix the repository.

- Defining a canonical format is out of scope. The new spec does make the suggestion we define a YSON format that can serve this purpose (and also be a stepping stone between JSON and YAML). Again, something we can devote effort to after we nail the spec.

Here is a sample feature idea that we did *not* discuss and we just might be able to squeeze in at the last second. If enough people like it. Maybe. Comments about it are of course welcome.

---
Title: Restricted Tag Character Set
Description:
  Disallow , [ ] { } in tags (similar to disallowing them in anchors which we already do) to minimize the potential confusion when tags are used in flow collections. Note that if such characters must be in a tag (for some obscure reason), it is possible to escape them using %xx.
Use case:
  Writing { !foo,[] } is interpreted today as { !<!foo,[]> "" }. Under the above proposal it would be { !<!foo> "", [ ] } which is what people expect.
Implementation effort:
  This can be viewed as the natural extension of the current rule that forces '!' to be escaped as %21 in certain cases (see example 6.26 in the spec). The proposal would make ns-tag-char more restrictive (it will also exclude the , [ ] { } from it) and then use ns-tag-char instead of ns-uri-char everywhere (the TAG directive could still use the full set). If anything, this simplifies the implementation (and the spec).
Effect on current documents:
  AFAIK the language-specific tags generated by Ruby/Perl/etc. do not use ! , { } [ ] in their tags. No sane human would use them either. So it seems that there would be no effect.
Downsides:
  No obvious ones so far.
...

Thanks again to everyone for investing all the effort in bringing the spec to where it is today!

"Once more into the breach"

Oren Ben-Kiki
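The %xx escaping mentioned in the tag proposal can be sketched roughly as follows. This is only an illustration in Python and not part of YamlReference or any YAML library; the helper name `escape_tag_suffix` and the exact set of escaped characters are assumptions taken from the wording of the proposal.

```python
# Hypothetical helper (not a real YAML API): percent-escape the flow
# indicators , [ ] { } inside a tag suffix, as the proposal suggests.
FLOW_INDICATORS = ",[]{}"


def escape_tag_suffix(suffix: str) -> str:
    """Replace each flow indicator with its %xx escape (uppercase hex)."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in FLOW_INDICATORS else ch
        for ch in suffix
    )


if __name__ == "__main__":
    # A tag that really needs a comma and brackets in its name.
    print("!" + escape_tag_suffix("foo,[]"))
```

With such escaping in place, an unescaped "," or "[" after a tag could always be read as a flow indicator, which is the behavior the use case above asks for.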
From: Brad B. <bm...@ma...> - 2008-04-06 17:50:22
Hello,

My comments below fall under this heading:

"Language: There is a better way to express some point (or I made a typo). English isn't my native language and I am not a professional writer so I have no illusions about the quality of my writing :-) If you have a better idea on how to convey something, I'll be more than happy to incorporate it."

But I would first like to say that the entire document is very easy to read. You say that English isn't your native language, but it doesn't show.

I consider most of my comments to be minor and trivial. Minor in that some may disagree and want the text to stay as is, and trivial in that I felt that I followed your meaning in every case.

Hopefully, I've rendered this well for email; here goes ...

1.2. Prior Art
...
The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with that standard. Further, borrowing from MIME (RFC2045), YAML's top-level production is a stream of independent documents; ideal for message-based distributed processing systems.

My English teacher would have considered the sentence above incorrect. A semi-colon may separate independent clauses but the clause beginning with "ideal" is not independent. Perhaps just use a comma instead?

...
YAML was designed to support incremental interfaces that include both input ("getNextEvent()") and output "sendNextEvent()") one-pass interfaces. Together, these enable YAML to support the processing of large documents (e.g. transaction logs) or continuous streams (e.g. feeds from a production machine).

Missing open parenthesis after "output".

1.4. Relation to JSON
...
YAML can therefore be viewed as a natural superset of JSON, offering improved human readability and a more complete information model. This is also the case in practice; every JSON file is also a valid YAML file. This makes it easy to migrate from JSON to YAML if/when the additional features are requitred.

Typo: requitred

1.5. Terminology
...
May
The word may, or the adjective optional, mean that conforming YAML processors are permitted, but need not behave as described.

I think this sentence could be worded better. Perhaps:

The word may, or the adjective optional, mean that conforming YAML processors are permitted to, but need not, behave as described.

I'm not so sure about that either, so someone else might make a suggestion.

2.2. Structures
...
Repeated nodes are first identified by an anchor (marked with the ampersand - "&"), and are then aliased (referenced with an asterisk - "*") thereafter.

Even though "nodes" is hyperlinked to an explanation, it might be helpful to further clarify it in this sentence, since this is its first appearance. Perhaps:

Repeated nodes (i.e., scalars, sequences, mappings) are first identified by an anchor (marked with the ampersand - "&"), and are then aliased (referenced with an asterisk - "*") thereafter.

Or maybe not--it's just that on first reading, I didn't quite grok "nodes" without clicking it.

2.3. Scalars
...
Example 2.14. In the plain scalar, newlines become spaces

This example struck me as out of place, because it precedes rather than follows the paragraph that mentions plain style flow scalars.

3.1. Processes

This section details the processes shown in the diagram above. Note a YAML processor need not provide all these processes. For example, a YAML library may provide only YAML input ability, for loading configuration files, or only output ability, for sending data to other applications.

I would change "Note" to either "Note," or "Note that". There are cases where you would begin a sentence with just "Note", but this isn't one of them, IMO.

3.2.1.2. Tags
...
YAML tags are used to associate meta information with each node. In particular, each tag must specify the expected node kind (scalar, sequence, or mapping). Scalar tags must also provide mechanism for converting formatted scontent to a canonical form for supporting equality testing.

Change "mechanism" to "a mechanism" or "mechanisms".

3.2.2. Serialization Tree

To express a YAML representation using a serial API, it necessary to impose an order on mapping keys and employ alias nodes to indicate a subsequent occurrence of a previously encountered node.

Change "it necessary" to "it is necessary".

3.3.2. Resolved Tags
...
Note resolution must not consider presentation details such as comments, indentation and node style.

Change "Note" to "Note," or "Note that".

3.3.2. Resolved Tags
...
If a document contains unresolved tags, the YAML processor is unable to compose a complete representation graph. In such a case, the YAML processor may compose an partial representation, based on each node's kind and allowing for non-specific tags.

Change "an partial" to "a partial".

3.3.3. Recognized and Valid Tags
...
In contrast, a YAML processor can always compose a complete representation for an unrecognized or an invalid collection, since collection equality does not depend upon knowledge of the collection's data type. However, such a complete representation can not be used to construct a native data structure.

I would change "can not" to "cannot". The following site explains why better than I would be able to:

http://alexfiles.com/cannot.shtml

6.2. Separation Spaces

Outside indentation and scalar content, YAML uses white space characters for separation between tokens within a line. Note such white space may safely include tab characters.

Change "Note" to "Note," or "Note that".

Named Handles
...
The name of the handle is a a presentation detail and must not be used to convey content information. In particular, the YAML processor need not preserve the handle name once parsing is completed.

Change "a a" to "a".

7.4. Flow Collection Styles

A flow collection may be nested within a block collection (flow-out context), nested within another flow collection (flow-in context), or be a part of an implicit key (flow-key context). Flow collection entries are separated by the "," indicator. The final "," may be omitted. This does not cause ambiguity because flow collection entries can never be completely empty.

To me, this mention of the final "," seems slightly backward. First, it says that entries are *separated* by the "," indicator. This does not imply that there is a final "," indicator--rather the opposite. So I might say instead, 'A final "," may be included.' Or something like that.

Example 7.17. Flow Mapping Separate Values

{
unquoted・:・"separate",
http://foo.com,
omitted:°,
°:・omitted,
}

%YAML 1.1
---
!!map {
  ? !!str "unquoted" : !!str "separate",
  ? !!str "http://foo.com" : !!null "",
  ? !!str "ommitted" : !!null "",
  ? !!null "" : !!str "ommitted",
}

Change multiple "ommitted"s to "omitted".

Example 7.21. Single Pair Implicit Entries

- [ YAML・: entry ]
- [ °: empty key entry ]
- [ {JSON: like}:adjacent ]

%YAML 1.1
---
!!seq [
  !!seq [ !!map { ? !!str "YAML" : !!str "entry" }, ],
  !!seq [ !!map { ? !!null "" : !!str "empty key entry" }, ],
  !!seq [ !!map { ? !!map { ? !!str "JSON" : !!str "like" } : "entry", }, ],
]

The word "adjacent" became "entry" after JSON like.

Example 7.23. Flow Content

- [ a, b ]
- { a: b }
- "a"
- 'b'
- c

%YAML 1.1
---
!!seq [
  !!seq [ !!str "a", !!str "b" ],
  !!str "a",
  !!str "b",
  !!str "c",
]

Where did "- { a: b }" go?

Example 7.24. Flow Nodes

- !!str "a"
- 'b'
- &anchor "c"
- *anchor
- a
- !!str b
- !!str°

%YAML 1.1
---
!!seq [
  !!str "a",
  !!str "b",
  &A !!str "c",
  *A,
  !!str "b",
  !!str "",
]

Where did "- a" go?

Example 8.14. Block Sequence

block sequence:
  - one↓
  - two : three↓

%YAML 1.1
---
!!map {
  ? !!str "block"
  : !!seq [
    !!str "one",
    !!map { ? !!str "two" : !!str "three" },
  ],
}

The key "block sequence" became just "block".

Example 8.16. Block Mappings

block mapping:
・key: value↓

%YAML 1.1
---
!!map {
  ? !!str "block"
  : !!map { ? !!str "key" : !!str "value", },
}

The key "block mapping" became just "block".

Example 8.18. Implicit Block Mapping Entries

plain key: inline value
°:° # both empty
"quoted key::
- entry

%YAML 1.1
---
!!map {
  ? !!str "plain key" : !!str "inline value",
  ? !!null "" : !!null "",
  ? !!str "quoted key\n" : !!seq [ !!str "entry" ],
}

That first "quoted key" part doesn't look right (unless I'm just not getting it).

Example 8.21. Block Scalar Nodes

literal: |2
・・value
folded:↓
・・・!foo
・・>1
・value

%YAML 1.1
---
!!map {
  ? !!str "literal" : !!str "value",
  ? !<!foo> "folded" : !!str "value",
}

This may be my misunderstanding, but doesn't !foo apply to "value", not to "folded"?

Since people perceive the "-" indicator as indentation, nested block sequences may be indented by one less space to compensate - except if nested inside another block sequence, of course (block-out context vs. block-in context).

I think that might read better as, "except, of course, if nested inside another block sequence (block-out context vs. block-in context)."

9.1. Document Markers

A documents may be prefixed by a byte order mark and optional comments. The document end marker may also be followed by comments.

Change "A documents" to "A document".

Regards,

-- Brad
From: Brad B. <bm...@ma...> - 2008-04-07 18:28:49
Hello all,

I don't know if these qualify as omissions, bugs, or just misunderstandings on my part. I was attempting to track down why ypaste (terrific tool) was not accepting this document:

---
-1
...

The spec says,

7.3.3. Plain Style
...
Plain scalars must not begin with most indicators, as this would cause ambiguity with other YAML constructs. However, the ":", "?" and "-" indicators may be used as the first character if followed by a non-space character, as this causes no ambiguity.
...

So I expected the above document to be okay. I started to experiment with the three stated indicators. The streams below demonstrate what I've found so far.

This first stream shows that the colon indicator ':' seems to work as the spec states, i.e., it may be the first character of a plain scalar if the following character is not whitespace.

---
# scalar value ':v'
- :v
---
# scalar value ':v'
k: :v
---
# scalar key ':k'
:k:
---
# scalar value ':s'
:s
...

The next two streams show that the hyphen '-' and question mark '?' indicators work as the spec states unless (apparently) they appear as the first non-whitespace character, um, somewhere other than where they do work--it's not quite clear to me where that is. (Obviously, below they are the first character on the line, but the next example demonstrates another location.)

---
# scalar value '-v'
- -v
---
# scalar value '-v'
k: -v
---
# fails: expected scalar key '-k'
-k:
---
# fails: expected scalar value '-s'
-s
...

---
# scalar value '?v'
- ?v
---
# scalar value '?v'
k: ?v
---
# fails: expected scalar key '?k'
?k:
---
# fails: expected scalar value '?s'
?s
...

The next stream shows another location where similarly ':' is accepted as the first character, but '-' and '?' are not.

---
# simple sample, i.e., [{key: value}]
- key: value
---
# scalar key ':key', i.e., [{':key': value}]
- :key: value
---
# fails: expected scalar key '-key'
- -key: value
---
# fails: expected scalar key '?key'
- ?key: value
...

Regards,

-- Brad
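The first-character rule quoted from section 7.3.3 can be written down as a small check. The following is a minimal Python sketch of that one paragraph only, not of the YamlReference productions; the function name `may_start_plain_scalar` is made up for illustration, and the check ignores context (flow vs. block, keys, document markers).

```python
# The indicator characters of YAML 1.1.
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")
# Indicators that the quoted rule allows as a first character,
# provided the next character is not a space.
CONDITIONAL = set("-?:")


def may_start_plain_scalar(text: str) -> bool:
    """Apply only the first-character rule quoted from section 7.3.3."""
    if not text or text[0].isspace():
        return False
    first = text[0]
    if first not in INDICATORS:
        return True
    if first in CONDITIONAL:
        # Allowed only when followed by a non-space character.
        return len(text) > 1 and not text[1].isspace()
    return False


if __name__ == "__main__":
    for sample in ["-1", "- 1", ":v", "?k:", "[a"]:
        print(repr(sample), may_start_plain_scalar(sample))
```

Under this reading, "-1", "-k:" and "?s" all qualify as the start of a plain scalar, which matches the expectation stated in the message above.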
From: Oren Ben-K. <or...@be...> - 2008-04-09 16:58:50
Hmmm. Seems like a bug somewhere. I'll look into it. Nice catch! Thanks for doing my spec review for me :-)

Oren.

On Mon, 2008-04-07 at 14:21 -0400, Brad Baxter wrote:
> I was attempting to track down why ypaste (terrific tool) was
> not accepting this document:
>
> ---
> -1
> ...
From: Oren Ben-K. <or...@be...> - 2008-04-09 22:23:42
The "-1" problem was due to the "plain scalar vs. anything else" ambiguity. The short of it is, the c-l-block-seq-entry production should only match the "-" if it is not followed by a non-space character. Makes perfect sense, otherwise the - obviously starts a plain scalar... I added this negative lookahead annotation to the productions running ypaste, and it seems to have solved the problem. Of course, if the parser did any backtracking, it would have been able to resolve this itself, but I don't have much faith in backtracking parsers :-) "Precariously balanced between user friendliness and chaos" indeed :-)

I'll upload a new draft (and YamlReference) on the weekend with all the fixes so far, including this one. Nice catch Brad!

In other news - after discussing the issue on the IRC channel, Clark, Kirill and I agreed it is reasonable to restrict tag properties to avoid using the , [ ] { } and characters. These would still be allowed in verbatim tags (i.e. !<foo,[]{}> bar) and in the tag directive (e.g. %TAG !! tag:yaml.org,2002: uses the ",").

Clark also wants to ban the ' character there - the use case is parsing something like [!foo:'bar']. Currently this would parse as [!<!foo:'bar'> ""]. If ' was invalid in tag properties, it would have been parsed as [!<!foo:> "bar"]. I'm not certain this is an improvement...

An alternative is to enhance the JSON compatibility rules to state that :" :' :, :{ :} :[ :] all indicate a key/value pair, as in a:"b" and c:[1,2]. If this were the case then banning ' from tag properties would make sense and [!foo:'bar'] would be parsed as [{!<!foo> '': 'bar'}] as expected. This seems to me to be the less surprising thing to do, however it means a bit more of a change.

Opinions?

Oren.
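The negative lookahead described for c-l-block-seq-entry can be illustrated with an ordinary regular expression. This is a rough Python sketch of the idea only, not the actual YamlReference annotation; the function name `starts_block_seq_entry` is invented for the example, and document markers such as "---" are assumed to be handled elsewhere.

```python
import re

# "-" counts as a block sequence entry indicator only when it is NOT
# followed by a non-space character: a negative lookahead on \S.
SEQ_ENTRY = re.compile(r"-(?!\S)")


def starts_block_seq_entry(line: str) -> bool:
    """True if the line (after its indentation) opens a sequence entry."""
    return SEQ_ENTRY.match(line.lstrip(" ")) is not None


if __name__ == "__main__":
    for sample in ["- 1", "-1", "-", "- -key: value", "-key: value"]:
        print(repr(sample), starts_block_seq_entry(sample))
```

Because the lookahead rejects "-1" up front, the "-" is left for the plain scalar production, so no backtracking is needed to parse Brad's document.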
From: Brad B. <bm...@ma...> - 2008-04-07 19:47:32
Oren,

I have to think that this is a bug in ypaste:

---
- 1
- 0
- "0"
- "00"
...

0 and "0" are treated very oddly.

---↵↓
-·1↵↓
-·°°↵↓
-·"°"↵↓
-·"00"↵↓
...↓
↓

(I don't know if those special characters will travel through email.)

-- Brad
From: Oren Ben-K. <or...@be...> - 2008-04-09 16:57:48
Yes, a bug. Fixed. Thanks!

Oren.

On Mon, 2008-04-07 at 15:47 -0400, Brad Baxter wrote:
> Oren,
>
> I have to think that this is a bug in ypaste:
>
> ---
> - 1
> - 0
> - "0"
> - "00"
> ...
>
> 0 and "0" are treated very oddly.
>
> ---↵↓
> -·1↵↓
> -·°°↵↓
> -·"°"↵↓
> -·"00"↵↓
> ...↓
> ↓
>
> (I don't know if those special characters will travel through email.)
>
> -- Brad
From: Brad B. <bm...@ma...> - 2008-04-11 03:20:01
Bug?

---
plain scalar value # this is not a comment
# this is not a comment
---
- plain scalar value # this is not a comment
  # this is not a comment (indented)
---
- plain scalar value # this is not a comment
# this is a comment
---
key: plain scalar value # this is not a comment
  # this is not a comment (indented)
---
key: plain scalar value # this is not a comment
# this is a comment
...

The specs say, "Plain scalars must never contain the ": " and " #" character combinations. Such combinations would cause ambiguity with mapping key: value pairs and comments."

So I expected ypaste to give me an error for all of the plain scalar values above that contain "#". Instead, it accepts the data as part of the scalar value in all the places I would expect it to if there were no prohibition on "#".

Maybe "#" is acceptable now and the wording of the specs just needs updating?

Regards,

-- Brad
From: Oren Ben-K. <or...@be...> - 2008-04-11 21:07:39
On Thu, 2008-04-10 at 23:20 -0400, Brad Baxter wrote:
> Bug?
>
> ---
> plain scalar value # this is not a comment

Yes, there was a bug in the implementation of the lookbehind. The productions and spec are correct. Nice catch!

Oren.
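The lookbehind in question can be pictured with a regular expression: a "#" begins a comment only at the start of a line or when the character before it is white space. The following Python sketch rests on that assumption and is not the ypaste implementation; the name `split_comment` is hypothetical, and quoted scalars (where "#" is ordinary content) are deliberately ignored.

```python
import re

# "#" starts a comment at the beginning of the line, or when the
# preceding character is a space or tab (a fixed-width lookbehind).
COMMENT_START = re.compile(r"(?:^|(?<=[ \t]))#")


def split_comment(line: str):
    """Split one line into (content, comment); comment is None if absent."""
    match = COMMENT_START.search(line)
    if match is None:
        return line.rstrip(), None
    return line[:match.start()].rstrip(), line[match.start():]


if __name__ == "__main__":
    for sample in ["value # note", "a#b", "# only a comment"]:
        print(repr(sample), split_comment(sample))
```

Read this way, the spec's rule that a plain scalar never contains " #" means the text after such a "#" is a comment, while a "#" glued to the preceding character stays part of the scalar.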