From: Oren Ben-K. <or...@ri...> - 2001-08-05 07:51:50
|
Clark C . Evans [mailto:cc...@cl...] wrote: > | File/value starts with '@' or ':' -> list. > | File/value starts with '=' or '|' -> scalar. > | File value starts with '%' or anything else -> map. > > Nice, but "=" should not be the indicator for a > scalar (since it represents the default value). > Even $ is better, although I like ^ the best. Hmmm. I don't want to make $ an indicator, to allow "price: $12". As for "^", I don't like it somehow. How about "`"? key1: `% this simple scalar starts with an indicator. key2: ` This simple scalar starts at the next line. It has no leading or trailing white space. key3: This is: a map key4: : This is a list entry. Brian's implicit indicators scheme has the advantage that it removes many indicators from the YAML file: most maps and list won't need any. But it has a downside in that multi-line simple scalars will have to add an indicator. OK, there are much less multi-line simple scalars then there are maps and lists, I guess. But I think we should strive to minimize the visual impact of these. "`" has much less "polluting visual presence" then "^", "$" or "=". And it is a kind of quote, after all... Taking this to its extreme (as I've been known to do :-), here's a much less modest proposal. Let's use " for quoted strings, ' for simple scalars, ` for blocks. In all cases the full form uses the quote as a wrapper (`this is a block`). However, the start and/or end quote may be omitted if this doesn't lead to an ambiguity (the default style is '). For example, if a " value happens to start or end with a ", then the leading/trailing " becomes mandatory. If a block doesn't contain the final newline, the trailing ` becomes mandatory, etc. Examples: : " This is a string with escapes (what we call a quoted string). Lines are folded. It can use any printable but escapes such as \n are expanded. Using " is consistent with Perl. : ' This is a string without escapes (what we call a simple scalar). Lines are folded. 
It can use any printable character since escapes such as \n are not expanded. Using ' is consistent with Perl. : ` This is an "as if" string (what we call a block). It can use any character including newlines since no line folding is done and escapes are not expanded. Since ` is the "odd bird" in the quotes family, its use is at least not inconsistent with anything. : ` This block doesn't have a trailing newline.` : ` This block does have it ` ` : This is implicitly single-quote : 'This is explicitly single quote : 'Likewise' : 'This one ends with a '' : "This is explicitly double quote : "Likewise" : `This is legal for orthogonality` Pros: - We use the three quote types for the three scalar types (I'd suggest we can them 'block', 'folded' and 'escaped', in an increasing level of processing). - We avoid the need of needing to escape quotes - we only have to quote unprintable characters. - The choice of quote types is intuitive to the Perl-aware. - It looks better visually (IMVHO). - There is no need for a special format for a block without trailing newline format. - The scalar types become more consistent with each other (specifying them would be fun - each could be defined in terms of the next-simpler format). - Using all types in a key becomes "inevitable": this is an implicit simple key: ... 'this an explicit simple key' : ... "this is an escaped key" : ... `this is a block key` : ... I rather like this combination. Thoughts? Have fun, Oren Ben-Kiki |
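The three-quote proposal above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the message (the default style is ', and the leading and trailing quotes are optional); the names `STYLES` and `classify_scalar` are hypothetical and do not come from any YAML implementation, and the sketch only handles single-line values.

```python
# Hypothetical mapping from quote character to the scalar style it selects,
# following the proposal: " = escaped, ' = folded (simple), ` = block.
STYLES = {'"': "escaped", "'": "folded", "`": "block"}


def classify_scalar(value):
    """Return (style, body) for a single-line scalar value.

    Assumptions of this sketch: a missing leading quote means the default
    style ('), and one trailing quote of the same kind, if present, is the
    optional closing quote and is removed.
    """
    quote = "'"                     # the default style is '
    if value[:1] in STYLES:         # explicit leading quote
        quote = value[0]
        value = value[1:]
    if value.endswith(quote):       # optional closing quote
        value = value[:-1]
    return STYLES[quote], value


print(classify_scalar("This is implicitly single-quote"))
print(classify_scalar("'This one ends with a ''"))
print(classify_scalar("`This is legal for orthogonality`"))
```

Note how the value that really ends with a ' needs the explicit closing quote, which is the "mandatory when ambiguous" rule from the message.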
From: Oren Ben-K. <or...@ri...> - 2001-08-05 10:48:38
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> I've given the implicit indicators some thought
> and I must say I don't like it because it makes
> YAML more complicated. Having indicators
> gives us N rules to memorise. Having all
> of these indicators optional under particular
> circumstances doubles the number of rules.

Good point. Of course, we could make these rules pretty simple...

> After a short period of time, your eyes
> glance over the % and @ without problem.

Well, when I showed a piece of YAML code with these indicators, people were baffled. Without these indicators they got the meaning immediately. So I can't say with any confidence they increase readability.

> I'd rather keep YAML simple and thus more
> readable. No short hands!

Simple != readable. Do you really believe that:

    delivery: %
        !: date
        &: 17
        =: 2001-02-03

Is more readable than:

    delivery: %(!date &17) 2001-02-03

S-expressions are simple. RPN (Reverse Polish Notation) is simple. Neither is human-readable. Humans have a strange perception mechanism that makes Perl preferable to Lisp. Many languages (especially the functional and logical ones) ignored this to their peril. As Einstein said, "make it as simple as possible but not any simpler".

Regardless of shorthands and implicit indicators, what do you think of using " ' ` for the three scalar types? It is an independent issue...

Have fun,

    Oren Ben-Kiki
From: Clark C . E. <cc...@cl...> - 2001-08-05 15:25:28
|
On Sun, Aug 05, 2001 at 01:49:26PM +0200, Oren Ben-Kiki wrote: | What do you think of using " ' ` for the three scalar types? I think they are confusingly similar. "|" was good for a block indicator. Here is a counter proposal: 1. Map, List and Simple(one line) scalars are unadorned. 2. A LineFolded, Block, and Quoted multi-line scalars are explicit via, "\", "|", and quotes respectively. 3. Null and Reference nodes are explicit by "~" and "*" respectively. 4. The Simple (one line) scalar can be typed via auto-detection (regular expression), where types include, but are not limited [9] to: A. Integer B. Real (Floating) C. Date/Time D. Rational (Currency) 5. An optional !class can immediately precede any node. This is not a short hand and serializes the type, if any, of the given node. If the type is given, any type auto-detection (4) is superceded. 6. An optional anchor &0003 can immediately precede any node, following the optional class, if any. 7. The simple (one line) scalar may not begin with \ | " & ! : 8. The starting production is ListEntry where a top level list is assumed, MapPair where a top level map is assumed, Empty where a top level Null node is assumed, or Scalar complete with indicator. 9. Optionally, an implementation may allow for arbitrary type auto-detection/parsing for the Simple (one line) scalar. In this case, the YAML system would require a (regex,parser,printer) three tuple. The above is both readable, and can serialize a colored graph nicely. | delivery: % | !: date | &: 17 | =: 2001-02-03 | | Is more readable than: | | delivery: %(!date &17) 2001-02-03 No. However, I'm inclined to say that the information model for YAML is a colored graph and call it a day and not admit the equivalence of the above. I think each node should have an "anchor" and a class/type ("color"). 
We have two tradeoffs:

    (a) pretty serialization
    (b) pretty information model

I think that a colored graph is a nice trade-off: it is the 90% case for the information model, given that most languages are typed; and if we only have two "special" attributes for each node, I'm sure we can find a nice serialization.

Best,

Clark
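Items 4 and 9 of the counter-proposal (regular-expression type detection, with a (regex, parser, printer) three-tuple per type) could look roughly like this in Python. This is a minimal sketch under stated assumptions: the patterns are guesses at what each type might accept, `Decimal` stands in for the "Rational (Currency)" type, and `DETECTORS`, `detect` and `render` are hypothetical names, not part of any YAML library.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical registry: (type name, regex, parser, printer).
# A Real must carry an exponent so that it cannot be confused with a Rational.
DETECTORS = [
    ("integer", re.compile(r"[-+]?\d+"), int, str),
    ("real", re.compile(r"[-+]?\d+\.\d+[eE][-+]?\d+"), float,
     lambda x: format(x, "e")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}"), date.fromisoformat,
     date.isoformat),
    ("rational", re.compile(r"[-+]?\d+\.\d+"), Decimal, str),
]
PRINTERS = {name: printer for name, _, _, printer in DETECTORS}


def detect(text):
    """Return (type name, parsed value); anything unmatched stays a string."""
    for name, pattern, parser, _ in DETECTORS:
        if pattern.fullmatch(text):
            return name, parser(text)
    return "string", text


def render(name, value):
    """Serialize a value with the printer registered for its type."""
    return PRINTERS.get(name, str)(value)


for sample in ["34", "2.34e0", "2000-05-23", "2.34", "3.3+4i"]:
    print(sample, "->", detect(sample))
```

A value such as 3.3+4i falls through to a plain string unless a matching three-tuple is registered for it.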
From: Clark C . E. <cc...@cl...> - 2001-08-05 15:57:26
|
title: Clark's Counter-Proposal example: \ The starting production is auto-detected as a map. This is a LineFolded multi-line scalar as indicated by the \ The next maps are Simple one-line scalars which have auto-detected types. integer: 34 float: 2.34e0 date: 2000-05-23 rational: 2.34 block: | This is a multi-line block scalar where new lines are significant. leading whitespace is significant. quoted: "The quoted scalar can span multiple lines with escaping. Note that the next quoted scalar is a string without any type detection." string: "2.34" anchored: &003 This node is anchored. reference: *003 complex: 3.3+4i note: \ The above "complex" Simple scalar will be treated as a string by the default YAML parser. Although one could register a complex type handler into the parser! Further, a base64 handler can also be called, so that the above can be translated into a binary representation during parse time without the intermediate string. base64: &004 !base64 \ R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAO fn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlp aWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f /++f/++f/++f/++f/++f/++f/++f/++SH+Dk1h ZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjo EwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYN G84BwwEeECcgggoBADs= nested map: what: This is a nested map. img: *004 =: \ Default value which the API *may* use when map node is treated as a scalar. It is non-magical and a feature of the YAML serial API only. nested list: : This is a nested list : \ Complete with multi-line scalars and other nicities. null: ~ zoom: \ A blank scalar, without ~ is a zero length Simple (one line) scalar. comments: !comment \ I have no clue how we could do comments, other than having it as a data type. multi: \ Per before a blank line (two carriage returns with optional indentation in the middle) in a Quoted or LineFolded become a carriage return. Thus these two paragraphs are separated by a single new line. trail: |- The block scalar without the trailing new line. 
"quoted key that
  is multi-line": ~

Thoughts:

A. Need a bit of forward looking to do some of the auto-detection. But this isn't that bad since the LineFolded is explicit.

B. rfc822: We could allow this type of scalar (similar to LineFolded), but it cannot begin with a carriage return. Thus, in this case it must start on the same line.

Reference Card:

    :   A map pair separator, or the beginning of a list item (I still think list items should use the minus sign).
    "   For Quoted scalars, and quoted key values.
    ~   The null / None indicator
    |   A Block scalar where new lines count
    |-  A Block scalar less trailing new line
    \   A FoldedLine scalar
    &   An anchor
    *   A reference node
    !   A class/type/color indicator
    =   A non-magical default value for a map.
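The folding rule described for the "multi" entry in the example (lines are joined, and a blank line inside a Quoted or LineFolded scalar becomes a single new line) might be sketched as follows. This is an illustration only: `fold_lines` is a hypothetical name, and two details are assumptions not stated in the message, namely that folded lines are joined with one space and that several consecutive blank lines collapse into one break.

```python
def fold_lines(raw):
    """Fold the indented lines of a LineFolded scalar into its value.

    Non-blank lines are stripped and joined with a single space; a blank
    line (possibly containing only indentation) ends the paragraph and
    contributes one newline to the result.
    """
    paragraphs = []
    current = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


text = (
    "    Per before a blank line\n"
    "    becomes a carriage return.\n"
    "    \n"
    "    Thus these two paragraphs are\n"
    "    separated by a single new line.\n"
)
print(fold_lines(text))
```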
From: Jason D. <ja...@in...> - 2001-08-05 17:52:36
|
FWIW, I was just about to make a proposal almost identical to this. I have some comments inline. > title: Clark's Counter-Proposal > example: \ > The starting production is auto-detected > as a map. This is a LineFolded multi-line > scalar as indicated by the \ > The next maps are Simple one-line scalars > which have auto-detected types. > integer: 34 > float: 2.34e0 > date: 2000-05-23 > rational: 2.34 Are you requiring that all floats and doubles have the exponent part to distiniguish them from rationals? What about a trailing indicator similar to how C identifies floats? float: 2.34f double: 2.34d rational: 2.34r I was also going to suggest that we support characters, durations, and booleans. So strings with the value true or false would have to be quoted. Durations are of a specific form as well. See http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#duration for details. Characters could be quoted using single quotes. That would cover just about all of the major types used in most programming languages. As cool as the idea sounds of adding handlers for newer types, that would break interopability. Deserializers that don't have those same handlers installed would read those values back as strings so I think we need to limit the amount of supported auto-detected datatypes and required that all others be tagged. > block: | > This is a multi-line block scalar > where new lines are significant. > leading whitespace is > significant. Block scalars seem awkward without the | going down the left edge to me. I think that the use case of pasting in source code is going to be so rare that I wouldn't mind doing the extra work of adding the |s. Compare: foo: | int main(int argc, char** argv) { printf("Hello, YAML!"); return 0; } foo: |int main(int argc, char** argv) { | printf("Hello, YAML!"); | return 0; |} (Of course, this would look a lot better if I could figure out how to make Outlook display it in a fixed-width font! I hope you guys aren't as cursed.) 
> quoted: "The quoted scalar can span > multiple lines with escaping. Note > that the next quoted scalar is a string > without any type detection." > string: "2.34" > anchored: &003 This node is anchored. > reference: *003 > complex: 3.3+4i Python uses "j" for some reason. Should we follow suit? > note: \ > The above "complex" Simple scalar will > be treated as a string by the default > YAML parser. Although one could register > a complex type handler into the parser! > Further, a base64 handler can also be > called, so that the above can be translated > into a binary representation during parse > time without the intermediate string. > base64: &004 !base64 \ > R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAO > fn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlp > aWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f > /++f/++f/++f/++f/++f/++f/++f/++SH+Dk1h > ZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjo > EwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYN > G84BwwEeECcgggoBADs= > nested map: > what: This is a nested map. > img: *004 > =: \ > Default value which the API *may* use > when map node is treated as a scalar. > It is non-magical and a feature of > the YAML serial API only. If it's a serial API only, then it can use the first value in the map, can't it? We wouldn't have to use up = as a special key. > nested list: > : This is a nested list > : \ > Complete with multi-line scalars > and other nicities. > null: ~ > zoom: \ > A blank scalar, without ~ is a zero length > Simple (one line) scalar. > comments: !comment \ > I have no clue how we could do comments, other > than having it as a data type. > multi: \ > Per before a blank line (two carriage returns > with optional indentation in the middle) in > a Quoted or LineFolded become a carriage > return. > > Thus these two paragraphs are separated by a > single new line. > trail: |- > The block scalar without the trailing > new line. Couldn't you use \ at the end of the line to indicate that there wasn't any trailing text? 
The indentation would signify that the scalar value was finished. > "quoted key that > is multi-line": ~ > > Thoughts: > > A. Need a bit of forward looking to do > some of the auto-detction. But this isn't > that bad since the LineFolded is explicit. > > B. rfc822: We could allow this type of scalar, > (similar to LineFolded), but it cannot > begin with a carraige return. Thus, in > this case it must start on the same line. Allowing spaces or colons in the key would break RFC 822 compaibility. Is that really a necessary goal? Requiring that indentation be four spaces would also make it impossible to read most RFC 822 documents. Personally, I don't see why compatibility with RFC 822 is a goal but fixing the spaces really bothers me. > > Reference Card: > > : A map pair separator, or the begin of a > list item (I still think list items should > use the minus sign). > > " For Quoted scalars, and quoted key values. > > ~ The null / None indicator > > | A Block scalar where new lines count > > |- A Block scalar less trailing new line > > \ A FoldedLine scalar > > & An anchor > > * A reference node > > ! A class/type/color indicator > > = A non-magical default value for a map. > This is purely a matter of taste, but my proposal used different indicators for anchors, references, and classes: foo: =1 {java.lang.Date} 2001-08-05 bar: ^1 Why "=" for anchor nodes? Because it implies in a fuzzy sort of way that this node has an equvalent alias of 1. Why "^" for references? Because it looks like an arrow pointing to a node that was declared earlier in the document. Why "{" and "}" to delimit types? Because curly braces make me think of programming languages and these types are specifically so that programming languages can de-serialize into native types. It's also much prettier than using ! and sets the class apart more from the value then ! does. 
Compare: foo: {java.lang.Date} 2001-08-05 foo: !java.lang.Date 2001-08-05 To me it's much easier to see where my class starts and ends and the scalar begins. I also didn't have a special indicator for nulls. Instead I just referenced the 0th node. foo: ^0 Alternatively, we don't need a 0 at all. foo: ^ That's referencing the node that isn't there. I'm really amazed at how similar your proposal is to mine. I hope everyone else likes it as well. Jason. |
From: Clark C . E. <cc...@cl...> - 2001-08-05 18:15:03
|
On Sun, Aug 05, 2001 at 10:54:34AM -0700, Jason Diamond wrote: | Are you requiring that all floats and doubles have the | exponent part to distiniguish them from rationals? Yep. And I don't think the distinction between a float and double is worth the hassle. | I was also going to suggest that we support characters, durations, and | booleans. So strings with the value true or false would have to be quoted. | Durations are of a specific form as well. See | http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#duration for details. | Characters could be quoted using single quotes. That would cover just about | all of the major types used in most programming languages. I think that Integer, Real, Rational, Date/Time are pretty adequate; the others are kinda edge cases. | As cool as the idea sounds of adding handlers for newer types, that would | break interopability. Deserializers that don't have those same handlers | installed would read those values back as strings so I think we need to | limit the amount of supported auto-detected datatypes and required that all | others be tagged. Ahh. Yes. Could be a bit problematic. Hmm. I don't have my thinking cap now... so I'll have to defer comment till later. | Block scalars seem awkward... I think it's the best compromise. We can't revisit everything! | > =: \ | > Default value which the API *may* use | > when map node is treated as a scalar. | > It is non-magical and a feature of | > the YAML serial API only. | | If it's a serial API only, then it can use the first value in | the map, can't it? We wouldn't have to use up = as a special key. Yes, but the internal model can re-arrange. And we can't depend upon the interal model preserving order. | > trail: |- | > The block scalar without the trailing | > new line. | | Couldn't you use \ at the end of the line to indicate that there wasn't any | trailing text? The indentation would signify that the scalar value was | finished. I think this was the best compomise... 
I guess I can only refer you to the archives.

| > B. rfc822: We could allow this type of scalar,
| >    (similar to LineFolded), but it cannot
| >    begin with a carriage return. Thus, in
| >    this case it must start on the same line.
|
| Allowing spaces or colons in the key would break RFC 822 compatibility. Is
| that really a necessary goal? Requiring that indentation be four spaces
| would also make it impossible to read most RFC 822 documents. Personally, I
| don't see why compatibility with RFC 822 is a goal but fixing the spaces
| really bothers me.

Fixing the spaces is the only way to remove the | block scalar requirement. This is Brian's baby ;)

| This is purely a matter of taste, but my proposal used different
| indicators for anchors, references, and classes:
|
|     foo: =1 {java.lang.Date} 2001-08-05
|     bar: ^1
|
| Why "=" for anchor nodes? Because it implies in a fuzzy sort of way that
| this node has an equivalent alias of 1.
|
| Why "^" for references? Because it looks like an arrow pointing to a node
| that was declared earlier in the document.

I'd like to use = for the default map value, and I'd rather not have the meaning of = be dependent upon context.

| Why "{" and "}" to delimit types? Because curly braces make me think of
| programming languages and these types are specifically so that programming
| languages can de-serialize into native types. It's also much prettier than
| using ! and sets the class apart more from the value than ! does. Compare:
|
|     foo: {java.lang.Date} 2001-08-05
|
|     foo: !java.lang.Date 2001-08-05

I like ! because it is consistent with & and *.

|     foo: ^
|
| That's referencing the node that isn't there.

A bit magical, no?

Best,

Clark

| I'm really amazed at how similar your proposal is to mine.
| I hope everyone else likes it as well.

We are just kinda assimilating differences and mutating them into place. Thank you for your feedback!

Best,

Clark
From: Clark C . E. <cc...@cl...> - 2001-08-05 18:48:14
|
On Sun, Aug 05, 2001 at 08:52:08PM +0200, Oren Ben-Kiki wrote:
| Clark C . Evans [mailto:cc...@cl...] wrote:
| > I think they are confusingly similar. "|" was good for a
| > block indicator. Here is a counter proposal: ...
|
| In short, I don't think that a "colored graph" is a good idea.

Ok. I have my doubts as well.

| Simple != readable.

No joke.

| Do you really believe that:
|
|     delivery: %
|         !: date
|         &: 17
|         =: 2001-02-03
|
| Is more readable than:
|
|     delivery: %(!date &17) 2001-02-03

Yes.

That being said... why don't we back-pedal and completely drop the idea of class/type, since it isn't guaranteed to round-trip or requires magical re-write techniques? As we have discovered, there are many approaches to handling class/type, which you have convinced me can be layered. For example, one could use an external schema, or regular expressions for type detection, or the above defaulting mechanism based on the ability to treat a map or list as a scalar.

For an escape hatch, we can even add a "comment" mechanism which is explicitly guaranteed to round-trip. This can then be used to provide a short-hand mechanism at a higher layer. Of course, this requires that both the sender and receiver agree on the particular short-hand. In the worst case, these comments can be attached to a node via an external lookup table (shadow).

Best,

Clark
From: Oren Ben-K. <or...@ri...> - 2001-08-05 17:51:26
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> I think they are confusingly similar. "|" was good for a
> block indicator. Here is a counter proposal: ...

"It's a long way to our YAML,
It's a long way to go;
It's a long way to our YAML,
To the best format I'll know!
Good-bye, RFC0822!
Farewell, round-trip Perl!
It's a long, long way to our YAML,
But my heart's right there!"

(Sung to "It's a Long Way to Tipperary" :-)

In short, I don't think that a "colored graph" is a good idea.

Have fun,

    Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-08-05 18:26:27
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> 5. An optional !class can immediately
>    precede any node. This is not a short
>    hand and serializes the type, if any,
>    of the given node. If the type is
>    given, any type auto-detection (4)
>    is superseded.

OK, of everything in the proposal (syntax issues aside - I still think the 3 quotes proposal is better), this is the one that bothers me the most. What if the type is not recognized by the system? Does the '!class' remain as a prefix to the scalar value? How does one assign a type to a map (Brian's use case)? In short, how does your colored-graph model allow "slurping" the YAML file into Perl native data format, reasonably accessing the values in it, and then writing them back unchanged?

Also:

> I think that Integer, Real, Rational, Date/Time are
> pretty adequate; the others are kinda edge cases.

You forgot binary. And Rational isn't an edge case?

Not to mention:

> | As cool as the idea sounds of adding handlers
> | for newer types, that would break interoperability.
> | Deserializers that don't have those same handlers
> | installed would read those values back as strings
> | so I think we need to limit the amount of supported
> | auto-detected datatypes and require that all
> | others be tagged.
>
> Ahh. Yes. Could be a bit problematic. Hmm.
> I don't have my thinking cap now... so I'll
> have to defer comment till later.

A *bit* problematic? It is the core issue - you either end up trying to define the ultimate, one-size-fits-all data typing schema (good luck to you!), or you will be forced to see that typing is a schema-specific issue and the only way to handle interoperability is to be able to rely on a more basic way to parse the file, one where everything is one of map, list or string/null and nothing else. In which case, welcome back to my view.

Note that using my way there is no problem adding as many new types as you wish, while maintaining full interoperability :-)

Have fun,

    Oren Ben-Kiki
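The layering argued for above, where the base parse yields only maps, lists and strings/null and typing is applied by a separate, schema-specific step, might be sketched in Python like this. It is an illustration under assumptions: the schema format (a dict from key paths to converter functions, with "*" for any list item) and the name `apply_schema` are invented for this example and are not from the thread.

```python
from datetime import date


def apply_schema(node, schema, path=()):
    """Return a typed copy of a map/list/string tree.

    `schema` maps a key path (a tuple of keys, with "*" standing for any
    list item) to a converter. Scalars with no schema entry stay as the
    strings the base parser produced; None (null) is left alone.
    """
    if isinstance(node, dict):
        return {key: apply_schema(value, schema, path + (key,))
                for key, value in node.items()}
    if isinstance(node, list):
        return [apply_schema(item, schema, path + ("*",)) for item in node]
    convert = schema.get(path)
    if convert is None or node is None:
        return node
    return convert(node)


# What a type-free base parser would hand over: only maps, lists, strings.
raw = {"delivery": "2001-02-03", "price": "$12", "items": ["1", "2"]}

# One application's view of the same data; another may use a different one.
schema = {("delivery",): date.fromisoformat, ("items", "*"): int}

typed = apply_schema(raw, schema)
print(typed)
print(raw)  # untouched, so it can be written back unchanged
```

Because the untyped tree is never modified, a reader without the schema still sees plain strings and can round-trip the file, which is the interoperability point made in the message.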
From: Clark C . E. <cc...@cl...> - 2001-08-05 18:52:36
|
On Sun, Aug 05, 2001 at 09:27:13PM +0200, Oren Ben-Kiki wrote:
| Does the '!class' remain as a prefix to the scalar value?
| How does one assign a type to a map (Brian's use case)?

Doh!

| You forgot binary. And Rational isn't an edge case?

Currency is in most legal systems a Rational number.

| A *bit* problematic? It is the core issue - you either
| end up trying to define the ultimate, one-size-fits-all
| data typing schema (good luck to you!), or you will be
| forced to see that typing is a schema specific issue
| and the only way to handle interoperability is to be
| able to rely on a more basic way to parse the file,
| one where everything is one of map, list or string/null
| and nothing else. In which case, welcome back to my view.

Right. I'm sold. And given that I can't digest the short-hand notation, I think this means we must punt the entire class/type issue up one level.

| Note that using my way there is no problem adding as
| many new types as you wish, while maintaining full
| interoperability :-)

I think we have the same ideals... I just have become dead-set against the rewriting or short-hand mechanism. Not that it can't be applied at a higher level.

Best,

Clark
From: Clark C . E. <cc...@cl...> - 2001-08-05 08:24:48
|
On Sun, Aug 05, 2001 at 10:52:37AM +0200, Oren Ben-Kiki wrote:
| Brian's implicit indicators scheme has the advantage that
| it removes many indicators from the YAML file: most maps
| and lists won't need any.

I've given the implicit indicators some thought and I must say I don't like it because it makes YAML more complicated. Having indicators gives us N rules to memorise. Having all of these indicators optional under particular circumstances doubles the number of rules.

After a short period of time, your eyes glance over the % and @ without problem. I'd rather keep YAML simple and thus more readable. No short hands!

Best,

Clark