From: Oren Ben-K. <or...@ri...> - 2001-08-12 16:42:28
Clark C . Evans [mailto:cc...@cl...] wrote:

> I spent a few minutes looking at it; I'll dedicate far more
> time later. What I noticed:
>
> - Characters need to be included in the information model

Yes, we should explicitly say "Character == Any Unicode character"
there.

> - The character production needs to be fixed somewhat since
>   both #D and #A are allowed in the serialized format, some
>   better way to denote the normalization should be given,
>   and then a note that characters in the information model
>   that are not directly expressible via the char production
>   must be escaped.

Wording, wording... I knew there would be a lot of that...

> - The unquoted scalar should be broken into two scalar types,
>   those starting with alphabetic characters ("unquoted"), and
>   those that remain ("implicit"). Ideally this distinction
>   will be based on a character property of the Unicode
>   specification.
>
>   Currently, an alphabetic scalar is implicitly typed... which
>   is not correct. Types for alphabetic scalars must be
>   explicit, lest we have a bag of worms.

I don't see why (this is a *required* implicit type - it says
so in the appropriate section). It doesn't make sense to double
the productions (or, worse, to add a whole new section!) just
for this. Perhaps a stronger wording on the implicit typing of
these scalars would do instead?

What exactly is the potential problem you see here?

> - I still don't quite understand the reference to reference
>   mechanism. A bit of explanation would help.

You are probably right. This is brand new text explaining a
complex subject, after all. Why don't you try to put your finger
on the problem so we can improve the wording accordingly?

> - The spec should state that implicit types will be dictated
>   by the YAML spec or amendments to the YAML spec. Implicit
>   types should not be customizable as this will hurt
>   interoperability.

I'm not convinced of that. We should probably use stronger
wording to the effect that by sticking with "widely accepted
types" as listed in the spec, they would gain interoperability;
and that by "customizing" them as you put it, interoperability
suffers.

I see no point in forbidding customizations. If that makes sense
for some application, it will be done anyway, with or without our
"permission".

I also don't think we should create a distinction between implicit
and explicit types. Both are part of the document's Schema, handling
unknown types is the same, encoding is the same, their name space is
the same, and for interoperability both systems should agree on the
set of types. And surely we want to allow people to customize their
*explicit* types, right? :-)

> Very nice work Oren.

Whew, a relief. Now I await the other shoe to drop. Brian? :-)

> When I get a chunk of time...
> (which might not be until the first sale of the
> application I'm developing... mid September at the
> very latest) I will go through the spec with a very
> fine tooth comb.

Good luck,

    Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-08-13 09:30:34
Clark C . Evans [mailto:cc...@cl...] wrote:

> | What exactly is the potential problem you see here?
>
> In general, I've used productions whenever a different
> set of requirements have held (such as the folded text
> production) rather than put in specific wording. Wording
> can be skipped/mis-interpreted by an implementer.
> However, a completely different production can't
> be skipped.
>
> And given that there is a big difference between unquoted
> scalars starting with an alpha (not implicitly typed) and
> the remainder (which are implicitly typed), I think that
> this really merits a different production.

Hmmmm. I guess so... Though for my money, any reasonable
implementation would do it the way I described (first step:
tokenizing it in a uniform way; second step: look at the data
to determine the type. "<alpha>.*" is just another regexp for
such an implementation). But if you insist, I've no problem with
making it a separate production. Two, actually - one for keys
as well...

> So. I could define "19990302" to be a date if I wanted?

Being YAML should not, IMHO, mean that the document schema is
constrained in any way. In this respect XML got it right. That
said, we do list a set of types such that using these types
ensures a degree of interoperability between a wide range of
languages and systems.

So, if for some obscure reason your application could really
benefit from using base 9 integers, dates as you wrote above, and
writing floating point numbers right-to-left - so be it. Of course
it wouldn't be interoperable with most of the YAML world, but
given these requirements I doubt that's a factor in the first
place. It would still work with Y-Pretty-Print, however!

I'd be more willing to "require" these implicit types if that was
useful in any way. But it seems to me that if anyone has a twisted
application for some reason, he'll ignore such a requirement
anyway. What's the point in making what he'll be using "not YAML"?
All it would do is lose us one potential user (e.g., he won't be
able to use standard YAML tools). What's the point in that?

> I strongly feel that the set of implicit types be
> defined centrally. If someone wants another implicit type,
> they should propose an addition to the YAML specification.
> Otherwise we will have people "extending" YAML in
> incompatible ways and then YAML community as a whole
> can't add new implicit types beyond the initial core.

Did *I* say that? Doesn't sound like me. If I did say that I
hereby renounce it :-)

> Is this what we want? Not to be able to add new
> implicit types as they become necessary?

Definitely not - IMHO. The way I feel is that the set of explicit
*and* implicit type definitions is just a part of the document's
schema. I see no difference between any of the following:

- The map "center" is of class "point" (explicit type).
- The definition of "point" is (class type definition):
  - It requires three keys "x", "y" and "accuracy" (structure)
  - "x" and "y" should be of type "float" (implicit type)
  - "x" and "y" should be in the range 0 - 100.
    - The definition of "float" is: ...
  - "accuracy" should be of type "error" (explicit type)
  - "accuracy" should be in the range 0 - 1
- Etc...

Sample:

    center: !point
        x: 12.5
        y: 3.
        accuracy: !error 0.1

Have fun,

    Oren Ben-Kiki
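The two-step approach described above (tokenize every unquoted scalar the same way, then look at the text to pick a type) can be sketched roughly as follows. This is only an illustration: the registry layout, the function name `resolve_implicit_type`, and the patterns themselves are assumptions made for the example, not anything defined by the spec draft.

```python
import re

# Hypothetical registry of implicit types: (name, pattern) pairs tried in order.
# The patterns are illustrative only; they are not the spec's definitions.
DEFAULT_IMPLICIT_TYPES = [
    ("int", re.compile(r"[-+]?[0-9]+")),
    ("float", re.compile(r"[-+]?[0-9]+\.[0-9]*")),
    ("string", re.compile(r"[A-Za-z].*")),  # the "<alpha>.*" case
]


def resolve_implicit_type(scalar, implicit_types=DEFAULT_IMPLICIT_TYPES):
    """Step two: inspect an already-tokenized unquoted scalar and name its type."""
    for name, pattern in implicit_types:
        if pattern.fullmatch(scalar):
            return name
    return "unknown"


# An application-specific schema that also reads eight digits as a date.
# Documents relying on it would not interoperate with the default set.
CUSTOM_IMPLICIT_TYPES = [("date", re.compile(r"[0-9]{8}"))] + DEFAULT_IMPLICIT_TYPES

for text in ("12.5", "3.", "center", "19990302"):
    print(text, "->", resolve_implicit_type(text))
print("19990302 ->", resolve_implicit_type("19990302", CUSTOM_IMPLICIT_TYPES))
```

With the default table "19990302" resolves as an integer; only the customized table reads it as a date, which is exactly the interoperability trade-off under discussion.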
From: Oren Ben-K. <or...@ri...> - 2001-08-13 09:34:12
Clark C . Evans [mailto:cc...@cl...] wrote:

> - As for single quote strings...
>
> | if this:
> |   'isn't forbidden, then...'
> | is this:
> |   'a key' : or a scalar value?'
>
> Hmm. tough one. I can only think of one solution
> off hand... forbid the single quote scalar form
> for keys (we already forbid the block form). Also,
> I'd drop the terminating quote as any reasonable
> programmer would assume that intermediate quotes
> would have to be escaped.

So, you suggest we use ' as an indicator. I agree with Brian
that this is counterintuitive. People expect it to be used as
surrounding quotes.

Also, we don't forbid any form for keys now - keys can be blocks
as well as maps and lists, due to the production allowing them to
be nodes:

    map:
        %
            map : key
        : value
        |
            block key
        : value

Have fun,

    Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-08-13 10:01:07
Stephane Payrard [mailto:s.p...@wa...] wrote:

Hi Stephane. It is good to see new people on the list.

> first a minor correction to your last draft.
>
>    <td valign="top"><nobr>::= <code><a
>    href="#chr">char</a> - <a href="#eol">
>
> should read
>
>    <td valign="top"><nobr>::= <code><a
>    href="#char">char</a> - <a href="#eol">
>
> Any such error could probably be avoided by replacing "chr" by "char"
> everywhere in the spec. I am a Perler and all for concision, but I am
> not sure using "chr" buys us much.

Thanks for catching this. We should really use some tool to verify
that all such links are correct. I'm on the verge of writing a Perl
script to do it for me - surely something like that is already
available somewhere, though.

> I understand that XML infoset and YAML infoset are deliberately
> different. Neither is a subset of the other. But it would be
> interesting to state more clearly in 1.4 what will be the loss
> incurred when converting an XML document to a YAML one. For such a
> lossless conversion, my understanding is that YAML is lacking
> namespaces besides the already mentioned missing XML whitespace
> policy.

Good points. How to round trip between XML and YAML? It
is probably worth a separate (non-normative) document. Let's
see:

    <tag attr="value">mixed <a:bold xmlns="...">text</a:bold> here</tag>

    tag:
        attr: value
        =:
            : "mixed "
            :
                "a:bold":
                    xmlns: ...
                    =: "text"
            : " here"

Of course, once you throw in comments, processing instructions, DTDs
etc. things would start to get hairy. The other way around is harder,
since YAML is richer than XML. Off the top of my head:

    object: !class
        #: @
            : First comment
            : Second comment
        x: 12.5
        y: 3.
        accuracy: !error 0.1
        %
            key: value
        :
            value for map key

    <object yaml:type="class">
      <yaml:tag yaml:name="#" yaml:type="list">
        <yaml:entry>First comment</yaml:entry>
        <yaml:entry>Second comment</yaml:entry>
      </yaml:tag>
      <x yaml:type="implicit">12.5</x>
      <y yaml:type="implicit">3.</y>
      <accuracy yaml:type="error">0.1</accuracy>
      <yaml:tag yaml:type="map">
        <yaml:key>
          <key>value</key>
        </yaml:key>
        <yaml:value>value for map key</yaml:value>
      </yaml:tag>
    </object>

Note the heavy use of the "yaml:" namespace (org.yaml
based) to handle constructs not easily specified in
XML. There would be a zillion little details to work
out to make this work well. An interesting project.
And *definitely* one for a separate document. :-)

> I subscribed to the list a few weeks ago, but I have not
> read all of it. So forgive me if the following propositions are
> out of the scope of the list. For this reason, I ask your advice
> before working them out and sending them to the list at large.
> Currently they are half-baked propositions. I have not worked out
> all the consequences.

It isn't out of scope. Round-tripping XML in YAML or vice
versa isn't our main focus, but it is a very relevant issue
to YAML. In particular, it would be a very nice thing to be
able to grab someone's XML and show him how it would look
much nicer in YAML. It should at least work for XML-RPC, SOAP,
RDF, or any other data-oriented XML - it won't work well for
"marked up text". Being able to do this may be another good
way to gain recognition for YAML.

If this is of interest to you, feel free to take the issue
on! I suggest you look up the Common-XML spec and begin
with that - general XML is too big to start with.

Welcome aboard,

    Oren Ben-Kiki
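The XML-to-YAML direction sketched above (attributes become keys, mixed content goes into a list under an "=" key) might look roughly like this in Python, using the standard `xml.etree.ElementTree` module. The function name `element_to_data` is made up for the example, the sample document drops the namespace prefix to stay well-formed, and unlike the hand-written mapping this sketch always stores content as a list, even for a single text node. Comments, processing instructions and DTDs are ignored.

```python
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Map an element to {tag: {attribute keys..., "=": [mixed content]}}."""
    body = dict(elem.attrib)          # attributes become ordinary keys
    content = []
    if elem.text:
        content.append(elem.text)     # text before the first child element
    for child in elem:
        content.append(element_to_data(child))
        if child.tail:
            content.append(child.tail)  # text following this child element
    if content:
        body["="] = content           # "=" cannot clash with an attribute name
    return {elem.tag: body}


doc = ET.fromstring('<tag attr="value">mixed <bold>text</bold> here</tag>')
print(element_to_data(doc))
```

The resulting nested dictionaries and lists carry the same information as the hand-written YAML above and could then be handed to any emitter.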
From: Stephane P. <s.p...@wa...> - 2001-08-13 14:01:35
Before fleshing this stuff out, I want early feedback to see if this
really belongs on this list:

To make YAML more conformant to our first goal (maximum human
readability), I would like to add two syntactical constructs:
contextualizers and oneliners. They both address, in complementary
ways, the problem of making large and deep structures readable.

Contextualizers and oneliners are alternative constructs: they just
express in a more readable and concise style material that can
already be expressed without them. So it is probably wise to propose
them as optional YAML features. In other words, a YAML emitter must
be able to emit YAML free of contextualizers and oneliners, but may
be able to use them. Symmetrically, a YAML parser may be able to
accept YAML with contextualizers and oneliners but is not required
to.

- Contextualizers make it possible for a human reader, at some point
  in a YAML file, to figure out the outer structures without roaming
  through the whole file. They are also a device to reduce the
  indentation needed to express deep structures.

- Oneliners make it possible to represent some composite structures
  in one line. This comes at the cost of added conventions, but it
  yields smaller and more readable YAML data files.

A bigger YAML language makes for smaller YAML data files, but not
necessarily more readable ones. Our goal is certainly not to make a
compressor. We must find the right trade-off. I think that my
propositions go in the right direction.

=== contextualizer

This convention, tentatively named contextualizer, will be a comment
as far as the parser is concerned, except when it is used to reduce
indentation. The goal is to let the human reader grasp context that
otherwise spans many pages.

Example: suppose the invoice YAML sample is nested more deeply and an
order includes hundreds of products. How could you know that the page
(or the screen) you are reading is in the middle of a list of
products?
With the contextualizer, you can figure out the enclosing structures.
I understand that such a feature really belongs in a specialised
editor. But this kind of thing never happens. Let me cite a similar
example from the C language. In the Perl source, when reading toke.c,
I would like to know that I am reading code belonging to yylex() even
if the beginning of the function is 2000 lines before the line I am
currently reading. In Emacs that information could be displayed in
the modeline. Neither Emacs nor vi does it as far as I know.
Visual-whatever does it, I think, by displaying an hlist. Anyway, in
C such a case of a syntactical structure spanning pages is
pathological, but in YAML we can expect it to be the rule.

If the contextualizer is manually generated, a simple contextualizer
is like an assert() in C code. The parser will verify that it is
correct.

The BNF for contextualizer:

    contextualizer := '##' [ imap | ilist | key(n) ] *

A simple contextualizer is not really useful in the following example
because that example is so short. Anyway, it illustrates the
principle.

An example of a contextualizer:

    ## product@{

An example of production with that contextualizer:

    product : @
        : %
            desc     : Grade A, Leather Hide Basketball
            id       : BL394d
            ## product@%        # <-- the contextualizer
            price    : $450.00
            quantity : 4

This example was about getting outer context for large structures
(many pages of YAML). An additional use of contextualizers is dealing
with deep structures. A structure of depth 20 would be indented by
20 x 4, that is 80 characters. This is insane. We introduce an added
convention called marker.
From one marked contextualizer to the next one, the indentation for
the structure is reduced:

    ## product@#%          # <-- the contextualizer with marker
                           # there are two structures
    product : @
        : %
            desc : Grade A, Leather Hide Basketball
            id   :
    ## product@#%          # <-- the contextualizer with marker
                           # so the indentation is 2 x 4 fewer spaces
    price    : $450.00
    quantity : 4
    ##                     # <--- closing contextualizer (special case)
                           # ( or '###' which is consistent with the
                           #   previous definition and the BNF below )

The BNF becomes:

    marker         := '#'
    contextualizer := '##' [ imap | ilist | key(n) | marker ] *

An added constraint: there is at most one marker in a contextualizer.

== oneliners

More formal definition and BNF to be done.

Note: consider the comma for separating key, value pairs. This
increases readability and does not add constraints. Indeed, a string
containing a comma is likely to contain whitespace and to be already
quoted. With quoted strings, the comma visually separates pairs.

Also, I don't know if this has been discussed before, but I would
like a device to express short structures (possibly many levels deep)
as one-liners. This would minimize the use of contextualizers by
making the information denser.

The '[' and ']' would respectively be the opening and closing list
delimiters. The '{' and '}' would respectively be the opening and
closing map delimiters. An opening delimiter and its matching closing
delimiter MUST be on the same line.

Additional constraint for scalars within a one-liner: they must be
simple(n) without whitespace, or quoted(n).
With this convention the following YAML emissions are equivalent:

    product : @
        : %
            id         : BL666
            title      : the YAML primer
            unit price : $45.00
            quantity   : 1

    # Note that the string "the YAML primer" must be quoted in oneliners
    # because it contains whitespace

    product : [ { id: BL666 title: "the YAML primer" price: $45 quantity: 1 } ]

    product: @
        { id: BL666 title: "the YAML primer" "unit price": $45 quantity: 1 }

    # with comma as pair separator
    product: @
        { id: BL666, title: "the YAML primer", "unit price": $45, quantity: 1 }

The following emission is incorrect:

    product: [ { id: BL666 title: "the YAML primer"
                 "unit price": $45 quantity: 1 } ]
    # <--- incorrect: closing delimiter must be on same line as the
    #      matching opening one.

We could make the closing delimiters '}' and ']' optional at the end
of a line. Not that I am advocating it.

== arrays

MOSTLY TO BE DONE, MAY BE DROPPED, SYNTAX TO BE THOUGHT OUT.
STUDY COMBINATION of array syntax and oneliner.

It is lucky that the word 'list' was chosen for an ordered sequence,
leaving the word "array" available.

The bidimensional array is a very common data structure; YAML has to
support it. In a way, it buys us multidimensional arrays as well,
because a multidimensional array (with dim > 2) is a list of
bidimensional arrays.

    idmatrice : @
        @
            1
            0
        @
            0
            1

Two candidate array syntaxes:

    idmatrice : @
        @ 1 0
        @ 0 1

    idmatrice : @
        [ 1 0 ]
        [ 0 1 ]

Here product is an array of maps:

    product : @
        : %
            id       : BLD394D
            price    : $450
            quantity : 4
        : %
            id       : BLD3800
            price    : $100
            quantity : 2

Array syntax:

    product : @
        id       price  quantity
        BLD394D  $450   3
        BLD3800  $100   2

Conclusion:

Adding contextualizers and oneliners as YAML syntactical constructs
increases both expressiveness and readability at the cost of a
slightly longer learning curve. YAML editors will also have to be
smarter to use these added conventions, but they don't have to use
them. The resulting increase in readability will probably become more
apparent and necessary when dealing with real-world data.

-- stef

Hi Oren!
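To make the oneliner proposal concrete, here is a rough Python sketch of a parser for the bracketed part of a single line. It is not a worked-out grammar (the proposal itself leaves the BNF to be done): the name `parse_oneliner`, the token pattern, and the choice to return every scalar as a plain string are assumptions made for the illustration, and the leading `product: @` part is deliberately left out. Commas between entries are accepted but optional, and a delimiter left open at the end of the line is reported as an error.

```python
import re

# Quoted scalar, a single delimiter, or a bare scalar without whitespace.
# An unterminated quote is silently skipped by this simplistic tokenizer.
TOKEN = re.compile(r'"[^"]*"|[\[\]{},:]|[^\s\[\]{},:"]+')
DELIMITERS = {"[", "]", "{", "}", ",", ":"}


def parse_oneliner(line):
    """Parse one line such as '[ { id: BL666, title: "x y" } ]'."""
    tokens = TOKEN.findall(line)
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected text after the closing delimiter")
    return value


def _peek(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("delimiter not closed on the same line")
    return tokens[pos]


def _parse_value(tokens, pos):
    tok = _peek(tokens, pos)
    if tok == "[":
        items, pos = [], pos + 1
        while _peek(tokens, pos) != "]":
            item, pos = _parse_value(tokens, pos)
            items.append(item)
            if _peek(tokens, pos) == ",":   # the comma is optional
                pos += 1
        return items, pos + 1
    if tok == "{":
        mapping, pos = {}, pos + 1
        while _peek(tokens, pos) != "}":
            key, pos = _parse_value(tokens, pos)
            if not isinstance(key, str):
                raise ValueError("this sketch only supports scalar keys")
            if _peek(tokens, pos) != ":":
                raise ValueError("expected ':' after key")
            mapping[key], pos = _parse_value(tokens, pos + 1)
            if _peek(tokens, pos) == ",":
                pos += 1
        return mapping, pos + 1
    if tok in DELIMITERS:
        raise ValueError("unexpected delimiter: " + tok)
    # Scalars stay plain strings here; typing them is a separate step.
    return (tok[1:-1] if tok.startswith('"') else tok), pos + 1


print(parse_oneliner('[ { id: BL666 title: "the YAML primer" quantity: 1 } ]'))
print(parse_oneliner('{ id: BL666, "unit price": $45 }'))
```

Because the parser only ever sees one line, the "closing delimiter on the same line" rule falls out naturally: a line that ends before its brackets are balanced raises an error.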
From: Clark C . E. <cc...@cl...> - 2001-08-12 17:02:10
On Sun, Aug 12, 2001 at 07:43:16PM +0200, Oren Ben-Kiki wrote:

| > - The unquoted scalar should be broken into two scalar types,
| >   those starting with alphabetic characters ("unquoted"), and
| >   those that remain ("implicit"). Ideally this distinction
| >   will be based on a character property of the Unicode
| >   specification.
| >
| >   Currently, an alphabetic scalar is implicitly typed... which
| >   is not correct. Types for alphabetic scalars must be
| >   explicit, lest we have a bag of worms.
|
| I don't see why (this is a *required* implicit type - it says
| so in the appropriate section). It doesn't make sense to double
| the productions (or, worse, to add a whole new section!) just
| for this. Perhaps a stronger wording on the implicit typing of
| these scalars would do instead?
|
| What exactly is the potential problem you see here?

In general, I've used productions whenever a different
set of requirements have held (such as the folded text
production) rather than put in specific wording. Wording
can be skipped/mis-interpreted by an implementer.
However, a completely different production can't
be skipped.

And given that there is a big difference between unquoted
scalars starting with an alpha (not implicitly typed) and
the remainder (which are implicitly typed), I think that
this really merits a different production.

| > - The spec should state that implicit types will be dictated
| >   by the YAML spec or amendments to the YAML spec. Implicit
| >   types should not be customizable as this will hurt
| >   interoperability.
|
| I'm not convinced of that. We should probably put stronger
| wordings to the effect that by sticking with "widely accepted
| types" as listed in the spec, they would gain interoperability;
| and that by "customizing" them as you put it, interoperability
| suffers.

So. I could define "19990302" to be a date if I wanted?

I strongly feel that the set of implicit types be
defined centrally. If someone wants another implicit type,
they should propose an addition to the YAML specification.
Otherwise we will have people "extending" YAML in
incompatible ways and then YAML community as a whole
can't add new implicit types beyond the initial core.

Is this what we want? Not to be able to add new
implicit types as they become necessary?

Best,

Clark