From: Brian I. <in...@tt...> - 2004-02-03 22:38:37
|
Hi all,

Oren, Clark, and I have been making several changes to YAML, and I want to run them by the list by giving examples. Most of these changes are relaxations that loosen the rules to allow some nice use case tricks. A few of them actually tighten the rules in a non-backwards-compatible way. As far as we can tell, we don't think that we are breaking very many, if any, valid YAML documents in the wild. If this is not the case, please let us know. The gist of all this is that YAML is being polished to make things better, and that the typical YAML of yesterday remains exactly the same.

1) The first change is to allow a clever syntax for unordered sets. Since mapping keys work nicely for this, we used to be able to say:

    ---
    banana:
    peach:
    orange:

which would load as a hash with null values. This makes a decent set object since, by definition, you can't have duplicate keys, and key order is not important or preserved. Clark wanted to do this:

    ---
    ? banana
    ? peach
    ? orange

which is a relaxation of the previously valid:

    ---
    ? banana
    :
    ? peach
    :
    ? orange
    :

We decided this was a nice use case. The '?' is more declarative. The ':' looks more accidental. Of course you can mix these:

    ---
    ? banana
    peach:
    orange: yum

But that would be silly.

2) The next change is for flow collections. You know, { ... } and [ ... ]. We decided to allow lists inside of curly brackets, and pairs inside square brackets, with the following semantics:

    --- # These
    - { banana, peach, orange }
    - [ banana: yellow, peach: pink, orange: orange ]

    --- # are equivalent to these
    - ? banana
      ? peach
      ? orange
    - - banana: yellow
      - peach: pink
      - orange: orange

So we get flow collections for "sets" and "ordered maps". These forms were previously invalid YAML, so this is just a loosening of the rules and therefore doesn't break backwards compatibility.

If it isn't immediately obvious, the above are different from the old flow forms of:

    --- # These
    - [ banana, peach, orange ]
    - { banana: yellow, peach: pink, orange: orange }

    --- # are equivalent to these
    - - banana
      - peach
      - orange
    - banana: yellow
      peach: pink
      orange: orange

And of course, mixing pairs with values is allowed:

    ---
    - { foo, bar: baz }
    - [ foo, bar: baz ]

    ---
    - foo:
      bar: baz
    - - foo
      - bar: baz

3) We decided that default typing should go away completely. Previously, a plain scalar (no quotes, no '|' or '>') was required to be reported by the parser as having an empty tag. Having an empty tag is a signal for implicit typing. All other scalars (quoted, etc.) were assigned a tag of '!str', thereby defeating implicit typing. And this made sense, because if you quote something it should be a string, right?

    ---
    implicit number: 123
    quoted string: '123'
    null:
    empty string: ''
    explicitly implicit number: ! '123'

Also, collections without an explicit tag were assigned '!map' or '!seq', and thus avoided implicit collection typing. Well, we decided that default typing was just too weird, but we wanted to keep the same overall effect. So now we require that a parser report whether a scalar is plain or not. Then the receiver can use that information itself to determine the appropriate type. Since this is the case, we no longer allow an explicit empty tag (like above), since it adds no value. For the most part this is a transparent change for YAML users. We consider the new way the lesser of evils, but the use case of forcing a string is too important to ignore. So this is the cleanest way we can explain it.

4) We eliminated some nasty unbounded lookaheads, to make it easier to write parsers. The basic rule is this:

    '?' starts a mapping key, and ':' starts a mapping value. They are
    always required unless the mapping key is a single line (and less
    than 1024 characters long), or the mapping value is null.

So before, this was valid:

    ---
    [ 1, 2, 100000 ]
    : value1
    "multi line
     string key"
    : value2
    simple key: value3

and now it must be:

    ---
    ? [ 1, 2, 100000 ]
    : value1
    ? "multi line
       string key"
    : value2
    simple key: value3

This makes programming a parser immensely simpler, because a complex key can be discerned from a complex scalar without having to parse the whole thing. Remember, YAML parsing requires the autodetection of a new node, and what looks like a gigantic mapping might really be a key of another mapping. Luckily, this change affects mostly oddball mapping keys, since programmers tend to just use simple keys for mappings. The extra '?' indicators actually seem to *add* clarity for human readers of YAML. Even humans have to look ahead; wetware is just generally better at parsing than most software. :)

Cheers, Brian
|
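The set and ordered-map semantics from points 1 and 2 can be sketched in plain Python, with no YAML library involved; the helper names (`as_set`, `as_omap`) are purely illustrative and not part of any spec or implementation:

```python
# A "set" in this proposal is a mapping whose keys are the members and
# whose values are all null; an "ordered map" is a sequence of
# single-pair mappings.  These helpers model that with native Python data.

def as_set(members):
    # ? banana / ? peach / ? orange  ->  mapping with null values
    # (duplicate members collapse, just as duplicate keys would)
    return {m: None for m in members}

def as_omap(pairs):
    # [ banana: yellow, peach: pink ]  ->  - banana: yellow / - peach: pink
    return [{k: v} for k, v in pairs]

fruit_set = as_set(["banana", "peach", "orange"])
fruit_omap = as_omap([("banana", "yellow"), ("peach", "pink"),
                      ("orange", "orange")])
```

Unlike the set, the omap form preserves both order and any duplicate keys, which is why a sequence of one-pair mappings is used rather than a single mapping.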
From: Oren Ben-K. <or...@be...> - 2004-02-03 22:58:53
|
Great summary, Brian. Thanks!

A clarification/elaboration with regard to keys:

    ---
    Dice throws:
      [ 1, 2 ] : three
    ...

is valid because the key is on one line (and less than 1K). One may omit the '?' for _any_ one-line (short) key, and in this case the ':' must appear on the same line. Otherwise, the '?' notation must be used, and the ':' (if any) must be on a separate line:

    ---
    Dice throws:
      ? [ 1, 2 ]
      : three
    ...

We originally considered requiring the '?' even in the first case:

    ---
    Dice throws:
      ? [ 1, 2 ] : three
    ...

But this would implicitly allow:

    ---
    Dice throws:
      ? [ 1,
          2 ] : three
    ...

Which we do _not_ want to allow. Forbidding it requires introducing the notion of one-line flow collections anyway, so we might as well make use of it and get rid of the '?' in such a case.

Have fun,

Oren Ben-Kiki
|
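The rule Oren states here reduces to a tiny predicate; a minimal sketch in Python (the function name is made up for illustration):

```python
def question_mark_required(key_text):
    """True when the '?' indicator is mandatory for this mapping key,
    per the rule in this thread: '?' may be omitted only when the key
    fits on a single line and is less than 1024 characters long."""
    return "\n" in key_text or len(key_text) >= 1024

# '[ 1, 2 ]' is a one-line, short key: no '?' needed, and the ':' must
# then sit on the same line.  A key broken across lines needs '?'.
```

This bounded check is exactly what lets a parser avoid unbounded lookahead: after at most one line (capped at 1024 characters) it knows whether it is reading a simple key or something that must have been introduced by '?'.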
From: why t. l. s. <yam...@wh...> - 2004-02-04 22:35:15
|
Couple notes to add, as I've started implementing these changes into a branch of Syck, just to get an idea of what they'd take. Good ideas, all of them. Sweet and succulent are the fruits of the world's 1st global thermonuclear YAML conference.

Brian Ingerson wrote:
>1)
>
>    ---
>    ? banana
>    ? peach
>    ? orange
>

I'm working through a shift/reduce conflict on this one. Chances are that I've implemented it wrong, but occasionally there's a conflict of syntax somewhere. Take the below:

>which is a relaxation of the previously valid:
>
>    ---
>    ? banana
>    :
>    ? peach
>    :
>    ? orange
>    :
>

I haven't played around much with this syntax, but this appears to be valid:

    ---
    ? banana
    :
      - 1
      - 2

Equivalent to: {banana: [1, 2]}

It just looks whacko with the marks all lined up, got me straight? Shortcuts.

>2) The next change is for flow collections. You know, { ... } and [ ... ].
>We decided to allow lists inside of curly brackets, and pairs inside
>square brackets, with the following semantics:
>
>    --- # These
>    - { banana, peach, orange }
>    - [ banana: yellow, peach: pink, orange: orange ]
>

This change is checked into Syck CVS. You can grab the head branch and test it.

>3) We decided that default typing should go away completely.
>[...]
>So now we require that a parser report
>whether a scalar is plain or not. Then the receiver can use that
>information itself to determine the appropriate type.

I'm debating whether to add this as a new kind, i.e.:

    enum syck_kind_tag {
        syck_map_kind,
        syck_seq_kind,
        syck_str_kind,
        syck_plain_kind
    };

Or if the SyckNode struct needs a new member for a plain flag. I'm so, so terribly tempted to just leave typing the way it is and when the inspectors come by, just tuck in the bedsheets really tight so no one can tell.

>4) We eliminated some nasty unbounded lookaheads, to make it easier to
>write parsers. The basic rule is this:
>
>    '?' starts a mapping key, and ':' starts a mapping value. They are
>    always required unless the mapping key is a single line (and less
>    than 1024 characters long), or the mapping value is null.

Great, thanks. Never supported the previous syntax, so I won't know what we're missing.

_why
|
From: Oren Ben-K. <or...@be...> - 2004-02-04 23:14:05
|
why the lucky stiff wrote:
> I'm working through a shift/reduce conflict on this one. Chances are that I've
> implemented it wrong, but occasionally there's a conflict of syntax
> somewhere.

Sure. Consider this:

    ---
    : value   # Null key
    ? Key1    # Null value
    ? Key2
    : Hmmm
    ...

A twisted way to read this is that 'Key2' has a null value and 'Hmmm' has a null key. "Obviously" what we want is for 'Hmmm' to be the value of 'Key2'. But the parser generator doesn't know this. Fortunately, such conflicts can (usually) be resolved by using appropriate directives. Just make sure it goes into the test suite...

> I haven't played around much with this syntax, but this
> appears to be valid:
>
>     ---
>     ? banana
>     :
>       - 1
>       - 2
>
> Equivalent to: {banana: [1, 2]}
>
> It just looks whacko with the marks all lined up, got me straight?
> Shortcuts.

Yeah. It gets worse:

    ---
    ?
      - a
      - b
    :
      - 1
      - 2
    ...

I think there's no helping it, though. A language that tries to make all "bad" things impossible ends up ruling out too many of the "good" ones (think Ada :-)

> I'm debating whether to add this as a new kind, i.e.:
>
>     enum syck_kind_tag {
>         syck_map_kind,
>         syck_seq_kind,
>         syck_str_kind,
>         syck_plain_kind
>     };

This point was raised. We have (correctly, I believe) decided not to "bless" this as a separate kind; it really isn't. However, this is at the spec's logical/formal definition level. Using a fourth kind may make sense from a coding point of view; it is a design/implementation choice.

> Or if the SyckNode struct needs a new member for a plain
> flag.

That would be "more correct" from a spec point of view, but again, it is a design/implementation choice.

> I'm so,
> so terribly tempted to just leave typing the way it is and when the
> inspectors come by, just tuck in the bedsheets really tight so no one
> can tell.

We felt the same way :-) You'll need to provide _some_ way to implicitly type ints/dates/etc., I'm afraid... We know it is a wart, but it is too important a feature to give up.
Have fun, Oren Ben-Kiki |
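The disambiguation Oren wants (a ':' line supplies the value of the most recent pending '?' key; a fresh '?' or end of input gives the pending key a null value; a ':' with no pending key gets a null key) can be sketched as a toy line-based reader. This is only an illustration of the rule on a flat top-level mapping, not how a real YAML parser works, and the function name is invented:

```python
_NO_KEY = object()  # sentinel: no '?' key is currently pending

def load_flat_pairs(lines):
    """Toy reader for a flat block mapping written only with '?' and ':'
    lines, resolving the ambiguity the way the thread intends: 'Hmmm'
    becomes the value of 'Key2', not a value with a null key."""
    result = {}
    pending = _NO_KEY
    for raw in lines:
        line = raw.strip()
        if line.startswith("?"):
            if pending is not _NO_KEY:
                result[pending] = None        # '? Key1' then '? Key2'
            pending = line[1:].strip()
        elif line.startswith(":"):
            key = None if pending is _NO_KEY else pending
            result[key] = line[1:].strip() or None
            pending = _NO_KEY
    if pending is not _NO_KEY:
        result[pending] = None                # trailing key, null value
    return result
```

Running it on the ambiguous document from the thread yields `{None: "value", "Key1": None, "Key2": "Hmmm"}`, which is the "obvious" reading rather than the twisted one.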
From: Sean O'D. <se...@ce...> - 2004-02-04 23:40:26
|
Hey, not to knock these efforts or anything, but it seems these recent syntax changes are UBER OBFUSCATED... why does YAML need this sort of thing? If:

    ? banana
    - 1
    - 2

is equivalent to:

    {banana: [1, 2]}

...what's wrong with "{banana: [1, 2]}"? Just curious. I dread running across someone's YAML that uses this syntax... I would have NO CLUE what the heck the structure of the data is. Why introduce syntax that goes from "YAML: easy to read" to "YAML: easy to read, if no one uses its cryptic syntax elements"?

Sean O'Dell

On Wednesday 04 February 2004 03:13 pm, Oren Ben-Kiki wrote:
> [...]
|
From: Clark C. E. <cc...@cl...> - 2004-02-05 00:39:12
|
Sean,

Your email client seemed to be stripping ":" on you...

On Wed, Feb 04, 2004 at 03:40:21PM -0800, Sean O'Dell wrote:
| Hey, not to knock these efforts or anything, but it seems these recent syntax
| changes are UBER OBFUSCATED...why does YAML need this sort of thing?
|
|     ? banana
|     - 1
|     - 2

^ is a syntax error, but to answer your question, the (bad) example was...

    ? banana
    :
    - 1
    - 2

Is there a way to require indentation for this case?

    ? banana
    :
      - 1
      - 2

| I dread running across someone's YAML that uses this
| syntax...I would have NO CLUE what the heck the structure of the data is.

Well, the example quoted is a syntax error, so you'd be as confused as a good YAML parser. ;)

| Why introduce syntax that goes from "YAML: easy to read" to "YAML: easy to

Well, any language worth learning has ways to make it look ugly... ;)

| >     ---
| >     : value   # Null key
| >     ? Key1    # Null value
| >     ? Key2
| >     : Hmmm
| >     ...

This is quite twisted, but I don't see any reason why we should forbid the construct; it is ugly, though.

    { : value, Key1: , Key2: Hmmm }

| > A twisted way to read this is that 'Key2' has a null value and 'Hmmm'
| > has a null key. "Obviously" what we want is for 'Hmmm' to be the value
| > of 'Key2'. But the parser generator doesn't know this. Fortunately, such
| > conflicts can (usually) be resolved by using appropriate directives.
| > Just make sure it goes into the test suite...

Well, I don't see any reason why this 'context' can't impose a non-zero indentation. Yea?

| > > I haven't played around much with this syntax, but this
| > > appears to be valid:
| > >
| > >     ---
| > >     ? banana
| > >     :
| > >       - 1
| > >       - 2
| > >
| > > Equivalent to: {banana: [1, 2]}
| > >
| > > It just looks whacko with the marks all lined up, got me straight?
| > > Shortcuts.
| >
| > Yeah. It gets worse:
| >
| >     ---
| >     ?
| >       - a
| >       - b
| >     :
| >       - 1
| >       - 2
| >     ...

Well, I can live with it if it would be really ugly to force indentation for this case. Boy, YAML really is a context-sensitive grammar, isn't it?

| > I think there's no helping it, though. A language that tries to make all
| > "bad" things impossible ends up ruling out too many of the "good" ones
| > (think Ada :-)

Sometimes, as you recall, some of our "breakthroughs" have happened quite by accident. ;)

Clark
|
From: Clark C. E. <cc...@cl...> - 2004-02-05 00:26:20
|
On Thu, Feb 05, 2004 at 01:13:32AM +0200, Oren Ben-Kiki wrote:
| > I haven't played around much with this syntax, but this
| > appears to be valid:
| >
| >     ---
| >     ? banana
| >     :
| >       - 1
| >       - 2
| >
| > Equivalent to: {banana: [1, 2]}
| >
| > It just looks whacko with the marks all lined up, got me straight?
| > Shortcuts.
|
| Yeah. It gets worse:
|
|     ---
|     ?
|       - a
|       - b
|     :
|       - 1
|       - 2
|     ...

Er, these are anti-examples; as in, examples that won't match the productions, right?

| > I'm debating whether to add this as a new kind, i.e.:
| >
| >     enum syck_kind_tag {
| >         syck_map_kind,
| >         syck_seq_kind,
| >         syck_str_kind,
| >         syck_plain_kind
| >     };
|
| This point was raised. We have (correctly, I believe) decided not to
| "bless" this as a separate kind; it really isn't. However, this is at
| the spec's logical/formal definition level. Using a fourth kind may make
| sense from a coding point of view; it is a design/implementation choice.

In this new and brave YAML land, there isn't such a thing as a 'str' kind; it is a mapping, sequence, or scalar. The idea that scalars get default typed as !str is now a thing of the past. We now have some other component, let's call it a "resolver", which goes through a YAML tree containing two sorts of nodes, converting the former into the latter:

implicit:
    Implicit (or untagged) nodes are those with an empty tag, but they
    have a flag saying whether each is a plain scalar or not. Note, a
    true YAML representation won't be able to distinguish between plain
    and non-plain scalars.

tagged:
    Tagged nodes are those which appear in the YAML stream with a
    non-empty "!tag" or have been tagged by the resolver. Tagged nodes
    do not record if they were created from plain scalars or not; this
    'hack flag' is only used during the tagging process.

In particular, a YAML representation graph, or a tree serialization, uses only tagged nodes. The YAML presentation nodes can be either implicit or tagged. Somehow, in the process of going from a presentation to a serialization/representation, this "resolution" process must be carried out. Does this make sense?

So, shooting from the hip, perhaps you have two enums?

    enum syck_kind_t {
        syck_kind_mapping = 2,
        syck_kind_sequence = 3,
        syck_kind_scalar = 4
    };

    enum syck_unresolved_kind_t {
        syck_unresolved_mapping = syck_kind_mapping,
        syck_unresolved_sequence = syck_kind_sequence,
        syck_unresolved_scalar = syck_kind_scalar,
        syck_unresolved_plain_scalar = syck_kind_scalar + 1
    };

and two different structs?

    struct syck_node_t {
        enum syck_kind_t kind;
        syck_tag_t tag;   /* mandatory */
        ...
    };

    struct syck_unresolved_node_t {
        enum syck_unresolved_kind_t kind;
        syck_tag_t tag;   /* can be null */
        ...
    };

Just musing. As Oren said, these are implementation details... I think.

| > I'm so,
| > so terribly tempted to just leave typing the way it is and when the
| > inspectors come by, just tuck in the bedsheets really tight so no one
| > can tell.
|
| We felt the same way :-) You'll need to provide _some_ way to implicitly
| type ints/dates/etc., I'm afraid... We know it is a wart, but it is too
| important a feature to give up.

It is a hard compromise. Ideally, tags would always be mandatory... but since we cannot all agree on the implicit typing rules... ;)

Clark
|
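Clark's unresolved-vs-resolved split translates readily to other languages; here is a rough Python equivalent of an unresolved node (kind, optional tag, plain flag) and a toy resolver. The class names, tag strings, and the integer-detection rule are illustrative assumptions of mine, not the spec's actual implicit-typing rules:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Kind(Enum):
    SCALAR = "scalar"
    SEQUENCE = "sequence"
    MAPPING = "mapping"

@dataclass
class Node:
    kind: Kind
    value: object
    tag: Optional[str] = None   # None == untagged (implicit) node
    plain: bool = False         # presentation-level flag; scalars only

def resolve(node: Node) -> Node:
    """Toy resolver: assigns a tag to an untagged node.  The only
    presentation detail it may consult is the plain flag; the digit
    test is a stand-in for real implicit-typing rules."""
    if node.tag is not None:
        return node                       # already tagged in the stream
    if node.kind is Kind.MAPPING:
        node.tag = "!map"
    elif node.kind is Kind.SEQUENCE:
        node.tag = "!seq"
    elif node.plain and str(node.value).lstrip("+-").isdigit():
        node.tag = "!int"                 # plain scalars may implicit-type
    else:
        node.tag = "!str"                 # non-plain scalars stay strings
    return node
```

After `resolve`, every node carries a non-empty tag and the plain flag is dead weight, which mirrors Clark's point that the flag exists only during the tagging process.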
From: Sean O'D. <se...@ce...> - 2004-02-05 01:12:43
|
On Wednesday 04 February 2004 04:26 pm, Clark C. Evans wrote:
> In this new and brave YAML land, there isn't such a thing as a 'str'
> kind; it is a mapping, sequence, or scalar. The idea that scalars get
> default typed as !str is now a thing of the past. We now have some
> other component, let's call it a "resolver", which goes through a
> YAML tree containing two sorts of nodes, converting the former
> into the latter:

This, to me, makes a lot of sense... I have wondered in the past why !str was the default. It's a good move leaving untyped scalars simply scalars.

> implicit:
>     Implicit (or untagged) nodes are those with an empty tag,
>     but have a flag saying if it is a plain scalar or not.
>     Note, a true YAML representation won't be able to distinguish
>     between plain and non-plain scalars.

By this, do you mean (could one also say): implicit nodes are nodes with no typing "!tag", but may be typed or left as untyped scalars?

> tagged:
>     Tagged nodes are those which appear in the YAML
>     stream with a non-empty "!tag" or have been tagged
>     by the resolver. Tagged nodes do not record if they
>     were created from plain scalars or not; this 'hack flag'
>     is only used during the tagging process.

Is the resolver a new step in the loading process? Parser->Loader->Resolver?

> In particular, a YAML representation graph, or a tree serialization,
> uses only tagged nodes. The YAML presentation nodes can be either
> implicit or tagged. Somehow, in the process of going from a
> presentation to a serialization/representation, this "resolution"
> process must be carried out. Does this make sense?

This is one of those statements I have trouble with. What is a representation graph? What are presentation nodes? Assuming "representation graph" means the conceptual data structure... then, nodes are either tagged (explicitly or through the resolver) or left untagged (marked as a plain scalar). Correct? If "presentation nodes" means the physical YAML document, then the nodes are either tagged, implicit, or left untagged (plain scalar). Correct?

> So, shooting from the hip, perhaps you have two enums?
>
>     enum syck_kind_t {
>         syck_kind_mapping = 2,
>         syck_kind_sequence = 3,
>         syck_kind_scalar = 4
>     };
>
>     enum syck_unresolved_kind_t {
>         syck_unresolved_mapping = syck_kind_mapping,
>         syck_unresolved_sequence = syck_kind_sequence,
>         syck_unresolved_scalar = syck_kind_scalar,
>         syck_unresolved_plain_scalar = syck_kind_scalar + 1
>     };

What's the difference between a scalar and a plain scalar? Isn't something either typed or left as a scalar?

Sean O'Dell
|
From: Clark C. E. <cc...@cl...> - 2004-02-05 02:26:14
|
Howdy Sean!

On Wed, Feb 04, 2004 at 05:12:40PM -0800, Sean O'Dell wrote:
| On Wednesday 04 February 2004 04:26 pm, Clark C. Evans wrote:
| > In this new and brave YAML land, there isn't such a thing as a 'str'
| > kind; it is a mapping, sequence, or scalar.
|
| This, to me, makes a lot of sense...I have wondered in the past why !str was
| the default. It's a good move leaving untyped scalars simply scalars.
|
| > implicit:
| >     Implicit (or untagged) nodes are those with an empty tag,
| >     but have a flag saying if it is a plain scalar or not.
| >     Note, a true YAML representation won't be able to distinguish
| >     between plain and non-plain scalars.
|
| By this, do you mean (could one also say): implicit nodes are nodes with no
| typing "!tag"

Yes, ones which do not have !tags on them in the character stream ("presentation").

| but may be typed or left as untyped scalars?

Well, the resolver would put types on them, and during this 'typing' process, choosing a tag for a scalar may use one piece of 'presentation' level information -- whether the plain scalar style was used or not (a boolean flag). It is an ugly wart, but it makes for very readable YAML documents. Bear with us here; this 'rethinking' happened a day or so before we left, and I'm sure the spec is not completely consistent with its impacts. If you don't mind, let me try an explanation on for size...

| > tagged:
| >     Tagged nodes are those which appear in the YAML
| >     stream with a non-empty "!tag" or have been tagged
| >     by the resolver. Tagged nodes do not record if they
| >     were created from plain scalars or not; this 'hack flag'
| >     is only used during the tagging process.
|
| Is the resolver a new step in the loading process? Parser->Loader->Resolver?

The three-stage breakdown in the spec is:

    representation -- modeling your native data structures in a
                      language-independent manner

    serialization  -- flattening these representations so that they can
                      pass through a sequential-access medium such as a
                      series of event calls

    presentation   -- making the serializations look pretty

Orthogonal to this breakdown are two processes that kinda go in the reverse direction:

    resolution -- this takes nodes which do not yet have a tag (we call
                  these implicitly typed) plus a plain scalar flag, and
                  produces a tagged node without the plain scalar flag

    binding    -- this takes a tagged node and produces either a
                  canonical form (for equality comparison) or a native
                  data object

I say kinda, because resolution and binding could happen at any of the stages. One could resolve from the serialization or the representation. The spec goes into this somewhat, but it needs a bit more work. We call a representation which has all of its tags resolved and bound a 'complete representation'. The YAML schema tools and such will be defined at this level, on complete representations. In short: it can happen in any of the three places!

| > In particular, a YAML representation graph, or a tree serialization,
| > uses only tagged nodes. The YAML presentation nodes can be either
| > implicit or tagged. Somehow, in the process of going from a
| > presentation to a serialization/representation, this "resolution"
| > process must be carried out. Does this make sense?
|
| This is one of those statements I have trouble with. What is a
| representation graph?

When you have native data, you need to "fit" it into the YAML abstract model for interoperability reasons. This abstract model is the "YAML representation" of your native data. It may or may not appear as a physical component of your system; if it is part of your YAML toolset, it will be a generic random-access node API, or DOM. The abstract model is necessary since this is where a 'structural schema' would be defined, and it is the model upon which language-independent YAML tools such as a YPATH would be based.

| What are presentation nodes?

By presentation, we mean human presentation. Presentation nodes can be represented as characters on a page, or as a tree in a YAML text editor that has such things as scalar style, etc. In the spec we define presentation not so much for what it is, but rather for what it isn't... it covers those aspects of YAML which are not considered part of a language-independent representation of your native data. Serialization nodes are somewhere between the two; they are representation nodes that have been 'flattened' to fit onto a sequential-access interface.

| Assuming "representation graph" means the conceptual data structure...
| then, nodes are tagged (explicitly or through the resolver)

Good so far... and a representation which is fully tagged, where each tag is known by the processor, is called a 'complete representation'.

| or left untagged (marked as a plain scalar).

Well, the spec leaves this a bit vague (for now). But yes, you could have something very similar to a complete representation having tags which are blank. We don't have a good word for this case yet; "incomplete" doesn't quite say enough, because it can be incomplete due to a failure to resolve implicit types, or to bind the types to make a canonical form or native objects.

| If "presentation nodes" means the physical YAML document, then the nodes are
| either tagged, implicit or left untagged (plain scalar). Correct?

In the presentation layer there are definitely two distinct questions:

    tagged vs untagged
    plain scalar or not

| What's the difference between a scalar, and a plain scalar? Isn't something
| either typed or left as a scalar?

An untagged node would have:

    a kind (scalar, mapping, sequence)
    a plain flag (applies only to scalars)

By mixing plain into kind it makes things confusing, and this is what I was hoping to avoid. While it may be an implementation decision how to model a 3-state enum plus a flag that only applies to one of the enum possibilities, there is a conceptual difference. Kind (the three-state enum) is part of the YAML representation, while the plain flag is not. So, merging them into a four-state enum is confusing at best; but it may be the cleanest API option. ;(

Don't say we didn't call this plain scalar thingy a hack. ;)

Clark

--
Clark C. Evans                   Prometheus Research, LLC
Chief Technology Officer         Turning Data Into Knowledge
cc...@pr...                   www.prometheusresearch.com
|
From: Sean O'D. <se...@ce...> - 2004-02-05 06:40:38
|
On Wednesday 04 February 2004 06:26 pm, Clark C. Evans wrote:
> On Wed, Feb 04, 2004 at 05:12:40PM -0800, Sean O'Dell wrote:
> | > tagged:
> | >     Tagged nodes are those which appear in the YAML
> | >     stream with a non-empty "!tag" or have been tagged
> | >     by the resolver. Tagged nodes do not record if they
> | >     were created from plain scalars or not; this 'hack flag'
> | >     is only used during the tagging process.
> |
> | Is the resolver a new step in the loading process?
> | Parser->Loader->Resolver?
>
> The three-stage breakdown in the spec is:
>
>     representation -- modeling your native data structures in a
>                       language-independent manner
>
>     serialization  -- flattening these representations so that they
>                       can pass through a sequential-access medium
>                       such as a series of event calls
>
>     presentation   -- making the serializations look pretty

Ah, so representation is essentially the fully canonical, resolved data structure. So, at the presentation level, there are lexical idioms which eventually get translated into a simpler, more straightforward serialization form? Again, I'm harping on the terminology thing, but I would have said:

    Complex Data Graph
    Flattened Data Graph
    Fully Resolved Data Graph

Or something along those lines... something that newcomers can pick up on.

> Serialization nodes are somewhere between the two; they are
> representation nodes that have been 'flattened' to fit onto
> a sequential-access interface.

Serialization is basically just eliminating the trickier syntax available to fully-compliant YAML and putting it into a more uniform syntax, right?

> In the presentation layer there are definitely two distinct
> questions:
>
>     tagged vs untagged
>     plain scalar or not
>
> | What's the difference between a scalar, and a plain scalar? Isn't
> | something either typed or left as a scalar?
>
> An untagged node would have:
>
>     a kind (scalar, mapping, sequence)
>     a plain flag (applies only to scalars)
>
> By mixing plain into kind it makes things confusing, and this
> is what I was hoping to avoid. While it may be an implementation
> decision how to model a 3-state enum plus a flag that only
> applies to one of the enum possibilities, there is a conceptual
> difference. Kind (the three-state enum) is part of the YAML
> representation, while the plain flag is not. So, merging them
> into a four-state enum is confusing at best; but it may be the
> cleanest API option. ;(

I'm curious about the flag. Why isn't it just:

    kind: scalar, mapping, sequence
    scalar-type:
        core-types: raw, str, int, float, timestamp, bool

Why not just default to a "raw" or "plain" type? Why a plain/unplain flag? Why don't you just give it a type and let the type determine whether it's plain or some other type?

Sean O'Dell
|
From: Oren Ben-K. <or...@be...> - 2004-02-05 09:52:08
|
Hi Sean, It seems the terminology needs clarification... I'll try to do so (this will be long; apologies in advance). I'll introduce a new term, (tagging) category. Clark, I suggest that you consider working this term into the spec - I believe it would reduce the confusion. Sean O'Dell [mailto:se...@ce...] wrote: > Ah, so representation is essentially the fully canonical, > resolved data structure. Yes. > So, at presentation level, there are lexical idioms which > eventually get > translated into a simpler, more straight-forward serialization form? > Serialization is basically just eliminating the trickier > syntax available to > fully-compliant YAML and putting it into a more uniform syntax, right? You seem to be using the term "serialization" to mean "the (results of the) conversion between two syntactical forms" (I think). The YAML spec uses "serialization" in its strict meaning: "the (results of the) conversion of a random access data structure to a sequential access data structure". It is commonly extended to mean "the (results of the) the complete process of conversion of data to a (sequential) character stream", but the YAML spec only makes use of the strict meaning. In the YAML spec, the complete process requires that: first, your data needs to be "represented" as a random-access data structure; then, it needs to be "serialized" to a sequential-access data structure; finally, it is "presented" as a character stream. It is in in this final "presentation" step that lexical (syntactical) idioms are introduced. In the other direction, "parsing" extracts a sequential data structure from the character stream, "composing" creates a random access data structure out of the sequential data, and "construction" creates native objects out of the random-access data. There's no step that converts a complex syntax into a simpler syntax. 
Note: the terms "representation", "serialization", "presentation" are overloaded to mean both the _process_ of generating something *and* the _results_ of that process. The direction of these processes is from the data to the character stream; going the other way, we have a name for each "reverse" process ("parsing" is the opposite of "presentation", "composing" of "serialization", "construction" of "representation"). The results of the reverse process are already named (the results of "parsing" are the "serialization", the results of "composing" are the "representation"). The spec tries to convey this in figure 3.1 and section 3.1. Given the above, I'll "translate" Clark: > > In the presentation layer there are definitely two distinct > > questions: > > tagged vs untagged > > plain scalar or not That is: - In the character stream, each node may be either tagged or untagged. - Each (scalar) node may be in the plain style, or in some other style. > > | What's the difference between a scalar, and a plain > > | scalar? Isn't > > | something either typed or left as a scalar? Each character-stream "node" has a style (for scalars, it is one of: literal, folded, single-quoted, double-quoted, plain). The only difference between a plain scalar node and any other scalar node is, well, the style it is written in. That's it. No other difference. In particular, it doesn't have a different "kind". Now, consider the following: --- plain: foo single quoted: 'foo' double quoted: "foo" literal: |- foo folded: >- foo ... All the "foo"s are untagged scalars. The spec is trying to say that a YAML processor must create the same native data object for all these "foo"s, except that it is allowed to create a different one for the first ("plain") foo. It might also use the same object for all the "foo"s, including the "plain" one. Note this object need not be a string (though, in a sane implementation, it would be). Why? 
s/foo/12/: --- plain: 12 single quoted: '12' double quoted: "12" literal: |- 12 folded: >- 12 ... A human would read the value of the plain 12 to be the integer 12, and the value of all the other keys as the string "12". We'd like YAML to allow for this. To do so, we say: > > an untagged node would have: > > a kind (scalar, mapping, sequence) > > a plain flag (applies only to scalars) That is: the information that the YAML processor is allowed to use when deciding on the native object to create for an *untagged* "foo" is its kind (scalar), and if it is a scalar, whether it was written in the "plain" style or not. And, of course, it uses the value of the scalar. That's it. The processor must not consider whether the scalar was written, say, in the literal vs. folded styles, or the single vs. double quoted style. It must not consider whether the indentation level used for the scalar was a prime number. It must not consider whether there was a comment in the vicinity of the scalar, or anything else. Everything other than "kind, value, was-scalar-written-in-the-plain-style" is considered to be a mere syntactical detail that has absolutely no bearing on the semantics of the scalar. > > By mixing plain into kind it makes things confusing, and > > this is what I was hoping to avoid. Right. We have three "kind"s - scalar, mapping, sequence. That's it. It just happens the YAML processor needs to know a bit more than the kind (and value) to correctly interpret a scalar. Let's call the information it does need to know the "tagging category". This category is the kind plus one bit (whether the scalar was written in the plain style). How to represent the category as a C data structure? That's an implementation design choice. Feel free to represent the tagging category as a 4-valued enum (that's what I'd do, anyway). Just keep in mind that tagging category != kind. Kind is a separate (though related) concept that is defined by the YAML spec to be a 3-valued enum. 
Anyone know of a language where it is easy to do enum inheritance/extension? :-) So: > I'm curious about the flag. Why isn't it just: > > kind: scalar, mapping, sequence > scalar-type: > core-types: raw, str, int, float, timestamp, bool That would bless some (short) list of types. If my application wants to use an additional type, say "price", written in the format "$<float-value>", then this type won't be covered in the YAML spec. But we do want to allow for application-specific types of this sort. Hence, we need to specify some mechanism in the spec that allows for specifying additional application-specific types. But if we do that, there's no point in making some types "core" while others are "additional". What makes "timestamp" more important than, say, "price"? It is cleaner to have them all use the same mechanism. So, we asked ourselves, "what does the tagging category consist of?" - in other words, what must the YAML processor absolutely know to provide typing for untagged nodes? The answer turned out to be: "kind, plus whether-a-scalar-was-written-in-the-plain-style". There's no deep theory behind it; it is just the way humans read YAML character streams. Any implementation is then welcome to use the tagging category to provide whatever set of types it wants. > Why not just default to a "raw" or "plain" type? > Why a plain/unplain flag? > Why don't you just give it a type and let the type determine > whether it's plain or some other type? "type" is such an overloaded word... We prefer to restrict its use to "the data type of the native data object constructed from the YAML character stream". When looking at an untagged scalar node, we can't "just give it a type" and let the "type" decide if it is plain or not - we already know if it is plain or not, and we need to figure out the "type" (by assigning it a "tag"). Again, "plain" is not a "type". It is not a "kind". It is merely a syntactical "style" people may choose to use for writing down scalars. 
It turns out people use it to encode semantics into the character stream, hence the fact of its use made its way into the tagging category. Bottom line: As Pascal said - I apologize for the length of this letter, I didn't have the time to write it shorter. Clark did good work in making it shorter in the spec. I suggest that working "tagging category" into there somewhere would help. Have fun, Oren Ben-Kiki |
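The "tagging category" described above - the 3-valued kind plus one was-it-written-in-the-plain-style bit - is easy to sketch in code. The following Python is an illustrative sketch only; the names (`Kind`, `resolve_tag`) and the digit heuristic for plain scalars are assumptions, not anything mandated by the spec:

```python
from enum import Enum

class Kind(Enum):
    SCALAR = "scalar"
    MAPPING = "mapping"
    SEQUENCE = "sequence"

def resolve_tag(kind, value=None, plain=False):
    """Assign a tag to an untagged node using only kind, value, and
    the was-scalar-written-in-the-plain-style bit - nothing else."""
    if kind is Kind.MAPPING:
        return "!map"
    if kind is Kind.SEQUENCE:
        return "!seq"
    # Non-plain scalars (quoted, literal, folded) always become strings.
    if not plain:
        return "!str"
    # Only plain scalars are open to implicit typing.
    if value.lstrip("+-").isdigit():
        return "!int"
    return "!str"

print(resolve_tag(Kind.SCALAR, "12", plain=True))   # !int
print(resolve_tag(Kind.SCALAR, "12", plain=False))  # !str
```

Note the function never sees which non-plain style was used - that distinction is exactly the "mere syntactical detail" Oren says must have no semantic weight.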
From: Sean O'D. <se...@ce...> - 2004-02-05 17:46:26
|
On Thursday 05 February 2004 01:51 am, Oren Ben-Kiki wrote: > It seems the terminology needs clarification... I'll try to do so (this > will be long; apologies in advance). I'll introduce a new term, > (tagging) category. Clark, I suggest that you consider working this term > into the spec - I believe it would reduce the confusion. Yeah, I think anything to help differentiate presentation from representation will save brain cycles for people trying to get other things done. =) > Sean O'Dell [mailto:se...@ce...] wrote: > > Ah, so representation is essentially the fully canonical, > > resolved data structure. > > Yes. Woohoo, score +1! > > So, at presentation level, there are lexical idioms which > > eventually get > > translated into a simpler, more straight-forward serialization form? > > > > Serialization is basically just eliminating the trickier > > syntax available to > > fully-compliant YAML and putting it into a more uniform syntax, right? > > You seem to be using the term "serialization" to mean "the (results of > the) conversion between two syntactical forms" (I think). That's what I had thought: that presentation was like "level 2 syntax" which could be broken down to a "level 1 syntax" which was called serialization. > The YAML spec uses "serialization" in its strict meaning: "the (results > of the) conversion of a random access data structure to a sequential > access data structure". It is commonly extended to mean "the (results of > the) the complete process of conversion of data to a (sequential) > character stream", but the YAML spec only makes use of the strict > meaning. > > In the YAML spec, the complete process requires that: first, your data > needs to be "represented" as a random-access data structure; then, it > needs to be "serialized" to a sequential-access data structure; finally, > it is "presented" as a character stream. It is in in this final > "presentation" step that lexical (syntactical) idioms are introduced. 
I guess I'm confused about what the difference is between random-access YAML and serialized YAML. YAML itself, I thought, was in a text stream form, with delimiters that a parser uses to walk through the structure. When I write YAML in a text editor, that's the presentation form, I get that. What happens to YAML when it becomes serialized? What does it look like? > In the other direction, "parsing" extracts a sequential data structure > from the character stream, "composing" creates a random access data > structure out of the sequential data, and "construction" creates native > objects out of the random-access data. > > There's no step that converts a complex syntax into a simpler syntax. Ah well, -1 for me. > Note: the terms "representation", "serialization", "presentation" are > overloaded to mean both the _process_ of generating something *and* the > _results_ of that process. The direction of these processes is from the > data to the character stream; going the other way, we have a name for > each "reverse" process ("parsing" is the opposite of "presentation", > "composing" of "serialization", "construction" of "representation"). The > results of the reverse process are already named (the results of > "parsing" are the "serialization", the results of "composing" are the > "representation"). > > The spec tries to convey this in figure 3.1 and section 3.1. You know, this is sad, because my most common use of the term serialize is exactly this meaning, but somewhere along the line, the explanation of it made me think it meant something else. Let me try a nutshell explanation again, to see if I get it yet: Representation: the fully resolved data structure, either held in your brain, on a wall chart or as a fully ready-to-eat native data structure. Serialization: the primitive data structure of a loaded YAML document, fully ordered in imitation of the authored YAML. Presentation: the YAML document as it appears in a text stream; either authored or generated. 
> Each character-stream "node" has a style (for scalars, it is one of: > literal, folded, single-quoted, double-quoted, plain). The only > difference between a plain scalar node and any other scalar node is, > well, the style it is written in. That's it. No other difference. In > particular, it doesn't have a different "kind". I see then, so it's not a flag, it's a style. For some reason, I thought it was just a boolean plain/unplain flag. > Now, consider the following: > > --- > plain: foo > single quoted: 'foo' > double quoted: "foo" > literal: |- > foo > folded: >- > foo > ... > > All the "foo"s are untagged scalars. The spec is trying to say that a > YAML processor must create the same native data object for all these > "foo"s, except that it is allowed to create a different one for the > first ("plain") foo. It might also use the same object for all the > "foo"s, including the "plain" one. Note this object need not be a string > (though, in a sane implementation, it would be). > > Why? s/foo/12/: > > --- > plain: 12 > single quoted: '12' > double quoted: "12" > literal: |- > 12 > folded: >- > 12 > ... > > A human would read the value of the plain 12 to be the integer 12, and > the value of all the other keys as the string "12". We'd like YAML to This strikes me as something that should be part of the implicit typing process. That all scalars should load untyped, and with only two styles: single or multi-line, and then through the process of implicit typing, determine that: *) Some single-line scalars are !str types (when wrapped in single or double quotes) *) Multi-line scalars are either literal or folded !str types. What I mean is, when initially loaded, all of the above scalars in your example would load as raw scalars, and be marked as having only the single or multi-line style. Further processing would reveal all to be of type !str except for the "plain:" scalar which would be determined to be of type !int. 
> How to represent the category as a C data structure? That's an > implementation design choice. Feel free to represent the tagging > category as a 4-valued enum (that's what I'd do, anyway). Just keep in > mind that tagging category != kind. Kind is a separate though related) > concept that is defined by the YAML spec to be a 3-valued enum. I would go with the 3-value enum, the single or multi-line style, explicit tagging and then value pattern-matching as the process for loading YAML scalars all the way up to native data types. I would definitely leave all typing to the final two tasks: explicit tagging and value pattern matching. > So: > > I'm curious about the flag. Why isn't it just: > > > > kind: scalar, mapping, sequence > > scalar-type: > > core-types: raw, str, int, float, timestamp, bool > > That would bless some (short) list of types. If my application wants to > use an additional type, say "price", written in the format > "$<float-value>", then this type won't be covered in the YAML spec. But > we do want to allow for application-specific types of this sort. Hence, > we need to specify some mechanism in the spec that allows for specifying > additional application-specific types. But if we do that, there's no > point in making some types "core" while other are "additional". What > makes "timestamp" more important than, say, "price"? It is cleaner to > have them all use the same mechanism. I agree. I do think those core types should become part of an implicit tagging mechanism, but you may as well have them load by default to keep YAML simple. However, that same mechanism might allow a YAML author to say "this is my core domain, not yaml.org" so those default types wouldn't be used at all. > So, we asked ourselves, "what does the tagging category consist of?" - > in other words, what does the YAML processor absolutely must know to > provide typing for untagged nodes? The answer turned out: "kind, plus > whether-a-scalar-was-written-in-the-plain-style". 
There's no deep theory > behind it; it is just the way humans read YAML characters streams. I think the tagging process would consist of: 1) Checking the kind for scalar (not map or seq, iow) 2) Single or multi-line style 3) Explicit tag given 4) Raw scalar value pattern match Here is where I apologize if I've skipped some important specification feature that is causing me not to connect with you, if I'm way off the mark with this. It really seems, though, that the process would be as simple as the order above. Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-02-05 22:46:59
|
Sean O'Dell wrote: > Let me try a nutshell explanation again, to see if I get it yet: > > Representation: the fully resolved data structure, either > held in your brain, > on a wall chart or as a fully ready-to-eat native data structure. +1. Well, -1e-6 (you know me, I have to nit-pick, right? :-): usually the wall chart or native data or whatever (Brian likes the term "Cave drawings") aren't trivially a 1-1 match for the way YAML does things. For example, a C structure isn't really a hash table, but YAML views it "as if" it is a hash table where the keys are the C struct member names. And so on. Hence we say we merely "represent" the C struct (or cave drawing or whatever) as a YAML "representation". > Serialization: the primitive data structure of a loaded YAML > document, fully ordered in imitation of the authored YAML. +1. The whole point of the "serialization" is this imposing of order (and, as an unavoidable side-effect, adding the notion of anchors and aliases). > Presentation: the YAML document as it appears in a text > stream; either authored or generated. +1. Three out of three. > > Each character-stream "node" has a style (for scalars, it > > is one of: > > literal, folded, single-quoted, double-quoted, plain). The only > > difference between a plain scalar node and any other scalar > > node is, > > well, the style it is written in. That's it. No other > > difference. In > > particular, it doesn't have a different "kind". > > I see then, so it's not a flag, it's a style. For some > reason, I thought it was just a boolean plain/unplain flag. Almost. The "plain" style is just that, a style. The "tagging category", however, does contain this boolean you talk of; it says whether the style of the scalar node happened to be the "plain" style. So the proper name for this boolean is "was-the-scalar-in-the-plain-style". 
Calling it "the plain flag" may be confusing, but is much shorter :-) > > A human would read the value of the plain 12 to be the > > integer 12, and > > the value of all the other keys as the string "12". We'd > > like YAML to > > This strikes me as something that should be part of the > implicit typing process. +10! You are perfectly correct. The spec calls the "implicit typing" process "tag resolution". The thing is this: The spec does not specify how the tag resolution process works. This is an important point to make. The spec doesn't require any particular mechanism to be used for tag resolution. Yes, you can use regexps on the content of the scalars. You can also use full BNF syntaxes, or generic detection code. Whatever. The spec does define two things: what the _input_ of the process is, and what the expected _output_ is. That's all. The input of the process (let's call it the "tagging context") is: - The path leading to the untagged node. - The content of the untagged node. # I called the following two "tagging category": - The kind of the node. - If the node is a scalar node, whether it happened to be written in the plain style. The output of the process is: - A tag. That's it. Anything else is up to the implementation. A reasonable way would be: - Tag all untagged mappings as !map. - Tag all untagged sequences as !seq. - Tag untagged scalars as follows: - If they are not written in the plain style, tag them as !str. - If they are written in the plain style, use an ordered set of { tag, regexps } tuples. Use the tuple { !str, * } as the last set member. The first regexp that matches determines the tag. But that's just one (common, sensible) way of doing this. For example, the above completely ignores the path to the node. It makes a lot of sense that some applications will expect particular nodes to have particular tags, instead of using regexps to resolve them. Note that nodes cannot remain "untyped" (or, in the spec's terminology, "untagged"). Why? 
Because in order to construct a native object from a node, you must know which data type to use. This knowledge is equivalent to knowing the node's tag. Sure, you can choose to leave some nodes "untagged", and never construct a "really native" native data object for them. Instead, you can store them in memory in some YAML-specific DOM-like generic data structure. The spec calls this an "incomplete representation". It is "incomplete" because you don't know the tag. That's OK for things like a YAML pretty-printer, for example. > I think the tagging process would consist of: > > 1) Checking the kind for scalar (not map or seq, iow) > 2) Single or multi-line style No: "plain style" vs. "all other styles". Note that the plain style may be multi-line, and the quoted styles may be single line... And that you are _explicitly forbidden_ from distinguishing between the following two cases: --- case 1: foo bar case 2: foo bar ... You _must_ type them the same way, because the "typer" (what the spec would call the "tag resolver") gets the same "tagging category": { kind: scalar, was-scalar-plain: true } and same node value: "foo bar". The path would be different ({ path: "/case 1" } vs. { path: "/case 2" }), but you don't resolve tag according to path (others might). > 3) Explicit tag given > 4) Raw scalar value pattern match Fine. Your call. It matches the spec's requirements (up to the correction I made above), hence it is a valid way to resolve tags ("type the nodes"). Just don't fall into the trap of believing it is the _only_ way to do this. We intentionally leave this choice to the implementation. > Here is where I apologize if I've skipped some important > specification feature > that is causing me not to connect with you, if I'm way off > the mark with this. Not at all. > It really seems, though, that the process would be as > simple as the order above. 
Sure, it could be, and I think we are, in general, in agreement about your proposed mechanism as a guide for implementing things like a Ruby loader. We just don't want to set it in stone in the spec. A crucial point is this: Tag resolution may be driven by a schema. (For example, make use of the path leading to the node). At this point in time, we have only a glimmering of a notion of what the schema specification would be. We are definitely not going to delay the spec until we have one. At any rate, while tag resolution may be driven by a schema, in general it need not be. It is better all around to simply leave the details of the tag resolution out of the spec. Which we did. Sean, thanks for reading through my too-long postings and taking the time to reply. This is exactly the sort of feedback we need to improve the quality of the spec. I really like the terms "tagging context" and "tagging category" that surfaced out of this discussion. Hopefully these will prevent other readers of the spec from getting it the hard way (like you had to). Clark, what do you think? Have fun, Oren Ben-Kiki |
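The "reasonable way" Oren outlines - an ordered set of { tag, regexp } tuples with { !str, * } as the final catch-all - can be sketched directly. This is one possible illustration; the particular patterns below are placeholders, not the YAML core schema:

```python
import re

# Ordered (tag, regexp) pairs, tried in order; the first match wins.
# The !str catch-all must come last, as Oren describes.
RESOLVERS = [
    ("!int",   re.compile(r"[-+]?\d+$")),
    ("!float", re.compile(r"[-+]?\d*\.\d+$")),
    ("!bool",  re.compile(r"(true|false|yes|no)$")),
    ("!null",  re.compile(r"(~|null|)$")),
    ("!str",   re.compile(r".*$", re.DOTALL)),  # catch-all
]

def resolve_scalar(value, plain):
    # Scalars not written in the plain style are always strings.
    if not plain:
        return "!str"
    for tag, pattern in RESOLVERS:
        if pattern.match(value):
            return tag
    return "!str"

# Only plain-vs-not and the content matter - never the specific style.
print(resolve_scalar("12", plain=True))    # !int
print(resolve_scalar("12", plain=False))   # !str
print(resolve_scalar("3.14", plain=True))  # !float
```

A schema-driven resolver would simply consult the node's path before (or instead of) running these patterns; both fit within the constraints the spec sets on inputs and outputs.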
From: Sean O'D. <se...@ce...> - 2004-02-05 23:26:43
|
On Thursday 05 February 2004 02:46 pm, Oren Ben-Kiki wrote: > Sean O'Dell wrote: > That's it. Anything else is up to the implementation. A reasonable way > would be: > - Tag all untagged mappings as !map. > - Tag all untagged sequences as !seq. > - Tag untagged scalars as follows: > - If they are not written in the plain style, > tag them as !str. > - If they are written in the plain style, > use an ordered set of { tag, regexps } tuples. > Use the tuple { !str, * } as the last set member. > The first regexp that matches determines the tag. So my Syck patch to perform implicit typing is legal with the current spec? > But that's just one (common, sensible) way of doing this. For example, > the above completely ignores the path to the node. It makes a lot of > sense that some applications will expect particular nodes to have > particular tags, instead of using regexps to resolve them. Yeah, but this step is easily added; it's just another datum to consider. I actually want to be able to switch implicit typing according to the path. Any information is useful in making the typing decision. > > I think the tagging process would consist of: > > > > 1) Checking the kind for scalar (not map or seq, iow) > > 2) Single or multi-line style > > No: "plain style" vs. "all other styles". Note that the plain style may > be multi-line, and the quoted styles may be single line... And that you > are _explicitly forbidden_ from distinguishing between the following two > cases: I think I understand this now, but I'm not sure why it's this way. I think the pre-typing of quoted scalars into !str types has something to do with it...somehow, that's snagging in my brain...it doesn't make sense. To me, implicit typing should happen last, after explicit typing, and no typing at all be done prior to that. Scalars should remain raw until typed. At the end, when implicit typing is allowed, I can see converting any scalars which are bounded by quotes into string types, but prior to that no. 
> > It really seems, though, that the process would be as > > simple as the order above. > > Sure, it could be, and I think we are, in general, in agreement about > your proposed mechanism as a guide for implementing things like a Ruby > loader. We just don't want to set it in stone in the spec. A crucial > point is this: > > Tag resolution may be driven by a schema. In the future. But even then, doesn't it make sense that an implementation would be given the freedom to optionally type nodes (assign a specific object in place of the node) as the developer sees fit? > (For example, make use of the path leading to the node). At this point > in time, we only have a just glimmering of a notion of what the schema > specification would be. We are definitely not going to delay the spec > until we have one. At any rate, while tag resolution may be driven by a > schema, in general it need not be. It is better all around to simply > leave the details of the tag resolution out of the spec. Which we did. I agree with this...I personally think that implicit tag resolution should come in this order: 1) free and open for the implementation to decide 2) driven by a schema My personal feeling is that when it comes to implicit typing, flexibility is the most important feature, so while a schema may say "this is of this type" the programmer using any given implementation (say, Syck) is free to take that information and generate any native object they want in its stead. Basically, a schema saying a scalar is of a certain type does change the data model to comply with the schema, but the actual native-language data tree generated may contain objects that the schema author did not expect. It doesn't change the conceptual data model at all, but it does change which objects comprised the final native data tree. > Sean, thanks for reading through my too-long postings and taking the > time to reply. This is exactly the sort of feedback we need to improve > the quality of the spec. 
I really like the terms "tagging context" and > "tagging category" that surfaced out of this discussion. Hopefully these > will prevent other readers of the spec from getting it the hard way > (like you had to). Clark, what do you think? No problem, but I am clucking my tongue a little. I have no clue what "tagging category" and "tagging context" mean. It seems somehow I have unwittingly assisted in the generation of two more cryptic terms. I have only myself to blame this time. Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-02-05 23:50:49
|
Sean O'Dell wrote: > So my Syck patch to perform implicit typing is legal with the > current spec? Seems so. > I think I understand this now, but I'm not sure why it's this > way. Because that's the cleanest way we found - lump _all_ the typing issues into one "tag resolution" process, constrain its inputs and outputs, and let the implementations do whatever they want within this framework. > I think > the pre-typing of quoted scalars into !str types has > something to do with > it...somehow, that's snagging in my brain...it doesn't make > sense. To me, > implicit typing should happen last, after explicit typing, > and no typing at > all be done prior to that. Scalars should remain raw until > typed. At the > end, when implicit typing is allowed, I can see converting > any scalars which > are bounded by quotes into string types, but prior to that no. Whatever - that's all within the very large range of behaviors we allow for. > ... doesn't it make sense that an implementation > would be given the freedom to optionally type nodes (assign a > specific object > in place of the node) as the developer sees fit? Yes, and that's exactly what we allow for. The only constraints are about what "mere syntactical details" the developer must not use (e.g., distinguishing between single quoted and double quoted styles). Otherwise, it is completely up to him. The reason for these constraints is that I want to be able to fire up VI on a YAML file and, say, convert a single-quoted scalar to a double-quoted one (for escaping), or maybe line-wrap a value if the line is too long for printing, etc.; and I want to do all these knowing with 100% certainty that I didn't change the semantics of the document. Without these constraints, I can't do that. What if the developer decided to check whether the indentation level was a prime number in order to decide which data type to use, and I re-indented it to 4 spaces? 
> ...I personally think that implicit tag > resolution should come in this order: > > 1) free and open for the implementation to decide > 2) driven by a schema You can do that. Again, it falls within the wide range of behaviors we allow. > My personal feeling is that when it comes to implicit typing, > flexibility is > the most important feature, Hence we allow _anything_ obeying the constraints we set up. > No problem, but I am clucking my tongue a little. I have no > clue what > "tagging category" and "tagging context" mean. It seems > somehow I have > unwittingly assisted in the generation or two more cryptic > terms. I have > only myself to blame this time. "Tagging context" is the sum of all inputs available to whatever tag resolution method you employ. Given that, you can do anything - consult a schema, or not; use regexps, or not; do these in any order, or at the same time, or whatever. The final result is the node's tag. That's it. "Tagging category" is probably not necessary as a separate term. So you only caused the creation of one term :-) Have fun, Oren Ben-Kiki |
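Oren's "tagging context" - the sum of all inputs available to whatever tag resolution method you employ - might be modeled as a plain record. This sketch is hypothetical; the field names and the path-based `!price` rule are invented purely to illustrate that a resolver may (but need not) consult the path:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TaggingContext:
    path: tuple             # keys/indices leading to the untagged node
    kind: str               # "scalar" | "mapping" | "sequence"
    content: Optional[str]  # scalar content, if any
    plain: bool             # was the scalar written in the plain style?

def resolve(ctx: TaggingContext) -> str:
    # A path-aware rule is legal: force a hypothetical application
    # type under a known key, ignoring the content patterns entirely.
    if ctx.path and ctx.path[-1] == "price":
        return "!price"
    if ctx.kind != "scalar":
        return "!" + {"mapping": "map", "sequence": "seq"}[ctx.kind]
    if not ctx.plain:
        return "!str"
    return "!int" if ctx.content.isdigit() else "!str"

print(resolve(TaggingContext(("price",), "scalar", "$12.50", True)))  # !price
```

The only output, in every case, is a tag - everything between the context and that tag is the implementation's business.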
From: Sean O'D. <se...@ce...> - 2004-02-06 17:24:47
|
On Thursday 05 February 2004 03:50 pm, Oren Ben-Kiki wrote: > Sean O'Dell wrote: > "Tagging context" is the sum of all inputs available to whatever tag > resolution method you employ. Given that, you can do anything - consult > a schema, or not; use regexps, or not; do these in any order, or at the > same time, or whatever. The final result is the node's tag. That's it. Makes perfect sense. =) Sean O'Dell |
From: Clark C. E. <cc...@cl...> - 2004-02-05 16:06:53
|
On Wed, Feb 04, 2004 at 10:40:32PM -0800, Sean O'Dell wrote: | On Wednesday 04 February 2004 06:26 pm, Clark C. Evans wrote: | > On Wed, Feb 04, 2004 at 05:12:40PM -0800, Sean O'Dell wrote: | > | > tagged: | > | > Tagged nodes are those which appear in the YAML | > | > stream with a non-empty "!tag" or have been tagged | > | > by the resolver. Tagged nodes do not record if they | > | > were created from plain scalars or not, this 'hack flag' | > | > is only used during the tagging process. | > | | > | Is the resolver a new step in the loading process? | > | Parser->Loader->Resolver? | > The three stage breakdown in the spec is: | > | > representation -- modeling your native data structures in | > a language independent manner | > | > serialization -- flattening these representations so that | > they can pass through a sequential-access | > medium such as a series of event calls. | > | > presentation -- making the serializations look pretty | | Ah, so representation is essentially the fully canonical, resolved data | structure. Yep. | | So, at presentation level, there are lexical idioms which eventually get | translated into a simpler, more straight-forward serialization form? Well, the presentation level is your YAML syntax, your text document. And yes, there are lots of human presentation details which are not part of the document's YAML representation. | Again, I'm harping at the terminology thing, but I would have said: | | Complex Data Graph | Flattened Data Graph | Fully Resolved Data Graph | | Or something along those lines...something that newcomers can pick up on. Well, if you are worried about terminology, two days ago we got a rather nice email from Denis Howe who said: "The spec is fab btw, a model of conciseness and clarity, just like YAML." When asked about which version of the spec in a reply, he responded: "Yes, the 2004-01-29 version. The represent, serialize, present stuff made perfect sense, though it seemed to be stating the obvious. 
I guess that's a Good Thing in a spec." So, with that being our only "newcomer" feedback so far on the rewrite of the specification, I'm quite happy with the choice of terminology. Of course, we are always looking to improve! | Serialization is basically just eliminating the trickier syntax available to | fully-compliant YAML and putting it into a more uniform syntax, right? Well, I wasn't considering that serialization would have a syntax, but I suppose if it did, it would be something very close to the YAML canonical form, where styles, comments, etc. would be discarded. But really, it would be better to think of the serialization level as a sequential "event" API like SAX or something similar. Cheers! Clark P.S. I'm curious if you really dug-into the new spec on the website? |
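Clark's suggestion to think of the serialization level as a sequential "event" API like SAX can be made concrete with a rough sketch. The event names below are an assumption (SAX-style, not from the spec); for the document `--- banana: yellow`, the flat event stream and the "composing" step that rebuilds a random-access structure from it might look like:

```python
# A flat, sequential-access view of "--- banana: yellow" - the
# serialization level. Styles and comments have been discarded.
events = [
    ("stream_start", None),
    ("document_start", None),
    ("mapping_start", {"anchor": None, "tag": None}),
    ("scalar", {"value": "banana", "plain": True}),
    ("scalar", {"value": "yellow", "plain": True}),
    ("mapping_end", None),
    ("document_end", None),
    ("stream_end", None),
]

def compose(events):
    """Walk the flat event list back into a random-access structure
    (handles only flat mappings of scalars - a deliberate toy)."""
    stack = [[]]
    for name, args in events:
        if name == "mapping_start":
            stack.append([])
        elif name == "scalar":
            stack[-1].append(args["value"])
        elif name == "mapping_end":
            items = stack.pop()
            stack[-1].append(dict(zip(items[0::2], items[1::2])))
    return stack[0][0]

print(compose(events))  # {'banana': 'yellow'}
```

Parsing produces this event sequence from characters; composing consumes it to build the representation - exactly the two "reverse" processes Oren names earlier in the thread.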