From: David H. <dav...@bl...> - 2004-09-09 01:35:18
(Clark's "simple proposal" is #1.)

- use the term "tag-specifier" for the strings used to specify tags in
  the presentation model, and make a consistent distinction in the spec
  between tags and tag-specifiers (renaming EBNF productions, etc. where
  necessary).

- when asked for the tag of a mapping, sequence or non-plain scalar node
  that does not have a tag-specifier, the parser reports <null>.
  (<null> does not necessarily mean a NULL pointer or null reference;
  that depends on the API.) For plain scalars without a tag-specifier
  see below.

- when presenting a YAML document, if the tag of a node is <null> then
  the presenter MUST NOT include a tag specifier for that node. For
  scalar nodes with a <null> tag, the presenter MUST use a non-plain style.

- tag:yaml.org,2002:plain is added to the tag repository, and MUST be
  supported. This is the type that is used by default if a plain scalar
  node cannot be automatically resolved to a more specific type.

  The types that can be automatically recognised are the following
  tag:yaml.org,2002: types:

    bool
    float (excluding "0"; see below)
    int
    merge
    null
    timestamp
    value

  The regexps for these are all disjoint.

  This does not imply that support for any of the above types is required;
  a YAML implementation need not recognize types that it does not support.

  For example:

    %TAG !! tag:example.org,2004:     # makes no difference
    ---
    - 42                              # tag:yaml.org,2002:int
    - 0x2A                            # tag:yaml.org,2002:int
    - true                            # tag:yaml.org,2002:bool
    - 4.2e1                           # tag:yaml.org,2002:float
    - 0                               # tag:yaml.org,2002:int
    - 0.0                             # tag:yaml.org,2002:float
    - <<                              # tag:yaml.org,2002:merge
    - ~                               # tag:yaml.org,2002:null
    - 2004-09-09 00:12:34.00 +01      # tag:yaml.org,2002:timestamp
    - =                               # tag:yaml.org,2002:value
    - fred                            # tag:yaml.org,2002:plain
    - {}                              # <null>
    - []                              # <null>

  APIs SHOULD provide a way to disable this behaviour, in which case the
  tag of all plain scalar nodes is reported as tag:yaml.org,2002:plain
  (mapping, sequence and non-plain nodes are still reported as <null>).
  [BTW, the link for tag:yaml.org,2002:special is broken. What is it?
  Also, why is there a \t in the regexp for timestamp?]

- clarify that the set of canonical forms for a tag MUST be a subset
  of its regexp.

  Currently tag:yaml.org,2002:float does not satisfy this; remove "0"
  from its set of canonical forms (the canonical form for +0.0 becomes
  "0.e+0").

- redefine equality as an operation that can produce three results:
  equal, unequal, or unknown. The result is unknown iff it depends on
  computing a canonical value for a scalar node with a tag that is not
  recognized by the implementation.
  (It is still possible for two scalar nodes with an unrecognized tag
  to be reported as equal, if they have exactly the same string value.)

  The value of a node includes its kind, so the definition of equality
  does not depend on each tag being associated with a single kind.

  Remove the definition of identity in 3.2.1.3. Whether nodes are
  represented by the same native object has nothing to do with the
  representation model or with equality; it's an implementation detail.

  (The mathematical model of representations that I'm going to post very
  soon has a notion of graph "equivalence" which might be more useful.)

- the descriptions of !!seq, !!map and !!str are moved out of the spec and
  into the tag repository (since they are not used normatively, only in
  examples).

- remove all discussion of resolution from the spec.
  (Remember, what is not specified is allowed.)

- explain that if a YAML API maps nodes that might be unequal in the YAML
  representation model to the same native value, this is considered to
  be a transformation. If an API purports to be able to represent any
  graph in the YAML representation model, then it MUST use a representation
  that distinguishes all graphs that *might* be unequal. This implies that,
  in such a representation, it must be possible to include as keys in a
  mapping two values whose equality is unknown.
  However, an API can be useful if it does not support this requirement,
  or if it only supports it when used in certain ways.

Rationale:

The fact that the regexps for all of the yaml.org scalar types except
binary are disjoint was simply too tempting. Specifying which common types
can be automatically recognized in a plain scalar will improve consistency
and interoperability, and will work well for 99% of applications. Any
applications for which this does not work well can simply disable the
default resolution of plain scalars to yaml.org types.

Anticipating possible objections: yes, creating dependencies on ~8 of the
types in the YAML repository is inelegant. Also, future scalar types
may well not be automatically recognizable. However, this doesn't matter
in practice as long as the *most common* types are recognizable.

It is unnecessary to define non-<null> tags for unspecified sequences,
mappings, and non-plain scalars, because the kind already distinguishes
these from each other.

-- 
David Hopwood <dav...@bl...>
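The plain-scalar resolution rule proposed in this message can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the patterns are simplified stand-ins chosen to be mutually disjoint, not the regexps from the yaml.org type repository, and the names `resolve_plain`, `node_tag` and the `implicit` switch are hypothetical.

```python
import re

YAML = "tag:yaml.org,2002:"

# Simplified, illustrative patterns (an assumption of this sketch).
# They are NOT the official type-repository regexps; they are only
# written so that no string matches more than one of them.
IMPLICIT_PATTERNS = [
    ("null", re.compile(r"~|null|Null|NULL")),
    ("bool", re.compile(r"true|True|TRUE|false|False|FALSE")),
    ("int", re.compile(r"[-+]?(0|[1-9][0-9]*)|0x[0-9a-fA-F]+")),
    # A "." is required, so a bare "0" is an int and never a float.
    ("float", re.compile(r"[-+]?[0-9]+\.[0-9]*([eE][-+]?[0-9]+)?")),
    ("merge", re.compile(r"<<")),
    ("value", re.compile(r"=")),
    ("timestamp", re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}([Tt ].*)?")),
]


def resolve_plain(text, implicit=True):
    """Return the tag of a plain scalar that has no tag-specifier."""
    if implicit:
        for name, pattern in IMPLICIT_PATTERNS:
            if pattern.fullmatch(text):
                return YAML + name
    # Fallback type, also used for everything when resolution is disabled.
    return YAML + "plain"


def node_tag(kind, style, text=None, specifier=None, implicit=True):
    """Report a node's tag; None stands in for <null>."""
    if specifier is not None:
        return specifier  # assumed to be already expanded to a full tag
    if kind == "scalar" and style == "plain":
        return resolve_plain(text, implicit)
    return None  # mappings, sequences and non-plain scalars


if __name__ == "__main__":
    for item in ["42", "0x2A", "true", "4.2e1", "0", "0.0", "<<", "~", "=", "fred"]:
        print(item, resolve_plain(item))
```

Calling `resolve_plain(text, implicit=False)` models the "disable this behaviour" switch: every plain scalar is then reported as tag:yaml.org,2002:plain.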
From: Clark C. E. <cc...@cl...> - 2004-09-09 03:11:47
Ok. I'm opposed to the NULL idea since it makes TAG optional in the
model; this makes equality harder to define and complicates the API.
I'm also -0 on the plain scalar auto-typing thingy, as I think it's
better to require that application developers specify how they want
to resolve tags.

On Thu, Sep 09, 2004 at 02:35:09AM +0100, David Hopwood wrote:
| - use the term "tag-specifier" for the strings used to specify tags in
|   the presentation model, and make a consistent distinction in the spec
|   between tags and tag-specifiers (renaming EBNF productions, etc. where
|   necessary).

Agreed.

  !!int is a tag specifier,
  tag:yaml.org,2002:int is a tag

Also, !private happens to be both a tag specifier and a tag.
In my proposal ?mapping is a tag; but is _not_ a tag specifier.

| - when asked for the tag of a mapping, sequence or non-plain scalar node
|   that does not have a tag-specifier, the parser reports <null>.
|   (<null> does not necessarily mean a NULL pointer or null reference;
|   that depends on the API.) For plain scalars without a tag-specifier
|   see below.
|
| - when presenting a YAML document, if the tag of a node is <null> then
|   the presenter MUST NOT include a tag specifier for that node. For
|   scalar nodes with a <null> tag, the presenter MUST use a non-plain style.

*nod* (Proposal #1 would have something equivalent, just not detailed)

| - tag:yaml.org,2002:plain is added to the tag repository, and MUST be
|   supported. This is the type that is used by default if a plain scalar
|   node cannot be automatically resolved to a more specific type.

In proposal #1, the 'concrete' types are moved into the tag repository,
and four 'undefined' ?implicit types are added. One of them is
equivalent in thought to ?plain.
|   The types that can be automatically recognised are the following

This is the old behavior of pre-1.0, and the reason why the regexps
are disjoint.

|   APIs SHOULD provide a way to disable this behaviour, in which case the
|   tag of all plain scalar nodes is reported as tag:yaml.org,2002:plain
|   (mapping, sequence and non-plain nodes are still reported as <null>).

Hmm. So, the spec requires that implicit typing works, but that it
can be turned off. This _is_ different from

|   [BTW, the link for tag:yaml.org,2002:special is broken. What is it?

http://yaml.org/type/special/

|   Also, why is there a \t in the regexp for timestamp?]

someone (not me) wanted to press tab?

| - clarify that the set of canonical forms for a tag MUST be a subset
|   of its regexp.
|
|   Currently tag:yaml.org,2002:float does not satisfy this; remove "0"
|   from its set of canonical forms (the canonical form for +0.0 becomes
|   "0.e+0").

nice

| - redefine equality as an operation that can produce three results:
|   equal, unequal, or unknown. The result is unknown iff it depends on
|   computing a canonical value for a scalar node with a tag that is not
|   recognized by the implementation.
|   (It is still possible for two scalar nodes with an unrecognized tag
|   to be reported as equal, if they have exactly the same string value.)
|
|   The value of a node includes its kind, so the definition of equality
|   does not depend on each tag being associated with a single kind.

You've not described how equality happens with NULL tags. This
oversight is exactly what I don't like about NULL tags (or should I say
a tag slot that is empty?). It is an unnecessary special case.

|   Remove the definition of identity in 3.2.1.3. Whether nodes are
|   represented by the same native object has nothing to do with the
|   representation model or with equality; it's an implementation detail.

Hmm. Thinking required.

| - remove all discussion of resolution from the spec.

With this proposal, I'm not sure you can do this.
| - explain that if a YAML API maps nodes that might be unequal in the YAML
|   representation model to the same native value, this is considered to
|   be a transformation. If an API purports to be able to represent any
|   graph in the YAML representation model, then it MUST use a
|   representation that distinguishes all graphs that *might* be unequal.
|   This implies that, in such a representation, it must be possible to
|   include as keys in a mapping two values whose equality is unknown.
|
|   However, an API can be useful if it does not support this requirement,
|   or if it only supports it when used in certain ways.

Nice

| Rationale:
|
| The fact that the regexps for all of the yaml.org scalar types except
| binary are disjoint was simply too tempting. Specifying which common types
| can be automatically recognized in a plain scalar will improve consistency
| and interoperability, and will work well for 99% of applications. Any
| applications for which this does not work well can simply disable the
| default resolution of plain scalars to yaml.org types.
|
| Anticipating possible objections: yes, creating dependencies on ~8 of the
| types in the YAML repository is inelegant. Also, future scalar types
| may well not be automatically recognizable. However, this doesn't matter
| in practice as long as the *most common* types are recognizable.

I somewhat agree about the value, but I'm hesitant.

| It is unnecessary to define non-<null> tags for unspecified sequences,
| mappings, and non-plain scalars, because the kind already distinguishes
| these from each other.

Yeah, but making the tag optional in the info model creates a special
case that has to be checked everywhere; i.e., node.tag == xxx isn't
sufficient, one now has to check node.tag is null and
node.kind == 'mapping', etc.

Clark
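The special case Clark describes can be made concrete with a small Python sketch. The `Node` class and both helper functions are hypothetical, and "?mapping" is used only because Clark's message names it as a tag in his own proposal; nothing here is taken from an actual YAML API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str            # "scalar", "sequence" or "mapping"
    tag: Optional[str]   # None models <null> in the optional-tag model


def is_untagged_mapping_null_model(node):
    # Optional-tag model: the tag alone is not enough, so the kind
    # must be checked as well.
    return node.tag is None and node.kind == "mapping"


def is_untagged_mapping_placeholder_model(node):
    # Placeholder-tag model (assumed here): one comparison suffices.
    return node.tag == "?mapping"


if __name__ == "__main__":
    print(is_untagged_mapping_null_model(Node("mapping", None)))
    print(is_untagged_mapping_placeholder_model(Node("mapping", "?mapping")))
```

Either model carries the same information; the difference is only whether callers compare one field or two.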
From: David H. <dav...@bl...> - 2004-09-09 23:25:10
Clark C. Evans wrote:
> | [BTW, the link for tag:yaml.org,2002:special is broken. What is it?
>
> http://yaml.org/type/special/

Proposal: remove this type.

"Transfer method" is undefined, but as far as I can tell the motivation is:

# In-memory, the encoded data may be accessed and manipulated in a standard
# way using the three basic data types (mapping, sequence and string),
# allowing limited processing to be applied to arbitrary YAML data.

However, this is so limited that it doesn't seem to be a good
complexity/functionality tradeoff. Such processing is better done using
language-specific APIs.

-- 
David Hopwood <dav...@bl...>
From: Clark C. E. <cc...@cl...> - 2004-09-10 00:14:13
David,

This file was quite historical; so old, in fact, that I've updated
the types with the versions from CVS. Anyway, the relevant paths are:

  http://yaml.org/spec/            # canonical path to current spec
  http://yaml.org/spec/spec.html   # current CVS snapshot of the spec
  http://yaml.org/spec/type.html   # the listing of types

Cheers!

Clark
From: Clark C. E. <cc...@cl...> - 2004-09-09 15:47:47
On Thu, Sep 09, 2004 at 02:35:09AM +0100, David Hopwood wrote:
| - redefine equality as an operation that can produce three results:
|   equal, unequal, or unknown. The result is unknown iff it depends on
|   computing a canonical value for a scalar node with a tag that is not
|   recognized by the implementation.

I like this change; it will also apply to ?unspecified tags which, by
definition, are never recognized as they must be resolved.

|   (It is still possible for two scalar nodes with an unrecognized tag
|   to be reported as equal, if they have exactly the same string value.)

Correct.

|   The value of a node includes its kind, so the definition of equality
|   does not depend on each tag being associated with a single kind.

It doesn't hurt to be redundant, no?

|   Remove the definition of identity in 3.2.1.3. Whether nodes are
|   represented by the same native object has nothing to do with the
|   representation model or with equality; it's an implementation detail.

It is there for two reasons:

  - as a contrast to 'equality', with which it is sometimes confused; and

  - to help users determine which sorts of objects they should use: if
    their application requires that two 'nodes' with the same string
    value remain distinct, then they should use a collection (such as
    a sequence with exactly one value).

Given YAML's purpose as a cross-language serialization tool, I think the
latter point is quite important, no? Perhaps not. I'm not dead set
against keeping it in.

Cheers,

Clark
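The three-valued equality discussed in this exchange could look something like the Python sketch below. It is an illustration only and makes several assumptions not settled in the thread: nodes with different tags are treated as unequal, only scalars with non-null tags and well-formed content are handled, and the `CANONICALIZERS` table with its two entries is hypothetical.

```python
from enum import Enum


class Eq(Enum):
    EQUAL = "equal"
    UNEQUAL = "unequal"
    UNKNOWN = "unknown"


# Tags this hypothetical implementation recognizes, each mapped to a
# function that computes a canonical form from the scalar's string value.
CANONICALIZERS = {
    "tag:yaml.org,2002:int": lambda s: str(int(s, 0)),  # "0x2A" -> "42"
    "tag:yaml.org,2002:str": lambda s: s,
}


def scalar_equal(tag_a, text_a, tag_b, text_b):
    """Compare two scalar nodes given as (tag, string value) pairs."""
    if tag_a != tag_b:
        return Eq.UNEQUAL  # assumption: equal nodes must share a tag
    if text_a == text_b:
        # Identical strings are equal even when the tag is unrecognized.
        return Eq.EQUAL
    canon = CANONICALIZERS.get(tag_a)
    if canon is None:
        # The answer depends on a canonical form that cannot be computed.
        return Eq.UNKNOWN
    return Eq.EQUAL if canon(text_a) == canon(text_b) else Eq.UNEQUAL


if __name__ == "__main__":
    int_tag = "tag:yaml.org,2002:int"
    print(scalar_equal(int_tag, "42", int_tag, "0x2A"))
    print(scalar_equal("tag:example.org,2004:foo", "a",
                       "tag:example.org,2004:foo", "b"))
```

How nodes with a <null> tag should compare is deliberately left out, since that question is raised but not answered in the thread.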
From: David H. <dav...@bl...> - 2004-09-09 20:18:11
Clark C. Evans wrote:
> On Thu, Sep 09, 2004 at 02:35:09AM +0100, David Hopwood wrote:
> | - redefine equality as an operation that can produce three results:
> |   equal, unequal, or unknown. The result is unknown iff it depends on
> |   computing a canonical value for a scalar node with a tag that is not
> |   recognized by the implementation.
>
> I like this change, it will also apply to ?unspecified tags which, by
> definition, are never recognized as they must be resolved.
>
> |   (It is still possible for two scalar nodes with an unrecognized tag
> |   to be reported as equal, if they have exactly the same string value.)
>
> Correct.
>
> |   The value of a node includes its kind, so the definition of equality
> |   does not depend on each tag being associated with a single kind.
>
> It doesn't hurt to be redundant, no?

It can do. The fact that a tag is associated with a single kind is simply
a matter of convention; it isn't enforced, and can't be enforced for
unrecognized tags. If implementations assume this property in other places
where it isn't used redundantly, they will get into trouble.

> |   Remove the definition of identity in 3.2.1.3. Whether nodes are
> |   represented by the same native object has nothing to do with the
> |   representation model or with equality; it's an implementation detail.
>
> It is there for two reasons:
>
>   - as a contrast to 'equality', with which it is sometimes confused; and
>
>   - to help determine which sorts of objects they should use, if their
>     application requires that two 'nodes' with the same string value
>     remain distinct, then they should use a collection (such as
>     a sequence with exactly one value).

In that case it's defined incorrectly.

# Node equality should not be confused with node identity. Two nodes are
# identical only when they represent the same native object. Typically, this
# corresponds to a single memory address. During serialization, equal scalar
# nodes may be treated as if they were identical. In contrast, the separate
# identity of two distinct, but equal, collection nodes must be preserved.

Nodes don't represent native objects; native objects represent nodes.
Two distinct native objects can represent the same (identical) node.
As a concrete example consider Java String objects: there can be two
objects "foo", representing identical YAML scalar tag:yaml.org,2002:str
nodes.

So it is *not* true that two nodes are identical only when they represent
(more precisely, are constructed as) the same native object. How many
native objects there are simply has nothing to do with it.

Defining identity in the representation model in terms of identity of
native objects is backwards, anyway: identity constrains the valid choices
of native representations, not vice-versa.

Part of the problem here is that there are two kinds of native
representation, unambiguous and ambiguous. If a Python API implementing
YAML "constructs", say, !!int 1 and !!float 1.0 using Python objects that
cannot both be used as keys in the representation of a mapping, then it is
using an ambiguous representation: it cannot represent every valid element
of the YAML node graph model distinctly. This is not unreasonable,
*provided* that it is clearly documented that the representation is
ambiguous, and how it is ambiguous.

OTOH, to handle YAML documents generically, without making assumptions
that are only valid in the context of particular applications/formats,
it's absolutely necessary to have an unambiguous representation that can
distinctly represent every possible YAML node graph. This is not difficult,
even though in some cases it may require using types that are slightly
less idiomatic for a given language.

-- 
David Hopwood <dav...@bl...>
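The Python case mentioned in this message is easy to reproduce. In the sketch below the first dictionary is the ambiguous native representation; the `ScalarNode` class is a hypothetical stand-in for an unambiguous one and is not taken from any existing YAML library.

```python
from dataclasses import dataclass

# Ambiguous: Python treats 1 and 1.0 as the same dict key, because
# 1 == 1.0 and hash(1) == hash(1.0). The second entry overwrites the
# value stored under the first key.
native = {1: "int key", 1.0: "float key"}


@dataclass(frozen=True)
class ScalarNode:
    """Hypothetical unambiguous scalar: the tag is part of the value."""
    tag: str
    value: str


# Unambiguous: the two keys differ in their tag, so both survive.
distinct = {
    ScalarNode("tag:yaml.org,2002:int", "1"): "int key",
    ScalarNode("tag:yaml.org,2002:float", "1.0"): "float key",
}

if __name__ == "__main__":
    print(len(native), len(distinct))
```

A frozen dataclass is used here only because it is hashable and compares field by field; any key type that keeps the tag alongside the value would serve the same purpose.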
From: Oren Ben-K. <or...@be...> - 2004-09-11 04:48:36
On Thursday 09 September 2004 23:17, David Hopwood wrote:
> It can do. The fact that a tag is associated with a single kind is
> simply a matter of convention; it isn't enforced, and can't be
> enforced for unrecognized tags.

Again: The spec can't "enforce" *anything*. The spec simply states what
we call a "YAML schema", what we call a "conforming YAML processor" and
so on. So, you are free to have tags which apply to more than one kind.
Just don't call them YAML tags and don't call your schema a YAML schema.
If you do, well, we'll complain loudly. Since YAML is a trademark (I
think), we might even be able to force you to rename your stuff to
"NotYaml"... But we can't and don't have the faintest motivation to make
you stop doing whatever works for you :-)

> If implementations assume this
> property in other places where it isn't used redundantly, they will
> get into trouble.

-10. Must applications carry a load of redundant data and extra
processing logic on the off-chance someone feeds them NotYaml pretending
to be YAML?

> Nodes don't represent native objects; native objects represent nodes.

-10. This is a major point. YAML is a data serialization language. "In
the beginning, there was the data". It is _then_ serialized. When you
load a YAML document, you are *RE*constructing the "data" that was
originally serialized into the document.

It doesn't matter if the document was written by hand. Whoever wrote the
document had some "data" - whether in a computer storage or just in his
mind - which he serialized into YAML. Loading - either into computer
storage or into the mind of a human reader - reverses the process.

It _can't_ happen that a YAML document gets created spontaneously and
_then_ someone extracts the data from it. The big bang wasn't a
well-formed YAML document, and neither are virtual particle pairs :-)

So... Nodes _do_ represent native objects. That's why we call this a
"representation" :-)

Now, the whole "identity" thing makes more sense. It talks about being
able to serialize data where the same object is reachable through
different paths. Operationally, when loading the data, you want to say
"the object _here_ is the same one as the one I already loaded from
_there_".

The spec says that if you present your data object as a YAML scalar,
then YAML will _not_ preserve the identity of your native objects; when
someone else loads the document, in his "data", all equal loaded objects
may be the same object - or they might each be a different one even if
you used anchors in the YAML document.

The spec also says that if you present your data as YAML collections,
then YAML _will_ preserve the identity; whenever anyone loads the YAML
document, he'll have exactly one object per identical node, so by using
anchors you can ensure he'll have exactly one data object for each of
your data objects.

And again, if an application depends on this rule, and some schema says
differently, "bad things will happen". Well, again: an implementation
need not be burdened with defensive programming against someone trying
to pass NotYaml off as if it were YAML.

> ... a Python API ... cannot
> represent every valid element of the YAML node graph model
> distinctly...

If your original data is Python, you simply define a strict YAML schema
that works for you, which is actually quite easy even if Python doesn't
distinguish between !!int and !!float. Just say all your numbers are
!!float, for example, even if they have no '.' in them. Seems simple
enough to me.

However, if you load someone's schema for serializing Java data into
Python, you will face problems, because his data uses types that have no
direct match in your types set. You have two options:

- Strictly adhere to the semantics he had in mind for his data, by
  defining new Python types, or using shadow objects to preserve
  non-Python information, or any other trick. Costly, difficult, but it
  is strict YAML and, more importantly, you are safe knowing that you
  did not mangle what the document means (the original semantics of the
  data).

- Take shortcuts, coerce types to the nearest not-quite-appropriate
  Python type, and so on. Fine. As you point out, this is quite
  reasonable in many cases. Again, the spec doesn't forbid you from
  doing that. It just asks you nicely (OK, we'll sue you for every penny
  you got if you don't :-) to clearly document "if you use the parser in
  this mode, it isn't a simple loading of a YAML document. What you get
  is NOT what the original document writer has intended. But it is very
  simple and very useful, so you will want to do this anyway".

> ... to handle YAML documents generically, without making
> assumptions that are only valid in the context of particular
> applications/formats, it's absolutely necessary to have an
> unambiguous representation that can distinctly represent every
> possible YAML node graph.

What is important is to agree that the spec can lay down rules in the
first place! Whatever rules you come up with, someone will say "well,
you can't enforce that, so my code also has to deal with the case where
the document breaks this rule". You'll be getting nowhere fast.

I think the current spec does a good job at defining a way to handle
YAML documents generically without assuming anything about the
particular *valid YAML schema* (== "application"). If the document does
not follow a "valid YAML schema", it isn't YAML, and I can't say
_anything_ about it. What if it uses the indentation level to encode
information? What if comments contain meaningful data? What if the
meaning depends on the time of day I read it at?

Have fun,

    Oren Ben-Kiki
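The point about collections keeping their identity can be illustrated with a small Python sketch that finds the collection objects reachable through more than one path, which are the ones a presenter would have to anchor. The function name `shared_collections` is hypothetical, native `list` and `dict` are assumed to stand in for YAML sequences and mappings, and scalars are deliberately ignored because their identity need not be preserved.

```python
def shared_collections(root):
    """Return the list/dict objects reached more than once from root."""
    seen = set()   # ids of collections already visited
    shared = []    # collections reached a second time (by identity)

    def walk(obj):
        if not isinstance(obj, (list, dict)):
            return  # scalars: identity is not tracked
        if id(obj) in seen:
            if not any(obj is s for s in shared):
                shared.append(obj)
            return  # do not descend again; this also stops on cycles
        seen.add(id(obj))
        children = obj.values() if isinstance(obj, dict) else obj
        for child in children:
            walk(child)

    walk(root)
    return shared


if __name__ == "__main__":
    inner = [1, 2]
    doc = {"a": inner, "b": inner, "c": [1, 2]}
    # "a" and "b" are the same object; "c" is merely equal to it.
    print(len(shared_collections(doc)))
```

Note that the list under "c" is equal to `inner` but is a separate object, so it is not reported: equality and identity are tracked independently for collections.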
From: Damian C. <dam...@gm...> - 2004-09-13 13:57:58
On Fri, 10 Sep 2004 08:29:15 +0300, Oren Ben-Kiki <or...@be...> wrote:
> [...Python dicts are slightly different from YAML mappings, so ...]
> You have two options:
>
> - Strictly adhere to the semantics
>
> - Take shortcuts, coerce types to the nearest not-quite-appropriate
>   Python type, and so on.

If I may be forgiven an XML-based analogy...

With Python there are two main ways of handling XML inputs: strictly
conforming (basically DOM with all its warts and special cases), and
Python-friendly (like elementtree, which treats XML elements as Python
lists and attributes like dictionaries, etc.).

When I am writing code to *use* an XML document, I generally use
elementtree. If I were writing a program that is a generic XML
processor (e.g., my own schema validator or templating system or
whatever), then one of the complex DOM implementations is required to
get fully nuanced XML.

With YAML, the same may apply, but we have two important advantages.
First, the 'Python-friendly' types are much closer to the YAML model,
so we can use them almost always and get away with it. Second, even
if we do need 'YAML-pedantic' types, they will not actually be that
much harder to work with than the native Python types.

My suggestion for the spec is that there could be an informative annex
or supplement outlining the divergences of various popular languages'
native types from the YAML model and saying that if you're relaxed
about these differences it is OK to use a YAML parser/loader that uses
native types.

 -- Damian

-- 
Damian Cugley, Alleged Literature
http://www.alleged.org.uk/pdc/
From: Oren Ben-K. <or...@be...> - 2004-09-13 19:06:16
On Monday 13 September 2004 16:57, Damian Cugley wrote:
> My suggestion for the spec is that there could be an informative
> annex or supplement outlining the divergences of various popular
> languages' native types from the YAML model and saying that if you're
> relaxed about these differences it is OK to use a YAML parser/loader
> that uses native types.

The best place for this would be in the documentation of the common
types - !!int, !!float and so on. That's the perfect place to discuss
how they do or don't map to popular programming languages' types, and
how to deal with mismatches.

Have fun,

    Oren Ben-Kiki