## Re: [Yaml-core] Equality issues

 Re: [Yaml-core] Equality issues From: Clark C. Evans - 2004-09-07 03:54:13
```
David,

Another thing that has been floating around in my head is the notion
of equality between objects of different types. This would, at least,
solve some of the resolution issues that have made me want to stick
with transforms.

There are two ways to do identity: (a) have an equals operator, or
(b) have a canonical form. We chose (b) since it is a reasonable
requirement and if one can do (b), it is an implementation
optimization to also provide (a).

Anyway, I was also thinking we may have to add a 'cast' operator to
scalar tags. *sigh* It's often the case that mappings consider
1 (integer) and 1.0 (float) as the same value, and thus consider them
equal. So, a cast operator would be something like:

  cast

    Scalar tags must provide a mechanism to convert values from
    another tag to a value having its tag, or indicate that a
    conversion is not possible. If tag A has a conversion CBA from
    tag B, and tag B has a conversion CAB from tag A, then
    canonical(x) = canonical(CAB(CBA(x))). Given a tag C, this
    conversion need not be transitive.

  equality (informally)

    Scalars are equal when:
      (a) their tags are equal and their canonical forms are equal; or
      (b) a tag A has a cast C to tag B, and
          canonical(y) = canonical(C(x)) for the value x in A,
          and y in B.

It's an icky operator, but may be needed to handle ugly cases like
mappings having both integer 1 and floating point 1 values. At least
this way one could provide 'subset' relations between objects.

Cheers,

Clark

On Tue, Sep 07, 2004 at 02:59:05AM +0100, David Hopwood wrote:
| David Hopwood wrote:
| >Also in 3.2.1.3:
| >
| ># Two nodes must have the same tag and value to be equal. Since each tag
| ># applies to exactly one kind, this implies that the two nodes must have
| ># the same kind to be equal.
| >
| >Actually this doesn't follow: it is perfectly possible to have two nodes
| >with the same tag and value, but different kinds.
|
| If both sequences and mappings are treated as functions, and can therefore
| have the same value, that is.
|
| --
| David Hopwood

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
```
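The cast rule above can be sketched in Python. This is only an illustration under stated assumptions: the `CANONICAL` and `CASTS` tables, the short tag names `int` and `float`, and the use of `fractions.Fraction` as a canonical form are all hypothetical and come from neither the message nor the YAML specification.

```python
from fractions import Fraction

# Hypothetical canonical forms, one per tag (assumption for this sketch).
CANONICAL = {
    "int": lambda text: str(int(text)),
    "float": lambda text: str(Fraction(text)),  # exact, so "2300" == "2.3e3"
}


def int_to_float(text):
    """Every integer has a float form; reuse the text unchanged."""
    return text


def float_to_int(text):
    """Return integer text, or None to signal 'conversion not possible'."""
    value = Fraction(text)
    return str(value.numerator) if value.denominator == 1 else None


# CASTS[(source tag, target tag)] -> conversion function.
CASTS = {("int", "float"): int_to_float, ("float", "int"): float_to_int}


def _equal_via_cast(tag_x, x, tag_y, y):
    """Rule (b): cast x into y's tag, then compare canonical forms."""
    cast = CASTS.get((tag_x, tag_y))
    if cast is None:
        return False
    converted = cast(x)
    if converted is None:
        return False
    return CANONICAL[tag_y](converted) == CANONICAL[tag_y](y)


def scalars_equal(tag_x, x, tag_y, y):
    """Informal equality: same tag and canonical form, or equal after a cast."""
    if tag_x == tag_y:
        return CANONICAL[tag_x](x) == CANONICAL[tag_y](y)
    return _equal_via_cast(tag_x, x, tag_y, y) or _equal_via_cast(tag_y, y, tag_x, x)


print(scalars_equal("int", "1", "float", "1.0"))        # True
print(scalars_equal("float", "1.5", "int", "1"))        # False
print(scalars_equal("float", "2300", "float", "2.3e3")) # True
```

Trying the cast in both directions is a design choice of this sketch so that the comparison is symmetric; the message itself only describes a cast from tag A to tag B.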

 [Yaml-core] Equality issues From: David Hopwood - 2004-09-07 01:22:30
```
The questions below came up when I was finishing off a mathematical
model of YAML representations (as promised on IRC).

According to 3.2.1.3,

# During serialization, equal scalar nodes may be treated as if they were
# identical. In contrast, the separate identity of two distinct, but equal,
# collection nodes must be preserved.

It is unclear exactly how this should be interpreted. Note that it says
only that equal scalar nodes *may* be treated as identical, *during
serialization*. Serialization is the step that maps from the
representation model to the serialization model (figure 3 in the spec).
Nothing is said about the other 5 steps in that figure, any of which
could, in principle, collapse equal scalar nodes to the same node.

I think it would be clearer if the spec said explicitly that the
representation model *does not* preserve distinctions between scalar
nodes with the same string and tag, and *is not required to* preserve
distinctions between equal scalar nodes. (Obviously, an implementation
can only collapse scalar nodes that are equal but are represented by
different strings if the tag has been recognized.)

This would clarify that it is OK for a language mapping to represent a
scalar value using a "value type", even though that would mean it is
impossible to distinguish equal scalars. It would not prevent a language
mapping from using more than one distinct object for the same scalar
value.

Also in 3.2.1.3:

# Two nodes must have the same tag and value to be equal. Since each tag
# applies to exactly one kind, this implies that the two nodes must have the
# same kind to be equal.

Actually this doesn't follow: it is perfectly possible to have two nodes
with the same tag and value, but different kinds. At least one of the
nodes must have a kind that is inconsistent with the definition of its
tag, but equality should be well-defined even in that case.

I suggest changing this to:

# Two nodes must have the same kind, tag, and value to be equal. (Note that
# if a tag is being used correctly, it applies to exactly one kind, and in
# that case it is sufficient for the tags and values to be equal. However,
# the definition of equality also allows for tags that are used incorrectly.)

--
David Hopwood
```
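The first point in the message (scalars may collapse by value, collections keep their identity) can be sketched in Python. This is a hedged illustration, not anything from the spec: `distinct_nodes` is a hypothetical helper, native lists and dicts stand in for collection nodes, and the Python type name stands in for a scalar's tag.

```python
def distinct_nodes(root):
    """Count nodes reachable from root: collections by identity, scalars by value."""
    seen_collections = set()  # id() of each list/dict already visited
    seen_scalars = set()      # (type name, value) pairs; type name stands in for the tag
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, (list, dict)):
            if id(node) in seen_collections:
                continue  # same collection object reached again (an alias)
            seen_collections.add(id(node))
            if isinstance(node, list):
                stack.extend(node)
            else:
                stack.extend(node.keys())
                stack.extend(node.values())
        else:
            # Equal scalars collapse: a "value type" cannot tell them apart.
            seen_scalars.add((type(node).__name__, node))
    return len(seen_collections), len(seen_scalars)


shared = ["x"]
doc = [shared, shared, ["x"]]  # two references to one list, plus an equal copy
print(distinct_nodes(doc))     # (3, 1)
```

The three collections are the outer list, `shared`, and the separate equal copy; the three occurrences of `"x"` count as one scalar.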
 Re: [Yaml-core] Equality issues From: David Hopwood - 2004-09-07 01:59:12
```
David Hopwood wrote:
> Also in 3.2.1.3:
>
> # Two nodes must have the same tag and value to be equal. Since each tag
> # applies to exactly one kind, this implies that the two nodes must have
> # the same kind to be equal.
>
> Actually this doesn't follow: it is perfectly possible to have two nodes
> with the same tag and value, but different kinds.

If both sequences and mappings are treated as functions, and can
therefore have the same value, that is.

--
David Hopwood
```
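A small Python sketch of this clarification, under the assumption (mine, for illustration) that a collection's value is modelled as the graph of a function: a sequence maps indices to items, a mapping maps keys to values. The `Node` class, the helper names, and the tag `!t` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str         # "sequence" or "mapping"
    tag: str
    value: frozenset  # function graph: a set of (argument, result) pairs


def sequence(tag, items):
    """A sequence viewed as a function from index to item."""
    return Node("sequence", tag, frozenset(enumerate(items)))


def mapping(tag, pairs):
    """A mapping viewed as a function from key to value."""
    return Node("mapping", tag, frozenset(pairs.items()))


def equal_by_tag_and_value(a, b):
    """Equality as worded in 3.2.1.3 at the time of the thread."""
    return a.tag == b.tag and a.value == b.value


def equal_by_kind_tag_and_value(a, b):
    """Equality with the suggested rewording (kind is compared too)."""
    return a.kind == b.kind and equal_by_tag_and_value(a, b)


# "!t" is used incorrectly on at least one of these nodes.
seq = sequence("!t", ["a", "b"])
mp = mapping("!t", {0: "a", 1: "b"})

print(equal_by_tag_and_value(seq, mp))       # True
print(equal_by_kind_tag_and_value(seq, mp))  # False
```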
 Re: [Yaml-core] Equality issues From: Oren Ben-Kiki - 2004-09-07 17:08:47
```
On Tuesday 07 September 2004 04:59, David Hopwood wrote:
> Also in 3.2.1.3:
>
> # Two nodes must have the same tag and value to be equal. Since each tag
> # applies to exactly one kind, this implies that the two nodes must have
> # the same kind to be equal.
>
> Actually this doesn't follow: it is perfectly possible to have two
> nodes with the same tag and value, but different kinds. At least
> one of the nodes must have a kind that is inconsistent with the
> definition of its tag, but equality should be well-defined even in
> that case. I suggest changing this to:

Well, if two nodes have different kinds, by definition they don't have
the same value, so the problem is solved: they are not equal. There's
no wording change required.

Have fun,

    Oren Ben-Kiki
```
 [Yaml-core] Unspecified Tags and the rise of an Imp. From: Clark C. Evans - 2004-09-07 02:57:50
```
There are several options for handling unspecified tags, which I'd like
to review here in the hope of guiding a discussion on this matter.

background:

A scalar tag is 'recognized' if, given two scalars, the YAML processor
can always determine if they are equivalent. If a YAML processor has
enough information to convert a scalar having that tag value to a
canonical form, it is considered 'recognized'. For example, given a
private tag 'foo', and two scalars "!foo 1" and "!foo 1", we know that
they are equal. Two other scalars, "!foo 01" and "!foo 1", appear to be
different; however, until more information about the tag is available
that confirms this belief, the tag 'foo' is said to be unrecognized.

A scalar tag is 'available' if the operating environment (programming
language plus operating system) has a native or built-in version of
the tag. While it is nice that a tag may be available, it is
sufficient, for the purposes of building a YAML Representation, that
all tags be recognized. For example, if the float tag is recognized,
then it is always possible to tell if two scalars are equal, like
"!float 2300" and "!float 2.3e3", even if the current platform doesn't
have direct support for IEEE floating point numbers.

In the current specification, specifically, in section 3.3
"Completeness", http://www.yaml.org/spec/#model-complete, there is a
diagram (for scalars):

      *  -> Ill-Formed
      |
      v
    Well-Formed -> Unresolved -> Partial Representation
      |                            ^
      v                           /
    Resolved    -> Unrecognized
      |
      v
    Recognized  -> Unavailable -> Complete Representation
      |
      v
    Available   -----------------> Native Representation

The difference between a 'Partial Representation' of a YAML document
and a 'Complete Representation' depends pretty much on the idea of
'Recognition': does the YAML Processor know enough about the tag that
it can determine equality? If so, a complete representation (and
possibly even a native representation) of the associated nodes is
possible. Otherwise, a generic node, which may not enforce the
particular equality constraints of the tag, has to be used. In this
inferior form, 'Partial Representation', duplicate nodes with
different character values cannot be detected; hence, certain
constructs which may look correct may not, in fact, have a valid YAML
Representation.

Due to our syntax trick that allows tags to be omitted, a special
step, called 'resolution', seems to be required in the above diagram.
During this step, an unspecified tag (missing in the document) is
filled in with a particular tag as determined by the application.

claim:

I assert that if we change the operation of a YAML Parser to fill in
an unspecified tag with a particular value, this second step,
"resolution", is not required, and in fact unnecessarily complicates
the parser and loader API. There are several options:

private tags:

During parse-time, one could provide a canned tag in places where it
has been omitted.

    !unspecified-mapping
    !unspecified-sequence
    !unspecified-scalar
    !unspecified-implicit   # yes, implicits are scalars, I know

And basically, the parser assigns the above tags to unspecified tags
based on the kind, with an implicit exception for plain scalars with an
unspecified kind, !unspecified-implicit.
In this manner, these two documents below are identical:

    ---
    plain:
      - 'single'
      - "double"
      - |-
        literal
      - >
        folded

    --- !unspecified-mapping {
      !unspecified-implicit "plain":
        !unspecified-sequence [
          !unspecified-scalar "single",
          !unspecified-scalar "double",
          !unspecified-scalar "literal",
          !unspecified-scalar "folded"
        ]
    }

That is, both of these documents have the same Partial Representation;
note that the !unspecified-implicit in the second example uses a
double-quoted style. This simple proposal completely does away with the
tag resolution process; the process moves on to tag recognition. A
problematic example, using the explicit syntax:

    ---
    !unspecified-scalar "23" : string value
    !unspecified-implicit "23" : integer value

This document would load just fine; none of these keys are exactly
equal (due to the rules above). Therefore, this document has a Partial
Representation. However, unless an application provides a way to test
for equality (which a native binding satisfies), there is no way to
know if this has a Complete Representation. A Python binding would
probably convert the 'unspecified-scalar' values into a string, and
the 'unspecified-implicit' into an integer, giving a Complete
Representation.

By contrast, here is a document with a Partial Representation but not
a Complete Representation (since both key nodes are equivalent, and
thus we have an exception):

    ---
    !unspecified-implicit "2" : integer, decimal format, value 2
    !unspecified-implicit "02" : integer, octal format, value 2

An example of a document that doesn't even get to be well-formed is
simple to produce; since both of these key nodes have the same tag and
the same character value they are equal, and thus the YAML document is
illegal:

    ---
    !unspecified-implicit "2" : integer, decimal format, value 2
    !unspecified-implicit "2" : integer, decimal format, value 2

For this set of examples, a null tag was not needed, and type
'resolution' was not necessary.
There are more complicated cases, to be presented later, but thus far,
the case has been made that resolution is an unnecessary step.

public, well-known tags:

It is not even necessary to use private tags for this purpose. Instead
of "unspecified-" tags we could use the current tags in the YAML
specification, plus an additional one, "imp":

    - tag:yaml.org,2002:map   # Unspecified Mapping
    - tag:yaml.org,2002:seq   # Unspecified Sequence
    - tag:yaml.org,2002:str   # Unspecified Literal String
    - tag:yaml.org,2002:imp   # Unspecified Implicit String

The equivalence is identical to the ones above, only that it's a bit
easier to read (given #11):

    ---
    plain:
      - 'single'
      - "double"
      - |-
        literal
      - >
        folded

    --- !!map {
      !!imp "plain":
        !!seq [
          !!str "single",
          !!str "double",
          !!str "literal",
          !!str "folded"
        ]
    }

Anyway, let us review the previous examples, only this time using the
public !!imp tag. This next example has a Complete representation
since !!imp is recognized and has an equivalence defined via string
comparison on its value:

    ---
    !!str "23": string value
    !!imp "23": integer value

I could give !!imp a native binding by converting values like this to
an integer value on the way in, and dumping as a !!imp on the way out.
A bit convoluted, but Imps are known to be evil. The next example is a
bit more complicated:

    ---
    !!imp "2" : integer, decimal format, value 2
    !!imp "02" : integer, octal format, value 2

By the above logic, this also has a complete representation; however,
I can't load it into a native binding because it creates a duplicate
key. So, as it turns out, this isn't all that nice of a solution. It
doesn't work because we've assumed some semantics for !!imp, which are
incompatible with how my Python parser would like to use !!imp.
However, this problem is limited to !!imp; in particular, !!str,
!!map, and !!seq work quite as expected:

    ---
    !!str "2" : string, one character long
    !!str "02": string, two characters long.

In summary, a public generic !!imp type works all the way up till when
you actually load it; then it fails since the bindings one would like
to use aren't compatible.

resolution:

Let's take the position that !unspecified-implicit and !!imp are just
too magical because they don't reflect the application's operating
environment. Rather than having the parser report a !magic tag, it
perhaps should return a NULL value and we should ask the application
to provide us with the appropriate type for each and every node. This
is called implicit tag resolution.

Unfortunately, this has problems as well. Consider this document,
which should clearly be not well formed, but since the key nodes do
not yet have a tag, we cannot compare them yet:

    ---
    a: this is
    a: a duplicate, no?

So, we could ask an application to provide these types. It would make
two calls to the application, but this uncooperative application
returns two _different_ tags:

    ---
    !a1 a: this is key 'a#1'
    !a2 a: this is key 'a#2'

So, as it turns out, the keys weren't actually a duplicate! This seems
unintuitive, but it is one effect of 'resolution'; unless, of course,
you add a special rule that prevents this case. For example, we could
specify that a regular expression is the 'only' rule that can be used
during resolution (otherwise, the smart-ass application might use the
tag's value for its type or some other nondeterministic reasoning).
So, assume this:

    ---
    23: integer
    23.0: float

With a very strict regular expression, I register things with a
decimal to be floating point types, and things with just numbers to be
an integer. So, the result after resolution would be equivalent to:

    ---
    !int 23: integer
    !float 23.0: float

All is good: it matches the model, the tags are recognized, the
representation is complete. Except, of course, that this is an error
in Python. Python uses equivalence across all numeric values, so the
above is actually a duplicate key.
The problem is, as much as you _think_ these are separate unrelated
types, they aren't. The notion of equivalence is tightly tied between
them in Python: the integer 1 is equal to the boolean True, which is
equal to the float 1.0

    >>> { True: 'hi', 1: 'hi', 1.0: 'bing' }
    {True: 'bing'}

So, while you may _assume_ that these could and should be different
types, you'd be wrong. In reality, an !!imp unified type is probably
the best fit for this monster... at least in Python. This also fails
when the application is only expecting string values within a
particular context:

    ---
    integers-are-ok:
      3: ok
      "2": ok, converted to integer, right?
    integers-not-ok:
      "2": ok
      2: should be duplicate key, but it isn't.

One last example, where resolution falls flat on its face, is when you
want to automatically 'upgrade' components as you are loading them.
For example, suppose a long time ago, I was writing my circles like:

    ---
    circles: # x, y, radius
      - (2,3,10)
      - (3,2,2)

So, I install a regex matching the above expressions to map to my
!circle type:

    ---
    circles: # x, y, radius
      - !circle (2,3,10)
      - !circle (3,2,2)

But really, even though it was a "convenient" display, it's not very
good to work with. What I'd rather have is the "resolution" process
return to me a bunch of mappings:

    ---
    circles: # x, y, radius
      - !circle { x: 2, y: 3, 'r': 10 }
      - !circle { x: 3, y: 2, 'r': 2 }

If I'm going to have a really nice 'ambiguous' display type, why can't
it be a mapping in memory and a string in the YAML presentation?

In summary, while resolution kinda works, it is a fly-swatter hack
that turns out to not solve the core needs of developers who are
dealing with implicit types; at least not the hard parts like
equivalence, expecting particular values in particular slots, and
different versions of objects.

giving up:

The only reasonable answer to this problem... is not to solve it; at
least not directly with the parser.
Converting YAML types into native types may need to involve tricky
calculations about equivalence, versions, and other concerns. Trying
to force-fit this into a simplistic 'resolution' process just doesn't
work, and actually just makes things worse.

The solution proposed is to mark all unspecified tags with the !!map,
!!seq, !!str, and !!imp global tags. In this manner, if a graph
doesn't use any explicit tags, it always has a complete representation
on which schema validation, ypath, and other tools can operate. It can
also round-trip these objects just perfectly, and maintain the 'plain
scalar'-ness. However, that's as far as it goes.

If a programming environment _happens_ to have an object similar
enough to !!map, !!seq, and !!str (which Python does), a direct
binding of these tags into native objects is possible (for Python, the
dict, list, and string). The funny !!imp can be handled as a 'generic
YAML object' if necessary, and that's OK. For a Python binding, I'd be
tempted to implement "imp" as its own object, with special comparison
operations that matched expectations.

In summary, !!imp is a complicated beast... without proper tools to
make sure that it stays under control, we must rely upon the
application to be smart about how it wishes to use these things. There
isn't a quick pill here, and the weak version of 'type resolution' is
not only inflexible, but it doesn't solve the core issues. In short,
it's an extra burden without commensurate return on investment.

transform:

An extended option, available to applications, is to actually
*transform* the incoming YAML graph into another representation.
Perhaps one that does a conversion like:

    !!imp "23"   => !python!int "23"
    !!imp "23.0" => !python!float "23.0"

or even converts '(2,3,20)' into an object (which happens to have a
!!map YAML representation).
This application-specific transform is then capable of handling
context, for example, in this case:

    --- !!map
    !!imp 23: integer
    !!imp 23.0: float

The transform may have to change the mapping above to one that
implements the YAML equality; or, it may just choose to raise an
exception. But this exception is different from the one before: this
exception is a failure to transform. It means that the fella who wrote
the converter didn't cover a specific use case of !!imp. And that's
OK. With a few complaints from her customers, it'll be patched up, or
they will tell them: "Don't Do That".

For the case where only strings are expected, and an integer leaks in
due to a !!imp 3, the transform could also solve this. If it was
coded.

In any case, *if* any kind of complicated "implicit typing" is needed,
it cannot and should not be some sort of hack limited thingy. Rather
than try and 'patch' this in the current specification, this should be
punted to the application for now, and, later, full-blown schemas and
YAML transforms that deal with these nasty sorts of issues can be
developed.

syntax:

Although Brian may not have been briefed on all of the details above,
he had a suggestion. When a tag is unspecified, it is treated as:

    !!map for mappings
    !!seq for sequences
    !!str for scalars that are not plain
    !!imp for plain scalars

where the actual "prefix" of the objects is determined by the default
%TAG mechanism. For the plain scalar, the emitter would have to have
special logic: it would compare each node to see if it ends with "imp"
and starts with the default prefix; if so, and if the value matches
the plain scalar production, it could be emitted using the empty
string.

summary:

I recommend the above syntax. Unspecified tags should be global tags
in the current 'default' prefix. References to 'resolution' should be
removed from the spec, and a non-normative example should be presented
that describes how an application-specific transform can be used to
meet most of these needs.
The example's sole purpose is to remind developers that while they can
do whatever transform they wish... they should stay within the bounds
of the YAML Representation model. That is the sole point, and this
extra reminder is the only thing that is needed to replace the
'Resolve' section of the current spec.
```
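The proposal above can be sketched in Python. Everything here is a hypothetical illustration rather than an existing parser API: `default_tag` mimics the parser filling in `!!map`, `!!seq`, `!!str`, or `!!imp`; `transform_imp` is one possible application-specific transform; and `load_mapping` separates a duplicate key in the YAML representation from a failure to transform into native Python keys.

```python
YAML_PREFIX = "tag:yaml.org,2002:"
MAP, SEQ, STR, IMP = (YAML_PREFIX + name for name in ("map", "seq", "str", "imp"))


class DuplicateKey(ValueError):
    """Two keys are equal in the YAML representation (same tag, same text)."""


class TransformFailure(ValueError):
    """Keys are distinct in YAML but collide after the native transform."""


def default_tag(kind, plain=False):
    """Tag the parser would report when the document specifies none."""
    if kind == "mapping":
        return MAP
    if kind == "sequence":
        return SEQ
    return IMP if plain else STR


def transform_imp(text):
    """Hypothetical application rule: try int, then float, else keep the string.

    Note: int("02") is decimal 2 here, not octal; it still collides with "2".
    """
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def load_mapping(pairs):
    """Build a dict from [((tag, text), value), ...] with scalar keys."""
    seen = set()
    for key, _ in pairs:
        if key in seen:  # !!imp compares like !!str: by tag and string value
            raise DuplicateKey(key)
        seen.add(key)
    native = {}
    for (tag, text), value in pairs:
        key = transform_imp(text) if tag == IMP else text
        if key in native:  # e.g. 23 == 23.0 in Python
            raise TransformFailure((tag, text))
        native[key] = value
    return native


print(default_tag("scalar", plain=True))  # tag:yaml.org,2002:imp
print(load_mapping([((STR, "23"), "string value"), ((IMP, "23"), "integer value")]))
# {'23': 'string value', 23: 'integer value'}
```

With this sketch, `!!imp "2"` and `!!imp "02"` as keys pass the representation check but raise `TransformFailure`, as do `!!imp 23` and `!!imp 23.0`; two `!!imp "2"` keys raise `DuplicateKey` before any transform runs.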
 Re: [Yaml-core] Unspecified Tags and the rise of an Imp. From: Clark C. Evans - 2004-09-07 13:59:48
```
summary:

- We add to the specification 'tag:yaml.org,2002:imp', which stands
  for "Unspecified Implicit Scalar". The semantics of the !!imp tag
  are exactly the same as !!str, that is, two !!imp scalars are
  considered equal if and only if their string values are equal.

- The parser shall report unspecified tags for plain scalars as if
  '!!imp' had been provided. In particular, lacking a default %TAG
  directive, the default tag is 'tag:yaml.org,2002:imp'.

- The parser shall report unspecified tags for mappings, sequences,
  and all other scalars as if '!!map', '!!seq', and '!!str' had been
  used. The specification is updated to call these "Unspecified
  Mapping", "Unspecified Sequence", and "Unspecified Common Scalar"
  respectively.

- The only difference between !!str and !!imp, from the spec's point
  of view, is that they are distinct, that is, a !!str x is not equal
  to a !!imp x. Other than that, they are both unicode string values;
  as a result, a YAML document that only uses unspecified tags has a
  "Complete" YAML Representation.

- The 'resolution' phase in section 3.3 of the specification is
  simply removed. It is not needed. It could be replaced with a
  paragraph which says:

    "An application may choose to convert !!imp or any other
    built-in tag to one or more other tags of the same kind, or
    collapse an !!imp so that it is treated the same as a !!str, or
    do any other graph modifications it wishes. However, any such
    transformation will produce a different YAML Representation, and
    thus, if round-tripping is important, a reverse transformation
    before saving the information is good practice. In any case, a
    rewrite of the graph before loading should only take into
    account information found in the YAML Graph Representation
    Model. In particular, key order, tag directives, comments, and
    other presentation level details should be ignored."

This change is required since the 'resolution' phase is too limited to
be useful, and keeping it in the specification is unnecessary. While
the paragraph above is strictly not necessary, it is probably helpful
to include, specifically to remind application or library developers
that if they choose to transform the data, they should: (a) provide an
inverse transform (to put the toys away), and (b) only use info in the
graph model.

That's it. The rest of the document below is a motivation and
justification for the above change set; in particular, showing that
the resolution phase is too weak to be generally useful, and that
making implicits a syntax-only operation allows for greater rigor when
describing the behavior of a YAML Process.

Cheers!

Clark

On Mon, Sep 06, 2004 at 10:57:49PM -0400, Clark C. Evans wrote:
| There are several options for handling unspecified tags, which I'd like
| to review here in the hope of guiding a discussion on this matter.
|
| background:
|
| A scalar tag is 'recognized' if, given two scalars, the YAML
| processor can always determine if they are equivalent. If a YAML
| processor has enough information to convert a scalar having that tag
| value to a canonical form, it is considered 'recognized'. For
| example, given a private tag 'foo', and two scalars "!foo 1" and
| "!foo 1", we know that they are equal. Two other scalars,
| "!foo 01" and "!foo 1", appear to be different; however, until more
| information about the tag is available that confirms this belief,
| the tag 'foo' is said to be unrecognized.
|
| A scalar tag is 'available' if the operating environment
| (programming language plus operating system) has a native or
| built-in version of the tag. While it is nice that a tag may be
| available, it is sufficient, for the purposes of building a YAML
| Representation, that all tags be recognized.
| For example, if the float tag is recognized, then it is
| always possible to tell if two scalars are equal, like "!float 2300"
| and "!float 2.3e3", even if the current platform doesn't have direct
| support for IEEE floating point numbers.
|
| In the current specification, specifically, in section 3.3
| "Completeness", http://www.yaml.org/spec/#model-complete,
| there is a diagram (for scalars):
|
|       *  -> Ill-Formed
|       |
|       v
|     Well-Formed -> Unresolved -> Partial Representation
|       |                            ^
|       v                           /
|     Resolved    -> Unrecognized
|       |
|       v
|     Recognized  -> Unavailable -> Complete Representation
|       |
|       v
|     Available   -----------------> Native Representation
|
| The difference between a 'Partial Representation' of a YAML document
| and a 'Complete Representation' depends pretty much on the idea of
| 'Recognition': does the YAML Processor know enough about the tag
| that it can determine equality? If so, a complete representation
| (and possibly even a native representation) of the associated nodes
| is possible. Otherwise, a generic node, which may not enforce the
| particular equality constraints of the tag, has to be used. In this
| inferior form, 'Partial Representation', duplicate nodes with
| different character values cannot be detected; hence, certain
| constructs which may look correct may not, in fact, have a valid
| YAML Representation.
|
| Due to our syntax trick that allows tags to be omitted, a special
| step, called 'resolution', seems to be required in the above diagram.
| During this step, an unspecified tag (missing in the document) is
| filled in with a particular tag as determined by the application.
|
| claim:
|
| I assert that if we change the operation of a YAML Parser to fill in
| an unspecified tag with a particular value, this second step,
| "resolution", is not required, and in fact unnecessarily complicates
| the parser and loader API.
| There are several options:
|
| private tags:
|
| During parse-time, one could provide a canned tag in places
| where it has been omitted.
|
|     !unspecified-mapping
|     !unspecified-sequence
|     !unspecified-scalar
|     !unspecified-implicit   # yes, implicits are scalars, I know
|
| And basically, the parser assigns the above tags to unspecified tags
| based on the kind, with an implicit exception for plain scalars with an
| unspecified kind, !unspecified-implicit. In this manner, these two
| documents below are identical:
|
|     ---
|     plain:
|       - 'single'
|       - "double"
|       - |-
|         literal
|       - >
|         folded
|
|     --- !unspecified-mapping {
|       !unspecified-implicit "plain":
|         !unspecified-sequence [
|           !unspecified-scalar "single",
|           !unspecified-scalar "double",
|           !unspecified-scalar "literal",
|           !unspecified-scalar "folded"
|         ]
|     }
|
| That is, both of these documents have the same Partial
| Representation; note that the !unspecified-implicit in the second
| example uses a double-quoted style. This simple proposal completely
| does away with the tag resolution process; the process moves on to
| tag recognition. A problematic example, using the explicit syntax:
|
|     ---
|     !unspecified-scalar "23" : string value
|     !unspecified-implicit "23" : integer value
|
| This document would load just fine; none of these keys are exactly
| equal (due to the rules above). Therefore, this document has a
| Partial Representation. However, unless an application provides
| a way to test for equality (which a native binding satisfies), there
| is no way to know if this has a Complete Representation. A Python
| binding would probably convert the 'unspecified-scalar' values into
| a string, and the 'unspecified-implicit' into an integer, giving a
| Complete Representation.
|
| By contrast, here is a document with a Partial Representation
| but not a Complete Representation (since both key nodes are
| equivalent, and thus we have an exception):
|
|   ---
|   !unspecified-implicit "2" : integer, decimal format, value 2
|   !unspecified-implicit "02" : integer, octal format, value 2
|
| An example of a document that doesn't even get to be well-formed
| is simple to produce; since both of these key nodes have the
| same tag and the same character value they are equal, and thus
| the YAML document is illegal:
|
|   ---
|   !unspecified-implicit "2" : integer, decimal format, value 2
|   !unspecified-implicit "2" : integer, decimal format, value 2
|
| For this set of examples, a null tag was not needed, and
| type 'resolution' was not necessary. There are more complicated
| cases, to be presented later, but thus far, the case has been
| made that resolution is an unnecessary step.
|
| public, well-known tags:
|
| It is not even necessary to use private tags for this purpose;
| instead of "unspecified-" tags we could use the current tags
| in the YAML specification, plus an additional one, "imp":
|
|   - tag:yaml.org,2002:map # Unspecified Mapping
|   - tag:yaml.org,2002:seq # Unspecified Sequence
|   - tag:yaml.org,2002:str # Unspecified Literal String
|   - tag:yaml.org,2002:imp # Unspecified Implicit String
|
| The equivalence is identical to the ones above, only that
| it's a bit easier to read (given #11):
|
|   ---
|   plain:
|     - 'single'
|     - "double"
|     - |-
|       literal
|     - >
|       folded
|
|   --- !!map {
|     !!imp "plain":
|       !!seq [
|         !!str "single",
|         !!str "double",
|         !!str "literal",
|         !!str "folded"
|       ]
|   }
|
| Anyway, let us review the previous examples, only this time using the
| public !!imp tag.
| This next example has a Complete Representation
| since !!imp is recognized and has an equivalence defined via string
| comparison on its value:
|
|   ---
|   !!str "23": string value
|   !!imp "23": integer value
|
| I could give !!imp a native binding by converting values like
| this to an integer value on the way in, and dumping as a !!imp
| on the way out. A bit convoluted, but Imps are known to be evil.
| The next example is a bit more complicated:
|
|   ---
|   !!imp "2" : integer, decimal format, value 2
|   !!imp "02" : integer, octal format, value 2
|
| By the above logic, this also has a Complete Representation; however,
| I can't load it into a native binding because it creates a duplicate
| key. So, as it turns out, this isn't all that nice of a solution. It
| doesn't work because we've assumed some semantics for !!imp, which
| are incompatible with how my Python parser would like to use !!imp.
| However, this problem is limited to !!imp; in particular,
| !!str, !!map, and !!seq work as expected:
|
|   ---
|   !!str "2" : string, one character long
|   !!str "02": string, two characters long.
|
| In summary, a public generic !!imp type works all the way up until
| you actually load it; then it fails since the bindings one would
| like to use aren't compatible.
|
| resolution:
|
| Let's take the position that !unspecified-implicit and !!imp are just
| too magical because they don't reflect the application's operating
| environment. Rather than having the parser report a !magic tag, it
| perhaps should return a NULL value and we should ask the application
| to provide us with the appropriate type for each and every node.
| This is called implicit tag resolution.
|
| Unfortunately, this has problems as well. Consider this document,
| which should clearly be not well-formed, but since the key nodes do
| not yet have a tag, we cannot compare them yet:
|
|   ---
|   a: this is
|   a: a duplicate, no?
|
| So, we could ask an application to provide these types. It would make
| two calls to the application, but this uncooperative application
| returns two _different_ tags:
|
|   ---
|   !a1 a: this is key 'a#1'
|   !a2 a: this is key 'a#2'
|
| So, as it turns out, the keys weren't actually duplicates! This
| seems unintuitive, but it is one effect of 'resolution'; unless, of
| course, you add a special rule that prevents this case. For example,
| we could specify that a regular expression is the 'only' rule that
| can be used during resolution (otherwise, the smart-ass application
| might use the tag's value for its type or some other nondeterministic
| reasoning). So, assume this:
|
|   ---
|   23: integer
|   23.0: float
|
| With a very strict regular expression, I register things with
| a decimal point to be floating point types, and things with just
| digits to be integers. So, the result after resolution
| would be equivalent to:
|
|   ---
|   !int 23: integer
|   !float 23.0: float
|
| All is good: it matches the model, the tags are recognized, the
| representation is complete. Except, of course, that this is an error
| in Python. Python uses equivalence across all numeric values, so the
| above is actually a duplicate key. The problem is, as much as you
| _think_ these are separate unrelated types, they aren't. The notion
| of equivalence is tightly tied between them in Python: the integer 1
| is equal to the boolean True, which is equal to the float 1.0
|
|   >>> { True: 'hi', 1: 'hi', 1.0: 'bing' }
|   {True: 'bing'}
|
| So, while you may _assume_ that these could and should be different
| types, you'd be wrong. In reality, an !!imp unified type is
| probably the best fit for this monster... at least in Python.
|
| This also fails when the application is only expecting string
| values within a particular context:
|
|   ---
|   integers-are-ok:
|     3: ok
|     "2": ok, converted to integer, right?
|   integers-not-ok:
|     "2": ok
|     2: should be duplicate key, but it isn't.
|
| One last example where resolution falls flat on its face
| is when you want to automatically 'upgrade' components as
| you are loading them. For example, suppose a long time
| ago I was writing my circles like:
|
|   ---
|   circles: # x, y, radius
|     - (2,3,10)
|     - (3,2,2)
|
| So, I install a regex matching the above expressions to map
| to my !circle type.
|
|   ---
|   circles: # x, y, radius
|     - !circle (2,3,10)
|     - !circle (3,2,2)
|
| But really, even though it was a "convenient" display, it's
| not very good to work with. What I'd rather have is the
| "resolution" process return to me a bunch of mappings:
|
|   ---
|   circles: # x, y, radius
|     - !circle { x: 2, y: 3, 'r': 10 }
|     - !circle { x: 3, y: 2, 'r': 2 }
|
| If I'm going to have a really nice 'ambiguous' display type,
| why can't it be a mapping in memory and a string in the
| YAML presentation?
|
| In summary, while resolution kinda works, it is a fly-swatter hack
| that turns out to not solve the core needs of developers who are
| dealing with implicit types; at least not the hard parts like
| equivalence, expecting particular values in particular slots, and
| different versions of objects.
|
| giving up:
|
| The only reasonable answer to this problem... is not to solve it;
| at least not directly with the parser. Converting YAML types
| into native types may need to involve tricky calculations about
| equivalence, versions, and other concerns. Trying to force-fit
| this into a simplistic 'resolution' process just doesn't work,
| and actually just makes things worse.
|
| The solution proposed is to mark all unspecified tags with the !!map,
| !!seq, !!str, and !!imp global tags. In this manner, if a graph
| doesn't use any explicit tags, it always has a complete
| representation on which schema validation, ypath, and other tools can
| operate. It can also round-trip these objects just perfectly, and
| maintain the 'plain scalar'-ness. However, that's as far as it goes.
|
| If a programming environment _happens_ to have objects similar
| enough to !!map, !!seq, and !!str (which Python does), a direct
| binding of these tags to native objects is possible (for Python, the
| dict, list, and string). The funny !!imp can be handled as a
| 'generic YAML object' if necessary, and that's OK. For a Python
| binding, I'd be tempted to implement "imp" as its own object, with
| special comparison operations that match expectations.
|
| In summary, !!imp is a complicated beast... without proper tools to
| make sure that it stays under control, we must rely upon the
| application to be smart about how it wishes to use these things.
| There isn't a quick pill here, and the weak version of 'type
| resolution' is not only inflexible, but it doesn't solve the core
| issues. In short, it's an extra burden without commensurate return
| on investment.
|
| transform:
|
| An extended option, available to applications, is to actually
| *transform* the incoming YAML graph into another representation.
| Perhaps one that does a conversion like:
|
|   !!imp "23" => !python!int "23"
|   !!imp "23.0" => !python!float "23.0"
|
| or even converts '(2,3,20)' into an object (which happens to have
| a !!map YAML representation). This application-specific transform
| is then capable of handling context, for example, in this case:
|
|   --- !!map
|   !!imp 23: integer
|   !!imp 23.0: float
|
| The transform may have to change the mapping above to one that
| implements the YAML equality; or, it may just choose to raise
| an exception. But this exception is different from the one
| before: this exception is a failure to transform. It means
| that the fella who wrote the converter didn't cover a specific
| use case of !!imp. And that's OK. With a few complaints from
| her customers, it'll be patched up, or they will be told:
| "Don't Do That".
|
| For the case where only strings are expected, and an integer
| leaks in due to a !!imp 3, the transform could also solve this.
| If it was coded.
|
| In any case, *if* any kind of complicated "implicit typing" is
| needed, it cannot and should not be some sort of hack limited thingy.
| Rather than try and 'patch' this in the current specification, this
| should be punted to the application for now, and, later,
| full-blown schemas and YAML transforms that deal with these nasty
| sorts of issues can be developed.
|
| syntax:
|
| Although Brian may not have been briefed on all of the details
| above, he had a suggestion. When a tag is unspecified, it is
| treated as:
|
|   !!map for mappings
|   !!seq for sequences
|   !!str for scalars that are not plain
|   !!imp for plain scalars
|
| Where the actual "prefix" of the objects is determined by
| the default %TAG mechanism.
|
| For the plain scalar, the emitter would have to have special logic:
| it would check each node to see if its tag ends with "imp" and
| starts with the default prefix; if so, and if the value matches
| the plain scalar production, the tag could be emitted as the
| empty string.
|
| summary:
|
| I recommend the above syntax. Unspecified tags should be
| global tags in the current 'default' prefix. References
| to 'resolution' should be removed from the spec, and
| a non-normative example should be presented that describes
| how an application-specific transform can be used to meet
| most of these needs. The example's sole purpose is to remind
| developers that while they can do whatever transform they
| wish... they should stay within the bounds of the YAML
| Representation model. That is the sole point, and this
| extra reminder is the only thing that is needed to
| replace the 'Resolve' section of the current spec.

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
```
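The application-specific transform described in the quoted proposal can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the regular expressions, the `resolve_imp` and `transform_mapping` helpers, and the `TransformError` exception are hypothetical names invented here, not part of any YAML library or of the specification. The sketch converts `!!imp` key text to native values and reports a failure to transform when two keys collide under Python's own equality.

```python
import re

# Hypothetical matching rules for plain ("!!imp") scalars. These patterns
# are illustrative assumptions, not taken from the YAML specification.
OCT_RE = re.compile(r"^0[0-7]+$")
INT_RE = re.compile(r"^[-+]?(0|[1-9][0-9]*)$")
FLOAT_RE = re.compile(r"^[-+]?[0-9]+\.[0-9]*$")


class TransformError(ValueError):
    """Raised when the transform cannot build a valid native mapping."""


def resolve_imp(text):
    """Convert the text of a plain scalar to a native Python value."""
    if OCT_RE.match(text):
        return int(text, 8)
    if INT_RE.match(text):
        return int(text)
    if FLOAT_RE.match(text):
        return float(text)
    return text  # anything else stays a string


def transform_mapping(entries):
    """Build a dict from (tag, key_text, value) triples.

    Keys tagged "!!imp" are converted with resolve_imp(); all other keys
    are kept as strings. A collision under Python equality is reported
    as a failure to transform instead of silently overwriting a value.
    """
    result = {}
    for tag, key_text, value in entries:
        key = resolve_imp(key_text) if tag == "!!imp" else key_text
        if key in result:
            raise TransformError(f"duplicate key after transform: {key_text!r}")
        result[key] = value
    return result


# The string "23" and the integer 23 remain distinct keys.
mixed = transform_mapping([
    ("!!str", "23", "string value"),
    ("!!imp", "23", "integer value"),
])
```

With these assumed rules, the pairs `"2"`/`"02"` and `23`/`23.0` from the examples above both raise `TransformError`, because Python treats the resulting keys as equal. Whether to raise or to substitute a mapping type with YAML's own equality is a choice left to the application.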
 Re: [Yaml-core] Equality issues From: Clark C. Evans - 2004-09-07 03:21:56 ```On Tue, Sep 07, 2004 at 02:22:14AM +0100, David Hopwood wrote:
| The questions below came up when I was finishing off a mathematical model
| of YAML representations (as promised on IRC).
|
| According to 3.2.1.3,
|
| # During serialization, equal scalar nodes may be treated as if they were
| # identical. In contrast, the separate identity of two distinct, but
| # equal, collection nodes must be preserved.

In Python, two equal scalar values (which are immutable) can be collapsed
into the same memory address, while collection objects (not immutable)
always have different addresses.

| I think it would be clearer if the spec said explicitly that the
| representation model *does not* preserve distinctions between scalar nodes
| with the same string and tag, and *is not required to* preserve
| distinctions between equal scalar nodes. (Obviously, an implementation
| can only collapse scalar nodes that are equal but are represented by
| different strings if the tag has been recognized.)

nice.

| Also in 3.2.1.3:
|
| # Two nodes must have the same tag and value to be equal. Since each tag
| # applies to exactly one kind, this implies that the two nodes must have
| # the same kind to be equal.
|
| Actually this doesn't follow: it is perfectly possible to have
| two nodes with the same tag and value, but different kinds.

In the model (I don't know where) it is specified that tags are
associated with one-and-only-one kind.

| If both sequences and mappings are treated as functions,
| and can therefore have the same value, that is.

Yes, but then we'd have to define tag:yaml.org:whole-number ;)

Clark

P.S. Glad to see you are picking your way through Equivalence. It's
the hardest part to specify correctly. We had a more mathematical
formulation in earlier drafts, but it got replaced with this more
readable, less formal version. Anyway, Equivalence is the core
problem with how unspecified tags should be handled.
```
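The identity rule discussed in this message (equal scalars may be merged, equal collections must stay distinct) can be illustrated with a small Python sketch. The helper `count_serialized_nodes` is hypothetical and written only for this illustration. It assumes that Python's `list` and `dict` stand in for collection nodes, that everything else is a scalar, and that the Python type stands in for the tag.

```python
def count_serialized_nodes(nodes):
    """Count the distinct nodes a serializer would have to keep apart.

    Scalars are merged when their type and value are equal.
    Collections (list, dict) are kept apart by object identity,
    even when they compare equal.
    """
    seen_scalars = set()
    seen_collections = set()
    for node in nodes:
        if isinstance(node, (list, dict)):
            seen_collections.add(id(node))           # identity, not equality
        else:
            seen_scalars.add((type(node), node))     # type plays the tag's role
    return len(seen_scalars) + len(seen_collections)


first = [1, 2]
second = [1, 2]          # equal to `first`, but a separate object

# One merged scalar "a" plus two distinct collections.
total = count_serialized_nodes(["a", "a", first, second, first])
```

Here `first == second` holds while `first is not second`, so the two lists count as separate nodes, whereas the two occurrences of `"a"` collapse into one.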
 Re: [Yaml-core] Equality issues From: Oren Ben-Kiki - 2004-09-07 17:10:34 ```On Tuesday 07 September 2004 04:22, David Hopwood wrote:
> I think it would be clearer if the spec said explicitly that the
> representation model *does not* preserve distinctions between scalar
> nodes with the same string and tag, and *is not required to* preserve
> distinctions between equal scalar nodes.

... and _is_ required to preserve the distinction between equal
collection nodes. Yes, that would be better.

Have fun,

Oren Ben-Kiki
```