
## Re: [Yaml-core] Equality of YAML nodes

 Re: [Yaml-core] Equality of YAML nodes From: Oren Ben-Kiki - 2009-09-18 21:17:10 ```On Fri, 2009-09-18 at 21:33 +0900, Osamu TAKEUCHI wrote: > Hi all, > > I have some questions and opinions on the definition of > YAML data node's equality.. > > At first, I want to confirm that, a description > "two nodes are equal only when some condition is met" means > "two nodes are not equal when some condition is not met." Yes, it is an if-and-only-if, at least as far as the spec is concerned. > Then, I start the discussion with the equality of recursive > collection nodes. The problem you describe is widely discussed in the literature as the (directed) graph isomorphism problem, and it occurs in many fields - type systems for functional languages (close to YAML's case), chemistry (are two molecules the same?), Prolog (unification algorithms), etc. The general problem may be exponential (it is NP, maybe even NP-complete). However, even naive algorithms do extremely well in practice, and only get exponential on very pathological cases (your examples aren't tricky enough to cause problems to a naive algorithm:-). > Let me confirm that the following two nodes > are not equal to each other. > > %YAML 1.2 > --- > &A { *A } > --- > &A { { *A } } > ... Correct, they are not equal; one is a single mapping that contains a key pointing to itself and a null value. The other has two mappings: the first contains a key that is a second mapping that contains a key pointing to the first, both keys having null values. There's no 1-1 mapping between them (the first has only one mapping, the second has two distinct ones). > Another reason is that if they should be equal, I can not > find a good way to compare such recursive nodes without > causing an infinite loop in my library. Mapping two global (directed) graphs (that contain cycles) to each other can be done without infinite cycles. > Next, I thought the following two are equal to each other. 
> > %YAML 1.2 > --- > &A{ *A, { *A } } > --- > &A{ { *A }, *A } > ... Yes, these are equal, because (as you point out) order of keys does not matter, so you can construct a 1-1 mapping between them. > Then, what do you think about the next YAML document? > > %YAML 1.2 > --- > &A { &B { *B, *A }, *A } > ... > > I thought the mapping *A and mapping *B are equal to each > other because their graph topologies are the same, except > for the order of appearance in the document. Yes, they are the same. > If this > understanding is correct, a YAML parser should reject the > above input because a mapping can only contain unique keys > in YAML. Yes. > Note that such an example can be easily expanded > into further complex forms. Oh yes. The graph theory people can show you examples that will curl your hair. Of course these are "pathological" YAML, since (1) using mappings for keys is not very common, (2) having cycles in mappings is not very common, and (3) having both at once is even less common. > When I started implementing an equality evaluator that > works with such complicated cases, I found it is a > quite tough work. I wouldn't try to reinvent the wheel here. You can easily adapt one of the simple but effective algorithms already developed for this (google is your friend). Again, these will do the job quickly 99.9999% of the time, unless someone invests a lot of effort in creating truly pathological mapping keys. > So, I wondered if such a complicated comparison is > really implemented in other libraries... > Obviously, the answer was no. Alas... > I guess these results are not necessarily intended by the > library's implementer but just due to the way on which > ruby runtime compares two instances of ruby's Hash class. Yes. > So, my first several questions are: > Is there any YAML library that employs the strict node > comparison that is defined in the specification? 
Probably not :-( > Must I implement such a comparison for my library > if I want to support YAML 1.2 specification? Technically, yes... but it is OK for a library to punt and pass the burden to the native data type implementation (which most YAML libraries do). > In which application, is such a strict comparison valuable? It is valuable wherever data moves between platforms: it ensures portability. We do not want to tie YAML data to a specific platform or application. > Ok, I have implemented an equality evaluator that works > correctly for all cases I have tested, after quite a bit > of struggling. Great! > But then, another concern arose. With such an equality > evaluation, it is tricky to build a node tree defined > in the next YAML > ... > I have no idea to avoid this problem, do you? I didn't understand the problem. You have shown intermediate steps that have duplicate keys; but the final result is OK, which is all that matters. Can you clarify? > The following example is again from ruby 1.8.7's > YAML library. > > require 'yaml' > > # A class with no members > class Test > end > > # Create a hash with instances of Test > p hash = { Test.new => 1, Test.new => 2 } > > # {#=>1, #=>2} > > # Note that two instances of Test class are not > # equal to each other, because ruby compares > # class instances, by default, by their identity. As opposed to YAML, yes, which will cause problems when dumping it. Consider what would happen if the data would be loaded by a language that compares the content by default... Anything based on Prolog and/or LISP for example. > # Convert the hash into YAML. > print yaml = YAML.dump(hash) > > # --- > # !ruby/object:Test ? {} > # : 1 > # > # !ruby/object:Test ? {} > # : 2 > > ### Oops, this library has such an obvious bug! Well, technically, the author of the Test class has a bug. If he's comparing test objects based on their identity, he should serialize this identity somehow to make it explicit (!ruby/object:Test { ! 
ruby/MagicDefaultIdentityUniqueHash Hash: 0x7ff2c9f0 } or whatever). Objects with the same hash should be serialized as anchors and references. You may even justify loading the "UniqueHash" tag into whatever the default unique identity hash would be for calling "new Test" instead of preserving it exactly. > Should the spec really force a library to reject such an > input, even though that will not be expected by most of > users? I'd say "yes", and even for practical reasons. First, like I said, languages that take a more "data oriented" view of the world (say, Haskell, LISP, Prolog) will cheerfully consider these keys equal, so loading this data to them will fail. Second, in languages that do this sort of thing, the typical pattern is to say something like: a = new Test b = new Test mapping[a] = valueForA mapping[b] = valueForB c = someCondition ? a : b v = mapping[c] That is, the keys are kept _in addition_ to the mapping and are then used to locate values. However, this idiom does not translate to dumping/loading the mapping to a file: a = new Test mapping[a] = valueForA yamlText = dumpToYaml(mapping) mapping = loadFromYaml(yamlText) v = mapping[a] # Fails! Or, in a different program (or instance): mapping = loadFromYaml(yamlText) a = ... # How do I find 'a'? Maybe: a = mapping.findAKeyForTheValue(valueForA) # Hope there's only one... So this mapping is *practically* unusable as it is, once it is de/serialized to YAML. Having the YAML library reject the file is better, as it forces the author to consider what he is trying to achieve and do _that_. > At this point, I started considering the purpose of defining > YAML node's equality in the specification. In my opinion, the > purpose is mainly for rejecting a mapping node because of its > duplicated keys, or silently neglects the new entry when a > !!merge key is processed. Right. 
> Another purpose is for allowing a library to represent some > equal scalar nodes by a single identical node, as is written > in the specification. Yes. > > A YAML processor may treat equal scalars as if they were > > identical. > I have not understood the importance of this description, though. It is useful to declare that scalars don't have "identity" as such, but "objects" do. That is, "objects" have identity, but scalars don't - e.g., the number 4 has no "identity", but the "!!point { x: 4 }" does. This maps to the way 99% of the programs model their data (you can change the 'x' coordinate of the point and it will remain the "same" point, but you can't change the number 4 to the number 5, ever). In practical terms this allows YAML libraries to avoid having to re-serialize large scalars (e.g., binary data, texts, etc.). Not the _most_ useful thing in the world, but it does have its place. > I can not think of any other meaning of nodes equality in the > parsing and composition stages. Then, how about the meaning after > the composition? After composing native objects, YAML has no say at all about anything. The application may wreak whatever havoc it wants on these structures; that's its job after all. > If we take the definition of nodes' equality > strictly, we can never construct native objects from YAML nodes. I fail to see how that follows. The YAML document is intended to represent a stand-alone object. If you do "b = YAML.load(YAML.dump(a))" you do _not_ expect that "b" and "a" would have the same memory address (that is, have the same identity). You _do_ expect them to have the same content (that is, be equal). > Note that, as seen above, constructing a ruby's hash or an > instance of Test class from a YAML mapping node did not preserve > the data equality. In order for YAML to work, the representation of the object to YAML must follow YAML's rules. The representation of "Test" objects as YAML did _not_ follow these rules. 
The author of the Test class should have overridden the default serialization-to-YAML to resolve it. As long as he hasn't, he has a bug, and this isn't YAML's fault. > Consequently, I feel the current definition of node's equality > is not suited to the real applications. I am unconvinced... > My opinion is as following. [Basically make equality be defined on a per-tag basis, using the native implementation if needed] This would greatly weaken portability of YAML data between systems. Basically, each file would be tied to the specific platform and even specific application. A YAML library on a different platform or using a different application would have no way of knowing what the semantics is. Writing "generic" YAML tools (that know nothing about the tags) would become much more difficult, at times impossible. We'd rather not go down this road. The use case you demonstrated (using "Test" instances as keys) is pathological; I would be greatly suspicious of a system that used such a method, and I definitely wouldn't want to bend YAML out of shape to handle it. On the other hand, the following is a very reasonable practice, and much more common than your (IMO unreasonable) "Test" case: --- calendar: { year: 2009, month: 12, day: 1 }: ... { year: 2009, month: 12, day: 2 }: ... cars: { manufacturer: ford, model: gt }: ... { manufacturer: jaguar, model: xk }: ... I definitely would be surprised to learn it is OK to see two different ford GT entries in this mapping! > What do you expect to a YAML library for nodes' equality? YAML libraries may rely on the implementation of the native data type to do the right thing. They also typically provide a way for the author of the native data type to override the way it is de/serialized from/to YAML. It isn't the job of the library to ensure the right thing is done in this case (except for the common standard types provided by the platform/language). 
It is the responsibility of the author of the data type to ensure that the de/serialization to YAML correctly follows YAML rules. YAML libraries may also do all the heavy lifting themselves. As you point out, this isn't as easy as it seems on first sight; but there's plenty of code/algorithms/libraries out there that can help. > What do you expect to the YAML specification for nodes' equality? Just like it is right now :-) > Thank you for reading my long-long email and for your possible > response to it. Not at all; clarifying these issues is what this list is for. I hope I did manage to clarify the issue. Oren. ```
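The "1-1 mapping" test described in the reply above can be sketched as a naive backtracking search. This is only an illustration under stated assumptions: the `Node` class and the helpers `match`, `match_seq`, `match_pairs`, `nodes_equal?`, `null_node` and `map_node` are hypothetical names invented here, not the API of any YAML library, and the search is exponential in the worst case, as the reply warns.

```ruby
# Hypothetical node model (not a real YAML library API). A plain class keeps
# Ruby's default identity-based hash/eql?, so nodes can be used as Hash keys
# without Ruby trying to walk a cyclic structure.
class Node
  attr_reader :tag, :kind, :content

  # content: String for scalars, Array of nodes for sequences,
  # Array of [key, value] pairs for mappings.
  def initialize(tag, kind, content)
    @tag = tag
    @kind = kind
    @content = content
  end
end

# Tries to extend the 1-1 pairing in state = [a_to_b, b_to_a] so that a pairs
# with b; calls the block once per consistent extension until one succeeds.
def match(a, b, state, &done)
  fwd, bwd = state
  return fwd[a].equal?(b) && done.call(state) if fwd.key?(a)
  return false if bwd.key?(b)
  return false unless a.tag == b.tag && a.kind == b.kind
  # Scalars have no identity, so they are compared by content only.
  return a.content == b.content && done.call(state) if a.kind == :scalar
  return false unless a.content.size == b.content.size

  # Record the pairing before descending; this is what stops infinite loops.
  paired = [fwd.merge({ a => b }), bwd.merge({ b => a })]
  if a.kind == :seq
    match_seq(a.content, b.content, paired, &done)
  else
    match_pairs(a.content, b.content, paired, &done)
  end
end

# Sequences are ordered: pair elements position by position.
def match_seq(xs, ys, state, &done)
  return done.call(state) if xs.empty?
  match(xs[0], ys[0], state) { |s| match_seq(xs[1..-1], ys[1..-1], s, &done) }
end

# Mappings are unordered: try every remaining entry of qs for the first
# entry of ps, backtracking when a choice leads to a contradiction.
def match_pairs(ps, qs, state, &done)
  return done.call(state) if ps.empty?
  (key, val), *rest = ps
  qs.each_index.any? do |i|
    other_key, other_val = qs[i]
    remaining = qs[0...i] + qs[(i + 1)..-1]
    match(key, other_key, state) do |s1|
      match(val, other_val, s1) { |s2| match_pairs(rest, remaining, s2, &done) }
    end
  end
end

def nodes_equal?(a, b)
  match(a, b, [{}, {}]) { |_state| true }
end

def null_node
  Node.new('!!null', :scalar, 'null')
end

def map_node
  Node.new('!!map', :map, [])
end

# &A { *A } versus &A { { *A } }
a1 = map_node
a1.content << [a1, null_node]
a2 = map_node
inner = map_node
inner.content << [a2, null_node]
a2.content << [inner, null_node]
puts nodes_equal?(a1, a2) # false: one mapping cannot pair with two

# &A { &B { *B, *A }, *A }: compare *A with *B
a = map_node
b = map_node
b.content << [b, null_node] << [a, null_node]
a.content << [b, null_node] << [a, null_node]
puts nodes_equal?(a, b) # true: pairing A with B and B with A is consistent
```

One design caveat: because collections are paired strictly one-to-one, this sketch also distinguishes an aliased collection from two separate collections with the same content. That may be stricter than the wording of the specification for non-recursive nodes, so treat it as one reading of the rule rather than a reference implementation.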

 [Yaml-core] Equality of YAML nodes From: Osamu TAKEUCHI - 2009-09-18 12:51:55 ```Hi all, I have some questions and opinions on the definition of YAML data node's equality. It became a very long email. I'm sorry about it. The current specification defines the YAML node's equality as the following: > Two nodes must have the same tag and content to be equal. > Since each tag applies to exactly one kind, this implies > that the two nodes must have the same kind to be equal. > Two scalars are equal only when their tags and canonical > forms are equal character-by-character. Equality of > collections is defined recursively. Two sequences are equal > only when they have the same tag and length, and each node > in one sequence is equal to the corresponding node in the > other sequence. Two mappings are equal only when they have > the same tag and an equal set of keys, and each key in this > set is associated with equal values in both mappings. At first glance, this definition of equality seemed to be completely reasonable to me. But with implementing a YAML library, I now have some questions, especially for the equality of collection nodes and that of the nodes which represent some native data object that have their own equality evaluation schemes. At first, I want to confirm that, a description "two nodes are equal only when some condition is met" means "two nodes are not equal when some condition is not met." If not, I want to know the exact meaning. Then, I start the discussion with the equality of recursive collection nodes. Let me confirm that the following two nodes are not equal to each other. %YAML 1.2 --- &A { *A } --- &A { { *A } } ... I thought so, because their node graph topologies are different even though the Tags and the contents are the same if we compare the nodes one by one, without looking at the global graph structures. 
Another reason is that if they should be equal, I can not find a good way to compare such recursive nodes without causing an infinite loop in my library. Next, I thought the following two are equal to each other. %YAML 1.2 --- &A{ *A, { *A } } --- &A{ { *A }, *A } ... The reason is, the order of mapping key does not affect its content. Then, what do you think about the next YAML document? %YAML 1.2 --- &A { &B { *B, *A }, *A } ... I thought the mapping *A and mapping *B are equal to each other because their graph topologies are the same, except for the order of appearance in the document. If this understanding is correct, a YAML parser should reject the above input because a mapping can only contain unique keys in YAML. Note that such an example can be easily expanded into further complex forms. %YAML 1.2 --- &A { &B { *B: *A, *A: *B }: *A, *A: *B } --- &A { &B { *A, *B, &C { *A, *C, *B } }, *C, *A } --- &A { &B { *A, &C { *B, *A } }, *C } ... When I started implementing an equality evaluator that works with such complicated cases, I found it is a quite tough work. So, I wondered if such a complicated comparison is really implemented in other libraries. I tested ruby 1.8.7's YAML library. require 'yaml' p YAML.load("{ 3, 3 }") #1 # {3=>nil} p YAML.load("{ { foo }, { foo } }") #2 # {{"foo"=>nil}=>nil} p YAML.load("{ &A { *A }, *A }") #3 # {{{...}=>nil}=>nil} p YAML.load("{ &A { *A }, &B { *B } }") #4 # {{{...}=>nil}=>nil, {{...}=>nil}=>nil} p YAML.load("&A { &B { *B, *A }, *A }") #5 # {{{...}=>nil, {...}=>nil}=>nil, {...}=>nil} Obviously, the answer was no. Anyway, let me dig into the examples for some degree. The example #1 shows that this library does not reject a mapping with duplicated keys. Instead, it silently overwrites the entries. As discussed in the thread [Yaml-core] YAML ain't a superset of JSON From: Jakob Voss - 2009-06-06 14:05, this is not an official behavior of a YAML parser, though most of JSON parsers go that way. 
>From these examples, we can see how the library evaluates the equality. Example #2 and #3 show that the library is aware of equality if they are not recursive or are the same instance. Example #4 and #5 show that the library is not aware of equality if they are recursive and are not the same instance. I guess these results are not necessarily intended by the library's implementer but just due to the way on which ruby runtime compares two instances of ruby's Hash class. So, my first several questions are: Is there any YAML library that employs the strict node comparison that is defined in the specification? Must I implement such a comparison for my library if I want to support YAML 1.2 specification? In which application, is such a strict comparison valuable? I move on to the next issue. Ok, I have implemented an equality evaluator that works correctly for all cases I have tested, after quite a bit of struggling. But then, another concern arose. With such an equality evaluation, it is tricky to build a node tree defined in the next YAML. %YAML 1.2 --- &A { &B { *B, *A }, *A, null } ... This document expresses a valid node tree without any duplicated keys. However, it is almost impossible to avoid duplicated keys from appearing while composing the mapping node. ok: &A { } ok: &A { &B { } } ok: &A { &B { *B } } ok: &A { &B { *B, *A } } NG: &A { &B { *B, *A }, *A } ok: &A { &B { *B, *A }, *A, null } I have no idea to avoid this problem, do you? The last issue is related to the way the libraries use YAML. Often in YAML libraries, a native object of some class or structure is represented by a YAML's mapping node, whose keys represent the names of the properties or fields and the corresponding values hold their values. The following example is again from ruby 1.8.7's YAML library. 
require 'yaml' # A class with no members class Test end # Create a hash with instances of Test p hash = { Test.new => 1, Test.new => 2 } # {#=>1, #=>2} # Note that two instances of Test class are not # equal to each other, because ruby compares # class instances, by default, by their identity. # Convert the hash into YAML. print yaml = YAML.dump(hash) # --- # !ruby/object:Test ? {} # : 1 # # !ruby/object:Test ? {} # : 2 ### Oops, this library has such an obvious bug! ### I guess it should have been as the following. p yaml = <=>2, #=>1} This behavior (without the obvious bug) is not surprising for most of users of the library. A hash is stored and restored successfully. But, as studied above, this behavior violates the YAML's specification, because the two keys in the mapping node are not unique in the YAML node representation. YAML forces a library to compare two mappings always by the Tags and their contents, even when the nodes represent some instances of classes that have their own equality comparison operators. Should the spec really force a library to reject such an input, even though that will not be expected by most of users? At this point, I started considering the purpose of defining YAML node's equality in the specification. In my opinion, the purpose is mainly for rejecting a mapping node because of its duplicated keys, or silently neglecting the new entry when a !!merge key is processed. Another purpose is for allowing a library to represent some equal scalar nodes by a single identical node, as is written in the specification. > A YAML processor may treat equal scalars as if they were > identical. I have not understood the importance of this description, though. I can not think of any other meaning of nodes equality in the parsing and composition stages. Then, how about the meaning after the composition? If we take the definition of nodes' equality strictly, we can never construct native objects from YAML nodes. 
Note that, as seen above, constructing a ruby's hash or an instance of Test class from a YAML mapping node did not preserve the data equality. So, if the YAML spec determines the node's equality after the composition stage, there will be very few applications where one can make use of YAML. So, I think the definition of data equality after composition stage should be put beyond the scope of YAML specification. Consequently, I feel the current definition of node's equality is not suited to the real applications. My opinion is as following. 1. When a node has a Tag that is not a YAML's standard tag, the YAML parser and composer should evaluate nodes' equality only from the identity of the nodes, because the parser and composer do not know how to compare the data correctly. If the equality in their native data form matters, it should be checked at the construction stage, where the library has full access to the native object and to its equality operator. 2. For scalars with YAML's standard tags, the equality can be safely evaluated by their canonical forms as defined by the current specification. 3. For !!seq and !!map, the YAML parser and composer should evaluate the equality only from the identity of the nodes. This means, even if a mapping node has two collection nodes that have same content, the library should not reject such an input, instead, they should pass through it to the constructor. In most of cases, these collection classes are mapped into the arrays and hashes at the construction stage. The equality of the collection objects is evaluated by the system default equality operators. When the language does not have such array or hash classes, the YAML library might implement collection classes to express those nodes. In such cases, the equality should be evaluated as the following. If two nodes are identical, they are equal. If two nodes have different Tags, they are not equal. 
If nodes are not recursive, their contents are compared one by one to judge equality. If nodes are recursive, the result can be the library dependent. Note that in any case, the equality evaluation is done at the construction stage. A parser and a composer should not evaluate such content based equality. For the second one, I'm not sure if a parser and a composer must be aware of equality just to detect the duplicated keys or not. For the third one, I came to this conclusion because of the following reasons. At first, I considered the usage of !!seq and !!map nodes as map keys, which is the only case the equality matters in the parsing and composing stages. I did not think of many use cases where different instances of collections with same contents should be evaluated to be equal (to have the mapping to be rejected by the YAML processor?). At the same time, the definition given in the current specification seems to cause difficult issues as pointed out above for the recursive nodes. In addition, if we want to express some equal keys in a single mapping (again, for what?), we can always use anchors and aliases. Secondly, as same as the cases where a node has some non standard tag to be mapped into some native data, !!seq and !!map nodes are used as almost always the representations of the instances of array and hash. Even when the YAML specification defines their equality, it is meaningless unless the specification forbids such mapping. My opinion might be due to the lack of case studies of mine. If there is any library that deals with the collection nodes' equality strictly along with the specification and someone is making use of the definition, I want to learn about the case. I know this suggestion is very much controversial and probably contains misunderstandings of mine. So, I would like to listen to the opinions of yours. What do you expect to a YAML library for nodes' equality? What do you expect to the YAML specification for nodes' equality? 
Thank you for reading my long-long email and for your possible response to it. Best, Osamu TAKEUCHI ```
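The gap between identity-based and content-based keys that the `Test` example exposes can be reproduced without any YAML library. The sketch below is illustrative only: `DateKey` is a hypothetical class, loosely modelled on the calendar keys mentioned in the reply, that opts in to content equality by defining `==`, `eql?` and `hash`. `Test` keeps Ruby's default identity comparison.

```ruby
# Default behaviour: two instances are never equal unless they are the
# same object, so both can be keys of one Hash.
class Test
end

# Hypothetical key class that compares by content instead of identity.
class DateKey
  attr_reader :year, :month, :day

  def initialize(year, month, day)
    @year = year
    @month = month
    @day = day
  end

  def to_a
    [year, month, day]
  end

  def ==(other)
    other.is_a?(DateKey) && to_a == other.to_a
  end
  alias_method :eql?, :==

  # Hash lookups use #hash together with #eql?, so both must agree.
  def hash
    to_a.hash
  end
end

by_identity = { Test.new => 1, Test.new => 2 }
by_content = {
  DateKey.new(2009, 12, 1) => 'first',
  DateKey.new(2009, 12, 1) => 'second'
}

puts by_identity.size # 2: distinct objects, distinct keys
puts by_content.size  # 1: equal content collapses into a single key
```

A mapping keyed like `by_identity` is the case that serializes to two content-equal YAML keys, while one keyed like `by_content` already follows the content-based rule the specification describes.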
 Re: [Yaml-core] Equality of YAML nodes From: Oren Ben-Kiki - 2009-09-18 21:17:10 ```On Fri, 2009-09-18 at 21:33 +0900, Osamu TAKEUCHI wrote: > Hi all, > > I have some questions and opinions on the definition of > YAML data node's equality.. > > At first, I want to confirm that, a description > "two nodes are equal only when some condition is met" means > "two nodes are not equal when some condition is not met." Yes, it is an if-and-only-if, at least as far as the spec is concerned. > Then, I start the discussion with the equality of recursive > collection nodes. The problem you describe is widely discussed in literaturem as the (directed) graph isomorphism problem, and it occurs in many fields - type systems for functional languages (close to YAML's case), chemistry (are two molecules the same?), Prolog (unification algorithms), etc. The general problem may be exponential (it is NP, maybe even NP-complete). However, even naive algorithms do extremely well in practice, and only get exponential on very pathological cases (your examples aren't tricky enough to cause problems to a naive algorithm:-). > Let me confirm that the following two nodes > are not equal to each other. > > %YAML 1.2 > --- > &A { *A } > --- > &A { { *A } } > ... No, they are not equal; one is a single mapping that contains a key pointing to itself and a null value. The other has two mappings: the first contains a key that is a second mapping that contains a key pointing to the first, both keys having null values. There's no 1-1 mapping between them (the first has only one mappings, the second has two distinct ones). > Another reason is that if they should be equal, I can not > find a good way to compare such recursive nodes without > causing an infinite loop in my library. Mapping two global (directed) graphs (that contain cycles) to each other can be done without infinite cycles. > Next, I thought the following two are equal to each other. 
> > %YAML 1.2 > --- > &A{ *A, { *A } } > --- > &A{ { *A }, *A } > ... Yes, these are equal, because (as you point out) order of keys does not matter, you can construct a 1-1 mapping between them. > Then, what do you think about the next YAML document? > > %YAML 1.2 > --- > &A { &B { *B, *A }, *A } > ... > > I thought the mapping *A and mapping *B are equal to each > other because their graph topologies are the same, except > for the order of appearance in the document. Yes, they are the same. > If this > understanding is correct, a YAML parser should reject the > above input because a mapping can only contain unique keys > in YAML. Yes. > Note that such an example can be easily expanded > into further complex forms. Oh yes. The graph theory people can show you examples that will curl your hair. Of course these are "pathological" YAML, since (1) using mapping for keys is not very common, (2) having cycles in mappings is not very common, and (3) having both at once is even less common. > When I started implementing an equality evaluator that > works with such complicated cases, I found it is a > quite tough work. I wouldn't try to reinvent the wheel here. You can easily adapt one of the simple but effective algorithms already developed for this (google is your friend). Again, these will do the job quickly 99.9999% of the time, unless someone invest a lot of effort in creating truly pathological mapping keys. > So, I wondered if such a complicated comparison is > really implemented in other libraries... > Obviously, the answer was no. Alas... > I guess these results are not necessarily intended by the > library's implementer but just due to the way on which > ruby runtime compares two instances of ruby's Hash class. Yes. > So, my first several questions are: > Is there any YAML library that employs the strict node > comparison that is defined in the specification? 
Probably not :-( > Must I implement such a comarison for my library > if I want to support YAML 1.2 specification? Technically, yes... but it is Ok for a library to punt and pass the burden to the native data type implementation (which most YAML libraries do). > In which application, is such a strict comparison valuable? Yes, it ensures portability between different platforms. We do not want to tie YAML data to a specific platform or application. > Ok, I have implemented an equality evaluator that works > correctly for all cases I have tested, after quite a bit > of struggling. Great! > But then, another concern arose. With such an equality > evaluation, it is tricky to build a node tree defined > in the next YAML > ... > I have no idea to avoid this problem, do you? I didn't understand the problem. You have shown intermediate steps that have duplicate keys; but the final result is OK, which is all that matters. Can you clarify? > The following example is again from ruby 1.8.7's > YAML library. > > require 'yaml' > > # A class with no members > class Test > end > > # Create a hash with instances of Test > p hash = { Test.new => 1, Test.new => 2 } > > # {#=>1, #=>2} > > # Note that two instances of Test class are not > # equal to each other, because ruby compares > # class instances, by default, by their identity. As opposed to YAML, yes, which will cause problems when dumping it. Consider what would happen if the data would be loaded by a language that compares the content by default... Anything based on Prolog and/or LISP for example. > # Convert the hash into YAML. > print yaml = YAML.dump(hash) > > # --- > # !ruby/object:Test ? {} > # : 1 > # > # !ruby/object:Test ? {} > # : 2 > > ### Oops, this library has such an obvious bug! Well, technically, the author of the Test class has a bug. If he's comparing test objects based on their identity, he should serialize this identity somehow to make it explicit (!ruby/object:Test { ! 
ruby/MagicDefaultIdentityUniqueHash Hash: 0x7ff2c9f0 } or whatever). Objects with the same hash should be serialized as anchors and references. You may even justify loading the "UniqueHash" tag into whatever the default unique identity hash would be for calling "new Test" instead of preserving it exactly. > Should the spec really force a library to reject such an > input, even though that will not be expected by most of > users? I'd say "yes", and even for practical reasons. First, like I said, languages that take a more "data oriented" view of the world (say, Haskell, LISP, Prolog) will cheerfully consider these keys equal, so loading this data to them will fail. Second, in languages that do this sort of thing, the typical pattern is to say something like: a = new Test b = new Test mapping[a] = valueForA mapping[b] = valueForB c = someCondition ? a : b v = mapping[c] That is, the keys are kept _in addition_ to the mapping and are then used to locate values. However, this idiom does not translate to dumping/loading the mapping to a file: a = new Test mapping[a] = valueForA yamlText = dumpToYaml(mapping) mapping = loadFromYaml(yamlText) v = mapping[a] # Fails! Or, in a different program (or instance): mapping = loadFromYaml(yamlText) a = ... # How do I find 'a'? Maybe: a = mapping.findAKeyForTheValue(valueForA) # Hope there's only one... So this mapping is *practically* unusable as it is, once it is de/serialized to YAML. Having the YAML library reject the file is better, as it forces the author to consider what he is trying to achieve and do _that_. > At this point, I started considering the purpose of defining > YAML node's equality in the specification. In my opinion, the > purpose is mainly for rejecting a mapping node because of its > duplicated keys, or silently neglects the new entry when a > !!merge key is processed. Right. 
> Another purpose is for allowing a library to represent some > equal schalar nodes by an single identical node, as is written > in the specification. Yes. > > A YAML processor may treat equal scalars as if they were > > identical. > I have not understood the importance of this description, though. It is useful to declare that scalars don't have "identity" as such, but "objects" do. That is, "objects" have identity, but scalars don't - e.g., the number 4 has no "identity", but the "!!point { x: 4 }" does. This maps to the way 99% of the programs model their data (you can change the 'x' coordinate of the point and it will remain the "same" point, but you can't change the number 4 to the number 5, ever). In practical terms this allows YAML libraries to avoid having to re-serialize large scalars (e.g., binary data, texts, etc.). Not the _most_ useful thing in the world, but it does have its place. > I can not think of any other meaning of nodes equality in the > parsing and composition stages. Then, how about the meaning after > the composition? After composing native objects, YAML has no say at all about anything. The application may wreak whatever havoc it wants on these structures; that's its job after all. > If we take the definition of nodes' equality > strictly, we can never construct native objects from YAML nodes. I fail to see how that follows. The YAML document is intended to represent a stand-alone object. If you do "b = YAML.load(YAML.dump(a))" you do _not_ expect that "b" and "a" would have the same memory address (that is, have the same identity). You _do_ expected them to have the same content (that is, be equal). > Note that, as seen above, constructing a ruby's hash or an > instance of Test class from a YAML mapping node did not preserve > the data equality. In order for YAML to work, the representation of the object to YAML must follow YAML's rules. The representation of "Test" objects as YAML did _not_ follow these rules. 
The author of the Test class should have overridden the default
serialization-to-YAML to resolve it. As long as he hasn't, he has a
bug, and this isn't YAML's fault.

> Consequently, I feel the current definition of node's equality
> is not suited to the real applications.

I am unconvinced...

> My opinion is as following.

[Basically make equality be defined on a per-tag basis, using the native
implementation if needed]

This would greatly weaken portability of YAML data between systems.
Basically, each file would be tied to the specific platform and even
specific application. A YAML library on a different platform or using a
different application would have no way of knowing what the semantics
are. Writing "generic" YAML tools (that know nothing about the tags)
would become much more difficult, at times impossible.

We'd rather not go down this road. The use case you demonstrated (using
"Test" instances as keys) is pathological; I would be greatly suspicious
of a system that used such a method, and I definitely wouldn't want to
bend YAML out of shape to handle it.

On the other hand, the following is a very reasonable practice, and much
more common than your (IMO unreasonable) "Test" case:

    ---
    calendar:
      { year: 2009, month: 12, day: 1 }: ...
      { year: 2009, month: 12, day: 2 }: ...
    cars:
      { manufacturer: ford, model: gt }: ...
      { manufacturer: jaguar, model: xk }: ...

I definitely would be surprised to learn it is OK to see two different
ford GT entries in this mapping!

> What do you expect to a YAML library for nodes' equality?

YAML libraries may rely on the implementation of the native data type to
do the right thing. They also typically provide a way for the author of
the native data type to override the way it is de/serialized from/to
YAML. It isn't the job of the library to ensure the right thing is done
in this case (except for the common standard types provided by the
platform/language).
It is the responsibility of the author of the data type to ensure that
the de/serialization to YAML correctly follows YAML rules.

YAML libraries may also do all the heavy lifting themselves. As you
point out, this isn't as easy as it seems at first sight; but there's
plenty of code/algorithms/libraries out there that can help.

> What do you expect to the YAML specification for nodes' equality?

Just like it is right now :-)

> Thank you for reading my long-long email and for your possible
> response to it.

Not at all; clarifying these issues is what this list is for. I hope I
did manage to clarify the issue.

Oren.
```
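The dump/load problem described in the message above can be sketched in Python. This is only an illustration under stated assumptions: `round_trip` is a hypothetical stand-in for `loadFromYaml(dumpToYaml(mapping))` built on `copy.deepcopy`, and the `Test` and `Day` classes are invented for the example. No real YAML library is involved.

```python
import copy
from dataclasses import dataclass


class Test:
    """No __eq__/__hash__ override, so instances compare by identity."""


@dataclass(frozen=True)
class Day:
    """Frozen dataclass, so instances compare and hash by content."""
    year: int
    month: int
    day: int


def round_trip(mapping):
    # Hypothetical stand-in for loadFromYaml(dumpToYaml(mapping)):
    # every key and value comes back as a freshly built object.
    return copy.deepcopy(mapping)


a = Test()
identity_keyed = round_trip({a: "valueForA"})
# The key in the reloaded mapping is a different object, so 'a' no
# longer finds its value.
found_by_identity = a in identity_keyed

content_keyed = round_trip({Day(2009, 12, 1): "meeting"})
# A content-compared key can be rebuilt from its fields and still matches.
found_by_content = Day(2009, 12, 1) in content_keyed

print(found_by_identity, found_by_content)
```

The content-compared key survives the round trip because anyone who knows the field values can reconstruct an equal key, which is the property the calendar example relies on.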
Re: [Yaml-core] Equality of YAML nodes From: Osamu TAKEUCHI - 2009-09-22 04:38:42
```
Hi all,

I want to complement my post of the other day.

>> Note that, as seen above, constructing a ruby's hash or an
>> instance of Test class from a YAML mapping node did not preserve
>> the data equality.
>
> In order for YAML to work, the representation of the object to YAML must
> follow YAML's rules. The representation of "Test" objects as YAML did
> _not_ follow these rules. The author of the Test class should have
> overridden the default serialization-to-YAML to resolve it. As long as he
> hasn't, he has a bug, and this isn't YAML's fault.

Since we are talking about modifying current YAML, it can not be an
answer to my question to say "this is the YAML's rule". ;)
So, let me discuss whether changing the rules is beneficial for the
users or not at all. :(

I already gave feedback on your question of how a value-oriented
language should treat ruby's Test object. But I'm afraid that was
not self-contained enough.

In my opinion, to have a specific YAML document really portable
between environments, we must have some schema that describes the
structure and the meaning of the nodes. So, there won't be perfect
portability without any pre-exchange of knowledge between the
serializer and the deserializer.

For languages such as ruby and C#, which provide access to the
language's metadata, the definition of a class can be a useful schema
of a YAML document. Imagine we are serializing an instance of a class
SomeClass in C#.

    public class SomeClass
    {
        public int a = 1;
        public string b = "1";
        public double c = 1;
        public Point3D p;
        public utf8u[] texts;

        public override bool Equals(object obj)
        {
            ... // specify how to compare SomeClass objects
        }
    }
    ...
    SomeClass obj = GetSomeClassInstanceWithMeaningfulContent();
    string yaml = YAML.Serialize(obj);

The resulting YAML document will be like the next.

    %YAML 1.2
    --- !SomeClass
    a: 1
    b: 1    # I preferred to have this as "1" for the readability, though.
    c: 1
    p:
      X: 3
      Y: 4
      Z: 2.1
    texts:
      - "..."
      - "..."
      - "..."
      - "..."
      - "..."

Note that we do not have to go the next way.

    %YAML 1.2
    --- !SomeClass
    a: 1
    b: "1"             # By default, 1 is resolved to !!int 1
    c: !!float 1       # By default, 1 is resolved to !!int 1
    p: !Point3D
      X: !!float 3
      "Y": !!float 4   # By default, Y is resolved to !!bool true
      Z: 2.1
    texts:
      - !!utf8u "..."
      - !!utf8u "..."
      - !!utf8u "..."
      - !!utf8u "..."
      - !!utf8u "..."

We can resolve the tags for the child nodes from the definition of
SomeClass automatically. I understood that YAML allows this, from the
description that the tag resolution can be dependent on the path
leading from the root to the node.

Note that the meaning of each field must also be exchanged in the
form of SomeClass's specification between the serializer and the
deserializer, to interpret the contents of the fields correctly when
deserializing.

With languages that provide access to the class metadata, a general
purpose library can be built, which converts native objects of
arbitrary classes like SomeClass into a YAML document, referring to
the language's metadata database, without having any other explicit
schema.

Note that both the serializer and deserializer must know the
definition of the nodes' content, anyway, to interpret the conceptual
meaning of each field. Using the definition of SomeClass for that
purpose does not seem to weaken the document's portability.

I think exchanging the information on how an instance of SomeClass
should be evaluated for equality, between the serializer and the
deserializer, also does not weaken the portability. The way of
comparison is often defined as a member function of the class, for
example, Equals for C# objects and == for ruby objects.

This was the reason why I defined the equality of YAML nodes with
non standard YAML tags as:

    If the nodes are identical, they are equal.
    If the nodes have different tags, they are not equal.
    Otherwise, the result is dependent on the native objects'
    equality evaluator.

Do not think this makes the YAML language dependent on a specific
language. It merely makes the YAML language dependent on the schema
of data. I guess it makes the YAML language applicable to a wider
variety of data schemas.

> We'd rather not go down this road. The use case you demonstrated (using
> "Test" instances as keys) is pathological; I would be greatly suspicious
> of a system that used such a method, and I definitely wouldn't want to
> bend YAML out of shape to handle it.

I hope you understood the concept of my President-Party example. If
you think it is still useless, please give some response on it.

Thank you,
Osamu TAKEUCHI
```
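The three-step rule proposed in the message above (identical, then tag, then native comparison) can be written down as a small Python sketch. The `Node` wrapper, the `President` class and the tag strings are hypothetical names chosen for illustration; this is not how any existing YAML library represents nodes, and the rule is a proposal from the thread, not part of the YAML specification.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)  # Node objects themselves are compared by identity
class Node:
    tag: str    # e.g. "!President"; assumed to name the native type
    value: Any  # the native object constructed for this node


class President:
    """Defines no __eq__, so Python falls back to identity comparison."""
    def __init__(self, name: str) -> None:
        self.name = name


def nodes_equal(a: Node, b: Node) -> bool:
    if a is b:                 # identical nodes are equal
        return True
    if a.tag != b.tag:         # different tags are never equal
        return False
    return a.value == b.value  # otherwise defer to the native comparison


bush_41 = Node("!President", President("George Bush"))
bush_43 = Node("!President", President("George Bush"))
point_a = Node("!Point", (3, 4))
point_b = Node("!Point", (3, 4))

print(nodes_equal(bush_41, bush_43))  # native identity comparison decides
print(nodes_equal(point_a, point_b))  # native tuple comparison decides
```

The sketch also shows the portability concern raised in the reply: the outcome depends on what `==` means for the native type, which a generic tool that only sees the tags cannot know.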
Re: [Yaml-core] Equality of YAML nodes From: Osamu TAKEUCHI - 2009-09-19 09:00:28
```
Hi Oren,

Thanks for your response.

I agree that the way you go is completely self-consistent and
theoretically very smart. But I feel it narrows the application
of YAML. Let me go through your response.

> > ---
> > &A { &B { *B, *A }, *A, null }
> >
> > ok: &A { } # 1
> > ok: &A { &B { } } # 2
> > ok: &A { &B { *B } } # 3
> > ok: &A { &B { *B, *A } } # 4
> > NG: &A { &B { *B, *A }, *A } # 5
> > ok: &A { &B { *B, *A }, *A, null } # 6
> I didn't understand the problem. You have shown intermediate steps
> that have duplicate keys; but the final result is OK, which is all
> that matters. Can you clarify?

I wanted to discuss how a library builds such a node. I think most
libraries fail to build the node because they first create a map
node object and then add the key / value pairs to it one by one.
When the library tries to add the pair (*A: nil) to
&A { &B { *B, *A } } as the fifth step, the map node object will
find duplicated keys and throw an exception (or ignore the pair).
Do you think there is a good way to avoid such a problem?

> > class Test
> > end
> >
> > # Create a hash with instances of Test
> > p hash = { Test.new => 1, Test.new => 2 }
> >
> > # {#=>1, #=>2}
> >
> > # Note that two instances of Test class are not
> > # equal to each other, because ruby compares
> > # class instances, by default, by their identity.
>
> As opposed to YAML, yes, which will cause problems when dumping it.
> Consider what would happen if the data would be loaded by a language
> that compares the content by default... Anything based on Prolog and/or
> LISP for example.

My conclusion is different. The Prolog and LISP code should compare
the nodes by their identity because they know, from the Tag, that the
nodes represent instances of ruby's Test class. How an object should
be compared should belong to the object's type.
It should not be changed across the environment or across the YAML
processor's parsing/composing/construction stages.

> Well, technically, the author of the Test class has a bug. If he's
> comparing test objects based on their identity, he should serialize this
> identity somehow to make it explicit (!ruby/object:Test { !
> ruby/MagicDefaultIdentityUniqueHash Hash: 0x7ff2c9f0 } or whatever).
> Objects with the same hash should be serialized as anchors and
> references. You may even justify loading the "UniqueHash" tag into
> whatever the default unique identity hash would be for calling "new
> Test" instead of preserving it exactly.

I agree with you that this works for the Test class. But how about
the Hash class? As I showed, ruby's Hash class has a different
equality evaluation from YAML's. Do you say it is a bug to convert
ruby's Hash to YAML's mapping? Remember ruby did not treat &A { *A }
as equal to &B { *B }.

>> p YAML.load("{ &A { *A }, &B { *B } }") #4
>> # {{{...}=>nil}=>nil, {{...}=>nil}=>nil}

Then, if we go strict, a library has to do the following.

    !ruby/object:Hash
    UniqueHash: brabrabra
    Entries:
      ? !ruby/object:Hash &A
        UniqueHash: brabrabri
        Entries:
          ? *A
      ? !ruby/object:Hash &B
        UniqueHash: brabrabro
        Entries:
          ? *B

I do not think people accept this, while I can barely accept the
following.

    !ruby/object:Hash
    ? !ruby/object:Hash &A
      ? *A
    ? !ruby/object:Hash &B
      ? *B

Of course, this is not what I want, either. I like the next best,
though it might be unportable.

    { &A { *A }, &B { *B } }

That's the reason I proposed my not portable but practical definition
of equality in YAML.

> So this mapping is *practically* unusable as it is, once it is
> de/serialized to YAML. Having the YAML library reject the file is
> better, as it forces the author to consider what he is trying to
> achieve and do _that_.

Ok, my example was too simplified.

    USA:
      Presidents: !President[]
        - &PR1
          name: George Washington
        - &PR2
          name: John Adams
        (snip)
        - &PR41
          name: George Bush
        - &PR42
          name: William Clinton
        - &PR43
          name: George Bush
        - &PR44
          name: Barack Obama
      Parties: !Party[]
        - &PA1
          name: Republican
        - &PA2
          name: Democratic
        (snip)
      PresidentToParty: &MAP
        (snip)
        *PR41: *PA1
        *PR42: *PA2
        *PR43: *PA1
        *PR44: *PA2

Note that *PR? are the instances of the President class and *PA? are
those of the Party class. The tags are resolved from the parent's
President[] and Party[] tags.

In this example, we need some extra property like UniqueHash in the
President class just to make the YAML document valid. Note that
*PR41 and *PR43 are equal to each other by YAML's definition.
I think the mapping node *MAP is usable if it can be safely loaded
from this quasi-YAML document.

> > > A YAML processor may treat equal scalars as if they were
> > > identical.
> > I have not understood the importance of this description, though.
>
> It is useful to declare that scalars don't have "identity" as such, but
> "objects" do. That is, "objects" have identity, but scalars don't -
> e.g., the number 4 has no "identity", but the "!!point { x: 4 }" does.
> This maps to the way 99% of the programs model their data (you can
> change the 'x' coordinate of the point and it will remain the "same"
> point, but you can't change the number 4 to the number 5, ever).
>
> In practical terms this allows YAML libraries to avoid having to
> re-serialize large scalars (e.g., binary data, texts, etc.). Not the
> _most_ useful thing in the world, but it does have its place.

I do not like this, because YAML's scalar nodes are not always
mapped to the language's native scalars. Remember that Strings in
ruby are objects rather than scalars. DateTime objects of ruby and C#
are, too.

    #!/usr/bin/ruby
    a = "abc"
    b = a
    a[0]= "A"
    p b # "Abc"

I prefer to have the nodes' identity not flexible for the scalars,
too, in order to maximize the portability of YAML between languages.

> > I can not think of any other meaning of nodes equality in the
> > parsing and composition stages. Then, how about the meaning after
> > the composition?
>
> After composing native objects, YAML has no say at all about anything.
> The application may wreak whatever havoc it wants on these structures;
> that's its job after all.

YAML does not say anything about the equality of the native objects
after construction. But YAML prevents us from constructing the native
objects if it judges they are equal by its standard, when they appear
as keys in a YAML mapping. I think this is inconsistent.

So, what you say is probably that we must always have the UniqueHash
property when we have to compare objects by their identity, isn't it?

>> 1. When a node has a Tag that is not a YAML's standard tag, the YAML
>> parser and composer should evaluate nodes' equality only from the
>> identity of the nodes, because the parser and composer do not know
>> how to compare the data correctly. If the equality in their native
>> data form matters, it should be checked at the construction stage,
>> where the library has full access to the native object and to its
>> equality operator.
>
>> 3. For !!seq and !!map, the YAML parser and composer should evaluate
>> the equality only from the identity of the nodes. This means, even
>> if a mapping node has two collection nodes that have same content,
>> the library should not reject such an input, instead, they should
>> pass through it to the constructor.
>
> This would greatly weaken portability of YAML data between systems.

Let me separate the issues.

My 1st rule will not weaken the portability of YAML in any sense,
because when the data have Tags that specify the way to compare the
native objects, the data can be correctly treated in any environment.
The portability is completely preserved.

My 3rd rule indeed weakens the portability.
Even so, I wanted to allow the users to map their language's native
Hash object into a YAML mapping node seamlessly. For me, the current
spec seems to forbid it, as discussed above.

If you think YAML's !!map and !!seq should have their own way of
equality evaluation independent of a specific language, it might
be an option to allow directives like the next.

    %ALIAS !!map !ruby/object:Hash
    %ALIAS !!seq !ruby/object:Array

This will preserve the portability. If no library is aware of YAML's
node equality, it will change nothing, though.

Best,
Osamu TAKEUCHI
```
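The President-Party example from the message above can be tried out in Python to see what content-based key equality does to it. This is a hedged sketch: the two record classes are invented for illustration, and the `number` field is an assumption standing in for the "extra property like UniqueHash" mentioned in the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class President:
    # Compared by content: two presidents with the same name are equal.
    name: str


@dataclass(frozen=True)
class NumberedPresident:
    # Hypothetical extra field that makes each record distinct by content.
    number: int
    name: str


by_name = {
    President("George Bush"): "Republican",       # 41st
    President("William Clinton"): "Democratic",   # 42nd
    President("George Bush"): "Republican",       # 43rd, collides with 41st
}

by_number = {
    NumberedPresident(41, "George Bush"): "Republican",
    NumberedPresident(42, "William Clinton"): "Democratic",
    NumberedPresident(43, "George Bush"): "Republican",
}

print(len(by_name), len(by_number))
```

Under content equality the two name-only records collapse into one key, which is the duplicate-key situation both authors describe; adding a distinguishing field to the data is the kind of fix suggested in the reply, at the cost Osamu objects to.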
Re: [Yaml-core] Equality of YAML nodes From: BlueGM - 2009-09-20 15:34:19
```
Osamu, Oren:

If I can butt in for a moment... I've been listening to the two of you
discuss the equality of YAML nodes and I'm impressed by the
understanding each of you has of computer science. In the "science" of
computers, I'm not equal to either of you, but as I finally got the
chance to really sit down with your last post Osamu, it seems to me
that the key question being asked is about the purpose of YAML.

At one point Osamu gives an example in which you must tag every map as
being a Ruby hash. If I understand correctly, this is to obtain Ruby
style equality semantics. This raises a much simpler question about the
intent of YAML. The question is, is the purpose of YAML to represent
native data structures in an easily readable format or to provide a
portable way of storing the information within those structures? If the
former is the case, then having identical semantics to the native
structures used by programming languages is of greater importance. If
the latter is the case, then maintaining consistency within an
intermediary format (the YAML data types) is more important.

I was asking myself that question and then I looked at the
specification, to try to understand its goals, and I see the following:

The design goals for YAML are, in decreasing priority:

   1. YAML is easily readable by humans.
   2. YAML matches the native data structures of agile languages.
   3. YAML data is portable between programming languages.

So, now I'm wondering... which is really more important to the
specification? That we perfectly match the native data structures of
"agile languages" or that the data be portable between languages?
It seems to me that if point 2 was meant to mean a perfect match to native structures, then Osamu is correct in saying that it is necessary to either define equality in terms of the native methods of determining such equality or some other mechanism (whether a tag or by providing a unique hash) so that the equality operator matches the native implementation. If, as seems more likely to me, point 2 was meant to imply creating data types that are "like" their native counterparts that appear in different languages (so that the native types are easily represented in a majority of use cases), then we fall through to point 3 where having a consistent definition of the data types (and, by extension, their equality) is more important. I'd love to hear what each of you has to say on this. As well as others. ^-^ Thanks, BlueG ```
Re: [Yaml-core] Equality of YAML nodes From: Oren Ben-Kiki - 2009-09-20 18:22:27
```
On Sun, 2009-09-20 at 11:34 -0400, BlueGM wrote:
> I was asking myself that question and then I looked at the specification, to
> try to understand its goals, and I see the following:
>
> The design goals for YAML are, in decreasing priority:
>
> 1. YAML is easily readable by humans.
> 2. YAML matches the native data structures of agile languages.
> 3. YAML data is portable between programming languages.
>
>
> So, now I'm wondering... which is really more important to the
> specification? That we perfectly match the native data structures of "agile
> languages" or that the data be portable between languages? It seems to me
> that if point 2 was meant to mean a perfect match to native structures, then
> Osamu is correct in saying that it is necessary to either define equality in
> terms of the native methods of determining such equality or some other
> mechanism (whether a tag or by providing a unique hash) so that the equality
> operator matches the native implementation. If, as seems more likely to me,
> point 2 was meant to imply creating data types that are "like" their native
> counterparts that appear in different languages (so that the native types
> are easily represented in a majority of use cases), then we fall through to
> point 3 where having a consistent definition of the data types (and, by
> extension, their equality) is more important.

Yes, you have hit the nail on the head. I think one of the errata we
need to fix is to reverse the order between points 2 and 3, to clarify
this point (unless Clark objects).

The point never was to "perfectly" match native data structures. Another
good example of this point (that also triggered a lot of debate at the
time) was PHP's native maps-with-ordered-and-duplicate-keys. YAML's
solution to that is to use a sequence of single key/value pairs, as in
!!omap [ foo: bar, foo: baz ].
The same arguments were raised there as well as here (for identity):
default mapping, matching native data structures, portability,
cleanness of the information model, commonality of use cases, etc. The
decision (there as well as here for identity) was/is to stick with
portability and cleanness of the data and common use cases rather than
exactly matching a specific programming language and less common use
cases.

Have fun,

Oren Ben-Kiki
```
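The "sequence of single key/value pairs" idea from the message above has a direct analogue in most languages. A minimal Python sketch follows, using a list of tuples as the stand-in for the YAML sequence; the variable names are illustrative only. As a side note, in the YAML type repository it is `!!pairs` that is documented as allowing duplicate keys, while `!!omap` is documented as ordered without duplicates, so the sketch is closer to `!!pairs` for the duplicate-key case.

```python
# Ordered pairs keep both entries and their order, like [ foo: bar, foo: baz ].
pairs = [("foo", "bar"), ("foo", "baz")]

# A plain mapping cannot hold both: later entries overwrite earlier ones.
as_mapping = dict(pairs)

values_for_foo = [value for key, value in pairs if key == "foo"]

print(pairs)
print(as_mapping)
print(values_for_foo)
```

The design trade-off matches the one in the thread: the portable model keeps mappings free of duplicate keys, and data that needs duplicates or ordering says so explicitly with a different structure.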
Re: [Yaml-core] Equality of YAML nodes From: Osamu TAKEUCHI - 2009-09-22 02:15:19
```
Thanks for your comments, BlueG.

In this email, I want to talk mainly about the equality evaluation
scheme for !!map and !!seq. Namely, I do not want to talk about that
of the nodes with non standard Tags, because the latter does not seem
to be incompatible with the portability of YAML.

> 1. YAML is easily readable by humans.
> 2. YAML matches the native data structures of agile languages.
> 3. YAML data is portable between programming languages.

In general, the importance is 1 > 3 > 2, for me, surprised?
But, all of them should be for the primary purpose.

    0. YAML is useful for people.

So, if portability maximization or definition cleanness maximization
becomes incompatible with this primary purpose, I think we should
seek some workaround. In addition, if a part of the specification is
always neglected by users, I think the description should be
reconsidered.

I now feel that the definition of equality I gave the other day was
not a real definition but more like a description of the current
situation of YAML. So, in order to make it better, I reconsidered it.

The information model of PHP's Hash is incompatible with that of
!!map, as Oren pointed out. So, we have to distinguish them in YAML's
specification. I agree with Oren on this point.

Hash and Array of many languages have the same information model as
that of !!map or !!seq, except for the definition of equality. For
example, an array of objects (object[]) in C# has the same
information model as that of !!seq. However, equality of C# arrays is
evaluated by their identity while !!seq compares the child objects one
by one. So, they are incompatible in the definition of equality.

For some cases, the difference is much less obvious, as in the case
of ruby's Array and Hash. However, strictly speaking, the equality
definition of !!map and !!seq is incompatible with that of Hash and
Array in almost all languages.

If we represent such a native data object with a !!map or !!seq node,
the difference in equality definition can cause a problem. But, in
reality, the cases where the problem appears obviously are very rare.
The reason is that it is very rare to have !!map or !!seq as the key
of a mapping node, and even if we use !!map or !!seq as the mapping
key, it is rarer still to have duplicated keys in a mapping. Note
that this is different from my President-Party example because the
keys were not !!map but !President, there.

Since it is rare, almost all existing libraries neglect the
difference in the definition of equality. But if we really want to go
strict with YAML's specification, we can never use !!map and !!seq to
represent our Hash and Array objects.

I think the best way to work around this problem is to give up fully
defining the behavior of a mapping node that has duplicated keys with
the !!map or !!seq Tag. Then, we do not have to define the equality
of !!map and !!seq too strictly.

I revise my proposal for the equality definition of !!map, !!seq and
other YAML's standard collection nodes to be:

    If two nodes are identical, they are equal.
    If they are of different content, they are not equal.
    If they are of equal content, the result is *undefined*.
    They might be judged as equal but they might not.

This definition is useful enough for most applications and weakens
the portability of YAML very little, because it is different from the
current definition only for the cases where a mapping node is
rejected due to the duplicated keys.

I think this is also the way most of the libraries implement YAML,
now and in the future. It fills the gap between the ideal and the
real worlds. At the same time, it is loose enough to allow users to
represent their Hash and Array by !!map and !!seq nodes.

Note that the above definition is even compatible with a definition
of equality based on identity alone.

If we should rather discuss the cleanness of the definition, instead
of the portability of YAML, I'm willing to change the subject.

Best,
Osamu TAKEUCHI
```
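The revised three-case definition in the message above amounts to a comparison with a third, "undefined" outcome. A small Python sketch, assuming that `None` stands for "undefined" and that Python's own `==` on lists and dicts is an acceptable stand-in for "equal content" (both are assumptions made for illustration, not something the thread specifies):

```python
from typing import Optional


def collection_equal(a, b) -> Optional[bool]:
    """Return True, False, or None when the proposal leaves the result open."""
    if a is b:
        return True   # identical nodes are equal
    if a != b:
        return False  # different content is never equal
    return None       # equal content, distinct nodes: undefined


shared = [1, 2]
print(collection_equal(shared, shared))   # same node
print(collection_equal([1, 2], [1, 3]))   # different content
print(collection_equal([1, 2], [1, 2]))   # equal content, distinct nodes
```

A library following this proposal would be free to map the `None` case to either answer, which is why the reply treats it as a deep change: two conforming loaders could disagree on whether a given mapping has duplicate keys.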
Re: [Yaml-core] Equality of YAML nodes From: Oren Ben-Kiki - 2009-09-30 05:33:30
```
On Tue, 2009-09-22 at 10:46 +0900, Osamu TAKEUCHI wrote:
> Thanks for your comments, BlueG.
> I revise my proposal for the equality definition of !!map,
> !!seq and other YAML's standard collection nodes to be ...

Well, as long as we are agreed this is a change to the YAML spec, and a
pretty deep one at that, then this falls out of scope for YAML 1.2
(which is finalized other than errors/typos/etc.). It may be YAML 1.3 or
2.0 material.

Changing the definition of equality - making it "undefined" in some
cases - would have interesting implications and require a lot of careful
thought and work. I'd be happy to do it if this issue proves a problem
in practice... Though right now I'm swamped at work. It is hard for me
to find the time to collect the errata for the current spec, as you may
have noticed :-S

Have fun,

Oren Ben-Kiki
```
Re: [Yaml-core] Equality of YAML nodes From: Osamu TAKEUCHI - 2009-10-04 08:00:51
```
Hi Oren,

> On Tue, 2009-09-22 at 10:46 +0900, Osamu TAKEUCHI wrote:
>> Thanks for your comments, BlueG.
>> I revise my proposal for the equality definition of !!map,
>> !!seq and other YAML's standard collection nodes to be ...
>
> Well, as long as we are agreed this is a change to the YAML spec, and a
> pretty deep one at that, then this falls out of scope for YAML 1.2
> (which is finalized other than errors/typos/etc.). It may be YAML 1.3 or
> 2.0 material.
>
> Changing the definition of equality - making it "undefined" in some
> cases - would have interesting implications and require a lot of careful
> thought and work. I'd be happy to do it if this issue proves a problem
> in practice... Though right now I'm swamped at work. It is hard for me
> to find the time to collect the errata for the current spec, as you may
> have noticed :-S

I am still thinking about the best way of defining the equality
and the identity of YAML nodes. While considering it, a question
arose.

Let me confirm that the next two nodes are equal to each other in the
current spec.

    [ &A "abc", *A ]
    [ "abc", "abc" ]

This is because identity is not defined for scalar nodes while it is
defined for collection nodes. So, in contrast, the next two nodes are
not equal to each other.

    [ &A ["abc"], *A ]
    [ ["abc"], ["abc"] ]

Note that the equality of collection nodes is aware of the
_identity_ within each node tree as well as of the equality of
nodes across the node trees.

Is my understanding correct?

Best,
Osamu TAKEUCHI
```
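The reading proposed in the message above (scalars compared by value only, collections compared by content and by which positions share the same node) can be sketched in Python. This is an illustration of that reading, which is posed as a question and not confirmed within the excerpt shown here. It covers only strings and nested lists, uses Python object identity as a stand-in for anchors and aliases, and `same_shape` is a hypothetical helper name.

```python
def same_shape(a, b, fwd=None, bwd=None):
    """Compare two node trees made of str scalars and list sequences.

    Scalars are compared by value. Sequences must match element by
    element, and the aliasing pattern must match: two positions that
    share one list on one side must share one list on the other side.
    """
    fwd = {} if fwd is None else fwd   # id of a-side list -> id of b-side list
    bwd = {} if bwd is None else bwd   # id of b-side list -> id of a-side list
    if isinstance(a, str) or isinstance(b, str):
        return isinstance(a, str) and isinstance(b, str) and a == b
    if id(a) in fwd or id(b) in bwd:
        # Seen before (alias or cycle): the earlier pairing must agree.
        return fwd.get(id(a)) == id(b) and bwd.get(id(b)) == id(a)
    fwd[id(a)] = id(b)
    bwd[id(b)] = id(a)
    return len(a) == len(b) and all(
        same_shape(x, y, fwd, bwd) for x, y in zip(a, b)
    )


scalar = "abc"
shared = ["abc"]
aliased_scalars = [scalar, scalar]   # like [ &A "abc", *A ]
plain_scalars = ["abc", "abc"]       # like [ "abc", "abc" ]
aliased_seqs = [shared, shared]      # like [ &A ["abc"], *A ]
copied_seqs = [["abc"], ["abc"]]     # like [ ["abc"], ["abc"] ]

print(same_shape(aliased_scalars, plain_scalars))
print(same_shape(aliased_seqs, copied_seqs))
```

Because already-paired lists are never revisited, the same function terminates on self-referential sequences; mappings with unordered keys would need a search over key pairings and are left out of this sketch.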
 Re: [Yaml-core] Equality of YAML nodes From: BlueGM - 2009-09-20 19:31:27 ```> -----Original Message----- > From: Oren Ben-Kiki [mailto:oren@...] > Sent: Sunday, September 20, 2009 2:22 PM > Subject: RE: [Yaml-core] Equality of YAML nodes > > On Sun, 2009-09-20 at 11:34 -0400, BlueGM wrote: > Yes, you have hit the nail on the head. I think one the > errata we need to fix to reverse the order between points 2 > and 3, to clarify this point (unless Clark objects). > > The point never was to "perfectly" match native data > structures. Another good example of this point (that also > triggered a lot of debate at the > time) was PHP's native maps-with-ordered-and-duplicate-keys. > YAML's solution to that is to use a sequence of single > key/value pairs, as in !!omap [ foo: bar, foo: baz ]. The > same arguments were raised there as well as here (for > identity): default mapping, matching native data structures, > portability, cleanness of the information model, commonality > of use cases, etc. The decision (there as well as here for > identity) was/is to stick with portability and cleanness of > the data and common uses cases rather than exactly matching a > specific programming language and less common use cases. If that's the intent, then I agree that points 2 and 3 should be reversed. And perhaps point 2 (what would become point 3) should be reworded slightly to emphasize its "likeness" to those native data types rather than leaving it open to interpretation as to whether it means "like" or "exactly matches". Which then leaves the question of, does this narrow the application of YAML, as Osamu states when he writes: "I agree that the way you go is completely selfconsistent and theoretically very smart. But I feel it narrowers the application of YAML." And, with that, if it does narrow the application of YAML, how great of a concern is it? On that, I imagine there will be a great difference of opinion. 
My own opinion is that while it may make YAML a little harder to use in some cases (certainly, some other recent conversations relate to problems of the differences between the native representation of data and their YAML representation), it doesn't actually prevent YAML's use in those cases and widens its potential use in cases where systems must communicate cross platform or language, which could be exceedingly difficult and, in some cases, even impossible otherwise. In other words, I see the value of portability as much greater than the extra cost of YAML not always perfectly matching the definitions used by languages. I also see the cost of not having that portability on applications that require it to be much greater than the cost to applications that have to deal with differences in the definitions of the types used (including their equality). This becomes notable, by way of example, in the oft cited case of a YAML editor, which will quite frequently be in a different language than the language that the data in the YAML file will be used for. Of course, this is largely a theoretical discussion anyways. As Osamu quite effectively points out, existing libraries don't implement the YAML specification perfectly. That's hardly new, or unique to YAML, though. It's the real culprit behind cross platform incompatibility in more than a few public standards. Those, of course, are my thoughts. I'd be interested in hearing what others have to say about this too. As I said, it is largely a question of opinion (being a cost/value estimate of each). Thanks, BlueG P.S. The yaml-core mailing list hasn't yet forwarded the message that Oren was replying to, so it might be out of sequence with this message, assuming it makes it through at all =p. If it doesn't I'll resend that message to the list later. ```