## Re: [Yaml-core] Comments on YAML from Python list

Re: [Yaml-core] Comments on YAML from Python list From: Clark C . Evans - 2002-07-18 21:44:31
```
On Thu, Jul 18, 2002 at 03:49:16PM -0400, Clark C . Evans wrote:
| I'll try to find time to participate, but time is always in short supply.

Great news.

| [1] Canonical transforms, such as {a, b, c} -> [a, b, c] -> {(1:a), (3:c),
| (2:b)}. There are a few dozens of them among set, seq, dict, seqdict.
| Some have partial inverse. None of them are one-one correspondence.
| That's why I let all these four as basic structures. These four are the
| combination of keyed/nonkeyed ordered/unordered. Additional kinds of
| structures, such as bags (whether keyed and whether ordered), may be
| added later on. [4]

I think the bag vs. set distinction is also useful; for this you need
one more parameter: unique/non-unique. Thus, a set is unordered/unique
and a bag is unordered/non-unique. Of course, once you add
keyed/nonkeyed, the unique/non-unique parameter applies to keys,
values, or both. And by the time you get this far... it gets
complicated.

YAML chose another logical rationale for its containers -- that they
be mathematical functions from a domain (set) to a range (bag):

  - keyed/unordered (mapping) where the keys are the domain; and
  - ordered/nonkeyed (sequence) where the indices are the domain.

These two combinations have the advantage that they are found in most
modern programming languages as dictionary/hashtable and list/array.
With this foundation, bags can be represented using the list, ignoring
the indices. The set can be emulated through a dictionary with null
values. The named-list (XMLish) can be modeled using a combination of
lists and dictionaries.

All in all, these other combinations are less prominent than the two
core "functional" containers: the dictionary/hashtable and list/array.
Further, these core containers are found in just about every language,
while not every language has sets, bags, or named lists. I hope this
justification makes sense to you.

By choosing a functional basis, YAML also clears the way for a very
clean implementation of various graph operations; for example, with a
single YPATH expression, one can uniquely identify each node in the
graph (given the starting root, of course).

| [2] I tried the following kinds of indentations (where n is level)
|     '(%s)' % n
|     ' ' * n
|     ' ' * n + '|'
| Obviously there can be a lot of other variations. Such flexibility
| would allow many common document formats to be transformed into
| conforming format with minimum effort, sometimes by just adding a
| metacomment at the beginning of the document. For example, the formats
| of the current paragraphs should be accommodated.

Too funny, we also used | for a while to indicate indented content; we
eventually tossed this in favor of an "autodetection" mechanism backed
by a way to explicitly state the number of spaces used for indentation
at a given level. I guess reading the spec is the best for this... we
had many false starts before we collectively stumbled upon this one.

Your notion of (n) is intriguing... only it would greatly hurt
"readability", and this is one of our first goals. Thus, while it is a
very clever approach, I think sticking with indentation is best.

| [3] I would allow encoding and encryption to be allowed at a per node
| basis, not just at the file level. In reality how to break up a tree
| into subtrees to fit in files is largely arbitrary. This calls for meta
| comments on each node with a simple syntax for describing them.

(I assume you mean character encoding.) Yes, we've played with this
one quite a bit. The problem is that most editors only handle one type
of character encoding at a time. Therefore, while this may be good for
binary data, I think editors will mess it up.

As a way to alleviate this sort of pain, we've adopted Base64 encoding
for leaf values. This is done as a type. One may be seduced into
trying to make it another syntax form, but the problem is you'd also
have to specify a character encoding, and this is where the trouble
starts. So, we felt that it was a good balance to support a !binary
type with a |base64 format but not try to integrate this use case any
further.

| [4] One thing I have not solved is whether the keys can only be strings. If
| keys can be substructures themselves, there are further correspondence
| between sets, dicts and bags, such as {a, b} -> {a:1, b:1}. This leads
| to the issue of the identity of structures. Example: {a, b}=={a} if
| a==b. This complicates things and that's perhaps where I stopped.
| (Over-generalization perhaps?)

We've allowed keys to be any sort of object. This was needed since in
Python, any object can be used as a dictionary key if it is hashable
(has a function to return a hash key). Also, tuples can be used as
keys in Python. Since one of our primary goals was to be able to
serialize native constructs (esp. in Python/Perl), this was a use case.

As far as identity goes... this may be a cop-out; however, if a == b,
then I think { a, b } is an error. This keeps simple things simple.

| So my overall comment is that this approach can be made more 'meta' than any
| particular syntax or structure would allow.

Finding the balance is the hardest thing. I'm sure YAML isn't perfect
in this regard, but I think it's a lot closer than XML. ;)

| The worst thing about xml is that one has to conform to its (mostly
| arbitrary) syntax conventions instead of thinking about the underlying
| data structure that's pertinent for the task at hand. I do believe
| that the good thing about standards is there are so many to choose from.

This much we definitely agree upon, speaking as someone who spent many
months in :~( struggling with its nuisances.

| A meta syntax would open up the possibility of interoperability on a
| much larger scale than xml could handle comfortably. It is often easier
| to define a particular syntax by fixing some parameters in a meta syntax.
| Perhaps these are already in yaml since I had only a half
| hour reading of its docs.

Well, if you have any suggestions to improve YAML, we are all open.
If we don't take the suggestion, rest assured we will search our
hearts for a reasonable explanation why.

BTW, it's a pleasure to have you "on-board".

Best,

Clark

--
Clark C. Evans
Axista, Inc.
http://www.axista.com
800.926.5525
XCOLLA Collaborative Project Management Software
```
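The container rationale in the reply above can be sketched in Python, the language the thread keeps returning to. This is only an illustration under stated assumptions, not anything from the YAML spec or an implementation: the helper names (`domain`, `set_as_mapping`, `bags_equal`, `encode_leaf`, `decode_leaf`) are hypothetical, and only the standard `collections.Counter` and `base64` modules are used.

```python
import base64
from collections import Counter


def domain(container):
    """Return the domain of a container viewed as a function (hypothetical helper)."""
    if isinstance(container, dict):
        return set(container)            # mapping: the keys are the domain
    return set(range(len(container)))    # sequence: the indices are the domain


def set_as_mapping(items):
    """Emulate a set as a mapping whose values are all null (None)."""
    return {item: None for item in items}


def bags_equal(left, right):
    """Compare two lists as bags: indices are ignored, multiplicity is kept."""
    return Counter(left) == Counter(right)


def encode_leaf(data):
    """Carry raw bytes as a Base64 text leaf, so the file keeps one character encoding."""
    return base64.b64encode(data).decode("ascii")


def decode_leaf(text):
    """Recover the raw bytes from a Base64 text leaf."""
    return base64.b64decode(text)


# Any hashable object can serve as a key, including a tuple.
grid = {(0, 0): "origin", (1, 2): "somewhere"}
```

Treating a list as a bag or a dictionary as a set is a convention applied by the reader of the data. The container itself does not change, which seems to be the point of keeping only two core container kinds.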

[Yaml-core] Comments on YAML from Python list From: Clark C . Evans - 2002-07-18 19:41:19
```
Huaiyu Zhu worked on something similar to YAML a few years ago,
and has provided the following comments.

----- Forwarded message from Huaiyu Zhu -----

From: huaiyu@... (Huaiyu Zhu)
Newsgroups: comp.lang.python
Subject: Re: XML overuse? (was Re: Python to XML to Python conversion)
To: python-list@...
Date: Thu, 18 Jul 2002 18:10:40 +0000 (UTC)

Clark C . Evans wrote:
>On Tue, Jul 16, 2002 at 10:14:51PM +0000, Huaiyu Zhu wrote:
>| Thanks a lot for this link. The basic idea is very similar, but apparently
>| they have done a lot more of formal specification than I have ever
>| attempted. There are several differences in the details, so neither is
>| superset of the other. I'll comment on the differences once I have time to
>| read through their docs.
>
>I look forward to the commentary, could you do it or cc the
>YAML discussion list?

That'll be after I get time to read through YAML docs and review my old code
and docs.

>| The emphasis is on using indentation and leading markers to denote
>| structure, in contrast to markups, punctuations, quotes and escapes in the
>| markup languages.
>
>Exactly. We started with leading markers (% and @ initially) and
>eventually found ways that allowed us to skip these...

How like minds think alike. :-) Perl opened my mind to the possibility of
heterogeneous hierarchical data structures.

>I'd love to hear about the overlap; I'm sure we don't do everything.
>But if you found something important that we don't have, I'd love to
>know since we'd like to start finalizing the spec at this time so that
>implementations can start emerging.
>
>I'd love to hear more about your thoughts on YAML, and if possible,
>we'd really welcome your participation!

I'll try to find time to participate, but time is always in short supply.

Here are some comments at first glance.

I don't see a description of the semantics of the structures independent of
any syntax. It is possible to define all the canonical transforms among the
structures [1] without concerning any particular representation. I'd also
like to emphasize that all the indentations, markers etc should be
configurable in a document[2][3].

[1] Canonical transforms, such as {a, b, c} -> [a, b, c] -> {(1:a), (3:c),
(2:b)}. There are a few dozens of them among set, seq, dict, seqdict.
Some have partial inverse. None of them are one-one correspondence.
That's why I let all these four as basic structures. These four are the
combination of keyed/nonkeyed ordered/unordered. Additional kinds of
structures, such as bags (whether keyed and whether ordered), may be
added later on. [4]

[2] I tried the following kinds of indentations (where n is level)
    '(%s)' % n
    ' ' * n
    ' ' * n + '|'
Obviously there can be a lot of other variations. Such flexibility
would allow many common document formats to be transformed into
conforming format with minimum effort, sometimes by just adding a
metacomment at the beginning of the document. For example, the formats
of the current paragraphs should be accommodated.

[3] I would allow encoding and encryption to be allowed at a per node
basis, not just at the file level. In reality how to break up a tree
into subtrees to fit in files is largely arbitrary. This calls for meta
comments on each node with a simple syntax for describing them.

[4] One thing I have not solved is whether the keys can only be strings. If
keys can be substructures themselves, there are further correspondence
between sets, dicts and bags, such as {a, b} -> {a:1, b:1}. This leads
to the issue of the identity of structures. Example: {a, b}=={a} if
a==b. This complicates things and that's perhaps where I stopped.
(Over-generalization perhaps?)

So my overall comment is that this approach can be made more 'meta' than any
particular syntax or structure would allow.

The worst thing about xml is that one has to conform to its (mostly
arbitrary) syntax conventions instead of thinking about the underlying
data structure that's pertinent for the task at hand. I do believe
that the good thing about standards is there are so many to choose from.

A meta syntax would open up the possibility of interoperability on a
much larger scale than xml could handle comfortably. It is often easier
to define a particular syntax by fixing some parameters in a meta syntax.
Perhaps these are already in yaml since I had only a half
hour reading of its docs.

Huaiyu
--
http://mail.python.org/mailman/listinfo/python-list

----- End forwarded message -----

--
Clark C. Evans
Axista, Inc.
http://www.axista.com
800.926.5525
XCOLLA Collaborative Project Management Software
```
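Notes [1] and [2] in the forwarded message can be made concrete with a small Python sketch. The three prefix expressions are taken directly from note [2]. Everything else is an assumption made for illustration: the function names, the `render` helper, and the choice to sort when turning a set into a sequence. It is not Huaiyu's actual code, which the thread does not show.

```python
def paren_prefix(n):
    """Level marker written as an explicit number, e.g. '(2)'."""
    return '(%s)' % n


def space_prefix(n):
    """Plain indentation: one space per level."""
    return ' ' * n


def bar_prefix(n):
    """Indentation followed by a '|' marker."""
    return ' ' * n + '|'


def render(nodes, prefix):
    """Render (level, text) pairs using the chosen prefix style."""
    return [prefix(level) + text for level, text in nodes]


def set_to_seq(items):
    """Set -> sequence. Sorting is an assumed ordering; any order would be valid."""
    return sorted(items)


def seq_to_dict(seq):
    """Sequence -> dict keyed by 1-based position, as in {(1:a), (2:b), (3:c)}."""
    return dict(enumerate(seq, start=1))
```

Because `set_to_seq` must pick an order the set never had, going back from the sequence to the set loses that choice. That is one way to read the remark that none of these transforms is a one-to-one correspondence.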