From: Joe L. <jo...@bu...> - 2001-08-02 18:28:12
(Hey listserv, don't you DARE drop this email again! Grr!)

Here's a semi-formal description of YAML's information model. It does not reflect the consensus of this group in areas where the group has not yet achieved consensus (such as in node attribution). It also could be wrong just because I got it wrong. But it's a starting point.

The format is the one I generally use when I write specifications. I can't say I've ever seen anyone else take this approach, but enough people like it that I keep using it. I'm open to other approaches as well.

#### YAML Information Model ####

A "document" consists of the following:
  - An ordered set of zero or more "maps"

A "map" consists of the following:
  - An unordered set of zero or more "pairs"

A "pair" consists of the following:
  - A "key"
  - A "node"

A "key" consists of the following:
  - A string of zero or more "characters"

A "typed node" consists of the following:
  - An optional "class name"
  - A "node"

A "class name" consists of the following:
  - A string conforming to production svalue

A "node" is one of the following:
  - A "map"
  - A "list"
  - A "scalar"
  - A "null"

A "list" consists of the following:
  - An ordered set of zero or more "typed nodes"

A "scalar" consists of the following:
  - A string of zero or more "characters"

A "null" has no constituents.

A "character" is one of the values #x0 through #xFFFFFFFF, inclusive.

NOTE: The current spec disallows periods in class names, but java.util.Date is an example of a Java class name. Instead of using svalue, maybe we should come up with a production that better accommodates class names.

~Joe
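The model above maps fairly directly onto a handful of record types. What follows is a minimal Python sketch, not anything defined by YAML: the class names (`Document`, `MapNode`, `ListNode`, `TypedNode`, `Scalar`, `Null`) and the dotted class-name pattern are hypothetical stand-ins, and the pattern is only one guess at a production that would accept names like java.util.Date, not the spec's svalue. It also assumes keys are unique within a map, so a map is modeled as a Python dict, whose equality ignores pair order just as an unordered set of pairs would.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical dotted class-name pattern (e.g. java.util.Date); NOT svalue.
CLASS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def is_dotted_class_name(name: str) -> bool:
    return CLASS_NAME.fullmatch(name) is not None


@dataclass
class Null:
    """A "null" has no constituents."""


@dataclass
class Scalar:
    text: str = ""  # a string of zero or more characters


@dataclass
class TypedNode:
    node: "Node"
    class_name: Optional[str] = None  # the optional "class name"

    def __post_init__(self) -> None:
        # Reject class names that do not fit the hypothetical pattern.
        if self.class_name is not None and not is_dotted_class_name(self.class_name):
            raise ValueError(f"bad class name: {self.class_name!r}")


@dataclass
class ListNode:
    items: List[TypedNode] = field(default_factory=list)  # ordered typed nodes


@dataclass
class MapNode:
    # Assumption: keys are unique, so the unordered set of pairs is a dict.
    # As written in the model, a pair holds a plain "node", not a typed node.
    pairs: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[MapNode, ListNode, Scalar, Null]


@dataclass
class Document:
    maps: List[MapNode] = field(default_factory=list)  # ordered set of maps


doc = Document([MapNode({
    "title": Scalar("info model"),
    "dates": ListNode([TypedNode(Scalar("2001-08-02"), "java.util.Date")]),
    "nothing": Null(),
})])
```

Because dict equality ignores insertion order, two maps holding the same pairs in a different order compare equal, which matches the "unordered" wording.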
From: Clark C . E. <cc...@cl...> - 2001-08-02 20:31:31
Thank you, Joe. Overall, this information model is identical in spirit to the current information model, with one exception: it introduces a "class" attribute to each node. That being said, I have a few items which should be considered:

1. I think that a scalar node should be defined not as a string of zero or more characters, but rather as any object that can be _serialized_ as a tuple:

     (type name, sequence of zero or more characters)

2. Characters must be defined with a one-to-one match with Unicode. In particular, 0x0 through 0xFFFFFFFF is too broad. The character code point range should be limited to...

     [#x0-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

   plus any further restrictions given by the Unicode specification, namely further excluding those pairs used for the byte order mark, which are...

     { 1FFFE, 1FFFF, 2FFFE, 2FFFF, 3FFFE, 3FFFF,
       4FFFE, 4FFFF, 5FFFE, 5FFFF, 6FFFE, 6FFFF,
       7FFFE, 7FFFF, 8FFFE, 8FFFF, 9FFFE, 9FFFF,
       AFFFE, AFFFF, BFFFE, BFFFF, CFFFE, CFFFF,
       DFFFE, DFFFF, EFFFE, EFFFF, FFFFE, FFFFF,
       10FFFE, 10FFFF }

   Further, to prevent any transmission problems, we should exclude...

     [#x0-#x1F] less #x9, #xA, #xD

3. We also need a sequential information model which extends the core information model by...

   a) adding an anchor to each node
   b) introducing the reference node.

Best,
Clark
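Read literally, the ranges in point 2 can be written as two predicates: one for the information model and a stricter one for the serialization. The following is a minimal Python sketch; the function names are hypothetical, and it encodes only the ranges and exclusions listed in this message, not the whole Unicode specification. The third function probes why 0x0 through 0xFFFFFFFF is too broad in practice: Python, for example, cannot hold a code point above 0x10FFFF in a `str` at all, and it refuses to UTF-8 encode a lone surrogate.

```python
def is_model_char(cp: int) -> bool:
    """Character check for the information model, per the ranges above."""
    in_ranges = (
        0x0 <= cp <= 0xD7FF  # stops short of the surrogates D800-DFFF
        or 0xE000 <= cp <= 0xFFFD  # stops short of FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF  # Unicode's upper bound
    )
    # Drop the listed 1FFFE ... 10FFFF values: their low 16 bits are FFFE or FFFF.
    return in_ranges and (cp & 0xFFFE) != 0xFFFE


def is_serializable_char(cp: int) -> bool:
    """Serialization-only rule on top: no C0 controls except tab, LF and CR."""
    return is_model_char(cp) and (cp >= 0x20 or cp in (0x9, 0xA, 0xD))


def round_trips_through_python(cp: int) -> bool:
    """Does cp survive str -> UTF-8 bytes -> str in Python?"""
    try:
        s = chr(cp)  # rejects anything past 0x10FFFF
        return s.encode("utf-8").decode("utf-8") == s  # lone surrogates fail here
    except (ValueError, OverflowError):  # UnicodeEncodeError is a ValueError
        return False
```

Every code point accepted by `is_model_char` also passes `round_trips_through_python`. The reverse does not hold for the nFFFE/nFFFF values, which Python encodes without complaint, so that part of the restriction comes from the Unicode specification rather than from a language limitation.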
From: Joe L. <jo...@bu...> - 2001-08-02 21:11:56
At 04:36 PM 8/2/2001 -0400, Clark C . Evans wrote:
> 1. I think that a scalar node should be defined not as
>    a string of zero or more characters, but rather as
>    any object that can be _serialized_ as a tuple:
>
>      (type name, sequence of zero or more characters)

Are you saying that in the info model the scalar does not have an associated type, while in the serialization it does? In the model I posted, a scalar is a possible value of an optionally typed node. I think the model already gives you that tuple.

Or are you saying that you'd rather not have the type factored out of the individual nodes, so the model looks like this:

A "node" is one of the following:
  - A "map"
  - A "list"
  - A "scalar"
  - A "null"

A "map" consists of the following:
  - A "type name"
  - An unordered set of zero or more "pairs"

A "list" consists of the following:
  - A "type name"
  - An ordered set of zero or more "nodes"

A "scalar" consists of the following:
  - A "type name"
  - A string of zero or more "characters"

A "null" consists of the following:
  - A "type name"

This is informationally equivalent to what I posted, just a different way of portraying it.

> 2. Characters must be defined with a one-to-one match
>    with Unicode. In particular, 0x0 through 0xFFFFFFFF
>    is too broad. The character code point range
>    should be limited to...
>
>      [#x0-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Ah, but this is not the YAML text file. YAML's eu8 production allows this.

> 3. We also need a sequential information model
>    which extends the core information model by...
>
>      a) adding an anchor to each node
>      b) introducing the reference node.

Is that part of the information model? The information model describes what YAML means to represent, not how YAML represents it. A purely sequential API may need to make this distinction, but I argue that this is an implementation detail.
If multiple nodes contain the same node, is it informational to know which node ended up getting the serialization and which ones got the reference? The distinction didn't exist pre-serialization; it doesn't round-trip.

~Joe
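Returning to point 1: the two portrayals differ only in where the type name lives, beside the node (as the optional class name of a typed node) or inside it. The following minimal Python sketch assumes hypothetical class names and that every node form carries an optional `type_name`; it moves a class name into a node and back out again.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Scalar:
    type_name: Optional[str]
    text: str = ""


@dataclass
class Null:
    type_name: Optional[str]


@dataclass
class ListNode:
    type_name: Optional[str]
    items: List["Node"] = field(default_factory=list)


@dataclass
class MapNode:
    type_name: Optional[str]
    pairs: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Scalar, Null, ListNode, MapNode]


def attach(class_name: Optional[str], node: Node) -> Node:
    """Fold a typed node's (class name, node) pair into one self-typed node."""
    return replace(node, type_name=class_name)


def detach(node: Node) -> Tuple[Optional[str], Node]:
    """Split a self-typed node back into (type name, untyped node)."""
    return node.type_name, replace(node, type_name=None)


plain = Scalar(None, "2001-08-02")
typed = attach("java.util.Date", plain)
```

The round trip shown holds node by node: `detach(attach(name, node))` gives back the original pair. The sketch does not settle which positions in a document may carry a type name.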
From: Clark C . E. <cc...@cl...> - 2001-08-02 21:28:00
On Thu, Aug 02, 2001 at 05:18:54PM -0400, Joe Lapp wrote:
| At 04:36 PM 8/2/2001 -0400, Clark C . Evans wrote:
| > 1. I think that a scalar node should be defined not as
| >    a string of zero or more characters, but rather as
| >    any object that can be _serialized_ as a tuple:
| >
| >      (type name, sequence of zero or more characters)
|
| This is informationally equivalent to what I posted, just a
| different way of portraying it.

Yes, it's a different way of saying the same thing. I like what you wrote; it's very good.

| > 2. Characters must be defined with a one-to-one match
| >    with Unicode. In particular, 0x0 through 0xFFFFFFFF
| >    is too broad. The character code point range
| >    should be limited to...
| >
| >      [#x0-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
| >
| >    plus any further restrictions given by the Unicode
| >    specification, namely further excluding those pairs
| >    used for the byte order mark, which are...
| >
| >      { 1FFFE, 1FFFF, 2FFFE, 2FFFF, 3FFFE, 3FFFF,
| >        4FFFE, 4FFFF, 5FFFE, 5FFFF, 6FFFE, 6FFFF,
| >        7FFFE, 7FFFF, 8FFFE, 8FFFF, 9FFFE, 9FFFF,
| >        AFFFE, AFFFF, BFFFE, BFFFF, CFFFE, CFFFF,
| >        DFFFE, DFFFF, EFFFE, EFFFF, FFFFE, FFFFF,
| >        10FFFE, 10FFFF }
|
| Ah, but this is not the YAML text file. YAML's eu8
| production allows this.

The information model must be limited to Unicode, and these are the limitations. If we don't limit it to Unicode, then we won't round-trip through Java, Python, and other Unicode-compliant languages.

I may have thrown you by including the following...

| >
| >    Further, to prevent any transmission problems, we
| >    should exclude... [#x0-#x1F] less #x9, #xA, #xD

This is the only additional restriction placed on the serialization. Sorry about the confusion. The above constraint does not come from Unicode (I think...) and is only necessary in the serialization.

| > 3. We also need a sequential information model
| >    which extends the core information model by...
| >
| >      a) adding an anchor to each node
| >      b) introducing the reference node.
|
| Is that part of the information model? The information model
| describes what YAML means to represent, not how YAML represents it.

Ok. But it is a model. ;)

| A purely sequential API may need to make this distinction, but I
| argue that this is an implementation detail. If multiple nodes
| contain the same node, is it informational to know which node ended
| up getting the serialization and which ones got the reference? The
| distinction didn't exist pre-serialization; it doesn't round-trip.

Right.

Best,
Clark
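The anchor/reference exchange can be made concrete with a toy serializer. This is a hypothetical Python sketch, not YAML's syntax or any real emitter, and its token format is invented for illustration. It walks nested lists and dicts, writes the first occurrence of a shared container in full behind an `&n` anchor, and writes each later occurrence as a `*n` reference.

```python
from typing import Any, Dict, List


def serialize(root: Any) -> List[str]:
    """Flatten nested lists/dicts into toy tokens; shared containers get anchors."""
    visits: Dict[int, int] = {}

    def count(node: Any) -> None:
        # First pass: how many times is each container (by identity) reached?
        if isinstance(node, (list, dict)):
            visits[id(node)] = visits.get(id(node), 0) + 1
            if visits[id(node)] == 1:  # descend only on the first visit
                children = node if isinstance(node, list) else node.values()
                for child in children:
                    count(child)

    anchors: Dict[int, str] = {}
    out: List[str] = []

    def walk(node: Any) -> None:
        # Second pass: emit tokens in traversal order.
        if not isinstance(node, (list, dict)):
            out.append(repr(node))
            return
        if id(node) in anchors:  # already written in full: reference it
            out.append("*" + anchors[id(node)])
            return
        prefix = ""
        if visits[id(node)] > 1:  # shared: whichever visit comes first is anchored
            anchors[id(node)] = str(len(anchors) + 1)
            prefix = "&" + anchors[id(node)] + " "
        if isinstance(node, list):
            out.append(prefix + "[")
            for item in node:
                walk(item)
            out.append("]")
        else:
            out.append(prefix + "{")
            for key, value in node.items():
                out.append(str(key) + ":")
                walk(value)
            out.append("}")

    count(root)
    walk(root)
    return out


shared = ["x"]
a = {"left": shared, "right": shared}
b = {"right": shared, "left": shared}  # the same map, pairs listed in another order
```

Here `a == b` is true and both share one list, yet the anchor lands under "left" in one token stream and under "right" in the other. The placement records traversal order, not anything in the model.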