From: Oren Ben-K. <or...@ri...> - 2001-07-31 15:20:58
Clark C . Evans [mailto:cc...@cl...] wrote:

> | > document ::= bom? (topmap (sep topmap)* eol)?
> | > topmap ::= pair(0) (eol(0) pair(0))*
> |
> | Clark seems to agree with you so I modified the productions
> | accordingly. Note that this means a document can't be an empty
> | list, and no top-level map may be empty. Are we comfortable
> | with this?
>
> Now that you point out the disadvantages, I'm not sure.
> I can see where an empty map may be useful, as well as
> an empty list. Having the 'empty' document be valid
> could also be rather useful. I can see "touch fname" being
> used to create a valid YAML text. Perhaps it was better
> the other way...

It is easy enough to change back...

> I wonder if the "class/type/kind" indicator
> should just be a property of each node (an attribute
> of each node in the information model), and not
> be an implicit map...

I'm dead set against it. The model is and should be a simple
map/list/scalar, no hidden attributes etc. Want an attribute? Use a map.
Also, what about comments?

Putting some more churn into the loop: How about we restrict the shorthand
just to '!'. If you want a comment, write:

delivery: %
    =: 1/3/2001
    !: date
    #: John, if you don't make this
       date, you'll be DOA, not Doe!

Seems acceptable enough. Comments just don't mix well with shorthand forms.
They quickly tend to get multi-line. As for the class vs. format issue, I've
had the idea that we should follow IANA's notion:

mug shot: !image/bmp/gzip/base64
    ...base64 data...

That is, the 'class' would be multi-part. The interpretation would be
schema-specific, of course (each application having its own set of types),
but the notion would be that the first part(s) would specify the interface
and/or the concrete class; further parts would specify transfer encoding
steps. The above is an "image" (interface?) in "bmp" format (concrete
class?) which was gzip-ed and then base64-ed to obtain the text value placed
in the YAML file.
IANA could be used as a source for both "first parts" and "further
parts"...

Again, YAML-CORE would also allow for '!' to be used as a shorthand, and not
enforce any special semantics on it beyond saying that "by convention" it is
used this way. Every application would be free to define its own set of
types (in particular, the empty set would be a valid, common choice).

I guess I'm getting sidetracked into debating this issue too soon, before
you are ready to devote time to it (and Brian is completely away). Let's
table it until we are all back, OK?

> Before I address the next one, I think I'd like
> to limit the simple scalar so that the following...
>
> bad: this is a simple scalar
>      that continues on the next line.
>
> is not allowed.

Bye-bye being able to parse/emit RFC0822 headers, then... Also, no more
being able to write:

point: %
    # : This is a long
        comment.
    x : 12.5
    y : 3.7

I don't know... What's the gain?

> This is necessary to make the following
> unambiguous:
>
> one:
>
>
> two: The above is a single new line
> xxx:
> Without this limitation, "one"
> above is ambiguous, does it have
> a single new line or two...

No, today it is completely unambiguous. "one" should have the
value "\n\n\n".

> | In short, it seems the wording for the simple scalar section
> | needs a rewrite, the examples need to clarify these points,
> | and maybe we should change the way all the productions handle
> | newlines (at the end instead of at the beginning). Wow!
>
> You could try to do that... but methinks it'd be
> massively ugly. I'm sure there is a way we can
> decompose this with meaningful productions while
> still keeping the new line at the beginning of
> each production.

I'm not convinced. I inherited the "eol at start of production" from your
early drafts, so I never really tried it the other way around.
I will just have to try to see for myself :-)

It seems obvious, though, that placing the eol at the end of productions
would make the eol at the end of the document compulsory. I have no problem
with that.

Have fun,

    Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-07-31 17:27:52
Clark C . Evans [mailto:cc...@cl...] wrote:

> | > I wonder if the "class/type/kind" indicator
> | > should just be a property of each node (an attribute
> | > of each node in the information model), and not
> | > be an implicit map...
> |
> | I'm dead set against it. The model is and should be a simple
> | map/list/scalar, no hidden attributes etc. Want an
> | attribute? Use a map.
> | Also, what about comments?
>
> Yes, but the class/type/kind indicator will be used to drive
> specific serializers, ideally with a mechanism to hook them
> into the parse phase.

I don't get it. Would the "indicator" be a normal map key or not?

> +1 to limiting the shorthand form to only
> allow for the class

After all, that's what we started with... looping...

> | They quickly tend to get multi-line. As for the class vs. format
> | issue, I've had the idea that we should follow IANA's notion:
> |
> | mug shot: !image/bmp/gzip/base64
> |     ...base64 data...
> |
>
> I've been thinking along this line as well. Some way to
> describe cascading of types... where a class consists
> of multiple stages separated by a pipe | for example...
>
> mug shot: !base64|gzip|image/bmp

| is nice and intuitive - I like it better than /. But your order is wrong
for it to work. You take your image/bmp, send it to gzip, and then send that
to base64. If it was the other way around it would have been gUNzip, right?
So we had better write:

mug shot: !image/bmp|gzip|base64

It also makes it easier to extract the all-important class/mime type -
everything up to the first | (prefixes are easier to work with than
suffixes). After all, the mime type is interesting throughout the execution
of the application, while the steps are only interesting at the start and
end of execution.
So making it easy to access the "important stuff" is, well, important :-)

> | > one:
> | >
> | >
> | > two: The above is a single new line
> | > xxx:
> | > Without this limitation, "one"
> | > above is ambiguous, does it have
> | > a single new line or two...
> |
> | No, today it is completely unambiguous. "one" should have the
> | value "\n\n\n".
>
> Actually... it should be two new lines, right?

That's exactly the problem Joe has pointed at. You have:

one : LF LF LF LF two :

Line folding would convert LF LF LF into \n\n, but LF LF LF LF into
\n\n\n - it all depends on whether the fourth LF is a part of one's value
or not...

> Anyway, this is a pathological case.

That it is :-)

> | I'm not convinced. I inherited the "eol at start of production"
> | from your early drafts, so I never really tried it the other way
> | around. I will just have to try to see for myself :-)
>
> Have at it! I tried it both ways, but it got too
> complicated for my poor brain. Perhaps I was
> doing something wrong.

I'll give it my best shot. At worst, I'll learn something useful about
language design :-)

Have fun,

    Oren Ben-Kiki
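
A minimal Python sketch of the `!image/bmp|gzip|base64` ordering argued for
in the message above: the prefix before the first `|` is the class/MIME type,
and the remaining parts are transfer-encoding steps, applied left to right
when writing and undone right to left when reading. The names `STEPS`,
`split_tag`, `encode`, and `decode` are hypothetical and not part of any YAML
draft; only the standard `gzip` and `base64` modules are real.

```python
import base64
import gzip

# Hypothetical step registry: step name -> (encode function, decode function).
STEPS = {
    "gzip": (gzip.compress, gzip.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def split_tag(tag):
    """Split '!image/bmp|gzip|base64' into ('image/bmp', ['gzip', 'base64'])."""
    kind, *steps = tag.lstrip("!").split("|")
    return kind, steps


def encode(data, steps):
    """Apply the steps left to right, as when emitting the value."""
    for name in steps:
        data = STEPS[name][0](data)
    return data


def decode(data, steps):
    """Undo the steps right to left, as when loading the value."""
    for name in reversed(steps):
        data = STEPS[name][1](data)
    return data


kind, steps = split_tag("!image/bmp|gzip|base64")
raw = b"BM" + bytes(16)  # stand-in bytes, not a real bitmap
assert decode(encode(raw, steps), steps) == raw
```

Because the type is a prefix, an application that does not care about the
transfer steps can take everything before the first `|` and ignore the rest.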
From: Clark C . E. <cc...@cl...> - 2001-07-31 17:58:30
On Tue, Jul 31, 2001 at 08:28:44PM +0200, Oren Ben-Kiki wrote:
| I don't get it. Would the "indicator" be a normal map key or not?

It would be an additional "attribute" on each node. It is not our
coloring mechanism, but I think that class/type/kind is so core that
coloring may not be the best way to accomplish it. In this way, there
is no shorthand, and scalars do not "magically" get promoted into
maps... which kinda scares me a bit.

| mug shot: !image/bmp|gzip|base64
|
| It also makes it easier to extract the all-important class/mime type -
| everything up to the first | (prefixes are easier to work with than
| suffixes). After all, the mime type is interesting throughout the execution
| of the application, while the steps are only interesting at the start and
| end of execution. So making it easy to access the "important stuff" is,
| well, important :-)

+1

| > | > one:
| > | >
| > | >
| > | > two: The above is a single new line
| > | > xxx:
| > | > Without this limitation, "one"
| > | > above is ambiguous, does it have
| > | > a single new line or two...
| > |
| > | No, today it is completely unambiguous. "one" should have the
| > | value "\n\n\n".
| >
| > Actually... it should be two new lines, right?
|
| That's exactly the problem Joe has pointed at.

No, it's subtly different. As I read the current production,
it only allows one trailing LF (which is not significant).
I don't think Joe's problem is that hard to fix given the
current productions. Or maybe I'm not grokking his account.

| You have:
|
| one : LF LF LF LF two :
|
| Line folding would convert LF LF LF into \n\n, but LF LF LF LF
| into \n\n\n - it all depends whether the fourth LF is a part
| of one's value or not...

The last LF is *not* part of the data, as it is part of the
eos(n) production... which is not informational.

Best,

Clark
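
The two readings above can be compared with a small Python sketch. It assumes
only the folding rule as quoted in this thread (a run of n LFs that belongs
to the value folds to n - 1 newlines); it is not derived from the actual
productions. The function name `folded_newlines` and its flag are
hypothetical.

```python
def folded_newlines(lf_count, last_lf_is_terminator):
    """Return the newlines left after folding a run of lf_count line feeds.

    Assumption taken from the thread: a run of n LFs inside the value
    folds to n - 1 newlines. If the final LF belongs to the terminating
    production instead (the eos(n) reading), it is dropped before folding.
    """
    if last_lf_is_terminator:
        lf_count -= 1
    return "\n" * max(lf_count - 1, 0)


# "one : LF LF LF LF two :" with the fourth LF counted as part of the value.
assert folded_newlines(4, last_lf_is_terminator=False) == "\n\n\n"

# The same text with the fourth LF treated as the non-informational terminator.
assert folded_newlines(4, last_lf_is_terminator=True) == "\n\n"
```

Under this assumption the whole disagreement reduces to one flag: which
production owns the final LF.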
From: Clark C . E. <cc...@cl...> - 2001-07-31 17:10:28
On Tue, Jul 31, 2001 at 06:21:49PM +0200, Oren Ben-Kiki wrote:
| > I wonder if the "class/type/kind" indicator
| > should just be a property of each node (an attribute
| > of each node in the information model), and not
| > be an implicit map...
|
| I'm dead set against it. The model is and should be a simple
| map/list/scalar, no hidden attributes etc. Want an attribute? Use a map.
| Also, what about comments?

Yes, but the class/type/kind indicator will be used to drive
specific serializers, ideally with a mechanism to hook them
into the parse phase.

| Putting some more churn into the loop: How about we restrict the shorthand
| just to '!'. If you want a comment, write:
|
| delivery: %
|     =: 1/3/2001
|     !: date
|     #: John, if you don't make this
|        date, you'll be DOA, not Doe!
|
| Seems acceptable enough. Comments just don't mix well with
| shorthand forms.

+1 to limiting the shorthand form to only
allow for the class

| They quickly tend to get multi-line. As for the class vs. format
| issue, I've had the idea that we should follow IANA's notion:
|
| mug shot: !image/bmp/gzip/base64
|     ...base64 data...
|

I've been thinking along this line as well. Some way to
describe cascading of types... where a class consists
of multiple stages separated by a pipe | for example...

mug shot: !base64|gzip|image/bmp

| That is, the 'class' would be multi-part. The interpretation would be
| schema-specific, of course (each application having its own set of types),
| but the notion would be that the first part(s) would specify the interface
| and/or the concrete class; further parts would specify transfer encoding
| steps. The above is an "image" (interface?) in "bmp" format (concrete
| class?) which was gzip-ed and then base64-ed to obtain the text value placed
| in the YAML file. IANA could be used as a source for both "first parts" and
| "further parts"...

That would be an alternative approach.
| > Before I address the next one, I think I'd like
| > to limit the simple scalar so that the following...
| >
| > bad: this is a simple scalar
| >      that continues on the next line.
| >
| > is not allowed.
|
| Bye-bye being able to parse/emit RFC0822 headers

Suggestion withdrawn.

| > one:
| >
| >
| > two: The above is a single new line
| > xxx:
| > Without this limitation, "one"
| > above is ambiguous, does it have
| > a single new line or two...
|
| No, today it is completely unambiguous. "one" should have the
| value "\n\n\n".

Actually... it should be two new lines, right?
Anyway, this is a pathological case.

| > You could try to do that... but methinks it'd be
| > massively ugly. I'm sure there is a way we can
| > decompose this with meaningful productions while
| > still keeping the new line at the beginning of
| > each production.
|
| I'm not convinced. I inherited the "eol at start of production" from your
| early drafts, so I never really tried it the other way around. I will just
| have to try to see for myself :-)

Have at it! I tried it both ways, but it got too
complicated for my poor brain. Perhaps I was
doing something wrong.

| It seems obvious, though, that placing the eol at the end of productions
| would make the eol at the end of the document compulsory. I have no problem
| with that.

Hmm.

Clark