From: Brian I. <briani@ActiveState.com> - 2001-05-18 17:03:33
"Clark C . Evans" wrote: > > | On 4 & 5. I don't really like the blank line at the beginning thing > | because people will mess it up or not understand it. And we have many > | heuristic options. > | > | A) Parse lookahead for X-YAML-Version > | B) Option-A rarely needed because as soon as we see a key that is *not* > | RFC822 compliant, we assume YAML. 99% of the time this is the first > | line! > | C) If there is no whitespace allowed before the colon in RFC822, we > | simply make it a requirement in YAML. Or does this break your RFC > | compatability rules? > > A&B are good. I don't really care about C, perhaps in the > interest of consistency with both RFC822, but also Python > code, we may not want to require the space before the colon. I see. > > | Just for my own edification, would you please explain the rationale > | behind making YAML RFC822 compliant. And do so with one of more specific > | examples. Thanks :) > > I'm not all that concerned about RFC822 compliance. > > I'm more concerned about consistency since we are going > to allow RFC822 headers. In particular, if someone > sees a few RFC822 lines above and the YAML lines below > the seperating blank line, they will most likely assume > that YAML has the same (or very similar rules). Thus, > those items _common_ in RFC822 should be allowed in YAML. > > There will be a laundry list of RFC822 constructs that > when moved into the YAML section will be illegal. I think I understand better now. Thanks again. > > | Neil and I agree that the normal transport mechanism between Perl and > | Python serializer/parsers would definitely *not* be a mailer. And if a > | mailer was used, most people wouldn't give a darn about the trailing > | whitespace. And if they really did, we could just encode the whole > | document anyway. 
So I now definitely think the best-fit answer is: > | > | " this is the hash\n key for this example :-) " : #class : > | |# My Perl Subroutine > | | > | | sub version { > | | if ($_[0] =~ /\n/) { > | | return \ "\to sender"; > | | } > | | } > > Nice. Is this fairly "optimal" for your purposes? > > | Sorry for overloading this example with so many weird things. > > Not at all. This is good. > > | I'll just comment on the multiline semantics: > | > | A) Trailing whitespace is preserved if the transporter preserves it. > | B) The content can always be encoded before transport anyway. > | C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'. > | D) An implicit newline is assumed to be at the end of every line. > | E) Note that the '|' is one column back from the actual indentation > | level. This is intentional. And it will work even if the indent width > | is set to one character wide. (not mandatory, but I like it.) > | F) I'd like to push for this always starting on the next line if it is a > | map value. It has no relation to RFC822. > | > | This will work the way I intended it 98% of the time. > > One question. How are trailing new lines handled? You may > want to modify "D" so that there is a new line on every | > line, except the last one. Thus to get a trailing new-line, > you'd have to do: I had pretty much given up on that. Since this method isn't foolproof anyway, I'd just have the caveat that *all* lines are assumed to have a trailing newline. > > after : > | this has a > | trailing new line > | > before : > | > | This has a leading new > | line, but not a trailing > | new line. > both: > | > | This has both a leading > | and a trailing new line > | > another : > | this does not have > | a trailing new line, > | nor a leading new line. > > Clear? I think it beats :- as far as readability goes. Yes it does. And I understand it. But I think it's not obvious enough. Too subtle. It's exactly what I was trying to avoid with the :- thingy.
People would think that there's just an extra blank line. I'd much prefer something like this: after : |this has a |trailing new line before : | |This has a leading new |line, but not a trailing \new line. both: | |This has both a leading |and a trailing new line another : |this does not have |a trailing new line, \nor a leading new line. Also, you can't put a space after the '|'. It doesn't scale well past these examples. > > | > 9. Clark agreed to make a "bootstrap" C program > | > and upload to SourceForge. Brian and Neil > | > agreed to download and hack at will. > | > | As I walked to the train station with Neil, he figured out the C > | implementation in his head and said he would try to get it done > | before bed. > > Great. I'll focus on the specification today then > rather than laying-in-code. The spec is the most important thing IMO. > > | > 16. We made little progress on the scalar indicator > | > for lists, to colon or not to colon. It wasn't > | > agreed, but Clark thinks this is someone else's > | > monkey. If Oren and Brian can't agree within > | > 7 days, Clark will put on the dictator cap. > | > | We traded in the '$' for the ':'. '$' as the last character in a line > | meant a multiline scalar was to follow. Converting this semantic to the > | ':' leaves us with these representations: > | > | key1 : @ > | single line > | : > | classless folded > | multi line > | another single line > | and another > | #class &0001 : > | classed multi > | line > | #class &0002 classed single line > | % > | key : value > | @ > | ~ > | #classy % > | key : value > | : even this multi line on the same line > | as a colon thingy works because there is > | a little bit of indentation imposed by > | the colon. (Although I don't love it) > | : "Another thingy like above that meets" > | "RFC822 wackiness" > | : > | | 1 > | | 1 1 > | | 1 1 1 > | |Just for completeness :-) > > Good deal.
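The multiline rules listed above (C: nothing is escaped; D: every line carries an implicit newline) are simple enough to sketch as a decoder. The following Python is a hypothetical helper, not code from YAML.pm or any other implementation. It assumes that everything up to and including the first '|' on each line is dropped regardless of its column, and that whatever line terminator arrived in transport is replaced by '\n', which is one reading of the implicit-newline rule.

```python
def parse_verbatim_block(text: str) -> str:
    """Decode a '|'-prefixed verbatim block (hypothetical helper).

    Assumed rules: leading spaces/tabs and the '|' itself are dropped,
    the rest of the line is kept as-is (no escapes, so a backslash stays
    a backslash), and every line gets an implicit trailing newline.
    """
    decoded = []
    # splitlines() recognises '\n', '\r\n' and '\r' (among others), so the
    # transport's line terminator does not leak into the decoded value.
    for line in text.splitlines():
        body = line.lstrip(" \t")
        if not body.startswith("|"):
            raise ValueError(f"expected a '|' line, got {line!r}")
        decoded.append(body[1:] + "\n")  # rule D: implicit trailing newline
    return "".join(decoded)


block = r"""  |# My Perl Subroutine
  |
  | sub version {
  |     return \ "\to sender";
  | }"""
print(parse_verbatim_block(block))
```

Because rule D gives every line a newline, the decoded subroutine ends with " }\n"; suppressing the final newline, as Clark asks about, would need an extra marker that this sketch does not model.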
In your example above, you have two colons: > > " this is the hash\n key for this example :-) " : #class : > > Is the second colon a typo, or is it required per this > proposal? I'm glad you noticed! It's definitely not a typo. I wouldn't go so far as to say it's mandatory, but I suggest it for the following reasons: PREMISE: Assuming that the ':' was used to replace the '$': PREMISE: A #class or &id always used to be followed by %, @, $ or value. $ was optional, but strongly suggested if a multiline started on the next line. CONCLUSION: In the *absence* of a #class or &id we'd have: key : $ multi line Translated to ':' speak, that's: key : : multi line Which I suggested should just be collapsed to a single ':'. I think that covers it. Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl'
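Heuristics A and B from the top of this message can be sketched together. Heuristic B assumes YAML as soon as a key is not RFC822-compliant; heuristic A looks ahead for X-YAML-Version. This Python is illustrative only: `yaml_start` and `declares_yaml` are hypothetical names. The strict rule that the colon must follow the field name immediately is an assumption, since option C was left open.

```python
import re

# Assumed strict reading: an RFC 822 field name is one or more printable,
# non-space ASCII characters other than ':' (codes 33-126 except 58), and the
# colon must follow it immediately, so "key : value" is not a header line.
RFC822_FIELD = re.compile(r"[!-9;-~]+:")


def yaml_start(lines):
    """Heuristic B: index of the first line that cannot be an RFC 822 header.

    Lines starting with a space or tab are treated as folded continuations of
    the previous header; a blank line fails the test and so ends the headers.
    """
    for i, line in enumerate(lines):
        if line.startswith((" ", "\t")):
            continue
        if not RFC822_FIELD.match(line):
            return i
    return len(lines)


def declares_yaml(lines):
    """Heuristic A: look ahead for an X-YAML-Version header."""
    return any(
        line.lower().startswith("x-yaml-version:")
        for line in lines[: yaml_start(lines)]
    )


message = ["X-YAML-Version: 1.0", "Subject: a long", "  folded subject",
           "list : @"]
print(yaml_start(message), declares_yaml(message))
```

As Brian predicts, a typical YAML document fails the test on its very first line ("key : value"), so the lookahead of heuristic A is rarely needed.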
From: Brian I. <briani@ActiveState.com> - 2001-05-18 20:57:39
Oren Ben-Kiki wrote: > > > > 1. Brian stated that he would investigate Oren's Syntax > > > and get back with us if it meets Perl's serialization > > > requirements for hard references. If not, specify > > > what alternatives we can use. > > > > I don't think it's that important to investigate. It will probably > > always be a moot point. I will let Data::Denter use its current scheme > > to deterministically round-trip all Perl data structures. YAML.pm > > probably will have no need for this. It's all academic and I have no > > spare time for academics for three more months. (My guess is, yes it > > could be made to work, but would be suboptimal for Perl people) Let's > > leave it at that for now. > > Does that mean we are giving up on Denter using YAML syntax (extended to > handle pointer-to-pointer)? Just for the record, the Perl component for YAML is called YAML.pm. Data::Denter is only of interest to Perl programmers from this point on. It may fondly be remembered as the catalyst for YAML 1.0. And it may keep a greedy eye on the YAML project's treasures, but that's of no concern here. > > I'm going to go over it with a fine-tooth comb, just to see what is involved > in making YAML a superset of it. I guess I'll also have to look at MIME > while I'm at it, with the same comb :-) Beware of the nits! Nasty buggers. ;) > > > On 4 & 5. I don't really like the blank line at the beginning thing > > because people will mess it up or not understand it. And we have many > > heuristic options. > > > > A) Parse lookahead for X-YAML-Version > > B) Option-A rarely needed because as soon as we see a key that is *not* > > RFC822 compliant, we assume YAML. 99% of the time this is the first > > line! > > C) If there is no whitespace allowed before the colon in RFC822, we > > simply make it a requirement in YAML. Or does this break your RFC > > compatibility rules? > > > > Just for my own edification, would you please explain the rationale > > behind making YAML RFC822 compliant.
And do so with one or more specific > > examples. Thanks :) > > Well, for example, suppose that YAML was a "good enough" superset of RFC822. > Then we could just adopt my idea that "blank lines separate top-level maps" > and we wouldn't have to say anything further about RFC822 headers, period. > If one wants to read/write a mail message as a YAML document, then it will > simply work (as long as he sticks to the "safe" constructs there). If one > wants to have a YAML document that has nothing to do with RFC822, that also > works. No need for any special statement about them. I like this approach > best. I think that sounds right, if I understand it correctly. My only contention above was the very first blank line, not the ones separating documents. > > > > " this is the hash\n key for this example :-) " : #class : > > I assume the trailing ':' is a typo? > No. See my earlier message for the reasoning. > > |# My Perl Subroutine > > | > > | sub version { > > | if ($_[0] =~ /\n/) { > > | return \ "\to sender"; > > | } > > | } > > > > Sorry for overloading this example with so many weird things. I'll just > > comment on the multiline semantics: > > > > A) Trailing whitespace is preserved if the transporter preserves it. > > B) The content can always be encoded before transport anyway. > > C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'. > > D) An implicit newline is assumed to be at the end of every line. > > We have to decide what our position is about them, BTW. Is a newline a "\n" > or a "\r\n" - the answer may be different in-memory and in the text file > (and thank you, O nameless DOS/CPM programmer, for inflicting this on us :-) Bastard of Bastards. :( But I think the heuristic is quite simple. Since the newline is implicit, just replace whatever is there with the system's native choice. > > > E) Note that the '|' is one column back from the actual indentation > > level. This is intentional.
And it will work even if the indent width > > is set to one character wide. (not mandatory, but I like it.) > > Under Python indentation rules, there's no problem indenting the "label" > line by 4 characters and the text lines by 7, or whatever. What you say > about one character indentation, however, implies that the following would > be legal: Yes. It would be legal. > > text: > |multi-line > |text > > I'm not certain I like it. I think Clark should make the call here - > indentation is his baby. I actually don't like it for another subtle reason. Tabs. You couldn't use them properly with this scheme. So let's scrap the backing up one space requirement. And yes, that's my final answer ;) > > I started thinking about it and hit on an issue which Brian may already have > thought about - or will have to very soon, if he's covering YAML.pm :-) The > problem is we haven't defined the data model (or, viewing it differently, > the round-tripping issue). > > In "dynamic" languages such as Perl, JavaScript, Python (and to some extent, > Java), it is natural to map a YAML map to the native hash, a list to a > vector/array, and a scalar value to a simple string. That works admirably > well, as long as the YAML entity hasn't been annotated with an ID or a class > name. > > If one wants to provide a stable round-tripping utility (e.g., suppose I > want to write a YAML pretty printer), where am I to store the ID of a scalar > value? The class of a map? For this use case, it seems my best course of > action is to wrap the native construct (map/list/scalar) in an object which > has an "id", a "class", and a "value". > > There are several options: > > A) Use the native constructs when possible, and only use "wrapper" objects > when there's a need. That makes the access pattern unpredictable: do I write > map{key} or map{key}.value? That's my idea. > > B) Always use wrapper objects, and give up on de-serializing YAML into > arbitrary native data structures.
Big hit on usefulness - if we do this, > Brian will just give up on us :-) You're getting to know me pretty well ;) > > C) Declare that IDs may be re-written arbitrarily, even by pretty printers. > That is, banish them from the data model. I think I agree... > > That leaves "class" as the only problematic issue. We explicitly decided not > to talk about it in the conference call. It seems to me like there's no way > around requiring that this data will survive round-trips, but I also don't > see how it is possible to de-serialize "scalar value" into a normal "Java > String" if someone attached an "unknown" class to it. I've read through this briefly, but don't have time to comment yet. Let's stick with the original syntax for now. In general, keep in mind that YAML 1.0 will *not* be the final YAML spec. It will evolve to YAML 2.0 and so on. For now, let's strive for maximum syntactic simplicity. I think we can special-case the semantics of 1.0 without needing to change the current syntax. > > > > > > 12. Brian mentioned that he'd show YAML to one of > > > his Perl friends. (sorry I didn't catch his name) > > > > Damian Conway http://www.csse.monash.edu.au/~damian/ > > His input will be greatly appreciated. Emailed Damian last night. He's preparing for an 11-week world speaking tour. I'll see him in June at the YAPC (Yet Another Perl Conference) in Montreal and I'll be sure to pin him down about YAML. BTW, I mentioned to Clark that I'll probably be speaking about YAML at YAPC :) > > > 15. Clark agreed to write up the "single vs multi" > > > line controversy and post to the list so that > > > it is clearly understood. > > I thought we settled this... Every scalar value is potentially multi-line. > It doesn't seem to cost us anything, or does it? I agree but see below. > > > > 16. We made little progress on the scalar indicator > > > for lists, to colon or not to colon. It wasn't > > > agreed, but Clark thinks this is someone else's > > > monkey.
If Oren and Brian can't agree within > > > 7 days, Clark will put on the dictator cap. > > > > We traded in the '$' for the ':'. '$' as the last character in a line > > I thought ':' was the first one; it is "as if" it is a normal header, with > the key "just happening" to be empty. This seems more consistent. > > > meant a multiline scalar was to follow. Converting this semantic to the > > ':' leaves us with these representations: > > > > key1 : @ > > single line > > : > > classless folded > > multi line > > another single line > > and another > > #class &0001 : > > : #class &0001 No, not a mistake. > > > classed multi > > line > > #class &0002 classed single line > > % > > key : value > > @ > > This is an empty list, right? Yup. Just to keep you on your toes :) > > > ~ > > And this is a null? Indeed. > > > #classy % > > key : value > > : even this multi line on the same line > > as a colon thingy works because there is > > a little bit of indentation imposed by > > the colon. (Although I don't love it) > > This means the following: > > : single line > > Will also work, even though you *really* dislike it. I like them :-) Noted :-) > > > : "Another thingy like above that meets" > > "RFC822 wackiness" > > : > > | 1 > > | 1 1 > > | 1 1 1 > > |Just for completeness :-) > > I think we've said everything there's to be said about this, and whether or > not you find either: > > list: > : One > : Two > : Three > and Four > > Or: > > list: > One > Two > : > Three > and Four > > To be beautiful or ugly is, when all is said and done, a matter of taste. To > you, the extra ':'s are an eyesore; to me it seems strange that the > multi-line value is "more indented"; it seems as though there's structure > involved, when there isn't. I also like being able to do /^:/ in VI to get > to the next entry. While your comment on aesthetics may be true, there is a major > distinction between what you think a ':' means and my intent. 1) A ':' is always a key value separator.
We agree on that, but we each want it to have one other meaning. 2) You want colon to be a "list bullet" in list context. 3) I want ':' to mean '$' for scalar values. And I want it to almost always be optional (unless there is ambiguity) 4) That said, we can make it the canonical/default form for emitters if we wish. Consider the following four examples. 1) Fully qualified with '$'. key1 : $ my dog has fleas key2 : $ "$40.00 for veterinarian exam" key3 : $ The vet said, "Yes Ingy, Your dog has fleas." key4 : $ Ingy said, "Wow, my dog has fleas!" key5 : #class1 $ I hate fleas key6 : #class2 $ What is your viewpoint about fleas? key7 : #class3 @ $ Tom the flea $ Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 $ A very classy flea #class6 $ |My favorite fleas: | Jim | Bob 2) Fully qualified with ':'. The only real gain here is no " for $40.00. key1 : : my dog has fleas key2 : : $40.00 for veterinarian exam key3 : : The vet said, "Yes Ingy, Your dog has fleas." key4 : : Ingy said, "Wow, my dog has fleas!" key5 : #class1 : I hate fleas key6 : #class2 : What is your viewpoint about fleas? key7 : #class3 @ : Tom the flea : Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 : A very classy flea #class6 : |My favorite fleas: | Jim | Bob 3) Minimal key1 : my dog has fleas key2 : $40.00 for veterinarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ Tom the flea Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob Note that the only required ':' (besides the key/value ones) is for ': Harry the flea' 4) Suggested canonical form: key1 : my dog has fleas key2 : $40.00 for veterinarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas."
key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 : I hate fleas key6 : #class2 : What is your viewpoint about fleas? key7 : #class3 @ : Tom the flea : Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 : A very classy flea #class6 : |My favorite fleas: | Jim | Bob So in this last example we always use the optional scalar indicator ':' for all scalars in a list (by default). Note that a #class or &id *always* comes before a %, @, or :. It's just that the ':' is usually optional. The things I don't allow are: key1 : @ : % : @ : : a scalar : #class % : #class @ : #class : a scalar The problem with ':' as a "list bullet" is that it could not be optional. And that's too restrictive just to satisfy a personal aesthetic. Brian
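Oren's option A, which Brian says is his idea too, might look like the following in Python. Option A uses native constructs and adds a wrapper only when an id or class is present. `Node`, `load_node` and `drop_anchors` are hypothetical names. The sketch assumes maps load as dicts, lists as lists and scalars as str. It models option C as a pass that forgets ids but keeps classes, since class is the part the thread leaves unresolved.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical wrapper, used only when a node carries metadata."""
    value: Any
    anchor: Optional[str] = None   # the node's &id
    cls: Optional[str] = None      # the node's #class


def load_node(value, anchor=None, cls=None):
    """Option A: keep native values unless there is an id or class to store."""
    if anchor is None and cls is None:
        return value
    return Node(value, anchor, cls)


def drop_anchors(obj):
    """Option C: ids are not part of the data model, so strip them.

    Classes are kept, which is exactly the part left open in the thread.
    """
    if isinstance(obj, Node):
        inner = drop_anchors(obj.value)
        return Node(inner, None, obj.cls) if obj.cls is not None else inner
    if isinstance(obj, dict):
        return {key: drop_anchors(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [drop_anchors(item) for item in obj]
    return obj


doc = {
    "key1": load_node("my dog has fleas"),
    "key5": load_node("I hate fleas", cls="class1"),
    "key7": load_node([load_node("Tom the flea")], anchor="0001"),
}
# The access pattern Oren worries about: doc["key1"] is a plain str,
# but the class on key5 forces callers to write doc["key5"].value.
print(doc["key1"], "/", doc["key5"].value)
```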
From: Clark C . E. <cc...@cl...> - 2001-05-18 21:55:55
On Fri, May 18, 2001 at 01:54:24PM -0700, Brian Ingerson wrote: | While your comment on aesthetics may be true, there is a major | distinction between what you think a ':' means and my intent. Not to make things worse, but I don't particularly like ":" being used for two different meanings. I'd rather have the $ back; people can always quote currency amounts: total : "$540.00" No problem here. | 1) Fully qualified with '$'. | | key1 : $ my dog has fleas | key2 : $ "$40.00 for veterinarian exam" | key3 : $ | The vet said, "Yes Ingy, | Your dog has fleas." | key4 : $ Ingy said, "Wow, | my dog has fleas!" | key5 : #class1 $ I hate fleas | key6 : #class2 $ | What is your viewpoint | about fleas? | key7 : #class3 @ | $ Tom the flea | $ Dick the flea | $ Harry the flea | is not really hairy | % | foo : bar | #class4 % | FOO : BAR | #class5 $ A very classy flea | #class6 $ | |My favorite fleas: | | Jim | | Bob I like the above. It's clean, despite being a bit "noisy". 5) Minimal key1 : my dog has fleas key2 : $40.00 for veterinarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ Tom the flea Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob 6) Canonical key1 : my dog has fleas key2 : $40.00 for veterinarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ $ Tom the flea $ Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob Thoughts? I don't mind escaping "$382.00", as this is a small use case anyway, and I'd rather stick with Perl's type indicators. Best, Clark
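Clark's canonical form 6 can be produced by a small recursive emitter. With ':' in place of '$', the same emitter produces the unclassed part of Brian's form 4. This is a hypothetical sketch: `emit_map` and `emit_list` are illustrative names, and it ignores #class and &id annotations. It also assumes a two-space indent step. Finally, it assumes a multi-line map value starts on the line after "key :", as in Clark's quoted form 1. None of these details were settled in the thread.

```python
def emit_map(mapping, indent=0, marker="$", step=2):
    """Emit 'key : value' lines; map values never get the scalar marker."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} : %")
            lines += emit_map(value, indent + step, marker, step)
        elif isinstance(value, list):
            lines.append(f"{pad}{key} : @")
            lines += emit_list(value, indent + step, marker, step)
        elif value is None:
            lines.append(f"{pad}{key} : ~")
        else:
            text = str(value).split("\n")
            if len(text) == 1:
                lines.append(f"{pad}{key} : {text[0]}")
            else:
                # Assumption: a multi-line map value starts on the next line.
                lines.append(f"{pad}{key} :")
                lines += [" " * (indent + step) + t for t in text]
    return lines


def emit_list(items, indent=0, marker="$", step=2):
    """Emit list items; every scalar item is prefixed with the marker."""
    pad = " " * indent
    lines = []
    for item in items:
        if isinstance(item, dict):
            lines.append(f"{pad}%")
            lines += emit_map(item, indent + step, marker, step)
        elif isinstance(item, list):
            lines.append(f"{pad}@")
            lines += emit_list(item, indent + step, marker, step)
        elif item is None:
            lines.append(f"{pad}~")
        else:
            first, *rest = str(item).split("\n")
            lines.append(f"{pad}{marker} {first}")
            # Continuation lines are indented one step past the marker line.
            lines += [" " * (indent + step) + t for t in rest]
    return lines


fleas = {
    "key1": "my dog has fleas",
    "key7": ["Tom the flea", "Dick the flea",
             "Harry the flea\nis not really hairy", {"foo": "bar"}],
}
print("\n".join(emit_map(fleas)))
```

Because every list scalar carries the marker, a multi-line item such as "Harry the flea" is never confused with two single-line items. That is the ambiguity the marker exists to resolve.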
From: Oren Ben-K. <or...@ri...> - 2001-05-18 22:52:52
> > I assume the trailing ':' is a typo? > > No. See my earlier message for the reasoning. > > That leaves "class" as the only problematic issue... > > I've read through this briefly, but don't have time to comment yet. > Let's stick with the original syntax for now. I don't know about that; it is easier not putting something in than taking it out later. > In general, keep in mind that YAML 1.0 will *not* be the final YAML > spec. It will evolve to YAML 2.0 and so on. For now, let's strive for > maximum syntactic simplicity. That's why I'd rather leave #class out of it until proven necessary. > Emailed Damian last night. He's preparing for an 11-week world speaking > tour. I'll see him in June at the YAPC (Yet Another Perl Conference) in > Montreal and I'll be sure to pin him down about YAML. BTW, I mentioned > to Clark that I'll probably be speaking about YAML at YAPC :) Now, if that was early July, I might have been able to attend. Oh well. > While your comment on aesthetics may be true, there is a major > distinction between what you think a ':' means and my intent. > > 1) A ':' is always a key value separator. We agree on that, but we each > want it to have one other meaning. > 2) You want colon to be a "list bullet" in list context. > 3) I want ':' to mean '$' for scalar values. And I want it to almost > always be optional (unless there is ambiguity) > 4) That said, we can make it the canonical/default form for emitters if > we wish. Actually, I thought of it differently, since I started with the RFC822 frame of mind. The idea was to combine two concepts: - Unify Clark's Python-like indentation with the RFC822 concept that (more) indented lines continue a value; - Make each YAML "element" have an RFC822 header line of its own.
That's why I suggested the syntax: map: % key: value list: @ : text : % key: multi line value : @ : value Etc. Every element is logically a single RFC822 header line, with indentation playing a dual role (both "continue a text value" and "continue a structured value"). It just happens that in some lines the "field name" is empty, that's all. The ':' isn't a "scalar marker", it is a "YAML element marker". I still don't see why you need a "scalar marker" as such, be it $, : or whatever. I realize that Perl makes such a marker a natural thing to think of, but all it does is clutter the text. (BTW, maybe one day we'd want to allow: list: @ 0 : 0th position 10 : 10th position Etc. Admittedly sparse lists aren't that much of a use case... but the above sure beats having to specify 9 null values, doesn't it? :-) > Consider the following four examples. > > 1) Fully qualified with '$'. Rather verbose... > 2) Fully qualified with ':'. The only real gain here is no " for $40.00. Right. Not much of an improvement. > 3) Minimal Nope. Minimal is: > key7 : #class3 @ > Tom the flea > Dick the flea Harry the flea is not really hairy > % > foo : bar > #class4 % > FOO : BAR > #class5 A very classy flea > #class6 > |My favorite fleas: > | Jim > | Bob > > Note that the only required ':' (besides the key/value ones) is for ': > Harry the flea' You don't really need it. Since the next line is "more indented", it is a continuation line. It also works for aligned text: |Harry the flea |is not really hairy (To Clark's question about this: All the indentation, up to and including the '|' starting the line, is removed, and the result is the verbatim text. So it is well defined even if the '|' in the first line isn't on the same column as in the rest of the lines). Pretty it isn't, but it is minimal :-) > 4) Suggested canonical form: The difference between it and my original proposal is that ": %" and ": @" were collapsed into "%" and "@", as a shorthand.
I'd still put an ID, say, *after* the ':', because it isn't a scalar marker... > The things I don't allow are: > ... > : : a scalar > : #class : a scalar Obviously the second ':' is unnecessary, so I agree with you here. > : % > : @ > : #class % > : #class @ Here's another option (call it 5): If we want to think of ':' as a "scalar marker", we could say that the syntax is: map % key1 % (id1) key : value key2 : value key3 * (id2) id1 key4 @ * id : value % key : value @ : value This is consistent; a value is always prefixed by its marker (@, %, : or *). No need to write ": %" or for that matter "map: %"; in a map, the syntax is <key> <value> where the ':' is just one of the options. RFC822 is simply a top-level map with keys having only text values. BTW, I tried switching back to (id) instead of &id - that's consistent with RFC822's "comment" concept, emphasising the id is not part of the data model. We could keep it as &id; it doesn't matter much. Just so we'll have numbers for all of these, call my original proposal option 6: map : % sub map : % &id1 key : value text key : value ref key : * &id2 id1 list key : @ : *id : value : % key : value : @ : value > The problem with ':' as a "list bullet" is that it could not be > optional. And that's too restrictive just to satisfy a personal > aesthetic. Aesthetic is important, or S-expressions would rule over the world. Besides, aesthetics aside, being consistent is also important. This rules out option 4 for me, even though it looks nice (consistency over aesthetics :-). Options 1 and 2 are too noisy for me... That leaves us with option 3 (the corrected one, without the ':'), option 5, and my original option 6. I think I like 5 the most... Have fun, Oren Ben-Kiki
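Oren's option 5 makes every entry line "<key> <marker> ..." (or just "<marker> ..." inside a list), and that shape is easy to tokenize. The Python below is a hypothetical sketch (`Entry`, `ENTRY` and `split_entry` are illustrative names). It assumes keys are single whitespace-free tokens and that an id is written in Oren's "(id)" form directly after the marker.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    key: Optional[str]      # None for list items
    marker: str             # one of % @ : *
    anchor: Optional[str]   # text inside "(...)", if present
    rest: Optional[str]     # scalar text, or the target id after '*'


ENTRY = re.compile(
    r"(?:(?P<key>[^\s%@:*]\S*)\s+)?"   # optional key (assumed: no spaces)
    r"(?P<marker>[%@:*])"              # every value starts with its marker
    r"(?:\s+\((?P<anchor>[^)]*)\))?"   # optional "(id)"
    r"(?:\s+(?P<rest>.*))?$"           # remainder of the line
)


def split_entry(line):
    """Split one option-5 entry line; return None if it does not fit."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    return Entry(m["key"], m["marker"], m["anchor"], m["rest"])


for sample in ["key1 % (id1)", "key2 : value", "key3 * (id2) id1", "* id", "@"]:
    print(split_entry(sample))
```

The sketch also exposes one cost of the "(id)" spelling. A scalar whose text happens to start with a parenthesis, such as ": (draft) notes", is read as carrying the id "draft". The "&id" spelling would not have that ambiguity.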
From: Brian I. <briani@ActiveState.com> - 2001-05-19 00:32:58
Oren Ben-Kiki wrote: > > > > I assume the trailing ':' is a typo? > > > > No. See my earlier message for the reasoning. > > > > That leaves "class" as the only problematic issue... > > > > I've read through this briefly, but don't have time to comment yet. > > Let's stick with the original syntax for now. > > I don't know about that; it is easier not putting something in than taking > it out later. It *is* optional. All ':' are optional except to distinguish a multi-line from several single lines. > > > In general, keep in mind that YAML 1.0 will *not* be the final YAML > > spec. It will evolve to YAML 2.0 and so on. For now, let's strive for > > maximum syntactic simplicity. > > That's why I'd rather leave #class out of it until proven necessary. * ingy is trying to suppress postal tendencies * ;) > > While your comment on aesthetics may be true, there is a major > > distinction between what you think a ':' means and my intent. > > > > 1) A ':' is always a key value separator. We agree on that, but we each > > want it to have one other meaning. > > 2) You want colon to be a "list bullet" in list context. > > 3) I want ':' to mean '$' for scalar values. And I want it to almost > > always be optional (unless there is ambiguity) > > 4) That said, we can make it the canonical/default form for emitters if > > we wish. > > Actually, I thought of it differently, since I started with the RFC822 frame of > mind. The idea was to combine two concepts: Ahh. Whacking everything with the RFC822 hammer ;) > - Unify Clark's Python-like indentation with the RFC822 concept that (more) > indented lines continue a value; > - Make each YAML "element" have an RFC822 header line of its own. > > > 3) Minimal > > Nope. Minimal is: > > > key7 : #class3 @ > > Tom the flea > > Dick the flea > Harry the flea > is not really hairy > > % > > foo : bar > > #class4 % > > FOO : BAR > > #class5 A very classy flea > > #class6 > > |My favorite fleas: > > | Jim > > | Bob > > You don't really need it.
Since the next line is "more indented", it is a > continuation line. It also works for aligned text: > > |Harry the flea > |is not really hairy Big ouch. > Pretty it isn't, but it is minimal :-) Agree. On both counts. > Here's another option (call it 5): > > If we want to think of ':' as a "scalar marker", we could say that the > syntax is: > > map % > key1 % (id1) > key : value > key2 : value > key3 * (id2) id1 Boggle %l > key4 @ > * id > : value > % > key : value Why switch indent width? > @ > : value > > This is consistent; a value is always prefixed by its marker (@, %, : or *). > No need to write ": %" or for that matter "map: %"; in a map, the syntax is > <key> <value> where the ':' is just one of the options. RFC822 is simply a > top-level map with keys having only text values. This might have a chance of general acceptance if you replaced ':' with '$' in all cases. You won't accept that because of the 822 thing. And I couldn't accept it because a key/value separator is visually important if nothing else. > > BTW, I tried switching back to (id) instead of &id - that's consistent with > RFC822's "comment" concept, emphasising the id is not part of the data > model. We could keep it as &id; it doesn't matter much. I can go either way. > Aesthetic is important, or S-expressions would rule over the world. Agreed, but it's not a common aesthetic. It's your personal one. My feeling is that it would bother a lot of people, especially if it was required. > > Besides, aesthetics aside, being consistent is also important. This rules > out option 4 for me, even though it looks nice (consistency over aesthetics > :-). Options 1 and 2 are too noisy for me... That leaves us with option 3 > (the corrected one, without the ':'), option 5, and my original option 6.
I see no inconsistencies in #4, if you accept the premises I laid out: 0) ':' is a key/value separator (We could change to '=' if it weren't for self-imposed 822 restrictions) 1) ':' also is the scalar marker (Could still be '$') 2) The scalar marker is optional, except to resolve ambiguity. (Lists of single/multi-lines mixed together are the only example I can think of. If you can think of a way out of that one <without shifting the '|' paragraphs> then we can get rid of the scalar marker) 3) Any instance of ': :' can be collapsed to ':' without loss of meaning. > Have fun, I cut myself shaving today. That's about all the fun I've had so far... Brian
From: Clark C . E. <cc...@cl...> - 2001-05-19 06:21:46
It seems that the *only* place we may now need a scalar indicator is when the scalar has multiple lines and occurs within a list. I must say... this should be rather rare... | key: @ | Harry the flea | is not really hairy | |Harry the flea | |is not really hairy As Oren and Brian agree, possible, but UGLY. | 0) ':' is a key/value separator (We could change to '=' if it weren't | for self-imposed 822 restrictions) | 1) ':' also is the scalar marker (Could still be '$') | 2) The scalar marker is optional, except to resolve ambiguity. (Lists of | single/multi-lines mixed together are the only example I can think of. If | you can think of a way out of that one <without shifting the '|' | paragraphs> then we can get rid of the scalar marker) | 3) Any instance of ': :' can be collapsed to ':' without loss of | meaning. I see four options: a) We can deal with the ugliness. b) We can re-introduce the scalar marker ($) that is optional pretty much everywhere except in this case. c) We introduce a special marker only for this case (no optional stuff). d) We introduce indexes (did Oren propose this?) I think "a" is out because it is ugly. I think "d" is out since adding a new item in the list would require re-numbering... or nasty BASIC 10, 20, 30 style ugliness. I'd also eliminate "b". If we re-introduced a scalar indicator, I strongly feel it should not be the same as the key/value separator. Also, I'm not fond of optional stuff... as it causes exception handling in people's heads. Thus, if we did option "b", I'd like the indicator to be mandatory for consistency. But this is ugly. Therefore, I think "b" is out. This leaves us with "c". I say we use ":" to indicate a multi-line scalar in this case and make it a documented exception. No use in trying to rationalize it... Best, Clark
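Clark's option (c) can be stated as a small parsing rule for a list of scalars. The sketch below is hypothetical (`parse_scalar_list` is an illustrative name) and makes four assumptions:

* indentation uses spaces only;
* the indentation of the first line is the list level;
* continuation lines of a ':' item are joined with single spaces, which is one reading of "folded";
* a continuation line under an item without ':' is rejected, which is what makes ':' the one documented exception.

```python
def parse_scalar_list(lines):
    """Parse the scalar items of one list under Clark's option (c)."""
    items = []
    base = None          # indentation of the list's own lines
    multi_open = False   # True while a ':' item may take continuation lines
    for line in lines:
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        indent = len(line) - len(line.lstrip(" "))
        text = line.strip()
        if base is None:
            base = indent
        if indent > base:
            if not multi_open:
                raise ValueError(f"continuation without ':' marker: {line!r}")
            items[-1] = f"{items[-1]} {text}" if items[-1] else text
        elif indent == base:
            # ':' alone, or ': text', opens a multi-line scalar.
            multi_open = text == ":" or text.startswith(": ")
            items.append(text[1:].strip() if multi_open else text)
        else:
            raise ValueError(f"line is outdented past the list: {line!r}")
    return items


example = [
    "  Tom the flea",
    "  Dick the flea",
    "  :",
    "    Harry the flea",
    "    is not really hairy",
]
print(parse_scalar_list(example))
```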
From: Oren Ben-K. <or...@ri...> - 2001-11-09 06:49:43
Nice work guys. Some points:

> > 1. A yaml file (stream) consists of
> > a series of "standalone" yaml documents,
> > where the starting production of each
> > document is a single map, sequence or
> > scalar node.
>
> a series of zero or more...
>
> ie an empty file is a valid YAML stream.

Yes.

> > 2. In general, each document in the series
> > will begin with NL '--' UNIQUE
> > where UNIQUE is a sequence of printable
> > characters that is constant throughout
> > the stream, '-' being the canonical
> > example to be used in documentation.
>
> The separator will match /^--\S+/m
>
> ie NL is not required on first separator.

Yes.

> > 3. The separator is optional at the start
> > of a yaml stream where the first document
> > is an unadorned map or sequence. Otherwise
> > the separator is mandatory both at the
> > start of the stream and between each
> > document within the stream.
> ... let's make the separator mandatory for YAML emission.

Let's put it as a "should", not a "must". E.g., a pretty-print utility may
preserve the separator-less property of a configuration file. Probably there
are other cases where this makes sense.

> > 4. Following the separator, descriptors
> > like the anchor and class can be
> > used. And following the optional
> > descriptors, the indicator. This is
> > similar if not identical to the
> > list item production. These items
> > apply to the root node of the document.
>
> yes.

Yes.

> > 5. Each document in the yaml stream
> > is stand-alone and its nodes may not
> > reference nodes in other documents.
> > Thus, the anchor context is reset at
> > each divider.
>
> yes

Yes.

> > 6. A version indicator is also optional
> > within the descriptors immediately
> > following the separator.
>
> --- ?1.0
>
> ideas?

--- YAML:1.0 ...

Rationale: this allows us to do all sorts of other stuff in the future
without requiring us to look like line noise:

key ::= alpha alnum*
value ::= linear_non_space*

Unknown key:value pairs following the separator should be ignored (with a
warning). The YAML key's value is <major>.<minor>, where the usual backward
compatibility rules apply.

This removes any confusion with indicators within the document. Being
verbose is not an issue because it is only once per document.

> > 7. The indentation level of the root node
> > of each document is 0. Thus scalars
> > are not indented. A scalar continues
> > until (a) EOS is reached or (b)
> > the next separator is encountered.
>
> right

Yes.

> > 8. Two adjacent separators will be
> > reported as a single empty document.
> > A trailing separator will result in
> > the reporting of a single empty
> > document at the end of the stream.
>
> Empty scalar. A document must have a type.

Yes.

> > On comments...
> >
> > 1. We agreed that a comment mechanism
> > is needed where comments are not
> > part of the information model and
> > do not get round-tripped.

Yes.

> > 2. Maps and sequences can have comments
> > but scalars cannot.
>
> Disagree...
> Actually, I think I see your point. Withdrawn.

I agree with (2).

> > 3. Comments must be indented to the
> > level that they apply. If they are
> > at a lower indentation, then the
> > indentation level is reset... so
> > a higher indented item will be an
> > error...
>
> Comments must be at the same level as the line following them.

Seems you both have the same intent. I agree with it.

> > 4. Comments should be part of the emitter
> > and parser API so that they can be
> > round tripped at a low level. However,
> > they may be dropped at higher levels.
>
> I get the parser level. (Line numbers need to be preserved for error
> reporting). But what about the emission level? Explain.

There has to be a way to emit comments. At the YAML->load_doc() or
YAML->dump_doc() level comments don't exist; but at the YAML->get_token()
or YAML->put_token() level they should. Put another way: throwaway
comments exist at the lexer level and not at the parser level.

I don't know whether this should be written into the spec, however. Comments
are just outside the info model, just like the exact way a quoted string was
line-wrapped. The same rules apply to both. I see no reason to talk in the
spec about how a low-level lexer should report line breaks in quoted
strings, so I don't see any reason to talk about how it reports throwaway
comments either.

> > Indentation...
> >
> > 1. Brian gives up on TABS. We will initially
> > only accept spaces for indentation, tabs
> > will be an error.
>
> Tabs will confine us to doom I'm afraid. And I'm not judging this on
> the aesthetic of 8 space indenting.

Fine. Tabs are gone.

> > 2. We'd like to experiment using only one
> > space for each indentation level. If this
> > is ugly in our testing, we will go back to
> > four spaces.
>
> I'd like to keep a running list of the advantages and disadvantages of
> single space.

Well, it is minimal, it is simple. It may be more readable for deep
documents, and I expect YAML documents to get deep. I rather like it
actually.

> > 3. We will reserve an indicator valid at the
> > separator level that may be used to
> > set the number of spaces in a future version
> > if we find that it is necessary.
>
> So the indent level could be changeable, but only for an entire document.
> This might be nice.

Something like:

--- YAML:1.0 INDENT:4
invoice:
    price: $12

See how the key:value line in the separator helps?

At any rate, I'd rather *not* have this built into YAML:1.0 unless you both
feel it is important. We can add it in YAML:1.1.

> > 4. We may allow tabs in a future version, but
> > only after some very convincing arguments.
>
> And if we do, tabs should mean go to the next column divisible by eight.
> (We could even make that number configurable in the separator for those
> doomed to satanic editors)

--- YAML:1.1 INDENT:TAB4

> BTW, an emitter would never emit tabs. And people would be discouraged
> from doing it as well. That's why for now, we make them illegal in 1.0.

Right. As long as 'INDENT' isn't in YAML:1.0, I'm happy. We can debate its
valid values in YAML:1.1, if it comes to that.

> Let's stick to trying 1 space for now.

YES.

> > Implicit scalars...
> >
> > 1. Implicit scalars should only be single-line.

No (base64).

> 1a. Implicit scalars must be unquoted.

Yes.

> 1b. Implicit scalars begin with a non-alpha.

Yes.

> 1c. Implicit scalars have no whitespace characters. (brian proposed)

No (base64, dates).

> > 2. All implicit scalars should be part of the YAML
> > specification (perhaps a secondary document)
> > and third party implicit scalars are not allowed.
> > If a user wants their favorite scalar type implicit,
> > they can propose it on the YAML list and we may
> > bless it. Otherwise they can use explicit types.

NO. I don't see implicit types as being any more special than
explicit ones. The choice of types in my document is part of
my document's schema, and it is my decision as a document
author to either use public or private types. As long as I can write:

key: !!foo bar

I expect to be able to use a private implicit type as well.
A good point is that such private types should not collide
with any public type. I propose that all private implicit types
must start with `, just like all private explicit types must start
with ! (note that ` wouldn't be an indicator, it would just be
the first character of an unquoted scalar). So:

key: `-=>bar<=-

> > 3. (brian) We will reserve all regular expressions
> > up front and anything not reserved will be
> > a string.
>
> (brian) We can reserve a whole lot of potential patterns by making them
> implicit warnings. ie They throw a warning that they are reserved for
> future use and should be quoted to eliminate the warning.
>
> > (clark) We reserve anything that does not
> > begin with an alphabetic character.

No. Allocate all patterns matching "^`" for private usage and reserve all
the rest for public implicit types. Simpler and better.

> > This difference emerged after Brian gave the
> > use case of putting a regular expression
> > as a value. Clark's answer was "too bad",
> > (a) use a multi-line scalar, or (b) quote
> > the regular expression, or (c) let's introduce
> > an implicit "regex" type starting with the ` tick.

Oh, just find some prefix to attach to regexps and be done with it.
Tick would have been nice, only I just used it for private types.
'*' would also have been nice, only it is taken by references.
'?/' seems best:

regexp: ?/regexp here/

Or, if it is a private type:

regexp: `?/regexp here/

I didn't use an unadorned '/' because I think in general prefix-based
implicit types should be at least two characters long, to allow a larger set.
So, how about we change references from '*' to '->' while we are at it?

anchor: &12
reference: ->12

Nice.

> > I still hold firm, although I have a new suggestion.
> > Let's expand the block indicator to work as a
> > literal single-line mechanism so: | \/regex
>
> hmm. Oren?

Neutral. I see the benefit, but there's also a wart. Does this mean that:

--- | this is a top-level block?
--- |- and this is another w/o a newline?
---
- | and this block
  continues on the next line...

If you are willing to accept the above, then fine, allow it. Otherwise, just
let it go, and live with:

this: |-
  is a one-line block

Actually I think the latter is best. Block is for, well, blocks; and the
regexp issue is solved using a prefixed simple type.

> > Random notes:
> >
> > 1. Brian likes !seq and !map as the abbreviations
> > for the default map and sequence types. I'm
> > assuming that !string is the default scalar
> > type (in the absence of explicit/implicit types).

How about we keep calling it !text rather than switching to !string? I think
that "text" better reflects the intent. I find string to be a bit more of an
implementation term (like 'list' and 'array' are for 'sequence').

> > 2. We both think we are very close...

I think we are.

> > So. That's it. Summary:
> >
> > A) Let's play with one space indentation
> > for a spell and see if we like it.

Yes.

> > B) We need to work out the implicit indicator
> > stuff...
>
> Agreed.

I propose key:value as above. And:

C) Implicit types: use ` for private ones?

Should I write it up as the next draft? Delivery on Sunday morning as
usual :-)

Have fun,

Oren Ben-Kiki
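Points 2, 3, 6 and 8 above, together with the key:value header Oren proposes, are concrete enough to sketch. The Python below is only an illustration under stated assumptions. It uses Brian's `/^--\S+/m` separator rule and treats header tokens of the form `KEY:value` as directives. It warns on unknown keys and passes any other header token (an anchor or class) through as a descriptor. Empty bodies go back to the caller, who would report them as empty scalars. `split_stream`, `parse_header` and `KNOWN_KEYS` are hypothetical names, not part of any draft.

```python
import re
import warnings

# Brian's rule: '--' plus one or more non-space characters at the start
# of a line ("---" when UNIQUE is the canonical "-").
SEPARATOR = re.compile(r"^--\S+", re.MULTILINE)

# Oren's header proposal: key ::= alpha alnum*, value ::= linear_non_space*
DIRECTIVE = re.compile(r"([A-Za-z][A-Za-z0-9]*):(\S*)")

# Assumption: only the version key is defined for YAML:1.0.
KNOWN_KEYS = {"YAML"}


def parse_header(rest):
    """Split the text after a separator into directives and other descriptors."""
    directives, descriptors = {}, []
    for token in rest.split():
        m = DIRECTIVE.fullmatch(token)
        if m is None:
            descriptors.append(token)  # e.g. an anchor (&1) or a class (!seq)
        elif m.group(1) in KNOWN_KEYS:
            directives[m.group(1)] = m.group(2)
        else:
            # Unknown key:value pairs are ignored, with a warning.
            warnings.warn(f"ignoring unknown header key {token!r}")
    return directives, descriptors


def split_stream(text):
    """Return a list of (directives, descriptors, body) tuples, one per document."""
    matches = list(SEPARATOR.finditer(text))
    # Point 2: UNIQUE is constant throughout the stream.
    if len({m.group() for m in matches}) > 1:
        raise ValueError("the separator must be constant throughout the stream")
    docs = []
    # Point 3: the separator may be omitted before the first document.
    head_end = matches[0].start() if matches else len(text)
    if text[:head_end].strip():
        docs.append(({}, [], text[:head_end]))
    for i, m in enumerate(matches):
        eol = text.find("\n", m.end())
        eol = len(text) if eol == -1 else eol
        directives, descriptors = parse_header(text[m.end():eol])
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Point 8: adjacent or trailing separators give an empty body here.
        docs.append((directives, descriptors, text[eol + 1:body_end]))
    return docs
```

For example, `split_stream("--- YAML:1.0\na: 1\n---\n---\nb: 2\n---\n")` yields four documents with bodies `"a: 1\n"`, `""`, `"b: 2\n"` and `""`. The first carries `{'YAML': '1.0'}`. An empty string yields no documents at all, matching the note that an empty file is a valid stream.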
From: Clark C . E. <cc...@cl...> - 2001-11-09 16:59:58
| > > 6. A version indicator is also optional
| > > within the descriptors immediately
| > > following the separator.
| >
| > --- ?1.0
| >
| > ideas?
|
| --- YAML:1.0 ...

--- ?YAML:1.0 ...

| Rationale: this allows us to do all sorts of other stuff in the future
| without requiring us to look like line noise:
|
| key ::= alpha alnum*
| value ::= linear_non_space*
|
| Unknown key:value pairs following the separator should be ignored (with a
| warning). The YAML key's value is <major>.<minor>, where the usual backward
| compatibility rules apply.
|
| This removes any confusion with indicators within the document. Being
| verbose is not an issue because it is only once per document.

If we add ? as an indicator before items like this
and ?INDENT:0, we can then have one-line scalars...

--- ?YAML:1.0 2001-01-01
--- !string this is a one line scalar
--- 1.0

| > > 3. Comments must be indented to the
| > > level that they apply. If they are
| > > at a lower indentation, then the
| > > indentation level is reset... so
| > > a higher indented item will be an
| > > error...
| >
| > Comments must be at the same level as the line following them.
|
| Seems you both have the same intent. I agree with it.

I like Brian's formulation. It's simpler.

| > > 4. Comments should be part of the emitter
| > > and parser API so that they can be
| > > round tripped at a low level. However,
| > > they may be dropped at higher levels.
| >
| > I get the parser level. (Line numbers need to be preserved for error
| > reporting). But what about the emission level? Explain.
|
| There has to be a way to emit comments.

Exactly.

| I don't know whether this should be written into the spec, however.

Perhaps not. But it should be in the API description so that a YAML
processor will be familiar (same interface) no matter what the language.
This will take some work...

| > > 3. We will reserve an indicator valid at the
| > > separator level that may be used to
| > > set the number of spaces in a future version
| > > if we find that it is necessary.
| >
| > So the indent level could be changeable, but only for an entire document.
| > This might be nice.
|
| Something like:
|
| --- YAML:1.1 INDENT:4
| invoice:
|     price: $12

--- ?YAML:1.1 ?INDENT:4
invoice:
    price: $12

And yes... let's not include this in YAML 1.0.

| > Let's stick to trying 1 space for now.
|
| YES.
|
| > > Implicit scalars...
| > >
| > > 1. Implicit scalars should only be single-line.
|
| No (base64).

I don't mind making base64 explicit. It certainly doesn't have to
start on the first line. This one, for example, is a very very small jpg.

image: !base64 \
  R0lGODlhGQAPAOMAAAICBDaanAJSVAISFP7+/Gb
  OzAJmZAIeHGbMzGbMzGbMzGbMzGbMzGbMzGbMzG
  bMzCH+Dk1hZGUgd2l0aCBHSU1QACH5BAEKAAYAL
  AAAAAAZAA8AQAR70EgZArlBWHw7Nts1gB6RGV0w
  CBMlkp4qlHJppkNoyW1r5SmcTeV6wUwrFI4VEul
  SMyRLchhYrYLq4MDKYrm9XuFQuIzLhALApm6VV+
  g44FBSHybokQGdnivNfhJ8enwFSR12eB4jcWZ3g
  HeCJQJycXSJEzaIc5SIWz0RADs=

Not to bring this up, but perhaps we should consider compound types...

!jpg!base64

| > 1c. Implicit scalars have no whitespace characters. (brian proposed)
|
| No (base64, dates).

timestamp: 2001-02-21 15:20:00+5

| > > 2. All implicit scalars should be part of the YAML
| > > specification (perhaps a secondary document)
| > > and third party implicit scalars are not allowed.
| > > If a user wants their favorite scalar type implicit,
| > > they can propose it on the YAML list and we may
| > > bless it. Otherwise they can use explicit types.
|
| NO. I don't see implicit types as being any more special than
| explicit ones. The choice of types in my document is part of
| my document's schema, and it is my decision as a document
| author to either use public or private types.

The goal of YAML is to support interoperability among processes,
languages, etc. If a user-defined type is going to be used, then it
should be plainly obvious.

| key: !!foo bar

The above is "good enough" for this use case. If you want, we can
revisit this in YAML 1.1 if there is enough outcry from people asking
for the feature. But I'd rather be strict for now.

| I expect to be able to use a private implicit type as well.
| A good point is that such private types should not collide
| with any public type.

Right. This is the problem.

| I propose that all private implicit types must start with `,
| just like all private explicit types must start with !
| (note that ` wouldn't be an indicator, it would just be
| the first character of an unquoted scalar). So:
|
| key: `-=>bar<=-

Ok. This is an acceptable compromise.

| > > 3. (brian) We will reserve all regular expressions
| > > up front and anything not reserved will be
| > > a string.
| >
| > (brian) We can reserve a whole lot of potential patterns by making them
| > implicit warnings. ie They throw a warning that they are reserved for
| > future use and should be quoted to eliminate the warning.
| >
| > > (clark) We reserve anything that does not
| > > begin with an alphabetic character.
|
| No. Allocate all patterns matching "^`" for private usage and reserve all
| the rest for public implicit types. Simpler and better.

Ok. So those implicit types beginning with a ` tick are private in
nature. Nice compromise.

| > > This difference emerged after Brian gave the
| > > use case of putting a regular expression
| > > as a value. Clark's answer was "too bad",
| > > (a) use a multi-line scalar, or (b) quote
| > > the regular expression, or (c) let's introduce
| > > an implicit "regex" type starting with the ` tick,
| > > (d) or allow single line blocks: | regex
|
| regexp: !text ... line noise here ...
|
| Is currently a legal way of doing this. I think "line noise
| as text value" is rare enough that an explicit '!text' annotation
| is acceptable (even in Perl :-). The in-line block syntax seems
| more trouble than it is worth.

Not bad. Given your suggested "private" implicit type area, one could
even use the ` tick... if one wanted. But then they'd have to register
their implicit type handler or it would be a warning. No?

Alternatively, we could have an org.yaml.regex explicit type.

match-string: !regex ... regular expression ...

This would be neat since Java and Python have regular expression objects.

| My proposal for changing the reference syntax from '*' to '->'
| still stands, however.

Hmm. This is *ok* with me, but I like * better. How does this work
with more complicated references?

| You may have noted I changed the base64 syntax from [...data...]
| to [=...data...=], due to the same reason: using a single prefix
| character is like using a class A IP network or a TLD; it should
| only be done for an extremely good reason.

Actually, it might be good to just use an explicit !base64 and not
bother with an implicit type for this use case.

| I feel '=' is a case where a one-char type is acceptable,
| and also '~'. Both are actually single-char values rather
| than single-char prefixes; so in theory '~/=...data...'
| implicit types could still be used, though of course
| their relationship with '='/'~' would become an issue.
| At any rate, I would definitely draw the line there.
| Wasting '*' seems like, well, a waste :-) Besides,
| '->' may be more readable to a newbie.

Hmm. I disagree here. Back references are going to be very common.
Let's make it simple, either * or ^, no?

| Oh, just find some prefix to attach to regexps and be done
| with it. Tick would have been nice, only I just used it for
| private types. '*' would also have been nice, only it is
| taken by references. '?/' seems best:
|
| regexp: ?/regexp here/

I like the idea of an explicit type for regular expressions.

search: !regex \/regular expression

| I didn't use an unadorned '/' because I think in general prefix-based
| implicit types should be at least two characters long, to allow a larger set.
| So, how about we change references from '*' to '->' while we are at it?
|
| anchor: &12
| reference: ->12

reference: *12

Hmm. Brian?

| > > Random notes:
| > >
| > > 1. Brian likes !seq and !map as the abbreviations
| > > for the default map and sequence types. I'm
| > > assuming that !string is the default scalar
| > > type (in the absence of explicit/implicit types).
|
| How about we keep calling it !text rather than switching to !string?
| I think that "text" better reflects the intent. I find string to be
| a bit more of an implementation term (like 'list' and 'array' are
| for 'sequence').

Perfect. I like !seq, !map, and !text as the default types. That a
!text maps to a java.lang.String on the Java platform is a
platform-specific detail.

Summary....

| I propose key:value as above. And:

I modified this as ?key:value

| C) Implicit types: use ` for private ones?

Nice.

| Should I write it up as the next draft? Delivery on Sunday
| morning as usual :-)

Please! ;)

Clark
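Clark suggests that base64 can simply be an explicit type, with the data free to start on the next line. That implies a loader strips line breaks and indentation before decoding. Below is a minimal Python sketch of such a handler. It assumes the payload uses the standard base64 alphabet. `load_base64` is a hypothetical name, and how a loader would dispatch on the `!base64` tag is left out.

```python
import base64


def load_base64(text):
    """Hypothetical handler for an explicit !base64 scalar.

    The payload may be wrapped over several indented lines, so all
    whitespace is removed before decoding.
    """
    compact = "".join(text.split())
    # validate=True rejects any remaining character outside the base64 alphabet
    return base64.b64decode(compact, validate=True)
```

Decoding only the first twelve characters of the sample, split over two lines as `"R0lGODlh\n  GQAP"`, gives `b'GIF89a\x19\x00\x0f'`. Those leading bytes are the GIF89a signature, so the embedded image is a GIF rather than a JPEG.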
From: Brian I. <in...@tt...> - 2001-11-09 19:30:42
On 09/11/01 12:11 -0500, Clark C . Evans wrote:
> | --- YAML:1.0 ...
>
> --- ?YAML:1.0 ...

I don't see a need for '?'. I like Oren's.

> If we add ? as an indicator before items like this
> and ?INDENT:0, we can then have one-line scalars...
>
> --- ?YAML:1.0 2001-01-01
> --- !string this is a one line scalar
> --- 1.0

As I stated before, I really don't want data on a separator line.

> | > > 1. Implicit scalars should only be single-line.
> |
> | No (base64).
>
> I don't mind making base64 explicit. It certainly
> doesn't have to start on the first line. This one,
> for example, is a very very small jpg.

Agree.

> | > 1c. Implicit scalars have no whitespace characters. (brian proposed)
> |
> | No (base64, dates).
>
> timestamp: 2001-02-21 15:20:00+5

I want to keep 1c. As I've stated, I can easily live without an implicit
type for time/date stamps.

> | Oh, just find some prefix to attach to regexps and be done
> | with it. Tick would have been nice, only I just used it for
> | private types. '*' would also have been nice, only it is
> | taken by references. '?/' seems best:
> |
> | regexp: ?/regexp here/
>
> I like the idea of an explicit type for regular expressions.
>
> search: !regex \/regular expression

I agree. No implicit needed for regexes.

> | So, how about we change references from '*' to '->' while we are at it?
> |
> | anchor: &12
> | reference: ->12
>
> reference: *12
>
> Hmm. Brian?

Don't care either way.

Cheers, Brian
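Brian keeps rule 1c. Combined with rules 1a and 1b and Oren's backtick proposal, that gives a simple test for whether an unquoted scalar is even eligible for implicit typing. Below is a minimal Python sketch under those assumptions. `implicit_class` is a hypothetical name. Because 1c forbids all whitespace, line breaks are excluded too, so the single-line restriction comes for free.

```python
def implicit_class(raw, quoted=False):
    """Hypothetical classifier for rules 1a-1c plus the backtick proposal.

    Returns None if the scalar can only be plain text, "private" if it
    lies in the `-prefixed private space, and "public" otherwise.
    """
    if quoted:                            # 1a: implicit scalars are unquoted
        return None
    if not raw or any(ch.isspace() for ch in raw):
        return None                       # 1c: no whitespace (so single-line too)
    if raw.startswith("`"):               # Oren: private implicit types
        return "private"
    if raw[0].isalpha():                  # 1b: must begin with a non-alpha
        return None
    return "public"
```

Under these rules `-42` and `~` are public candidates, while `` `-=>bar<=- `` is private. Both `hello` and the spaced timestamp `2001-02-21 15:20:00+5` fall back to plain text, which is the trade-off Brian accepts for dates.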
From: Clark C . E. <cc...@cl...> - 2001-11-09 19:47:23
On Fri, Nov 09, 2001 at 11:30:40AM -0800, Brian Ingerson wrote:
| I don't see a need for '?'. I like Oren's.

Ok.

| I really don't want data on a separator line.

Ok.

| > | > 1c. Implicit scalars have no whitespace characters.
| > | > (brian proposed)
| > |
| > | No (base64, dates).
| >
| > timestamp: 2001-02-21 15:20:00+5
|
| I want to keep 1c. As I've stated, I can easily live
| without an implicit for time/date stamp.

Ok. This is acceptable... I think. An implicit time/date stamp occurs
in about 20% of my data... so it's a must for me. I reviewed ISO 8601
and I can use the "T" form, which is the form used by the W3C datetime
note (http://www.w3.org/TR/NOTE-datetime):

Year and month:
  YYYY-MM (eg 1997-07)
Complete date:
  YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes:
  YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
  YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second:
  YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)

Plus, I'd want to grab a few standard ones from ISO 8601 directly
(especially the week form):

  1995-02-21
  1997-W01-2
  23:59:59
  23:59:59.9942
  23:59:59Z
  13:00+01:00
  1997-02-21T23:59:59+01:00

So I can skip those forms with an embedded space. It's a bit less
readable, but still an ISO (and W3C) format. Plus, it'd be good to
understand 1997-JAN-02, but this is another issue entirely.

| > I like the idea of an explicit type for regular expressions.
| >
| > search: !regex \/regular expression
|
| I agree. No implicit needed for regexes.

Good.

| > | So, how about we change references from '*' to '->' while we are at it?
| > |
| > | anchor: &12
| > | reference: ->12
| >
| > reference: *12
| >
| > Hmm. Brian?
|
| Don't care either way.

I think I'd like to stick with * for two reasons: (a) it will be a
common construct, (b) it's short. Alternatively, we could use ^ if you
prefer it over *. I'm not liking the two-character sequence for such a
common use case.

Clark
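The five W3C profile forms listed above contain no spaces, so they are compatible with rule 1c. Below is a sketch in Python, assuming TZD is `Z` or `+hh:mm`/`-hh:mm` as in the note, and that a time is only valid together with a TZD. Under those assumptions, one regular expression recognises all five. The extra ISO 8601 week and time-only forms are deliberately not covered, and `is_w3c_datetime` is a hypothetical name.

```python
import re

# The five W3C NOTE-datetime forms listed above, from YYYY-MM through
# YYYY-MM-DDThh:mm:ss.sTZD. Assumption: TZD is "Z" or +hh:mm / -hh:mm.
W3C_DATETIME = re.compile(
    r"""
    \d{4}-\d{2}                    # YYYY-MM
    (?:-\d{2}                      # -DD
        (?:T\d{2}:\d{2}            # Thh:mm
            (?::\d{2}              # :ss
                (?:\.\d+)?         # optional decimal fraction of a second
            )?
            (?:Z|[+-]\d{2}:\d{2})  # TZD, required whenever a time is given
        )?
    )?
    """,
    re.VERBOSE,
)


def is_w3c_datetime(raw):
    """True if the whole scalar is one of the listed W3C datetime forms."""
    return W3C_DATETIME.fullmatch(raw) is not None
```

The W3C note also allows a bare `YYYY`. It is omitted here because it is not in the list above and would be indistinguishable from an integer.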
From: 'Brian I. ' <in...@tt...> - 2001-11-09 18:44:36
On 09/11/01 08:50 +0200, Oren Ben-Kiki wrote:
> > ... let's make the separator mandatory for YAML emission.
>
> Let's put it as a "should", not a "must". E.g., a pretty-print utility may
> preserve the separator-less property of a configuration file. Probably there
> are other cases where this makes sense.

OK

> > > 6. A version indicator is also optional
> > > within the descriptors immediately
> > > following the separator.
> >
> > --- ?1.0
> >
> > ideas?
>
> --- YAML:1.0 ...
>
> Rationale: this allows us to do all sorts of other stuff in the future
> without requiring us to look like line noise:

I like this.

> > > 4. Comments should be part of the emitter
> > > and parser API so that they can be
> > > round tripped at a low level. However,
> > > they may be dropped at higher levels.
> >
> > I get the parser level. (Line numbers need to be preserved for error
> > reporting). But what about the emission level? Explain.
>
> There has to be a way to emit comments. At the YAML->load_doc() or
> YAML->dump_doc() level comments don't exist; but at the YAML->get_token()
> or YAML->put_token() level they should. Put another way: throwaway
> comments exist at the lexer level and not at the parser level.
>
> I don't know whether this should be written into the spec, however. Comments
> are just outside the info model, just like the exact way a quoted string was
> line-wrapped. The same rules apply to both. I see no reason to talk in the
> spec about how a low-level lexer should report line breaks in quoted
> strings, so I don't see any reason to talk about how it reports throwaway
> comments either.

Thanks for the explanation.

> > I'd like to keep a running list of the advantages and disadvantages of
> > single space.
>
> Well, it is minimal, it is simple. It may be more readable for deep
> documents, and I expect YAML documents to get deep. I rather like it
> actually.

Cool. I think that if we find one space to be just too confusing, two
spaces might end up being a nice default. I think the INDENT:# thing
covers us nicely for the future.

> --- YAML:1.0 INDENT:4
> invoice:
>     price: $12
>
> See how the key:value line in the separator helps?
>
> At any rate, I'd rather *not* have this built into YAML:1.0 unless you both
> feel it is important. We can add it in YAML:1.1.

Agreed.

> > > Implicit scalars...
> > >
> > > 1. Implicit scalars should only be single-line.
>
> No (base64).

I agree with Clark that base64 should require explicit typing.

> > 1a. Implicit scalars must be unquoted.
>
> Yes.
>
> > 1b. Implicit scalars begin with a non-alpha.
>
> Yes.
>
> > 1c. Implicit scalars have no whitespace characters. (brian proposed)
>
> No (base64, dates).

OK

> > > 2. All implicit scalars should be part of the YAML
> > > specification (perhaps a secondary document)
> > > and third party implicit scalars are not allowed.
> > > If a user wants their favorite scalar type implicit,
> > > they can propose it on the YAML list and we may
> > > bless it. Otherwise they can use explicit types.
>
> NO. I don't see implicit types as being any more special than explicit ones.
> The choice of types in my document is part of my document's schema, and it
> is my decision as a document author to either use public or private types.
> As long as I can write:
>
> key: !!foo bar
>
> I expect to be able to use a private implicit type as well. A good point is
> that such private types should not collide with any public type. I propose
> that all private implicit types must start with `, just like all private
> explicit types must start with ! (note that ` wouldn't be an indicator, it
> would just be the first character of an unquoted scalar). So:
>
> key: `-=>bar<=-

I'm OK with the backtick.

> > > 3. (brian) We will reserve all regular expressions
> > > up front and anything not reserved will be
> > > a string.
> > (brian) We can reserve a whole lot of potential patterns by making them
> > implicit warnings. ie They throw a warning that they are reserved for
> > future use and should be quoted to eliminate the warning.
> > > (clark) We reserve anything that does not
> > > begin with an alphabetic character.
>
> No. Allocate all patterns matching "^`" for private usage and reserve all
> the rest for public implicit types. Simpler and better.
>
> I didn't use an unadorned '/' because I think in general prefix-based
> implicit types should be at least two characters long, to allow a larger set.
> So, how about we change references from '*' to '->' while we are at it?
>
> anchor: &12
> reference: ->12
>
> Nice.

I really don't care either way ('*' or '->').

I'm feeling less strongly about implicit types because I feel that
explicit ones should be the norm. The only implicit types I care about
are:

int: -42
float: 3.1415926
null: ~
ref: *001

Dates and all the others can be explicit AFAIAC.

> > > I still hold firm, although I have a new suggestion.
> > > Let's expand the block indicator to work as a
> > > literal single-line mechanism so: | \/regex
> >
> > hmm. Oren?
>
> Neutral. I see the benefit, but there's also a wart. Does this mean that:
>
> --- | this is a top-level block?
> --- |- and this is another w/o a newline?
> ---
> - | and this block
>   continues on the next line...
>
> If you are willing to accept the above, then fine, allow it. Otherwise, just
> let it go, and live with:
>
> this: |-
>   is a one-line block
>
> Actually I think the latter is best. Block is for, well, blocks; and the
> regexp issue is solved using a prefixed simple type.

I feel rather strongly that:

A) I don't want blocks starting on the same line as the indicator.
B) I don't want data being on the same line as a separator.

> > > Random notes:
> > >
> > > 1. Brian likes !seq and !map as the abbreviations
> > > for the default map and sequence types. I'm
> > > assuming that !string is the default scalar
> > > type (in the absence of explicit/implicit types).
>
> How about we keep calling it !text rather than switching to !string? I think
> that "text" better reflects the intent. I find string to be a bit more of an
> implementation term (like 'list' and 'array' are for 'sequence').

Fine.

Cheers, Brian
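Brian's minimal set of implicit types (int, float, null via `~`, and `*` references) can be resolved with a few ordered checks. Below is a sketch in Python. It assumes simple decimal forms for int and float (no exponents) and an `anchors` dictionary that the loader builds for the current document. `resolve_implicit` is a hypothetical name.

```python
import re

INT = re.compile(r"[-+]?\d+")
FLOAT = re.compile(r"[-+]?\d+\.\d+")  # assumption: no exponent forms


def resolve_implicit(raw, anchors):
    """Hypothetical resolver for Brian's four implicit forms; anything else stays text.

    `anchors` maps anchor names (e.g. "001") to nodes already loaded
    in the current document.
    """
    if raw == "~":
        return None                  # null
    if raw.startswith("*"):
        name = raw[1:]
        if name not in anchors:
            raise KeyError(f"undefined anchor {name!r}")
        return anchors[name]         # reference to an earlier node
    if INT.fullmatch(raw):
        return int(raw)
    if FLOAT.fullmatch(raw):
        return float(raw)
    return raw                       # plain text
```

The anchor table is scoped to one document, so a `*001` in one document cannot reach an anchor defined in another, in line with the earlier agreement that documents are stand-alone.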
From: Oren Ben-K. <or...@ri...> - 2001-11-09 07:14:24
I wrote:
> > > This difference emerged after Brian gave the
> > > use case of putting a regular expression
> > > as a value. Clark's answer was "too bad",
> > > (a) use a multi-line scalar, or (b) quote
> > > the regular expression, or (c) let's introduce
> > > an implicit "regex" type starting with the ` tick.
>
> Oh, just find some prefix to attach to regexps and be done with...

Silly me, I completely missed the point: I thought you were talking
about regexp as an implicit type, while you meant "a string value
containing a regexp".

Hmmm. The one-line block syntax makes some sense for that, but consider
that:

regexp: !text ... line noise here ...

Is currently a legal way of doing this. I think "line noise as text
value" is rare enough that an explicit '!text' annotation is acceptable
(even in Perl :-). The in-line block syntax seems more trouble than it
is worth.

My proposal for changing the reference syntax from '*' to '->' still
stands, however. You may have noted I changed the base64 syntax from
[...data...] to [=...data...=], due to the same reason: using a single
prefix character is like using a class A IP network or a TLD; it should
only be done for an extremely good reason.

I feel '=' is a case where a one-char type is acceptable, and also '~'.
Both are actually single-char values rather than single-char prefixes;
so in theory '~/=...data...' implicit types could still be used, though
of course their relationship with '='/'~' would become an issue. At any
rate, I would definitely draw the line there. Wasting '*' seems like,
well, a waste :-) Besides, '->' may be more readable to a newbie.

Have fun,

Oren Ben-Kiki
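A loader needs the same machinery whichever alias syntax wins, '*' or '->'. It keeps a per-document table of anchors and clears it at every separator, since each document is stand-alone and the anchor context resets at each divider. Below is a minimal Python sketch that takes the alias prefix as a parameter, so both proposals can be tried. `AnchorTable` and its methods are hypothetical.

```python
class AnchorTable:
    """Hypothetical per-document anchor registry.

    `alias_prefix` is '*' in the current drafts; Oren proposes '->'.
    """

    def __init__(self, alias_prefix="*"):
        self.alias_prefix = alias_prefix
        self._nodes = {}

    def new_document(self):
        # Documents are stand-alone, so anchors reset at each separator.
        self._nodes.clear()

    def define(self, token, node):
        # An anchor token looks like "&12".
        if not token.startswith("&"):
            raise ValueError(f"not an anchor: {token!r}")
        self._nodes[token[1:]] = node
        return node

    def resolve(self, token):
        # An alias token looks like "*12" (or "->12" with the other prefix).
        if not token.startswith(self.alias_prefix):
            raise ValueError(f"not an alias: {token!r}")
        name = token[len(self.alias_prefix):]
        if name not in self._nodes:
            raise KeyError(f"undefined anchor {name!r}")
        return self._nodes[name]
```

The table itself is indifferent to the prefix. The only practical difference is the extra character per reference, which Clark objects to for such a common construct.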