From: Brian I. <briani@ActiveState.com> - 2001-05-18 20:57:39
Oren Ben-Kiki wrote:
>
> > > 1. Brian stated that he would investigate Oren's syntax
> > >    and get back to us on whether it meets Perl's serialization
> > >    requirements for hard references. If not, specify
> > >    what alternatives we can use.
> >
> > I don't think it's that important to investigate. It will probably
> > always be a moot point. I will let Data::Denter use its current scheme
> > to deterministically round-trip all Perl data structures. YAML.pm
> > probably will have no need for this. It's all academic and I have no
> > spare time for academics for three more months. (My guess is, yes it
> > could be made to work, but would be suboptimal for Perl people.) Let's
> > leave it at that for now.
>
> Does that mean we are giving up on Denter using YAML syntax (extended to
> handle pointer-to-pointer)?

Just for the record, the Perl component for YAML is called YAML.pm.
Data::Denter is only of interest to Perl programmers from this point on.
It may fondly be remembered as the catalyst for YAML 1.0. And it may keep
a greedy eye on the YAML project's treasures, but that's of no concern
here.

> I'm going to go over it with a fine-tooth comb, just to see what is involved
> in making YAML a superset of it. I guess I'll also have to look at MIME
> while I'm at it, with the same comb :-)

Beware of the nits! Nasty buggers. ;)

> > On 4 & 5. I don't really like the blank line at the beginning thing
> > because people will mess it up or not understand it. And we have many
> > heuristic options.
> >
> > A) Parse lookahead for X-YAML-Version
> > B) Option A is rarely needed, because as soon as we see a key that is *not*
> >    RFC822 compliant, we assume YAML. 99% of the time this is the first
> >    line!
> > C) If there is no whitespace allowed before the colon in RFC822, we
> >    simply make it a requirement in YAML. Or does this break your RFC
> >    compatibility rules?
> >
> > Just for my own edification, would you please explain the rationale
> > behind making YAML RFC822 compliant.
> > And do so with one or more specific examples. Thanks :)
>
> Well, for example, suppose that YAML was a "good enough" superset of RFC822.
> Then we could just adopt my idea that "blank lines separate top-level maps"
> and we wouldn't have to say anything further about RFC822 headers, period.
> If one wants to read/write a mail message as a YAML document, then it will
> simply work (as long as he sticks to the "safe" constructs there). If one
> wants to have a YAML document that has nothing to do with RFC822, that also
> works. No need for any special statement about them. I like this approach
> best.

I think that sounds right, if I understand it correctly. My only
contention above was the very first blank line, not the ones separating
documents.

> >
> >     " this is the hash\n key for this example :-) " : #class :
>
> I assume the trailing ':' is a typo?
>

No. See the earlier post for the reasoning.

> >        |# My Perl Subroutine
> >        |
> >        | sub version {
> >        |     if ($_[0] =~ /\n/) {
> >        |         return \ "\to sender";
> >        |     }
> >        | }
> >
> > Sorry for overloading this example with so many weird things. I'll just
> > comment on the multiline semantics:
> >
> > A) Trailing whitespace is preserved if the transporter preserves it.
> > B) The content can always be encoded before transport anyway.
> > C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'.
> > D) An implicit newline is assumed to be at the end of every line.
>
> We have to decide what our position is about them, BTW. Is a newline a "\n"
> or a "\r\n" - the answer may be different in-memory and in the text file
> (and thank you, O nameless DOS/CPM programmer, for inflicting this on us :-)

Bastard of Bastards. :( But I think the heuristic is quite simple. Since
the newline is implicit, just replace whatever is there with the system's
native choice.

> > E) Note that the '|' is one column back from the actual indentation
> >    level. This is intentional.
> >    And it will work even if the indent width
> >    is set to one character wide. (Not mandatory, but I like it.)
>
> Under Python indentation rules, there's no problem indenting the "label"
> line by 4 characters and the text lines by 7, or whatever. What you say
> about one character indentation, however, implies that the following would
> be legal:

Yes. It would be legal.

>
> text:
> |multi-line
> |text
>
> I'm not certain I like it. I think Clark should make the call here -
> indentation is his baby.

I actually don't like it for another subtle reason. Tabs. You couldn't
use them properly with this scheme. So let's scrap the
backing-up-one-space requirement. And yes, that's my final answer ;)

> I started thinking about it and hit on an issue which Brian may already have
> thought about - or will have to very soon, if he's covering YAML.pm :-) The
> problem is we haven't defined the data model (or, viewing it differently,
> the round-tripping issue).
>
> In "dynamic" languages such as Perl, JavaScript, Python (and to some extent,
> Java), it is natural to map a YAML map to the native hash, a list to a
> vector/array, and a scalar value to a simple string. That works admirably
> well, as long as the YAML entity hasn't been annotated with an ID or a class
> name.
>
> If one wants to provide a stable round-tripping utility (e.g., suppose I
> want to write a YAML pretty printer), where am I to store the ID of a scalar
> value? The class of a map? For this use case, it seems my best course of
> action is to wrap the native construct (map/list/scalar) in an object which
> has an "id", a "class", and a "value".
>
> There are several options:
>
> A) Use the native constructs when possible, and only use "wrapper" objects
>    when there's a need. That makes the access pattern unpredictable: do I
>    write map{key} or map{key}.value?

That's my idea.

> B) Always use wrapper objects, and give up on de-serializing YAML into
>    arbitrary native data structures.
>    Big hit on usefulness - if we do this,
>    Brian will just give up on us :-)

You're getting to know me pretty well ;)

> C) Declare that IDs may be re-written arbitrarily, even by pretty printers.
>    That is, banish them from the data model.

I think I agree...

> That leaves "class" as the only problematic issue. We explicitly decided not
> to talk about it in the conference call. It seems to me like there's no way
> around requiring that this data will survive round-trips, but I also don't
> see how it is possible to de-serialize "scalar value" into a normal "Java
> String" if someone attached an "unknown" class to it.

I've read through this briefly, but don't have time to comment yet. Let's
stick with the original syntax for now. In general, keep in mind that
YAML 1.0 will *not* be the final YAML spec. It will evolve to YAML 2.0 and
so on. For now, let's strive for maximum syntactic simplicity. I think we
can special-case the semantics of 1.0 without needing to change the
current syntax.

> > > 12. Brian mentioned that he'd show YAML to one of
> > >     his Perl friends. (sorry I didn't catch his name)
> >
> > Damian Conway http://www.csse.monash.edu.au/~damian/
>
> His input will be greatly appreciated.

Emailed Damian last night. He's preparing for an 11-week world speaking
tour. I'll see him in June at the YAPC (Yet Another Perl Conference) in
Montreal and I'll be sure to pin him down about YAML. BTW, I mentioned to
Clark that I'll probably be speaking about YAML at YAPC :)

> > > 15. Clark agreed to write up the "single vs multi"
> > >     line controversy and post to the list so that
> > >     it is clearly understood.
>
> I thought we settled this... Every scalar value is potentially multi-line.
> It doesn't seem to cost us anything, or does it?

I agree, but see below.

> > > 16. We made little progress on the scalar indicator
> > >     for lists, to colon or not to colon. It wasn't
> > >     agreed, but Clark thinks this is someone else's
> > >     monkey.
> > >     If Oren and Brian can't agree within
> > >     7 days, Clark will put on the dictator cap.
> >
> > We traded in the '$' for the ':'. '$' as the last character in a line
>
> I thought ':' was the first one; it is "as if" it is a normal header, with
> the key "just happening" to be empty. This seems more consistent.
>
> > meant a multiline scalar was to follow. Converting this semantic to the
> > ':' leaves us with these representations:
> >
> > key1 : @
> >     single line
> >     :
> >         classless folded
> >         multi line
> >     another single line
> >     and another
> >     #class &0001 :
>
> : #class &0001

No, not a mistake.

> >         classed multi
> >         line
> >     #class &0002 classed single line
> >     %
> >         key : value
> >     @
>
> This is an empty list, right?

Yup. Just to keep you on your toes :)

> >     ~
>
> And this is a null?

Indeed.

> >     #classy %
> >         key : value
> >     : even this multi line on the same line
> >       as a colon thingy works because there's
> >       a little bit of indentation imposed by
> >       the colon. (Although I don't love it)
>
> This means the following:
>
>     : single line
>
> Will also work, even though you *really* dislike it. I like them :-)

Noted :-)

> >     : "Another thingy like above that meets"
> >       "RFC822 wackiness"
> >     :
> >         | 1
> >         | 1 1
> >         | 1 1 1
> >         |Just for completeness :-)
>
> I think we've said everything there's to be said about this, and whether or
> not you find either:
>
> list:
>     : One
>     : Two
>     : Three
>       and Four
>
> Or:
>
> list:
>     One
>     Two
>     :
>         Three
>         and Four
>
> To be beautiful or ugly is, when all is said and done, a matter of taste. To
> you, the extra ':'s are an eyesore; to me it seems strange that the
> multi-line value is "more indented"; it seems as though there's structure
> involved, when there isn't. I also like being able to do /^:/ in VI to get
> to the next entry.

While your comment on aesthetics may be true, there is a major
distinction between what you think a ':' means and my intent.

1) A ':' is always a key/value separator.
   We agree on that, but each of us wants it to have one other meaning.
2) You want the colon to be a "list bullet" in list context.
3) I want ':' to mean '$' for scalar values. And I want it to almost
   always be optional (unless there is ambiguity).
4) That said, we can make it the canonical/default form for emitters if
   we wish.

Consider the following four examples.

1) Fully qualified with '$'.

    key1 : $ my dog has fleas
    key2 : $ "$40.00 for veterinarian exam"
    key3 : $ The vet said, "Yes Ingy, Your dog has fleas."
    key4 : $ Ingy said, "Wow, my dog has fleas!"
    key5 : #class1 $ I hate fleas
    key6 : #class2 $ What is your viewpoint about fleas?
    key7 : #class3 @
        $ Tom the flea
        $ Dick the flea
        $ Harry the flea
          is not really hairy
        %
            foo : bar
        #class4 %
            FOO : BAR
        #class5 $ A very classy flea
        #class6 $
            |My favorite fleas:
            | Jim
            | Bob

2) Fully qualified with ':'. The only real gain here is that $40.00
   no longer needs the " quoting.

    key1 : : my dog has fleas
    key2 : : $40.00 for veterinarian exam
    key3 : : The vet said, "Yes Ingy, Your dog has fleas."
    key4 : : Ingy said, "Wow, my dog has fleas!"
    key5 : #class1 : I hate fleas
    key6 : #class2 : What is your viewpoint about fleas?
    key7 : #class3 @
        : Tom the flea
        : Dick the flea
        : Harry the flea
          is not really hairy
        %
            foo : bar
        #class4 %
            FOO : BAR
        #class5 : A very classy flea
        #class6 :
            |My favorite fleas:
            | Jim
            | Bob

3) Minimal

    key1 : my dog has fleas
    key2 : $40.00 for veterinarian exam
    key3 : The vet said, "Yes Ingy, Your dog has fleas."
    key4 : Ingy said, "Wow, my dog has fleas!"
    key5 : #class1 I hate fleas
    key6 : #class2 What is your viewpoint about fleas?
    key7 : #class3 @
        Tom the flea
        Dick the flea
        : Harry the flea
          is not really hairy
        %
            foo : bar
        #class4 %
            FOO : BAR
        #class5 A very classy flea
        #class6
            |My favorite fleas:
            | Jim
            | Bob

Note that the only required ':' (besides the key/value ones) is for
': Harry the flea'.

4) Suggested canonical form:

    key1 : my dog has fleas
    key2 : $40.00 for veterinarian exam
    key3 : The vet said, "Yes Ingy, Your dog has fleas."
    key4 : Ingy said, "Wow, my dog has fleas!"
    key5 : #class1 : I hate fleas
    key6 : #class2 : What is your viewpoint about fleas?
    key7 : #class3 @
        : Tom the flea
        : Dick the flea
        : Harry the flea
          is not really hairy
        %
            foo : bar
        #class4 %
            FOO : BAR
        #class5 : A very classy flea
        #class6 :
            |My favorite fleas:
            | Jim
            | Bob

So in this last example we always use the optional scalar indicator ':'
for all scalars in a list (by default). Note that a #class or &id *always*
comes before a %, @, or :. It's just that the ':' is usually optional.

The things I don't allow are:

    key1 : @
        : %
        : @
        : : a scalar
        : #class %
        : #class @
        : #class : a scalar

The problem with ':' as a "list bullet" is that it could not be optional.
And that's too restrictive just to satisfy a personal aesthetic.

Brian

--
perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf
("Just Another %s Hacker",x);}};print JAxH+Perl'
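
For reference, here is a minimal Python sketch of the newline heuristic discussed in this message: since the newline at the end of each line is implicit, whatever line ending is found gets rewritten to the platform's native one. The function name `to_native_newlines` and its `native` parameter are hypothetical and only for illustration; they are not part of YAML.pm or of any YAML draft.

```python
import os
import re

# Matches "\r\n" (DOS), a lone "\r" (old Mac) or a lone "\n" (Unix).
# "\r\n" is listed first so a DOS ending is replaced once, not twice.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def to_native_newlines(text, native=os.linesep):
    """Replace every line ending found in `text` with `native`."""
    # A function is used as the replacement so that `native` is
    # inserted literally, with no escape processing by `re`.
    return _ANY_NEWLINE.sub(lambda _match: native, text)


# Mixed line endings, as a careless transport might produce them.
mixed = "sub version {\r\n    return 1;\r}\n"
print(repr(to_native_newlines(mixed, native="\n")))
# prints 'sub version {\n    return 1;\n}\n'
```

Leaving `native` at its default picks up `os.linesep`, which is one reading of "the system's native choice"; an emitter that writes through a text-mode file handle would more likely normalize to "\n" in memory and let the I/O layer do the final translation.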