From: Oren Ben-K. <or...@ri...> - 2002-05-04 11:07:15
|
Whoa. One turns one's back for a lousy 24 hours, and... :-) I just read through the dozens of messages with interest and the occasional glazed eyes. Mercifully Clark has summarized a potential edict so I can address that directly rather than having to collect paragraphs from this discussion... In essence I agree with it, but there are two (minor?) modifications I feel must be made: | 1. Empty strings are written using "" or '' +1. | 2. Empty maps and sequences use inline syntax {} and [] +1. | 3. We add distinction of map and sequence in the graph model. Just to make sure I get it straight - in the graph model, each type family has a kind which is one of series (collection)/keyed (collection)/scalar? As opposed to today's being a collection/scalar? If so, +1. | 4. We drop !map, !seq and !str and allow for empty transfer. | If a format is provided, a non-empty family is required. | If transfer is empty, an implementation must obviously | use the kind (scalar,map,or sequence) to determine the | in memory object to use. -10. I want to be able to have a unique explicit transfer URI for each and every node, such that strcmp on the transfer would work. The above kills this property, replacing it with some magical empty string which can mean any of three different things. Ugh. | 5. We change "lingo" to use map, sequence, and scalar | for things as much as possible. This is possible since | the !map, !seq, and !str transfers are gone. -1/+1. We have the following concepts: (A) The kind of the type family (today, collection/scalar); (B) The default type family used for each (today, seq/map/string); (C) The syntactical style of the node (today, series branch/keyed branch/leaf). Now, given (3), we can, and should, unify (A) and (C). But unifying (A)/(C) with (B) is wrong. 
Instead I suggest we should just unify "collection" with "branch" and "scalar" with "leaf": (A)/(C): series (collection)/keyed (collection)/scalar; (B): seq/map/string. (B) is the default transfer method for the appropriate node kinds. Why do we need this? Well, try writing the following otherwise: "All '!map' nodes are keyed collection nodes, but not all keyed collection nodes are '!map' nodes. For example, a keyed collection node may be a '!perl/Foo::Bar' node." Your way, the above can't be said - and it needs to be :-) | 6. We drop sparse sequence syntax and leave sparse arrays | to application specific domains, since we have no good | way of preserving the "kind" of node (map or sequence) | as required by the graph model. +1. Just like we drop the simple-empty-scalar notation for empty arrays. Assuming we agree on the above, our next draft should include the following: - Changing the model section according to the above; - Re-wording in the syntax section (from "branch" to "collection" and from "leaf" to "scalar"); - Changing to "Explicit" (one character change in [163] - remove the '?'), plus going over the examples; - Rolf's production fixed (thanks, Rolf!). The first one is most of the work, and the model section is Clark's baby... so... Clark, do you have the time for this? If not I'll be happy to take a shot at it. Have fun, Oren Ben-Kiki |
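[Editorial sketch, not part of the original message.] The "one type family => one kind" position argued above can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the `FAMILY_KIND` and `DEFAULT_FAMILY` tables and the `resolve_family` helper are hypothetical names invented for illustration and do not come from any YAML implementation or from the spec draft.

```python
# Hypothetical registry: every type family has exactly one kind.
FAMILY_KIND = {
    "!map": "keyed",
    "!seq": "series",
    "!str": "scalar",
    "!perl/Foo::Bar": "keyed",  # a keyed family that is not "!map"
}

# Hypothetical defaults: the family assumed when a node has no explicit transfer.
DEFAULT_FAMILY = {"keyed": "!map", "series": "!seq", "scalar": "!str"}


def resolve_family(kind, explicit=None):
    """Return the node's type family, checking it against the node's kind."""
    family = explicit if explicit is not None else DEFAULT_FAMILY[kind]
    expected = FAMILY_KIND.get(family)
    if expected is not None and expected != kind:
        raise ValueError(f"{family} requires a {expected} node, got {kind}")
    return family
```

Under these assumptions every node ends up with an explicit family string that can be compared directly, and "all '!map' nodes are keyed, but not all keyed nodes are '!map'" falls out of the two tables.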
From: Oren Ben-K. <or...@ri...> - 2002-05-04 22:35:37
|
I *thought* I basically agreed with your edict... but it seems I don't. I'll try to clarify my views here... which may be long :-) Clark C . Evans wrote: > I think Oren has not grokked that the > category or "type" of a YAML node is a tuple (family,kind) and not > just family as we were trying to do before... You have got this right. I thought that the edict implied that each type family had a "kind" property... i.e., for every type family URI there is exactly one kind. Which means that using a tuple makes no sense... Re-reading point (3) I see that this was wishful thinking - you didn't actually say that. So, now that I got this clear... I don't like the tuple approach. This ties in to a second issue: > | I particularly don't want to call a transfer method for plain YAML > | nodes. I want to disallow that to preserve interoperability in the > | YAML domain. > > I really like this idea. If there isn't a transfer method, it is > just plain-jane hashtable, array, and scalar (dict,list,string). I find this to be bizarre. YAML is a serialization format. The YAML text *must* be "transferred" (or "loaded" or whatever) into *some* in-memory native representation. Saying that, say, a string scalar isn't "transferred" just doesn't make any sense to me... OK, back to basics. There is the issue of whether !bloop [] is different than !bloop {}. My vote is that !bloop can only work with one of them, so the question doesn't even arise. Note that this allows Brian his !perl/Foo::Bar [] trick - as long as there can't possibly be a !perl/Foo::Bar {} (or vice-versa). I think this is the case in Perl... if it isn't, well, too bad - sorry, Brian. As for using a tuple rather than a type family URI string as a unique identifier... Clark said: > Further, at an implementation level, it is a pain to toss around a > bunch of "!map", "!seq" and "!str" identifiers; I'd much rather have > them NULL (and then use the node's kind) rather than trying to do a > string comparison all of the time... 
In effect, think of NULL kind > as the "default" family for a given kind; then we apply other kinds > on an exception basis... I disagree. I think it is much simpler to have the type family be a simple string object (probably interned, for fast == operations), rather than a tuple. Working with a single atomic object (of a built-in type at that) is always simpler than having to deal with an aggregate. Think, just as an example, how much easier it is to create a Perl hash keyed by the URI as compared to being keyed by a tuple... and so on. I'd much rather go for Clark's original preference, one type family => one kind, period. If we do go this way, the rest of my modifications to the edict become more-or-less inevitable - I think. I'm trying hard to see what exactly is bothering Brian with this approach. Is the problem the additional effort required to try to achieve YNYRT, as opposed to an easier/simpler approach if we gave up on that and stuck with NYNRT? I'm not convinced that in fact there is an actual cost involved. I'm also unconvinced that YNYRT is impossible - I feel it is both possible, easy, and important. Perhaps I'm missing something. The term YNYRT is misleading, perhaps. One needs to consider that a YAML file, by itself, is a "dead" thing. The full term would be nYNYnRT - Say, a Python system writing a YAML file, which gets loaded to a Perl program, *not modified*, re-emitted as YAML, and then loaded to a Python program again. Say, a YAML message routed via a Perl YAML-based delivery system. Now, the goal is that the in-memory native representation of the data in the two Python programs would be "equal". We have defined what this means in the spec (same type family for all nodes, same canonical representation for all leaf nodes, etc.). I don't see anything wrong with this definition. What does it cost us to ensure this? Brian wrote: > I am looking at this scenario from the perspective of writing a Loader. 
> Here's the sequence of steps a loader performs: > > - Interacts with the parser to receive nodes > - Builds each node up in memory as a hash, array, or scalar > - When a given node is complete: > - Look up the node's transfer URI in a dispatch table Note this implicitly assumes that the type family is defined only by the URI, otherwise dispatch would have been by both the URI and the kind of input node, right? At any rate... > - Call the transfer-method for that URI > - Take whatever the transfer-method returned and insert it into the > graph So far, so good. This is exactly what the loader should do. > Note that a transfer method might change a hash into an array or some > completely opaque object. But that's ok, because the transfer was > registered within a particular domain. Right. In fact, the transfer method could convert the data, as you said, into a cave drawing for all we care. Again, no problem. Given the above, what I want the spec to imply/say is that for "round tripping": The registered emission routines for the type family URI, when applied to the cave drawing resulting from the above, will emit a YAML node of the same type family, with the same kind, and with "equal" content (under the definition in the spec). You are still allowed to do otherwise - say, load it as a series and emit it as a map - just don't call it "round tripping". That's all. Note that in a schema-specific program, round-tripping comes more or less for free (after all, both input and output have to obey a known schema - any fooling around with the kinds/type families would break it). A schema-independent program would have to encode unknown type families anyway (along the lines of what's suggested in the spec), and would also round-trip perfectly (it has no reason to mess with types it doesn't understand). What is the problem with this? As for using an "empty transfer method", isn't it obvious it just becomes an ugly special case for the above algorithm? 
I realize in Perl you may want to just build a hash/array/string *first*, and only then look up the URI in the dispatch table and let its "loader method" do its stuff... in which case having an "empty transfer method" seems natural. However, it may make more sense to just redirect parser events to the "loader method" of the URI rather than build it a full hash/array... I'd certainly write a C/C++ loader API this way, and probably the Java one as well. Actually the Java one may just convert the events into Java's serialization format - but no way it would build a Hash before calling the class "load" method! I don't think that a language-specific implementation optimization should affect the way we define our model. Take for example our equality notion... we'd have to redefine it around the "empty type family name" case. Ugh. I hope this addresses Brian's concerns... Finally, wording/naming, and whether the notion of "collection" is still useful. I stand by my cryptic sentence: "All '!map' nodes are keyed collection nodes, but not all keyed collection nodes are '!map' nodes. For example, a keyed collection node may be a '!perl/Foo::Bar' node." In a longer version: The type family "!map" has the kind "keyed". This means nodes using this type family must be written using the "keyed" style. However, there may be nodes written in the "keyed" style which belong to a different type family than "!map". For example, the "!perl/Foo::Bar" type family may also have a "keyed" kind, and hence would also have to be written in a keyed style; obviously "!perl/Foo::Bar" is not a "!map". Think of "!invoice" if that makes it clearer. See? We need two different names. One, "keyed", is a value of the kind property of a type family, and is also the name of the style which must be used for serializing type families of this kind. 
The second, "!map", is the name of a specific type family that is the default used for nodes of the "keyed" style, in case they are not given an explicit type family in a transfer method. This obviously assumes that type family => kind. I hope it is less cryptic... As for whether "collection" is a useful concept, I thought we were going to make good use of this in YPATH. I also think it provides a natural way to talk about the graph... we need *some* word to talk about "terminal" and "non-terminal" nodes in the graph, and "collection" vs. "scalar" seem a perfect vocabulary to me. That said, I agree we could downplay this aggregation. I have no problem with defining "keyed" and "series" separately, and mentioning in passing that the term "collection" is used for "keyed or series". If you want to replace the word "keyed" with something else (say, "dictionary"?)... also fine. Like I said, the model section is Clark's baby and I'll go along with any reasonable wording as long as the intent is OK. Perhaps we should IRC on this... Have fun, Oren Ben-Kiki |
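[Editorial sketch, not part of the original message.] The round-tripping requirement described above (load through the family's registered routine, emit through its registered emitter, get back a node with the same family, kind, and "equal" content) can be written down as a small check. Everything here is an assumption made for illustration: the `Node` class, the `!example/point` family, and the `REGISTRY` table are hypothetical and not taken from the spec or from any loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    family: str     # type family URI, e.g. "!example/point" (hypothetical)
    kind: str       # "keyed", "series" or "scalar"
    content: tuple  # canonical content, kept as a tuple so nodes compare by value


def load_point(node):
    # The native form can be anything (a "cave drawing"); here it is a dict.
    x, y = node.content
    return {"x": x, "y": y}


def emit_point(native):
    # The emitter always writes this family as a series node.
    return Node("!example/point", "series", (native["x"], native["y"]))


# Dispatch table keyed by the family URI alone, as in the loader steps quoted above.
REGISTRY = {"!example/point": (load_point, emit_point)}


def round_trips(node):
    """True if emit(load(node)) has the same family, kind and content as node."""
    load, emit = REGISTRY[node.family]
    return emit(load(node)) == node
```

In this sketch a series-kind `!example/point` node round-trips, while the same content presented as a keyed node loads fine but comes back out as a series, so it does not count as round-tripping.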
From: Clark C . E. <cc...@cl...> - 2002-05-05 00:26:34
|
For the record, the domain of this thread is almost entirely based on the use case of YAML messaging systems, providing a minimal model so that information is not lost. | > I think Oren has not grokked that the category or "type" of | > a YAML node is a tuple (family,kind) and not just family as | > we were trying to do before... | | You have got this right. I thought that the edict implied that each type | family had a "kind" property... i.e., for every type family URI there is | exactly one kind. Which means that using a tuple makes no sense... Right. | OK, back to basics. There is the issue of whether !bloop [] is different | than !bloop {}. My vote is that !bloop can only work with one of them, so | the question doesn't even arise. Note that this allows Brian his | !perl/Foo::Bar [] trick - as long as there can't possibly be a | !perl/Foo::Bar {} (or vice-versa). I can imagine structures having both an array and a hashtable blessed with "Foo::Bar". And this may not even be an exception. So, if !bloop {} isn't different than !bloop [], then Brian will have to encode the underlying data type (array, hashtable, scalar) in the family as well. This will cause stuff like !bloop:array [] to appear, which is kinda ugly. | This obviously assumes that type family => kind. I no longer think that this restriction is workable. Since the parser API will, of necessity, report the kind (through the structure of the interface) and the family (as a string) separately, it will be hard or even impossible for the API itself to support this restriction. A naive API user (who has not studied the YAML specification) would actually assume that they are orthogonal and may even use this fact. Further, what would the boundary of this restriction be: if family-x implies kind-y, must family-x always imply kind-y, or would this only apply to a given document or stream? What about our use case for turning a scalar into a map using the "=" key over time... 
(perhaps not in the same document). Thus, the restriction is not all that clear-cut. IMHO, this restriction seems a bit arbitrary; it is not supported by the syntax nor any API that I've seen and would be difficult, if not impossible, to enforce. | > Further, at an implementation level, it is a pain to toss around a | > bunch of "!map", "!seq" and "!str" identifiers; I'd much rather have | > them NULL (and then use the node's kind) rather than trying to do a | > string comparison all of the time... In effect, think of NULL kind | > as the "default" family for a given kind; then we apply other kinds | > on an exception basis... | | I disagree. I think it is much simpler to have the type family be a simple | string object (probably interned, for fast == operations), rather than a | tuple. Working with a single atomic object (of a built-in type at that) is | always simpler than having to deal with an aggregate. Think, just as an | example, how much easier it is to create a Perl hash keyed by the URI as | compared to being keyed by a tuple... and so on. Ok. Let me argue this one... ;) 1. It is illegal to use map, seq, and str if the kind is not keyed, series, and scalar respectively; thus there isn't a difference between them; and further you wouldn't even want to give someone the idea that they could do otherwise. 2. For 95% of my data (I use !int and !date periodically) the families are currently map, seq, and str. For efficiency, it'd be a lot easier to do NULL compares than strcmp. 3. The parser interface has to communicate the data in some structure, be it events or a hashtable, array, string. And in this case, the structure used to present the node has 1-1 correspondence with the kind. Thus, to say that the kind is somehow implied by the family doesn't reflect the reality of the current (and probably future) APIs. 4. 
By eliminating the map, seq, and str transfers we can simplify the language in the rest of the spec because we don't have to then differentiate between the transfer and the kind (syntactical structure). 5. What happens when a scalar is marked !seq? Yes, this is an error, but it makes for a class of errors that we don't even need to have. IMHO, if you buy the argument that family does not necessarily imply the kind, then you are left with the "type" of node being a tuple (kind,family). And as soon as you go this far, having a NULL family be allowable and dropping the redundant !seq, !map, and !str transfers seems like the next logical simplification. | I'd much rather go for Clark's original preference, one type family => one | kind, period. If we do go this way, the rest of my modifications to the | edict become more-or-less inevitable - I think. Yep. I agree, if you have family -> one kind, then I think the rest of your argument works. However, after some thought, the more "stupid" approach of letting them freely vary seems better. Thank you for challenging this. It's made me re-think things and the more times this is done, the better the new specification will read. What do you think? ;) Clark |
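[Editorial sketch, not part of the original message.] The alternative argued in this message, where a node's type is the pair (kind, family) and plain nodes carry a NULL family, might look roughly like the following. The `BUILDERS` table, the `build` helper, and the behaviour given to the `!bloop` family are hypothetical and chosen only to show that `!bloop []` and `!bloop {}` become two distinct types under this model.

```python
# Hypothetical dispatch table keyed by the (kind, family) pair.
# A family of None stands for a plain node with no transfer at all.
BUILDERS = {
    ("keyed", None): dict,
    ("series", None): list,
    ("scalar", None): str,
    # The same family attached to two kinds gives two unrelated entries.
    ("series", "!bloop"): tuple,
    ("keyed", "!bloop"): lambda pairs: sorted(dict(pairs).items()),
}


def build(kind, content, family=None):
    """Build the in-memory object for a node from its kind and optional family."""
    return BUILDERS[(kind, family)](content)
```

For the common case the lookup never touches a family string (the key simply contains `None`), which is the efficiency argument in point 2; the cost is that the dispatch key is an aggregate rather than a single URI, which is the objection raised in the quoted reply.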
From: Brian I. <in...@tt...> - 2002-05-05 01:22:34
|
On 04/05/02 18:37 -0400, Oren Ben-Kiki wrote: > I *thought* I basically agreed with your edict... but it seems I don't. I'll > try to clarify my views here... which may be long :-) Just talked to Clark again. He'll summarize. I just want to address the issues raised regarding my personal needs... BTW, yesterday on the phone I jokingly said we would make an edict of 6 things. Clark was dictating, and put in the placeholder subject... I want to make another edict: "There will be no 2-party edicts on the mailing list from now on!" :-) > Clark C . Evans wrote: > > > I think Oren has not grokked that the > > category or "type" of a YAML node is a tuple (family,kind) and not > > just family as we were trying to do before... > > You have got this right. I thought that the edict implied that each type > family had a "kind" property... i.e., for every type family URI there is > exactly one kind. Which means that using a tuple makes no sense... I'll defer on this... > than !bloop {}. My vote is that !bloop can only work with one of them, so > the question doesn't even arise. Note that this allows Brian his > !perl/Foo::Bar [] trick - as long as there can't possibly be a > !perl/Foo::Bar {} (or vice-versa). I think this is the case in Perl... if it > isn't, well, too bad - sorry, Brian. In Perl, object classes can just pop into existence via a _bless_ of a particular *kind* of structure. So I really need to know the kind to determine how I'll load the thing. So for me the following are equivalent:

$fb1 = YAML::Load <<'...';
--- !perl/Foo::Bar [7, 11]
...

$fb2 = bless [7, 11], "Foo::Bar";

There is no code (module/class) related to the 'Foo::Bar'. It's a sequence which the Loader turns into an array, and then looks up the transfer URI to see that it should call some code that will bless the array. The interesting thing is that I only invented the !perl/Foo::Bar:array syntax to deal with empty arrays, when I didn't know the kind. Hey! 
<IDEA> I think I can boil this into a simple ruleset: - The *kind* is what determines the _transfer_ - The _transfer_ is always: [map, sequence, string] - The *family URI* is what determines the __transform__! If we don't allow family URI-s for plain map, seq, str; then we guarantee that no "transFORM" will be performed! And I think that's something we all want. </IDEA> Hmm. I really like this.

---

I also have weird types to deal with:

$g1 = YAML::Load <<'...';
--- !perl/:glob
PACKAGE: main
NAME: xyz
SCALAR: 3
...

$main::xyz = 3;
$g2 = *main::xyz;

In this case it seems that I have encoded the "kind" into the family. But I haven't. This gets transferred as a map, and transformed as a Perl typeglob. I can even bless this:

$g1 = YAML::Load <<'...';
--- !perl/Foo::Bar:glob
PACKAGE: main
NAME: xyz
SCALAR: 3
...

$main::xyz = 3;
$g2 = bless \*main::xyz, "Foo::Bar";

I could still use:

---
- !Foo::Bar:hash {}
- !Foo::Bar:array []
- !Foo::Bar:string ''

But they would just be redundant. So I won't. > I'm not convinced that in fact there is an actual cost involved. I'm also > unconvinced that YNYRT is impossible - I feel it is both possible, easy, and > important. Perhaps I'm missing something. I only meant that it was impossible to preserve the original formatting (obvious). I really am actually on board for preserving kind. > I realize in Perl you may want to just build a hash/array/string *first*, > and only then look up the URI in the dispatch table and let its "loader > method" do its stuff... in which case having an "empty transfer method" > seems natural. Yes. That's how I'd do it. It would be the fastest way to go. No transform for an empty (NULL) family. I told Clark that 99% of my use cases have no transfers. He says it's about 90% for him. It seems like a quite beautiful "special case" to me. :) > However, it may make more sense to just redirect parser events to the > "loader method" of the URI rather than build it a full hash/array... 
I'd > certainly write a C/C++ loader API this way, and probably the Java one as > well. Actually the Java one may just convert the events into Java's > serialization format - but no way it would build a Hash before calling the > class "load" method! That's an interesting approach. I would never do it for YAML.pm because it adds complexity to the transform API for end-users. > I don't think that a language-specific implementation optimization should > affect the way we define our model. Take for example our equality notion... > we'd have to redefine it around the "empty type family name" case. Ugh. I agree that YAML.pm should not drive the process. I think there are much more compelling reasons to not use a family on common cases. > Perhaps we should IRC on this... I think the mailing list approach is still best. It just might take time. You should move to Clark's timezone. I did. ;) Cheers, Brian |
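[Editorial sketch, not part of the original message.] The tags used in this message (`!perl/Foo::Bar`, `!perl/:glob`, `!perl/Foo::Bar:glob`) combine an optional class name with an optional underlying Perl type. A minimal Python sketch of splitting such a tag is given here; the regular expression and the `parse_perl_tag` name are assumptions made for illustration and do not describe how YAML.pm parses its tags.

```python
import re

# Hypothetical grammar: "!perl/" + optional Class::Name + optional ":type".
_PERL_TAG = re.compile(
    r"^!perl/(?P<cls>(?:\w+(?:::\w+)*)?)(?::(?P<type>\w+))?$"
)


def parse_perl_tag(tag):
    """Split a !perl/... tag into (class_name_or_None, type_or_None)."""
    match = _PERL_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a !perl/ tag: {tag!r}")
    # An empty class part (as in "!perl/:glob") is reported as None.
    return (match.group("cls") or None, match.group("type"))
```

Because several tags can name the same class with different underlying types, a loader built this way would dispatch on the parsed parts rather than on a plain string comparison of the whole tag.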
From: Oren Ben-K. <or...@ri...> - 2002-05-05 07:33:05
|
Brian Ingerson [mailto:in...@tt...] wrote: > I want to make another edict: "There will be no 2-party > edicts on the mailing list from now on!" > > :-) +10 :-) > > You have got this right. I thought that the edict implied > > that each type > > family had a "kind" property... i.e., for every type family > > URI there is > > exactly one kind. Which means that using a tuple makes no sense... > > I'll defer on this... Are you truly neutral on this? The only example I have for not doing the above is your Perl trick... And Clark originally favored this - so there's no problem, right? :-) > > than !bloop {}. My vote is that !bloop can only work with > > one of them, so > > the question doesn't even arise. Note that this allows Brian his > > !perl/Foo::Bar [] trick - as long as there can't possibly be a > > !perl/Foo::Bar {} (or vice-versa). I think this is the case > > in Perl... if it > > isn't, well, too bad - sorry, Brian. > > In Perl, object classes can just pop into existence via a _bless_ of a > particular *kind* of structure. I'm well aware of that, I just wondered whether it was OK to bless different types of structures with the same name. I gather it is... Which means you need some way to distinguish between them (BTW, why did you give up on using @, $ and * for that, with hash ('%') being the default? It seems natural enough for any Perl programmer). > Hey! <IDEA> I think I can boil this into a simple ruleset: > - The *kind* is what determines the _transfer_ > - The _transfer_ is always: [map, sequence, string] > - The *family URI* is what determines the __transform__! > If we don't allow family URI-s for plain map, seq, str; then > we guarantee > that no "transFORM" will be performed! And I think that's > something we all > want. > </IDEA> > > Hmm. I really like this. This is a nice way indeed but it is 100% Perl and doesn't translate to any other language. It builds upon Perl's ability to "bless" existing structures, which is a rather unique property. 
I think Python doesn't work this way, and certainly Java/C#/Smalltalk/C++/Lisp/etc. don't. > > However, it may make more sense to just redirect parser > > events to the > > "loader method" of the URI rather than build it a full > > hash/array... I'd > > certainly write a C/C++ loader API this way, and probably > > the Java one as > > well. Actually the Java one may just convert the events into Java's > > serialization format - but no way it would build a Hash > > before calling the > > class "load" method! > > That's an interesting approach. I would never do it for > YAML.pm because it > adds complexity to the transform API for end-users. I'm not arguing you should. In Perl, your way makes perfect sense. > > I don't think that a language-specific implementation > > optimization should > > effect the way we define our model. Take for example our > > equality notion... > > we'd have to redefine it around the "empty type family > > name" case. Ugh. > > I agree that YAML.pm should not drive the process. I think > there's much more > compelling reasons to not use a family on common cases. I'd love to hear them. To me it seems obvious every in-memory structure has some native type, hence it has a type family, hence I can specify it explicitly. Map/List/String are only magical in Perl; they have no special standing in Java/C++/Smalltalk/etc. So from a Java/C++/Smalltalk/etc. POV, it seems like an unnecessary special case to say some nodes "don't have a type family". And, as I've said in my reply to Clark, I really want "type family completeness"... > I think the mailing list approach is still best. It just > might take time. You > should move Clark's timezone. I did. ;) You've moved to the east coast? Great. Perhaps we'll finally manage to meet in person, next time I'm around. Have fun, Oren Ben-Kiki |
From: Brian I. <in...@tt...> - 2002-05-05 12:07:56
|
On 05/05/02 03:34 -0400, Oren Ben-Kiki wrote: > Brian Ingerson [mailto:in...@tt...] wrote: > > I'll defer on this... > > Are you truly neutral on this? The only example I have for not doing the > above is your Perl trick... And Clark originally favored this - so there's > no problem, right? :-) Bad choice of words on my part, and wishful thinking on yours. :) I'm definitely 100% interested in working through this to my satisfaction. Let me say that I am only about 10% interested in the whole Perl bless solution. It's relatively minor to the entire YAML concept. But I'm still not convinced that transferring by kind and transforming by family is not a big simplification. All of your arguments seem to be from the side of "YAML as XML replacement" rather than "YAML as serialization language". I'll reply more specifically to your arguments later today, after I have a chance to think on them more. Cheers, Brian |
From: Brian I. <in...@tt...> - 2002-05-05 18:04:38
|
On 05/05/02 03:34 -0400, Oren Ben-Kiki wrote: > Brian Ingerson [mailto:in...@tt...] wrote: > > I want to make another edict: "There will be no 2-party > > edicts on the mailing list from now on!" > > > > :-) > > +10 :-) > > > > You have got this right. I thought that the edict implied > > > that each type > > > family had a "kind" property... i.e., for every type family > > > URI there is > > > exactly one kind. Which means that using a tuple makes no sense... > > > > I'll defer on this... > > Are you truly neutral on this? The only example I have for not doing the > above is your Perl trick... And Clark originally favored this - so there's > no problem, right? :-) No. And I don't consider it a "trick". I consider it a "transform" like any other. The kind determines the graph model, unless it is modified by a "transform" URI. The transform URI is a mechanism for an application to register a transformation callback. In the perl.yaml.org domain, which is owned by YAML.pm, YAML.pm itself will define the transforms. Note that the transform can't simply be looked up by strcmp() on the URI. That's because there's a many to one relationship, at least for the Perl stuff. For real Perl Modules/Classes/Objects, I think I'll need to use a different domain altogether, because the transforms will be registered by the module authors and will be out of YAML.pm's direct control. (I was thinking I could take cpan.yaml.org for this. Then I could do !cpan/Real::Foo instead of !cpan.org/Real::Foo.) > > > than !bloop {}. My vote is that !bloop can only work with > > > one of them, so > > > the question doesn't even arise. Note that this allows Brian his > > > !perl/Foo::Bar [] trick - as long as there can't possibly be a > > > !perl/Foo::Bar {} (or vice-versa). I think this is the case > > > in Perl... if it > > > isn't, well, too bad - sorry, Brian. > > > > In Perl, object classes can just pop into existence via a _bless_ of a > > particular *kind* of structure. 
> > I'm well aware of that, I just wondered whether it was OK to bless different > types of structures with the same name. I gather it is... Why wouldn't it be? This is a basic Perl concept. > Which means you need some way to distinguish between them (BTW, why > did you give up on using @, $ and * for that, with hash ('%') being > the default? It seems natural enough for any Perl programmer). Let's see, Perl has the following internal data types:
- hash
- array
- scalar
- ref
- glob
- code
- io
- regexp
- lvalue ??
I'm not sure about the specifics of lvalue, so I'll skip that one. Now I'm pretty sure you can bless all of these except ref. Ref is currently handled by YAML !ptr, which I think is fine. So now I'm left with 14 types of things to deal with. The first is plain types:
- !perl/:glob or !perl/*
- !perl/:code or !perl/&
- !perl/:io or !perl/<> ???
- !perl/:regexp or !perl// ???
so what are these:
- !perl/:hash or !perl/ ???
- !perl/:array or !perl/@
- !perl/:scalar or !perl/$
Then I need to add blessed types:
- !perl/Foo::Bar:glob or !perl/*Foo::Bar
etc... The glob, io, and regexp all use map serializations, and the code uses the string form. > > Hey! <IDEA> I think I can boil this into a simple ruleset: > > - The *kind* is what determines the _transfer_ > > - The _transfer_ is always: [map, sequence, string] > > - The *family URI* is what determines the __transform__! > > If we don't allow family URI-s for plain map, seq, str; then > > we guarantee > > that no "transFORM" will be performed! And I think that's > > something we all > > want. > > </IDEA> > > > > Hmm. I really like this. > > This is a nice way indeed but it is 100% Perl and doesn't translate to any > other language. It builds upon Perl's ability to "bless" existing > structures, which is a rather unique property. I think Python doesn't work > this way, and certainly Java/C#/Smalltalk/C++/Lisp/etc. don't. Excuse me? The IDEA I had above had nothing to do with the Perl bless quandary specifically. 
It was merely a new way to phrase to _you_ the way I envision that YAML works. Consider Python's tuple, !python/tuple. This must be serialized as a sequence. Any YAML parser would parse it as such. But in the Python Loader it gets transformed into a tuple graph. In Perl's loader it gets turned into an array and then transformed into an array marked with '!python.yaml.org/tuple'. > > > However, it may make more sense to just redirect parser > > > events to the > > > "loader method" of the URI rather than build it a full > > > hash/array... I'd > > > certainly write a C/C++ loader API this way, and probably > > > the Java one as > > > well. Actually the Java one may just convert the events into Java's > > > serialization format - but no way it would build a Hash > > > before calling the > > > class "load" method! > > > > That's an interesting approach. I would never do it for > > YAML.pm because it > > adds complexity to the transform API for end-users. > > I'm not arguing you should. In Perl, your way makes perfect sense. > > > > I don't think that a language-specific implementation > > > optimization should > > > affect the way we define our model. Take for example our > > > equality notion... > > > we'd have to redefine it around the "empty type family > > > name" case. Ugh. > > > > I agree that YAML.pm should not drive the process. I think > > there's much more > > compelling reasons to not use a family on common cases. > > I'd love to hear them. To me it seems obvious every in-memory structure has > some native type, hence it has a type family, hence I can specify it > explicitly. Map/List/String are only magical in Perl; they have no special > standing in Java/C++/Smalltalk/etc. So from a Java/C++/Smalltalk/etc. POV, > it seems like an unnecessary special case to say some nodes "don't have a > type family". And, as I've said in my reply to Clark, I really want "type > family completeness"... 
The "smell" here is that you are determining the type family implicitly and directly from the kind, which means you are making an unnecessary layer of abstraction. Why not just use the kind to determine the graph model (which is what every newbie will expect) and use the family to transform the graph? !map, !seq, !str are completely redundant now that we have inlines for empties and have eliminated the magical !seq sparsity transform. Why can't we eliminate them? Consider that almost every YAML document that a newbie will look at will be devoid of !foo. But all the documentation they read mandates they know the difference between series and sequence and that [1,2,3] doesn't make something an array; it's really the invisible, redundant, and powerless: !http://yaml.org/seq. Blech! > > > I think the mailing list approach is still best. It just > > might take time. You > > should move to Clark's timezone. I did. ;) > > You've moved to the east coast? Great. Perhaps we'll finally manage to meet > in person, next time I'm around. Definitely. I've already made the arrangements :) Cheers, Brian |
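The rule set Brian describes (the kind fixes the parsed shape, the family only selects a transform) can be sketched roughly as follows. This is an illustration only, not YAML.pm's API: `TRANSFORMS` and `load_node` are hypothetical names, and the family string is taken from the !python/tuple example in the message.

```python
# Hypothetical sketch: the kind decides the parsed shape,
# the family (if any) selects a transform applied afterwards.
TRANSFORMS = {
    # Assumed family name, borrowed from the !python/tuple example.
    "python/tuple": tuple,
}


def load_node(kind, family, value):
    """Return the native object for an already-parsed node.

    kind   -- "map", "sequence" or "string" (the shape the parser produced)
    family -- family name, or None when the node carried no transfer
    value  -- dict, list or str built by the parser
    """
    if family is None:
        # Plain node: no transform is performed at all.
        return value
    transform = TRANSFORMS.get(family)
    if transform is None:
        # Unknown family: keep the data and remember the family
        # so that it can be written back out unchanged.
        return (family, value)
    return transform(value)
```

With these assumptions, a sequence tagged with the tuple family becomes a native tuple, an untagged sequence stays a list, and a node with an unregistered family is carried along as a marked value.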
From: Oren Ben-K. <or...@ri...> - 2002-05-05 07:34:26
|
Clark C . Evans [mailto:cc...@cl...] wrote: > | OK, back to basics. There is the issue of whether !bloop [] > | is different > | than !bloop {}. My vote is that !bloop can only work with > | one of them, so > | the question doesn't even arise. Note that this allows Brian his > | !perl/Foo::Bar [] trick - as long as there can't possibly be a > | !perl/Foo::Bar {} (or vice-versa). > > I can imagine structures having both an array and a hashtable > blessed with "Foo::Bar". And this may not even be an exception. And each being a different, unrelated type? I find this to be pretty strange. Can anyone provide an example of when it is useful, or even just being used? Besides, whatever we decide on the relationship between type family and kind, I really, really, want the type family URI to be the single, unique, complete identifier for the relevant type (call it "type family completeness"). I want to be able to specify, in a schema language, that a node needs to be of type family "<some URI>" and be done - I don't want to have to say: required-node-type: uri: <some-uri> kind: <some-kind> I want to be able to say: required-node-type: <some-uri> I want to be able to use the URI as a key in a method dispatch table (as Brian said in his loader example), rather than have to use a tuple as the key. I want to be able to *print* the "complete unique type identifier" of a node without resorting to packing the URI with additional data. And so on and so forth. I *really* want these. Please? > Ok. Let me argue this one... ;) > > 1. It is illegal to use map, seq, and str if the kind is not keyed, > series, and scalar respectively; thus there isn't a difference > between them; and further you wouldn't even want to give someone > the idea that they could do otherwise. For the zillionth time... No matter what we decide, "map => keyed, but keyed !=> map". There *is* a difference. That's why we need two names ("keyed" and "map"). Maybe I'm just not getting it... > 2. 
For 95% of my data (I use !int and !date periodically) the > families are currently map, seq, and str. For efficiency, it'd > be a lot easier to do NULL compares than strcmp. "Premature optimization is the root of all evil". It is trivial to return an interned string for these three types. In fact it would be extremely good practice to do so for *all* type family URIs, so you could do pointer comparison anyway. Besides, your way you'd be doing tuple comparison, rather than a simple pointer comparison. > 3. The parser interface has to communicate the data in some structure > be it events or a hashtable, array, string. And in this case, the > structure used to present the node has 1-1 correspondence with the > kind. Thus, to say that the kind is somehow implied by the family > doesn't reflect the reality of the current (and probably > future APIs). The *acceptable kind* is implied by the family. We've been through that when we discussed required restrictions for making YPATH an effective tool. We *must* prevent: this: !date year: 2002 month: 4 day: 12 From being equal to: this: !date 2002-04-12 Because /this/year works for one and not for the other, and hence YPATH would break after a "safe round tripping" of the data. Hence the 'date' type family *must not* accept a keyed node - it only accepts a scalar one; a simple case of a type family implying a single unique node kind. This is obvious for scalars vs. collections. It also holds for keyed vs. series. Suppose I write: this: !pair - zero - one And the YPATH expression: /this/one/../<previous node> (For some "<previous node>" syntax). Is this expected to work with: this: !pair 1: one 0: zero I think that's too much to ask. But according to our current rules, we'd have to require it so that YPATH would survive round-trips. Hence I see the logic in forcing each type family to force exactly one node kind. It immediately gains us the safety of YPATH across round-trips. Now, none of this is reflected by the API. 
It is an additional restriction we impose. I don't see any difference between saying the "date" example above is illegal - something which I hope we all agree on - and saying the "pair" example is illegal. It is exactly the same restriction, and I don't care whether or not it is reflected in the API. If it bothers you that much... specify the expected kind for each type family when you register its loading methods. Have the parser emit an error if the node kind doesn't match, way before it invokes the method. This way it *would* be reflected in the API. It would also allow this method to be type-safe - have a different signature depending on the expected node kind. > 4. By eliminating the map, seq, and str transfers we can simplify > the language in the rest of the spec because we don't have to then > differentiate between the transfer and the kind > (syntactical structure). You *ALWAYS* have to make this distinction because... zillionth and one... *NOT EVERY KEYED NODE IS A MAP*. And it would *complicate* the language. First you need to come up with a good name for this tuple. After all, two nodes are equal if their "type tuple" is equal and their content is equal, etc. I doubt the term "type tuple" would be used. "Type family" is taken by the URI. I can't think of a good name for it that won't be confused with "type family". You'd have to call the *tuple* the "type family" and call the URI "type URI". A type URI is just a bunch of characters which has *absolutely no meaning on its own* - it must be combined with a "kind" to become a true "type family". UGH. *Please* no? > 5. What happens when a scalar is marked !seq? Yes, this is an error, > but it makes for a class of errors that we don't even need to have. What happens when a keyed node is marked !int? A series node is marked !date? This class of errors is inevitable. 
> IMHO, if you buy the argument that family does not necessarily > imply the kind, then you are left with the "type" of node being > a tuple (kind,family). And as soon as you go this far, having > a NULL family be allowable and dropping the redundant !seq, !map, > and !str transfers seems like the next logical simplification. Actually, no, there's a third way. Which happens to be in the spec right now. The type family uniquely identifying the type, and it is OK to use either keyed or series style for a collection type family. I agree this isn't good (it breaks YPATH). It isn't clear to me this is worse than breaking "type family completeness". I'd much rather keep both YPATH *and* type family completeness, even at the cost of Brian having to write !perl/Foo::Bar::array. Sorry, Brian... BTW, why can't you just use '@' for array and '$' for scalar, as in !perl/@Foo::Bar and !perl/$Foo::Bar (and use !perl/*Foo::Bar for references, etc.)? The set of valid URI characters seems to allow it... > | I'd much rather go for Clark's original preference, one > | type family => one > | kind, period. If we do go this way, the rest of my > | modifications to the > | edict become more-or-less inevitable - I think. > > Yep. I agree, if you have family -> one kind, then I think the > rest of your argument works. And since, if one insists on both "type family completeness" and safe YPATH, this is the only reasonable way... :-) > However, after some thought, the > more "stupid" approach of letting them freely vary seems better. > Thank you for challenging this. It's made me re-think things > and the more times this is done, the better the new specification > will read. > > What do you think? I'm afraid... very afraid :-) I think giving up on "type family completeness" is a really bad idea that will haunt us for ages. I'll go as far as comparing it to XML's singularly destructive idea of having a tag's unique name being a tuple {namespace, local-name}. 
I recall with a sinking feeling the hoops people went through in order to define a single unique canonical string representation of the tag name, something not given in the spec but so useful that everyone re-invented it. *shudder*. Have fun, Oren Ben-Kiki |
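Oren's suggestion of registering the expected kind together with the loading method, so that the URI alone can serve as the dispatch key, might look something like the sketch below. Everything here is an assumption made for illustration: `register`, `load`, `KindError` and the date URI are hypothetical and not part of any YAML implementation. The URIs are interned, which is what would make identity comparison possible for callers that intern their own strings as well.

```python
import sys


class KindError(ValueError):
    """Raised when a node's kind does not match its family's registered kind."""


# Dispatch table keyed by the family URI alone: uri -> (expected kind, loader).
_REGISTRY = {}


def register(uri, kind, loader):
    """Register a loader for a family URI together with its single allowed kind."""
    _REGISTRY[sys.intern(uri)] = (kind, loader)


def load(uri, kind, value):
    """Check the node kind first, then hand the value to the family's loader."""
    expected, loader = _REGISTRY[uri]
    if kind != expected:
        raise KindError(f"{uri} expects a {expected} node, got a {kind} node")
    return loader(value)


# Hypothetical registration for a scalar-only date family (URI is assumed).
register(
    "http://yaml.org/date",
    "scalar",
    lambda text: tuple(int(part) for part in text.split("-")),
)
```

Under these assumptions a scalar `2002-04-12` loads normally, while the keyed year/month/day form is rejected before the loader is ever invoked.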
From: Clark C . E. <cc...@cl...> - 2002-05-05 14:45:25
|
Oren,

You have good arguments here, let me think on it some. Two comments
off hand:

1. If we require that family implies kind, then we cannot do the
   "substitutability" mechanism talked about earlier; in other words:

     --- !name Clark C. Evans

   cannot migrate easily to...

     --- !name
     =: Clark C. Evans
     given: Clark
     family: Evans
     middle: C.
     suffix:

2. I wanted to remark that the current spec (which does not distinguish
   between keyed and series collections) does indeed support YPATH
   consistency since a series is viewed as a collection with integer
   domain. Thus, the two items below would have the same YPATH if we
   don't put the keyed/series distinction in the graph model:

     --- !foo
     0: one
     --- !foo
     - one

   "one" can be fetched by /0 in both cases.

Best,

Clark |
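Clark's second point, that a series viewed as a collection with an integer domain answers the same path step as a keyed collection, can be illustrated with a small sketch. `ypath_child` is a hypothetical helper invented for this example; YPATH itself is not specified in this thread.

```python
def ypath_child(node, step):
    """Select one child of a collection by a single path step (hypothetical helper)."""
    if isinstance(node, list):
        # A series is treated as a collection whose keys are 0, 1, 2, ...
        return node[int(step)]
    if isinstance(node, dict):
        # A keyed collection: compare the step against each key's text form.
        for key, value in node.items():
            if str(key) == step:
                return value
        raise KeyError(step)
    raise TypeError("scalars have no children")


keyed = {0: "one"}   # --- !foo / 0: one
series = ["one"]     # --- !foo / - one
```

Both `ypath_child(keyed, "0")` and `ypath_child(series, "0")` select the same child here, which is the consistency Clark describes for the model without a keyed/series distinction.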
From: Clark C . E. <cc...@cl...> - 2002-05-05 16:58:19
|
Ok. There are two things to decide here: 1. If family implies kind; that is, if kind is a property of each node, or if it is a property of each family. 2. If family can be NULL. Let's not cover this one yet as it is less important (I'm more willing to go either way on this one) and is dependent on the first decision. | And each being a different, unrelated type? I find this to be pretty | strange. Can anyone provide an example of when it is useful, or even just | being used? The biggest use case I can imagine is "forward compatibility" where a given family which starts out as a scalar grows to become an array or mapping (with the scalar value stored as the first item in the array or as the value of =). | Besides, whatever we decide on the relationship between type family and | kind, I really, really, want the type family URI to be the single, unique, | complete identifier for the relevant type (call it "type family | completeness"). I agree with the concept; but I have reservations <insert xml-dev URI discussion here>. You can determine if two objects "named" by a URI are the same if their URIs are the same, but you can't determine if the two objects are different if their URIs are different. It is a very weak form of equality. Things in the real world may have multiple names depending on context... no way around it. | I want to be able to specify, in a schema language, that a node needs to be | of type family "<some URI>" and be done - I don't want to have to say: | | required-node-type: | uri: <some-uri> | kind: <some-kind> | | I want to be able to say: | | required-node-type: <some-uri> This was why I was initially leaning to the family implies kind position. However, when all the other considerations are laid out on the table this one takes a back seat. Also, this can be done as nice syntax sugar in the schema language. 
| I want to be able to use the URI as a key in a method dispatch table (as | Brian said in his loader example), rather than have to use a tuple as the | key. I want to be able to *print* the "complete unique type identifier" of a | node without resorting to packing the URI with additional data. And so on | and so forth. Ok. I don't think this one is possible. Because if you wanted to have family->kind, then a "registry" mechanism would be required by the parser (or else the convention could not be enforced). And this registry would require the pairing. The pairing is not escapable; it is a matter of when the pairing occurs: - at the node level (or via application defined schema) - at the document level - at the stream level - globally I say we go with the first. You are proposing the last (which is unenforceable without a global, centralized registry). The middle ground is more confusing than either extreme, but it is enforceable with some amount of work; namely a parser registration of the pairing. I do strongly believe that at whatever level the pairing happens, it should be enforced (otherwise we will have lots of non-compliant YAML documents everywhere). | > 2. For 95% of my data (I use !int and !date periodically) the | > families are currently map, seq, and str. For efficiency, it'd | > be a lot easier to do NULL compares than strcmp. | | "Premature optimization is the root of all evil". | | It is trivial to return an interned string for these three types. In fact it | would be extremely good practice to do so for *all* type family URIs, so you | could do pointer comparison anyway. Besides, your way you'd be doing tuple | comparison, rather than a simple pointer comparison. This much is true. But alas, it is the second decision. | > 3. The parser interface has to communicate the data in some structure | > be it events or a hashtable, array, string. And in this case, the | > structure used to present the node has 1-1 correspondence with the | > kind. 
Thus, to say that the kind is somehow implied by the family | > doesn't reflect the reality of the current (and probably | > future APIs). | | The *acceptable kind* is implied by the family. Yes, but isn't this a schema issue? Much like "one" is not an integer? See above... where does the kind/family pairing happen? I think it should happen at the node level, with the restrictions being imposed by the schema language. | We've been through that when we discussed required | restrictions for making YPATH an effective tool. We *must* prevent: | | this: !date | year: 2002 | month: 4 | day: 12 | | From being equal to: | | this: !date 2002-04-12 But they _are_ different according to the graph model, as the kind of "this" is different. | Because /this/year works for one and not for the other, and hence YPATH | would break after a "safe round tripping" of the data. Hence the 'date' type | family *must not* accept a keyed node - it only accepts a scalar one; a | simple case of a type family implying a single unique node kind. Certainly; in most cases a family has exactly one acceptable kind. However, must we force this to be the case for all families in all circumstances? How can we feasibly do this without having 1000's of invalid YAML documents because people have not read the specification in painstaking detail? | This is obvious for scalars vs. collections. It also holds for keyed vs. | series. Suppose I write: | | this: !pair | - zero | - one | | And the YPATH expression: | | /this/one/../<previous node> In my vision of YPATH this expression would not work with the above; however, you could select "one" with /this/1 | this: !pair | 1: one | 0: zero /this/1 would work here too and select "one"; but alas, since the two structures above are "different" in the model, YPATH need not return this way. | I think that's too much to ask. But according to our current rules, we'd | have to require it so that YPATH would survive round-trips. 
| Hence I see the logic in forcing each type family to force exactly one node | kind. It immediately gains us the safety of YPATH across round-trips. I think you are saying that by not having family -> kind we would be vulnerable to round-trip "swaps". This is not the case, as the two above are _not_ equivalent since the kind is different and since kind is in the graph model. Thus, YPATH need not return the same value for both since they are different; if it does (because they both happen to be collections), great, but it need not. | If it bothers you that much... specify the expected kind for each type | family when you register its loading methods. Have the parser emit an error | if the node kind doesn't match, way before it invokes the method. This way | it *would* be reflected in the API. It would also allow this method to be | type-safe - have a different signature depending on the expected node kind. I think if we went with family -> kind, then something like this would have to become part of the API or people would start using YAML incorrectly (and violating the information model) and hurt interoperability. I think it is a steep price to pay, don't you? | And it would *complicate* the language. First you need to come up with a | good name for this tuple. After all, two nodes are equal if their "type | tuple" is equal and their content is equal, etc. I would just say that two nodes are equal if their family, kind, and content are equal. Not too hard. | > 5. What happens when a scalar is marked !seq? Yes, this is an error, | > but it makes for a class of errors that we don't even need to have. | | What happens when a keyed node is marked !int? A series node is marked | !date? This class of errors is inevitable. Yep, it was a weak point. ;) | Actually, no, there's a third way. Which happens to be in the spec right | now. The type family uniquely identifying the type, and it is OK to use | either keyed or series style for a collection type family. 
| | I agree this isn't good (it breaks YPATH). It isn't clear to me this is | worse than breaking "type family completeness". I'd much rather keep both | YPATH *and* type family completeness With the replacement of "collection" with "keyed" and "series" in the information model we have fewer YPATH problems (if we had them in the first place... I distinctly had YPATH in mind with the collection definition). As for "type family completeness", I'm still a bit confused as to what benefit I get for such a big cost (pre-registration to make sure that a given family is using the right kind). | I think giving up on "type family completeness" is a really bad idea that | will haunt us for ages. I'll go as far as comparing it to XML's singularly | destructive idea of having a tag's unique name being a tuple {namespace, | local-name}. I recall with a sinking feeling the hoops people went through | in order to define a single unique canonical string representation of the | tag name, something not given in the spec but so useful that everyone | re-invented it. *shudder*. Yes. The namespace/local-name break was hugely unfortunate, since it is so prevalent in the language. I don't think that this is anywhere near as problematic. In 99% of the cases, people will _know_ that family X is KEYED and write their data/schema accordingly. As for the schema, if it is too much of a burden to specify that a given node X has a family of F and a kind of K, we can always provide a mechanism where the schema can provide a "default" kind for a given family. Thus, a way to register (within the schema) that a family F always uses a kind K. And in this way, the schema language need not specify the kind on every node. However, I feel that this is really a schema level issue... and not one that is core to our information model. To summarize the arguments for having kind in the graph model and having the kind/family pairing done on a per node basis: 1. 
It reflects the API without any restrictions, the loader/application is passed both a kind and a family as distinct separate items (one via the structure of the events, and the other via a string) on a per node basis. 2. It aligns directly with the syntax. Each node has both a kind and a family. The family isn't declared above with its acceptable kind, for example. 3. It allows the kind of a node to change over time for a given family (aka schema migration). 4. It moves the restrictions into the realm of schema language where, IMHO, it belongs. 5. It does not require a complicated enforcement mechanism. ;) Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
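Clark's proposed equality rule (two nodes are equal if their family, kind, and content are all equal) can be written down as a short sketch. The `Node` class and the sample values are hypothetical and only illustrate the rule; they are not taken from any YAML implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    """A graph-model node under the per-node pairing: all three fields count for equality."""
    family: Optional[str]   # family name, or None if no transfer was given
    kind: str               # "scalar", "keyed" or "series"
    content: Any


# The two !date forms from the discussion, modeled as hypothetical values.
date_scalar = Node("date", "scalar", "2002-04-12")
date_keyed = Node("date", "keyed", (("year", 2002), ("month", 4), ("day", 12)))
```

Because the generated comparison looks at every field, the scalar and keyed date nodes are unequal even though they share a family, which is exactly the behaviour the two sides of this thread disagree about.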
From: Steve H. <sh...@ha...> - 2002-05-26 13:04:59
|
http://wiki.yaml.org/yamlwiki/YamlIsToXmlWhatWikiIsToHtml?action=show |
From: Oren Ben-K. <or...@ri...> - 2002-05-05 20:46:10
|
Clark C . Evans wrote: > Ok. There are two things to decide here: > > 1. If family implies kind; that is, if kind is > a property of each node, or if it is a property > of each family. > > 2. If family can be NULL. Let's not cover this > one yet as it is less important (I'm more willing > to go either way on this one) and is dependent > on the first decision. OK. > The biggest use case I can imagine is "forward compatibility" > where a given family which starts out as a scalar grows to > become an array or mapping (with the scalar value stored > as the first item in the array or as the value of =). The good old color idiom. Hmmm. That is a point. However there's something inconsistent here... in this use case, at least at some level, one would ideally want to keep the "equality" of an old, simple scalar and a degenerate array (with one element) or mapping (with just a single '=' key). It seems that without this property, you aren't really "forward compatible". However, if { <uri>, keyed } is taken to be a different type than { <uri>, scalar }, then you can't have it... This requires more thought. I'll sleep on it. And see "the 4-model view" below... > | Besides, whatever we decide on the relationship between type family > | and > | kind, I really, really, want the type family URI to be the single, > | unique, > | complete identifier for the relevant type (call it "type family > | completeness"). > > I agree with the concept; but I have reservations <insert xml-dev > URI discussion here>. You can determine if two objects "named" by > a URI are the same if their URIs are the same, but you can't determine > if the two objects are different if their URIs are different. It is > a very weak form of equality. Things in the real world may have > multiple names depending on context... no way around it. Certainly everyone is entitled to create their own !http://mine.com/int type or whatever, and we can't stop them. But I don't see this as a serious problem. 
Certainly anyone using this type intends for it not to compare equal with any !int node... > | I want to be able to say: > | > | required-node-type: <some-uri> > > This was why I was initially leaning to the family implies kind > position. However, when all the other considerations are laid > out on the table this one takes a back seat. I'm still unconvinced... No, I'm still convinced this is wrong :-) > Also, this can be > done as nice syntax sugar in the schema language. The syntax of the schema language should be YAML... I don't want to rely on additional syntax sugar. > | I want to be able to use the URI as a key in a method dispatch table > | ... > > Ok. I don't think this one is possible. Because if you wanted > to have family->kind, then a "registry" mechanism would be required > by the parser (or else the convention could not be enforced). As always we must distinguish between two cases. A schema-specific application would *know* the type family so it would know the acceptable kind - without a need for a central registry. Enforcement is automatic by checking the input against the schema. A schema-blind application couldn't care less... it just encodes the type family information for round-tripping. Again, no need for central registry. Enforcement is impossible anyway because it doesn't know the schema. > And > this registry would require the pairing. The pairing is not escapable > it is a matter of when the pairing occurs: > ... > I do strongly believe that whatever level the pairing happens, > it should be enforced (otherwise we will have lots of non-compliant > YAML documents everywhere). What do you mean, "enforced"? If the definition of each YAML type family, per our spec, includes a single kind, and everyone who is defining such a type family, uses it in a schema, implements code etc. is aware of its kind - isn't that enforcement enough? Why is a central registry required? 
If someone just slaps a random type family to a random node without any consideration to whether this node satisfies the definitions of the type family, naturally you'd get a lot of invalid documents. But I don't see this as being specific to the kind issue. You could make the same claim about, say, the format of dates - that there should be enforcement that every node given the !date transfer method must be in a valid date format. Verifying that a given document satisfies a given schema can only be achieved if one is aware of the schema; and if in every schema, each type family has one kind, then there's no problem of enforcement. > | The *acceptable kind* is implied by the family. > > Yes, but isn't this a schema issue? Much like "one" is not an integer? Yes. The definition of a type family is exactly a schema issue. This of course is neither here nor there... > | We've been through that when we discussed required > | restrictions for making YPATH an effective tool... I still think this is a crucial - *the* crucial - point. > | > We must prevent > | this: !date > | year: 2002 > | month: 4 > | day: 12 > | > | > From being equal to: > | > | this: !date 2002-04-12 > > But they _are_ different according to the graph model, > as the kind of "this" is different. Now this will be extremely confusing to a newbie, and hell on the veterans. The whole point of using the same !date type family for both nodes is to say "this is, at some level, the same thing". But under your rules, it *isn't* the same thing. It seems the only consistent way to grok the above, and Brian has implied it in his last post, is to assume *four* models, not three: - syntax, as today; - tree, as today; - intermediate: map/list/string *only*, possibly with cycles etc., with each node annotated by a type family; - native: native data structures. Brian said this almost explicitly by saying: > ... 
Why not just use the kind to determine the graph [intermediate] > model (which is what every newbie will expect) and > use the family to transform the graph? [into the native model] What Brian refered to as "the graph model" seems to be an intermediate step - which does physically exist in YAML.pm - between the tree model and the final native model. This is natural in Perl... these are the unblessed data structures. What I call "the graph model" - and Brian didn't name - are the blessed data structures (the final native model). If I'm right in this interpretation it explains why we haven't managed to get very far... same words, two different concepts. The above clarifies a lot to me; for example, the "NUL transfer method" makes perfect sense in the 4-model view. "Transfer" is the operation converting between the intermediate model and the final native model - and for some nodes, there is no conversion necessary. In fact, a lot of what Brian said makes perfect sense in a 4-model view (I diagree, however, that to a newbie, or to anyone for that matter, '{ ~, "series" }' is a better globally unique name for the data type "a sequence of values" than "!http://yaml.org/seq". At least a URI is an identifier. Really, '{ ~, "series" }' - Ugh!). At any rate, in the four-model view, what you seem to be saying is that the !date example above is different in the intermediate model, while being identical in the native model. Likewise your "forward compatibility" use case. Is this a correct understanding? Well, if it is, I dislike it. I believe that 4 models is one model too many. I think today's graph model, modified so that each type family has just one kind, is the best possible formalism for describing the final (blessed, if you will) native data structures. The thing is, I don't see the point in formalizing an intermediate model between the final native data structures and the tree model. 
Any operation that can be done at that level can be done at the graph/native level as well. Any intermediate representation (including cave drawings :-) can be viewed as just another way to represent the graph/native model (as long as all the information is there), and so on. I can't think of a single reason why an additional level would be useful *conceptually*. It obviously may be useful as a programming technique, e.g. in YAML.pm: Perl works this way, and more power to it for explicitly supporting it, and allowing for YAML.pm to be written in such an elegant way. But this has nothing to do with YAML model definitions. > Certainly; in most cases a family has exactly one acceptable kind. > However, must we force this to be the case for all families in > all circumstances? I think yes, if only for playing it safe (be more restrictive up front, just like Brian suggested for trailing commas). > How can we feisably do this without having > 1000's of invalid YAML documents beacuse people have not read > the specification in painstaking detail? What painstaking detail? The spec clearly says that a type family must have: - a URI - a kind - formats (if scalar) - canonical format - explicit formats - implicit formats This is clearly stated up front rather than being subtly implied. And, as you agree, it just plain makes sense for most cases. I'd argue it makes sense for all cases (because of YPATH issues): > | Suppose I write: > | > | this: !pair > | - zero > | - one > | > | and: > | > | this: !pair > | 1: one > | 0: zero > | > | And the YPATH expression: > | > | /this/one/../<previous node> > > In my vision of YPATH this expression would not work with > the above; [second version] > however, you could select "one" with /this/1 [in both cases] So... again this only makes sense in the 4-model view: > I think you are saying that by not having family -> kind we would > be vulnerable to round-trip "swaps". 
This is not the case, as the > two above are _not_ equivalent since the kind is different and since > kind is in the graph [intermediate] > model. [but they are equal in the native model] > Thus, YPATH need not return the same > value for both since they are different; if it does (beacuse they > both happen to be collections), great, but it need not. [since YPATH can't be applied to the native model] And this is supposed to be simpler to explain to a newbie? All !<uri> are equal, but some !<uri> are more equal? I don't think so. > | If it bothers you that much... specify the expected kind for each type > | family when you register its loading methods. Have the parser emit an > | error > | if the node kind doesn't match, way before it invokes the method. This > | way > | it *would* be reflected in the API. It would also allow this method to > | be > | type-safe - have a different signature depending on the expected node > | kind. > > I think if we went with family -> kind, then something like this > would have to become part of the API or people would start using > YAML incorrectly (and violating the information model) and hurt > interoperability. I think it is a steep price to pay, don't you? Nope. Like you said, in most cases, each type family accepts only one kind anyway. I think having only 3 models instead of 4, and having a safe YPATH that works for all models (rather than just for 3 out of 4 models), specifically allowing it to safely and directly apply to native application data structures, is well worth the extra pain for the very small number of people who want !xyzzy [ ... ] to contain a shopping list and !xyzzy { ... } to contain a French-Swahili dictionary. They would just have to call one !xyzzy:list and the second !xyzzy:dictionary. 
> | I'd much rather keep both > | YPATH *and* type family completeness > > With the replacement of "collection" with "keyed" and "series" in the > information model we have less YPATH problems (if we had them in the > first place... > I distinctly had YPATH in mind with the collection definition). I agree, though it is still easier on the tongue to say "collection" rather than "series or keyed" - which you'd be saying a lot when defining YPATH this way :-) > As for "type family completeness", I'm still a bit confused as to > what the benefit I get with such a big cost (pre-registration to > make sure that a given family is using the right kind). Benefit: 3 models vs. 4 models; safe YPATH and a coherent equality definition for all models, including native data structures. Cost: one lousy additional argument to the registration method, *if you are using one*, where in 100% of the cases its value is well known to the code's author (since he's just written the loading method being registered). Oh, and not being able to give two completely unrelated types the same URI name (using kind to distinguish them), or to give the same type two different "unique tuple names" (having two kinds of the same URI mean the same native type). Talk about confusion. Sorry, I just don't see the problem. > | I think giving up on "type family completeness" is a really bad idea > | that will haunt us for ages. And I continue to believe so :-) > To summarize the arguments for having kind in the graph model > and having the kind/family pairing done on a per node basis: > > 1. It reflects the API without any restrictions, the > loader/application is passed both a kind and a family > as distinct separate items (one via the structure of > the events, and the other via a string) on a per node basis. The API can and should, where possible, be fixed to reflect the restrictions we *must* place in order for YPATH and other YAML-level tools to be able to function. 
If we don't place these restrictions, all such tools would be at the mercy of alternate representations of the "same, but not same" data such as the date and pair examples above. > 2. It aligns directly with the syntax. Each node has both > a kind and a family. The family isn't declared above with > its acceptable kind, for example. You could make the same argument about the format. By this logic, nodes with different formats should also be considered of different "types". Please give me one argument that applies to distinguishing type by node kind that doesn't directly apply when replacing the word "kind" with the word "format". > 3. It allows the kind of a node to change over time > for a given family (aka schema migration). Like I said, I'll have to sleep on this one. The above is only true if one takes a 4-model view and accepts that the native model is beyond the scope of YAML tools. I feel that in this case, "compatibility" is rather useless, since any YAML-level tool (e.g. YPATH) would simply fail on these "compatible representations". I admit I have no good way of solving the problem in the 3-model view either. I'd call it a draw with a large "Hmmm...". > 4. It moves the restrictions into the realm of schema > language where, IMHO, it belongs. Restrictions are always in the realm of the schema language. But we have restrictions on what the schema language may specify. For example, you can't have a type with partial order on its members, or a set, etc. (not at the YAML level, anyway). And you can't give the same name to two different types! The whole point of a URI is that it is unique! > 5. It does not require a complicated enforcement mechanism. Neither approach does - beyond normal schema enforcement which you get anyway, again in both approaches. Well, this has been long, and educational (at least for me). I hope the 4-model view vs. the 3-model view clarifies some things. 
Perhaps it would even help to sway you towards my view :-) And whatever we decide we have an open issue with regard to the color idiom... Have fun, Oren Ben-Kiki |
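[Editorial sketch] The proposal in this message (pass the expected kind as one extra argument when registering a family's loading method, and reject a mismatching node before the method is invoked) could look roughly like the Python below. This is a minimal illustration under stated assumptions, not the API of YAML.pm or any real library: `Registry`, `register`, `load_node`, and `KindMismatch` are hypothetical names, and the `xyzzy:list` / `xyzzy:dictionary` families are the ones suggested in the message.

```python
class KindMismatch(Exception):
    """Raised when a node's kind differs from the kind registered for its family."""


class Registry:
    """Hypothetical table mapping a family URI to (expected kind, loading method)."""

    KINDS = ("scalar", "map", "sequence")

    def __init__(self):
        self._table = {}

    def register(self, family, kind, loader):
        # The "one additional argument": the kind the loading method expects.
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        previous = self._table.get(family)
        if previous is not None and previous[0] != kind:
            raise KindMismatch(f"{family} is already registered as {previous[0]}")
        self._table[family] = (kind, loader)

    def load_node(self, family, kind, value):
        entry = self._table.get(family)
        if entry is None:
            # Schema-blind case: nothing registered, pass the value through.
            return value
        expected, loader = entry
        if kind != expected:
            # Reject the node before the loading method is ever called.
            raise KindMismatch(f"{family} expects {expected}, got {kind}")
        return loader(value)


registry = Registry()
registry.register("xyzzy:list", "sequence", list)
registry.register("xyzzy:dictionary", "map", dict)
```

With this shape, a shopping list tagged `xyzzy:list` loads normally, while the same family on a map node fails early with `KindMismatch`.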
From: Clark C . E. <cc...@cl...> - 2002-05-06 00:57:48
|
Outstanding question: Does family imply kind -- is kind a property of each node (flexible), or a property of each family (restrictive)? This is asked from the context of the "graph" model, the model upon which our path, schema, and transform languages will be defined. To give a tangible example of the restriction, the following would become illegal YAML since the family is used with two different node kinds: --- - !perl/Foo::Bar [] - !perl/Foo::Bar {} We lose the above syntax, and would require a more verbose syntax such as... --- - !perl/Foo::Bar:array [] - !perl/Foo::Bar:hash {} but in return we get a URI which uniquely identifies the type. ... | > | I want to be able to use the URI as a key in a method dispatch table | > | > Ok. I don't think this one is possible. Because if you wanted | > to have family->kind, then a "registry" mechanism would be required | | As always we must distinguish between two cases. | | A schema-specific application would *know* the type family so it would know | the acceptable kind - without a need for a central registry. Enforcement is | automatic by checking the input against the schema. Ok. So, in this case, you are implying that it is the application's responsibility to enforce the restriction and not the responsibility of the YAML tool library. In this case, I suggest that the restriction can be an application specific restriction. ;) | A schema-blind application could care less... it just encodes the type | family information for round-tripping. Again, no need for central registry. | Enforcement is impossible anyway because it doesn't know the schema. Ok. So once again, the YAML tool library cannot enforce this restriction. Thus, I suggest that this restriction (and its enforcement) is an application specific thing and that there is no point in limiting YAML proper (or the YAML tool library) with the restriction. 
| > I do strongly believe that whatever level the pairing happens, | > it should be enforced (otherwise we will have lots of non-compliant | > YAML documents everywhere). | | What do you mean, "enforced"? I mean the following would have to all cause errors, no? <begin stream> --- - !perl/Foo::Bar [] - !perl/Foo::Bar {} </end stream> <begin stream> --- !perl/Foo::Bar [] --- !perl/Foo::Bar {} </end stream> <begin of time> <begin stream> --- !perl/Foo::Bar [] </end stream> ... hours, days, months later ... <begin stream> --- !perl/Foo::Bar {} </end stream> </end of time> The YAML toolset could catch a few of these cases... but not all of them, and not without a pretty extensive mechanism. | I still think this is a crucial - *the* crucial - point. | | > | > We must prevent | > | this: !date | > | year: 2002 | > | month: 4 | > | day: 12 | > | | > | > From being equal to: | > | | > | this: !date 2002-04-12 | > | > But they _are_ different according to the graph model, | > as the kind of "this" is different. | | Now this will be extremely confusing to a newbie, and hell on the veterans. | The whole point of using the same !date type family for both nodes is to say | "this is, at some level, the same thing". But under your rules, it *isn't* | the same thing. Why is this confusing? At a particular level they may be equivalent, but this "application" level is out-of-scope. Lots of things may also be equivalent and our low-level "graph" model can't possibly know about these things... so we shouldn't even try. However, the model we are talking about, our graph model, is where YPATH, YSCHEMA, and other YAML-specific tools will be defined. In this model, the above two "dates" are very different; obvious from the veteran to newbie alike. 
| It seems the only consistent way to grok the above, and Brian has implied it | in his last post, is to assume *four* models, not three: | | - syntax, as today; | - tree, as today; | - intermediate: map/list/string *only*, possibly with cycles etc., with each | node annotated by a type family; | - native: native data structures. Yes, this is close to my thinking; only that what you call the "native" data structure is application-level, and out-of-scope. Thus, the best model we can have is the intermediate (aka graph) model; the model upon which YPATH, YSCHEMA, and other "generic" YAML tools will be based. In this model, you appear not to have the restriction... you mention a node with a kind (map/list/string) and annotated with a type family. | Brian said this almost explicitly by saying: | > ... Why not just use the kind to determine the graph | [intermediate] | > model (which is what every newbie will expect) and | > use the family to transform the graph? | [into the native model] | | What I call "the graph model" - and Brian didn't name - are the blessed data | structures (the final native model). If I'm right in this interpretation it | explains why we haven't managed to get very far... same words, two different | concepts. Perhaps. Although, the decision all comes down to if the following example is legal YAML: --- - !perl/Foo::Bar [] - !perl/Foo::Bar {} | At any rate, in the four-model view, what you seem to be saying is that the | !date example above is different in the intermediate model, while being | identical in the native model. Likewise your "forward compatibility" use | case. Is this a correct understanding? I would say that the examples are _different_ in the graph(intermediate) model. And I'd further say that you have to go to the applicaiton model (or use YSCHEMA) to say anything further about their "identical" nature. 
In this particular example, the yaml.org application domain defines date to only be a scalar kind, and thus the second form is illegal. But it is illegal not at the YAML graph model, but at the application level where further restrictions can be imposed. | The thing is, I don't see the point in formalizing an intermediate model | between the final native data structures and the tree model. Any operation | that can be done at that level can be done at the graph/native level as | well. Hmm. I would say that you can't possibly know more about the "final native" model... and thus this is out-of-scope. The graph model is the best we can do with our limited scope. It is the model where our path, schema, and transform languages will be defined. | > I think you are saying that by not having family -> kind we would | > be vulnerable to round-trip "swaps". This is not the case, as the | > two above are _not_ equivalent since the kind is different and since | > kind is in the graph | [intermediate] | > model. | [but they are equal in the native model] I think it is up to the application to determine if they are equal in the "native" model. Frankly, if YPATH, YATL, YSCHEMA, and other generic YAML tools don't operate a the "native" model, then I don't care what happens at that level. If someone wants to round-trip (in what ever variety) they must preserve information at the YAML graph model. If two nodes are different in the YAML graph model model, but the same in the application, then "generic YAML tools" won't be able to help the application. It's actually pretty simple. | > Thus, YPATH need not return the same | > value for both since they are different; if it does (beacuse they | > both happen to be collections), great, but it need not. | [since YPATH can't be applied to the native model] Right. YPATH is only concerned with what is in the graph model, it can't know that two things may be equivalent in the application level... 
| Cost: one lousy additional argument to the registration method, *if you are | using one*, where in 100% of the cases its value is well known to the code's | author (since he's just written the loading method being registered). Cost: YAML tools which cannot determine if a given YAML document is valid since it lacks the application level knowledge to associate family to the allowed kind. Or, significant registration complexity, most likely requiring a central type registry. IMHO, a show-stopper. | Oh, and not being able to give two completely unrelated types the same URI | name (using kind to distinguish them), or to give the same type two | different "unique tuple names" (having two kinds of the same URI mean the | same native type). Talk about confusion. If an application does this, it is their fault; the best we can do is set a good example. | > 1. It reflects the API without any restrictions, the | > loader/application is passed both a kind and a family | > as distinct separate items (one via the structure of | > the events, and the other via a string) on a per node basis. | | The API can and should, where possible, be fixed to reflect the restrictions | we *must* place in order for YPATH and other YAML-level tools to be able to | function. Agreed. You have not shown how YPATH or YAML-level tools will be impaired by allowing different kinds for the same type family. | If we don't place these restrictions, all such tools would be at the mercy | of alternate representation of the "same, but not same" data such as the | date and pair examples above. Once again, this is an application level issue; nothing we can do about it at this level. If they want to shoot their foot off, I'm sure there are more efficient ways to do it. | > 2. It aligns directly with the syntax. Each node has both | > a kind and a family. The family isn't declared above with | > its acceptable kind, for example. | | You could make the same argument about the format. 
By this logic, nodes with | different formats should also be considered of different "types". Please | give me one argument that applies to distinguishing type by node kind that | doesn't directly apply when replacing the word "kind" with the word | "format". Hmm. Good point. Let me chew on this one. This is going to have lots of other implications besides this particular topic. | > 3. It allows the kind of a node to change over time | > for a given family (aka schema migration). | | Like I said, I'll have to sleep on this one. The above is only true if one | takes a 4-model view and accepts that the native model is beyond the scope | of YAML tools. I feel that in this case, "compatibility" is rather useless, | since any YAML-level tool (e.g. YPATH) would simply fail on these | "compatible representations". Right. I think the most important item to come out of this discussion is what the limitations of YAML's generic representation will be... | > 4. It moves the restrictions into the realm of schema | > language where, IMHO, it belongs. | | Restrictions are always in the realm of the schema language. But we have | restrictions on what the schema language may specify. For example, you can't | have a type with partial order on its members, or a set, etc. (not at the | YAML level, anyway). And you can't give the same name to two different | types! The whole point of a URI is that it is unique! Ok. Let's put this one in bold. The whole point of a URI is that it is a unique identifier for a given type. Hmm. | Well, this has been long, and educational (at least for me). I hope the | 4-model view vs. the 3-model view clarifies some things. Perhaps it would | even help to sway you towards my view :-) Yes. It has been educational. If we do allow multiple kinds for a given family then we definitely break that uniqueness. Hmm. | And whatever we decide we have an open issue with regard to the color | idiom... 
We could define the color idiom as an _insertion_ of a mapping into the ancestor chain: --- name: !http://clarkevans.com/name Clark Evans --- name: !http://clarkevans.com/name.v2 =: !http://clarkevans.com/name Clark Evans given: Clark family: Evans I'm not sure that I like it, but the restriction should not be a huge impediment to deploying the color idiom. Best, Clark |
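[Editorial sketch] The enforcement concern raised earlier in this message (a family used with two different kinds inside one stream, across documents, or across streams written months apart) can be made concrete with a small checker. The Python below is only an illustration: it assumes nodes arrive as hypothetical `(family, kind)` pairs, and `find_kind_conflicts` is an invented name. It catches the in-stream cases directly; the cross-stream case is caught only if the `seen` table is kept and passed back in, which is the kind of persistent registry the message calls a pretty extensive mechanism.

```python
def find_kind_conflicts(nodes, seen=None):
    """Report families that appear with more than one kind.

    nodes: iterable of (family, kind) pairs; family is None for plain nodes.
    seen:  optional dict of family -> first kind observed, carried over
           from earlier streams (an assumption; nothing in YAML stores this).
    Returns (conflicts, seen), where each conflict is
    (family, first_kind, conflicting_kind).
    """
    seen = {} if seen is None else seen
    conflicts = []
    for family, kind in nodes:
        if family is None:
            continue  # plain nodes carry no family to conflict on
        first = seen.setdefault(family, kind)
        if first != kind:
            conflicts.append((family, first, kind))
    return conflicts, seen


# One stream using the same family for a sequence and a map.
one_stream = [("perl/Foo::Bar", "sequence"), ("perl/Foo::Bar", "map")]
in_stream_conflicts, _ = find_kind_conflicts(one_stream)

# Two streams far apart in time: no conflict is visible unless state is shared.
early = [("perl/Foo::Bar", "sequence")]
late = [("perl/Foo::Bar", "map")]
isolated_conflicts, _ = find_kind_conflicts(late)
_, remembered = find_kind_conflicts(early)
shared_conflicts, _ = find_kind_conflicts(late, remembered)
```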
From: Brian I. <in...@tt...> - 2002-05-04 12:46:39
|
On 04/05/02 07:09 -0400, Oren Ben-Kiki wrote: > Whow. One turn one's back for a lousy 24 hours, and... :-) Better late than never. We really could have used you yesterday. > I just read through the dozens of messages with interest and the occasional > glazed eyes. Mercifully Clark has summerized a potential edict so I can > address that directly rather than having to collect paragraphs from this > discussion... I really would have liked to know your take on the specific issues we raised. Clark called me in desparation and after talking a bit I asked him to make the following suggestions, and he seemed to agree with them... > In essense I agree with it, but there are two (minor?) modifications I feel > must be made: > > | 1. Empty strings are written using "" or '' > > +1. OK. And let's disallow trailing comma. > > | 2. Empty maps and sequences use inline syntax {} and [] > > +1. > > | 3. We add distinction of map and sequence in the graph model. > > Just to make sure I get it straight - in the graph model, each type family > has a kind which is one of series (collection)/keyed (collection)/scalar? As > opposed to today's being a collection/scalar? Yes. But... When we talk about the graph model (which we should do too often :) can we just drop the distinction of collection (or branch or whatever) and just talk in terms of map and sequence and scalar as three independant entities. Lumping map and series together as collections just adds mysticism to YAML. All (scripting) languages refer to these as separate entities. Let's keep it simple and not add unnecessary terms. Specifically I would like to drop the terms qw(leaf branch keyed series collection) from our vocabulary when talking about the syntax and graph models. Which by the way are the only two models that exist so far. When we start addressing the tree model specifically, then it might make sense to add branch and leaf back in. > > If so, +1. > > | 4. We drop !map, !seq and !str and allow for empty transfer. 
> | If a format is provided, a non-empty family is required. > | If transfer is empty, an implementation must obviously > | use the kind (scalar,map,or sequence) to determine the > | in memory object to use. > > -10. > > I want to be able to have a unique explicit transfer URI for each and every > node, such that strcmp on the transfer would work. The above kills this > property, replacing it by some magical empty string which can mean any of > three different things. Ugh. I disagree with you here. I don't know how to express my thoughts tersely so I will follow up in a separate long-winded message. *After* breakfast. > > | 5. We change "lingo" to use map, sequence, and scalar > | for things as much as possible. This is possible since > | the !map, !seq, and !str transfers are goone. > > -1/+1. > > We have the following concepts: > > (A) The kind of the type family (today, collection/scalar); > > (B) The default type family used for each (today, seq/map/string); > > (C) The syntactical style of the node (today, series branch/keyed > branch/leaf). > > Now, given (3), we can, and should, unify (A) and (C). But unifying (A)/(C) > with (B) is wrong. Instead I suggest we should just unify "collection" with > "branch" and "scalar" with "leaf": > > (A)/(B): series (collection)/keyed (collection)/scalar > (C): seq/map/string > > (C) is the default transfer method for the appropriate node kinds. > > Why do we need this? Well, try writing the following otherwise: > > "All '!map' nodes are keyed collection nodes, but not all keyed collection > nodes are '!map' nodes. For example, a keyed collection node may be a > '!perl/Foo::Bar' node." > > Your way, the above can't be said - and it needs to be :-) Perhaps I'm dense, but I've read the above 5 times and can't make any real sense of it. Perhaps I just need breakfast. > > | 6. 
We drop sparse sequence syntax and leave sparse arrays > | to application specific domains, since we have no good > | way of preserving the "kind" of node (map or sequence) > | as required by the graph model. See you after breakfast... Cheers, Brian |
From: Clark C . E. <cc...@cl...> - 2002-05-04 16:28:42
|
On Sat, May 04, 2002 at 05:46:32AM -0700, Brian Ingerson wrote: | When we talk about the graph model (which we should do too often :) can we | just drop the distinction of collection (or branch or whatever) and just talk | in terms of map and sequence and scalar as three independent entities. | Lumping map and series together as collections just adds mysticism to YAML. | All (scripting) languages refer to these as separate entities. Let's keep it | simple and not add unnecessary terms. Well, we may have to talk about map and series as "branches" or "collections" so that properties common to both of them but not common to scalars, "leaf", can be explained. But other than that purpose, I agree with Brian that our primary kind list should just be ('scalar','map','sequence') Best, Clark |
From: Brian I. <in...@tt...> - 2002-05-04 18:04:47
|
On 04/05/02 12:34 -0400, Clark C . Evans wrote: > On Sat, May 04, 2002 at 05:46:32AM -0700, Brian Ingerson wrote: > | When we talk about the graph model (which we should do too often :) can we > | just drop the distinction of collection (or branch or whatever) and just talk > | in terms of map and sequence and scalar as three independent entities. > | Lumping map and series together as collections just adds mysticism to YAML. > | All (scripting) languages refer to these as separate entities. Let's keep it > | simple and not add unnecessary terms. > > Well, we may have to talk about map and series as "branches" or "collections" .........................................^sequence :) > so that properties common to both of them but not common to scalars, "leaf", > can be explained. But other than that purpose, I agree with Brian that > our primary kind list should just be ('scalar','map','sequence') Let's try not to use these terms. If it turns out that we must have a generic term in our writings, then I vote not to formalize the term. Formalizing it gives it a seemingly special meaning. We gotta keep this stuff simple. Nobody'll want to have to study YAML to use it. Cheers, Brian |
From: Brian I. <in...@tt...> - 2002-05-04 13:44:02
|
On 04/05/02 07:09 -0400, Oren Ben-Kiki wrote: > | 4. We drop !map, !seq and !str and allow for empty transfer. > | If a format is provided, a non-empty family is required. > | If transfer is empty, an implementation must obviously > | use the kind (scalar,map,or sequence) to determine the > | in memory object to use. > > -10. > > I want to be able to have a unique explicit transfer URI for each and every > node, such that strcmp on the transfer would work. The above kills this > property, replacing it by some magical empty string which can mean any of > three different things. Ugh. OK. I just had a bowl of cereal. I could really use some coffee... I don't know where to start on this one but I feel like I have a lot to say, so I'll just jump in. To me, YAML is just a serialization syntax. If I Dump something out I should be able to recreate it on a Load. This is dubbed NYNRT (native->YAML->native round tripping). Now YNYRT has never seemed possible, and it isn't, formatting wise. But Clark has convinced me that it is important to retain the *kind* of node when going YNY. *BUT* What I want to propose is that this isn't always the case. I say that we only uphold YNYKRT when there is no type family (explicit/implicit) transfer. That's as far as we go. We dictate that a *plain* YAML/map->Perl/dict->YAML/map->Python/dict->YAML/map round trips for YNYKRT. It's all about domains really. There are three separate categories of domains thus far: - YAML defined - Language defined - User defined The decision of how things YNY is within the scope of those domains. --- I am looking at this scenario from the perspective of writing a Loader. 
Here's the sequence of steps a loader performs: - Interacts with the parser to receive nodes - Builds each node up in memory as a hash, array, or scalar - When a given node is complete: - Look up the node's transfer URI in a dispatch table - Call the transfer-method for that URI - Take whatever the transfer-method returned and insert it into the graph Note that a transfer method might change a hash into an array or some completely opaque object. But that's ok, because the transfer was registered within a particular domain. I particularly don't want to call a transfer method for plain YAML nodes. I want to disallow that to preserve interoperability in the YAML domain. That's one reason why we decided that an empty transfer would be nice. I suppose it's not necessary though. We could stick with !map and !seq as transfers, it's just that the loader wouldn't do anything with them, and it would always be redundant for a user to specify them. That was my point for "why do we need them?". Can't you do a strcmp on an empty string? --- The key concept for me is the concept of domains, and that we only flex our YAML muscles within the yaml.com domain. The Perl team defines operation within the perl.yaml.com, and that Big Motors has free rein in the bm.com domain. --- I've said enough for now. I hope you can find some wisdom in the rambling. Cheers, Brian |
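[Editorial sketch] The loader steps listed in this message can be sketched in Python as follows. This is an illustration under assumptions, not YAML.pm's implementation: the `Node` class, the `load` function, and the dictionary used as a dispatch table are all hypothetical, map keys are simplified to plain strings, and raising an error for an unregistered family is a choice made here that the message does not specify. Plain nodes (empty transfer, modelled as `family=None`) never reach a transfer method, as the message asks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Node:
    """Hypothetical parsed node: a kind, its content, and an optional family."""
    kind: str                     # "scalar", "map", or "sequence"
    value: Any                    # str, dict of str -> Node, or list of Node
    family: Optional[str] = None  # None models the empty transfer


def load(node: Node, dispatch: Dict[str, Callable[[Any], Any]]) -> Any:
    # Build the node up in memory as a hash, array, or scalar.
    if node.kind == "map":
        built = {key: load(child, dispatch) for key, child in node.value.items()}
    elif node.kind == "sequence":
        built = [load(child, dispatch) for child in node.value]
    else:
        built = node.value

    # Plain YAML nodes: no transfer method is called at all.
    if node.family is None:
        return built

    # Look up the transfer URI in the dispatch table and call its method.
    transfer = dispatch.get(node.family)
    if transfer is None:
        raise LookupError(f"no transfer method registered for {node.family!r}")
    # Whatever the transfer method returns is what goes into the graph.
    return transfer(built)


# A transfer method may change the shape entirely (here: list -> tuple).
dispatch_table = {"perl/Foo::Bar": tuple}
tagged = Node("sequence", [Node("scalar", "a"), Node("scalar", "b")], "perl/Foo::Bar")
plain = Node("map", {"x": Node("scalar", "1")})
```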
From: Clark C . E. <cc...@cl...> - 2002-05-04 16:53:08
|
| To me, YAML is just a serialization syntax. If I Dump something out I should | be able to recreate it on a Load. This is dubbed NYNRT (native->YAML->native | round tripping). Now YNYRT has never seemed possible, and it isn't, | formatting wise. But Clark has convinced me that it is important to retain | the *kind* of node when going YNY. Yep. | What I want to propose is that this isn't always the case. I say that we only | uphold YNYKRT when there is no type family (explicit/implicit) transfer. That's too restrictive; see below. | It's all about domains really. There are three separate categories | of domains thus far: | | - YAML defined | - Language defined | - User defined | | The decision of how things YNY is within the scope of those domains. Exactly. And the graph model is only talking about the YAML defined domain. It talks about the image of appliation's data onto a graph model. A few permutations of the application data may not be considered a "change" from the application level; however, it may be a change at the YAML level. This is to be expected. The graph model does not define what an application must consider "changed", but it does define what YAML believes is a "change". The whole point of the model is so that an application knows what due care is expected when it intends to round-trip to other language bindings and/or applications. To be safe, keeping the data identical at the "graph model" level is a promise that all YAML processors have to behave identically. Let's look at it this way, given a stateless process, the graph model says that given two inputs Y, X this process should behave identically if both inputs are equal according to the graph model. Thus, if various syntax sugars are used, this should not affect the process's behavior (perhaps with the exception of an YAML editor...). 
The model does not say anything about the converse; that is, there may be 1000s of examples of a YAML input with different YAML representations that may be treated as equivalent. This is the application's choice. However, if the app is interacting with other "unknown" processes, the YAML graph model provides a guideline as to what it should preserve for the other processes to behave as if two texts are the "same". This is hard to articulate, and should be worked out further and put in the specification. BTW, defining the model like this is what XML did retroactively; and thus it had many processes using XML that would treat different things as changes; and this greatly hinders interoperability. | I am looking at this scenario from the perspective of writing a Loader. | Here's the sequence of steps a loader performs: | | - Interacts with the parser to recieve nodes | - Builds each node up in memory as a hash, array, or scalar | - When a given node is complete: | - Lookup the node's transfer URI in a dispatch table | - Call the transfer-method for that URI | - Take whatever the transfer-method returned and insert it into the graph | | Note that a transfer method might change a hash into an array or some | completely opaque object. But that's ok, because the transfer was registered | within a particular domain. Yes. This process is great. And at this point, it is the application's job to preserve the "YAML graph model" if it wishes to just parrot the information (or a subset of the information) without having other processs consider that the information was changed. So, if a transfer changes a hash into an array, and it gets saved as an array, then the graph model tells us that the information, from the perspective of YAML has changed. And other processes may treat this information differently. In essence, this "change", if it is not changed back by the emitter will result in the output stream being a "different" YAML structure. And that's OK. 
It just isn't a round-trip which preserved structure. | I particularly don't want to call a transfer method for plain YAML nodes. I | want to disallow that to preserve interoperability in the YAML domain. I really like this idea. If there isn't a transfer method, it is just plain-jane hashtable, array, and scalar (dict,list,string). | That's one reason why we decided that an empty transfer would be nice. I | suppose it's not necessary though. We could stick with !map and !seq as | transfers, it's just that the loader wouldn't do anything with them, and it | would always be redundant for a user to specify them. That was my point for | "why do we need them?". Can't you do a strcmp on an empty string? They seem redundant to me. I think Oren has not grokked that the category or "type" of a YAML node is a tuple (family,kind) and not just family as we were trying to do before... | The key concept for me is the concept of domains, and that we only flex our | YAML muscles within the yaml.com domain. The Perl team defines operation within | the perl.yaml.com, and that Big Motors has free rein in the bm.com domain. I like your concept; but I also think that the idea of YAML invariance can be extended to cover nodes having non-NULL families. Hopefully the above discussion helps clarify the need and purpose of the graph model. Boy, everyone in XML land knows that they need an information model, but the average person doesn't grok it. It is a bitching hard thing to explain, and leaving it to "experience" doesn't cut it. Hmm. Was the above explanation good? Any way I can fix it? Somehow it needs to be included in the specification so that the underlying purpose of the graph model is clear. | I've said enough for now. I hope you can find some wisdom in the rambling. Same here. I think we are on the same page, but we just got there from a different path. ;) Clark |
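[Editorial sketch] The notion of equality "according to the graph model" described in this message can be illustrated with a comparison that looks at kind, family, and content while ignoring syntax details. The Python below is a sketch under assumptions: `GraphNode` and `graph_equal` are hypothetical names, the `style` field stands in for any syntax sugar, map keys are plain strings with key order ignored, and scalars are compared as raw strings without any format canonicalisation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GraphNode:
    kind: str                     # "scalar", "map", or "sequence"
    value: Any                    # str, dict of str -> GraphNode, or list of GraphNode
    family: Optional[str] = None
    style: str = "block"          # syntax detail only, never compared


def graph_equal(a: GraphNode, b: GraphNode) -> bool:
    # Kind and family must both match; style is deliberately not consulted.
    if a.kind != b.kind or a.family != b.family:
        return False
    if a.kind == "scalar":
        return a.value == b.value
    if a.kind == "sequence":
        return len(a.value) == len(b.value) and all(
            graph_equal(x, y) for x, y in zip(a.value, b.value)
        )
    # Map: same key set, equal values under each key.
    return a.value.keys() == b.value.keys() and all(
        graph_equal(a.value[key], b.value[key]) for key in a.value
    )


# Same content written in two syntax styles: equal in the graph model.
inline_seq = GraphNode("sequence", [GraphNode("scalar", "zero")], style="inline")
block_seq = GraphNode("sequence", [GraphNode("scalar", "zero")], style="block")

# The two !date forms from the thread: same family, different kind.
date_scalar = GraphNode("scalar", "2002-04-12", "date")
date_map = GraphNode(
    "map",
    {
        "year": GraphNode("scalar", "2002"),
        "month": GraphNode("scalar", "4"),
        "day": GraphNode("scalar", "12"),
    },
    "date",
)
```

Under these assumptions the two sequences compare equal and the two dates do not, which matches the position that any further equivalence is for the application to decide.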
From: Clark C . E. <cc...@cl...> - 2002-05-04 16:24:37
|
On Sat, May 04, 2002 at 07:09:04AM -0400, Oren Ben-Kiki wrote: | | 4. We drop !map, !seq and !str and allow for empty transfer. | | I want to be able to have a unique explicit transfer URI for each and every | node, such that strcmp on the transfer would work. The above kills this | property, replacing it by some magical empty string which can mean any of | three different things. Ugh. It means that the unique classification of a node is defined by a tuple (kind,family) where kind is in ('scalar','map','sequence') and family is a uri string. Up till a few days ago, I had the funny idea that the classification of the node was exactly the node's family. This was incorrect since a scalar with a family of "!!yahoo" is a different node than a collection with a family of "!!yahoo". Now that we extend kind to replace collection with map and sequence it becomes clear to me that this illusion has to go away. --- !perl/Foo::Bar [] For example, you can't uniquely determine the above node's classification by just the family; the fact that it is a sequence is also needed. Thus, the classification of node above is ('sequence','perl/Foo::Bar'). Once you look at it this way, allowing the family to be NULL doesn't hurt things at all, in fact it probably makes them a bit cleaner. | Why do we need this? Well, try writing the following otherwise: | | "All '!map' nodes are keyed collection nodes, but not all keyed collection | nodes are '!map' nodes. For example, a keyed collection node may be a | '!perl/Foo::Bar' node." Hmm. The problem with the answer above is that a "perl/Foo::Bar" node does not uniquely label the node, I also need to know if it is a keyed, series, or scalar. Thus, since kind is orthogonal to family in this way; I see no harm in allowing family to be NULL. 
Further, at an implementation level, it is a pain to toss around a bunch of "!map", "!seq" and "!str" identifiers; I'd much rather have them NULL (and then use the node's kind) rather than trying to do a string comparison all of the time... In effect, think of NULL family as the "default" family for a given kind; then we apply other families on an exception basis... Best, Clark |
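[Editorial sketch] The (kind, family) classification argued for in this message, with a NULL family meaning "use the default for this kind", can be written down in a few lines of Python. This is an illustration only: `Classification`, `NATIVE_DEFAULTS`, and `default_native_type` are hypothetical names, and mapping the three kinds to Python's `str`, `dict`, and `list` is an assumption chosen for the example.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    """A node's classification: its kind plus an optional family URI."""
    kind: str                     # "scalar", "map", or "sequence"
    family: Optional[str] = None  # None plays the role of the NULL family


# Assumed native defaults for plain nodes, keyed by kind alone.
NATIVE_DEFAULTS = {"scalar": str, "map": dict, "sequence": list}


def default_native_type(classification: Classification):
    """Return the native type for a plain node, or None when a family is set.

    A non-None family means the decision belongs to that family's
    transfer method, so no default is offered here.
    """
    if classification.family is None:
        return NATIVE_DEFAULTS[classification.kind]
    return None


# The same family on two kinds yields two distinct classifications.
as_sequence = Classification("sequence", "perl/Foo::Bar")
as_map = Classification("map", "perl/Foo::Bar")
plain_map = Classification("map")
```

Because the classification is a tuple, comparing two nodes needs no string comparison against "!map", "!seq", or "!str"; plain nodes are distinguished by kind alone.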