From: Oren Ben-K. <or...@ri...> - 2002-09-09 13:46:12
Clark C. Evans [mailto:cc...@cl...] wrote:

> Really, I don't see the need to indent top level scalars unless one line of the scalar starts with ---, and that's a rather rare case. I think the quoting of --- will solve the key issue.

There is the problem of allowing:

    --- |0
    foo
    bar

Currently '|0' is impossible in the productions. A minor point. At any rate, I don't feel strongly about indenting top-level scalars either way; it was Brian's pet. How about we discuss it with him?

> I'm not sure what the DWIM proposal is. I'll re-read the last post.

I re-posted it with notes. In a word:

> | - It is a very minor change to the info model section
> |   (making transfer method optional at the graph model)
>
> I'm not certain that this is needed (or even good)

That's why it is an open issue :-)

> From what I get from the DWIM proposal it means that there is no way (other than through external knowledge) to tell if a given node is supposed to be an integer or not. This removes the "self-describing" property of YAML and I think hinders the usefulness of types.

Tell me. Would you call the following document self-describing?

    --- #IMPLICIT:foo
    17-12 : 123//15
    ...

What does 'foo' mean? Which values are 'foo' - "17-12" or "123//15"? Or both? Just saying 'this document uses "foo"' isn't enough. You need to know what foo means. What does #IMPLICIT:foo buy you, exactly? Besides causing my YAML-pretty-print to choke on the above, because it doesn't know what 'foo' is (_not_ a good reason)?

> | So. Please show that this new syntax is absolutely necessary. By this I mean:
> |
> | - Please describe a scenario (program A does this, tool B does that,
> |   program C reads the result, and so on) where the DWIM proposal leads
> |   to trouble and the #IMPLICIT proposal does not.
>
> The #IMPLICIT proposal puts common type information into a YAML file in a way that is independent of application semantics.
> Thus, I could pop the file into a YAML Query, give it a query and it'd know how to load the file and operate on it.

Oh really? I could just pop the above document to your YQUERY tool and it would know how to compare 'foo's?

> With DWIM, the user would have to "register" or somehow tell the toolset which nodes have which types; this will involve N mechanisms and actually will probably lead to a schema to be really useful

Come on... We both know the YQUERY tool would have to be given executable code that handles 'foo' types first. Regardless of #IMPLICIT or anything else.

> (which is ok). I think a schema should be able to do this without the #IMPLICIT thingy. The problem is we don't have schema yet and probably won't for another 6 to 18 months.

So... You put a part of the schema in the document header line instead, "for the mean while"? That smells of a hack.

> Further, the default #IMPLICIT could be our current set of implicits... requiring #IMPLICIT:OFF to get to the new behavior where everything is a string.

Complicated. _Needlessly_ complicated. As your example above shows, each Y<whatever> tool needs to have some executable code to handle implicits (_and_ explicits), no matter what. Having #IMPLICIT doesn't change this by one iota. What is the gain?

> | - Alternatively, please show how this proposal limits the power of
> |   graph-level YAML tools...
>
> Right, and if the loader is responsible, then generic tools written against the SERIAL model cannot take advantage of type information. IMHO, this kinda sucks.

Pray tell me. How can a generic tool take advantage of type information unless it is augmented by executable code to handle the specific type? And if it does have such code, what is the big deal of it including regexp-based detection code?

> Also, I'm not sure what impact the typed vs not-typed flag will have in the graph model.

Simple. A change of a few words in the spec :-) As for implementation...
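To make the point about generic tools concrete, here is a minimal Python sketch. It is not part of the original exchange and not the API of any real YAML library: the `HANDLERS` registry, the `load_scalar` function, and the patterns are all hypothetical. It only illustrates that the same type-specific code is needed whether a type arrives as an explicit tag or is detected by a regexp.

```python
import re

# Hypothetical registry: type name -> (detection regexp, constructor).
# The names and patterns are illustrative assumptions, not from any YAML spec.
HANDLERS = {
    "int": (re.compile(r"\d+"), int),
}

def load_scalar(text, tag=None):
    """Return a native value for a scalar, given an optional explicit tag."""
    if tag is not None:
        # Explicit case (e.g. "!int 12"): type-specific code is still required.
        handler = HANDLERS.get(tag)
        if handler is None:
            raise KeyError(f"no handler registered for type {tag!r}")
        return handler[1](text)
    # Implicit case: the first regexp that matches the whole scalar wins.
    for pattern, construct in HANDLERS.values():
        if pattern.fullmatch(text):
            return construct(text)
    # Nothing matched: treat the scalar as a plain string.
    return text
```

In this sketch, `load_scalar("12")` and `load_scalar("12", tag="int")` go through the same constructor, while a declared but unregistered type such as `foo` cannot be processed at all until someone supplies code for it.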
> I'm ok with it in the syntax model. Handling NULL cases complicates things, and I'm not sure that the benefits outweigh the consequences.

I still don't see a single negative consequence.

> For example, YPATH would have to add a COALESCE (IF_NULL) function to allow types to be compared. I'm not saying that it's bad, just that I don't know the consequences/impacts and haven't had time to study them.

I'm not certain what you mean by "COALESCE". At any rate, sure, let's think the consequences through.

> | - YAML documents are very readable by humans.
> |
> |   The DWIM proposal is more readable than the #IMPLICIT one (no
> |   #IMPLICITs, no () around integers, dates, Booleans, etc.).
>
> Well, the () and #IMPLICIT are quite complementary, you probably don't need both in a single document.

Well, you'd need one of them, and both are ugly :-)

> | - YAML interacts well with scripting languages.
> |
> |   I think the #IMPLICIT proposal requires, or at least encourages, that
> |   YAML-specific types be used to represent implicit types while the DWIM
> |   proposal allows most cases to be represented as normal strings, which
> |   is the natural strategy of most scripting languages (Perl/TCL/Korn
> |   Shell/JavaScript/etc.). I could be wrong here.
>
> How am I going to save/restore an integer without jumping through a ton of hoops?

Like this:

    an integer: 12

The point I keep hammering on and which is somehow lost is that even if this was written

Like this:

    an integer: !int 12

The loader would _still_ have to have int-specific executable code to handle integers. No way around it. And in the implicit case, somewhere in the system there's a regexp saying that \d+ is an integer - again, no way around it. The only question is where to put this code. Well, IMVHO, one large pile of dung is better than two small piles of dung :-)

> -- learning how and registering the implicits with both the Perl and Python parser...
> each perhaps with slightly different ways to do it.

If you think that doing this in a C library that is callable from both Perl and Python is the right way to go - be my guest. Personally I think that relying on a single implementation of libyaml as a way of achieving consistency between all languages isn't the right way to go.

> Further, if my app is distributed, I'll have to somehow explain to people using my data files that X is an integer, but Y isn't a such-and-such. I think #IMPLICIT and () do this very nicely without having to have a third document.

How does #IMPLICIT explain anything? My above document uses 'foo'. Don't I need to explain to you what it means?

> Overall I think the DWIM mechanism to be really usable in anything other than a quick-one-off will have to migrate into a full-blown schema. I support this; but we still need something which is short-term.

I think that whoever is interested in a validation-schema has his work cut out for him regardless of #IMPLICIT. And anyone who isn't only suffers from #IMPLICIT. As for typing-schema and comparison-schema, these are inevitable with or without #IMPLICIT. In short I don't see where #IMPLICIT buys you anything. Consider my example of #IMPLICIT:foo above. Is this document valid? You have no way of knowing because you have no idea what 'foo' means.

> | - YAML uses host languages' native data structures.
> |
> |   I don't see a difference between the two proposals here.
>
> Well, in the DWIM case, it goes from regex->native;

And transfer method if available...

> but in the other case, it first goes through a type-uri.

Huh? Why? Transfer method is optional on output as well as input. It is perfectly valid to say that node X has no transfer method (i.e., has implicit transfer) when dumping it.

> Thus, I think that the intermediate type-uri is a very valuable step in that it provides a handle for us to talk about similar types across language boundaries.

No disagreement here.
I think it is good practice to assign a type-uri to all types, including implicit types; not only for the above reason, but also to be able to force the loader to interpret a given value in a given way. That does not imply that transfer method is mandatory, or that you need #IMPLICIT in order to support it.

> With DWIM, you don't need type-uri cuz it's all registered directly. This may seem simpler, but it hinders the ability to formalize abstract types and provide for consistent bindings across language boundaries.

OK, I'll go as far as _requiring_ anyone who uses an implicit type to also assign it a type-uri. Like I said, it is good practice anyway.

> | - YAML enables stream-based processing.
> |
> |   Same for both proposals.
>
> Not really. In the DWIM proposal, I can't have the string+type-uri combination to process with. And as such, a mode of processing (which would primarily be stream-based) would be curtailed.

Sigh. In DWIM you have the string+regexp combination to work with. Anything you can do with one you can do with the other, which is a big fat zero unless you have type-specific executable code loaded into your generic tool to handle the specific type/regexp.

> DWIM       Good for custom types which are application specific
> #IMPLICIT  Good for standard types with well-known language bindings

I don't see that it is better in that respect.

> Schema     Good for custom types (DWIM-ish) which are problem domain specific

I presume you are referring to a validation schema; it is good regardless of implicit and explicit types (i.e., using #IMPLICIT). I see no difference between requiring a node have some transfer method, requiring its value conform to a regexp, or limiting its value in any other way. All are validation constraints.

> ()         Good for simple one-offs using common well-known types, like (true) or (2002-01-01) without requiring #IMPLICIT header or DWIM registration.

Ugh.
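The claim that requiring a transfer method, requiring a regexp match, and any other limit on a value are all just validation constraints can be sketched as follows. This is an illustrative Python sketch only: the `Node` class, the constraint helpers, and `validate` are hypothetical names invented here and do not correspond to any actual YAML schema proposal.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Node:
    value: str
    tag: Optional[str] = None  # None stands for "no transfer method given"

# Every constraint has the same shape: a predicate over a node.
Constraint = Callable[[Node], bool]

def has_transfer_method(node: Node) -> bool:
    """Constraint: the node must carry an explicit transfer method."""
    return node.tag is not None

def matches(pattern: str) -> Constraint:
    """Constraint factory: the whole value must match the regexp."""
    compiled = re.compile(pattern)
    return lambda node: compiled.fullmatch(node.value) is not None

def max_length(limit: int) -> Constraint:
    """Constraint factory: an arbitrary further limit on the value."""
    return lambda node: len(node.value) <= limit

def validate(node: Node, constraints: Iterable[Constraint]) -> bool:
    """A node is valid when every constraint accepts it."""
    return all(check(node) for check in constraints)
```

Under these assumptions a validator treats all three kinds of rule uniformly, for example `validate(Node("12", "int"), [has_transfer_method, matches(r"\d+"), max_length(3)])`.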
> In our conference we were OK with Steve's registration system, this is similar to DWIM I think... right?

I have no idea. I don't see that DWIM _requires_ a central registration system. Each document has its semantics. If one chooses whether he wants to be "public" and only use the implicits we'll register in yaml.org - fine. If one wants to load all strings of the format [A-Z][0-9][0-9]-[A-Z] into his private "Product code" data type, that's also OK. The rest of the world will think it is a string so if he sends it to my YAML pretty printer, it won't choke on "unknown implicit !product-code".

> Only that the DWIM happens at the loader level (with the quoted flag available)

Quotes imply !str. It is a syntax shorthand; in the serial model it should behave "as if" a "!str" was given.

> leaving the model to still have a mandatory type... Steve uses String rather than having a NULL type.

In that case it isn't the DWIM proposal. The DWIM proposal acknowledges that there's a DWIM type family, separate from string.

> In short, by keeping the type mandatory, DWIM and IMPLICIT can co-exist (and is what we agreed to last night).

I don't see how... In your way a parser should reject a document that has undeclared implicits it does not know about. Right? I think that's a big flaw (e.g., YAML-pretty-print).

> Speaking of #IMPLICIT. We could call this #SCHEMA,

A whole different ball game. I'm willing to consider #SCHEMA, if and when we seriously discuss one. This is too big an issue to pick a small part of it and force it into the 1.0 spec. #IMPLICIT does just a small part of what a #SCHEMA would do. Either we do this right or not at all - I think that at this point in time, "not at all" is the right call.

Have fun,

    Oren Ben-Kiki