RE: [Yaml-core] Trying to make sense of things...

Clark C. Evans [mailto:cc...@cl...] wrote:
> Really, I don't see the 
> need to indent top level scalars unless one line of the 
> scalar starts with ---, and that's a rather rare case.  I 
> think the quoting of --- will solve the key issue.

There is the problem of allowing:

--- |0
 foo
bar

Currently '|0' is impossible in the productions. A minor point. At any rate,
I don't feel strongly about indenting top-level scalars either way; it was
Brian's pet. How about we discuss it with him?

> I'm not sure what the DWIM proposal is.  I'll re-read the last post. 

I re-posted it with notes. In a word:

> | - It is a very minor change to the info model section (making transfer
> | method optional at the graph model)
> 
> I'm not certain that this is needed (or even good)

That's why it is an open issue :-)

> From what I get from the DWIM proposal it means that there is 
> no way (other than through external knowledge) to tell if a given 
> node is supposed to be an integer or not.  This removes the 
> "self-describing" property of YAML and I think hinders the 
> usefulness of types.

Tell me. Would you call the following document self-describing?

--- #IMPLICIT:foo
17-12 : 123//15
...

What does 'foo' mean? Which values are 'foo' - "17-12" or "123//15"? Or
both?

Just saying 'this document uses "foo"' isn't enough. You need to know what
foo means. What does #IMPLICIT:foo buy you, exactly? Besides causing my
YAML-pretty-print to choke on the above, because it doesn't know what 'foo'
is (_not_ a good reason)?

> | So. Please show that this new syntax is absolutely necessary. By this
> | I mean:
> | 
> | - Please describe a scenario (program A does this, tool B does that,
> | program C reads the result, and so on) where the DWIM proposal leads
> | to trouble and the #IMPLICIT proposal does not.
> 
> The #IMPLICIT proposal puts common type information into a 
> YAML file in a way that is independent of application 
> semantics.  Thus, I could pop the file into a YAML Query, 
> give it a query and it'd know how to 
> load the file and operate on it.  

Oh really? I could just pop the above document to your YQUERY tool and it
would know how to compare 'foo's?

> With DWIM, the user would have to "register" or somehow tell the 
> toolset which nodes have which types; this will involve N 
> mechanisms and actually will probably lead to a schema to be 
> really useful

Come on... We both know the YQUERY tool would have to be given executable
code that handles 'foo' types first. Regardless of #IMPLICIT or anything
else.

> (which is ok.  I think a schema should be able to do this 
> without the #IMPLICIT thingy.  The problem is we don't have 
> schema yet and probably won't for another 6 to 18 months.

So... You put a part of the schema in the document header line instead, "for
the meantime"? That smells of a hack.

> Further, the default #IMPLICIT could be our current set of 
> implicits... requiring #IMPLICIT:OFF to get to the new 
> behavior where everything is a string.

Complicated. _Needlessly_ complicated. As your example above shows, each
Y<whatever> tool needs to have some executable code to handle implicits
(_and_ explicits), no matter what. Having #IMPLICIT doesn't change this by
one iota. What is the gain?

> | - Alternatively, please show how this proposal limits the power of 
> | graph-level YAML tools...
>
> Right, and if the loader is responsible, then generic tools 
> written against the SERIAL model cannot take advantage of 
> type information.  IMHO, this kinda sucks.

Pray tell me. How can a generic tool take advantage of type information
unless it is augmented by executable code to handle the specific type?

And if it does have such code, what is the big deal of it including
regexp-based detection code?

> Also, I'm not sure what impact the typed vs not-typed flag 
> will have in the graph model.

Simple. A change of a few words in the spec :-) As for implementation...

> I'm ok with it in the syntax 
> model. Handling NULL cases complicates things, and I'm not sure that
> the benefits outweigh the consequences.

I still don't see a single negative consequence.

> For example, YPATH
> would have to add a COALESCE (IF_NULL) function to allow 
> types to be compared.  I'm not saying that it's bad, just 
> that I don't know the consequences/impacts and haven't had 
> time to study them.

I'm not certain what you mean by "COALESCE". At any rate, sure, let's think
the consequences through.

> | - YAML documents are very readable by humans.
> | 
> | The DWIM proposal is more readable than the #IMPLICIT one (no
> | #IMPLICITs, no () around integers, dates, Booleans, etc.).
> 
> Well, the () and #IMPLICIT are quite complementary, you 
> probably don't need both in a single document.

Well, you'd need one of them, and both are ugly :-)

> | - YAML interacts well with scripting languages.
> | 
> | I think the #IMPLICIT proposal requires, or at least encourages, that
> | YAML-specific types be used to represent implicit types while the DWIM
> | proposal allows most cases to be represented as normal strings, which
> | is the natural strategy of most scripting languages (Perl/TCL/Korn
> | Shell/JavaScript/etc.). I could be wrong here.
> 
> How am I going to save/restore an integer without jumping 
> through a ton of hoops?

Like this:
  an integer: 12

The point I keep hammering on and which is somehow lost is that even if this
were written like this:
  an integer: !int 12

The loader would _still_ have to have int-specific executable code to handle
integers. No way around it. And in the implicit case, somewhere in the
system there's a regexp saying that \d+ is an integer - again, no way around
it. The only question is where to put this code. Well, IMVHO, one large pile
of dung is better than two small piles of dung :-)
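
To make this concrete, here is a minimal Python sketch. It is not anyone's
actual loader; the names load_scalar, construct_int and INT_RE are made up
for illustration, and "!int" is just the spelling used in the example above.
With or without the explicit transfer method, the same int-specific code
ends up running:

```python
import re

# Hypothetical implicit-detection rule: a run of digits is an integer.
INT_RE = re.compile(r"\d+")

def construct_int(text):
    """The int-specific code the loader needs in either case."""
    return int(text)

def load_scalar(text, tag=None):
    """Resolve one scalar, with or without an explicit transfer method."""
    if tag == "!int":
        # Explicit case: the document said "!int 12".
        return construct_int(text)
    if tag is None and INT_RE.fullmatch(text):
        # Implicit case: the regexp decides, then the same code runs.
        return construct_int(text)
    # Anything else stays a plain string in this sketch.
    return text
```

The only design question the sketch leaves open is where INT_RE lives, which
is exactly the question above.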

> -- learning how and registering the 
> implicits with both the Perl and Python parser... each 
> perhaps with slightly different ways to do it.

If you think that doing this in a C library that is callable from both Perl
and Python is the right way to go - be my guest. Personally I think that
relying on a single implementation of libyaml as a way of achieving
consistency between all languages isn't the right way to go.

> Further, if 
> my app is distributed, I'll have to somehow explain to people 
> using my data files that X is an integer, but Y isn't a 
> such-and-such. I think #IMPLICIT and () do this very nicely 
> without having to have a third document.

How does #IMPLICIT explain anything? My above document uses 'foo'. Don't I
need to explain to you what it means?

> Overall I think the DWIM mechanism, to be really usable in 
> anything other than a quick one-off, will have to migrate into 
> a full-blown schema.  I support this; but we still need 
> something which is short-term.

I think that whoever is interested in a validation-schema has his work cut
out for him regardless of #IMPLICIT. And anyone who doesn't only suffers
from #IMPLICIT. As for typing-schema and comparison-schema, these are
inevitable with or without #IMPLICIT. In short I don't see where #IMPLICIT
buys you anything. Consider my example of #IMPLICIT:foo above. Is this
document valid? You have no way of knowing because you have no idea what
'foo' means.

> | - YAML uses host languages' native data structures.
> | 
> | I don't see a difference between the two proposals here.
> 
> Well, in the DWIM case, it goes from regex->native;

And transfer method if available...

> but in
> the other case, it first goes through a type-uri.

Huh? Why? Transfer method is optional on output as well as input. It is
perfectly valid to say that node X has no transfer method (i.e., has
implicit transfer) when dumping it.

> Thus, I 
> think that the intermediate type-uri is a very valuable step 
> in that it provides a handle for us to talk about similar 
> types across language boundaries.

No disagreement here. I think it is good practice to assign a type-uri to
all types, including implicit types; not only for the above reason, but also
to be able to force the loader to interpret a given value in a given way.
That does not imply that transfer method is mandatory, or that you need
#IMPLICIT in order to support it.

> With DWIM, you don't need 
> type-uri cuz it's all registered directly.  This may seem 
> simpler, but it hinders the ability to formalize abstract 
> types and provide for consistent bindings across language boundaries.

OK, I'll go as far as _requiring_ anyone who uses an implicit type to also
assign it a type-uri. Like I said, it is good practice anyway.
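
A rough Python sketch of what such a requirement could look like follows.
Everything in it is hypothetical: the ImplicitRegistry class, its method
names, and the example.org URIs are invented for illustration and are not
part of any spec or implementation. Each implicit is registered under a
type-uri, detection by regexp is used when a node has no transfer method,
and an explicit type-uri forces a given interpretation:

```python
import re

# Hypothetical type-uris, for illustration only.
INT_URI = "http://example.org/types/int"
FLOAT_URI = "http://example.org/types/float"

class ImplicitRegistry:
    def __init__(self):
        self._rules = []         # (compiled regexp, type_uri), in registration order
        self._constructors = {}  # type_uri -> constructor

    def register(self, type_uri, pattern, constructor):
        """Register an implicit type; a type-uri is mandatory."""
        if not type_uri:
            raise ValueError("an implicit type must be assigned a type-uri")
        self._rules.append((re.compile(pattern), type_uri))
        self._constructors[type_uri] = constructor

    def load(self, text, type_uri=None):
        """Load one scalar; type_uri=None means no transfer method was given."""
        if type_uri is None:
            # Implicit transfer: the first matching regexp picks the type-uri.
            for regexp, uri in self._rules:
                if regexp.fullmatch(text):
                    type_uri = uri
                    break
        if type_uri is None or type_uri not in self._constructors:
            # Assumption of this sketch: unknown or undetected means plain string.
            return text
        return self._constructors[type_uri](text)
```

For example, after registering INT_URI with r"\d+" and FLOAT_URI with
r"\d+\.\d+", load("12") gives an integer, while load("12", FLOAT_URI)
forces the same text to be read as a float.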

> | - YAML enables stream-based processing.
> | 
> | Same for both proposals.
> 
> Not really.  In the DWIM proposal, I can't have the 
> string+type-uri combination to process with.  And as such, a 
> mode of processing (which would primarily be stream-based) 
> would be curtailed.

Sigh. In DWIM you have the string+regexp combination to work with. Anything
you can do with one you can do with the other, which is a big fat zero
unless you have type-specific executable code loaded into your generic tool
to handle the specific type/regexp.

> DWIM       Good for custom types which are application specific
> #IMPLICIT  Good for standard types with well-known language bindings

I don't see that it is any better in that respect.

> Schema     Good for custom types (DWIM-ish) which are problem
>            domain specific

I presume you are referring to a validation schema; it is good regardless of
implicit and explicit types (i.e., using #IMPLICIT). I see no difference
between requiring a node have some transfer method, requiring its value
conform to a regexp, or limiting its value in any other way. All are
validation constraints.

> ()         Good for simple one-offs using common well-known types,
>            like (true) or (2002-01-01) without requiring #IMPLICIT
>            header or DWIM registration.

Ugh.

> In our conference we were OK with Steve's registration system, this
> is similar to DWIM I think... right?

I have no idea. I don't see that DWIM _requires_ a central registration
system. Each document has its semantics. If one chooses to be "public" and
only use the implicits we'll register in yaml.org - fine. If
one wants to load all strings of the format [A-Z][0-9][0-9]-[A-Z] into his
private "Product code" data type, that's also OK. The rest of the world will
think it is a string so if he sends it to my YAML pretty printer, it won't
choke on "unknown implicit !product-code".
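
As a sketch of that scenario in Python (the ProductCode class and the
function names are invented for illustration; only the pattern comes from
the paragraph above): the owner's loader recognizes the private implicit,
while a generic tool simply sees a string and has nothing to reject.

```python
import re

# The private pattern from the paragraph above.
PRODUCT_CODE_RE = re.compile(r"[A-Z][0-9][0-9]-[A-Z]")

class ProductCode:
    """Hypothetical private data type used by one application only."""
    def __init__(self, text):
        self.text = text

def app_load(text):
    """The application's own loader knows the private implicit."""
    if PRODUCT_CODE_RE.fullmatch(text):
        return ProductCode(text)
    return text

def generic_load(text):
    """A generic tool has no such rule, so the value is just a string."""
    return text

def pretty_print(key, value):
    """Generic pretty printer; it never meets an unknown implicit."""
    return "%s: %s" % (key, value)
```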

> Only that the DWIM happens at
> the loader level (with the quoted flag available)

Quotes imply !str. It is a syntax shorthand; in the serial model it should
behave "as if" a "!str" was given.
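
In Python terms, roughly the following. The function name and the rule that
an explicit transfer method wins over quoting are assumptions of this
sketch, not something settled in the spec:

```python
def serial_tag(quoted, explicit_tag=None):
    """Return the transfer method a scalar carries in the serial model."""
    if explicit_tag is not None:
        # Assumption: an explicitly written transfer method takes precedence.
        return explicit_tag
    if quoted:
        # Quotes are shorthand: behave "as if" !str was given.
        return "!str"
    # No transfer method at all: left to implicit resolution.
    return None
```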

> leaving the 
> model to still have a mandatory type... steve uses String 
> rather than having a NULL type.

In that case it isn't the DWIM proposal. The DWIM proposal acknowledges that
there's a DWIM type family, separate from string.

> In short, by keeping the 
> type mandatory, DWIM and IMPLICIT can co-exist (and this is what 
> we agreed to last night).

I don't see how... In your way a parser should reject a document that has
undeclared implicits it does not know about. Right? I think that's a big
flaw (e.g., YAML-pretty-print).

> Speaking of #IMPLICIT.  We could call this #SCHEMA,

A whole different ball game. I'm willing to consider #SCHEMA, if and when we
seriously discuss one. This is too big an issue to pick a small part of it
and force it into the 1.0 spec. #IMPLICIT does just a small part of what a
#SCHEMA would do. Either we do this right or not at all - I think that at
this point in time, "not at all" is the right call.

Have fun,

	Oren Ben-Kiki