Ingy dot Net wrote:
On 26/04/07 18:53 -0400, Kenneth Downs wrote:
Ingy dot Net wrote:
On 17/04/07 17:40 -0400, Clark C. Evans wrote:

short answer:
 You should be able to quote those Y/N items to force them
 to be strings.

long answer:
 You might be able to configure your parser to not "implicitly
 type" Y/N as a boolean value.  

longest answer: >
 You've hit the #1 usability problem with YAML. It's called "implicit
 type resolution" and different implementations are doing it differently.
 The original goal was to make it easy to type in integers and have
 them show up as integers w/o littering your text with "!!int".
 Unfortunately, where that line should be drawn is a bit hard to say.

 In the next pass of YAML, I am going to recommend that all parsers
 _only_ do implicit typing on:
   (a) symbolic values, such as <<, which can be used to 
       augment the YAML syntax w/ very nice hooks
   (b) numbers, "true" "false" and "null", following
       the JSON standard (for compatibility)

 At least, this should, IMHO, be the default.  I think Ingy begs
 to differ and believes the default should be *all strings* with
 no implicit typing.  Whatever we come up with, getting there
 is sure to be unpleasant; but probably far less unpleasant than
 the current state of affairs.
I disagree but in specifics, not in spirit. The "Parser" should not do
implicit typing of any form. It reports, for each scalar, a char-string
value and whether the scalar was plain or not.

A yaml "Load" operation consists of at least 3 steps, "parse", "compose",
"construct". According to the spec, introduction of node "tag" (aka "type")
happens in the composer.
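
To make the three steps concrete, here is a minimal sketch in Python. It
is not taken from any real YAML library: the function names, the tag
names, and the restriction to flat "key: value" lines (no comments, no
nesting) are all assumptions made for illustration. The parser only
reports the string and whether it was plain; the composer is the only
place a tag is chosen, here using the narrow JSON-style rules (integers,
true, false, null) discussed above.

```python
import re

# Hypothetical three-stage loader for flat "key: value" lines only.
# None of these names come from a real YAML implementation.

INT_RE = re.compile(r"-?(0|[1-9][0-9]*)$")

def parse(text):
    """Parse: yield (key, string value, was_plain). No typing happens here."""
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        quoted = len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'"
        yield key.strip(), (raw[1:-1] if quoted else raw), not quoted

def compose(events):
    """Compose: attach a tag. Only plain scalars are implicitly typed."""
    for key, value, plain in events:
        tag = "str"
        if plain:
            if INT_RE.match(value):
                tag = "int"
            elif value in ("true", "false"):
                tag = "bool"
            elif value == "null":
                tag = "null"
        yield key, tag, value

# Construct: map each tag to a native value.
CONSTRUCTORS = {
    "str": str,
    "int": int,
    "bool": lambda v: v == "true",
    "null": lambda v: None,
}

def construct(nodes):
    return {key: CONSTRUCTORS[tag](value) for key, tag, value in nodes}

def load(text):
    return construct(compose(parse(text)))
```

With these rules a plain Y stays the string "Y", a plain 12 becomes an
integer, and a quoted "12" stays a string because the composer never
looks at non-plain scalars.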


Your point is that we got too cute with the default implicit types. String,
Integer and Number are fine as a default.

I would recommend that all implementations support an everything is string 
I would ask, what is simplest and most consistent.  And also, how did 
the conversation start?

The conversation started because these two elements (is that the right 
word?) give different results:

 prop_1st: value    # yields the string "value"
 prop_2nd: Y         # yields a numeric 1!  Newbie says huh??

What options are available?

1) Tweak default behavior.  Pro: Might satisfy this case.  Con: Might 
break older files, or require them to be version-stamped.  Con: Will 
just be an invitation to more tweaking and nobody will ever be happy.  
Con: will create a list of incompatible versions with incomprehensible 
variations (this is the end-result of taking this road).  Think HTML 3, 
html 3 for ie, html 4, html 4 for ie, html 4 for mozilla pre 6, mozilla 
6, etc etc etc.

2) Support for header directives for the possible options.  Possible 

none: follow behavior from before directives became available, so my Y 
above becomes a 1

"booltrue: Y, 1, Yes, YES", some kind of explicit list of values that 
will be treated as boolean true.  If my Y is not listed it won't be 
treated as a boolean.

boolfalse: same as booltrue, list of false values

date: xx-xx-xxxx,  anything that fits the picture is treated as a date.

numdigits: Treat any string composed only of numerals as a number

...others as they come to mind.  Those are off the top of my head, a 
real effort would have to be made to seek the list of directives that 
served all purposes without overlap or missing possibilities.
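
A rough sketch of how a processor might honor such directives, in
Python. The directive names (booltrue, boolfalse, numdigits) are the
ones floated above; the header syntax, the function names, and the
order in which the rules are tried are assumptions for illustration,
and the "none" and "date" directives are left out.

```python
def parse_directives(header_lines):
    """Turn lines like 'booltrue: Y, Yes' into a settings dict (hypothetical syntax)."""
    settings = {"booltrue": set(), "boolfalse": set(), "numdigits": False}
    for line in header_lines:
        name, _, rest = line.partition(":")
        name, rest = name.strip(), rest.strip()
        if name in ("booltrue", "boolfalse"):
            # Explicit, comma-separated list of spellings for this boolean.
            settings[name] = {item.strip() for item in rest.split(",") if item.strip()}
        elif name == "numdigits":
            settings["numdigits"] = True
    return settings

def resolve(scalar, settings):
    """Type a plain scalar using only what the directives asked for."""
    if scalar in settings["booltrue"]:
        return True
    if scalar in settings["boolfalse"]:
        return False
    if settings["numdigits"] and scalar.isascii() and scalar.isdigit():
        return int(scalar)
    # Anything not claimed by a directive stays a string.
    return scalar
```

With a header of "booltrue: Yes, YES" and "boolfalse: No", a bare Y is
not listed and so comes back as the string "Y". Note that the boolean
lists are checked before numdigits, so a file that lists 1 under
booltrue would get True rather than the integer 1; a real design would
have to settle that overlap explicitly.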

3) Declaring types for named properties. In the above example I would 
declare in the header that "prop_2nd" is a string.

4) Type-casting at the definition, which I believe is supported now with 

If Ken were calling the shots, option 1 would be thrown out, option 4 is 
already supported, so supporting options 2 and 3 would produce the 
general solution, and then it becomes a matter of programmer preference 
and then you wait for best practices to emerge through community use of 
the various approaches.

It's a little more involved than picking a typing solution. 

Actually my problem is a typing problem, and solving my own problem is exactly as involved as picking a typing solution.  The only situations it touches for other parties are typing situations.

YAML is intended
to be used in both closed systems and open. In situations with a single
producer and consumer, and in those with many. In single programming languages
and multi. etc.

And this is relevant how?  A general solution that allows a file to be self-describing solves all cases.  Different languages can make use of the typing directives as needed/able.  The typing method used by YAML cannot change the fundamental typing abilities of any given language, so the best way to handle lots of languages is to have the most flexible way to describe the data.

Actually, to round out the general solution, the processor itself might accept run-time parameters that override the directives inside of the file.
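
As a small sketch of that precedence (Python; the function name and the
settings keys are hypothetical), the run-time parameters would simply be
layered over whatever the file declares:

```python
def effective_settings(file_directives, runtime_overrides=None):
    """Return the file's directives with run-time parameters taking precedence."""
    merged = dict(file_directives)          # start from what the file says
    merged.update(runtime_overrides or {})  # run-time values win on conflict
    return merged
```

A file that sets numdigits on could then be loaded with numdigits forced
off by the caller, without editing the file.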

In small closed systems the YAML tool in question should be assumed to do the
right thing.

Except when it doesn't, and you end up squeezing the balloon and always watching it pop out somewhere else.  The problem is that one person's Right Thing is another person's Wrong Thing.  You can never assume except in trivial cases that code which is making assumptions will always make the right assumptions.

 If it doesn't this can easily be fixed by local code.

Yikes!  Pushing the problem to code!  A very strange approach in a data-serialization project.  I would expect more focus on the possibilities of data-driven configurations.

There are also times when a document should be considered appropriate for
the masses, and data typing must be perfect and enforced. So far we really
only have tags for this. But we have talked here for years about defining a
"Schema" language for yaml. And documents could be given a schema, perhaps in
a header directive...

So we need to do that. But that will take effort...

But the real question to answer and make clear for now is when to apply
implicit typing.

Given two parties using it, you'll get two opinions.  Three parties, three opinions.  Good luck.

Kenneth Downs
Secure Data Software, Inc.
631-379-7200   Fax: 631-689-0527