On Friday 06 August 2004 13:45, Tim Parkin wrote:
> ... eg if an Italian techy accountant wrote a yaml file
> it may have the following data.
>
> mynum1: 1,105
> mynum2: 5.500
>
> If an american accountant then used a default yaml parser to pick this
> up he would get one thousand one hundred and five and five point five.
Of course, probably so would the Italian YAML parser :-)
> The Italian actually meant one point one zero five and five thousand,
> five hundred. The italian would be confused because the default YAML
> parser wouldn't parse his document correctly.
Right.
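To make the ambiguity concrete, here is a minimal Python sketch of the two readings. The helper `parse_number` and the convention names "en" and "it" are hypothetical, invented for illustration; they are not part of any YAML library, and the sketch only handles plain decimal text.

```python
from decimal import Decimal

# Hypothetical conventions: (grouping separator, decimal separator).
CONVENTIONS = {
    "en": (",", "."),
    "it": (".", ","),
}

def parse_number(text, convention):
    """Interpret `text` under the given separator convention."""
    group, decimal = CONVENTIONS[convention]
    # Drop grouping separators, then normalize the decimal mark to '.'.
    normalized = text.replace(group, "").replace(decimal, ".")
    return Decimal(normalized)

for raw in ("1,105", "5.500"):
    print(raw, parse_number(raw, "en"), parse_number(raw, "it"))
```

The same two scalars come out as 1105 and 5.500 under the "en" convention, and as 1.105 and 5500 under the "it" one.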
> What that means is YAML is assuming a locale already which is what you
> are saying is a bad thing.
Yes - I guess you can say that the *yaml.org* types are based on the "C"
locale or something close to it. The "C" locale is, of course, based on
western conventions...
> I don't think I like the idea of YAML assuming that everyone uses
> american western number and date formats.
YAML, as such, does no such thing!
> I would much prefer that the
> default behaviour of YAML should be strings only.
This *IS* the default behavior. Numbers - *any* form of numbers - *including*
the western-specific numbers format given in the type repository - are
*add-ons* depending on the specific schema used.
The YAML type repository is just one possible way to implement just one
specific set of types. It is *not* the "default" as far as the YAML spec is
concerned.
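As a rough sketch of "strings by default, numbers as a schema add-on": the function `resolve` and the table `C_LIKE_SCHEMA` below are hypothetical names, and the two patterns are deliberately simplified. They are not the actual yaml.org type patterns, only an illustration of the idea.

```python
import re

def resolve(scalar, schema=()):
    """Return `scalar` typed by the first matching schema rule.

    With no schema (the default), every scalar stays a string.
    """
    for pattern, construct in schema:
        if pattern.fullmatch(scalar):
            return construct(scalar)
    return scalar

# Hypothetical, simplified "C locale"-style rules: (pattern, constructor).
C_LIKE_SCHEMA = (
    (re.compile(r"[-+]?[0-9]+"), int),
    (re.compile(r"[-+]?[0-9]+\.[0-9]+"), float),
)

print(repr(resolve("5.500")))                 # no schema: stays a string
print(repr(resolve("5.500", C_LIKE_SCHEMA)))  # schema applied: a float
```

Anything the schema does not recognize (such as "1,105" under these simplified rules) simply remains a string.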
As a matter of maximal interoperability between different applications, people
are _encouraged_ to use the specific (western based) number presentation
format given in the yaml.org types (as well as ISO based date formats and so
on). That's as far as it goes. The fact that most people (especially YAML
implementors) decided to make it the "default" for their system is in this
spirit: increasing the chances that a YAML file printed by a Java program
will be readable by a Python one, etc.
> Then if you want to
> add a schema that sets an american western implicit typing strategy then
> it should be obvious from the schema name that this is an american
> western schema.
That's a schema naming convention. You can do that:
> This should then enable the italian guy to write
>
> %YAML:1.1/italian
> mynum1: 1,105
> mynum2: 5.500
Actually, he'd probably write something like this:
!my.com,2004/schema#it
mynum: 1,105
...
And the american would write:
!my.com,2004/schema#en
mynum: 1.105
...
- '!' instead of '%' because '%' is for directives and '!' is for types
(schemas). What we are talking about here is giving a schema.
- '#' instead of '/' because YAML suggests that, by convention, '#<fragment>'
be used for giving a schema "variant" and different locales are "variants" of
the same schema.
- "it" instead of "italian" because there's a standard two-letter
locale/language name and it is best to stick with that.
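An application reading such documents could pick the locale out of the tag's '#<fragment>' part. A minimal sketch, assuming the tag arrives as a plain string; `split_schema_tag` is a hypothetical helper, not a function of any YAML implementation:

```python
def split_schema_tag(tag):
    """Split a tag like '!my.com,2004/schema#it' into (schema, variant).

    The variant is None when the tag carries no '#<fragment>' part.
    """
    if not tag.startswith("!"):
        raise ValueError("expected a tag starting with '!'")
    # Everything before the first '#' names the schema; the rest is the variant.
    name, _, variant = tag[1:].partition("#")
    return name, variant or None

print(split_schema_tag("!my.com,2004/schema#it"))
print(split_schema_tag("!my.com,2004/schema#en"))
```

Both tags yield the same schema name and differ only in the variant, which the application can then map to whatever number-parsing rules it likes.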
At any rate, YAML certainly does not prevent anyone from creating schemas like
the above. I suppose we could consider doing something along similar lines
for the yaml.org types... but this would require serious thought first. For
one thing, I'm unhappy with the thought that every YAML processor that wants
to support the yaml.org types also needs to have a library of locales just in
order to parse integers - Ugh! And the potential for mixups when cutting &
pasting YAML fragments is immense.
For dates, we have decided that the yaml.org format will be the simple,
unambiguous ISO format. For numbers, things aren't that easy... For booleans
it gets even worse. We initially wanted to allow '+' and '-' to present the
boolean values 'true' and 'false'. Alas, we can't use '-' as it is also used
to denote sequence entries, and we can't use '!' because it is used to denote
types. We finally settled on using 'y' and 'n', which is an English-ism, but
we really have no better way at this point.
(Anyone for using '=' for sequence entries instead of '-', thereby allowing
'-' to stand for 'false'? That's a major change in almost every YAML document
out there...)
Finally, as a Hebrew speaker, I'd like to point out that the number format is
the least of your worries when it comes to localizing your files :-)
Left-to-right and right-to-left issues are much worse. In my experience, it
is much easier to "localise" applications as long as you only consider the
European family of languages. When you hit middle-eastern and far-eastern
languages, things take on whole new dimensions.
Therefore, IMVHO, YAML shouldn't even try to be localized. After all, YAML
isn't a documentation format - it is a data serialization format! That said,
people are free to localize it to the extent they want to (using schema
variants as above), if they feel they have to.
Have fun,
Oren Ben-Kiki