On Fri, 2009-09-04 at 17:20 -0400, Brad R wrote:
> Yes, the %DEFAULTSCALAR directives were an idea for how to make the
> document prettier in the future. The immediate problem requires a new
> data type. The idea for the directive was in response to a comment
> that the tags would clutter the document, something I've encountered
> as well.
This is a real problem but there's no helping it at this point in time.
That said, I agree that for some specific YAML files there would be a
lot of !!utf8u tags, but if you only use them when actually needed,
across the universe of YAML files as a whole, I suspect this would
impact only a very small number of files.
> I also agree with you that it is a "blunt instrument" that would, in
> fact, only solve some problems. Your idea of using a schema language
> is much better, if more involved. In the meantime, applications can
> usually, if the processor (usually from a library) supports it, assume
> a schema when tags are not explicit. The problem with that, of course,
> is that a generic YAML application, such as an editor, would then not
> have all that information available and would have to treat the
> scalars as strings.
Well, you could say there are three types of YAML files in the world.
1. Those edited by notepad are less of a problem; the user, by
definition, needs to be aware of the "schema" (even if it is never
2. Those created by a specialized program, using the equivalent of
"printf" statements. Such programs embody the schema (as executable code
if nothing else) so again, there is no real problem.
3. Those created by a program calling "Yaml.Dump(something)". Here
things get tricky. Realistically, there should be some setup code where
the application informs the YAML library that some strings (or all
strings) should pass through some filter (which possibly looks at the
string content) to decide whether to emit them as !!utf8u instead of
!!str. This requires some generic library API. If this turns out to be a
common use case (which I personally doubt :-), the library API can
evolve to make this specific operation as easy as you can want.
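To make case 3 a bit more concrete, here is a rough Python sketch of the kind of setup code I have in mind. Everything in it is made up for illustration: `dump_mapping`, `needs_utf8u` and the "contains non-printable characters" rule are assumptions, not the API of any existing YAML library, and the scalar content is simply JSON-quoted rather than escaped the way a real !!utf8u scalar would be.

```python
import json
from typing import Callable, Dict

# Hypothetical hook type: given a string value, return the tag to emit.
TagFilter = Callable[[str], str]


def always_str(value: str) -> str:
    """Default filter: every string is emitted as a plain !!str."""
    return "!!str"


def needs_utf8u(value: str) -> str:
    """Example filter that looks at the string content.

    Assumption for this sketch only: a string containing non-printable
    characters is the kind of string that would want the !!utf8u tag.
    """
    return "!!str" if value.isprintable() else "!!utf8u"


def dump_mapping(data: Dict[str, str], tag_filter: TagFilter = always_str) -> str:
    """Toy dumper for a flat mapping of simple keys to strings.

    Values are written JSON-quoted (valid as YAML double-quoted scalars).
    The !!str tag is left implicit; any other tag is written explicitly.
    """
    lines = []
    for key, value in data.items():
        tag = tag_filter(value)
        prefix = "" if tag == "!!str" else tag + " "
        lines.append(f"{key}: {prefix}{json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    doc = {"name": "plain text", "blob": "a\x00b"}
    print(dump_mapping(doc, needs_utf8u), end="")
```

The point is only that the application registers the filter once, and the tag then appears solely on the scalars that need it, so the rest of the document stays uncluttered.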
BTW, this problem is not unique to !!utf8u. You face it with formatting
of numbers and dates, choice of scalar types, whether or not to sort
mapping keys, and other related issues. Breaking it down into the above
three cases helps zero in on the real issue, which is that YAML was
intended to be a human-editable format, and _automatically_ generating a
"pretty" YAML file is a non-trivial operation.
I'd love to see a powerful "YAML beautifier". Being anal about the data
model helps such a tool a *lot*. But writing such a tool still remains
quite a challenge.
> Is there currently a movement under way to define a schema language?
We wish :-( I'm trying to find the time to fix the errata in the current
spec and bring YamlReference up to par. Xitology still needs to validate
libyaml and we need to somehow get rid of syck (say, by turning libyaml
into a drop-in replacement by using wrapper code). Defining a schema
language is _very_ hairy, although we have some ideas on how to proceed
> ... there's simply no reason to ever use this tag for encoding
> Actually, having a URL inside one of these scalars would not be that
> strange. Say for instance that we had a YAML document that represented
> an e-mail message...
Ok, I take it back. I'd say there's _hardly_ ever a reason to use this
tag for encoding URLs. One should never make absolute statements! :-)
> Still, I'm also in favor of having the % sign always signal an escape
Ok then :-)