On Fri, Sep 4, 2009 at 5:53 AM, Oren Ben-Kiki <oren@ben-kiki.org> wrote:

As for %DEFAULT...SCALAR directives (suggested by BlueGM): Adding a new
standard tag is easy and has no effect on the spec. We just add it to
the tag repository and we are done. Adding a directive, on the other
hand, is a *huge* deal, and brings us to YAML 1.3 territory. So this is
not on the table for a "long while" at this point.

In addition, I think this directive is serious overkill. It is a blunt
instrument and I can see many problems with it. A much better approach
would be to work on a schema language for YAML that would allow one to
specify how tags are associated with nodes in a much more controlled and
fine-grained manner.

Yes, the %DEFAULTSCALAR directives were an idea for how to make the document prettier in the future. The immediate problem requires a new data type. The idea for the directive was in response to a comment that the tags would clutter the document, something I've encountered as well.

I also agree with you that it is a "blunt instrument" that would, in fact, only solve some problems. Your idea of using a schema language is much better, if more involved. In the meantime, an application can usually assume a schema when tags are not explicit, provided the processor (typically a library) supports it. The problem with that, of course, is that a generic YAML application, such as an editor, would not have that schema available and would have to treat the untagged scalars as strings.
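To make the difference concrete, here is a minimal sketch of how an untagged scalar might be resolved with and without an assumed schema. The names resolve_tag and ASSUMED_SCHEMA are hypothetical, and the patterns are a deliberately small illustration rather than any schema defined by the spec or by a particular library.

```python
import re

# Hypothetical schema: (tag, pattern) pairs an application might agree on.
# The patterns are illustrative only, not taken from any specification.
ASSUMED_SCHEMA = [
    ("tag:yaml.org,2002:null", re.compile(r"null|~")),
    ("tag:yaml.org,2002:bool", re.compile(r"true|false")),
    ("tag:yaml.org,2002:int", re.compile(r"[-+]?[0-9]+")),
]
STR_TAG = "tag:yaml.org,2002:str"


def resolve_tag(scalar, explicit_tag=None, schema=None):
    """Pick a tag for a plain scalar.

    An explicit tag always wins. Otherwise the first schema pattern that
    matches the whole scalar decides. With no schema (the generic editor
    case) every untagged scalar falls back to a string.
    """
    if explicit_tag is not None:
        return explicit_tag
    for tag, pattern in (schema or []):
        if pattern.fullmatch(scalar):
            return tag
    return STR_TAG


# An application that assumes the schema sees an integer...
print(resolve_tag("42", schema=ASSUMED_SCHEMA))
# ...while a generic tool without the schema sees only a string.
print(resolve_tag("42"))
```

The same document therefore means different things to the two programs, which is exactly the information gap a schema language would have to close.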

Is there currently a movement under way to define a schema language?

As far as '%20' goes: I strongly believe that '%20' should be a space
and not '%' '2' '0' inside such a !!utf-u scalar. The only argument
against it was mangling of URLs, and it fails on two counts.

First, there's simply no reason to ever use this tag for encoding URLs.
All the URLs (actually all the URIs) in the world can be easily
processed as normal YAML strings. They have their own (%nn) built-in
escape mechanism. { path: "file://foo%ff" } _already_ works, without
having to annotate it with !!utf-u. So why would you ever want to?

Second, even if you decided to pass a URL inside a !!utf-u tag (for some
strange reason), the fact that %20 would be preserved but %80 would be
mangled to %2580 is _extremely_ confusing. Requiring % to always be
escaped as %25 is a no-brainer; it is simple, consistent and follows the
rule of least surprise.
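
For illustration, the "always escape" rule could be sketched as follows. This is only a sketch under assumptions: the function names are hypothetical, the choice of which bytes the encoder escapes (control bytes, '%' and anything at or above 0x7F) is one possible policy, and nothing here is taken from an actual definition of the !!utf-u tag.

```python
import re

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")
_STRAY_PERCENT = re.compile(rb"%(?![0-9A-Fa-f]{2})")


def decode_always_escaped(text):
    """Decode a scalar in which every '%' must start a %nn escape."""
    raw = text.encode("utf-8")
    if _STRAY_PERCENT.search(raw):
        raise ValueError("stray '%': a literal percent must be written %25")
    # Each %nn becomes the single byte 0xnn; the result is not rescanned,
    # so "%2520" decodes to the three bytes "%20", not to a space.
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def encode_always_escaped(data):
    """Encode bytes so that decode_always_escaped() restores them exactly."""
    out = []
    for b in data:
        # Assumed policy: escape '%', control bytes and non-ASCII bytes.
        if b == 0x25 or b < 0x20 or b >= 0x7F:
            out.append("%%%02X" % b)
        else:
            out.append(chr(b))
    return "".join(out)


print(decode_always_escaped("%20"))      # a single space byte
print(encode_always_escaped(b"%20"))     # literal text "%20" becomes "%2520"
print(encode_always_escaped(b"%80"))     # literal text "%80" becomes "%2580"
```

Under this rule "%20" and "%80" are treated identically: both are escapes when they appear in the scalar, and both need their '%' written as %25 when the literal text is meant.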

Actually, having a URL inside one of these scalars would not be that strange. Say, for instance, that we had a YAML document that represented an e-mail message and the URL was part of the body of that message (user A sends a message to user B saying, "hey, you have to check out this site: http:\\www.my%20cool%20site.com" for example). The document may very well be using the new data type for its body because e-mail is expected to arrive in different encodings, yet (in English-speaking countries, at least) the body will almost always consist mostly of ASCII characters.

Still, I'm also in favor of having the % sign always signal an escape sequence. My own reason, stated more clearly than before, is that control characters could then be escaped when not using a double-quoted scalar (previously I only mentioned bytes below 0x80). I backed off because of the URLs, though. I'm not sure which is more valuable in general.