From: Oren Ben-K. <or...@ri...> - 2002-08-05 15:50:47
Ned Konz [mailto:ne...@bi...] wrote:

> > I'm for option #2, adding "/" and "." to denote a string allowing
> > paths to be unquoted within YAML.
>
> You mean "Unix paths", don't you? I thought YAML was supposed to be
> cross-platform <g>...

So it is...

> After all, there's also Mac (HFS) and Windows/DOS paths that don't
> look like that.

Well, absolute Windows/DOS paths start with a drive letter, so they'll be strings by default anyway. Paths starting with a '\' are an issue, I suppose. To be fully cross-platform, all of '/' '\' and '.' should denote a string.

> Is it really so troublesome to type the quotes?

I don't know. Configuration files are a major use case after all...

OK, step back a bit. All non-ASCII Unicode characters are assumed to be "alpha" and denote a string (except for control characters, line breaks etc.). That leaves us the 128 ASCII characters, out of which several are control characters (invalid), space (separator), letters, '_' and digits (start a string), and the following:

    ! " # $ & ' ( ) * + , - . / : < = > ? @ [ \ ] ^ ` { | } ~

Out of these, YAML uses the following as indicators so they can't start a string:

    ! " & ' * > [ ] { } |

(Note that ':' '#' ',' *can* start a string if not followed by a space.) Also, ` was defined to mark "private implicit types", so that's out too. Likewise (word) is used for enum-like implicit types, and ~ is used for null, leaving us:

    # $ + , - . / : < = ? @ \ ^

Now, the question is: which of the above should be used to denote a string and which should be reserved for potential future global implicit types? The current answer is "all are reserved for potential future implicit types". However, suppose we ask the reverse question: which of the above are commonly used to start string values in the use cases we are aware of today? The answer to this seems to be:

    / . \ - +

The . and \ start paths; / starts paths in UNIX and command line options in DOS; - and + start command line options in UNIX.
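The elimination above can be replayed mechanically. The following is only a rough Python sketch of that bookkeeping, not anything taken from a YAML implementation: the names `PUNCTUATION`, `INDICATORS`, `OTHER_RESERVED`, `STRING_STARTERS` and `classify_first_char` are made up for illustration, the character lists are copied from this message as written, and the "not followed by a space" rule for ':' '#' ',' is deliberately ignored.

```python
# ASCII punctuation as enumerated in the message (copied as written).
PUNCTUATION = set("!\"#$&'()*+,-./:<=>?@[\\]^`{|}~")

# Indicators that, per the message, cannot start a string.
INDICATORS = set("!\"&'*>[]{}|")

# Also taken: ` (private implicit types), ( ) (enum-like implicit
# types) and ~ (null).
OTHER_RESERVED = set("`()~")

# Whatever is left is the set the message is debating.
CANDIDATES = PUNCTUATION - INDICATORS - OTHER_RESERVED

# Assumption: the proposed string starters, including '+' because the
# message says "- and + start command line options in UNIX".
STRING_STARTERS = set("/.\\-+")


def classify_first_char(ch: str) -> str:
    """Hypothetical helper: classify a value by its first character."""
    if ch in INDICATORS or ch in OTHER_RESERVED:
        return "indicator"
    if ch in STRING_STARTERS:
        return "string"
    if ch in CANDIDATES:
        return "reserved"  # kept for possible future implicit types
    if ch.isalnum() or ch == "_" or ord(ch) > 127:
        return "string"  # letters, digits, '_' and non-ASCII
    return "other"  # space, control characters, anything not modeled


if __name__ == "__main__":
    print(" ".join(sorted(CANDIDATES)))
    for sample in ["/usr/bin", ".profile", "\\temp", "-v", "$5.00", "!foo"]:
        print(sample, "->", classify_first_char(sample[0]))
```

Under this sketch, unquoted values such as /usr/bin or -v come out as strings, while a value starting with '$' or '@' stays reserved.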
Assuming we use these to denote strings, it leaves us only 8 controversial characters:

    # $ , : = ? @ ^

Now, can anyone think of a common use case where any of the above would be the first character of a string value? Off the top of my head, I can't think of any.

Alternatively, can anyone think of potential implicit types that may make use of these characters? Well, '$' seems a natural for currency. We could prefix a '#' to make binary an implicit type. '@' is a natural for all sorts of addresses (IP/domain/E-mail, maybe also URLs).

I don't know. But, given my crystal ball is broken, I'd rather play it safe and reserve all these 8 characters for possible future implicit types, unless someone points out a common use case requiring them.

Thoughts?

Have Fun,

    Oren Ben-Kiki