From: Clark C . E. <cc...@cl...> - 2002-08-09 14:44:05
Ok. To summarize, the ability to use paths "unquoted" is desired for
two fairly large use cases: configuration files and ypath-oriented stuff
like query, transform, or schemas. I'm very strongly behind this feature
request since I think it aligns nicely with our first goal of making
YAML very readable (sometimes at a cost of complexity for the implementer).

I'm not quoting people's exact wording, but I tried to
represent all of the issues which have emerged.

There are two options for allowing paths unquoted:

  (a) make "path" a new type, starting implicitly with
      '/', '.', or '\' ; this type would have to be
      quite flexible as it could be a unix path or a
      ypath expression...

  (b) add path to the regex for strings

I'll go with either approach, although I think string is
a much better choice. There have been several objections
levied against this proposal (b), among them:

  -
    issue: complicated type recognition system
    response: >
      The most complicated part of the type system is
      that things starting with numbers that don't look like
      other types are strings. I was moderately opposed to
      this, but let it pass since a few people didn't want to
      have to quote IP addresses. Street addresses were also a
      "use case", but most street addresses are multi-line
      thingys and better handled by block form.

      Anyway, I let it pass to make YAML more "readable" even
      though it significantly complicates implementations and
      may even cause compatibility problems. 2000-01 is a string
      while 2002-01-01 is a date; 34.0 is a float while 34.0.0
      is a string. Not exactly "intuitive", and I think anyone
      who lobbied so strongly for this feature has no ground to
      stand on with this argument of "complicated".
  -
    issue: no precedent
    response: >
      It's been argued that \w, or [A-Za-z_0-9], is a very common
      word starter in most languages. C, Python, etc. have been
      named.

      Certainly, however, in those languages any word beginning
      with 0-9 is numeric, and if it doesn't match one of the
      numeric regular expressions it is an error. Thus, IMHO,
      there is no precedent for non-numeric values starting with
      a digit. Once again, those who lobbied for the layered
      regular expression rule have no ground to stand on here.
      We're already breaking precedent, and the layered regex rule
      isn't exactly obvious and doesn't have precedent anyway; thus
      arguing that we are going from something that has precedent
      to something that doesn't isn't all that right.
  -
    issue: paths are too complicated, can start with $ and other chars.
    response: >
      99.9% of all unix paths start with / or . Most DOS paths begin
      with either . \ or a drive letter. So, I think that /.\
      covers most cases, good enough!
  -
    issue: backwards compatibility
    response: >
      Currently, the only time / is allowed is for our special comment
      key, and this is thus far only a lightly used feature. Therefore,
      I think that being able to leave paths unquoted trumps // special
      comment keys.

      On this topic, see the related idea of using (#comment) for comment
      keys so that it leverages our special "parenthesized" form mechanism.
      I think that this is probably the best answer as to what to do with
      comment keys if this is adopted.
  -
    issue: changing the spec too much or on a whim
    response: >
      Yes, I understand the pain, as I have a lot of YAML data myself.
      However, if the impact is minimal, I don't see why not make a change
      such as this. Overall, the spec is now going through a serious
      "early adopter" phase where people using the system get a chance to
      comment. It's quite interesting that one of the very first
      comments we get is that requiring paths to be quoted is unnecessary.
  -
    issue: why not go further?
    response: >
      Well, I'd like to keep other characters reserved till there is
      a demonstrated need. This gives us the ability to alter the system
      in a backwards compatible way in the future. Un-reserving all of
      the other characters prevents us from making moves in the future.

      As for why / . \ are special, this much should be clear: they
      have common and non-trivial use cases.

I hope this addresses the proposal. In summary, I'd like to add /.\
to the string regular expression; for the comment special key I suggest
we use (#comment) and for consistency, we can use (=) for the migration
special key, but alas this is a somewhat-separable issue.

Best,

Clark
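For readers following the "layered regex" argument, here is a minimal Python sketch of how such implicit typing might behave. The patterns and the names `implicit_type`, `DATE`, `FLOAT`, `INT` and `STRING` are illustrative assumptions, not the actual YAML specification regexes; the string rule simply adds the three proposed starters / . \ to the word-character starters discussed in the message.

```python
import re

# Illustrative patterns only; the real implicit-type regexes are more detailed.
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
FLOAT = re.compile(r"^[-+]?\d+\.\d+$")
INT = re.compile(r"^[-+]?\d+$")
# Hypothetical string rule: word-character starters plus the proposed / . \
STRING = re.compile(r"^[A-Za-z_0-9/.\\]")


def implicit_type(scalar: str) -> str:
    """Try the specific types first, then fall back to the string rule."""
    for name, pattern in (("date", DATE), ("float", FLOAT), ("int", INT)):
        if pattern.match(scalar):
            return name
    if STRING.match(scalar):
        return "str"
    # Anything else starts with a character that is still reserved.
    return "reserved"


if __name__ == "__main__":
    for value in ["2000-01", "2002-01-01", "34.0", "34.0.0",
                  "/usr/local/bin", "./conf", "$HOME"]:
        print(f"{value!r}: {implicit_type(value)}")
```

Under these assumed patterns, a value that starts with a digit but fails every numeric pattern falls through to the string rule, which is the behaviour the message describes for 2000-01 and 34.0.0.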
From: Brian I. <in...@tt...> - 2002-08-09 19:29:42
On 09/08/02 10:49 -0400, Clark C . Evans wrote:
> I hope this addresses the proposal. In summary, I'd like to add /.\
> to the string regular expression; for the comment special key I suggest
> we use (#comment) and for consistency, we can use (=) for the migration
> special key, but alas this is a somewhat-separable issue.

I think it adequately addresses *your* thoughts. Perhaps you could have just
posted our thoughts rather than a full-blown editorial.

Anyway, here is my current position. And, as usual, I'll keep it short.

I'm not willing to extend the string regex for this one minor use case du
jour. We spent over a month hammering out the current rules, and they are
good.

I *am* willing to open up the string regex to more characters, but let's just
go for the whole hog right now. How about we just reserve '(' for future
implicits, and allow all other non-ambiguous characters to start a string.

Most notably, I can see '$' as becoming a use case for string. And I'm not
talking about a yaml/currency type.

Cheers, Brian
From: Clark C . E. <cc...@cl...> - 2002-08-10 01:09:31
On Fri, Aug 09, 2002 at 12:29:35PM -0700, Brian Ingerson wrote:
| I'm not willing to extend the string regex for this one minor use case du
| jour. We spent over a month hammering out the current rules, and they are
| good.

And the current rules lack a very common use case... paths. Thus far
we've had a nice feedback loop between use and specification -- in
each change the severity of the change has decreased...

| I *am* willing to open up the string regex to more characters, but let's just
| go for the whole hog right now. How about we just reserve '(' for future
| implicits, and allow all other non-ambiguous characters to start a string.

So, this would expand the string regex to allow starting with...

    `@%$^/\.;

I think it's overkill, but I'm game as long as currency is added.

| Most notably, I can see '$' as becoming a use case for string. And I'm not
| talking about a yaml/currency type.

I think that path is a completely distinct issue from currency.
Currency is very important for serious adoption in the business
world. Having !currency everywhere would suck. And currency is
a fixed point number, not a floating point; thus the current
!float type fails to model the actual type behavior.

The regex would be

    \$\w+\s\d+(,\d+)*(\.\d+)?

anything else would be a string, just like floating point numbers.
Examples:

    - $JPY 3.45
    - $USD 345,394.00
    - $EUR 34303
    - !currency $USD 323.39

Where the abbreviations come from ISO 4217.

Best,

Clark
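As a quick check of the proposed currency pattern, the following Python sketch anchors it at both ends and runs it over the examples from the message. The pattern is written here with the digit group after the comma as `\d+` and the decimal point escaped as `\.`; the helper name `is_currency` and the negative examples are assumptions added for illustration.

```python
import re

# '$', a currency code, one whitespace character, digits with optional
# comma-separated groups, and an optional fractional part.
CURRENCY = re.compile(r"^\$\w+\s\d+(,\d+)*(\.\d+)?$")


def is_currency(scalar: str) -> bool:
    """Return True when the whole scalar matches the proposed currency form."""
    return CURRENCY.match(scalar) is not None


if __name__ == "__main__":
    for value in ["$JPY 3.45", "$USD 345,394.00", "$EUR 34303",
                  "$HOME", "$USD abc"]:
        kind = "currency" if is_currency(value) else "string"
        print(f"{value!r}: {kind}")
```

Note that the pattern does not validate the code against ISO 4217; any run of word characters after the `$` is accepted.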
From: Clark C . E. <cc...@cl...> - 2002-08-10 01:26:27
On Fri, Aug 09, 2002 at 09:15:11PM -0400, Clark C . Evans wrote:
| | I *am* willing to open up the string regex to more characters, but let's
| | just go for the whole hog right now. How about we just reserve '('
| | for future implicits, and allow all other non-ambiguous characters
| | to start a string.
|
| So, this would expand the string regex to allow starting with...
|
|     `@%$^/\.;
|
| I think it's overkill, but I'm game as long as currency is added.

Sorry to bring in the currency issue; it seems that Brian is
trying to eliminate the possibility of a non-parenthesized
currency implicit. I'd like to keep this issue separate as
we aren't adding a new implicit -- I'm not proposing a "path"
implicit type here. We could thus expand the regex to

    `@%^\./;

There are a few others, -+~= , that we could also allow; but
given current implicits, it may be best to leave these out.

...

| | Most notably, I can see '$' as becoming a use case for string. And I'm not
| | talking about a yaml/currency type.

While on the topic of currency implicit; there is another option
not using the dollar sign. That is, it would be similar to float,
only requiring a single word after. This would, actually, be the
most ideal. Examples:

    - 3.45 JPY
    - 245,394.00 USD
    - 34303 EUR

Once again, this isn't and can't be modeled by a float plus
a currency label since currency should be fixed point. This
would be the cleanest yet.

Clark
From: Neil W. <neilw@ActiveState.com> - 2002-08-10 02:13:22
Clark C . Evans [09/08/02 21:32 -0400]:
> While on the topic of currency implicit; there is another option
> not using the dollar sign. That is, it would be similar to float,
> only requiring a single word after. This would, actually, be the
> most ideal. Examples:
>
>     - 3.45 JPY
>     - 245,394.00 USD
>     - 34303 EUR

      - 17000 MPH    # circumferential speed of orbiting satellites
      - 250 KPH      # my top speed
      - 2000 sheep   # per capita in new zealand

> Once again, this isn't and can't be modeled by a float plus
> a currency label since currency should be fixed point. This
> would be the cleanest yet.

I think it sucks. Either the parser knows all the world's currencies, or the
above are loaded as currencies in nonexistent countries.

The '$' is better than the above. But, if you mean to encompass all the
world's currencies, '$' is very North American.

It may be better to just bite the bullet and say whatcha mean:

    - !currency|euro 17000

Later,
Neil
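The objection can be made concrete with a small Python sketch. The pattern `SUFFIX_FORM`, the function `classify`, and the three-code set `ISO_4217_SAMPLE` are hypothetical names chosen for illustration; a real parser would need the full ISO 4217 list to tell currencies from other units, which is exactly the burden being pointed out.

```python
import re

# "number, one whitespace character, single word": the suffix form.
SUFFIX_FORM = re.compile(r"^(\d+(?:,\d+)*(?:\.\d+)?)\s(\w+)$")

# Tiny illustrative subset; not the complete ISO 4217 code list.
ISO_4217_SAMPLE = {"JPY", "USD", "EUR"}


def classify(scalar: str) -> str:
    """Classify a scalar under the suffix-form rule."""
    match = SUFFIX_FORM.match(scalar)
    if match is None:
        return "str"
    # Without a code table the parser cannot reject MPH or sheep.
    return "currency" if match.group(2) in ISO_4217_SAMPLE else "unknown unit"


if __name__ == "__main__":
    for value in ["3.45 JPY", "245,394.00 USD", "17000 MPH", "2000 sheep"]:
        print(f"{value!r}: {classify(value)}")
```

Dropping the table lookup would make every one of these values load as a currency, including the sheep.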
From: Clark C . E. <cc...@cl...> - 2002-08-10 15:22:46
On Fri, Aug 09, 2002 at 07:15:17PM -0700, Neil Watkiss wrote:
| >     - 3.45 JPY
| >     - 245,394.00 USD
| >     - 34303 EUR
|
|       - 17000 MPH    # circumferential speed of orbiting satellites
|       - 250 KPH      # my top speed
|       - 2000 sheep   # per capita in new zealand

Yes, I thought of these. I wasn't really considering that
the idea above had wings.

| The '$' is better than the above. But, if you mean to encompass all the
| world's currencies, '$' is very North American.

Yes; $ is americentric, although it is the only ASCII marker
for denomination. And if it is followed by the ISO code it is
quite clear, $JPY, for example.

| It may be better to just bite the bullet and say whatcha mean:
|
|     - !currency|euro 17000

Quite ugly, esp. for a balance sheet filled with numbers, etc.

Best,

Clark
From: Clark C . E. <cc...@cl...> - 2002-08-10 15:28:04
This is getting off topic. I'm not proposing currency,
useful as it may be for me and my associates.

How about a compromise:

  1. We only add '/' to the string regular expression.
     98% of all paths on unix are absolute and thus the
     above will work. For DOS, absolute paths start with
     a drive letter and thus are already strings.

  2. We agree in advance that if any more characters are
     added to the string regular expression, we will throw
     in all of the characters which are not indicators,
     including $. This should alleviate any concern about
     one-offs.

Does this work?

Clark
From: Clark C . E. <cc...@cl...> - 2002-08-10 23:41:08
Brian,

I'd like to just add '/' to the string regular expression
and keep all other special characters reserved for two reasons:

  - It gives us flexibility to change stuff in YAML: 2.0
    (no more changes like this for 1.0)

  - Having other "special" characters may confuse people
    as to what is an indicator and what isn't. Given that
    # ! * & all mean stuff, it wouldn't be too hard to assume
    that someone may puzzle if $ % ^ @ mean something special
    as well. We can justify / and only / as a 95% usage
    rule, but I'd like to keep all the others out.

As for "currency", after some soul searching there are three
aspects... and none of them should be "implicit":

  - Support for fixed point mathematics; in general you don't
    want to use floating point for currency manipulations but
    instead want to use integers with a very careful watch for
    rounding/truncation to make sure that you are compliant
    with any laws that apply. This is out of YAML's scope, but
    may be in the scope of a !fixed type which is not part
    of the core specification.

  - Support for units. I was actually musing that if we hadn't
    allowed for things that start with a number but aren't a
    timestamp, float, or integer to be strings, they could have
    been a "unit", aka "45 inch" or "3 foot 2 inch". Anyway,
    it's too late now, but it was an interesting muse.

  - Support for currency. This is a mix of fixed point plus
    units. Since neither of the above is supported in most
    languages, it just doesn't make sense to try to bring this
    into the core of YAML.

So, I'm making the above proposal without any hint, suggestion,
or underlying motive that $ be reserved for currency later. I'd
just like to exclude all of the other characters for now since
I think that they would be confusing and since I don't see a use
case for them.

If you wish, we can add both '/' and '$' to the string regex, but
I would still like to restrict the other potential indicators for
now for the two reasons cited above.

Best,

Clark
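The fixed point remark can be illustrated with a short Python sketch using the standard library `decimal` module. The helper name `to_cents` and the choice of `ROUND_HALF_UP` are assumptions made for the example; the rounding rule that actually applies depends on the jurisdiction and is not something the thread specifies.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
float_total = 0.1 + 0.2

# Decimal arithmetic on string inputs keeps the exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")


def to_cents(amount: str) -> int:
    """Convert a plain decimal string to integer cents with an explicit rounding rule."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


if __name__ == "__main__":
    print(float_total == 0.3)                # False
    print(decimal_total == Decimal("0.30"))  # True
    print(to_cents("323.39"))                # 32339
```

Keeping amounts as integer cents, as the message suggests, makes every rounding step explicit instead of leaving it to the float representation.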
From: Clark C . E. <cc...@cl...> - 2002-08-11 04:17:15
I had a talk with Brian for a short spell:

  - |
    He is ok with the new type family (!) styles:

      !!private
      !yaml-specific
      $domain,year/whatever
      $language/whatever
  - >
    I assume he wants to keep !/whatever reserved for
    now, pending a discussion of the #DOMAIN proposal.
  - >
    He is ok with just adding '/' to the string regular
    expression, keeping other characters reserved. This
    is primarily justified by the ypath use case and not
    by a unix path use case, although unix paths would
    be usable unquoted.
  - >
    Since this breaks the //comment special key, he suggested
    that perhaps the # could be used, since it is not immediately
    followed by a space. This works for me:

      ---
      #: one comment special key
      #more: Another comment special key
  - >
    He's ok with keeping things reserved given the two
    reasons below (flexibility and simplicity). He's not
    in favor of adding any more implicit types.
  - >
    Brian brought up the topic of how URIs are handled:
    does a parser report the tag:uri or not? I answered no,
    it returns exactly what is in the YAML file, as these
    strings themselves should be unique. One restriction is
    needed, so that yaml.org,2002 is not used for domain,year,
    which is easy since we control yaml.org ;)

This leaves Brian's big question:

  - Do we need special keys, and if so, how can we
    clarify the specification so that they are used properly?

Overall: >
  I hope this accurately reflects things. I think it should
  be OK to move ahead with the spec changes, perhaps marking
  the comment special key as "subject to change" and changing
  it from // to # in the short run. That said, I promised to
  review the special key mechanism over the next week or so,
  questioning if special keys are necessary and articulating
  the value they provide (or removing them).

Best,

Clark

On Sat, Aug 10, 2002 at 07:46:49PM -0400, Clark C . Evans wrote:
|
| I'd like to just add '/' to the string regular expression
| and keep all other special characters reserved for two reasons:
|
|   - It gives us flexibility to change stuff in YAML: 2.0
|     (no more changes like this for 1.0)
|
|   - Having other "special" characters may confuse people
|     as to what is an indicator and what isn't. Given that
|     # ! * & all mean stuff, it wouldn't be too hard to assume
|     that someone may puzzle if $ % ^ @ mean something special
|     as well. We can justify / and only / as a 95% usage
|     rule, but I'd like to keep all the others out.
|
| As for "currency", after some soul searching there are three
| aspects... and none of them should be "implicit":
|
|   - Support for fixed point mathematics; in general you don't
|     want to use floating point for currency manipulations but
|     instead want to use integers with a very careful watch for
|     rounding/truncation to make sure that you are compliant
|     with any laws that apply. This is out of YAML's scope, but
|     may be in the scope of a !fixed type which is not part
|     of the core specification.
|
|   - Support for units. I was actually musing that if we hadn't
|     allowed for things that start with a number but aren't a
|     timestamp, float, or integer to be strings, they could have
|     been a "unit", aka "45 inch" or "3 foot 2 inch". Anyway,
|     it's too late now, but it was an interesting muse.
|
|   - Support for currency. This is a mix of fixed point plus
|     units. Since neither of the above is supported in most
|     languages, it just doesn't make sense to try to bring this
|     into the core of YAML.
|
| So, I'm making the above proposal without any hint, suggestion,
| or underlying motive that $ be reserved for currency later. I'd
| just like to exclude all of the other characters for now since
| I think that they would be confusing and since I don't see a use
| case for them.
|
| If you wish, we can add both '/' and '$' to the string regex, but
| I would still like to restrict the other potential indicators for
| now for the two reasons cited above.
|
| Best,
|
| Clark

--
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software
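The "# not immediately followed by a space" rule for comment special keys could be sketched in Python as follows. The function `classify_line` and its three labels are hypothetical and only illustrate the distinction discussed in the message; they say nothing about how any actual parser implemented it, and the rule itself was still marked as subject to change.

```python
def classify_line(line: str) -> str:
    """Separate ordinary comments from '#'-prefixed comment special keys."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return "other"
    # A bare '#' or '#' followed by a space is an ordinary comment.
    if len(stripped) == 1 or stripped[1] == " ":
        return "comment"
    # '#' followed directly by another character, as in '#:' or '#more:'.
    return "comment-key"


if __name__ == "__main__":
    for line in ["#: one comment special key",
                 "#more: Another comment special key",
                 "# an ordinary comment",
                 "/usr/local/bin"]:
        print(f"{line!r}: {classify_line(line)}")
```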