From: Oren Ben-K. <or...@ri...> - 2002-09-05 08:26:56
|
We've agreed to get rid of #TAB. That was easy.

As for timestamps, we got dragged into the whole problem of implicit types, whether they can be extended, and so on. This is heavy stuff and debate was very lively :-)

I raised a proposal that was tentatively accepted as "worth pursuing". The core of it is to accept that type family is optional (i.e., a node may have *no* type family, only a *kind*). It is the loader's duty to convert a generic node - which may or may not have an associated type family, and/or a format - into some native data structure (and the dumper's job to do the reverse). This is almost, but not quite, completely unlike our current "implicit typing".

I'll need to write this down "properly" - what are the exact effects on the various data models, processing, consequences for generic tools, the schema language, and so on (most of these issues were discussed a bit in the IRC session, but not all the way through). This will take me some time so the earliest I'll be able to post this would be Sunday.

I've already started thinking about formalizing this, and here are some very preliminary notions:

- Maybe the native model shouldn't be defined in terms of type family at all; instead maybe it should have the concept of a "native data type", a "native value", and a "kind". "Type family" and "Format" would only exist in the generic model, with the "Viewer" responsible for the mapping. (The Viewer is used by the Loader and Dumper - take a look at the data models diagrams).

- In this view type family and format are merely instructions to the Viewer (OK, Loader) on how to map the generic node to a native one (i.e., it is a "transfer method" - what a coincidence :-).

- Would containers benefit from format? It seems to me they very well might (admittedly rarely), and given the above view forbidding format for containers is arbitrary and a needless exception. It would be simpler to allow them.

- I'm impressed by the fact that this is almost identical to Perl's type system - and we arrived at it independently. Either Larry Wall was extremely lucky, he had a working crystal ball that told him this type system would be good for YAML, or this approach is "right" in some deep way (for scripting languages) and he "merely" arrived at it as an "inevitable" result. Of course this made our life easier because we had it in front of us, while he more-or-less invented it from scratch (AFAIK).

  I don't want to start a language war here or anything, and Parrot is supposed to run Python and Ruby programs as well anyway... I'm just rather surprised by this result. If you had asked me a year back my bet would have been that we'd end up being more "traditional" and Perl would be the "odd man out". In fact when I first encountered "bless" I thought it was a horrible hack; now I want to bless Larry for getting it right. Either way, YAML makes a *perfect* fit for Parrot now. Way to go!

- As for timestamps... I think we had better leave them out of the core spec. The use cases we have today (logging etc.) don't require timestamp as a type family. They are all happy using strcmp on two different values (for ==, >= etc.). They aren't different in any way from the use cases for using URLs in YAML, or IP addresses, or E-mail addresses, etc. In all these cases, simply thinking of them as a string and letting the application worry about its internal format is the right way to go. And in all these cases, there are standards external to YAML that specify how these strings should be formatted (in the case of dates, there's ISO as well as other de-facto standards).

  When a time data type is actually _needed_ it is when the above isn't enough (e.g. you need generic YAML tools to provide operators on these values). But then a simple timestamp type also isn't enough (e.g., due to time zone issues).

  We should start work on a separate spec that would cover both time/date and currency (A "Recommended YAML type families for business applications" spec). Clark could take the lead there. We'll cover fun stuff such as time zones and time periods and currency conversion rates and so on. There may also be a similar separate spec for URLs and E-mail addresses and IP addresses and domain names (A "Recommended YAML type families for network applications" spec) - The Ruby people may want to drive this one, as it matches some of their built-in types. Maybe another spec for units (A "Recommended YAML type families for engineering" spec), and so on.

  The core spec should only contain core "_language_ data types" (as opposed to core "_application_ data types"), which means all the types we have today minus the timestamp.

Thoughts?

Have fun,

Oren Ben-Kiki
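The proposal above (a generic node that always has a kind, with type family and format as optional instructions to the loader) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `GenericNode` class, the `load` function, and the `"int"`, `"float"` and `"hex"` names are invented for the example and are not taken from the YAML spec or any YAML library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GenericNode:
    """Hypothetical generic-model node: kind is required, the rest optional."""
    kind: str                          # "scalar", "sequence" or "mapping"
    value: Any                         # str, list of nodes, or list of (key, value) node pairs
    type_family: Optional[str] = None  # optional under the proposal
    format: Optional[str] = None       # optional hint on how to read the value


def load(node: GenericNode) -> Any:
    """Map a generic node to a native Python value (illustrative rules only)."""
    if node.kind == "mapping":
        return {load(k): load(v) for k, v in node.value}
    if node.kind == "sequence":
        return [load(item) for item in node.value]
    # Scalars: use the type family when present, otherwise keep the string.
    if node.type_family == "int":
        base = 16 if node.format == "hex" else 10
        return int(node.value, base)
    if node.type_family == "float":
        return float(node.value)
    return node.value
```

A scalar with no type family, such as `GenericNode("scalar", "2002-09-05")`, simply stays a string here, which is the "leave it to the application" behaviour argued for above.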
From: Clark C . E. <cc...@cl...> - 2002-09-05 16:05:08
|
On Thu, Sep 05, 2002 at 11:28:12AM +0300, Oren Ben-Kiki wrote:
| We've agreed to get rid of #TAB. That was easy.

Yes.

| As for timestamps, we got dragged into the whole problem of implicit types,
| whether they can be extended, and so on. This is heavy stuff and debate was
| very lively :-)

Indeed. Extra flexibility at a cost of complexity; although our end goal is user readability... so perhaps this isn't so bad.

The biggest problem with this approach is that it will take several months to flesh out and explore. Although it does seem to be the "right way" to do it.

| I raised a proposal that was tentatively accepted as "worth pursuing". The
| core of it is to accept that type family is optional (i.e., a node may have
| *no* type family, only a *kind*). It is the loader's duty to convert a
| generic node - which may or may not have an associated type family, and/or a
| format - into some native data structure (and the dumper's job to do the
| reverse).

Right. Aka a "scalar" value but not necessarily a string, integer, etc.

| I'll need to write this down "properly" - what are the exact effects on the
| various data models, processing, consequences for generic tools, the schema
| language, and so on (most of these issues were discussed a bit in the IRC
| session, but not all the way through). This will take me some time so the
| earliest I'll be able to post this would be Sunday.
|
| - Maybe the native model shouldn't be defined in terms of type family at
| all; instead maybe it should have the concept of a "native data type", a
| "native value", and a "kind". "Type family" and "Format" would only exist in
| the generic model, with the "Viewer" responsible for the mapping. (The
| Viewer is used by the Loader and Dumper - take a look at the data models
| diagrams).

Hmm.

| - In this view type family and format are merely instructions to the Viewer
| (OK, Loader) on how to map the generic node to a native one (i.e., it is a
| "transfer method" - what a coincidence :-).

Ok. So in this formulation there isn't a need to distinguish between type family and format, they can be rolled together as a single entity; a transfer method?

| - Would containers benefit from format? It seems to me they very well might
| (admittedly rarely), and given the above view forbidding format for
| containers is arbitrary and a needless exception. It would be simpler to
| allow them.

Ok.

| - I'm impressed by the fact that this is almost identical to Perl's type
| system - and we arrived at it independently. Either Larry Wall was extremely
| lucky, he had a working crystal ball that told him this type system would be
| good for Yaml, or this approach is "right" in some deep way (for scripting
| languages) and he "merely" arrived at it as an "inevitable" result. Of
| course this made our life easier because we had it in front of us, while he
| more-or-less invented it from scratch (AFAIK).

Ok. So, in the generic model each node then is either a scalar, list or mapping. Further, each node may have a !transfer|method but this is optional. Let us call "implicit typing" the process whereby transfer methods are either added or stripped. The question becomes _where_ does this typing occur. We have several choices:

(a) it is done by the parser
(b) it is done in a step between the parser and the loader
(c) it is done by the loader
(d) it is done after the loader

In my opinion, for greatest compatibility, it should not be done in the loader as this would cause each language to have its own implicit typing mechanism; giving code duplication and thus implying interoperability concerns. Indeed, this step could be done in a shared "C" libyaml which all native bindings leverage. Ideally then, this "implicit typing" should be expressed not in code but as a YAML document which fills in implied !transfer|methods at either the parser level or right before the loader. The loader would then be responsible for finding an appropriate binding for the given transfer|method.

Further, there are two ways typing can occur:

(a) regular expression
(b) by path

The output of the parser is the serial model; thus it appears as if this typing should be expressible at this model (and not the generic model). Thus, a YPATH restricted to sequential access would be sufficient for path-based access.

How about making the type family _mandatory_ in the generic model and _optional_ in the serial model?

If you agree up to here, then this leaves us a question as to how we keep this "implicit typing" process open for specification in the future while allowing us to finalize YAML core. I was thinking that the #SCHEMA or #DOMAIN directive could provide for this escape hatch. In any case, once we fix the models nothing else really needs to be done here.

| - As for timestamps... I think we had better leave them out of the core
| spec. The use cases we have to day (logging etc.) don't require timestamp as
| a type family. They are all happy using strcmp on two different values (for
| ==, >= etc.).

This only works if everyone uses an even stricter subset, namely a full explicit ISO 8601 with T and specified up to N digits for the fraction of a second. IMHO, this is just too ugly to be workable.

| They aren't different in any way from the use cases for using
| URLs in YAML, or IP addresses, or E-mail addresses, etc.

Time is different. URLs and such can be compared for equality directly by string comparison and various operators aren't defined. For timestamp it is more complicated.

| in the case of dates, there's ISO as well as other de-facto standards

And lots of ways to write a date in those standards. Ick. This is exactly the problem that a YAML timestamp solves.

| When a time data type is actually _needed_ it is when the above isn't enough
| (e.g. you need generic YAML tools to provide operators on these values). But
| then a simple timestamp type also isn't enough (e.g., due to time zone
| issues).

zoneinfo is available almost everywhere and is sufficient.

| The core spec should only contain core "_language_ data types" (as opposed
| to core "_application_ data types"),

Agreed. And I use a lot of relational database programming languages, which have TIMESTAMP as a core language data type.

Clark
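The split described above (a language-neutral step that fills in implied transfer methods, followed by a loader that only binds a transfer method to a native type) might look something like this. It is a sketch under assumptions: the rule table, the `resolve` and `bind` names, and the three patterns are made up for illustration and are not the spec's actual implicit-type expressions.

```python
import re

# Hypothetical rule table, kept as data rather than code so that every
# language binding could share it: (pattern, transfer method).
IMPLICIT_RULES = [
    (re.compile(r"[-+]?\d+"), "int"),
    (re.compile(r"[-+]?\d+\.\d+"), "float"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "timestamp"),
]


def resolve(text, explicit=None):
    """Step between parser and loader: fill in an implied transfer method."""
    if explicit is not None:
        return explicit          # an explicit transfer method always wins
    for pattern, method in IMPLICIT_RULES:
        if pattern.fullmatch(text):
            return method
    return "str"


# Loader side: bind a transfer method to a native constructor.
BINDINGS = {"int": int, "float": float, "str": str}


def bind(text, method):
    """Build a native value; unknown methods fall back to the raw string."""
    return BINDINGS.get(method, str)(text)
```

Only `BINDINGS` is specific to Python in this sketch; `IMPLICIT_RULES` is the part that would have to be identical across bindings for documents to stay portable.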
From: Clark C . E. <cc...@cl...> - 2002-09-05 16:23:49
|
On Thu, Sep 05, 2002 at 04:06:35PM +0000, Clark C . Evans wrote:
| | - As for timestamps... I think we had better leave them out of the core
| | spec. The use cases we have to day (logging etc.) don't require timestamp as
| | a type family. They are all happy using strcmp on two different values (for
| | ==, >= etc.).
|
| This only works if everyone uses an even stricter subset, namely
| a full explicit ISO 8601 with T and specified up to N digits for
| the fraction of a second. IMHO, this is just too ugly to be workable.
|
| | They aren't different in any way from the use cases for using
| | URLs in YAML, or IP addresses, or E-mail addresses, etc.
|
| Time is different. URLs and such can be compared for equality
| directly by string comparison and various operators arn't defined.
| For timestamp it is more complicated.
|
| | in the case of dates, there's ISO as well as other de-facto standards
|
| And lots of ways to write a date in those standards. Ick. This
| is exactly the problem that a YAML timestamp solves.
|
| | When a time data type is actually _needed_ it is when the above isn't enough
| | (e.g. you need generic YAML tools to provide operators on these values). But
| | then a simple timestamp type also isn't enough (e.g., due to time zone
| | issues).
|
| zoneinfo is available almost everwhere and is sufficient

Also, our timestamp family doesn't use timezones, it only uses offsets from UTC; which is good enough for 90% of the cases. If someone *does* use actual timezones, then they can worry about it at the application level since it won't match the regular expression. For example,

    2002-01-01 10:00:00.00 EST

which is, by the way, ambiguous since EST could mean Eastern Standard Time in the US or in AU. World of difference. We aren't touching timezone abbreviations and such with our timestamp type; so these issues aren't ours.

I'd actually like to hear what the issues with the current timestamp are...

Clark
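To make the strcmp objection concrete, here is a small sketch showing two timestamps with different UTC offsets whose string order disagrees with their order as instants. The two values are invented for the example, and Python's standard `datetime.fromisoformat` stands in for whatever timestamp parsing a YAML loader would do.

```python
from datetime import datetime

# Two made-up timestamps with numeric UTC offsets (no timezone names).
a = "2002-09-05T10:00:00+03:00"   # 07:00 UTC
b = "2002-09-05T08:00:00+00:00"   # 08:00 UTC

# Plain string comparison, which is what strcmp would give us.
string_says_a_later = a > b

# Comparison as instants, after the offsets are taken into account.
instants_say_a_later = datetime.fromisoformat(a) > datetime.fromisoformat(b)

print(string_says_a_later, instants_say_a_later)
```

The string comparison claims `a` is later, while the parsed values show it is an hour earlier, so strcmp is only safe when every writer normalizes to the same offset and precision.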
From: Clark C . E. <cc...@cl...> - 2002-09-05 18:15:05
|
On Thu, Sep 05, 2002 at 01:14:21PM -0400, Steve Howell wrote:
| Steve:
| 1) I want 2002-10-12 to be implicitly a string, because it's simpler and
| more portable.

Why not have 34.3 be implicitly a string because it's simpler and more portable? After all, some hand-held devices don't have support for IEEE floating point numbers. This is just a matter of where you draw your line of "core types". I think that if integer, floating point, boolean are included, then timestamp should be right up there with them.

| 2) I don't want YAML discussions to be bogged down with endless timezone
| issues.

I've yet to hear one timezone issue. Oren brought up one, but it was a timezone string... not an offset. The !time in the spec doesn't actually use timezones; it uses numeric offsets from UTC. Oren's strawman just doesn't apply to what we have.

| 3) I want all my end users to have solutions that lead to human readable
| YAML documents, but type-aware programs.

Agreed.

| 2) Clark doesn't want to explicitly type the timestamps, because he feels
| that they philosophically deserve the same treatment as other core datatypes.
| 3) Clark doesn't want to add a YAML extension that would let him implicitly
| type the timestamps, because he feels nervous about the pluggable type concept.

I think there are a lot of issues to work out with the pluggable implicit type mechanism. For instance, how do you specify the implicit type rules so that the document is portable? I see handling implicits as two complementary processes:

1) finding a !transfer|method given a regular expression or perhaps even a path expression.
2) binding the !transfer|method to a native data type.

The latter should be part of the loader; the former should be done between the parser and the loader, preferably in a manner which is independent of the particular binding.

Also, I'm concerned that this pluggable type mechanism will become something similar to DTDs where you can't really use the given document unless you know all of the implicit rules. This, IMHO, is a tough road... I'm not sure that we should even head down this path -- it's complicated. We need then to provide a type-lookup system, local-registry, etc. And this mechanism needs to be done in a manner which is consistent across every language binding. In short... this isn't simple. It will take many months, and probably the best place to do it is libyaml so that each language binding can inherit the behavior without having to code-it-up for each language.

| The solution that works for both of us is to just force Clark to put parens
| around the dates, or force Clark to put the T in the middle, or force Clark to
| put !timestamp in front.

Alternatively, the solution for integers, floats, booleans is to put parens around or explicitly type. Ick.

| But the pluggable type solution is the better long term solution, IMHO.
| Yesterday was excellent brainfood. I want to revive the pluggable type
| solution, but only after all the timezone emails slow down, and people have a
| few days to resubscribe to the list. ;)

I've yet to see a single timezone issue. There is a problem in that noon was chosen and that seconds aren't optional. But these are minor problems that can be fixed. I'm very irritated that we are going over this during last call.

I say we fix up timestamp to address Mike's direct problems. And then focus on libyaml, ypath and schema. This pluggable type system can come later on down the road after we have a solid libyaml, ypath and a schema language to build upon.

Best,

Clark
From: Steve H. <sh...@zi...> - 2002-09-05 16:37:41
|
----- Original Message ----- From: "Clark C . Evans" <cc...@cl...> > > | As for timestamps, we got dragged into the whole problem of implicit types, > | whether they can be extended, and so on. This is heavy stuff and debate was > | very lively :-) > > Indeed. Extra flexibility at a cost of complexity; although our > end goal is user readability... so perhaps this isn't so bad. > I think the complexity was already there; we just didn't notice it until Mike Orr started trying to use dates in a different way than Clark Evans uses dates. The complexity comes from the intrinsic ambiguity of dates. The good news is that we have a proposal on the table that drives the intrinsic complexity of date handling out of YAML and up into the application layer. Instead of YAML having to be date-aware, URL-aware, email-aware, etc., YAML just becomes pluggable-type aware. Then, the added flexibility gives YAML simplicity, not complexity. > The biggest problem with this approach is that it will take several > months to flesh out and explore. Although it does seem to be > the "right way" to do it. > "Months" sounds too pessimistic to me. I am thinking more in terms of days and weeks. We won't have a perfect solution right away, but we can begin making progress to a better solution within a matter of days. Actually, we already made good progress yesterday. I'm glad that you feel it's the "right way" to do it, because it also feels right to me. Cheers, Steve |
From: Brian I. <in...@tt...> - 2002-09-05 20:12:39
|
On 05/09/02 16:06 +0000, Clark C . Evans wrote: > On Thu, Sep 05, 2002 at 11:28:12AM +0300, Oren Ben-Kiki wrote: > > | The core spec should only contain core "_language_ data types" (as opposed > | to core "_application_ data types"), > > Agreed. And I use alot of relational database programming languages, > which have TIMESTAMP as a core language data type. That's not a reasonable argument IMO. What are these languages that you speak of, and most importantly, do they have YAML implementations? Please don't call Perl and Python "database programming languages" for the sake of this argument ;) Cheers, Brian |
From: Clark C . E. <cc...@cl...> - 2002-09-05 20:19:41
|
On Thu, Sep 05, 2002 at 01:11:45PM -0700, Brian Ingerson wrote: | On 05/09/02 16:06 +0000, Clark C . Evans wrote: | > On Thu, Sep 05, 2002 at 11:28:12AM +0300, Oren Ben-Kiki wrote: | > | > | The core spec should only contain core "_language_ data types" (as opposed | > | to core "_application_ data types"), | > | > Agreed. And I use alot of relational database programming languages, | > which have TIMESTAMP as a core language data type. | | That's not a reasonable argument IMO. What are these languages that you speak | of, and most importantly, do they have YAML implementations? PostgreSQL and yes, I'm working on a YAML binding. Clark |
From: Brian I. <in...@tt...> - 2002-09-05 20:48:20
|
On 05/09/02 20:21 +0000, Clark C . Evans wrote:
> On Thu, Sep 05, 2002 at 01:11:45PM -0700, Brian Ingerson wrote:
> | On 05/09/02 16:06 +0000, Clark C . Evans wrote:
> | > On Thu, Sep 05, 2002 at 11:28:12AM +0300, Oren Ben-Kiki wrote:
> | >
> | > | The core spec should only contain core "_language_ data types" (as opposed
> | > | to core "_application_ data types"),
> | >
> | > Agreed. And I use alot of relational database programming languages,
> | > which have TIMESTAMP as a core language data type.
> |
> | That's not a reasonable argument IMO. What are these languages that you speak
> | of, and most importantly, do they have YAML implementations?
>
> PostgreSQL and yes, I'm working on a YAML binding.

And I'm working on a KornShell binding, which is much closer to a general purpose programming language than PostgreSQL.

I don't think either of these should have a serious impact on core YAML. What you are so desperately fighting for is a single use case (the timestamp) instead of a more generic approach for the data type du jour.

Cheers,

Brian
From: Clark C . E. <cc...@cl...> - 2002-09-05 21:07:09
|
On Thu, Sep 05, 2002 at 01:47:30PM -0700, Brian Ingerson wrote:
| I don't think either of these should have a serious impact on core YAML.
| What you are so desparately fighting for is a single use case (the
| timestamp) instead of a more generic approach for the data type du jour.

I respectfully disagree. Timestamps are core data types, just as core if not more core than floating point numbers, for example. I'll agree to remove timestamp when floating point numbers are removed.

I've been very up front over the last year about compatibility with most SQL databases, hence my support for NULL, BOOLEAN, INTEGER data types. FLOAT isn't perfect since many databases don't support float well (or do so via fixed point). As part of this support a general TIMESTAMP is very common and indeed core.

I'll gladly support a more generic approach -- but I think it is optimistic to think that such an approach can be formalized before the end of the year as there are a number of concerns which require some serious thought and prototyping. Libyaml really needs to be further along and I think some of schema and/or ypath should probably be involved if we are to do it right.

Although Mike's specific concerns were directed at dates, they either are implementation issues (which can be fixed) or are fixable with minor tweaks to the current timestamp type. In fact, I would argue that his questions about the timestamp type are a good argument that it is a good idea and not a bad one.

Attempting to re-cast the timestamp data type as a general time/date solution was never the focus of the type. It does not include timezone (only includes UTC offset) etc. And it was corrected a long time ago with my lack of understanding of TIME (without DATE).

So, for the last call we have a serious question:

Do we wait for a revisit of the type system for 1.0? If so, then I think we should recall the "last call" status as this will be a fundamental change and one that may take further revisions to get right.

Else, I'd like to make the two changes to timestamp (seconds optional and using midnight) to make it more usable. But not make any other serious changes.

In any case, we can continue our discussion of a new implicit type mechanism...

Clark
From: Steve H. <sh...@zi...> - 2002-09-06 06:44:59
|
----- Original Message ----- From: "Clark C . Evans" <cc...@cl...> > > So, for the last call we have a serious question: > > Do we wait for a revisit of the type system for 1.0? If so, then > I think we should recall the "last call" status as this will > be a fundamental change and one that may take further revisions > to get right. The YAML spec needs to be firm about the syntax model. The YAML spec needs to be agnostic and flexible about higher levels. We should go ahead and put a "last call" on version 1.0 of the spec. YAML is stable enough to call 1.0. But there's gonna be a version 2.0, and people need to understand that. It's gonna grow up, just like Perl grew up, just like Python grew up, just like Java grew up, and just like Ruby grew up. We are only just now learning how programmers really use YAML. We have only just begun to dig into schemas, and ypaths, and many killer apps for YAML. We are gonna learn lessons later that we will want to apply, and I don't want us to be afraid to make important changes. We won't go changing things willy-nilly, but we'll make YAML better. > Else, I'd like to make the two changes to timestamp (seconds > optional and using midnight) to make it more useable. But > not make any other serious changes. > I support that change. But PyYaml will soon have a coarse option to turn off all implicit conversions, and it may allow more fine-grained code-based tunings of implicit conversions in the future. I don't know if the YAML spec needs to deliberately condone this practice, but I would like for it not to forbid it. Cheers, Steve |
From: Clark C . E. <cc...@cl...> - 2002-09-06 15:55:55
|
On Fri, Sep 06, 2002 at 02:45:09AM -0400, Steve Howell wrote:
| From: "Clark C . Evans" <cc...@cl...>
| >
| > So, for the last call we have a serious question:
| >
| > Do we wait for a revisit of the type system for 1.0? If so, then
| > I think we should recall the "last call" status as this will
| > be a fundamental change and one that may take further revisions
| > to get right.
|
| The YAML spec needs to be firm about the syntax model. The YAML spec needs to
| be agnostic and flexible about higher levels.

Let's be careful here. The higher levels -- specifically the serial and generic models -- are what YPATH, SCHEMA, and other generic YAML tools will be based upon. The biggest mistake that XML made was the assertion that it was "just a syntax"; this has caused quite a bit of grief as each parser and company has their own interpretation as to what the XML information means.

| We should go ahead and put a "last call" on version 1.0 of the spec. YAML is
| stable enough to call 1.0. But there's gonna be a version 2.0

According to the plan we put forth about two months ago, we were going to last call on Sept 1 or thereabouts (which we did) and then go to "proposed recommendation" in December; the "final recommendation" will then follow about 4-6 months after (after December only typos and bug-fixes are allowed).

So. I don't think there is time to completely flesh-out a more generic implicit type identification system; but perhaps we could add some sort of place holder mechanism for it to be turned off such as #IMPLICIT:OFF

| We are only just now learning how programmers really use YAML. We have only
| just begun to dig into schemas, and ypaths, and many killer apps for YAML.

Absolutely... but I think a 2.0 will follow some years down the road.

| > I'd like to make the two changes to timestamp (seconds
| > optional and using midnight) to make it more useable. But
| > not make any other serious changes.
| >
|
| I support that change.

Thanks.

| But PyYaml will soon have a coarse option to turn off all implicit
| conversions, and it may allow more fine-grained code-based tunings
| of implicit conversions in the future.

There are two separable concerns:

1) Building a native type given a string and a !transfer|method
2) A mechanism to change the implicit type detection rules.

I fully support #1 as this is what we had originally intended for the loader. I'd rather you not move forward with the second item without some serious talk on this list. For those that don't want to use implicit types, they can simply quote everything for now... it's not _that_ painful.

A quick hack, which we could do if this is a serious show-stopper for someone who has a serious application of YAML, is to introduce a directive which turns off the implicit type detection; and perhaps in the future the same directive could be used to signify alternative type detection rules.

| I don't know if the YAML spec needs to deliberately condone this
| practice, but I would like for it not to forbid it.

What the application does in the Native Model is up to it, but if there isn't an isomorphism between the application's native binding and the Generic Model then the native binding isn't YAML compliant, as generic YAML tools (such as a dumper, YPATH engine, or YAML Schema) won't be able to act on the information in a consistent manner.

Thus, while someone can use the first concern (loading a string plus a !transfer|method into a native object) to accomplish the second (changing the implicit type detection), I'd rather not condone this practice or even give examples in a YAML tool. If you really need to *disable* the implicit type detection rules, then let's put in a #IMPLICIT:OFF directive for this or something similar. I'm sure we can do something like this during the last call...

Best,

Clark
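As a rough illustration of the placeholder idea, a processor could check for the directive before deciding whether to run implicit type detection at all. Everything about the directive here is an assumption: the thread only floats the name `#IMPLICIT:OFF`, so its exact spelling, its position at the top of the document, and the `implicit_typing_enabled` helper are hypothetical.

```python
def implicit_typing_enabled(document: str) -> bool:
    """Return False if a (hypothetical) '#IMPLICIT:OFF' line precedes the data.

    Assumes directives are '#'-prefixed lines at the very top of the
    document; the first non-'#' line ends the directive section.
    """
    for line in document.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            break  # content has started, stop looking for directives
        if stripped.upper() == "#IMPLICIT:OFF":
            return False
    return True
```

With this approach the choice travels with the document, so every tool that reads it sees the same typing behaviour; the cost is that the document author, not the consuming application, makes the decision.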
From: Steve H. <sh...@zi...> - 2002-09-06 16:40:32
|
----- Original Message ----- From: "Clark C . Evans" <cc...@cl...> > | The YAML spec needs to be firm about the syntax model. The YAML spec needs to > | be agnostic and flexible about higher levels. > > Let's be careful here. The higher levels -- specifically the > serial and generic models -- are what YPATH, SCHEMA, and other > generic YAML tools will be based upon. The biggest mistake that > XML made was the assertion that it was "just a syntax" this has > caused quite a bit of grief as each parser and company has their > own interpretation as to what the XML information means. > The fact that XML is "just a syntax" is what makes it wildly popular, in my opinion. The fact that XML then allows multiple interpretations of this syntax is also what makes it wildly popular, in my opinion. The folks that said that XML would eliminate the grief of interoperability were selling snake oil. > So. I don't think there is time to completely flesh-out a more > generic implicit type identification system; but perhaps we could > add some sort of place holder mechanism for it to be turned off > such as #IMPLICIT:OFF > Just a minor quibble here--I don't like the directive in the YAML itself. A YAML file can be just data, and the data doesn't have to imply any native typing. But some applications require data types. Multiple applications may use different data types for the same YAML file. This is true not only when you port the same application between multiple languages, but it's also true when you use the same data for multiple applications. Think of a YAML log file. One application may read the log file to compute uptime statistics. This application may want to treat dates as dates, memory usage as floats, and number of machines down as integers. Another application may just want to pretty print the log to a web browser. It wants to treat the data as strings. So there's no use for a directive here in the YAML; you need flexibility in the application itself. 
How much thought has the YAML group given to these types of interoperability scenarios? Do we really want to lock down the spec for the next two years on this issue, just as YAML application developers and YAML implementors are really beginning to dig into issues like schemas, ypaths, and yaml-rpc?

I'd love to discuss some solutions now, even if they jeopardize the sacred "last call" on the spec, and even if some of my proposed solutions will cause Clark's Python-to-Ruby SlideShowell clone to break, or his goldfish to die, six months from now, because of something I haven't thought of. ;) If we can break it, we can fix it.

Cheers,

Steve

P.S. Most of the concerns raised about optional pluggability of implicit type conversions are straw man arguments. If you think that customized implicit type conversions threaten interoperability for your application, then just turn them off. Or I should say more accurately, don't turn them on.
From: Clark C . E. <cc...@cl...> - 2002-09-06 17:02:24
|
On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
|
| The fact that XML is "just a syntax" is what makes it wildly popular,
| in my opinion.

Or wildly unpopular. I'm here because XML's information model (or lack thereof) sucks big time. Its syntax sucks too... but this to me is a very minor point.

| The fact that XML then allows multiple interpretations of this syntax is also
| what makes it wildly popular, in my opinion.

No. What makes it wildly popular is hype. People who have to deal with tools that aren't consistent on a daily basis curse at XML.

| > So. I don't think there is time to completely flesh-out a more
| > generic implicit type identification system; but perhaps we could
| > add some sort of place holder mechanism for it to be turned off
| > such as #IMPLICIT:OFF
|
| Just a minor quibble here--I don't like the directive in the YAML itself.
| A YAML file can be just data, and the data doesn't have to imply any native
| typing. But some applications require data types. Multiple applications may
| use different data types for the same YAML file. This is true not only when
| you port the same application between multiple languages, but it's also true
| when you use the same data for multiple applications.

Well, if you want to ignore the implicit typing and don't want to single quote everything then we have two options:

1) Eliminate all implicit typing.
2) Have a directive to turn it off.

It is completely unacceptable to me that implementations can pick/choose how they want to do implicit typing. Either we have it or we don't.

Clark
From: <ir...@ms...> - 2002-09-06 17:15:54
|
On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
> > So. I don't think there is time to completely flesh-out a more
> > generic implicit type identification system; but perhaps we could
> > add some sort of place holder mechanism for it to be turned off
> > such as #IMPLICIT:OFF
>
> Just a minor quibble here--I don't like the directive in the YAML itself. A
> YAML file can be just data, and the data doesn't have to imply any native
> typing. But some applications require data types. Multiple applications may
> use different data types for the same YAML file. This is true not only when you
> port the same application between multiple languages, but it's also true when
> you use the same data for multiple applications.
>
> Think of a YAML log file. One application may read the log file to compute
> uptime statistics. This application may want to treat dates as dates, memory
> usage as floats, and number of machines down as integers. Another application
> may just want to pretty print the log to a web browser. It wants to treat the
> data as strings.
>
> So there's no use for a directive here in the YAML; you need flexibility in the
> application itself.

That's what I've been thinking. Whether to use implicit types is not a function of the document, it's a function of the application. A simple constructor argument or instance variable in the parser

    convertImplicitTypes = False # default True

would do everything I've requested *and* would be unobtrusive to those who believe in always using YAML's types or want to implement pluggable types. It would just trigger a huge if-block that bypasses all that.

The reason to implement it now is that *if* anybody has any problems with our implicit types -- problems we haven't identified yet -- they can turn it off rather than cursing YAML and vowing never to use it again.

Later, when/if layers are implemented, convertImplicitTypes would mean "skip layer 2". Although by that point we'd hopefully have separate entry points for "layer 1 only" and "layers 1 & 2".

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@oz...)
http://iron.cx/ English * Esperanto * Russkiy * Deutsch * Espan~ol |
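A minimal Python sketch of the application-side switch Mike describes, for readers who want to see its shape. Everything here is an assumption made for illustration: the function name `resolve_scalar`, the spelling `convert_implicit_types`, and the three patterns (integer, float, date) are hypothetical and far simpler than the implicit typing rules under discussion.

```python
import re
from datetime import date

# Hypothetical, deliberately incomplete patterns for unquoted scalars.
_INT = re.compile(r"[-+]?\d+")
_FLOAT = re.compile(r"[-+]?\d+\.\d+")
_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def resolve_scalar(text, convert_implicit_types=True):
    """Map the text of an unquoted scalar to a native Python value.

    With convert_implicit_types=False the whole block of type checks is
    bypassed and the text comes back unchanged.
    """
    if not convert_implicit_types:
        return text
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    match = _DATE.fullmatch(text)
    if match:
        try:
            return date(*(int(part) for part in match.groups()))
        except ValueError:
            # Looks like a date but is not a valid one; keep it as a string.
            return text
    return text
```

With the flag left on, `resolve_scalar("123")` gives an integer; with `convert_implicit_types=False` the same call gives the string back untouched, which is the "turn it off" behaviour requested above.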
From: Clark C . E. <cc...@cl...> - 2002-09-06 17:34:59
|
On Fri, Sep 06, 2002 at 10:15:52AM -0700, Mike Orr wrote:
| On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
| > > So. I don't think there is time to completely flesh-out a more
| > > generic implicit type identification system; but perhaps we could
| > > add some sort of place holder mechanism for it to be turned off
| > > such as #IMPLICIT:OFF
| >
| > Just a minor quibble here--I don't like the directive in the YAML itself.
...
| > So there's no use for a directive here in the YAML; you need
| > flexibility in the application itself.
|
| That's what I've been thinking. Whether to use implicit types is not a
| function of the document, it's a function of the application.

This is a short-term, code-centric view. It works, but I think it is exactly the opposite of what YAML is about. In my background, a big driver for XML was the recognition that data almost always outlives the program which processes it (this is what Gartner Group was pushing in their earlier XML papers encouraging CIOs to adopt XML). The other aspect of XML which drove adoption was that it is "self-describing". To me it is clear that XML was brought up in a world where more than one program is frequently the consumer of a given data file -- even when the original author of the primary program thinks otherwise.

| The reason to implement it now is that *if* anybody has any problems
| with our implicit types -- problems we haven't identified yet -- they can
| turn it off rather than cursing YAML and vowing never to use it again.

If they have problems with it, they can just single quote! No? We're talking about YAML having multiple language bindings (implementations) used in many different types of applications. YAML is already complicated; why do we even need to do this now?

| Later, when/if layers are implemented, convertImplicitTypes would mean
| "skip layer 2". Although by that point we'd hopefully have separate
| entry points for "layer 1 only" and "layers 1 & 2".

I have no problem with a flag to turn it off. But I really think that such a flag belongs in the YAML file itself. To do otherwise is quite misleading.

Clark |
From: Brian I. <in...@tt...> - 2002-09-06 18:10:55
|
On 06/09/02 17:36 +0000, Clark C . Evans wrote:
> On Fri, Sep 06, 2002 at 10:15:52AM -0700, Mike Orr wrote:
> | On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
>
> I have no problem with a flag to turn it off. But I really think
> that such a flag belongs in the YAML file itself. To do otherwise
> is quite mis-leading.

This is similar to the discussions we had on tabs. At one point I wanted to let the application decide how to handle them. Clark and Oren talked me out of it, and I think they were right. You want a YAML document to stand on its own as much as possible, and to contain all the information that an application needs to process it and roundtrip it accurately.

The temptation is great to just use YAML as a syntax and then apply any sort of useful semantics you wish to it. This of course kills interoperability. On one hand, I buy that argument. Clark will die by it. Oren will be next in line. I personally am not that sure.

The key point is how important a use case "interoperability" actually is. I tend to think that for the mother lode of applications, there will be one programming language, one set of programmers and one application domain. I sure as hell don't care how well my config file for my foobar written in Perl works for your barson written in Ruby, if you get my drift.

Perhaps we need to define the usage categories first and then graph out what level of interoperability is important in each. Oren is EXCELLENT at this type of thing (if you get my drift Oren ;)

For instance, maybe YAML is only completely interoperable under the guidance of a schema.

We need to map usage categories against semantic properties. I'll start a list of each. Feel free to flesh them out. After we get enough, let's graph them.

YAML Usage Categories:
  - Internal Data Serialization
  - Config Files
  - Log Files
  - Data Dumping
  - RPC Messaging
  - Object Persistence
  - Documents (like invoices)
  - YAML Editors

YAML Semantic Properties:
  - NYN Roundtripping
  - YNY Roundtripping
  - Non string data types
  - Numeric types
  - Timestamp types
  - Mapping key order

It has been my uneasy feeling that we have never looked closely at these types of correlations. Clark has always wanted a YAML config file to be as heavy duty as an inter-platform messaging system. I think that's the crux of our problem. YAML needs to make the simple things simple and the hard things possible (like Perl! :) There needs to be a sliding scale of language features (like schema) and tools (like YPATH) and information models (like the Generic model) that make YAML stronger when it needs the strength, but (more importantly) can be avoided when it doesn't.

Cheers, Brian |
From: Steve H. <sh...@zi...> - 2002-09-06 19:05:23
|
----- Original Message -----
From: "Brian Ingerson" <in...@tt...>
> On 06/09/02 17:36 +0000, Clark C . Evans wrote:
> > On Fri, Sep 06, 2002 at 10:15:52AM -0700, Mike Orr wrote:
> > | On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
> >
> > I have no problem with a flag to turn it off. But I really think
> > that such a flag belongs in the YAML file itself. To do otherwise
> > is quite mis-leading.
>
> This is similar to the discussions we had on tabs. At one point I wanted to
> let the application decide how to handle them. Clark and Oren talked me out
> of it, and I think they were right. You want a YAML document to stand on its
> own as much as possible, and to contain all the information that an
> application needs to process it and roundtrip it accurately.

Understood.

> The temptation is great to just use YAML as a syntax and then apply any sort
> of useful semantics you wish to it. This of course kills interoperability. On
> one hand, I buy that argument. Clark will die by it. Oren will be next in
> line. I personally am not that sure.
>

Understood.

> The key point is on how important of a use case "interoperability" actually
> is. I tend to think that for the motherload of applications, there will be
> one programming language, one set of programmers and one application domain.
> I sure as hell don't care how well my config file for my foobar written in
> Perl, works for your barson written in Ruby, if you get my drift.

+3

> Perhaps we need to define the usage categories first and then graph out what
> level of interoperability is important in each. Oren is EXCELLENT at this
> type of thing (if you get my drift Oren ;)
>

+5...this is a great idea

> For instance, maybe YAML is only completely interoperable under the
> guidance of a schema.
>

Mild disagreement at face value, but I like where this is going...

> We need to map usage categories against semantic properties. I'll start a
> list of each. Feel free to flesh them out. After we get enough, let's graph
> them.
>
>
> YAML Usage Categories:
> - Internal Data Serialization
> - Config Files
> - Log Files
> - Data Dumping
> - RPC Messaging
> - Object Persistence
> - Documents (like invoices)
> - YAML Editors

Not sure of the best general term for it, but why and I also use YAML as a data language for document generation, from which we generate HTML pages or Microsoft help files.

> YAML Semantic Properties:
> - NYN Roundtripping
> - YNY Roundtripping
> - Non string data types
> - Numeric types
> - Timestamp types
> - Mapping key order

I really like where this is going.

>
> It has been my uneasy feeling that we have never looked closely at these
> types of correlations. Clark has always wanted a YAML config file to be as
> heavy duty as a inter-platform messaging system. I think that's the crux of
> our problem.

+3

> YAML needs to make the simple things simple and the hard things
> possible (like Perl! :) There needs to be a sliding scale of language
> features (like schema) and tools (like YPATH) and information models (like
> the Generic model) that make YAML stronger when it needs the strength, but
> (more importantly) can be avoided when it doesn't.
>

+10 |
From: <ir...@ms...> - 2002-09-06 19:05:06
|
On Fri, Sep 06, 2002 at 11:10:47AM -0700, Brian Ingerson wrote:
> On 06/09/02 17:36 +0000, Clark C . Evans wrote:
> > On Fri, Sep 06, 2002 at 10:15:52AM -0700, Mike Orr wrote:
> > | On Fri, Sep 06, 2002 at 12:40:11PM -0400, Steve Howell wrote:
> >
> > I have no problem with a flag to turn it off. But I really think
> > that such a flag belongs in the YAML file itself. To do otherwise
> > is quite mis-leading.
>
> This is similar to the discussions we had on tabs. At one point I wanted to
> let the application decide how to handle them. Clark and Oren talked me out
> of it, and I think they were right. You want a YAML document to stand on its
> own as much as possible, and to contain all the information that an
> application needs to process it and roundtrip it accurately.

Indentation is part of a document's structure, like "---", ":" and "- ". If you get it wrong, you get syntax errors, ambiguities, values being cut off in the middle, values containing part of the next object, etc. Tab usage must be specified in the document.

Type information is different. It's "just" a detail about the value. The value doesn't have any meaning at all except in the eye of the beholder. Different beholders may see different things -- sometimes with the document author's sanction. Maybe the type information should be encoded in the document, maybe not, but it depends on the situation.

If you say by fiat that 123 *will* be converted to an integer and 2002-09-15 *will* be a date, no exceptions allowed, and any unusual types *must* have a quoting escape ('' or !!className) or #DIRECTIVE in the document, you cut off entire classes of YAML applications which must find another text format because the rules are unacceptable for whatever reason. This is fine *if* you want to limit YAML to a few certain uses, and *if* you are consciously aware that this is what you're doing. I think it's possible to let in a much wider variety of YAML applications without destroying the simplicity, logic and unity we all want YAML to have.

> Perhaps we need to define the usage categories first and then graph out what
> level of interoperability is important in each.

Good idea.

> For instance, maybe YAML is only completely interoperable under the
> guidance of a schema.

Or when all applications use the implicit=off flag. Or when the schema is simple. Or...

> YAML Usage Categories:
> - Internal Data Serialization
> - Config Files
    shared config file for multiple platforms
    default config file distributed with an application
    private config file for a language-specific, platform-specific application
> - Log Files
> - Data Dumping
    backup/restore
    hand editing data (e.g., database records)
    preparing new database records for insert
> - RPC Messaging
    any kind of inter-process communication, whether real-time or not
    Jabber (uses XML to do instant messaging or remote control, could use YAML)
> - Object Persistence
> - Documents (like invoices)
    changelogs, journal entries, recipes, spreadsheet data, any "structured text"
    document data to be formatted for human reading
    document data to be parsed for program calculations
> - YAML Editors
    meaning a text editor with YAML extensions for editing any YAML document, I hope

We may also want to look at which nesting structures are the most common, and which nesting structures apply to which uses. For instance, YAML *can* embed a list inside a dictionary inside a list, but the most common structures would be:

  - a single dictionary
  - a dictionary with some values containing a list
  - a dictionary with some values containing a dictionary
  - not a top-level list, because many users will just use a series of documents instead

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@oz...)
http://iron.cx/ English * Esperanto * Russkiy * Deutsch * Espan~ol |
From: Clark C . E. <cc...@cl...> - 2002-09-06 19:58:59
|
On Fri, Sep 06, 2002 at 12:05:02PM -0700, Mike Orr wrote:
| If you say by fiat that 123 *will* be converted to an integer and
| 2002-09-15 *will* be a date, no exceptions allowed, and any unusual
| types *must* have a quoting escape

And why not? In almost every computer language I know, you quote things that begin with a digit. This is a perfectly simple and reasonable rule.

| This is fine *if* you want to limit YAML to a few certain uses,
| and *if* you are conciously aware that this is what you're doing.

I fail to follow you. Just quote stuff that doesn't start with an alpha character. Think of it as a SYNTAX rule.

Clark |
From: <ir...@ms...> - 2002-09-06 20:38:36
|
On Fri, Sep 06, 2002 at 08:00:32PM +0000, Clark C . Evans wrote:
> On Fri, Sep 06, 2002 at 12:05:02PM -0700, Mike Orr wrote:
> | If you say by fiat that 123 *will* be converted to an integer and
> | 2002-09-15 *will* be a date, no exceptions allowed, and any unusual
> | types *must* have a quoting escape
>
> And why not? In almost every computer language I know, you quote things
> begin with a digit. This is a perfectly simple and resonable rule.

In program source code, it's normal to require quotes around strings. For configuration files though, many programs use a different format precisely to avoid quotes. Otherwise they could just use an eval'able code snippet instead and skip having to find/write a parser and syntax model.

> | This is fine *if* you want to limit YAML to a few certain uses,
> | and *if* you are conciously aware that this is what you're doing.
>
> I fail to follow you. Just quote stuff that doesn't start with
> an alpha character. Think of it as a SYNTAX rule.

Quoting detracts from readability if the file is human read and edited. If I'm using YAML format for my changelog, do I have to quote entries like this?

  - Item beginning with a letter.

  - '124 bugs squashed today.'

  - '124 bugs squashed today. Reverted yesterday's patch that didn't
    work.' # MAYDAY, MAYDAY! Embedded "'"s in string!

Especially that last part, escaping embedded quotes, is irritating. I really hate how SQL syntax forces you to do that.

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@oz...)
http://iron.cx/ English * Esperanto * Russkiy * Deutsch * Espan~ol |
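The quoting burden in the changelog example can be made concrete with a small Python sketch. It assumes that an apostrophe inside a single-quoted scalar is escaped by doubling it, as SQL does and as YAML's single-quoted style does; the helper names `needs_quoting` and `single_quote` are hypothetical, and the "starts with an alpha character" test is only Clark's rule of thumb from this thread, not the spec's actual rule.

```python
def needs_quoting(text):
    """Rule of thumb from the thread: quote anything that does not
    start with an alphabetic character (an empty string included)."""
    return not text[:1].isalpha()


def single_quote(text):
    """Wrap text in single quotes, doubling any embedded apostrophe."""
    return "'" + text.replace("'", "''") + "'"


def emit_entry(text):
    """Format one changelog entry as a sequence item (hypothetical helper)."""
    return "- " + (single_quote(text) if needs_quoting(text) else text)
```

Under these assumptions an entry such as "124 bugs squashed today. Reverted yesterday's patch." has to be written with both the surrounding quotes and a doubled apostrophe, which is the hand-editing cost being objected to.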
From: Clark C . E. <cc...@cl...> - 2002-09-06 20:56:49
|
On Fri, Sep 06, 2002 at 01:38:34PM -0700, Mike Orr wrote:
| > And why not? In almost every computer language I know, you quote things
| > begin with a digit. This is a perfectly simple and resonable rule.
|
| In program source code, it's normal to require quotes around strings.
| For configuration files though, many programs use a different format
| precisely to avoid quotes. Otherwise they could just use an eval'able
| code snippet instead and skip having to find/write a parser and syntax
| model.

Good argument.

| > | This is fine *if* you want to limit YAML to a few certain uses,
| > | and *if* you are conciously aware that this is what you're doing.
| >
| > I fail to follow you. Just quote stuff that doesn't start with
| > an alpha character. Think of it as a SYNTAX rule.
|
| Quoting detracts from readability if the file is human read and edited.
| If I'm using YAML format for my changelog, do I have to quote entries
| like this?
|
| - Item beginning with a letter.
|
| - '124 bugs squashed today.'
|
| - '124 bugs squashed today. Reverted yesterday's patch that didn't
|   work.' # MAYDAY, MAYDAY! Embedded "'"s in string!
|
| Especially that last part, escaping embedded quotes, is irritating.
| I really hate how SQL syntax forces you to do that.

The last one is why we have multi-line forms:

  - >
    124 bugs squashed today. Reverted yesterday's patch
    that didn't work

It's not _that_ bad of a rule is it? Thanks for playing! ;)

Clark |
From: <ir...@ms...> - 2002-09-06 21:04:34
|
On Fri, Sep 06, 2002 at 08:58:22PM +0000, Clark C . Evans wrote:
> | - '124 bugs squashed today. Reverted yesterday's patch that didn't
> |   work.' # MAYDAY, MAYDAY! Embedded "'"s in string!
> |
> The last one is why we have multi-line forms:
>
>   - >
>     124 bugs squashed today. Reverted yesterday's patch
>     that didn't work

And I thought we had multi-line forms so the embedded newline wouldn't be misread as the end of the value. :)

(PS. Pretend my example didn't have an embedded newline.)

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@oz...)
http://iron.cx/ English * Esperanto * Russkiy * Deutsch * Espan~ol |
From: Brian D. <br...@do...> - 2002-09-06 23:31:32
|
Hi all,

I'm a complete YAML newbie and I'm enjoying it. I currently have one program using YAML as simple data storage. But I was planning on using it in another - as a config file. (see below for more)

On Fri, Sep 06, 2002 at 08:00:32PM +0000, Clark C . Evans wrote:
> On Fri, Sep 06, 2002 at 12:05:02PM -0700, Mike Orr wrote:
> | If you say by fiat that 123 *will* be converted to an integer and
> | 2002-09-15 *will* be a date, no exceptions allowed, and any unusual
> | types *must* have a quoting escape
>
> And why not? In almost every computer language I know, you quote things
> begin with a digit. This is a perfectly simple and resonable rule.
>
> | This is fine *if* you want to limit YAML to a few certain uses,
> | and *if* you are conciously aware that this is what you're doing.
>
> I fail to follow you. Just quote stuff that doesn't start with
> an alpha character. Think of it as a SYNTAX rule.

Interestingly enough, it's just this syntax that made me decide not to use YAML as a config file for a little app I have at work. I'm currently using the Python built-in ConfigParser class for a config file which looks basically like this:

  [options]
  dbserver = vmbriandbtest
  database = somedatabase
  user = sa
  password = 123

Which I promptly converted to YAML:

  ---
  database: somedatabase
  dbserver: vmbriandbtest
  password: 123
  user: sa

Great, I thought! This is so clean, no one will even know I'm dropping it straight into a dictionary! Unfortunately, it croaked the first time I tried to run it... because 123 came through as an integer. I quickly figured out that it should really look something like this:

  ---
  database: somedatabase
  dbserver: vmbriandbtest
  password: '123'
  user: sa

Which really isn't a big deal... but... it's not something I can expect my users to know how to do. So, I skipped the YAML config file for that app. Thinking about it now, I could just str() it and be fine, but for whatever reason I didn't think about it at the time.

I'm not sure what this anecdote really means... except that implicit typing is both useful and disconcerting... and most newbies infer syntax from the surrounding text. I like the idea of being able to force everything to strings in YAML parsers.

Take care,
-Brian

PS - On a side note, including a YAML parser in distributions of Perl, Python, Ruby, etc would probably increase the user base a lot. As it is, I'll probably just include PyYAML in several applications. |
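The mismatch Brian ran into, and the str() workaround he mentions, can be reproduced with the standard library alone. This is a sketch under stated assumptions: it uses the Python 3 module name `configparser` (the 2002 message refers to the older `ConfigParser`), the `loaded` dictionary is written by hand to stand in for what an implicitly typing YAML loader would return (no YAML library is called), and `stringify` is a hypothetical helper.

```python
import configparser

# The INI file from the message above.
INI_TEXT = """\
[options]
dbserver = vmbriandbtest
database = somedatabase
user = sa
password = 123
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
ini_options = dict(parser["options"])  # every value is a string here

# Stand-in for the result of loading the unquoted YAML version:
# the password has silently become an integer.
loaded = {
    "database": "somedatabase",
    "dbserver": "vmbriandbtest",
    "password": 123,
    "user": "sa",
}


def stringify(mapping):
    """Coerce every value of a flat mapping back to a string."""
    return {key: str(value) for key, value in mapping.items()}


string_options = stringify(loaded)
```

After the coercion `string_options` matches what the INI parser returned, so code written against the old config file keeps working. The sketch only handles a flat mapping; nested structures would need a recursive version.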
From: Clark C . E. <cc...@cl...> - 2002-09-06 23:46:31
|
On Fri, Sep 06, 2002 at 04:47:00PM -0700, Brian Dorsey wrote:
| I'm a complete YAML newbie and I'm enjoying it.

Fantastic to have more user feedback on this critical issue!

| figured out that it should really look something like this:
| ---
| database: somedatabase
| dbserver: vmbriandbtest
| password: '123'
| user: sa
|
| Which really isn't a big deal... but... it's not something I can expect
| my users to know how to do. So, I skipped the YAML config file for that
| app.

Yes, this would be a gotcha.

| I'm not sure what this anecdote really means... except that implicit
| typing is both useful and disconcerting... and most newbies infer syntax
| from the surrounding text. I like the idea of being able to force
| everything to strings in YAML parsers.

What do you think about requiring parentheses around all items which are implicitly typed?

  strings:
    - 23
    - 2002-01-02
    - true
    - 2.3
  typed:
    integer: (23)
    timestamp: (2002-01-02)
    boolean: (true)
    floating: (2.3)

Would the use of parentheses be both useful (for typing) but also a good enough road-sign for your newbie to ask: "Hmm. Is there something special going on here?"

Best, Clark |
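A rough Python sketch of how a loader might apply the parentheses idea floated above. It is an illustration only: the function name `resolve_parenthesized`, the four recognised types, and the choice to leave an unrecognised "(...)" value as a plain string are all assumptions, since the proposal in the message does not specify them.

```python
import re
from datetime import date


def resolve_parenthesized(text):
    """Type a scalar only when it is wrapped in parentheses.

    Bare scalars are always returned as strings; "(...)" is treated as
    a request for implicit typing of the inner text.
    """
    if not (len(text) >= 2 and text.startswith("(") and text.endswith(")")):
        return text  # no parentheses: always a string
    inner = text[1:-1]
    if inner in ("true", "false"):
        return inner == "true"
    if re.fullmatch(r"[-+]?\d+", inner):
        return int(inner)
    if re.fullmatch(r"[-+]?\d+\.\d+", inner):
        return float(inner)
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", inner)
    if match:
        try:
            return date(*(int(part) for part in match.groups()))
        except ValueError:
            pass  # shaped like a date but not a valid one
    # Assumption: anything unrecognised stays a string, parentheses included.
    return text
```

Applied to the example document, every item under `strings` stays a string, while `(23)`, `(2002-01-02)`, `(true)` and `(2.3)` become an integer, a date, a boolean and a float respectively.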
From: Brian D. <br...@do...> - 2002-09-07 03:04:42
|
On Fri, Sep 06, 2002 at 11:48:05PM +0000, Clark C . Evans wrote:
> What do you think about requiring parenthesis around all items
> which are implicitly typed?
>
>   strings:
>     - 23
>     - 2002-01-02
>     - true
>     - 2.3
>   typed:
>     integer: (23)
>     timestamp: (2002-01-02)
>     boolean: (true)
>     floating: (2.3)
>
> Would the use of parenthesis be both useful (for typing) but also
> a good enough road-sign for your newbie to ask: "Hmm. Is there
> something special going on here?"

From a consistency standpoint I like it... there is a clear distinction between strings and everything else.

On the other hand, the way YAML currently works is great for a lot of things. (just not my config files. ;) I love how easy it is to just type out a data structure and have it materialize in my app exactly as I was thinking.

An interesting example along these lines: I'd never even seen YAML or Ruby before when Steve pointed me (via the SEAPIG wiki) to Why's YAML cookbook. Not only was it immediately clear what YAML is, but I found myself thinking "Oh, so that's what a list looks like in Ruby!" (Python bias showing... ;)

That said, I think there are definite useful cases across the board and I'd be perfectly happy (and continue to use YAML) with either the current typing or () for typing.... especially if there is some mechanism for turning the type conversions off. (and just to drop in my 2 cents, I'd probably tend to make that decision on an app by app basis rather than in the YAML doc.)

Take care,
-Brian |