From: Oren Ben-K. <or...@ri...> - 2001-06-19 08:10:37
Yet another long post... But I think we are getting somewhere, so it may be worth reading anyway :-)

Clark C . Evans [mailto:cc...@cl...] wrote:

> I look at it this way:
>
> a) We eliminate the short-hand notation so that it is
>    not confusing what is a map and what is a scalar.

After much agonizing, I independently came to this conclusion. In fact, I'd like to take it a bit further.

Let's finalize the YAML 1.0 spec purely as a file format (plus requirements from the data model, of course). Drop all the API stuff, the discussions of color, etc. Give up on the !class shorthand, but we'll reserve all single-indicator-character keys for "special semantics" to be defined in separate API specs. Call this spec YAML-CORE. I could probably finalize it in a weekend (given we settle the binary/Unicode issue).

There are going to be two types of APIs. Let's call them "Native" and "Incremental" APIs. I suggest we spec them in a separate document from the core YAML spec. First, we'll close specs faster. Second, including just one API in the core spec would give it more importance than the other, which would be wrong. Each of them has its place.

The "Native" API is a simple load/save API using the language native data structures. We'll spec this quickly (once per language?), and probably implement it as quickly. We'll end up with working, very useful YAML support for Perl (Data::Denter), Python, C#, Java, JavaScript, and possibly C++ (based on the STL; a bit of a challenge, unless we use RTTI). I suggest that to make quick progress, we just include a simple definition of v/w in such specs (as a library function), but not waste too much time hammering out the details of color behavior. Call this the YAML-NATIVE-API spec.

We then move our focus to defining the "Incremental" API, which is a (mostly) language-independent push/pull API spec which fully supports color etc. This would be the YAML-INCREMENTAL-API spec. Much fun here...
(I admit to not catching up on the work you've done there yet).

In a fourth spec we'll go through the color idiom, what the standard colors are, what standard filters an implementation should/could provide, etc. Call this YAML-COLOR.

Quite a job, right? But it drives home the point that we simply can't try to do it all in one spec. We'll never finalize it; we'll be debating a YAML-INCREMENTAL-API issue, say, and in the meanwhile we are not releasing a YAML-CORE spec - because both are included in the same document.

> b) We state the obvious in the spec -- adding items
>    to a map will, in almost every case, be a safe
>    addition and unlikely to break programs depending
>    upon its structure.

Agreed. This is a YAML-COLOR issue. Keep it out of YAML-CORE, except maybe a passing statement.

> c) We state that promoting a scalar to a map or list
>    may only be supported by particular programs and that
>    this technique must be used with caution. We can
>    underscore those cases where a structural upgrade
>    will work, when the applications accessing the
>    information:

Agreed. Again, a YAML-COLOR spec issue.

>    1) are written to use the visitor/iterator
>       interface and not dependent on a particular
>       in-memory representation; or

That is, are written using the Incremental API.

>    2) are written in languages which use a
>       YamlNode instead of native data structures; or

Just another form of the Incremental API (I should hope). I claim that YamlNode should be the basis for the push API (the visitor visits such nodes) and hence the pull API (keep a stack of YamlNode, for example).

>    3) are written to use the (w/v) methods as
>       described.

That is, using the add-on from the Native API. Yes.

> d) We also talk about the "standard" colors, such
>    as class, comments, classes. And we discuss the
>    base filters, in particular, how the standard
>    colors can be added (with appropriate promotions)
>    and used with unexpecting older code if the
>    appropriate stripper is used...

Definitely YAML-COLOR.

> Overall, I think it is important to underscore that the
> w/v mechanism is optional. Most users of YAML will
> use them if they are "libraries" and don't have control
> of the YAML. However, one-offs probably won't. Proper
> filters, of course, would have to be written robustly.

I'd want v/w to be a "must" part of the Native API, in the sense that "you aren't YAML if you don't provide them" (they are trivial enough to implement). But they are clearly optional in the sense that you don't have to use them.

> On the subject of encoding. I'm starting to like the
> idea of using a ^ indicator for this purpose.

OK, wait a sec. You've written:

> I was taking a walk with my g.f. and discussing this item,
> it seems that there is a tripartite situation:
>
>           | Java / C#  | Python  | Perl                       |
> ----------+------------+---------+----------------------------+
>  ASCII    | String     | String  | Scalar (UTF8 or Otherwise) |
>  UNICODE  | String     | Unicode | UTF8 Marked Scalar         |
>  BINARY   | Byte []    | String  | Not UTF8 Marked Scalar     |

So, if I'm reading this right, you are suggesting we distinguish between "ASCII text" and "Unicode text". In the information model. Hmmm. Let's see.

In Java/C#, being modern languages, there's no problem using just two types. That's obviously the "right thing to do". Doing three types would complicate things for them - how would you round trip an ASCII string via Java? Forcing such strings to become maps is the wrong answer :-)

In Perl/Python, being older languages, there's an issue using just two types. This ties in with their problems supporting Unicode in the first place. Now, if by doing a three-way split we could have avoided using heuristics - the system would always do the right thing on both read and write - I'd consider it. But the table above shows that, at minimum, on output you must use heuristics to distinguish between binary and ASCII.

So, we must rely on heuristics to "do the right thing" in most cases.
We also must provide some mechanism to override the heuristic if it guesses wrong. E.g., the printer will get an optional list of path expressions which must be printed as binary or ASCII.

In Python (and only in Python) there's also the problem of reading. A reasonable heuristic would be to de-serialize any purely-ASCII text as an ASCII string ("use Unicode only when you must"). Again, the API must provide a way to change this heuristic (to "Always Unicode for text"), and possibly a way to override this on a more fine-grained level (a list of path expressions would make sense).

I predict that with time the problem will slowly fade away. Perl and Python will come to terms with Unicode. So, with time the accuracy of our heuristic (16-bit == Text, 8-bit == binary) would rise. The "override" capabilities would be used less and less, and eventually we'll put the problem behind us.

If, on the other hand, we enshrine the concept of "ASCII string" in the YAML information model, we'll be stuck with it. For example, whatever tricks we'll do in the Java/C# API to support it will haunt us forever.

In short, the issue is ugly. I'd rather concentrate the ugliness in one single, well-defined point (using heuristics in the parser/printer of Python/Perl) so that it doesn't harm anything else (our information model, the Java/C#/future languages APIs, etc.).

If we agree on the above, there are only two (easy) issues to settle to get the spec to a "pre-release" state (pre-release for me means we concentrate on polishing issues - language, examples, formatting, etc.). The two issues are whether we define a canonical form or not, and the details of the 32-bit characters.

I've no problem with defining a canonical form:

- 4 space indentation;
- multi-line values start on a line of their own;
- use the simplest possible text style;
- put a single space between the key and the following ':', one space between the ':' and the value;
- use "most folded" white space for text;
- break lines after 76 characters but include at least 32 text characters;
- use prefix ':' in lists only for multi-line simple scalars;
- align multi-line quoted string text by one space as in our examples.

I think that covers it. The problem is that at least some of these decisions are a matter of taste. It would be different if we defined the canonical form as "the form with the minimal number of characters" - but then it may not end up being very readable. And of course we may simply not define one at all. Thoughts?

As for 32-bit characters, I haven't done my homework on that. What does this mean, exactly? Is it OK to say that YAML uses 16-bit characters and that the 32-bit characters are always represented as a "surrogate pair"? So far I avoided dealing with these monstrosities - 16-bit Unicode seemed quite enough, even when dealing with Japanese clients. Clark, can you handle this issue?

Have fun,

Oren Ben-Kiki
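
On the "surrogate pair" question at the end of this message: a minimal sketch, in Python, of how UTF-16 represents a character above U+FFFF as two 16-bit units. The function names are hypothetical and are not part of any YAML API; the arithmetic is the standard UTF-16 mapping.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) 16-bit units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range.
high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))
```

A 16-bit character model can therefore still carry every Unicode character, provided that readers and writers treat the two units of a pair as one character.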
From: Oren Ben-K. <or...@ri...> - 2001-06-19 14:29:39
Clark C . Evans [mailto:cc...@cl...] wrote:

> I'd rather have this all be one-spec to give the sense
> of unity. That being said, there is nothing preventing
> this spec from having multiple sections, and adopting
> each section independently.

So, instead of having multiple, cooperating specs, we'll have incremental spec versions (version 1 would be just the core; version 2, the core + the native API; version 3, core + native + incremental API, etc.)? I'm not aware of any other system using this method (not to say that this is necessarily bad).

It has the benefit of allowing us to choose which section to add each time (e.g., version N+1 might just add language bindings for a certain new language, nothing else). Hmmm.

> So... let's finalise on the binary/unicode issue.
> There does not seem to be a good solution at the syntax
> level, so let's punt the issue for now. Perhaps the API
> may have a binary/unicode distinction, but let us leave
> the YAML 1.0 file format unicode only.

But keep the binary block format, right?

> As for dropping the class notation. I spoke with
> Brian on the phone yesterday and he seemed resistant
> to this idea. So, a bit more thought may have to
> go into this.

Point: Brian needs it for doing Perl (de)serialization, and in Perl (or Python) all cases where you need to annotate something with a class, the "something" is a map. So in practical terms, the only difference is allowing one to write:

    map: !class %
        key: value

Instead of:

    map: %
        !: class
        key: value

I'd much rather we always use the second notation for consistency. Would Brian be happy with:

    map: % !:class
        key: value

That is, we'll allow the first key/value pair to be on the same line as the map, and "we will just happen" to use it for the '!' key. Problem solved at the cost of a reorder and one additional character :-)

> ... this would allow for another section (very small)
> called "information model".

I thought that was already covered in the current core spec.
Do you want to take it out? Why?

> I respectfully withdraw the ASCII suggestion.

So the Unicode/Binary issue is settled. Good.

> I think we can discuss the canonical form when looking
> at the writer code in the impl I'm (slowly) working on.

So, let's just put it in a separate section.

> ... We are using Unicode, so one may use UTF8, UTF16, or UTF32
> as an internal character model; they are all encodings of the
> same character set, UCS4. Recently UCS4 has been restricted to
> those code values expressible by Unicode, approx. 2^21 characters.
> Thus the ISO universal character set is now the same character
> set used by Unicode. So much of this "confusion" is now gone.
> UTF8, UTF16, and UTF32 all encode exactly the Char production
> in the YAML spec. Nothing more, nothing less.

OK.

> Of course we cannot dictate what Unicode encoding a language
> binding uses, it could be UTF8 (Perl). In C, wchar_t is often
> either UTF16 or UTF32, depending upon the unix platform.

OK. I'll just add wording to that effect in the description of characters.

It seems as though we have everything settled, except for the class notation issue. If my compromise proposal above is acceptable, we could roll version 1.0 out the door sometime next week...

Have fun,

Oren Ben-Kiki
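
Clark's point that UTF8, UTF16, and UTF32 are three encodings of the same character set can be checked with a small Python sketch. The helper name `encoded_sizes` is hypothetical; the codec names are the ones Python's standard library ships.

```python
def encoded_sizes(text):
    """Encode `text` three ways and confirm each decodes to the same string."""
    sizes = {}
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        data = text.encode(codec)
        # Each encoding must round-trip to exactly the same characters.
        assert data.decode(codec) == text
        sizes[codec] = len(data)
    return sizes


# One ASCII letter plus one character outside the 16-bit range.
print(encoded_sizes("A\U0001D11E"))
```

Only the byte counts differ between the encodings; the decoded characters are identical, which is why a language binding is free to pick whichever internal encoding suits it.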
From: Jason D. <ja...@in...> - 2001-06-19 15:04:52
> It seems as though we have everything settled, except for the
> class notation issue. If my compromise proposal above is
> acceptable, we could roll version 1.0 out the door somewhen
> next week...

Just a quick note before I head off to work--I'll try to post more detailed responses to the issues of the past couple days later.

I was hoping, though, that YAML would take its time to develop. What's the rush to get version 1.0 out there? We could release Yet Another Spec like everybody else or we could release a spec with working implementations in 4 or 5 major languages and a conformance suite that shows they all interoperate.

We haven't started eating our own dog food yet and we need to do that to get some practical, hands-on experience with the language and its APIs. XML didn't really need that because it leveraged off SGML but YAML is an entirely new creature.

What I was hoping to do within the next week was create a "web service" that accepted YAML documents with a certain structure and persisted them and also allowed them to be queried. I wanted to do this so that we could get some sort of interoperability testing going between our different implementations and to get a real feel for any issues that might come up _before_ the 1.0 release.

My thought was to create an issues list where any of us could post issues (in YAML, of course) so that we could track and respond appropriately to them. (I got this idea while reading the RELAX NG archives--their process is very admirable.)

We don't have any complete, working tools yet that can create or process YAML and without those I don't see how you could possibly roll anything out with any degree of confidence in the spec or, more importantly, expect anyone else to have any confidence in it.

Jason.
From: Clark C . E. <cc...@cl...> - 2001-06-19 15:57:55
On Tue, Jun 19, 2001 at 08:04:40AM -0700, Jason Diamond wrote:
| We haven't started eating our own dog food yet and we need to
| do that to get some practical, hands on experience with the
| language and its APIs.

I wholeheartedly agree here. No huge reason to rush. However, it is important to get some prototypes working in short order so that we can start eating our own dog food.

| What I was hoping to do within the next week was create a
| "web service" that accepted YAML documents with a certain
| structure and persisted them and also allowed them to be queried.

Very cool. What database are you going to use?

...

On another thread, I'd like to be able to write...

    week: 2001-W25-2 # June 18-24 2001
    link: *303 # Lone Ranger Video

We could modify the "simple" production to exclude the pound sign... and thus the above would be equivalent to...

    week: %
        =: 2001-W25-2
        #: June 18-24 2001
    link: %
        =: *303
        #: Lone Ranger Video

...

    date: 2001-JAN-02 ! org.yaml.Date # Today

equivalent to...

    date: %
        =: 2001-JAN-02
        !: org.yaml.Date
        #: Today

Yes! I much prefer having the class/comment follow the value for one-line scalars.

...

This would involve splitting the simple scalar into "single" and "multiline" variants, where the single line variant would be free of indicators.

...

Jason, what do you think?

I know you don't like the idea that a node looks like a scalar, but is really a map... is this that hard? It would also mean that people could come along and add a comment. Thus a program expecting a scalar would all of a sudden encounter a map (unless a comment stripper filter was enabled at parse stage).

One thing is for certain, if we went this route, it would mean that people using YAML would definitely have to "grok" the color mechanism. However, the coloring mechanism is one of our strong points, isn't it? Why not leverage it?

Best,

Clark
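
The postfix shorthand proposed in this message can be sketched as a small expansion step. This is a rough illustration in Python, not the parser anyone on the list wrote: `expand_shorthand` is a hypothetical name, a colored map is assumed to be a plain dict, and the single-line value is assumed to contain no '!' or '#' of its own (the "free of indicators" restriction mentioned above).

```python
def expand_shorthand(text):
    """Expand 'value ! class # comment' into a colored map.

    Returns the bare string when the line carries no class or comment.
    """
    node = {}
    # The comment, if any, runs from the first '#' to the end of the line.
    body, found, comment = text.partition("#")
    if found:
        node["#"] = comment.strip()
    # The class, if any, follows the first '!' in what remains.
    value, found, cls = body.partition("!")
    if found:
        node["!"] = cls.strip()
    if not node:
        return value.strip()
    node["="] = value.strip()
    return node


print(expand_shorthand("2001-JAN-02 ! org.yaml.Date # Today"))
print(expand_shorthand("*303 # Lone Ranger Video"))
```

A scalar that gains a comment turns from a string into a map under this scheme, which is exactly the structural promotion the message warns about.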
From: Clark C . E. <cc...@cl...> - 2001-06-19 16:14:09
On Tue, Jun 19, 2001 at 11:59:01AM -0400, Clark C . Evans wrote:
|
|   week: 2001-W25-2 # June 18-24 2001
|   link: *303 # Lone Ranger Video
|
| We could modify the "simple" production to
| exclude the pound sign... and thus the above
| would be equivalent to...
|
|   week: %
|       =: 2001-W25-2
|       #: June 18-24 2001
|   link: %
|       =: *303
|       #: Lone Ranger Video

I think I just fell in love with this short-hand mechanism. Sorry to be so wishy-washy.

|   date: 2001-JAN-02 ! org.yaml.Date # Today
|
| equivalent to...
|
|   date: %
|       =: 2001-JAN-02
|       !: org.yaml.Date
|       #: Today

Hmm.

    value: !org.my.class # multi-line scalar to follow
        This is a multi-line scalar
        value it begins indented.

    value: %
        !: org.my.class
        #: multi-line scalar to follow
        =: This is a multi-line scalar
           value it begins indented.

I think for this to work, the parser will have to have built-in options to "strip class" and "strip comment" colors. Ahh.

    parse( ... , const char *strip[] )

    /* built-in global constants; each list is NULL-terminated */
    const char *yaml_strip_comment[] = { "#", NULL };
    const char *yaml_strip_class[] = { "!", NULL };
    const char *yaml_strip_class_and_comment[] = { "#", "!", NULL };

    /* custom stripper, removes comments, map pairs with
       "key" for the key. This stripper leaves in classes. */
    const char *custom_strip[] = { "#", "key", NULL };

Thoughts?

If, after stripping, a map has a single entry, "=", then it can be replaced with the value of =. Thus...

    key: %
        =: value

becomes...

    key: value

and

    key: %
        =: @
            1
            2

becomes...

    key: @
        1
        2

Just thinking... this might be dangerous... but then again, it is a filter operation, so it can be enabled and disabled.

Really, this is moving into "yaml tree" operations. Where each of these operations may be implementable as a filter and will have particular properties.

Kinda interesting, hmm.

;) Clark
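
The strip-then-collapse rule from this message can be sketched as a tree filter. This is a minimal Python illustration under stated assumptions: maps are plain dicts, lists are Python lists, and `strip_colors` is a hypothetical name rather than an agreed API.

```python
def strip_colors(node, strip):
    """Remove every map key listed in `strip`, then collapse maps
    that are left with the single entry '=' into that entry's value."""
    if isinstance(node, dict):
        kept = {key: strip_colors(value, strip)
                for key, value in node.items()
                if key not in strip}
        if list(kept) == ["="]:
            return kept["="]
        return kept
    if isinstance(node, list):
        return [strip_colors(item, strip) for item in node]
    return node


doc = {"date": {"=": "2001-JAN-02", "!": "org.yaml.Date", "#": "Today"}}
print(strip_colors(doc, {"#", "!"}))   # comment and class both removed
print(strip_colors(doc, {"#"}))        # class kept, so the map survives
```

Because the filter is a separate pass over the tree, a program that expects a plain scalar can enable it, while one that understands colors can leave it off.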
From: Brian I. <briani@ActiveState.com> - 2001-06-19 22:27:14
Hello all,

I don't know which message to reply to so I'll just jump in clean.

My basic philosophy is let's get a simple spec out and start implementing something. I actually have several projects and demos using YAML that are scheduled for this summer. An initial C implementation is vital.

We don't need a shortcut syntax for coloring, and we don't need comments. Let's just Keep It Simple Sweety.

I'm no expert but I agreed with Clark about removing [MIME] binary syntax from the data model. We can just have it be a well supported color.

And we don't *need* optional ':' for bulleting lists.

Let's define just enough to get rolling, without precluding any of the above ideas in the next spec.

Ingy is a dog. He's getting hungry. Arf, arf...

Cheers, Ingy

--
perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf
("Just Another %s Hacker",x);}};print JAxH+Perl'
From: Clark C . E. <cc...@cl...> - 2001-06-21 16:21:44
On Tue, Jun 19, 2001 at 03:28:11PM -0700, Brian Ingerson wrote:
| And we don't *need* optional ':' for bulleting lists.

Seeing how Oren likes the ":", can I propose that we find another character for this bulleting of lists? Perhaps we can use the star? It's unlikely that any string would begin with a star.

    bulleted-list:
        * This is the first string in the list
        * This is the second string in the list

Best,

Clark
From: Jason D. <ja...@in...> - 2001-06-20 14:00:15
> | What I was hoping to do within the next week was create a
> | "web service" that accepted YAML documents with a certain
> | structure and persisted them and also allowed them to be queried.
>
> Very cool. What database are you going to use?

SQL Server 2000 but that shouldn't matter since it'll have a web "interface" (using HTTP and YAML). Thus, it should be possible for all of us to write front ends in our favorite languages that use the same database. My goal is to get some experience serializing and deserializing native objects since that's what the messages, like SOAP, will consist of.

> On another thread, I'd like to be able to write...
>
>   week: 2001-W25-2 # June 18-24 2001
>   link: *303 # Lone Ranger Video
>
> We could modify the "simple" production to
> exclude the pound sign... and thus the above
> would be equivalent to...
>
>   week: %
>       =: 2001-W25-2
>       #: June 18-24 2001
>   link: %
>       =: *303
>       #: Lone Ranger Video
>
> ...
>
>   date: 2001-JAN-02 ! org.yaml.Date # Today
>
> equivalent to...
>
>   date: %
>       =: 2001-JAN-02
>       !: org.yaml.Date
>       #: Today
>
> Yes! I much prefer having the class/comment follow
> the value for one-line scalars.

This is much prettier than the prefix-style shortcuts.

> This would involve splitting the simple scalar into
> "single" and "multiline" variants, where the single
> line variant would be free of indicators.
>
> ...
>
> Jason, what do you think?
>
> I know you don't like the idea that a node looks like
> a scalar, but is really a map... is this that hard?
> It would also mean that people could come along and
> add a comment. Thus a program expecting a scalar would
> all of a sudden encounter a map (unless a comment
> stripper filter was enabled at parse stage).
>
> One thing is for certain, if we went this route, it would
> mean that people using YAML would definitely have to
> "grok" the color mechanism. However, the coloring mechanism
> is one of our strong points, isn't it? Why not leverage it?

I don't think it's that hard.
And I really like the color idiom. I'm just wary of whether or not it would be a good idea to "complicate" the simple syntax that YAML has with these shortcuts. There could be absolutely no support for the shortcuts and yet people could still color their nodes as they see fit with just "pure" YAML.

Personally, I see no need for comments or a need to standardize them so that they can be filtered. But that doesn't mean that we shouldn't have them. But it should be possible to write a processor that can accept a list of keys to filter (via the command line or whatever) and this would make it possible for people to filter '#' or any other key for that matter without having to give special semantics to arbitrary key names.

If you do keep comments, though, I would highly recommend you not call them comments since they are so unlike comments in every other language that I believe it would be misleading.

Here's a thought: What if # was a prefix for a (possibly empty) "comment" key? If you like your (postfix) notation for comments and want to keep it then it would color the scalar with a # key like you showed above. But, multiple comments could be added by the user as long as they start with #.

    date: %
        =: 2001-JAN-02
        !: org.yaml.Date
        #: Today
        #format: This should be in the format of yyyy-mmm-dd.

So all keys starting with # could easily be filtered. I'm not sure that I like this, especially considering what I just got done saying about my lack of interest in comments but I thought that I'd throw it out there in case you might like it.

We could similarly allow multiple classes.

    date: %
        =: 2001-JAN-02
        !: org.yaml.Date
        !Java: java.util.Date
        !C#: System.DateTime
        #: Today
        #format: This should be in the format of yyyy-mmm-dd.

Could either the prefix or postfix syntax handle allowing named classes or comments?

As I was saying above, my whole objection to the color shortcuts is that they complicate the syntax for highly specialized purposes.
What if we could generalize it to allow any type of named color so that expansion in the future is possible without modifying the syntax or having to reserve indicators?

How about using a special prefix character (like ~) to signify "color"?

    date: %
        ~org.yaml.value: 2001-JAN-02
        ~org.yaml.class: org.yaml.Date
        ~org.yaml.class.Java: java.util.Date
        ~org.yaml.class.C#: System.DateTime
        ~org.yaml.comment: Today

This is another idea that I'm just throwing out there to get your creative juices flowing. It's obviously ugly and verbose but it's pretty general. Not only can the YAML maintainers add new "standard" colors but so can everybody else. Applications can just ignore colors that they don't understand.

Bye,

Jason.
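
Jason's two ideas from this message, filtering every key that starts with '#' and naming classes per language, can be sketched in Python. Both helper names are hypothetical, maps are assumed to be plain dicts, and the fallback from a named class to the unnamed '!' class is an assumption of this sketch, not something the message specifies.

```python
def drop_prefixed(node, prefix):
    """Return a copy of a map without the keys that start with `prefix`."""
    return {key: value for key, value in node.items()
            if not key.startswith(prefix)}


def class_for(node, language):
    """Pick the class named for `language`, else the unnamed '!' class."""
    return node.get("!" + language, node.get("!"))


date = {
    "=": "2001-JAN-02",
    "!": "org.yaml.Date",
    "!Java": "java.util.Date",
    "!C#": "System.DateTime",
    "#": "Today",
    "#format": "This should be in the format of yyyy-mmm-dd.",
}
print(sorted(drop_prefixed(date, "#")))
print(class_for(date, "Java"), class_for(date, "Perl"))
```

The same prefix test would work unchanged for the '~' proposal, since an application would only need to skip keys whose prefix it does not recognize.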
From: Clark C . E. <cc...@cl...> - 2001-06-19 15:19:42
On Tue, Jun 19, 2001 at 04:30:12PM +0200, Oren Ben-Kiki wrote:
| So, instead of having multiple, cooperating specs, we'll have
| incremental spec versions (version 1 would be just the core;
| version 2, the core + the native API; version 3, core + native
| + incremental API, etc.)? I'm not aware of any other system
| using this method (not to say that this is necessarily bad).

Ok. I was thinking that 1.0 would be the "full-release" and that 0.1, 0.2 would be our versions until we have an integrated whole. This at least warns people that the spec is still under development and may change... although changes to the serialization syntax will be unlikely. With each "pre-release" we can denote what sections are "stable".

A problem with the XML spec is that it is separate from the namespace specification. XML 1.0 + Namespaces is a very different monster than XML 1.0. Having them in separate specifications creates confusion. Seriously, I can submit an XML 1.0 compliant document:

    <x:doc />

and it will be rejected by most XML 1.0 compliant parsers.

Also, we may need to change some of the wording to be consistent with later sections, add clarifications, etc. This adequately reflects the inter-twined nature of the recommendations.

I'm really not comfortable that we will have the spec "right" until the APIs are very fleshed out. If we publish the serialization syntax first and then find out later we need to change it... we will be tempted to hack our API instead. And I don't want to see this happen.

| It has the benefit of allowing us to choose which section to
| add each time (e.g., version N+1 might just add language
| bindings for a certain new language, nothing else). Hmmm.

Sure. Version 1.1 or 1.2 can add bindings beyond the "initial set of bindings". Or, alternatively, we can only include the "C" language binding and then have an appendix for each language binding.

| > So... let's finalise on the binary/unicode issue.
| > There does not seem to be a good solution at the syntax
| > level, so let's punt the issue for now. Perhaps the API
| > may have a binary/unicode distinction, but let us leave
| > the YAML 1.0 file format unicode only.
|
| But keep the binary block format, right?

I'm thinking the following instead....

    document: %
        !: application/ms-word
        ^: base64
        =: DKFK5kskfDKF30KS~DKE9bd
           LDF3kdFKs...DKFEf==

I'm liking the idea of an "encoding" color (^) which describes how a value is encoded into Unicode(ASCII).

    purchased: %
        !: org.yaml.Date
        ^: iso8601
        =: 2001-W05-34

We can then issue a few "sets" of standard encodings that we allow, including floating point and/or currency encodings. I think it is important to separate encoding from the class, as the encoding is a particular way to "fix" the class in unicode.

...

Regarding the class abbreviation mechanism, where

    map: !class value

is a short hand for

    map: %
        !: class
        =: value

The following questions emerge:

a) Are classes so common that they merit a shorter abbreviation mechanism?

   Clark: Let's get the spec out without the abbreviation mechanism and see if it is too ugly for common use cases.

b) Are we sure that we don't want class as part of the information model?

   Clark: Yes. For (possibly future) languages without the class notion or with a different notion of classes this privileged position becomes questionable. XSLT, for instance, has templates; are these 'classes'? Further, this prevents the model from being recursive; in other words, it becomes impossible to color a given class.

c) If class is not part of the information model, and we have this abbreviation, do we give other standard colors the same short-hand?

   Clark: If classes have the abbreviation mechanism, then for consistency, other standard colors should have the same right.

d) Do classes and values even look good on the same line? In other words, does the short-hand increase readability to begin with?

   Clark: For single one-offs it appears to, but only when there isn't deep indentation and only when the value and class are short. This may be such a small use case that it isn't worth it.

| map: % !:Class
|     key: Value

Boy... getting busy.

    map: %
        !: Class
        =: Value

Conclusion: The above (or the first version) is so much more clear. Let's just stick with the "verbose version" until we have enough use cases to dictate an abbreviation?

| > ... this would allow for another section (very small)
| > called "information model".
|
| I thought that was already covered in the current core spec.
| Do you want to take it out? Why?

I was just thinking that the sections would be:

    1. Introduction
    2. Information Model
    3. Unicode Serialization
    4. Color Mechanism
    5. Canonical Form
    6. Incremental API
    7. Native API
    8. Bindings

Where the information model is a separate "chapter" from the serialization format. In particular, I don't like the idea that the serialization format is "core" but the Incremental API isn't core... This also leaves the door open for a "Binary Serialization". ;)

| So the Unicode/Binary issue is settled. Good.

Well, if we move with an "encoding color". The API can reflect the encoding. "base64" could be an encoded leaf, where "binary" could be a raw leaf. The default encoding would be "unicode"?

Thus, this brings up another question. Are there two information models? The base information model and then the information model plus "standard" colors?

| > I think we can discuss the canonical form when looking
| > at the writer code in the impl I'm (slowly) working on.
|
| So, let's just put it in a separate section.

Nice.

| It seems as though we have everything settled, except for the
| class notation issue. If my compromise proposal above is acceptable,
| we could roll version 1.0 out the door somewhen next week...

Let's just implement without an abbreviation mechanism and leave this as an open issue.
I have a funny feeling that, for common use cases, the abbreviation will be more trouble than it is worth. Only one way to find out... get an implementation out there and ask our users.

Best,

Clark
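
The "encoding" color (^) proposed in this message could be honoured by a loader roughly as follows. This is a minimal Python sketch under stated assumptions: a colored node is a plain dict, `decode_value` is a hypothetical name, only the base64 encoding is handled, and the sample data is invented for the illustration (the base64 text in the message is a placeholder, not real data).

```python
import base64


def decode_value(node):
    """Return the in-memory value of a node, honouring a '^' encoding color."""
    if not isinstance(node, dict):
        return node                      # uncolored scalar: nothing to decode
    raw = node.get("=")
    if node.get("^") == "base64":
        # Folded multi-line values may contain line breaks and indentation.
        compact = "".join(raw.split())
        return base64.b64decode(compact)
    return raw                           # unknown or absent encoding: keep text


document = {"!": "application/octet-stream", "^": "base64", "=": "aGVs\n   bG8="}
print(decode_value(document))
```

The class ('!') is left untouched here, which matches the message's point that the encoding is separate from the class: the encoding only says how the value was written into text.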
From: Oren Ben-K. <or...@ri...> - 2001-06-20 06:55:35
Hi Brian, welcome back. Now that you are here, let's settle some issues.

A. Spec releases:

(1) One evolving spec, with sections becoming "stable";
(2) A series of related specs.

Clark favors (1), I favor (2), and it seems Brian favors (2). I'm not certain what Jason's position is - though we all agree with his point that we don't want to prematurely freeze specs.

I think that most of the issues can be addressed if we do (2) but assign the version number "0.1" to our first release of each. This allows us to start implementing, but with the warning that 1.0 may be slightly different. If we discover we need to change things between "0.1" and "1.0" - to make specs work together, or just change our minds - we can introduce a "0.2" version, etc. Once we have enough of the related specs written, and enough implementation experience, we'll promote the latest version of each spec to "1.0". This means that "1.0" will be, from day one, a set of related and consistent specs.

As for the list of specs, Clark's enumeration is a good base. I think we need something along the lines of:

- File Format (Intro, Info model, Serialization).
- Color Idiom (Standard colors, perhaps shorthands).
- Canonical Form.
- Incremental API (Language independent, or maybe C).
- Incremental API for language X.
- Native API for language X.

We can do 0.1 of the first one - file format - right now; this would allow us to start working on the native APIs for Perl (Data::Denter), Python, etc. I think that as a rule, we shouldn't release a spec for an API without a reference implementation - thoughts?

B. Binary vs. Unicode:

(1) Put it in the information model;
(2) Demote it to a color.

Clark and Brian favor (2), I favor (1). I'm willing to go with (2). I'll have more to say when we define the particulars of this color. We have to ensure that "class" reflects "how will the object be represented in memory" and "encoding" reflects "how does the object appear in the YAML file".
Otherwise we quickly get into trouble. One good example for this is the class "date". It is possible to "encode" one as 10-JAN-2001, 01/10/2001, 2001-01-10, etc.

I agree with Brian, however, that we should keep the definitions of all the special colors out of the file format spec. Let's just reserve all single-indicator-character keys in it and move ahead.

C. Shortcut syntax for colors:

(1) Allow it;
(2) Don't support it.

Clark favors (1), and so does Jason (I think). Brian and I favor (2). I think we should go on with (2) for now. When we have a better grasp of what exactly the standard colors are, we can decide on shorthand syntax for them and introduce it in the 0.1 color spec or the 0.2 file format spec.

D. File format:

Given the above, it seems we have everything settled except for perhaps Brian's point with:

> .. And we don't *need* optional ':' for bulleting lists.

I won't let that stand between us and getting a 0.1 version out... If you feel that strongly about it (even as an option), I'll take it out.

So, quick poll: Should I roll out a "YAML-FILE-FORMAT 0.1 Release Candidate" this weekend? (My fingers are already itching to do it... :-)

Have fun,

Oren Ben-Kiki
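
The class/encoding split argued for under issue B can be illustrated with the date example: several file encodings, one in-memory class. This is a minimal Python sketch; the encoding names in `ENCODINGS` are hypothetical labels rather than proposed standard values, and `%b` assumes the default C/English month abbreviations.

```python
from datetime import date, datetime

# Hypothetical encoding names mapped to strptime patterns.
ENCODINGS = {
    "dd-mon-yyyy": "%d-%b-%Y",
    "mm/dd/yyyy": "%m/%d/%Y",
    "iso8601": "%Y-%m-%d",
}


def decode_date(text, encoding):
    """Turn the text found in the file into the in-memory date class."""
    return datetime.strptime(text, ENCODINGS[encoding]).date()


samples = [
    ("10-JAN-2001", "dd-mon-yyyy"),
    ("01/10/2001", "mm/dd/yyyy"),
    ("2001-01-10", "iso8601"),
]
for text, encoding in samples:
    print(encoding, decode_date(text, encoding))
```

All three spellings produce the same `date` object, so the class describes what the program holds in memory while the encoding only describes what was written in the file.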
From: Jason D. <ja...@in...> - 2001-06-20 14:17:14
|
> (1) One evolving spec, with sections becoming "stable";
> (2) A series of related specs.
>
> Clark favors (1), I favor (2), and it seems Brian favors (2). I'm not
> certain what Jason's position is - though we all agree with his point that
> we don't want to prematurely freeze specs.

Looking at your list below, I'm in the middle. The first three (file format, color idiom, canonical form) should be in the same spec. The APIs should be separate.

> I think that most of the issues can be addressed if we do (2) but assign the
> version number "0.1" to our first release of each. This allows us to start
> implementing, but with the warning that 1.0 may be slightly different. If we
> discover we need to change things between "0.1" and "1.0" - to make specs
> work together, or just change our minds - we can introduce a "0.2" version,
> etc. Once we have enough of the related specs written, and enough
> implementation experience, we'll promote the latest version of each spec to
> "1.0". This means that "1.0" will be, from day one, a set of related and
> consistent specs.

I like this.

> As for the list of specs, Clark's enumeration is a good base. I think we
> need something along the lines of:
>
> - File Format (Intro, Info model, Serialization).
> - Color Idiom (Standard colors, perhaps shorthands).
> - Canonical Form.
> - Incremental API (Language independent, or maybe C).
> - Incremental API for language X.
> - Native API for language X.
>
> We can do 0.1 of the first one - file format - right now; this would allow
> us to start working on the native APIs for Perl (Data::Denter), Python, etc.
> I think that as a rule, we shouldn't release a spec for an API without a
> reference implementation - thoughts?

We should have at least _5_ (thinking of all the languages that the four of us know) implementations and a conformance suite. Are we going to be checking these in to CVS on SourceForge under a common license?

> B. Binary vs. Unicode:
>
> (1) Put it in the information model;
> (2) Demote it to a color.
>
> Clark and Brian favor (1), I favor (2).

I've changed my mind to (2) as I think it should be possible to use encodings other than base64.

> I'm willing to go with (1). I'll have more to say when we define the
> particulars of this color. We have to ensure that "class" reflects "how will
> the object be represented in memory" and "encoding" reflects "how does the
> object appear in the YAML file". Otherwise we quickly get into trouble. One
> good example for this is the class "date". It is possible to "encode" one as
> 10-JAN-2001, 01/10/2001, 2001-10-01, etc.

But we should follow XML Schema's example and mandate one particular encoding for transfer (ISO 8601). User applications are, of course, free to display it in whatever format they choose.

> I agree with Brian, however, that we should keep the definitions of all the
> special colors out of the file format spec. Let's just reserve all
> single-indicator-character keys in it and move ahead.

I'm not sure if this is a good idea. Think of XML and Namespaces. Since they're divided into two specs, there's two classes of parsers: those that do and those that don't support namespaces. We don't want to have parsers that support the standard colors and those that don't, do we?

> C. Shortcut syntax for colors:
>
> (1) Allow it;
> (2) Don't support it.
>
> Clark favors (1), and so does Jason (I think). Brian and I favor (2).

I guess I need to be more clear when I write. I favor (2) but am open to the idea of (1) if you all see fit. My experience with writing an XML parser has really turned me off of special cases in the syntax.

> So, quick poll: Should I roll out a "YAML-FILE-FORMAT 0.1 Release Candidate"
> this weekend? (MY fingers are already itching to do it... :-)

Yes, please. It shouldn't be too hard to update my C# parser once you do but I'd really like to do some interoperability testing with the rest of you.
This is where we can start building the set of files that we can use for a conformance suite! Jason. |
From: Oren Ben-K. <or...@ri...> - 2001-06-20 14:14:54
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> date: 2001-JAN-02 ! org.yaml.Date # Today
>
> equivalent to...
>
> date: %
>     =: 2001-JAN-02
>     !: org.yaml.Date
>     #: Today
>
> Yes! I much prefer having the class/comment follow
> the value for one-line scalars.

This, of course, means that simple scalars can't contain any indicator characters. No more being able to write:

text: Alas! This is now illegal!

Have fun,

Oren Ben-Kiki |
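To make the shorthand-to-map equivalence above concrete, here is a minimal Python sketch. The function name expand_shorthand is hypothetical, and the splitting rule (first '#' starts the comment, first '!' before it starts the class) is an assumption made for illustration, not something the draft spec defines.

```python
def expand_shorthand(scalar):
    """Expand 'value ! class # comment' into a map keyed by '=', '!' and '#'.

    Assumption: the first '#' starts the comment and the first '!' before it
    starts the class. The thread does not pin down these rules.
    """
    value, _, comment = scalar.partition("#")
    value, _, cls = value.partition("!")
    result = {"=": value.strip()}
    if cls.strip():
        result["!"] = cls.strip()
    if comment.strip():
        result["#"] = comment.strip()
    return result


# The example from the message expands to the three-key map.
date_map = expand_shorthand("2001-JAN-02 ! org.yaml.Date # Today")

# A plain sentence containing '!' is misread as value plus class, which is
# why such simple scalars could no longer be written unquoted.
alas_map = expand_shorthand("Alas! This is now illegal!")
```

Under these assumptions the second call yields a value of "Alas" with the rest of the sentence treated as a class name, which is the surprise being objected to.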
From: Oren Ben-K. <or...@ri...> - 2001-06-20 14:45:46
|
Jason Diamond [mailto:ja...@in...] wrote:
> > We can do 0.1 of the first one - file format - right now; this would allow
> > us to start working on the native APIs for Perl (Data::Denter), Python, etc.
> > I think that as a rule, we shouldn't release a spec for an API without a
> > reference implementation - thoughts?
>
> We should have at least _5_ (thinking of all the languages that the four of
> us know) implementations and a conformance suite.

A conformance suite is a neat idea. As for requiring 5 implementations, I thought of each language as having its own API binding (sub) spec. So we can do this incrementally.

> Are we going to be
> checking these in to CVS on SourceForge under a common license?

Good question. GPL, LGPL, BSD, Mozilla, Artistic, ... So many to choose from...

> > B. Binary vs. Unicode:
> >
> > (1) Put it in the information model;
> > (2) Demote it to a color.
> >
> > Clark and Brian favor (1), I favor (2).

Sorry, I meant the other way around, of course.

> I've changed my mind to (2) as I think it should be possible to use
> encodings other than base64.

Yes, this seems to be the way the wind is blowing.

> ... we should follow XML Schema's example and mandate one particular
> encoding for transfer (ISO 8601). User applications are, of course, free to
> display it in whatever format they choose.

I thought we agreed on UTF-8/16 as the "particular encoding for transfer"...

> > ... Let's just reserve all
> > single-indicator-character keys in it and move ahead.
>
> I'm not sure if this is a good idea. Think of XML and Namespaces. Since
> they're divided into two specs, there's two classes of parsers: those that
> do and those that don't support namespaces. We don't want to have parsers
> that support the standard colors and those that don't, do we?
That's why I suggested we only promote the version of the specs to 1.0 when we have a reasonably baked, implemented set of specs; this way YAML-COLOR-1.0 won't be an optional, come-later spec, it would be part of the YAML 1.0 spec from day one.

As for parser support, there isn't anything to it, really. I'll just change the file format (sub) spec to state that indicator characters are legal in keys. This means every parser will support such keys from day zero (that is, even before "YAML 1.0" is announced :-).

We'll throw in a comment that any key starting with an indicator character may be given a "standard meaning" in the color (sub) spec, so people will avoid them in the meanwhile.

So unlike XML's namespaces, there's no danger of incompatibility or "levels of parsers".

Have fun,

Oren Ben-Kiki |
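A small Python sketch of the idea above: the parser accepts every key, and a separate helper merely separates out keys that start with an indicator character so a later color spec can give them meaning. The helper name split_reserved_keys and the set ASSUMED_INDICATORS are hypothetical; the set only collects indicator characters that happen to appear in this thread, and the real list would come from the file format spec.

```python
# Assumption: only the indicator characters mentioned in this thread.
ASSUMED_INDICATORS = frozenset("!#=%:^")


def split_reserved_keys(mapping):
    """Split a parsed map into (plain, reserved) parts.

    A key counts as reserved when its first character is an assumed
    indicator. Nothing is rejected: every key stays legal for the parser.
    """
    plain, reserved = {}, {}
    for key, value in mapping.items():
        target = reserved if key[:1] in ASSUMED_INDICATORS else plain
        target[key] = value
    return plain, reserved


plain, reserved = split_reserved_keys(
    {"=": "2001-JAN-02", "!": "org.yaml.Date", "name": "birthday"}
)
```

Because the split happens after parsing, a parser written against the file format spec alone would not need to change when the color spec arrives.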
From: Jason D. <ja...@in...> - 2001-06-20 15:04:11
|
> > ... we should follow XML Schema's example and mandate one particular
> > encoding for transfer (ISO 8601). User applications are, of course, free to
> > display it in whatever format they choose.
>
> I thought we agreed on UTF-8/16 as the "particular encoding for
> transfer"...

Yes, but ISO 8601 (and the W3CDTF which is a subset of 8601 at http://www.w3.org/TR/NOTE-datetime) define the sequence of characters used to represent a date/time.

> > > ... Let's just reserve all
> > > single-indicator-character keys in it and move ahead.
> >
> > I'm not sure if this is a good idea. Think of XML and Namespaces. Since
> > they're divided into two specs, there's two classes of parsers: those that
> > do and those that don't support namespaces. We don't want to have parsers
> > that support the standard colors and those that don't, do we?
>
> That's why I suggested we only promote the version of the specs to 1.0 when
> we have a reasonably baked, implemented set of specs; this way
> YAML-COLOR-1.0 won't be an optional, come-later spec, it would be part of
> the YAML 1.0 spec from day one.
>
> As for parser support, there isn't anything to it, really. I'll just change
> the file format (sub) spec to state that indicator characters are legal in
> keys. This means every parser will support such keys from day zero (that is,
> even before "YAML 1.0" is announced :-).
>
> We'll throw in a comment that any key starting with an indicator character
> may be given a "standard meaning" in the color (sub) spec, so people will
> avoid them in the meanwhile.
>
> So unlike XML's namespaces, there's no danger of incompatibility or "levels
> of parsers".

Cool!

Jason. |
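As an illustration of the distinction drawn above (character encoding of the file versus the character sequence used for a date), here is a minimal Python sketch that rewrites one of the display forms mentioned in the thread, DD-MON-YYYY, as an ISO 8601 calendar date. The function name to_iso8601 and the month table are assumptions for this sketch; only datetime.date and its isoformat() method come from the standard library.

```python
from datetime import date

# English three-letter month abbreviations, spelled out so the sketch does
# not depend on the process locale.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def to_iso8601(text):
    """Convert 'DD-MON-YYYY' (for example 10-JAN-2001) to 'YYYY-MM-DD'."""
    day, month, year = text.split("-")
    return date(int(year), MONTHS[month.upper()], int(day)).isoformat()


transfer_form = to_iso8601("10-JAN-2001")  # "2001-01-10"
```

An application would remain free to render the date however it likes; only the form written to the file would be fixed.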
From: Clark C . E. <cc...@cl...> - 2001-06-20 15:10:56
|
On Wed, Jun 20, 2001 at 04:46:22PM +0200, Oren Ben-Kiki wrote:
| A conformance suite is a neat idea.

+1

| > Are we going to be checking these in to CVS on
| > SourceForge under a common license?
|
| Good question. GPL, LGPL, BSD, Mozilla, Artistic, ... So many to choose
| from...

Yes. yaml.sourceforge.net -- I can add you as a developer, Jason! Do you have a SF userid? As for the license, I think each binding should match what the language is in. For Python, it should be the Python license. For Perl, the Artistic License, etc. For Java, you could make it BSD or something not too restrictive. As for the C implementation, I think this will have to be cross-licensed... Artistic, Python, BSD, etc. I'm not all that fond of GPL since I'd like our stuff to be used by proprietary software vendors as well as free software folks.

| > > B. Binary vs. Unicode:
| >
| > I've changed my mind to (2) as I think it should be possible to use
| > encodings other than base64.
|
| Yes, this seems to be the way the wind is blowing.

Good.

| I thought we agreed on UTF-8/16 as the "particular
| encoding for transfer"...

We did. It is UTF-8 unless there is a BOM, see the spec.

| > > Let's just reserve all single-indicator-character
| > > keys in it and move ahead.
| >
| > I'm not sure if this is a good idea. Think of XML and
| > Namespaces. Since they're divided into two specs,
| > there's two classes of parsers: those that do and
| > those that don't support namespaces. We don't want to
| > have parsers that support the standard colors and
| > those that don't, do we?
|
| That's why I suggested we only promote the version of
| the specs to 1.0 when we have a reasonably baked, implemented
| set of specs; this way YAML-COLOR-1.0 won't be an optional,
| come-later spec, it would be part of the YAML 1.0 spec
| from day one.

Ok. It seems that we are on the same page here. Although, I'd rather it be one big file so that it is clear that it is one unified spec rather than two distinct specs.
Nothing saying that the spec can't be divided into "Articles".

| > date: 2001-JAN-02 ! org.yaml.Date # Today
|
| This, of course, means that simple scalars can't contain any
| indicator characters. No more being able to write:
|
| text: Alas! This is now illegal!

Might be worth it, as one can always use the quoted form:

text: "Alas! This is still legal!"

The only problem is that I'd like HTML to be pastable literally...

html: <HTML><HEAD><TITLE>Embedded HTML!</TITLE></HEAD>
      <BODY COLOR="FFFF">This is YAML & HTML</BODY>
      </HTML>

To do this, we could distinguish between "single line" and "multi line" simple scalars again, excluding the case below:

mixed: This would no
       longer be legal.

Thoughts?

Clark |
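On the "UTF-8 unless there is a BOM" rule mentioned in the message above, a minimal Python sketch of how a reader might pick the decoder. The function name decode_stream is hypothetical, and treating a UTF-16 byte order mark as the only alternative is an assumption; the spec itself is the authority on which marks are recognized. The codecs constants used here are from the Python standard library.

```python
import codecs


def decode_stream(data):
    """Decode raw bytes as UTF-16 if they start with a UTF-16 BOM, else UTF-8.

    Assumption: only UTF-16 byte order marks are checked. UTF-32 and the
    optional UTF-8 signature are outside the scope of this sketch.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to choose the byte order and
        # drops it from the decoded text.
        return data.decode("utf-16")
    return data.decode("utf-8")


utf8_text = decode_stream("text: hi".encode("utf-8"))
utf16_text = decode_stream(codecs.BOM_UTF16_LE + "text: hi".encode("utf-16-le"))
```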
From: Oren Ben-K. <or...@ri...> - 2001-06-20 15:14:37
|
Jason Diamond [mailto:ja...@in...] wrote:
> > > ... we should follow XML Schema's example and mandate one particular
> > > encoding for transfer (ISO 8601). User applications are, of course,
> > > free to display it in whatever format they choose.
> >
> > I thought we agreed on UTF-8/16 as the "particular encoding for
> > transfer"...
>
> Yes, but ISO 8601 (and the W3CDTF which is a subset of 8601 at
> http://www.w3.org/TR/NOTE-datetime) define the sequence of characters used
> to represent a date/time.

Sorry - I mixed it up with the character set specs. Hmmm. Is it really possible/practical/useful to define the "one true encoding" for every data type?

Have fun,

Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2001-06-20 15:16:07
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> I'm not all that fond of GPL since I'd like our
> stuff to be used by proprietary software vendors
> as well as free software folks.

Yeah, LGPL would be the maximum, I guess.

> | That's why I suggested we only promote the version of
> | the specs to 1.0 when we have a reasonably baked, implemented
> | set of specs; this way YAML-COLOR-1.0 won't be an optional,
> | come-later spec, it would be part of the YAML 1.0 spec
> | from day one.
>
> Ok. It seems that we are on the same page here. Although,
> I'd rather it one big file so that it is clear that it
> is one unified spec rather than two distinct specs. Nothing
> saying that the spec can't be divided into "Articles".

It is easier to manage a set of files... Links, prints, etc.

> To do this, we could distinguish between "single line"
> and "multi line" simple scalars again, excluding the
> case below:
>
> mixed: This would no
>        longer be legal.
>
> Thoughts?

I don't think it is worth it. Also, we didn't resolve the main problem we had with the shorthand notation - it makes a map look like a scalar, and hence surprises people. At the time this was deemed to be a serious problem. Are we comfortable with it again?

Have fun,

Oren Ben-Kiki |
From: Clark C . E. <cc...@cl...> - 2001-06-19 14:11:23
|
On Tue, Jun 19, 2001 at 10:11:10AM +0200, Oren Ben-Kiki wrote:
| Let's finalize the YAML 1.0 spec purely as a file format (plus requirements
| from the data model, of course). Drop all the API stuff, the discussions of
| color, etc. Give up on the !class shorthand, but we'll reserve all
| single-indicator-character keys for "special semantics" to be defined in
| separate API specs. Call this spec YAML-CORE. I could probably finalize it
| in a weekend (given we settle the binary/Unicode issue).

I'd rather have this all be one spec to give the sense of unity. That being said, there is nothing preventing this spec from having multiple sections, and adopting each section independently.

So... let's finalise on the binary/unicode issue. There does not seem to be a good solution at the syntax level, so let's punt the issue for now. Perhaps the API may have a binary/unicode distinction, but let us leave the YAML 1.0 file format unicode only.

As for dropping the class notation: I spoke with Brian on the phone yesterday and he seemed resistant to this idea. So, a bit more thought may have to go into this.

| There are going to be two types of APIs. Let's call them "Native" and
| "Incremental" APIs. I suggest we spec them in a separate document from the
| core YAML spec. First, we'll close specs faster. Second, including just one
| API in the core spec would give it more importance then the other, which
| would be wrong. Each of them has its place.

Ok.

| The "Native" API is a simple load/save API using the language native data
| structures. We'll spec this quickly (once per language?), and probably
| implement it as quickly. We'll end up with a working, very useful, YAML
| support for Perl (Data::Denter), Python, C#, Java, JavaScript, and possibly
| C++ (based on stl; a bit of a challenge, unless we use RTTI).
| I suggest that to make quick progress, we just include a simple
| definition of v/w in such specs (as a library function), but not waste
| too much time hammering out the details of color behavior. Call this the
| YAML-NATIVE-API spec.

Ok.

| We then move our focus on defining the "Incremental" API, which is a
| (mostly) language-independent push/pull API spec which fully supports color
| etc. This would be the YAML-INCREMENTAL-API spec. Much fun here... (I admit
| to not catching up on the work you've done there yet).

Ok. Your feedback on the yaml.h file would be good...

| In a fourth spec we'll go through the color idiom, what the standard colors
| are, what standard filters an implementation should/could provide, etc. Call
| this YAML-COLOR.

Right.

| Quite a job, right? But it drives home the point we simply can't try to do
| it all in one spec. We'll never finalize it; we'll be debating an
| YAML-INCREMENTAL-API issue, say, and in the meanwhile we are not releasing a
| YAML-CORE spec - because both are included in the same document.

Well... I think I'd like to have them in separate "sections" of the same specification. We can finish the serialization format and label this section FINAL, while leaving the other sections open. Also, this would allow for another section (very small) called "information model".

| I'd want v/w to be a "must" part of the Native API, in the sense
| that "you aren't YAML if you don't provide them" (they are trivial
| enough to implement). But they are clearly optional in the sense
| that you don't have to use them.

Right.

| > On the subject of encoding. I'm starting to like the
| > idea of using a ^ indicator for this purpose.
|
| OK, wait a sec. You've written:
|
| > I was taking a walk with my g.f. and discussing this item,
| > it seems that there is a tripartite situation:
| >
| >           | Java / C#  | Python  | Perl                       |
| > ----------+------------+---------+----------------------------+
| > ASCII     | String     | String  | Scalar (UTF8 or Otherwise) |
| > UNICODE   | String     | Unicode | UTF8 Marked Scalar         |
| > BINARY    | Byte []    | String  | Not UTF8 Marked Scalar     |
|
| So, if I'm reading this right, you are suggesting we distinguish between
| "ASCII text" and "Unicode text". In the information model. Hmmm.

I respectfully withdraw the ASCII suggestion.

| I predict that with time the problem will slowly fade away.
| Perl and Python will come to terms with Unicode. So, with
| time the accuracy of our heuristic (16-bit == Text,
| 8-bit == binary) would rise. The "override" capabilities
| would be used less and less, and eventually we'll put the
| problem behind us.

Ok.

| I've no problem with defining a canonical form (4 space indentation,
| multi-lines values start on a line of their own, use the simplest possible
| text style, put a single space between the key and the following ':', one
| space between the ':' and the value, use "most folded" white space for text,
| break lines after 76 characters but include at least 32 text characters, use
| prefix ':' in lists only for multi-line simple scalars, align multi-line
| quoted string text by one space as in our examples - I think that covers
| it). The problem is that at least some of these decisions are a matter of
| taste. It would be different if we defined the canonical form as "the form
| with the minimal number of characters" - but then it may not end up to be
| very readable. And of course we may simply not define one at
| all. Thoughts?

I think we can discuss the canonical form when looking at the writer code in the impl I'm (slowly) working on. We can codify this... and then debate if the right balance was chosen. As for the prefix ":", this will definitely not be in the canonical form, as Brian has objected to it.
For multi-line scalars in a list either the block or quoted form will be used ...

| As for 32-bit characters, I haven't done my homework on that.
| What does this mean, exactly? Is it OK to say that YAML uses
| 16-bit characters and that the 32-bit characters are always
| represented as a "surrogate pair"?

Yes. We are using Unicode, so one may use UTF8, UTF16, or UTF32 as an internal character model; they are all encodings of the same character set, UCS4. Recently UCS4 has been restricted to those code values expressible by Unicode, approx. 2^21 characters. Thus the ISO universal character set is now the same character set used by Unicode. So much of this "confusion" is now gone. UTF8, UTF16, and UTF32 all encode exactly the Char production in the YAML spec. Nothing more, nothing less.

Of course we cannot dictate what Unicode encoding a language binding uses; it could be UTF8 (Perl). In C, wchar_t is often either UTF16 or UTF32, depending upon the unix platform.

I understand that the unicode/ucs values over 0xFFFF have not been assigned, although I have heard that many Chinese characters which are not yet represented by the universal character set will be put into this region.

Best,

Clark |
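For readers following the surrogate pair question above, a minimal Python sketch of how UTF-16 represents a code point above 0xFFFF as two 16-bit units. The function name to_surrogate_pair is hypothetical; the arithmetic is the standard UTF-16 rule, and the last line cross-checks it against Python's built-in utf-16-be codec.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


first_pair = to_surrogate_pair(0x10000)   # (0xD800, 0xDC00)
last_pair = to_surrogate_pair(0x10FFFF)   # (0xDBFF, 0xDFFF)

# Cross-check against the standard codec: same two 16-bit units, big-endian.
encoded = "\U00010000".encode("utf-16-be")
```

The range check reflects the restriction mentioned above: code points stop at 0x10FFFF, which is what lets UTF8, UTF16 and UTF32 cover exactly the same set.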