From: Oren Ben-K. <or...@ri...> - 2001-05-29 17:06:00
|
Hi Guys,

I'm back at last from my convention (which was lots of fun, and educational too). I finally had the time to catch up on the 50 messages in this list since I left... You two have sure been busy, and doing a great job. I also went through the YAML spec draft - great going, Clark - first "on its own" and second after reviewing the messages for context.

So, at the risk of this becoming a long message :-), here's my view of the current state.

=== Rationale ===

A good rationale is harder to write than the original spec. On the other hand, a spec with a good rationale is much more useful. I think we should give this some thought. We have the advantage that we don't meet face-to-face, so this list provides a good record of how each point was decided...

=== Encoding/Binary ===

It seems as though the problem here is that we are trying to handle two separate "data types" in the same way. A string of (Unicode) characters is a different beast than a binary blob of bytes.

We have decided that YAML will be "printable" - that is, a YAML file is a string, using only the "printable" subset of the Unicode characters. Implicitly this means it may contain characters beyond 7-bit ASCII - but these are still Unicode characters and not arbitrary binary bytes.

Therefore, when a binary blob is written into YAML format it must be encoded. Clark suggested using base64 as the universal way to achieve that. This seems reasonable, with the objection that it is a horribly wasteful way to encode binary blobs under UTF-16. I wonder if there is a base4096 or something for that case? I never heard of one...

At any rate, note that you can't achieve the same effect by using something like "\xXX" or "\uUUUU". These escape sequences still denote text *characters*, not blob *bytes*. This distinction isn't automatic to old-time C programmers (like myself) used to "char == byte". But it is essential to make it, otherwise things get very messy.
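The character/byte distinction above can be made concrete with a minimal Python sketch. The helper names `blob_to_text` and `text_to_blob` are hypothetical and not part of any YAML draft; the sketch only assumes that a blob is written as base64 text, as Clark suggested, and measures how large that text becomes under UTF-8 and UTF-16.

```python
import base64


def blob_to_text(blob: bytes) -> str:
    """Encode arbitrary bytes as printable base64 text (hypothetical helper)."""
    return base64.b64encode(blob).decode("ascii")


def text_to_blob(text: str) -> bytes:
    """Decode base64 text back into the original bytes (hypothetical helper)."""
    return base64.b64decode(text)


# A blob containing every possible byte value survives the round trip.
blob = bytes(range(256))
encoded = blob_to_text(blob)
assert text_to_blob(encoded) == blob

# An escape such as "\xff" names the *character* U+00FF, not the byte 0xFF.
# Once the text is stored as UTF-8, that one character takes two bytes.
escaped = "\xff"
utf8_bytes = escaped.encode("utf-8")

# Storage cost of the same base64 text under two encodings of the document.
size_utf8 = len(encoded.encode("utf-8"))
size_utf16 = len(encoded.encode("utf-16-le"))
print(len(blob), size_utf8, size_utf16)
```

For this 256-byte blob the base64 text is 344 characters long, so it occupies 344 bytes in a UTF-8 document and 688 bytes in a UTF-16 one, which is the waste the message objects to.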
Clark suggested that all scalar syntaxes are to be equivalent - it doesn't matter if one is using base64, quoted string, block or "simple value"; the in-memory result is the same. As shown above, this can't be true for *writing* values; a binary blob may only be written in base64. Why, therefore, not make the same distinction when *reading* values?

Supposing the information model does make this distinction, then Brian's (wonderful) YAR format becomes possible, regardless of encoding. If a file contains only or "mostly" printable characters, then emit it as a text block. Otherwise, emit it as a base64 blob. This requires a two-pass algorithm through the file, but I think there's no helping that. On the Mac and Be, if the MIME type is text/*, emit it as a block, otherwise as a base64 blob.

How to make this distinction in the data model is an open issue, of course. In Java, which was born Unicode-aware, there's no problem distinguishing between a String and a byte[]. I assume something similar may be done in Python and Perl. C is trickier - I don't quite see how to work it into the API - but C programmers already know that "char = byte" doesn't mean "string = blob"; a string ends with a \0, a blob has a length.

=== Top Level Production ===

I see we are back to "list of maps, separated by blank lines". Great. The API basics given in the YAML spec aren't explicit on how this translates to actual calls. I'd expect that the "next()" calls at the top-level of the parser will return a map node, one per "map block" in the file. As for the emitter, I'd expect that repeatedly calling "begin()" and "end()" on the top-level cursor will emit multiple map blocks, but the text doesn't seem to support this. I think that this should be made explicit.

=== Classes and Color ===

Clark made a good case against comments - we can't use them because either they aren't part of the information model, and won't round-trip; or they round-trip and hence must be a part of the data model.
He's right. The same argument carries over to classes. The current spec states that if a class is not recognized, a warning is emitted and the class is ignored. This is unacceptable - consider a YAML pretty-printer: it will recognize a very small set of classes, if any. Yet it must preserve the class names.

The problem seems a classical place to use the color idiom. We have a piece of information - the class name - which we want to attach to any nodes in the YAML document, including scalar nodes. Some applications (the pretty printer) aren't interested in this information, but must preserve it. Other applications (e.g., a YAML-based application server) rely on this information.

There's no problem for attaching such information to map nodes. Use some special key for the class info - say, '#' - and you are done. For scalar and list nodes, the color idiom suggests you wrap them in a map node. Use the same key for specifying your "color", and another special key (say, '=') to specify the *value* of the node:

    delivery: %
        = : 2000-JAN-10
        # : date

The trick is that the random-access APIs should allow you to call "getValue()" on the 'delivery' node and obtain the date value. This is acceptable to an interface such as XML's DOM. It probably could work in languages dynamic enough to hack the type system into doing this implicitly: de-serialize 'delivery' into something which *behaves like* the string '2000-JAN-10' but is *at the same time* a map with two keys. Complex? Sure, but Brian said, "don't worry about the implementation" :-)

Seriously, I think there's cause to worry here. The color idiom has turned out to be the solution for many practical problems - not the least of which is schema evolution. Maybe you guys can come up with a better way to do it than wrapping scalar/list nodes in a map; I couldn't find any. But I strongly feel that YAML should address this somehow - and that classes are the perfect way to "eat our own dog food" in this regard.
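One way to picture a value that "behaves like" a string while still carrying its color is the following minimal Python sketch. It is only an illustration under assumptions: `ColoredScalar`, `load_node` and `dump_node` are hypothetical names, the wrapper map is modelled as a plain dict, and the '=' and '#' keys are the ones proposed in this message rather than anything the spec defines.

```python
class ColoredScalar(str):
    """A string that also remembers its class name (its "color")."""

    def __new__(cls, value, color=None):
        obj = super().__new__(cls, value)
        obj.color = color
        return obj


def load_node(node):
    """Collapse a {'=': value, '#': class} wrapper map into a ColoredScalar.

    Any other node (plain scalar, ordinary map, list) is returned unchanged.
    """
    is_wrapper = isinstance(node, dict) and "=" in node
    if is_wrapper and not isinstance(node["="], (dict, list)):
        return ColoredScalar(node["="], node.get("#"))
    return node


def dump_node(value):
    """Re-create the wrapper map so the class name survives a round trip."""
    if isinstance(value, ColoredScalar) and value.color is not None:
        return {"=": str(value), "#": value.color}
    return value


delivery = load_node({"=": "2000-JAN-10", "#": "date"})
print(delivery, delivery.color, dump_node(delivery))
```

A pretty-printer can treat `delivery` as an ordinary string and ignore `color`, yet `dump_node` still writes the class name back out. List nodes would need a similar wrapper, which this sketch does not cover.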
=== Text syntax ===

Do we really need so many different formats? *4*? I see why we'd want base64 - that's not really a text format, anyway. I can see why we'd want one format for cut-and-paste and one format for allowing escaping. That would be the block format and either the string or the simple scalar format. There's also the issue of multi-line values allowed in a map but not in a list. So we actually have 4.5 formats (3.5 for text, one for blobs). Do we really need all of these? Can't we make do with just one block format and one stream format?

I also strongly dislike the fact that indentation rules are suspended in a string value. Why is that? I see the pain - one won't be able to, say, quickly skip a sub-tree by simply looking for a properly indented line; readability suffers, etc. I don't see the gain: Cut&paste by itself isn't a good enough answer - if you are doing it by hand, any editor will easily indent the new block of text, and if you are doing it by a tool, well just add the indentation while you are at it.

In E-mail messages Clark said that base64 blocks are indented, but the spec says otherwise. Which is right? What's the harm in indenting blobs? We already allow long lines in blocks... in fact, there's no longer a way to break a long line in a block into two shorter ones.

I realize you had just about enough of this syntax issue, and that I'm fresh out of a 10 day vacation from it, but I think that either some strong rationale or some more work on this is due. I'll try to see if I can come up with any new ideas...

=== E-mail address ===

I'd rather you list me as or...@be... and not or...@ri... since YAML isn't very related to my day job and also because the other address is more stable - I hope to keep it for a long time to come, while companies come and go.

Sorry for the long message - I had a lot of catching up to do.

Have fun,

Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-05-30 18:16:38
|
Clark C . Evans [mailto:cc...@cl...] wrote: > On Tue, May 29, 2001 at 07:06:42PM +0200, Oren Ben-Kiki wrote: > | Supposing the information model does make this distinction > > Ok. Let's make the binary/unicode distinction in > the information model. Brian? Can you say a word on how this would work in Perl? > | === Classes and Color === > ... > I'd like to hear brian's perspective on this one. I actually > kinda like it, although I'm partial since we had talked about it > for a long time on the sml-dev list. We would need to push forward > on the API notion of "getValue()" as returning the node with > a blank key. Building this mechanism in is very cool since it > allows for "schema substitutability". However, I'm not sure > how well it works for data serilization.... Hmmm. If one is serializing a built-in map/list/scalar, you do it normally; when serializing an "Object" it probably gets serialized into a map, with a class attribute. It is only "typed scalars" which are a problem (e.g., a number, a date, etc.). One problem about the proposed class-as-color syntax is that it is rather cumbersome for something like a simple number... > (on a technical note, I think "", a blank key, should be the > default value for a node, and perhaps "__class__" could be > the class name). I rather like '=' for 'value' - it has the right intuitive semantics. '__class__' for type is too verbose for my taste. > The primary rationale for the [ and " formats breaking > the indentaiton rules is to allow for easy cut and paste. I don't buy that. Cut&paste implies a tool - an editor or a program. Most editors allow you to trivially indent a group of lines - certainly any editor used for writing YAML had better support it. And a program doing cut&paste can add indentation easily enough. Besides, if "raw" cut&paste is that important, doesn't it apply to blocks as well? 
we can simply use: block: |===arbitrary marker line=== block text, raw, not indented at all, no prefix for the lines, cut & paste into it whichever way you want. |===arbitrary marker line=== \===arbitrary marker line (if no trailing newline)=== Perl people know this as the '<<EOF' approach :-) This would leave simple scalars as the only type of value which must be indented - and given that most of these are single-line, that hardly matters. You'd end up with files where every multi-line value is not indented, which I think will really ruin readability. I'd rather we stick with strict indentation for everything, including blocks, blobs and double-quoted streams. Separate issue: having both double-quoted and simple scalar values. Here's a proposal to use just one form: - Block: as today, but use ` instead of \ to mark the end line. - Scalar value: as today, allow escaping using \, always allow it to be multi-line, but the continuation lines must be more indented then the first line. - List: as today (no special marker), but allow : as an optional prefix for scalar values. Example: @ Ugly multi line text Pretty single line text with NL\n : Ugly single line text w/o NL : Pretty multi line text |multi line |block with NL ` |multi line `block w/o NL\n [base64 blob] Everything is always strictly python-indented. > I'm not going to be able to put much more time into > this in the next few months, as I'm a key player > in a start-up, xgenda.com, which hopes to publicly > launch our product by September. Hmmm. As I see it, we have at least one major issue to resolve (the class issue) plus another where a "dictator hat" may be required (the text syntax). Then we can all start doing development at our own pace... Have fun, Oren Ben-Kiki |
From: Brian I. <briani@ActiveState.com> - 2001-05-30 20:23:20
|
Oren Ben-Kiki wrote: > Sorry guys. I'm a little busy this week. I started to reply to Oren's message yesterday, but abandoned it because I couldn't give it the required brain power. I'll try to comment quickly. > Clark C . Evans [mailto:cc...@cl...] wrote: > > On Tue, May 29, 2001 at 07:06:42PM +0200, Oren Ben-Kiki wrote: > > | Supposing the information model does make this distinction > > > > Ok. Let's make the binary/unicode distinction in > > the information model. > > Brian? Can you say a word on how this would work in Perl? Perl is a little up in the air on Unicode. I believe that recent versions use UTF8 by default for *most* string operations. A byte mode can be set to default back to the good ole days. FWIW, I suggest using the single quote for Unicode data. (Get it? *Uni* code :-). Using Clark's backtick for binary base-64 data. Using '|,\' for blocks. Using double quotes for everything else (folded streams). Unquoted streams are a convenience option for single line values only. > > > | === Classes and Color === > > ... > > I'd like to hear brian's perspective on this one. I actually > > kinda like it, although I'm partial since we had talked about it > > for a long time on the sml-dev list. We would need to push forward > > on the API notion of "getValue()" as returning the node with > > a blank key. Building this mechanism in is very cool since it > > allows for "schema substitutability". However, I'm not sure > > how well it works for data serilization.... > > Hmmm. If one is serializing a built-in map/list/scalar, you do it normally; > when serializing an "Object" it probably gets serialized into a map, with a > class attribute. It is only "typed scalars" which are a problem (e.g., a > number, a date, etc.). One problem about the proposed class-as-color syntax > is that it is rather cumbersome for something like a simple number... Classes are a can of worms. 
Although I'm not against the color idea, Perl will never embrace the idea of having everything be an object. Using getValue() as the normal method of retrieving a simple value is just too much work for Perl people. A map should just become a hash, plain and simple. I can come up with some creative ways of preserving YAML classes in Perl transparently, but my plate's too full right now. (I'm trying to be a famous Perl guy in other realms, ya know ;) > > > (on a technical note, I think "", a blank key, should be the > > default value for a node, and perhaps "__class__" could be > > the class name). > > I rather like '=' for 'value' - it has the right intuitive semantics. > '__class__' for type is too verbose for my taste. Agree. > > The primary rationale for the [ and " formats breaking > > the indentaiton rules is to allow for easy cut and paste. > > I don't buy that. Cut&paste implies a tool - an editor or a program. Most > editors allow you to trivially indent a group of lines - certainly any > editor used for writing YAML had better support it. And a program doing > cut&paste can add indentation easily enough. > > Besides, if "raw" cut&paste is that important, doesn't it apply to blocks as > well? we can simply use: > > block: |===arbitrary marker line=== > block text, > raw, not indented at all, > no prefix for the lines, > cut & paste into it whichever way you want. > |===arbitrary marker line=== > \===arbitrary marker line (if no trailing newline)=== > > Perl people know this as the '<<EOF' approach :-) This would leave simple > scalars as the only type of value which must be indented - and given that > most of these are single-line, that hardly matters. You'd end up with files > where every multi-line value is not indented, which I think will really ruin > readability. I'd rather we stick with strict indentation for everything, > including blocks, blobs and double-quoted streams. I think it's nice to allow the cut-and-paste and let emitters always reformat. 
But I'll support strict for now. > > Separate issue: having both double-quoted and simple scalar values. Here's a > proposal to use just one form: > > - Block: as today, but use ` instead of \ to mark the end line. > - Scalar value: as today, allow escaping using \, always allow it to be > multi-line, but the continuation lines must be more indented then the first > line. > - List: as today (no special marker), but allow : as an optional prefix for > scalar values. > > Example: > > @ > Ugly multi > line text > Pretty single line text with NL\n > : Ugly single line text w/o NL > : Pretty multi > line text > |multi line > |block with NL > ` > |multi line > `block w/o NL\n > [base64 > blob] > > Everything is always strictly python-indented. Whatever :( If you are once again suggesting dropping double quotes, you're needlessly going down a road of pain. Double quotes are simple, intuitive and will always work for folded text (indented or not). > > > I'm not going to be able to put much more time into > > this in the next few months, as I'm a key player > > in a start-up, xgenda.com, which hopes to publicly > > launch our product by September. > > Hmmm. As I see it, we have at least one major issue to resolve (the class > issue) plus another where a "dictator hat" may be required (the text > syntax). Then we can all start doing development at our own pace... I'll volunteer as dictator :) I want to start doing something on this after June 15th or so. I hope we can resolve things by then. We were very close last week. Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-30 20:53:02
|
On Wed, May 30, 2001 at 01:23:41PM -0700, Brian Ingerson wrote: | Perl is a little up in the air on Unicode. I believe that recent | versions use UTF8 by default for *most* string operations. A byte mode | can be set to default back to the good ole days. | | FWIW, I suggest using the single quote for Unicode data. (Get it? *Uni* | code :-). Using Clark's backtick for binary base-64 data. Using '|,\' | for blocks. Using double quotes for everything else (folded stream). | Unquoted streams are a convenience option for single line values only. Hmm. This is interesting, instead of marking which nodes are "binary" we mark which nodes are "unicode". Not a bad compromise, as it is far easier with the language to know if something is unicode or not.... I like this, it closely mirrors what Python and C does. It is the unicode which is treated differently, allowing regular "char" strings and binary to still be used interchangeably. This has one *big* impact, though. A YAML document cannot be encoded using UTF-16 (via the BOM), although leaves can be encoded with UTF-8. Also, I'm not sure that I like limiting unquoted streams to a single line. Being able to cut/paste in a HTML text without having to escape the quotes and such is very valueable. Further, I thought we had agreed on [base64] instead of the back tick. I *really* like the unicode idea though... it solves alot of problems rather cleanly. | Classes are a can of worms. | | Although I'm not against the color idea, Perl will never embrace the | idea of having everything be an object. Using getValue() as the normal | method of retrieving a simple value is just too much work for Perl | people. A map should just become a hash, plain and simple. I can come up | with some creative ways of preserving YAML classes in Perl | transparently, but my plate's too full right now. A global "class map" could work. It's not pretty... Alternatively, we can drop the class idea for now. 
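A "class map" of the kind mentioned here could be a side table kept by the loader, so that maps, lists and scalars stay plain native values. The following is a minimal Python sketch under that assumption; `ClassMap`, `tag` and `class_of` are hypothetical names, and keying the table by object identity is only one possible choice.

```python
class ClassMap:
    """Side table that remembers the YAML class of loaded native values."""

    def __init__(self):
        # Keyed by id(); the node itself is stored as well, which keeps it
        # alive and lets class_of() confirm the identity on lookup.
        self._entries = {}

    def tag(self, node, class_name):
        """Record the class of a freshly loaded node and return the node."""
        self._entries[id(node)] = (node, class_name)
        return node

    def class_of(self, node):
        """Return the recorded class name, or None for untagged nodes."""
        entry = self._entries.get(id(node))
        if entry is not None and entry[0] is node:
            return entry[1]
        return None


classes = ClassMap()
invoice = classes.tag({"total": "12.50"}, "invoice")
print(classes.class_of(invoice), classes.class_of({"total": "12.50"}))
```

The loaded map is still an ordinary dict, and an emitter can consult `class_of` when writing it back. Identity keys are fragile for scalars: equal strings or small integers may be the same object, which may be part of why the idea is called "not pretty".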
Along this same line of thought, I was asking myself if class is equivalent to encoding for scalars... just wondering. Are they similar constructs?

| > readability. I'd rather we stick with strict indentation for everything,
| > including blocks, blobs and double-quoted streams.
|
| I think it's nice to allow the cut-and-paste and let emitters always
| reformat. But I'll support strict for now.
...
| If you are once again suggesting dropping double quotes, you're
| needlessly going down a road of pain. Double quotes are simple,
| intuitive and will always work for folded text (indented or not).

I think the double quote mechanism stays in... just about everyone knows how to use them.

| I want to start doing something on this after June 15th or so. I hope we
| can resolve things by then. We were very close last week.

I think we are still very close.

Open Issues
~~~~~~~~~~~

A. Classes

   Problem: Round-tripping

   1. Drop classes for now

   2. Let the implementation worry about them (class map)

B. Unicode vs Binary

   1. Introduce an isBinary scalar flag and keep all scalars unicode.
      This causes problems with YAR where we cannot know if a given
      node is binary or not, but would like it readable if it is ASCII.

   2. Introduce an isUnicode scalar flag. Limit encoding to ASCII for
      regular strings and to UTF-8 within single quoted strings.

...

I think I'd pick A2 and B2 at this time.

Clark

P.S. One of the problems often cited with UTF-8 is that it is verbose. We could clearly offer the option that YAML texts are "gzipped" and require that a parser know how to gunzip.
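The YAR difficulty named under B1 is that an archiver only has raw file bytes and must guess whether to emit readable text or a base64 blob. A minimal Python sketch of such a guess follows. The function name `classify_for_yar`, the UTF-8 decoding step and the `threshold` parameter are assumptions made for illustration; the thread only says "only or mostly printable".

```python
import base64


def classify_for_yar(data: bytes, threshold: float = 1.0):
    """Return ("text", str) or ("base64", str) for a file's raw bytes.

    The whole file is examined before anything is emitted, which is the
    two-pass cost noted earlier in the thread.
    """
    as_blob = ("base64", base64.b64encode(data).decode("ascii"))
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return as_blob
    if not text:
        return ("text", text)
    # Count characters that are printable, plus newline and tab.
    printable = sum(ch.isprintable() or ch in "\n\t" for ch in text)
    if printable / len(text) >= threshold:
        return ("text", text)
    return as_blob


print(classify_for_yar(b"hello\nworld\n"))
print(classify_for_yar(b"\x00\xff\x10"))
```

With the default threshold a single control character sends the file to base64; lowering `threshold` approximates the "mostly printable" variant.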
From: Brian I. <briani@ActiveState.com> - 2001-05-30 21:13:32
|
"Clark C . Evans" wrote: > Also, I'm not sure that I like limiting unquoted streams to > a single line. Being able to cut/paste in a HTML text without > having to escape the quotes and such is very valueable. Further, > I thought we had agreed on [base64] instead of the back tick. I see your point. But I think disallowing them in lists is confusing. I'll go either way here. Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-30 23:44:36
|
On Wed, May 30, 2001 at 02:13:52PM -0700, Brian Ingerson wrote:
| "Clark C . Evans" wrote:
| > Also, I'm not sure that I like limiting unquoted streams to
| > a single line. Being able to cut/paste in a HTML text without
| > having to escape the quotes and such is very valueable.
|
| I see your point. But I think disallowing them in lists is confusing.
| I'll go either way here.

At first, I was very weary of the "many ways to do it". Given that we have different types of scalars with different types of constraints the current four have very nice coverage:

Unquoted: Good for single or multi-line folded
          content lacking significant whitespace
          and having all printables. Good for map
          scalars. Good for non-escaped content
          (as long as the first character does not
          begin with an indicator)

Quoted:   "Good for single or multi-line folded
          content where some of the whitespace is
          significant and non-printables may be
          escaped. Good for list scalars. Not good
          when a lot of whitespace must be escaped
          or with content with frequent quote usage."

Blocked:  | Good where leading and intermediate
          | w h i t e s p a c e
          | is important to preserve and also good
          | where " $ and other special characters
          | need not be escaped.
          \

Binary:   [BASE-64-IS-GOOD-FOR-BINARY-DATA]

In short, I see each one as having a particular class of data it is good at representing. And I think the normalization rules could nicely choose among them! The quoted format is the most flexible, but it is also probably the least readable. Is our focus here on "readability"? If so, then I think all four of the above forms could be important.

Best,

Clark
From: Brian I. <briani@ActiveState.com> - 2001-05-31 06:29:21
|
"Clark C . Evans" wrote: > At first, I was very weary of the "many ways to do it". YAML will give rest to your weary soul. > Given that we have different types of scalar's with > different types of constraints the current four five :) > have very nice coverage: > > Unquoted: Good for single or multi-line folded > content lacking significant whitespace > and having all printables. Good for map > scalars. Good for non-escaped content > (as long as the first character does not > begin with an indicator) > > Quoted: "Good for single or multi-line folded > content where some of the whitespace is > significant and non-printables may be > escaped. Good for list scalars. Not good > when alot of whitespace must be escaped > or with content with frequent quote usage." > > Blocked: | Good where leading and intermediate > | w h i t e s p a c e > | is important to preserve and also good > | where " $ and other special characters > | need not be escaped. > \ > > Binary: [BASE-64-IS-GOOD-FOR-BINARY-DATA-THAT-CAN- OBVIOUSLY-SPAN-MORE-THAN-ONE-LINE] Unicode: 'Good for unicode data. It''s just cool to use the "single quote" for this!' If this is YAML; I like it :) Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-30 20:24:49
|
On Wed, May 30, 2001 at 02:56:00PM +0200, Oren Ben-Kiki wrote: | Hmmm. If one is serializing a built-in map/list/scalar, you do | it normally; when serializing an "Object" it probably gets | serialized into a map, with a class attribute. It is only | "typed scalars" which are a problem (e.g., a number, a date, etc.). | One problem about the proposed class-as-color syntax is that it | is rather cumbersome for something like a simple number... Right... this is half the problem. The other half of the problem is that the class should be known before the map's content should be loaded. I think I like the class solution as it is... I think the round-tripping problems are problematic; but perhaps they are not that bad. | Cut&paste implies a tool - an editor or a program. Most editors | allow you to trivially indent a group of lines - certainly any | editor used for writing YAML had better support it. And a program | doing cut&paste can add indentation easily enough. Ok. I'd rather have it always indented as well. | I'd rather we stick with strict indentation for everything, | including blocks, blobs and double-quoted streams. Did you consider that an emitter can indent the blob and double quoted streams? Thus, indentation isn't prevented. And with machine generated YAML (99%), the non-intendented case is minor. Further, by running YAML through a program which puts the file in canonical form indenting will definately be the norm. So... I didn't see the harm in allowing this. It doesn't really impact the parsing complexity. | Separate issue: having both double-quoted and simple scalar | values. Here's a proposal to use just one form: | | - Block: as today, but use ` instead of \ to mark the end line. I like the \, is there a reason why you wanted to change it? | - Scalar value: as today, allow escaping using \, always allow | it to be multi-line, but the continuation lines must be more | indented then the first line. 
I understand, however, I think the current division of scalar types is a rather nice balance of concerns. Most programmers will expect \ style escaping within quotes.

| - List: as today (no special marker), but allow : as an
| optional prefix for scalar values.

Well, if you leave the quote type in, then the : optional marker won't be needed:

Example:
    @
        Single line scalar
        "Pretty multi line
        text without multiline."
        |
        |Block with leading and
        |trailing new line.
        \
        [Base 64 blob]

| Hmmm. As I see it, we have at least one major issue to resolve (the class
| issue) plus another where a "dictator hat" may be required (the text
| syntax). Then we can all start doing development at our own pace...

I still don't know what to do about the class issue, but I think that there are solutions for this. Let us take a language with only maps, lists, and scalars. An external "class map" could be put in place by the YAML load/save mechanism. Thus, as maps/lists/scalars are loaded, entries in this class map could be made. Then when the objects are serialized, the class could be written back out. Therefore, I think that a round-trip ability is possible, it just may not be the most obvious solution.

As for the scalar types... I'm inclined to leave them as they are now. It does seem like a nice balance of concerns. What I'd like to focus on (and what should be added to the spec) is the canonical form. Here are some baseline suggestions:

1. Indenting always occurs, i.e. quoted or binary
   scalars are indented.

2. The tab setting for indents is 4 characters.

3. When possible the text is word-wrapped to 76
   characters, leaving a minimum of 20 characters
   for the scalar's content. Thus, after 14 levels of
   indentation, text may go beyond 76 characters.

4. If leading whitespace occurs on any line within
   a scalar, then the block format is used.

5. If a character string is longer than 20 characters
   without having intermediate whitespace, then
   the quoted format is used.

etc.

Best,

Clark
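The canonical-form suggestions can be sketched as a small style chooser and wrapper. This Python sketch is an interpretation under assumptions: the names `content_width`, `choose_style` and `emit_folded` are hypothetical, rule 5 is read as "any run of more than 20 non-whitespace characters", and details left open in the thread (indicator characters, escaping, binary scalars) are not handled.

```python
import re
import textwrap

INDENT = 4        # rule 2: four characters per indentation level
WIDTH = 76        # rule 3: wrap column
MIN_CONTENT = 20  # rule 3: content never gets fewer columns than this


def content_width(level: int) -> int:
    """Columns available for scalar content at a given nesting level."""
    return max(WIDTH - INDENT * level, MIN_CONTENT)


def choose_style(text: str) -> str:
    """Pick a scalar style following rules 4 and 5 as read above."""
    if any(line[:1] in (" ", "\t") for line in text.split("\n")):
        return "block"      # rule 4: leading whitespace on some line
    if re.search(r"\S{21,}", text):
        return "quoted"     # rule 5: long run without whitespace
    return "unquoted"


def emit_folded(text: str, level: int) -> str:
    """Word-wrap folded text and indent it (rules 1 to 3)."""
    wrapped = textwrap.fill(text, width=content_width(level),
                            break_long_words=False)
    return textwrap.indent(wrapped, " " * (INDENT * level))


print(content_width(0), content_width(14), content_width(20))
print(choose_style("plain words"), choose_style("a\n  indented"))
print(emit_folded("word " * 30, 2))
```

At level 14 the indent uses 56 columns and exactly 20 remain; deeper levels keep the 20-column minimum, so those lines extend past column 76.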
From: Brian I. <briani@ActiveState.com> - 2001-05-30 20:59:04
|
"Clark C . Evans" wrote: > > On Wed, May 30, 2001 at 02:56:00PM +0200, Oren Ben-Kiki wrote: > | Hmmm. If one is serializing a built-in map/list/scalar, you do > | it normally; when serializing an "Object" it probably gets > | serialized into a map, with a class attribute. It is only > | "typed scalars" which are a problem (e.g., a number, a date, etc.). > | One problem about the proposed class-as-color syntax is that it > | is rather cumbersome for something like a simple number... > > Right... this is half the problem. The other half of the > problem is that the class should be known before the map's > content should be loaded. I think I like the class solution > as it is... I think the round-tripping problems are > problematic; but perhaps they are not that bad. Since this is up in the air, I'd like a crack at implementing things as they stand. I think I can deal with round tripping issues; from Perl's side anyway. > > | Cut&paste implies a tool - an editor or a program. Most editors > | allow you to trivially indent a group of lines - certainly any > | editor used for writing YAML had better support it. And a program > | doing cut&paste can add indentation easily enough. > > Ok. I'd rather have it always indented as well. > > | I'd rather we stick with strict indentation for everything, > | including blocks, blobs and double-quoted streams. > > Did you consider that an emitter can indent the blob > and double quoted streams? Thus, indentation isn't > prevented. And with machine generated YAML (99%), > the non-intendented case is minor. Further, by running > YAML through a program which puts the file in canonical > form indenting will definately be the norm. So... I didn't > see the harm in allowing this. It doesn't really impact > the parsing complexity. Thanks for stating this clearly Clark. I assumed it was understood. > > | Separate issue: having both double-quoted and simple scalar > | values. 
Here's a proposal to use just one form: > | > | - Block: as today, but use ` instead of \ to mark the end line. > > I like the \, is there a reason why you wanted to change it? Agree > > | - Scalar value: as today, allow escaping using \, always allow > | it to be multi-line, but the continuation lines must be more > | indented then the first line. > > I understand, however, I think the current division > of scalar types is rather nice balance of concerns. > Most programmers will expect \ style escaping > within quotes. I'm completely on board with the current proposal (except that I would not allow the absence of double quotes in a multi-line folded stream.) > > | - List: as today (no special marker), but allow : as an > | optional prefix for scalar values. > > Well, if you leave the quote type in, then > the : optional marker won't be needed: > > Example: > @ > Single line scalar > "Pretty multi line > text without multiline." > | > |Block with leading and > |trailing new line. > \ > [Base 64 blob] Love this! Um is '[' or ']' used in mime-base64? > > | Hmmm. As I see it, we have at least one major issue to resolve (the class > | issue) plus another where a "dictator hat" may be required (the text > | syntax). Then we can all start doing development at our own pace... > > As for the scalar types... I'm inclined to leave them > as they are now. It does seem like a nice balance of > concerns. What I'd like to focus on (and what should > be added to the spec) is the canonoical form. Here > are some base line suggestions: > > 1. Indenting always occurs, i.e. quoted or binary > scalars are indented. Fine. > > 2. The tab setting for indents is 4 characters Yup. > > 3. When possible the text is word-wrapped to 76 > characters; leaving for a minimum of 20 characters > for scalar's content. Thus, after 14 levels of > indentation, text may go beyond 20 characters. But never on blocks, right? > > 4. 
If leading whitespace occurs on any line within > a scalar, then the block format is used. Good. > > 5. If a character string is longer than 20 characters > without having intermediate whitespace, then > the quoted format is used. And also without having any newlines or YAML meta-chars. 20 - 30 seems fine. Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-30 23:35:34
|
On Wed, May 30, 2001 at 01:59:24PM -0700, Brian Ingerson wrote: | > | - Scalar value: as today, allow escaping using \, always allow | > | it to be multi-line, but the continuation lines must be more | > | indented then the first line. | > | > I understand, however, I think the current division | > of scalar types is rather nice balance of concerns. | > Most programmers will expect \ style escaping | > within quotes. | | I'm completely on board with the current proposal (except that | I would not allow the absence of double quotes in a multi-line | folded stream.) Ok. This prevents easy way to enter folded text that is not escaped (like HTML), but perhaps can make the whole proposal cleaner. Oren, what do you think? You seem to want to eliminate one of the forms as well -- would reducing the un-quoted form to a single-line variant work for you? Why not just eliminate all unquoted forms? ... After some additional background thought on the unicode string proposal, I think I'm in favor. Would it have the similar escape sequences as the double quoted string? | > What I'd like to focus on (and what should be added | > to the spec) is the canonoical form. Here | > are some base line suggestions: | > | > 1. Indenting always occurs, i.e. quoted or binary | > scalars are indented. | | Fine. | | > | > 2. The tab setting for indents is 4 characters | | Yup. | | > | > 3. When possible the text is word-wrapped to 76 | > characters; leaving for a minimum of 20 characters | > for scalar's content. Thus, after 14 levels of | > indentation, text may go beyond 20 characters. | | But never on blocks, right? Right. | > | > 4. If leading whitespace occurs on any line within | > a scalar, then the block format is used. | | Good. | | > | > 5. If a character string is longer than 20 characters | > without having intermediate whitespace, then | > the quoted format is used. | | And also without having any newlines or YAML meta-chars. | 20 - 30 seems fine. 
This one lost me a bit. I'm talking about:

    key: this-is-a-long-value-that-can't-be-word-wrapped

canonical form...

    key: "this-is-a-long-value-that-can't\
          be-word-wrapped"

...

I think the canonical form is where we need to spend a bit more time... but unfortunately, I'm too busy to push this further... perhaps later next week.

Best, Clark |
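The wrapping arithmetic in suggestions 2 and 3 (4-character indents, wrap at 76, never fewer than 20 characters of content) can be sketched as follows. This is only an illustration under stated assumptions: the names `content_width` and `wrap_scalar` are hypothetical, and the reading that lines run past column 76 once the indent exceeds 14 levels is an interpretation of the quoted rule, not something the draft states.

```python
import textwrap

WRAP_COLUMN = 76   # suggestion 3: wrap at 76 characters
INDENT_STEP = 4    # suggestion 2: 4 characters per indent level
MIN_CONTENT = 20   # suggestion 3: always leave 20 characters for content


def content_width(level):
    """Columns available for scalar content at a given indent level."""
    return max(WRAP_COLUMN - INDENT_STEP * level, MIN_CONTENT)


def wrap_scalar(text, level):
    """Word-wrap text and indent it; unbreakable words are left whole."""
    indent = " " * (INDENT_STEP * level)
    lines = textwrap.wrap(
        text,
        width=content_width(level),
        break_long_words=False,
        break_on_hyphens=False,
    )
    return [indent + line for line in lines]
```

At level 14 the indent is 56 columns, which leaves exactly the 20-character minimum; any deeper and the indented line necessarily extends past column 76.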
From: Brian I. <briani@ActiveState.com> - 2001-05-31 06:21:56
|
"Clark C . Evans" wrote: > On Wed, May 30, 2001 at 01:59:24PM -0700, Brian Ingerson wrote: > | I'm completely on board with the current proposal (except that > | I would not allow the absence of double quotes in a multi-line > | folded stream.) > > Ok. This prevents easy way to enter folded text > that is not escaped (like HTML), but perhaps can > make the whole proposal cleaner. Oren, what do > you think? You seem to want to eliminate one of > the forms as well -- would reducing the un-quoted > form to a single-line variant work for you? > Why not just eliminate all unquoted forms? Ya know... If we just want to make the rule "Double quotes are optional unless ambiguity results", that's fine with me. > After some additional background thought on the unicode > string proposal, I think I'm in favor. Good. > Would it have the > similar escape sequences as the double quoted string? Sure. Why not? (read, "no strong opinion") > I think the canonical form is where we need to > spend a bit more time... but unforatunately, I'm > too busy to push this further... perhaps later > next week. I disagree. *Forget* the canonical form for now, we don't really need it. In fact, it's not even the appropriate time for us to make such decisions. Let's get a couple reasonable implementations out and decide canonical forms during a beta period. It's just not right for us to be the ones determining what the proper ways to use YAML are. Let the users decide. Power to the people! Just relax :) Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Oren Ben-K. <or...@ri...> - 2001-05-31 07:02:54
|
Brian Ingerson [mailto:briani@ActiveState.com] wrote: > > Given that we have different types of scalar's with > > different types of constraints the current four > > five :) Sigh :-) > > have very nice coverage: > > > > Unquoted: Good for single or multi-line folded > > content lacking significant whitespace > > and having all printables. Good for map > > scalars. Good for non-escaped content > > (as long as the first character does not > > begin with an indicator) I see the use of "the simplest possible syntax" without any escaping but with folding. But I find it rather arbitrary that you can't use it (the multi-line version of it) in lists. Using " forces you to escape any \ characters in it... Is there a chance to allow ':' as an optional prefix for scalars in lists? That would solve this issue. Alternatively, it seems we had a notion to allow escaping in Unquoted text as well. That would unify it with Quoted text - make the " optional, to be used when there's potential ambiguity, such as multi line text in lists. I'd rather do that than have a format which is only usable in maps and not in lists. Under this proposal, I suggested we switch the block terminating character to ` instead of \ to remove one case of ambiguity: is this: \na single line block or an escaped newline? Of course one could just use surrounding quotes to disambiguate it, I just thought it would be nice to eliminate one more case of requiring them. At any rate, I think we've trashed this to death. I'd rather use one of the above two forms (optional ':' or unifying quoted and unquoted), but I'll go with whatever you decide. I promise not to bug you about it again :-) > > Quoted: "Good for single or multi-line folded > > content where some of the whitespace is > > significant and non-printables may be > > escaped. Good for list scalars. Not good > > when alot of whitespace must be escaped > > or with content with frequent quote usage." OK. 
> > Blocked: | Good where leading and intermediate > > | w h i t e s p a c e > > | is important to preserve and also good > > | where " $ and other special characters > > | need not be escaped. > > \ OK. We're keeping it simple - so there's no way at all to break long lines in a block. > > Binary: [BASE-64-IS-GOOD-FOR-BINARY-DATA-THAT-CAN- > OBVIOUSLY-SPAN-MORE-THAN-ONE-LINE] OK. > Unicode: 'Good for unicode data. It''s just cool > to use the "single quote" for this!' I'm confused. I suggested we make the distinction between "text" (Unicode characters) and "binary" (byte array). Base64 indicates "binary", everything else is text. Are you suggesting that instead we make the distinction between "Unicode" and "byte array which may be either binary blob or ASCII text"? Where the only syntax for Unicode is 'single quoted text'? I don't like it because: - There's no way to write Unicode blocks. - You are mixing up ASCII text and binary data in the same data type, - And separating ASCII and Unicode text for no good reason. Languages are evolving towards "text = Unicode", as they should, and UTF-8 makes it very easy to deal with Unicode text even in languages such as C. As someone working in Israel, and having worked a lot with European and Japanese clients, I'm rather sensitive to this issue... It is a pain to use second-quality language features just because you aren't an English speaker. Have fun, Oren Ben-Kiki |
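A minimal sketch of the "base64 indicates binary, everything else is text" model described above, assuming a language that already separates the two types (Python's `str` and `bytes` here). The `[...]` wrapper follows the blob notation used in this thread; the function names `emit_scalar` and `load_scalar` are hypothetical, and only `\\` and `\"` are escaped to keep the example short.

```python
import base64


def emit_scalar(value):
    """Write bytes as a base64 blob and text as a double-quoted string."""
    if isinstance(value, bytes):
        return "[" + base64.b64encode(value).decode("ascii") + "]"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise TypeError("only text (str) and binary (bytes) scalars are handled")


def load_scalar(source):
    """Read a scalar back, returning bytes for blobs and str for text."""
    if source.startswith("[") and source.endswith("]"):
        return base64.b64decode(source[1:-1])
    if len(source) >= 2 and source.startswith('"') and source.endswith('"'):
        body = source[1:-1]
        # Every quote in the body is escaped, so undo \" first, then \\.
        return body.replace('\\"', '"').replace("\\\\", "\\")
    raise ValueError("unrecognised scalar syntax")
```

The point of the sketch is that the reader gets back the same type the writer started with, so a blob never silently turns into text or the other way around.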
From: Brian I. <briani@ActiveState.com> - 2001-05-31 07:49:59
|
Oren Ben-Kiki wrote: > > Unicode: 'Good for unicode data. It''s just cool > > to use the "single quote" for this!' > > I'm confused. I suggested we make the distinction between "text" (Unicode > characters) and "binary" (byte array). Base64 indicates "binary", everything > else is text. > > Are you suggesting that instead we make the distinction between "Unicode" > and "byte array which may be either binary blob or ASCII text"? Where the > only syntax for Unicode is 'single quoted text'? To be honest, I lack any real savvy in unicode issues. I merely suggested the single quote as a preferred syntax, if we were going to separate ascii from unicode. I probably misunderstood the intent. If it doesn't make sense, then by all means let's consider all text to be utf8. > > I don't like it because: > > - There's no way to write Unicode blocks. > - You are mixing up ASCII text and binary data in the same data type, > - And separating ASCII and Unicode text for no good reason. Languages are > evolving towards "text = Unicode", as they should, and UTF-8 makes it very > easy to deal with Unicode text even in languages such as C. Although Perl's unicode support isn't the best, I do believe that UTF8 is the way strings are now assumed to be by default. > As someone working in Israel, and having worked a lot with European and > Japanese clients, I'm rather sensitive to this issue... It is a pain to use > second-quality language features just because you aren't an English speaker. That makes sense. I willfully step aside on this issue. Cheers, Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-31 10:43:36
|
On Thu, May 31, 2001 at 09:03:37AM +0200, Oren Ben-Kiki wrote: | I'm confused. I suggested we make the distinction between | "text" (Unicode characters) and "binary" (byte array). | Base64 indicates "binary", everything else is text. Right. Let's call this option "binary switch" | Are you suggesting that instead we make the distinction | between "Unicode" and "byte array which may be either | binary blob or ASCII text"? Yes. This was Brian's suggestion, let's call this option the "unicode switch". It is clear that we need a switch. Let's go over the use cases... Python - Has a separate ASCII string and Unicode string. Unicode strings are specially marked and conversion from ASCII to Unicode is relatively easy. Java - Has byte and String, where string is Unicode. Perl - (no clue) C++ - Separate ascii (char) and unicode (wchar_t) strings. Python and C/C++ fit more strongly with the "Unicode Switch" where Java is better with "Binary Switch". Perl? ... On a related note, consider the YAR use case. For the Unicode switch things works pretty well... ASCII files are shown as ASCII and UTF-16 nodes are shown as unicode. Every once and a while a binary file will show up as a readable ASCII file. But this will be rare. However, for the Binary switch... YAR cannot operate without a list of "known text extensions" where all other files (regardless about how ASCII they look) must be treated as binary to avoid possible mangling! Thus, from a "yar" perspective, I like the Unicode Switch much better... | Where the only syntax for Unicode is 'single quoted text'? | | I don't like it because: | | - There's no way to write Unicode blocks. | - You are mixing up ASCII text and binary data in the same data type, | - And separating ASCII and Unicode text for no good reason. Languages are | evolving towards "text = Unicode", as they should, and UTF-8 makes it very | easy to deal with Unicode text even in languages such as C. Yep. I can see these problems. 
However, then our YAR use case must have a ".yar" file which includes the list of "text" extensions. Furthermore, it will have to also note if the file was stored as UTF-8 or UTF-16. Hmmm. | As someone working in Israel, and having worked a lot | with European and Japanese clients, I'm rather sensitive | to this issue... It is a pain to use second-quality language | features just because you aren't an English speaker. I agree here... but let's work through the YAR use case a bit more; as it seems to be our first really good application that is easily expressable and self-contained. Best, Clark |
From: Clark C . E. <cc...@cl...> - 2001-05-31 10:49:25
|
On Thu, May 31, 2001 at 09:03:37AM +0200, Oren Ben-Kiki wrote: | > > Unquoted: Good for single or multi-line folded | > > content lacking significant whitespace | > > and having all printables. Good for map | > > scalars. Good for non-escaped content | > > (as long as the first character does not | > > begin with an indicator) | | I see the use of "the simplest possible syntax" without any escaping but | with folding. But I find it rather arbitrary that you can't use it (the | multi-line version of it) in lists. I don't see why we need to forbid it's usage in lists... although you may consider it ugly... list: @ This is a multi line scalar value. This is the second entry. | At any rate, I think we've trashed this to death. I'd rather use one of the | above two forms (optional ':' or unifying quoted and unquoted), but I'll go | with whatever you decide. I promise not to bug you about it again :-) | > > Blocked: | Good where leading and intermediate | > > | w h i t e s p a c e | > > | is important to preserve and also good | > > | where " $ and other special characters | > > | need not be escaped. | > > \ | | OK. We're keeping it simple - so there's no way at all to break long lines | in a block. Right -- 4 simple "styles" (mod unicode) rather than one big complicated mechanism. Best, Clark |
From: Oren Ben-K. <or...@ri...> - 2001-05-31 11:45:20
|
> I don't see why we need to forbid it's usage in lists... > although you may consider it ugly... > > list: @ > This is a multi line > scalar value. > This is the second entry. First, it is specifically forbidden by the current spec. Second, yes, it is ugly, a prefix ':' would do wonders here. I've promised not to ask again for allowing it, so I won't :-) > | I don't like it because: > | > | - There's no way to write Unicode blocks. > | - You are mixing up ASCII text and binary data in the same > | data type, > | - And separating ASCII and Unicode text for no good reason. > | Languages are > | evolving towards "text = Unicode", as they should, and > | UTF-8 makes it very > | easy to deal with Unicode text even in languages such as C. > > Yep. I can see these problems. However, then our YAR > use case must have a ".yar" file which includes the list > of "text" extensions. Furthermore, it will have to also > note if the file was stored as UTF-8 or UTF-16. Hmmm. Let's see. YAR needs to satisfy the following requirements: 1. Always round-trip the file correctly, byte-to-byte identical. 2. Subject to (1), use a human-readable representation of a file. 3. Carry over any meta-data about the file which is relevant. Solution: - Represent each file as a map: file-name: % permissions: ... owner: ... group: ... character-set: ... content: ... - The "character set" attribute explains how to convert the value of the 'content' entry to bytes to write. It should be one of ascii/utf-8/utf-16be/utf-16le. If the content is a binary blob, either there is no "character set" (the content isn't textual), or it could just say "binary" (I know that's not a IANA recognized character set...) - The yar file creator chooses which syntax format to give the content according to the following heuristic. In each case, it ensures the encoding will be such that the file will be re-created byte-to-byte identical to the original: If the file contains... 1. ... 
only 7-bit printable ASCII characters, plus newlines => syntax: block, charset: ascii. 2. ... only utf-8 printable characters, plus newlines => syntax: block, charset: utf-8. 3. ... like 1, with the odd 8-bit character (but not 2) => syntax: quoted string, charset: ascii. 4. ... like 2, with the odd unprintable character => syntax: quoted string, charset: utf-8. 5. ... what looks to be as a utf-16 file with only printable characters plus newlines => syntax: block charset: utf-16be/utf-16le. 6. ... like 5 with the odd non-printable character => syntax: quoted string, charset: utf-16be/utf-16le. 7. ... anything else => syntax: blob, charset: binary (or missing). This works, is safe, is extensible, and doesn't require us to bend YAML out of shape. The only things which bothers me is handling newlines when transferring between DOS/Windows and the rest of the world. Question: in a block, does YAML preserve whether a line ended with a \n or a \r\n? If not we are OK. And I'm still worried about the class marker... Can you say something about what you mean by "class map"? I didn't get it. Have fun, Oren Ben-Kiki |
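The seven-case heuristic above could be sketched roughly as below. Several details are assumptions made for the example, not part of the proposal: "the odd character" is taken to mean at most 5% of the content, UTF-16 is recognised only by its byte-order mark, plain ASCII with a few control characters is reported as quoted/ascii, and the names `text_style` and `choose_representation` are hypothetical.

```python
ODD_LIMIT = 0.05  # assumption: "the odd character" means at most 5% of the content


def text_style(text):
    """Return 'block' if every character is printable or a newline,
    'quoted' if only a few are not, and None if too many are not."""
    bad = sum(1 for ch in text if ch != "\n" and not ch.isprintable())
    if bad == 0:
        return "block"
    if bad <= ODD_LIMIT * len(text):
        return "quoted"
    return None


def choose_representation(data):
    """Map raw file bytes to a (syntax, charset) pair, following cases 1-7."""
    # Cases 5 and 6: UTF-16, recognised here only by its byte-order mark.
    for bom, charset in ((b"\xff\xfe", "utf-16le"), (b"\xfe\xff", "utf-16be")):
        if data.startswith(bom):
            try:
                text = data.decode(charset)
            except UnicodeDecodeError:
                break
            style = text_style(text[1:])  # skip the BOM character itself
            return (style, charset) if style else ("blob", "binary")
    # Cases 1, 2 and 4: valid UTF-8 (plain ASCII is a subset of it).
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None:
        charset = "ascii" if all(b < 0x80 for b in data) else "utf-8"
        style = text_style(text)
        return (style, charset) if style else ("blob", "binary")
    # Case 3: mostly printable ASCII with the odd byte that is not valid UTF-8.
    bad = sum(1 for b in data if b >= 0x7F or (b < 0x20 and b != 0x0A))
    if bad <= ODD_LIMIT * len(data):
        return ("quoted", "ascii")
    # Case 7: anything else.
    return ("blob", "binary")
```

Because strict UTF-8 and UTF-16 decoding can be reversed exactly by encoding again, the text cases keep requirement 1 (byte-to-byte identical round trip), provided the odd characters are escaped in the quoted form.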
From: Clark C . E. <cc...@cl...> - 2001-05-31 16:34:09
|
On Thu, May 31, 2001 at 01:46:01PM +0200, Oren Ben-Kiki wrote: | > I don't see why we need to forbid it's usage in lists... | > although you may consider it ugly... | > | > list: @ | > This is a multi line | > scalar value. | > This is the second entry. | | First, it is specifically forbidden by the current spec. It is? It was in a previous spec, but not the recent one (the one dated the 26th) which was updated in a major way on the 26th after it was released (my bad). Where explicitly, as I actually remember working this through in my head so that it wasn't an exception. | Second, yes, it is ugly, a prefix ':' would do wonders here. Hmm. Considering. | Let's see. YAR needs to satisfy the following requirements: | | 1. Always round-trip the file correctly, byte-to-byte identical. | 2. Subject to (1), use a human-readable representation of a file. | 3. Carry over any meta-data about the file which is relevant. Good. | - Represent each file as a map: | file-name: % | permissions: ... | owner: ... | group: ... | character-set: ... | content: ... | | - The "character set" attribute explains how to convert the value of the | 'content' entry to bytes to write. It should be one of | ascii/utf-8/utf-16be/utf-16le. If the content is a binary blob, either there | is no "character set" (the content isn't textual), or it could just say | "binary" (I know that's not a IANA recognized character set...) Ok. So we've made an explicit node attribute called "character set". Interesting. Should this be added using a special indicator? ^utf-8 | This works, is safe, is extensible, and doesn't require us to bend YAML out | of shape. The only things which bothers me is handling newlines when | transferring between DOS/Windows and the rest of the world. Question: in a | block, does YAML preserve whether a line ended with a \n or a \r\n? If not | we are OK. Yep. I was worrying about this one too. I'm not sure what the solution is.
Normalizing new lines is a perfectly reasonable thing to do... except in this case. | And I'm still worried about the class marker... Can you | say something about what you mean by "class map"? I | didn't get it. Assume you only have an object which is a map, list, or scalar. The YAML parser/emitter pair could keep a global "classmapping" which associates all objects created via reading from the serialized format with its optional class map. Then, when writing the YAML file, the global variable could be used to reconstruct the class map attribute. Yes, this opens up a small can of worms... garbage collection among them. However, it does make round-tripping possible even when class isn't supported by the system. Best, Clark |
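One way to picture the global "classmapping" idea is a side table keyed by object identity. The sketch below is an illustration only; the class `ClassMap` and its methods are hypothetical names, and holding a strong reference to each node is an assumption made so that identities stay unique.

```python
class ClassMap:
    """Side table that remembers the class name of each loaded node."""

    def __init__(self):
        self._by_id = {}

    def remember(self, node, class_name):
        # The node itself is stored too, so its id() cannot be reused by
        # another object while the entry exists.
        self._by_id[id(node)] = (node, class_name)

    def lookup(self, node):
        entry = self._by_id.get(id(node))
        return entry[1] if entry else None

    def forget(self, node):
        # Without an explicit forget, entries (and their nodes) live forever.
        self._by_id.pop(id(node), None)
```

The garbage-collection worry shows up directly: the table keeps every loaded node alive until `forget` is called, and a copy of a node loses its class because only the original object is registered.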
From: Clark C . E. <cc...@cl...> - 2001-06-01 05:18:01
|
On Thu, May 31, 2001 at 01:46:01PM +0200, Oren Ben-Kiki wrote: | - The "character set" attribute explains how to convert the value of the | 'content' entry to bytes to write. It should be one of | ascii/utf-8/utf-16be/utf-16le. If the content is a binary blob, either there | is no "character set" (the content isn't textual), or it could just say | "binary" (I know that's not a IANA recognized character set...) I was thinking a bit differently. Perhaps either we should use "class" to indicate character set or introduce another indicator. Your thoughts. Consider the current "built-in" classes, "_int", "_real", etc. Are not they encodings? | The only things which bothers me is handling newlines when | transferring between DOS/Windows and the rest of the world. This was bugging me, but I like the YAML spec as it is. | Question: in a block, does YAML preserve whether a line | ended with a \n or a \r\n? No. This is one of the nice problems XML cleared up, let's not roll back the clock. I've yet to hear anyone complain that XML's folding of \r\n and \r into just \n was anything but helpful. The only place it gets us into problems is for the YAR use case... where the serializer doesn't know if a given value is character or binary. In this case, I think the behavior should be dependent upon the platform. On unix platforms, files with \r\n are treated as binary (to preserve the line endings), and on DOS boxes, files ending with \n (without \r) are treated as binary. This gives the expected behavior when moving files between platforms. By the way, with the current treatment, any YAML saved on a Windows box will have \r\n line endings. And likewise, on a Unix box it will have \n line endings. Since the parser doesn't care, one can move the files back and forth without affecting the canonical form. This may not give exactly the same behavior as "tar", since the exact line endings won't be preserved when switching platforms... but this is always a sore spot anyway! 
As for "diff", there are (or should be) flags to not report line ending differences. Therefore, I'm pretty certain that I'd like to keep the line ending folding as it is in the YAML spec. Best, Clark |
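The folding rule and the platform-dependent YAR fallback described above can be sketched in a few lines. The function names are hypothetical, and passing the native line ending in as a parameter is an assumption made so the example does not depend on the machine it runs on.

```python
def fold_line_endings(text):
    """Fold \\r\\n and bare \\r into \\n, as XML does."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def has_foreign_line_endings(data, native=b"\n"):
    """Report whether file bytes contain line endings other than the native
    one, in which case the proposal would store the file as binary."""
    if native == b"\n":
        return b"\r" in data
    # Native \r\n: look for any \n or \r that is not part of a \r\n pair.
    leftover = data.replace(b"\r\n", b"")
    return b"\n" in leftover or b"\r" in leftover
```

A file that passes the second check can be stored as text and will come back with the reading platform's own line endings, which is the "expected behavior when moving files between platforms" mentioned above.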
From: Oren Ben-K. <or...@ri...> - 2001-05-31 11:51:16
|
I meant to write:

> Question: in a
> block, does YAML preserve whether a line ended with a \n or a
> \r\n? If not
     ^^^^^^^^^ yes
> we are OK.

Oren. |
From: Brian I. <briani@ActiveState.com> - 2001-05-31 18:35:04
|
Oren Ben-Kiki wrote: > > I meant to write: > > > Question: in a > > block, does YAML preserve whether a line ended with a \n or a > > \r\n? If not > ^^^^^^^^^ yes > > we are OK. Then by all means, preserve. -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Oren Ben-K. <or...@ri...> - 2001-06-03 06:54:23
|
Clark C . Evans [mailto:cc...@cl...] wrote: > Ok. So we've made an explicit node attribute > called "character set". Interesting. Should > this be added using a special indicator? ^utf-8 Eeek. What next, a special notation for permissions (!), owner (~), and group (?)? I thought YAML was to use the minimal number of special properties - data type, and reference handling, and (maybe!) classes. Anything else can and should be done by using normal map keys. > | And I'm still worried about the class marker... Can you > | say something about what you mean by "class map"? I > | didn't get it. > > Assume you only have a object which is a map, list, or scalar. > The YAML parser/emitter pair could keep a global "classmapping" > which associated all objects created via reading from the > serialized format with it's optional class map. Then, when > writing the YAML file, the global variable could be used to > reconstruct the class map attribute. Yes, this opens up > a small can of worms... garbage collection among them. > However, it does make round-tripping possible even when > class isn't supported by the system. In essence you are proposing an alternate color mechanism - one based on a separate store, "to the side of" the document, which contains the "color" info for each node. In other words, admitting that YAML isn't good enough by itself to handle this data and therefore a separate mechanism must be added to it. I strongly dislike this. Just like YAR is a great, concrete example of an application for resolving syntax issues etc. I feel that classes are a great, concrete example of an application for resolving the color issue. Let's "eat our own dog food" here. Have fun, Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2001-06-03 07:11:31
|
Clark C . Evans [mailto:cc...@cl...] wrote: > On Thu, May 31, 2001 at 01:46:01PM +0200, Oren Ben-Kiki wrote: > | - The "character set" attribute explains how to convert the > | value of the > | 'content' entry to bytes to write. It should be one of > | ascii/utf-8/utf-16be/utf-16le. If the content is a binary > | blob, either there > | is no "character set" (the content isn't textual), or it > | could just say > | "binary" (I know that's not a IANA recognized character set...) > > I was thinking a bit differently. Perhaps either > we should use "class" to indicate character set > or introduce another indicator. Your thoughts. > Consider the current "built-in" classes, > "_int", "_real", etc. Are not they encodings? It seems I'm making a distinction here that you aren't. Issue 1: How to convert YAML text to an in-memory object and vice-versa (YAML issue). Issue 2: How to convert YAR file content represented in a YAML scalar to a file on the disk and vice versa (YAR issue). Character set, as I described it, belongs to issue 2. It seems to me completely outside the scope of YAML to include, as part of the object model, the answer to the question "if this Unicode string was to be written as the sole content of an on-disk file, which of the many possible encodings to use for it". Your questions about the built-in types, however, relate to issue 1: When reading the YAML file into memory in, say, Java, should I create a String in-memory object, a Float in-memory object (or maybe a Double?), or an Integer in-memory object (or maybe a Long?). Great question, to which my answer is: "determine how YAML will support color, and do it that way". > | Question: in a block, does YAML preserve whether a line > | ended with a \n or a \r\n? > > No. This is one of the nice problems XML cleared up, let's > not roll back the clock. I've yet to hear anyone complain > that XML's folding of \r\n and \r into just \n was anything > but helpful. 
> > The only place it gets us into problems is for the YAR > use case... where the serializer doesn't know if a given > value is character or binary. In this case, I think > the behavior should be dependent upon the platform. > On unix platforms, files with \r\n are treated as binary > (to preserve the line endings), and on DOS boxes, files > ending with \n (without \r) are treated as binary. A "binary" file will be represented as a base64 blob in the YAR file. You don't have to go that far to handle an occasional \r\n or \n in a file; you can just use a quoted string and escape them properly. That's still "mostly text". > This gives the expected behavior when moving files > between platforms. By the way, with the current > treatment, any YAML saved on a Windows box will > have \r\n line endings. And likewise, on a > Unix box it will have \n line endings. Since the > parser doesn't care, one can move the files back > and forth without affecting the canonical form. That's neat. > This may not give exactly the same behavior > as "tar", since the exact line endings won't > be preserved when switching platforms... but this > is always a sore spot anyway! As for "diff", > there are (or should be) flags to not report > line ending differences. So, YAR is more like "shar" then "tar". That makes a lot of sense. > Therefore, I'm pretty certain that I'd like to keep > the line ending folding as it is in the YAML spec. OK. Have fun, Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2001-06-03 07:58:32
|
Clark C . Evans [mailto:cc...@cl...] wrote: > On Thu, May 31, 2001 at 01:46:01PM +0200, Oren Ben-Kiki wrote: > | > I don't see why we need to forbid it's usage in lists... > | > although you may consider it ugly... > | > > | > list: @ > | > This is a multi line > | > scalar value. > | > This is the second entry. > | > | First, it is specifically forbidden by the current spec. > > It is? It was in a previous spec, but not the recent > one (the one dated the 26th) which was updated in a major > way on the 26th after it was released (my bad). > Where explicitly, as I actually remember working this > through in my head so that it wasn't an exception. I just checked and you are right - I have a printed version of the "older" 26th draft. Sorry. Things about the latest one which we agreed to fix: - Indentation in quoted strings; - Separate text from binary in the information model; - Clearing up how the API handles "multi-map" files (with blank lines). Other things which I noticed: - The draft requires a blank line at the end of the document; - There are some formatting issues; - There's a special standing for ISO-8859-1 which I don't follow. Isn't everything in Unicode? What does it mean to write an ISO-8859-1 escape within a quoted string? And, there's the class issue... Have fun, Oren Ben-Kiki |
From: Clark C . E. <cc...@cl...> - 2001-06-03 16:08:52
|
On Sun, Jun 03, 2001 at 09:44:47AM +0200, Oren Ben-Kiki wrote: | Things about the latest one which we agreed to fix: | | - Indentation in quoted strings; | - Separate text from binary in the information model; | - Clearing up how the API handles "multi-map" files (with blank lines). | | Other things which I noticed: | | - The draft requires a blank line at the end of the document; | - There are some formatting issues; Could you fix the above and send me the draft? I'm totally swamped with my day job... | - There's a special standing for ISO-8859-1 which I don't follow. Isn't | everything in Unicode? What does it mean to write an ISO-8859-1 escape | within a quoted string? Ahh. There are two escapes, \xXX and \uXXXX, where \x takes an 8-bit value and \u takes a 16-bit value. You need both since \x20BABE is a space followed by BABE, where \u20BABE is an oriental character (20BA) followed by BE. | And, there's the class issue... A "clean" proposal would help. I know you made a few in the past, but perhaps if we put this on the table we could come to a conclusion. ;) Clark |
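The fixed-width behaviour of the two escapes can be checked with a small decoder. This is a sketch that handles only `\xXX` and `\uXXXX` as described above; the function name `unescape` is hypothetical and other escapes (including an escaped backslash) are deliberately left out.

```python
import re

# \x takes exactly two hex digits, \u exactly four; anything after that
# is ordinary text.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})")


def unescape(text):
    """Replace \\xXX and \\uXXXX escapes with the characters they name."""
    def replace(match):
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))
    return _ESCAPE.sub(replace, text)
```

With this decoder, `unescape(r"\x20BABE")` yields a space followed by `BABE`, while `unescape(r"\u20BABE")` yields the single character U+20BA followed by `BE`, matching the example in the message.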
From: Oren Ben-K. <or...@ri...> - 2001-06-03 13:46:15
|
> Things about the latest one which we agreed to fix: > > - Indentation in quoted strings; And blobs... > - Separate text from binary in the information model; > - Clearing up how the API handles "multi-map" files (with > blank lines). > > Other things which I noticed: > > - The draft requires a blank line at the end of the document; > - There are some formatting issues; > - There's a special standing for ISO-8859-1 which I don't > follow. Isn't > everything in Unicode? What does it mean to write an ISO-8859-1 escape > within a quoted string? > > And, there's the class issue... Have fun, Oren Ben-Kiki |