From: Oren Ben-K. <or...@ri...> - 2002-02-17 07:55:25
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> Website updated and the main page has been changed
> (with a few fixes as Brian requested).

Thanks. BTW, whatever these changes were, they aren't listed in the
'changes' section; I trust they were insignificant (wording etc.)?

> | So... How about we post a message in SML-DEV (and maybe
> | even XML-DEV) asking people to take a look at it?
>
> This sounds great, let's make this our "Last Call Working Draft".
> This gives us 3 months to make a full-blown "C" implementation!
> How should the announcement look?

Speaking of "Last call" there's one unavoidable issue I found with the
current spec. I don't propose any changes, just that we be aware of the
situation.

Consider this: [ text <1M spaces and tabs> text ]
Versus this:   [ text <1M spaces and tabs> ]

Ugh. No way around it that I can see. In my scanner implementation the
tactic I'm using is that if the <spaces and tabs> run is too large for
comfort, I mark it as a special case and let the parser break its head on
how to treat it. Probably the best tactic there would be to assume it is
content and report a (belated) error if it isn't - nobody sane would put a
huge amount of insignificant spacing, right?

I've grown to *hate* white space.

Anyway, about the announcement...

I suggest we rename the version to "call-for-comments-1" and change the
status section as follows:

"This specification is a working draft and reflects consensus reached by the
members of the yaml-core mailing list. Any questions regarding this draft
should be raised on this list. While this is a draft version, it is intended
that no changes be made for a period of three months, during which comments
would be sought and initial implementations would be created. Reviewers and
implementers are encouraged to use this draft as reference and to send their
comments to the mailing list."

We should also rename the "changes from previous versions" section to
"versions history" and only list links to the previous versions. The full
list of changes would be confusing to newcomers; a link to previous versions
should satisfy anyone who is really interested.

Once we do that, we should send a message along the following lines
(basically lifting the text from the abstract and status sections):

Subject: First YAML "Call for comments" draft released

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data
serialization format designed for human readability and interaction with
scripting languages such as Perl and Python. YAML is optimized for data
serialization, configuration settings, log files, Internet messaging and
filtering.

The latest version of the YAML specification is available at
http://www.yaml.org/spec/call-for-comments-1.html. While this is a draft
version, it is intended that no changes be made for a period of three
months, during which comments would be sought and initial implementations
would be created. Reviewers and implementers are encouraged to use this
draft as reference and to send their comments to the <link-to>mailing
list</link-to>.

We can post this in SML-DEV and XML-DEV, and go about the implementation
business...

Have fun,

    Oren Ben-Kiki
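The lookahead issue raised in this message (a long run of spaces and tabs that may or may not turn out to be trailing) can be sketched as a bounded classification. This is only an illustration in Python, not the scanner being discussed: the name `classify_space_run`, the three return labels, and the set of characters assumed to end a simple scalar are all hypothetical.

```python
# Hypothetical sketch: classify a run of spaces/tabs inside an in-line
# collection using only the characters available in a fixed-size buffer.
SPACING = " \t"
TERMINATORS = ",]}\n"  # assumed set of characters that end a simple scalar


def classify_space_run(buffer: str, start: int) -> str:
    """Classify the spacing run that begins at buffer[start].

    Returns "content" if more scalar text follows the run, "insignificant"
    if the run is trailing, or "undecided" if the buffer ends first.
    """
    pos = start
    while pos < len(buffer) and buffer[pos] in SPACING:
        pos += 1
    if pos == len(buffer):
        return "undecided"      # ran out of lookahead; defer to the parser
    if buffer[pos] in TERMINATORS:
        return "insignificant"  # trailing spacing would be trimmed
    return "content"            # spacing sits between two pieces of text
```

The "undecided" result corresponds to the special case described above: the run is too large for the buffer, so the decision is handed to a higher layer.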
From: Oren Ben-K. <or...@ri...> - 2002-02-18 08:30:29
|
Some other typos in the spec:

- In section 4.3.5/Shorthands, all occurrences of "transfer method" should
  be replaced by "type family".
- In section 5.1, 2nd example, "identical sequence:" should be changed to
  "equal sequence:".

Speaking of 5.1 and 5.2, I think we should allow the following:

    this: !map
        - Value for the integer key '0'.
        - Value for the integer key '1'.
    equal to:
        0 : Value for the integer key '0'.
        1 : Value for the integer key '1'.

This would complete the unification of series/keyed sequence/map
branches/collection. This would require an addition to section 5.2 along the
same lines as section 5.1.

> > | I'd love to have the C implementation done by summer. I
> > | think I'm going to
> > | push for a YAML tutorial at the O'Reilly Open Source Conference.
> >
> > Same here. I'm going to start throwing one day a week at it,
> > it won't be huge but till my organization makes some money, I
> > can't really let YAML sit on the sidelines. An outline...

I agree with the phases but think the schedule is wildly optimistic.

FWIW, I've been working bottom-up.

I'm focusing on the Java YAML scanner first. This scanner should be trivial
to port to C/C++ or any other language: it has one main method, "size =
read(buf, offset, length)", it doesn't do any memory operations, it is
lightning-fast, and it provides the full syntax model in a nice way.

Next would be a parser, converting the above into the tree model, and the
emitter which would do the opposite. I expect the API of both would be
consistent with the C/C++ APIs.

Finally I'd tackle the loader/dumper, ideally by converting the tree model
into/from Java's serialization format. I'm still debating whether it would
make sense to have a second loader/dumper pair which would work "DOM-like"
using direct access to Java's Hash and Vector classes instead of going
through the serialization format. The latter would be essentially identical
to a C/C++ implementation using DOM-like objects.

I meant it to be a sort of "reference implementation" people could consult
in order to see how the full YAML architecture works, rather than the
quickest way to get something working for Java. That's the opposite of what
Brian did in Perl. I think we are both making the right decision... I'd
recommend a quick loader/dumper (graph model level) implementation for Perl
and Python, ignoring the tree and syntax model levels at first. But for
languages such as C/C++/Java, I think we should "properly" implement the
full architecture.

Also, I think that it is important to keep the C/C++ and Java APIs as
compatible as possible (that is, essentially identical in all the above
layers except for the Java-specific serialization-based loader/dumper
implementation).

Have fun,

    Oren Ben-Kiki
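The `!map` proposal in this message (a sequence treated as equal to a map keyed by integer position) can be illustrated with a short Python sketch. The function name `sequence_as_map` is made up for the example, and zero-based keys are taken from the '0' and '1' keys shown in the proposal; nothing here is spec text.

```python
def sequence_as_map(items):
    """Return a mapping keyed by each item's zero-based position."""
    return {index: value for index, value in enumerate(items)}


# The two forms from the proposal, as native values.
as_sequence = [
    "Value for the integer key '0'.",
    "Value for the integer key '1'.",
]
as_mapping = {
    0: "Value for the integer key '0'.",
    1: "Value for the integer key '1'.",
}
```

Under this reading, `sequence_as_map(as_sequence)` and `as_mapping` compare equal, which is the equivalence the proposal asks the spec to state.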
From: Brian I. <in...@tt...> - 2002-02-18 18:01:47
|
On 18/02/02 08:31 -0000, Oren Ben-Kiki wrote:
> Some other typos in the spec:
>
> - In section 4.3.5/Shorthands, all occurrences of "transfer method" should
> be replaced by "type family".

Fixed.

> - In section 5.1, 2nd example, "identical sequence:" should be changed to
> "equal sequence:".

Fixed. I've also taken the liberty of reindenting the examples to various
levels.

>
> Speaking of 5.1 and 5.2, I think we should allow the following:
>
>     this: !map
>         - Value for the integer key '0'.
>         - Value for the integer key '1'.
>     equal to:
>         0 : Value for the integer key '0'.
>         1 : Value for the integer key '1'.

Sure. +1

>
> This would complete the unification of series/keyed sequence/map
> branches/collection. This would require an addition to section 5.2 along the
> same lines as section 5.1.

I'll try and get to that :)

>
> > > | I'd love to have the C implementation done by summer. I
> > > | think I'm going to
> > > | push for a YAML tutorial at the O'Reilly Open Source Conference.
> > >
> > > Same here. I'm going to start throwing one day a week at it,
> > > it won't be huge but till my organization makes some money, I
> > > can't really let YAML sit on the sidelines. An outline...
>
> I agree with the phases but think the schedule is wildly optimistic.

+1

>
> FWIW, I've been working bottom-up.
>
> I'm focusing on the Java YAML scanner first. This scanner should be trivial
> to port to C/C++ or any other language: it has one main method, "size =
> read(buf, offset, length)", it doesn't do any memory operations, it is
> lightning-fast, and it provides the full syntax model in a nice way.

A scanner is like a lexer? What are the tokens it returns?

>
> Next would be a parser, converting the above into the tree model, and the
> emitter which would do the opposite. I expect the API to both would be
> consistent with the C/C++ APIs.
>
> Finally I'd tackle the loader/dumper, ideally by converting the tree model

I think you should get some help. If you do all this in order it may take a
*long* time. I would urge you to seek out some other interested Java
programmers. I could probably point some your way.

> into/from Java's serialization format. I'm still debating whether it would
> make sense to have a second loader/dumper pair which would work "DOM-like"
> using direct access to Java's Hash and Vector classes instead of going
> through the serialization format. The latter would be essentially identical
> to a C/C++ implementation using DOM-like objects.
>
> I meant it to be a sort of "reference implementation" people could consult
> in order to see how the full YAML architecture works, rather than the
> quickest way to get something working for Java. That's the opposite of what
> Brian did in Perl. I think we are both making the right decision... I'd
> recommend a quick loader/dumper (graph model level) implementation for Perl
> and Python, ignoring the tree and syntax model levels at first. But for
> languages such as C/C++/Java, I think we should "properly" implement the
> full architecture.

I only wish you were pouring your precious time into a C reference
implementation first. No other languages can benefit from a Java one. The
cornerstone of YAML will be libyaml written in C.

>
> Also, I think that it is important to keep the C/C++ and Java APIs as
> compatible as possible (that is, essentially identical in all the above
> layers except for the Java-specific serialization-based loader/dumper
> implementation).

True. I suggest we assign Clark as the project lead for C libyaml, but I
will try to put together a team of contributors so that we can get it done
sooner.

Cheers, Brian

>
> Have fun,
>
> Oren Ben-Kiki
>
> _______________________________________________
> Yaml-core mailing list
> Yam...@li...
> https://lists.sourceforge.net/lists/listinfo/yaml-core
From: Oren Ben-K. <or...@ri...> - 2002-02-18 16:53:58
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> | these are necessarily different:
> |   - foo
> |   - 'foo '
> |
> | However whitespace is significant on multiline forms.
>
> Any reason why it's _in_significant on single-line scalars
> but significant for multiline forms? Why not just have
> it significant everywhere.

Same as leading white space... Otherwise you couldn't write:

    matrix:
      - [ 1 , 2 , 3 ]
      - [ 45 , 56 , 78 ]

Leading and trailing white space are not part of 'simple' scalars.

Have fun,

    Oren Ben-Kiki
From: Steve H. <sh...@zi...> - 2002-02-18 17:00:53
|
> Clark C . Evans [mailto:cc...@cl...] wrote:
> > | these are necessarily different:
> > |   - foo
> > |   - 'foo '
> > |
> > | However whitespace is significant on multiline forms.
> >
> > Any reason why it's _in_significant on single-line scalars
> > but significant for multiline forms? Why not just have
> > it significant everywhere.
>
> Same as leading white space... Otherwise you couldn't write:
>
>     matrix:
>       - [ 1 , 2 , 3 ]
>       - [ 45 , 56 , 78 ]
>
> Leading and trailing white space are not part of 'simple' scalars.
>

Aren't inlines sort of a third case here? You would allow trailing white
space in single-line and multi-line scalars, but not in inlined scalars?
From: Oren Ben-K. <or...@ri...> - 2002-02-18 17:17:12
|
Steve Howell [mailto:sh...@zi...] wrote:
> > > Any reason why it's _in_significant on single-line scalars
> > > but significant for multiline forms? Why not just have
> > > it significant everywhere.
> >
> > Same as leading white space... Otherwise you couldn't write:
> >
> >     matrix:
> >       - [ 1 , 2 , 3 ]
> >       - [ 45 , 56 , 78 ]
> >
> > Leading and trailing white space are not part of 'simple' scalars.
> >
>
> Aren't inlines sort of a third case here? You would allow
> trailing white space
> in single-line and multi-line scalars, but not in inlined scalars?

Nope. Currently, the same simple scalars are used for all in-line forms,
whether or not they are in an in-line list/map. Which makes sense; there's
no reason to have even more scalar styles (there are quite a few as it is).

If trailing white space was precluded one would have to write it as:

    matrix:
      - [ 1, 2, 3]
      - [ 45, 56, 78]

And the original example would be a matrix of strings '1 ', '2 ' etc. - that
is a rather unintuitive application error: 'matrix elements aren't
integers'. I think anyone seeing the above matrix would expect it to contain
integers... It is pretty strange to trim leading white space and not
trailing spaces.

Have fun,

    Oren Ben-Kiki
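The difference between the two trimming policies discussed in this message can be seen with a small Python sketch. It is a toy splitter written for illustration, not a YAML parser: the name `split_inline_list` and its `trim_trailing` flag are invented here, and nesting and quoting are deliberately ignored.

```python
def split_inline_list(text, trim_trailing=True):
    """Split a flat in-line list such as "[ 1 , 2 , 3 ]" into scalars.

    Leading white space is always dropped. Trailing white space is dropped
    only when trim_trailing is true. Nested collections and quoted scalars
    are not handled.
    """
    inner = text.strip()[1:-1]  # drop the enclosing brackets
    items = []
    for raw in inner.split(","):
        item = raw.lstrip(" \t")
        if trim_trailing:
            item = item.rstrip(" \t")
        items.append(item)
    return items
```

With trimming on both sides, `split_inline_list("[ 1 , 2 , 3 ]")` gives `['1', '2', '3']`; with `trim_trailing=False` the same input gives `['1 ', '2 ', '3 ']`, the "matrix of strings" outcome described above.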
From: Clark C . E. <cc...@cl...> - 2002-02-18 17:28:11
|
| > >
| > > matrix:
| > >   - [ 1 , 2 , 3 ]
| > >   - [ 45 , 56 , 78 ]
| > >
| > > Leading and trailing white space are not part of 'simple' scalars.
| > >
| >
| > Aren't inlines sort of a third case here? You would allow
| > trailing white space in single-line and multi-line scalars,
| > but not in inlined scalars?
|
| Nope. Currently, the same simple scalars are used for all in-line forms,
| whether or not they are in an in-line list/map or not. Which makes sense;
| there's no reason to have even more scalar styles (there are quite a few as
| it is).
|
| If trailing white space was precluded one would have to write it as:
|
| matrix:
|   - [ 1, 2, 3]
|   - [ 45, 56, 78]

Hmm. I wouldn't mind having yet another scalar type (in the productions)
if it allowed us to dispense with the large-lookahead case in the
single-line case.

| It is pretty strange to trim leading white space and not trailing spaces.

Right. Hmm.

Clark
From: Steve H. <sh...@zi...> - 2002-02-18 17:40:09
|
> | It is pretty strange to trim leading white space and not trailing spaces.
>
> Right. Hmm.
>

Agreed that it's strange, but here's my take:

Leading white spaces are usually for aesthetics/alignment of the YAML, so
YAML's best guess is to trim them.

Trailing white spaces would only be there for part of the data, so you
should treat them as data. Except that they could also be inadvertent,
which is why you warn for them.
From: Steve H. <sh...@zi...> - 2002-02-18 17:35:34
|
> Steve Howell [mailto:sh...@zi...] wrote:
> > > > Any reason why it's _in_significant on single-line scalars
> > > > but significant for multiline forms? Why not just have
> > > > it significant everywhere.
> > >
> > > Same as leading white space... Otherwise you couldn't write:
> > >
> > >     matrix:
> > >       - [ 1 , 2 , 3 ]
> > >       - [ 45 , 56 , 78 ]
> > >
> > > Leading and trailing white space are not part of 'simple' scalars.
> > >
> >
> > Aren't inlines sort of a third case here? You would allow
> > trailing white space
> > in single-line and multi-line scalars, but not in inlined scalars?
>
> Nope. Currently, the same simple scalars are used for all in-line forms,
> whether or not they are in an in-line list/map or not. Which makes sense;
> there's no reason to have even more scalar styles (there are quite a few as
> it is).
>
> If trailing white space was precluded one would have to write it as:
>
>     matrix:
>       - [ 1, 2, 3]
>       - [ 45, 56, 78]

I totally agree forcing them to write it this way is uncool.

>
> And the original example would be a matrix of strings '1 ', '2 ' etc. - that
> is a rather unintuitive application error: 'matrix elements aren't
> integers'. I think anyone seeing the above matrix would expect it to contain
> integers... It is pretty strange to trim leading white space and not
> trailing spaces.

I guess what I was thinking is that inline scalars would get white space
trimmed from both sides. Single line and multi-line scalars would not. Once
the space got trimmed, you would use the same regexes for all forms of a
scalar.

Maybe another way of thinking about it is that a carriage return can
delimit trailing white space, but a comma can't, so you have to quote
strings when you're inline.
From: Steve H. <sh...@zi...> - 2002-02-20 04:11:58
|
Just wondering. |
From: Brian I. <in...@tt...> - 2002-02-20 06:09:11
|
On 19/02/02 23:02 -0500, Steve Howell wrote:
> Just wondering.

Well it turns out that my Perl implementation *parses* them just fine,
although it can't correctly *load* them yet (due to the limitations of
Perl). Here's an example from the YAML Shell:

    > ysh
    Welcome to the YAML Test Shell. Type ':help' for more information.

    ysh > ---
    yaml> {foo: bar}: [fee, fie, foo]
    yaml> ...
    $VAR1 = {
        'HASH(0x207ab4)' => [
            'fee',
            'fie',
            'foo'
        ]
    };
    ysh >

Cheers, Brian
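The shell output above shows the mapping used as a key being turned into the string 'HASH(0x207ab4)'. Other scripting languages hit a similar wall when a loader meets a collection used as a key. As a hedged analogy only (this is Python, not the Perl loader, and `load_pairs` is a hypothetical helper), one fallback is to keep the key/value pairs as a list when a key cannot be used in a native dictionary:

```python
def load_pairs(pairs):
    """Build a dict when every key is hashable, else keep the pair list.

    `pairs` is a sequence of (key, value) tuples as a parser might
    produce them for one mapping node.
    """
    pairs = list(pairs)
    try:
        return dict(pairs)
    except TypeError:
        # An unhashable key (for example a dict) cannot be a dict key,
        # so the structure is preserved as a list of pairs instead.
        return pairs
```

The point of the sketch is only that parsing and loading are separate steps: the parse result is intact either way, and the loader decides how to represent it natively.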
From: Oren Ben-K. <or...@ri...> - 2002-02-19 07:41:09
|
Brian Ingerson [mailto:in...@tt...] wrote:
> > Some other typos in the spec:
...
> Fixed.

Thanks.

> A scanner is like a lexer? What are the tokens it returns?

AFAIK it is a rather unique way of tokenizing an input stream. The main
function is currently:

    class YamlScanner {
        int readAtom(char[] buffer, int offset, int length);
        // And some other utilities, such as accessing the
        // current position (line, char) in the input,
        // the error description if any, and maybe even:
        // void skip(int indentLevel);
    }

The readAtom method returns a size, the number of input characters in the
read atom, or -1 for EOF. buffer[offset + size] contains an atom type code.
In addition, if the most significant bit of the code is '1', the read atom
is partial and the next call will return more characters for the same atom.

An "atom" is the smallest unit which has defined semantics in the syntax
model. I didn't want to use the words "lexer" and "token" because some
people (e.g. Clark) mentioned scanning is a somewhat lower level operation.
For example, each and every character of the input file is reported as-is
as part of some atom. This usually isn't the case with lexers.

I have a program which takes a YAML file as input, say:

    --- #YAML:1.0 !!type some value
    --- #NO-VALUE

And emits:

    --- #YAML:1.0 !!type "some\nvalue!"
    SSSWITTTTITTTWIITTTTWITTTT\\TTTTTTI
    --- #NO-VALUE
    SSSWITTTTTTTTE
    Input:2/12: Expected ':' separating directive name from value

Now that I think of it perhaps it had better emit it as:

    ---   # YAML : 1.0   ! ! type   " some \n value! "
    SSS W I TTTT I TTT W I I TTTT W I TTTT \\ TTTTTT I
    ---   # NO-VALUE
    SSS W I TTTTTTTT E
    Input:2/12: Expected ':' separating directive name from value

So atom boundaries would be clearer (e.g., in the case of the '!!'). I
guess I'll make that a flag...

At any rate, the scanner is basically a state machine (OK, with an
indentation level stack and some other tweaks, but still mostly a state
machine). It doesn't allocate token objects in memory (another difference
from most lexers) so it should be extremely fast. And, the implementation
should be very portable; specifically conversion to C/C++ should be trivial.

The scanner verifies the input stream is a valid YAML stream
(syntactically). Its basic error handling strategy is to report an error
token (with zero or more input characters) and continue as if the error
characters weren't seen. Of course, in some cases this means treating all
characters until the next space or end of the line as "error"... Or in
extreme cases even until the next line which is less-indented than some
threshold.

Lookahead is only done within the boundary of the buffer given to the
readAtom method. If this buffer is filled and the atom type isn't deduced
yet, the scanner throws an Error complaining the buffer is too small. In
practice, other than the trailing white space case, lookahead is just one
character. Actually, when a document separator is used, the lookahead is as
long as the separator:

    ---some-separator ]
    Unindented text
    ---some-separato-NOT! Sigh.
    More text

So, just call the method with a large enough buffer - say, 1K characters -
and all will be well. As for trimmed trailing spaces, if they overflow the
buffer I'll report them as 'w' instead of 'W' and let the higher level
break its head - treat them as error, guess they are text, or maybe read
some more and then decide.

The purpose of the scanner is to provide full access to the "syntax model".
Editors etc. would really benefit from such a tool.

> > Next would be a parser, converting the above into the tree
> > model, and the
> > emitter which would do the opposite. I expect the API to
> > both would be
> > consistent with the C/C++ APIs.
> >
> > Finally I'd tackle the loader/dumper, ideally by converting
> > the tree model
>
> I think you should get some help. If you do all this in order
> it may takes a *long* time. I would urge you to seek out some
> other interested Java programmers. I could probably point some
> your way.

I think I should get it to a stage where concurrent work is practical,
first. The scanner, being based on one large state machine, isn't very
suitable for concurrent development. After that, obviously writing a parser
and an emitter can be done in parallel, and also the loader/dumper pairs
can allow up to 4 people to work at once without stepping on each other's
toes.

> I only wish you were pouring your precious time into a C reference
> implementation first. No other languages can benefit from a
> Java one. The cornerstone of YAML will be libyaml written in C.

Converting the Java scanner to C would be trivial. I'm even crazy enough to
consider a single source file with some CPP macros for generating
Java/C++/C implementations :-) Probably it isn't worth the effort... but
the fact it is conceivable shows how close the different language
implementations would be.

> ... I suggest we assign Clark as the project lead for C
> libyaml, but I will try to put together a team of contributors
> so that we can get it done sooner.

I think now that the spec is "done", we should start talking about APIs.
The above describes what I see as the best syntax-level API. We still need
to define the tree-level API and for C/C++ also a graph-level one... Clark,
could you (re)post your latest thoughts about this?

Have fun,

    Oren Ben-Kiki
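Two pieces of the scanner description in this message lend themselves to a small sketch: the "partial" flag carried in the most significant bit of the type code, and the two-line display that prints each atom above its type code. The Python below is an illustration under stated assumptions, not the Java scanner itself. In particular, the 16-bit code width (chosen because a Java char is 16 bits) and the names `PARTIAL_FLAG`, `decode_type_code` and `render_atoms` are hypothetical, and the one-letter codes are treated as opaque labels.

```python
PARTIAL_FLAG = 0x8000  # assumption: type codes are 16 bits wide


def decode_type_code(code):
    """Split a raw type code into (base_code, is_partial)."""
    return code & ~PARTIAL_FLAG, bool(code & PARTIAL_FLAG)


def render_atoms(atoms):
    """Render (text, code) atoms as two aligned lines.

    Each atom's text is printed above its one-character type code,
    repeated once per character, with a single space between atoms.
    """
    text_line = " ".join(text for text, _ in atoms)
    code_line = " ".join(code * len(text) for text, code in atoms)
    return text_line, code_line


# The start of the first example line, atom by atom.
atoms = [("---", "S"), (" ", "W"), ("#", "I"),
         ("YAML", "T"), (":", "I"), ("1.0", "T")]
text_line, code_line = render_atoms(atoms)
```

Because every input character belongs to some atom, joining the atom texts without separators reproduces the input exactly; the spaced form only adds one column between atoms so that boundaries such as the two '!' indicators are visible.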
From: Oren Ben-K. <or...@ri...> - 2002-02-19 07:51:21
|
Brian Ingerson [mailto:in...@tt...] wrote:
> > > Trailing white space is evil. If you put it in files, I
> > > think yaml should
> > > complain loudly.
> >
> > Perhaps we should make this a warning and either decide to
> > keep it or trim
> > it. I'm starting to agree with Clark and Steve. I got
> > bitten by trailing
> > whitespace in my test suite the other day. Oren?
>
> I should have read ahead. Oren's comments on inline
> collections seem to
> indicate we should strip trailing whitespace. Let's leave the
> spec like this
> and see how it works over the next 3-6 months of implementation.

Right. The current spec says trailing white space is ignored (trimmed)
where it isn't content, and that emitters "should" avoid it. I agree with
Brian that we should keep it that way.

The one other place where trailing white space may cause a problem is here:

this: ]
    Is text indented 4 spaces.
    The next line contains 6 spaces.
      
    The latter 2 are content.
    The next line contains 4 spaces.
    
    It is an empty line (encodes an LF).
    The next line contains 2 spaces.
  
    It is an error. Should it be
    considered an empty line instead?
    The next line contains no spaces.

    It is an empty line (encodes an LF).

The 2 spaces case is troublesome... Should I change the productions so that

    indent(<=n) line_break

would be an empty line instead of the current

    indent(n)? line_break

Thoughts?

Have fun,

    Oren Ben-Kiki
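The two candidate productions for a spaces-only line can be compared with a short Python sketch. It models only the four cases listed in the example above (6, 4, 2 and 0 spaces in a block indented by 4); the function name `classify_spaces_only_line` and the `relaxed` flag are invented for the illustration and are not part of the spec.

```python
def classify_spaces_only_line(spaces, indent, relaxed):
    """Classify a line holding only `spaces` spaces inside a block that is
    indented by `indent` spaces.

    relaxed=False models the current rule, indent(n)? line_break.
    relaxed=True models the proposed rule, indent(<=n) line_break.
    """
    if spaces > indent:
        return "content"  # spaces beyond the indentation are content
    if relaxed or spaces in (0, indent):
        return "empty"    # the line encodes a line feed
    return "error"        # too few spaces under the current rule
```

The only case where the two rules disagree is a line with more than zero but fewer than `indent` spaces: an error today, an empty line under the proposal.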
From: Brian I. <in...@tt...> - 2002-02-19 13:58:16
|
On 19/02/02 07:51 -0000, Oren Ben-Kiki wrote:
> Brian Ingerson [mailto:in...@tt...] wrote:
> > > > Trailing white space is evil. If you put it in files, I
> > > > think yaml should
> > > > complain loudly.
> > >
> > > Perhaps we should make this a warning and either decide to
> > > keep it or trim
> > > it. I'm starting to agree with Clark and Steve. I got
> > > bitten by trailing
> > > whitespace in my test suite the other day. Oren?
> >
> > I should have read ahead. Oren's comments on inline
> > collections seem to
> > indicate we should strip trailing whitespace. Let's leave the
> > spec like this
> > and see how it works over the next 3-6 months of implementation.
>
> Right. The current spec says trailing white space is ignored (trimmed) where
> it isn't content, and that emitters "should" avoid it. I agree with Brian
> that we should keep it that way.
>
> The one other place where trailing white space may cause a problem is here:
>
> this: ]
>     Is text indented 4 spaces.
>     The next line contains 6 spaces.
>
>     The latter 2 are content.
>     The next line contains 4 spaces.
>
>     It is an empty line (encodes an LF).
>     The next line contains 2 spaces.
>
>     It is an error. Should it be
>     considered an empty line instead?
>     The next line contains no spaces.
>
>     It is an empty line (encodes an LF).
>
> The 2 spaces is troublesome... Should I change the productions so that
>
>     indent(<=n) line_break
>
> would be an empty line instead of the current
>
>     indent(n)? line_break
>
> Thoughts?

My understanding (and implementation) has always been not to penalize the
document for having too few indentation chars. You should definitely make
the change above.

Cheers, Brian
From: Oren Ben-K. <or...@ri...> - 2002-02-19 14:11:23
|
Brian Ingerson [mailto:in...@tt...] wrote:
> > The 2 spaces is troublesome... Should I change the
> > productions so that
> >
> >     indent(<=n) line_break
> >
> > would be an empty line instead of the current
> >
> >     indent(n)? line_break
> >
> > Thoughts?
>
> My understanding (and implementation) has always been not to
> penalize the
> document for having too few indentation chars. You should
> definitely make the
> change above.

OK then. What version should I start with - the 18th?

Have fun,

    Oren Ben-Kiki
From: Brian I. <in...@tt...> - 2002-02-17 18:29:40
|
On 17/02/02 07:56 -0000, Oren Ben-Kiki wrote:
> Clark C . Evans [mailto:cc...@cl...] wrote:
> > Website updated and the main page has been changed
> > (with a few fixes as Brian requested).
>
> Thanks. BTW, whatever these changes were, they aren't listed in the
> 'changes' section; I trust they were insignificant (wording etc.)?

I noticed that the explicit types are not correct in the examples in the
tutorial section. I can fix these and also scan for typos if you wish.

>
> > | So... How about we post a message in SML-DEV (and maybe
> > | even XML-DEV) asking people to take a look at it?

BTW Oren, great job on keeping the spec up to date. I'm glad it's coming to
a close for a while. I'll bet you are too. :)

> >
> > This sounds great, let's make this our "Last Call Working Draft".
> > This gives us 3 months to make a full-blown "C" implementation!
> > How should the announcement look?
>
> Speaking of "Last call" there's one unavoidable issue I found with the
> current spec. I don't propose any changes, just that we'll be aware of the
> situation.
>
> Consider this: [ text <1M spaces and tabs> text ]
> Versus this:   [ text <1M spaces and tabs> ]

I don't really understand the problem. If I were going to do this in C I
would always copy the leaf bytes into a buffer until I reached a newline.
At this point I would remove the trailing whitespace. There is no
lookahead. You just have to grow your buffer to accommodate the whole
string (and then at the end discover you've been ripped off :). Big deal.

>
> Ugh. No way around it that I can see. In my scanner implementation the
> tactics I'm using is that if the <spaces and tabs> is too large for comfort,
> I mark it as a special case and let the parser break its head on how to
> treat it. Probably the best tactics there would be to assume it is content
> and report a (belated) error if it isn't - nobody sane would put a huge
> amount of insignificant spacing, right?
>
> I've grown to *hate* white space.
>
> Anyway, about the announcement...
>
> I suggest we rename the version to "call-for-comments-1" and change the
> status section as follows:
>
> "This specification is a working draft and reflects consensus reached by the
> members of the yaml-core mailing list. Any questions regarding this draft
> should be raised on this list. While this is a draft version, it is intended
> that no changes be made for a period of three months, during which comments
> would be sought and initial implementations would be created. Reviewers and
> implementers are encouraged to use this draft as reference and to send their
> comments to the mailing list."
>
> We should also rename the "changes from previous versions" section to
> "versions history" and only list links to the previous versions. The full
> list of changes would be confusing to newcomers; a link to previous versions
> should satisfy anyone who is really interested.
>
> Once we do that, we should send a message along the following lines
> (basically lifting the text from the abstract and status sections):
>
> Subject: First YAML "Call for comments" draft released
>
> YAML(tm) (rhymes with "camel") is a straightforward machine parsable data
> serialization format designed for human readability and interaction with
> scripting languages such as Perl and Python. YAML is optimized for data
> serialization, configuration settings, log files, Internet messaging and
> filtering.
>
> The latest version of the YAML specification is available at
> http://www.yaml.org/spec/call-for-comments-1.html. While this is a draft
> version, it is intended that no changes be made for a period of three
> months, during which comments would be sought and initial implementations
> would be created. Reviewers and implementers are encouraged to use this
> draft as reference and to send their comments to the <link-to>mailing
> list</link-to>.
>
> We can post this in SML-DEV and XML-DEV, and go about the implementation
> business...
>

I'd love to have the C implementation done by summer. I think I'm going to
push for a YAML tutorial at the O'Reilly Open Source Conference.

Cheers, Brian

> Have fun,
>
> Oren Ben-Kiki
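The buffer-until-newline approach described in this reply (copy the leaf into a growing buffer, then strip trailing white space once the newline is seen) can be sketched as follows. The message talks about C; this Python version is an illustration only, the name `read_leaf_line` is hypothetical, and the in-line collection syntax around the leaf is ignored.

```python
import io


def read_leaf_line(stream):
    """Read characters up to a newline or EOF, then drop trailing spacing.

    Returns None at end of input. No lookahead is used; the cost is that
    the buffer must hold the whole line, however long its spacing is.
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == "\n":
            break
        chars.append(ch)  # the buffer grows to hold the whole line
    if not chars and ch == "":
        return None
    return "".join(chars).rstrip(" \t")
```

The trade-off against a fixed-buffer scanner is visible here: the decision is always correct once the newline arrives, but memory use is bounded only by the length of the line.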
From: Clark C . E. <cc...@cl...> - 2002-02-17 18:51:36
|
| I noticed that the explicit types are not correct in the examples in the
| tutorial section. I can fix these and also scan for typos if you
| wish.

+1

| > Consider this: [ text <1M spaces and tabs> text ]
| > Versus this:   [ text <1M spaces and tabs> ]
|
| I don't really understand the problem. If I were going to do this in C I
| would always copy the leaf bytes into a buffer until I reached a newline. At
| this point I would remove the trailing whitespace. There is no lookahead. You
| just have to grow your buffer to accommodate the whole string (and then at the
| end discover you've been ripped off :). Big deal.

Hmm. I thought all trailing whitespace was significant.

| > I mark it as a special case and let the parser break its head on how to
| > treat it. Probably the best tactics there would be to assume it is content
| > and report a (belated) error if it isn't - nobody sane would put a huge
| > amount of insignificant spacing, right?

I'd report it as content and then not bother with an error.

| > I suggest we rename the version to "call-for-comments-1" and change the
| > status section as follows:

Nice. I'll put this together in the next day or so. I'll wait
for Brian's changes.

| I'd love to have the C implementation done by summer. I think I'm going to
| push for a YAML tutorial at the O'Reilly Open Source Conference.

Same here. I'm going to start throwing one day a week at it,
it won't be huge but till my organization makes some money, I
can't really let YAML sit on the sidelines. An outline...

1. Write up a boot-strap architecture document (small) describing
   all of the components and how they would interact. (one day)

2. Work out a first pass interface between the components. (one day)

3. Go breadth first, implement stubs of all of the components and
   minimal functionality for each one so that the trivial YAML
   document makes it through. (one day)

4. Cycle around between the architecture and each component
   "tightening the screws", each component may be visited several
   times in this refinement period. Components may be added/dropped
   as the initial architecture comes to grips with reality.

Thoughts?

;) Clark

--
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software
From: Brian I. <in...@tt...> - 2002-02-17 20:05:43
|
On 17/02/02 14:10 -0500, Clark C . Evans wrote:
> | I noticed that the explicit types are not correct in the examples in the
> | tutorial section. I can fix these and also scan for typos if you
> | wish.
>
> +1
>
> | > Consider this: [ text <1M spaces and tabs> text ]
> | > Versus this: [ text <1M spaces and tabs> ]
> |
> | I don't really understand the problem. If I were going to do this in C I
> | would always copy the leaf bytes into a buffer until I reached a newline. At
> | this point I would remove the trailing whitespace. There is no lookahead. You
> | just have to grow your buffer to accommodate the whole string (and then at the
> | end discover you've been ripped off :). Big deal.
>
> Hmm. I thought all trailing whitespace was significant.

Not on unquoted inline scalars.

these are necessarily different:

    - foo
    - 'foo '

However whitespace is significant on multiline forms.

> | > I mark it as a special case and let the parser break its head on how to
> | > treat it. Probably the best tactics there would be to assume it is content
> | > and report a (belated) error if it isn't - nobody sane would put a huge
> | > amount of insignificant spacing, right?
>
> I'd report it as content and then don't bother with an error.

In places where we can get away with it, we need to follow "If you can't see
it, it ain't there". In multilines we can't get away with it unless we want
to start escaping whitespace, which I most definitely do not.

>
> | > I suggest we rename the version to "call-for-comments-1" and change the
> | > status section as follows:
>
> Nice. I'll put this together in the next day or so. I'll wait
> for Brian's changes.

OK. I'll do this this afternoon.

>
> | I'd love to have the C implementation done by summer. I think I'm going to
> | push for a YAML tutorial at the O'Reilly Open Source Conference.
>
> Same here. I'm going to start throwing one day a week at it,
> it won't be huge but till my organization makes some money, I
> can't really let YAML sit on the sidelines. An outline...
>
> 1. Write up a boot-strap architecture document (small) describing
> all of the components and how they would interact. (one day)
>
> 2. Work out a first pass interface between the components. (one day)
>
> 3. Go breadth first, implement stubs of all of the components and
> minimal functionality for each one so that the trivial YAML
> document makes it through. (one day)
>
> 4. Cycle around between the architecture and each component
> "tightening the screws", each component may be visited several
> times in this refinement period. Components may be added/dropped
> as the initial architecture comes to grips with reality.
>
> Thoughts?

Sounds good so far. I know there will be a lot of issues. I think that I'll
let you guys work on the C parser while I work on a C loader for Perl in
parallel. That way we have a real application driving the process.

The parser needs to be the top priority. I think we can release libyaml with
just a parser. An emitter interface can always come later on. Emitters are
easy to implement in native languages like Perl, because you don't need to
account for the entire spec to make something fairly useful.

>
> ;) Clark
>
>
> --
> Clark C. Evans Axista, Inc.
> http://www.axista.com 800.926.5525
> XCOLLA Collaborative Project Management Software
|
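The distinction drawn in the message above, between an unquoted `foo` and a quoted `'foo '`, can be illustrated with a small sketch. The helper name `inline_scalar_value` is hypothetical, and it handles only plain and single-quoted single-line scalars with no escapes; it is an approximation for discussion, not the grammar from the draft spec.

```python
def inline_scalar_value(token):
    """Return the content of a single-line scalar token (hypothetical helper).

    Unquoted scalars lose their leading and trailing spaces and tabs;
    single-quoted scalars keep everything between the quotes.
    """
    stripped = token.strip(" \t")
    if len(stripped) >= 2 and stripped[0] == "'" and stripped[-1] == "'":
        return stripped[1:-1]   # whitespace inside the quotes is content
    return stripped             # "if you can't see it, it ain't there"

# The unquoted form drops invisible trailing spaces, the quoted form keeps them.
assert inline_scalar_value("foo   ") == "foo"
assert inline_scalar_value("'foo '") == "foo "
```

Under this rule the only way to carry trailing whitespace in a single-line scalar is to make it visible with quotes.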
From: Clark C . E. <cc...@cl...> - 2002-02-18 16:00:01
|
On Sun, Feb 17, 2002 at 12:05:32PM -0800, Brian Ingerson wrote:
|
| > Hmm. I thought all trailing whitespace was significant.
|
| Not on unquoted inline scalars.
|
| these are necessarily different:
|     - foo
|     - 'foo '
|
| However whitespace is significant on multiline forms.

Any reason why it's _in_significant on single-line scalars
but significant for multiline forms? Why not just have
it significant everywhere?

| > I'd report it as content and then don't bother with an error.
|
| In places where we can get away with it, we need to follow "If you can't see
| it, it ain't there". In multilines we can't get away with it unless we want
| to start escaping whitespace, which I most definitely do not.

In general, I agree. However, since we are making whitespace
significant for multiline forms, I don't see the harm in
making it significant for single-line forms. Certainly
'foo ' is the best way to write trailing whitespace...

Clark

--
Clark C. Evans Axista, Inc.
http://www.axista.com 800.926.5525
XCOLLA Collaborative Project Management Software
|
From: Steve H. <sh...@zi...> - 2002-02-18 16:17:27
|
> On Sun, Feb 17, 2002 at 12:05:32PM -0800, Brian Ingerson wrote:
> |
> | > Hmm. I thought all trailing whitespace was significant.
> |
> | Not on unquoted inline scalars.
> |
> | these are necessarily different:
> |     - foo
> |     - 'foo '
> |
> | However whitespace is significant on multiline forms.
>
> Any reason why it's _in_significant on single-line scalars
> but significant for multiline forms? Why not just have
> it significant everywhere?
>
> | > I'd report it as content and then don't bother with an error.
> |
> | In places where we can get away with it, we need to follow "If you can't see
> | it, it ain't there". In multilines we can't get away with it unless we want
> | to start escaping whitespace, which I most definitely do not.
>
> In general, I agree. However, since we are making whitespace
> significant for multiline forms, I don't see the harm in
> making it significant for single-line forms. Certainly
> 'foo ' is the best way to write trailing whitespace...
>

How about?

1) Emitters quote strings with trailing white space by default, unless
   quote_strings_with_trailing_white_space is turned off.

2) Parsers respect unquoted strings with trailing white space, but warn user.
   Users can turn off those warnings as needed.
|
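The two-part proposal above could look something like the following sketch. It is only an illustration of the suggestion as worded in the message, not behaviour defined by the spec. The option name `quote_strings_with_trailing_white_space` comes from the message; the function names `emit_scalar` and `parse_unquoted_scalar` are hypothetical, and the emitter assumes the value contains no quote characters.

```python
import warnings

def emit_scalar(value, quote_strings_with_trailing_white_space=True):
    """Emit a single-line scalar, quoting it if it ends in a space or tab.

    Hypothetical emitter: assumes `value` contains no quote characters.
    """
    has_trailing = value != value.rstrip(" \t")
    if quote_strings_with_trailing_white_space and has_trailing:
        return "'" + value + "'"
    return value

def parse_unquoted_scalar(text):
    """Keep trailing white space on an unquoted scalar, but warn about it."""
    if text != text.rstrip(" \t"):
        warnings.warn("unquoted scalar has trailing white space", stacklevel=2)
    return text

print(emit_scalar("foo "))   # quoted, because the option is on by default
print(emit_scalar("foo"))    # nothing to protect, emitted as-is
```

A user who wants the warnings silenced could use the standard `warnings.simplefilter("ignore")` from Python's `warnings` module, which matches the "users can turn off those warnings" part of the proposal.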
From: Brian I. <in...@tt...> - 2002-02-18 17:47:22
|
On 18/02/02 11:10 -0500, Steve Howell wrote:
> > On Sun, Feb 17, 2002 at 12:05:32PM -0800, Brian Ingerson wrote:
> > |
> > | > Hmm. I thought all trailing whitespace was significant.
> > |
> > | Not on unquoted inline scalars.
> > |
> > | these are necessarily different:
> > |     - foo
> > |     - 'foo '
> > |
> > | However whitespace is significant on multiline forms.
> >
> > Any reason why it's _in_significant on single-line scalars
> > but significant for multiline forms? Why not just have
> > it significant everywhere?
> >
> > | > I'd report it as content and then don't bother with an error.
> > |
> > | In places where we can get away with it, we need to follow "If you can't see
> > | it, it ain't there". In multilines we can't get away with it unless we want
> > | to start escaping whitespace, which I most definitely do not.
> >
> > In general, I agree. However, since we are making whitespace
> > significant for multiline forms, I don't see the harm in
> > making it significant for single-line forms. Certainly
> > 'foo ' is the best way to write trailing whitespace...
> >
>
> How about?
>
> 1) Emitters quote strings with trailing white space by default, unless
> quote_strings_with_trailing_white_space is turned off.

The YAML spec does not dictate how emitters work as long as they are compliant
with the spec. You are free to implement pragmas like this in Python,
however...

> 2) Parsers respect unquoted strings with trailing white space, but warn user.
> Users can turn off those warnings as needed.

I don't see that this buys us any functionality. It just opens the door for
confusion and hard-to-debug YAML document errors.

Cheers, Brian
|
From: Brian I. <in...@tt...> - 2002-02-18 17:43:37
|
On 18/02/02 11:19 -0500, Clark C . Evans wrote:
> On Sun, Feb 17, 2002 at 12:05:32PM -0800, Brian Ingerson wrote:
> |
> | > Hmm. I thought all trailing whitespace was significant.
> |
> | Not on unquoted inline scalars.
> |
> | these are necessarily different:
> |     - foo
> |     - 'foo '
> |
> | However whitespace is significant on multiline forms.
>
> Any reason why it's _in_significant on single-line scalars
> but significant for multiline forms? Why not just have
> it significant everywhere?

In multiline forms you can't help it because there are no quotes. The only
alternative would be to escape spaces and tabs which I don't think anybody
wants.

In single line forms stripping trailing whitespace is a feature. We strip
leading whitespace. Why would I ever expect the following two lines to be
different:

    - foo
    - foo
    # the second one has trailing spaces

This stuff was decided a *long* time ago.

>
> | > I'd report it as content and then don't bother with an error.
> |
> | In places where we can get away with it, we need to follow "If you can't see
> | it, it ain't there". In multilines we can't get away with it unless we want
> | to start escaping whitespace, which I most definitely do not.
>
> In general, I agree. However, since we are making whitespace
> significant for multiline forms, I don't see the harm in
> making it significant for single-line forms. Certainly
> 'foo ' is the best way to write trailing whitespace...
>
> Clark
>
> --
> Clark C. Evans Axista, Inc.
> http://www.axista.com 800.926.5525
> XCOLLA Collaborative Project Management Software
|
From: Steve H. <sh...@zi...> - 2002-02-18 17:55:08
|
> In single line forms stripping trailing whitespace is a feature. We strip
> leading whitespace. Why would I ever expect the following two lines to be
> different:
>
>     - foo
>     - foo
>     # the second one has trailing spaces
>
> This stuff was decided a *long* time ago.
>

Hmm, when you put it like that. :)

Trailing white space is evil. If you put it in files, I think yaml should
complain loudly.

Leading white space is more harmless. Leading white space is usually there for
alignment or due to space-bar-happy data entry clerks. If you put leading white
space in files, I think yaml should just quietly trim it off for you.
|
From: Brian I. <in...@tt...> - 2002-02-18 18:11:08
|
On 18/02/02 12:48 -0500, Steve Howell wrote:
> > In single line forms stripping trailing whitespace is a feature. We strip
> > leading whitespace. Why would I ever expect the following two lines to be
> > different:
> >
> >     - foo
> >     - foo
> >     # the second one has trailing spaces
> >
> > This stuff was decided a *long* time ago.
> >
>
> Hmm, when you put it like that. :)
>
> Trailing white space is evil. If you put it in files, I think yaml should
> complain loudly.

Perhaps we should make this a warning and either decide to keep it or trim
it. I'm starting to agree with Clark and Steve. I got bitten by trailing
whitespace in my test suite the other day. Oren?

>
> Leading white space is more harmless. Leading white space is usually there for
> alignment or due to space-bar-happy data entry clerks. If you put leading white
> space in files, I think yaml should just quietly trim it off for you.
>
> _______________________________________________
> Yaml-core mailing list
> Yam...@li...
> https://lists.sourceforge.net/lists/listinfo/yaml-core
|
From: Brian I. <in...@tt...> - 2002-02-18 18:19:38
|
On 18/02/02 10:11 -0800, Brian Ingerson wrote:
> On 18/02/02 12:48 -0500, Steve Howell wrote:
> > > In single line forms stripping trailing whitespace is a feature. We strip
> > > leading whitespace. Why would I ever expect the following two lines to be
> > > different:
> > >
> > >     - foo
> > >     - foo
> > >     # the second one has trailing spaces
> > >
> > > This stuff was decided a *long* time ago.
> > >
> >
> > Hmm, when you put it like that. :)
> >
> > Trailing white space is evil. If you put it in files, I think yaml should
> > complain loudly.
>
> Perhaps we should make this a warning and either decide to keep it or trim
> it. I'm starting to agree with Clark and Steve. I got bitten by trailing
> whitespace in my test suite the other day. Oren?

I should have read ahead. Oren's comments on inline collections seem to
indicate we should strip trailing whitespace. Let's leave the spec like this
and see how it works over the next 3-6 months of implementation.

> >
> > Leading white space is more harmless. Leading white space is usually there for
> > alignment or due to space-bar-happy data entry clerks. If you put leading white
> > space in files, I think yaml should just quietly trim it off for you.
|