From: Brian I. <briani@ActiveState.com> - 2001-05-19 15:30:35
|
"Clark C . Evans" wrote: > There are three syntax forms for scalar values, > "block", "stream", and "mime" which are used to > accomidate differing needs for expressing content. Sounds like a good idea... > > Examples of the block form include: These are my ideas from earlier, so I like them, of course. > > Examples of stream form include: > > a: This is a simple one liner ok > b: This has intermediate spaces. ok i think. > c: This is a multi line > value that wraps\taround and\n > uses an end of line marker. uh huh > d: > \n X > \n X X > \n X X X > \n > \n This one is just like "c" above. > \n Except that \\ escaping is required. Why doesn't whitespace get folded between \n and X. I'm just not perfectly clear on this. And no emitter would ever *produce* this by default. Right? In other words: \n X\n X X\n X X X\n\n This one is just like "c" above.\n Except that \\ escaping is required. Also, you haven't answered how to escape these simple values. e: "%" f: "@" g: "~" h: "% = 1.2" I have a solution. See my next post... > > An example of the mime form is... > No changes there. Good. -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Oren Ben-K. <or...@ri...> - 2001-05-19 08:30:42
|
Brian Ingerson wrote: > I was chatting on IRC to some Perl gurus who suggested that YAML is no > more a Markup Language than MIDI. Or than a kayak is an automobile. (To > quote them) I agree. But everyone knows YAML stands for YAML Ain't a Markup Language :-) We could also switch the name... though we had better do that fast. Clark's hat again, I guess. Oren. |
From: Oren Ben-K. <or...@ri...> - 2001-05-19 08:25:15
|
Brian Ingerson wrote: > > ... this makes it difficult to provide an extra > > "class" property to list/scalar. > > But you don't need a class name to deserialize into these three. They're > automatic. Just for other classes. Right? I'd agree for lists, but for scalars you may want the type to be "float" instead of "string", or some such. > I need to go to sleep 3 hours ago. I just woke up, full of vim and vigor :-) > I'm trying to follow this, and I > think I'm agreeing with the logic. But I'm puzzled as to why the above > code represents a point object. I don't see a class name of "point". Is > this a typo? Or am I just as spent as I feel? Probably :-) A 'point' class is rather useful if you are dealing with geometry. You can pick another example - would "invoice" be better (id, price, description, date, ...)? Happy dreams ;-) Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2001-05-19 08:18:25
|
"There's always one more"... I got a way around the need to "further indent" the second line in a list block value, given Clark's new white space handling rules (which I like): list @ value1 value2 |block |value3 \ |value4 - block with an empty | \line followed by a longer on |e which was broken \ |value5 - verbatim single line value6 That is, in a list, a block ends by an empty \ line, at least if followed by another block. It may be simpler/more readable to require "a block ends with an empty \" everywhere in lists: |verbatim single line value5 \ value6 What do you think? At any rate, no scalar marker required, ever. How about it, Brian? Can we agree on the 'minimal' option, based on this? Also, speaking of Clark's latest E-mails about whitespace handling, maybe we should re-think the whole RFC822 issue. One benefit is that a YAML parser won't have to implement the strange RFC822 text parsing rules, ever. I'm catching Brian's headache when trying to figure these out, and I have no confidence at all in being able to define a subset which is (1) simple and (2) will work on most existing such headers. Clark is being optimistic here, I think; definitely his current white space rules aren't such a subset. If we give up on it we could also use '=' for separating keys and values. Using Clark's notion that the default key is the empty one: point % x % = 1.2 accuracy = 0.1 Maybe even allow a shorthand: point % x % = 1.2 accuracy = 0.1 Everyone wins: Brian gets his '=', I get my '= default', Clark gets his "default key is empty", and we all gain from a simpler implementation. That's the neatest solution so far, I think... I'm signing out for the next day or so - flying away to my XP conference. I'll try to catch up on my E-mail Sunday evening (my time). Until then, Oren Ben-Kiki |
From: Brian I. <briani@ActiveState.com> - 2001-05-19 08:13:28
|
Oren Ben-Kiki wrote: > > Brian Ingerson wrote: > > I can live without class names for Scalars and Arrays, but Hashes are a > > must. Most perl objects map into Hashes. > > Yes. > > > Even Array and Scalar ones can > > be serialized into Hashes. > > Sounds interesting. How? Hacked in at least: key : #class % __type__ : ARRAY 0 : first elem 1 : 2nd elem 2 : 3rd elem Or something like that. > > > Hashes work for all Python objects and > > Javascript too, AFAIK. I'll bet they could work for Java and C++ as > > well. > > In C++ (or Java) you'd want to de-serialize map/list/scalar as standard > library hash/vector/string; this makes it difficult to provide an extra > "class" property to list/scalar. But you don't need a class name to deserialize into these three. They're automatic. Just for other classes. Right? > > Clark wrote: > > Classes will be in. I think we should allow these > > permutations since the syntax seems to allow them. > > We will need a disclaimer stating that not every > > YAML system will be able to preserve the round-trip > > information for some combinations, namely: > > > > a) Having & and * on the same line. > > b) Using # with @ or with a scalar > > I'd rather say - "NO strict YAML 1.0 system" instead of "Not every YAML > system". These constructs simply aren't YAML 1.0, period. Otherwise, people > will be very surprised when they write: > > point: % > x: #float 1.2 > y: #float 3.4 > > And notice the YAML pretty printer strips away the '#float's. > > Let's think a sec about *why* we want to add classes. It seems they are > intended to allow a de-serialization into application-specific native data > types (that is, something other then Hash/Vector/String). > > I think that 99% of the time, the application specific data type is an > object class (like "point", above). In which case it is represented in YAML > as a map, with a key for each data member. 
It follows that the data type for > each key is already determined by the map's class; e.g., class "point" > expects float coordinate values. So there's no need to explicitly declare > it. I need to go to sleep 3 hours ago. I'm trying to follow this, and I think I'm agreeing with the logic. But I'm puzzled as to why the above code represents a point object. I don't see a class name of "point". Is this a typo? Or am I just as spent as I feel? Good night, Brian |
From: Oren Ben-K. <or...@ri...> - 2001-05-19 07:36:42
|
Brian Ingerson wrote: > I can live without class names for Scalars and Arrays, but Hashes are a > must. Most perl objects map into Hashes. Yes. > Even Array and Scalar ones can > be serialized into Hashes. Sounds interesting. How? > Hashes work for all Python objects and > Javascript too, AFAIK. I'll bet they could work for Java and C++ as > well. In C++ (or Java) you'd want to de-serialize map/list/scalar as standard library hash/vector/string; this makes it difficult to provide an extra "class" property to list/scalar. Clark wrote: > Classes will be in. I think we should allow these > permutations since the syntax seems to allow them. > We will need a disclaimer stating that not every > YAML system will be able to preserve the round-trip > information for some combinations, namely: > > a) Having & and * on the same line. > b) Using # with @ or with a scalar I'd rather say - "NO strict YAML 1.0 system" instead of "Not every YAML system". These constructs simply aren't YAML 1.0, period. Otherwise, people will be very surprised when they write: point: % x: #float 1.2 y: #float 3.4 And notice the YAML pretty printer strips away the '#float's. Let's think a sec about *why* we want to add classes. It seems they are intended to allow a de-serialization into application-specific native data types (that is, something other than Hash/Vector/String). I think that 99% of the time, the application-specific data type is an object class (like "point", above). In which case it is represented in YAML as a map, with a key for each data member. It follows that the data type for each key is already determined by the map's class; e.g., class "point" expects float coordinate values. So there's no need to explicitly declare it. Can anyone come up with a use case where it makes sense to assign a class to something which isn't a data member of an object, and isn't an object by itself? 
The only thing I can think of is top-level keys - when the whole YAML file is de-serialized into a single object and there's no top-level element to declare the type in. I think that declaring a top-level element is good form in such a case, and that it isn't onerous to require that. In the remaining 1% of the cases, there is a workaround. Use the good old color idiom - that's exactly the sort of problem we invented it for, after all. So, taking my "# :" syntax, using the "default value" concept Clark put in YAML for this explicit purpose (:-), and spicing it up by using '=' as the key for the "default value" (pretty intuitive), you get: point: % x: % =: 1.2 #: float y: % =: 3.4 #: float OK, that's more verbose, but remember we are talking about 1% of the cases. Besides, we should eat our own dog food - either we believe in the color idiom, or we don't. A word on the color idiom: This is a concept which was raised a while back in SML-DEV, in relation to schema evolution, meta-data vs. data, etc. Take the simple 'point' example: point: % x: 1.2 y: 3.4 That's a simple 'object' with two data members, and you can write YAML code handling it. Then, one day, you decide you want to add an accuracy designation to each. Or maybe you round-trip data from XML people, and you want to round-trip the information of whether 'x' and 'y' were attributes or sub-elements. At any rate, you want to add meta-data to 'x' and 'y'. The color idiom says: "color" the 'x' and 'y' elements with sub-elements containing this meta data. Take the original value and place it in a sub-element as well (this would be the "default" sub-element Clark talked about. If you are stumped for a name for it, just use '=' by convention). You get: point: % x: % =: 1.2 accuracy: 0.1 xml-syntax: attribute y: % =: 3.4 accuracy: 0.2 xml-syntax: attribute xml-syntax: empty-tag The point is that using YAML's "default" rules, old code will continue to work unbroken on this new data structure. 
The value of 'x' is still '1.2'. If you aren't interested in accuracy, you just ignore it. This may affect the API in languages such as Perl/Python/JavaScript (I'll have to think about this a bit). In Perl, for example, you'll have to define a conversion to scalar context which retrieves the default value (this is possible, right? It is a bit beyond my Perl mastery). It turns out the "color idiom" solves a great many problems - schema evolution, layering of processing modules, round-tripping to other syntax forms, versioning, you name it. So Clark and I are rather fond of it :-) Have fun, Oren Ben-Kiki |
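A rough Python sketch of the color idiom described above, to make the "old code keeps working" claim concrete. It assumes maps are loaded as plain dicts and scalars as floats, and the helper name `plain` is hypothetical; nothing here is a proposed YAML API.

```python
def plain(node):
    """Return the default value of a node, whether or not it has been colored.

    Assumption: a colored node is a dict that stores its original value
    under the conventional '=' key; anything else is returned unchanged.
    """
    if isinstance(node, dict) and "=" in node:
        return node["="]
    return node


# The original 'point' and a later, colored revision of the same data.
old_point = {"x": 1.2, "y": 3.4}
new_point = {
    "x": {"=": 1.2, "accuracy": 0.1},
    "y": {"=": 3.4, "accuracy": 0.2},
}

# Code written against the old shape reads both revisions the same way.
for point in (old_point, new_point):
    print(plain(point["x"]), plain(point["y"]))
```

Code that does care about the meta-data reads `new_point["x"]["accuracy"]` directly; code that does not never has to know the extra keys exist.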
From: Brian I. <briani@ActiveState.com> - 2001-05-19 07:23:13
|
I was chatting on IRC to some Perl gurus who suggested that YAML is no more a Markup Language than MIDI. Or than a kayak is an automobile. (To quote them) Their point is that a Markup Language marks up text documents. XML is a data language that can reasonably markup text as well. YAML is not. (This was their assertion.) For instance, how would YAML markup: <QUOTE>I went to <A HREF="http://www.webvan.com">Webvan</A> to get my groceries</QUOTE> I guess their point is that YAML is a misnomer. I would probably YAML it like: QUOTE : I went to <A HREF="http://www.webvan.com" </A> to get my groceries But that still doesn't invalidate their point. Just a thought. Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-19 07:11:26
|
Since we seem to be making such wonderful progress on the language, perhaps it is time to address the requirements of class names. I see a few goals:

0. To have a brief mechanism.
1. To allow the unpacking/packing mechanism to associate particular structures with a validation technique.
2. To allow the unpacking/packing mechanism to associate operational code with a given structure in addition to loading the data directly.
3. To enable both of these features to work in the same language environment but perhaps with different environment or "classpath" settings.
4. Ideally, to allow these same identifiers to operate across multiple languages.

Items 3 and 4 imply that a class name must be globally unique; in other words, it must be accompanied by a namespace mechanism.

Before we go into solutions, I want to quickly say that XML's namespace mechanism is seriously broken for a few reasons. First, they let the namespace be anything without providing any discovery or resolution mechanism... the W3C was far too flexible here. Second, the namespace mechanism is coupled with a prefix abbreviation system; originally it was specified that this prefix is not informational, but XSLT used the prefixes as information and thus pushed the prefixes into the canonical form. Ouch! Third, they had several classes of how names relate to elements. Three strikes. Bad W3C.

Anyway, as I'm tired, I'm not going to enumerate the options. Oren and I beat this one up one side and down the other before we came up with pretty much what Sun had already implemented with Java, only our version was more restrictive and more like RFC 882's solution. The core problem is that goal 0 (briefness) conflicts with the need for global uniqueness. Here is the proposal as I remember...

There are three classes of names.

1. The first class is globally unique based on reverse DNS packages. Thus, "com.clarkevans.timesheet" is a valid class name over which the user of the domain clarkevans.com has authority and control. Note that all of these names have at least one period.
2. The second class is globally unique names (without periods) as reserved through a central registry... perhaps a semi-automated mechanism at yaml.org? Thus "date" may be an example of a class in this arena.
3. The third class is not globally unique but can be used for temporary development purposes. These names start with "x-" and can be safely used *within* an organization. YAML texts using "x-" are said to be "local YAML" and should be discouraged beyond development.

Notes:

A. This mechanism is very well supported by Java's package mechanism.
B. This technique is possible through Python's package mechanism, but won't be perfect since they don't follow the religious DNS-based system.
C. I'm not sure how this works with Perl; however, "org.cpan." prepended to any CPAN module may be a very good start.
D. In any case, this isn't perfect, so each YAML system will have to have some sort of "catalogue" system which associates the global identifiers with a local identifier. This is common in SGML land. (XML unsuccessfully tried to duck it.) This may seem ugly, but the local catalog can be a simple YAML config file!

Anyway, it is definitely something to chew on for a while. However, I think it's probably as good as we can get. Perhaps Brian may have some new insights into the process given that he's not from SGML/XML land.

Best, Clark |
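The three classes of names and the catalogue from note D can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the category labels, the catalogue entries, and the choice to test the "x-" prefix before looking for a period are all hypothetical and not part of the proposal.

```python
def classify_class_name(name):
    """Sort a class name into one of the three proposed classes.

    Assumption: the 'x-' prefix is checked first, so a name such as
    'x-foo.bar' counts as local even though it contains a period.
    """
    if name.startswith("x-"):
        return "local"          # class 3: development only, not globally unique
    if "." in name:
        return "reverse-dns"    # class 1: e.g. com.clarkevans.timesheet
    return "registered"         # class 2: reserved through a central registry


# A hypothetical local catalogue (note D): global identifier -> local identifier.
CATALOGUE = {
    "com.clarkevans.timesheet": "timesheet.Timesheet",
    "date": "datetime.date",
}


def resolve(name):
    """Look up the local identifier for a global class name, or None if unknown."""
    return CATALOGUE.get(name)


for example in ("com.clarkevans.timesheet", "date", "x-scratch"):
    print(example, classify_class_name(example), resolve(example))
```

As the note suggests, the catalogue itself could live in a small YAML config file; a dict stands in for it here.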
From: Clark C . E. <cc...@cl...> - 2001-05-19 06:46:32
|
Brian Ingerson wrote: | This is correct: | | text: | |this is my | |multiline? | | this is not: | | text: | |this is my | |multiline?

And he wrote... | > key4 @ | > * id | > : value | > % | > key : value | Why switch indent width? | > @ | > : value

Ok. Here are the indenting rules as I understand them (and have demonstrated through example, but have not yet laid out as a spec).

Call the whitespace leading up to the first printable character of a node its "indent". For one node c to be considered a child of another node p, the following must hold:

1. The child must be indented further than its parent. Specifically: node c must have an indent i formed from the concatenation of the indent for p plus at least one additional whitespace character.
2. There are no intermediate nodes in the hierarchy. Specifically: there does not exist another node b such that b occurs after p, b occurs before c, the indent for p occurs in the indent for b, the indent for b occurs in the indent for c, the indents of b and c are not equal, and the indents of p and b are not equal.
3. Indents compare literally. An indent with a tab in position n can only be equivalent to another indent with a tab in position n.
4. The indentation is consistent among siblings. If a and b are both children of p, then the indent for a is equal to the indent for b.

So to your question "why switch indent width" I answer: "because you can, as long as it is consistent". And as to your correct/incorrect item above... I don't understand. Both are legal. In particular, you want to allow more than one indentation style so that fragments can be concatenated and/or indented as needed.

I hope this makes sense. Based on my experience, this is the Python method.

Best, Clark |
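One way to read the four indenting rules is as a stack-based pass over the lines. The Python sketch below is an interpretation, not a spec: the names `find_parents` and `is_deeper` are hypothetical, blank lines are assumed not to occur, and top-level nodes are simply treated as children of an implicit root.

```python
def is_deeper(child_indent, parent_indent):
    """Rules 1 and 3: the child's indent is the parent's indent, compared
    literally (tabs are not spaces), plus at least one more character."""
    return (len(child_indent) > len(parent_indent)
            and child_indent.startswith(parent_indent))


def find_parents(lines):
    """Return, for each line, the index of its parent line (None at top level).

    Raises ValueError when siblings do not share the same indent (rule 4).
    """
    root = {"indent": None, "index": None, "child_indent": None}
    stack = [root]          # currently open ancestors, innermost last
    parents = []
    for index, line in enumerate(lines):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        # Rule 2: the parent is the nearest earlier node this line is deeper than.
        while len(stack) > 1 and not is_deeper(indent, stack[-1]["indent"]):
            stack.pop()
        parent = stack[-1]
        if parent["child_indent"] is None:
            parent["child_indent"] = indent
        elif parent["child_indent"] != indent:
            raise ValueError("line %d: indent differs from its siblings" % index)
        parents.append(parent["index"])
        stack.append({"indent": indent, "index": index, "child_indent": None})
    return parents


example = [
    "map %",
    "  key1 %",
    "      key : value",   # a wider step is fine, as long as siblings agree
    "  key2 : value",
    "list @",
    "    : value",
]
print(find_parents(example))
```

Under this reading, switching indent width between different parents is accepted, while two children of the same parent with different indents are rejected.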
From: Clark C . E. <cc...@cl...> - 2001-05-19 06:21:46
|
It seems that the *only* place we may now need a scalar indicator is when the scalar has multiple lines and occurs within a list. I must say... this should be rather rare... | key: @ | Harry the flea | is not really hairy | |Harry the flea | |is not really hairy

As Oren and Brian agree, possible, but UGLY.

| 0) ':' is a key/value separator (We could change to '=' if it weren't | for self-imposed 822 restrictions) | 1) ':' also is the scalar marker (Could be still be '$') | 2) The scalar marker is optional, except to resolve ambiguity. (Lists of | single/multi-lines mixed together is the only example I can think of. If | you can think of a way out of that one <without shifting the '|' | paragraphs> then we can get rid of the scalar marker) | 3) Any instance of ': :' can be collapsed to ':' without loss of | meaning.

I see four options:

a) We can deal with the ugliness.
b) We can re-introduce the scalar marker ($) that is optional pretty much everywhere except in this case.
c) We introduce a special marker only for this case (no optional stuff).
d) We introduce indexes (did Oren propose this?)

I think "a" is out because it is ugly. I think "d" is out since adding a new item in the list would require re-numbering... or nasty BASIC 10, 20, 30 style ugliness.

I'd also eliminate "b". If we re-introduced a scalar indicator, I strongly feel it should not be the same as the key/value separator. Also, I'm not fond of optional stuff... as it causes exception handling in people's heads. Thus, if we did option "b", I'd like the indicator to be mandatory for consistency. But this is ugly. Therefore, I think "b" is out.

This leaves us "c". I say we use ":" to indicate a multi-line scalar in this case and make it a documented exception. No use in trying to rationalize it...

Best, Clark |
From: Clark C . E. <cc...@cl...> - 2001-05-19 06:00:40
|
Earlier I had thought that RFC822 forced condensation of consecutive white spaces into a single space like HTML. Since I was only 1/2 right (true for structured headers) and since the 1/2 that I was right about matters little to us, this requires some re-thinking. Furthermore, I've been unhappy about the "quoted string" thingy. Yucko. Therefore, below is the new scalar value proposal. ...

There are three syntax forms for scalar values, "block", "stream", and "mime", which are used to accommodate differing needs for expressing content. The block form specializes in pre-formatted or raw text, the stream form works well for unformatted content, and the mime form is ideal for binary values or very large chunks of formatted text.

In the block form:

a) trailing whitespace is significant
b) the end of line marker is also significant
c) there is no quoting or escaping mechanism
d) each line must begin with the block symbol, which is currently "|"
e) a backslash character can be used in place of the block symbol for the very last line to indicate that the block does not terminate with an end-of-line marker.

In the stream form:

a) leading/trailing whitespace is not significant
b) the end of line marker is also not significant
c) consecutive whitespaces bounded by printable characters are significant
d) standard backslash-style \n, \t, \\ escaping is provided
e) a trailing backslash is used to indicate that the last character of a line and the first character of the next line are adjacent without an intermediate space

In the mime form:

a) the entire content is included as a separate MIME section and may be transfer-encoded using base64 or quoted-printable
b) the content is assigned an identifier
c) this identifier can be used to provide a reference to the content within the YAML proper

Examples of the block form include: a: | This line does not need \ escaping. b: |This is a block form |with two end of line markers. c: | | X | X X | X X X | | This one has a leading | carriage return and | does not have a trailing \ end of line marker.

Examples of stream form include: a: This is a simple one liner b: This has intermediate spaces. c: This is a multi line value that wraps\taround and\n uses an end of line marker. d: \n X \n X X \n X X X \n \n This one is just like "c" above. \n Except that \\ escaping is required.

An example of the mime form is... Date: Sun, 13 May 2001 23:48:04 -0400 MIME-Version: 1.0 Content-Type: multipart/related; boundary="================================" X-YAML-Version: 1.0 --================================ Content-Type: text/plain; id="0001" XX XXX XXXXX XX XX XXX XX X XX X XX XX XXXXXX XX X XX XX X XX XXXXXXX XXXX XXXXX X XX XX --================================ Content-Type: text/x-yaml raw: *(0001) --================================-- |
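To make the block and stream rules easier to argue about, here is a small Python sketch of how a reader might decode each form. It is one interpretation under stated assumptions, not the proposal itself: the function names are hypothetical, the input is a list of physical lines with the key and end-of-line characters already removed, and stream lines are assumed to be joined with a single space unless the previous line ends in a lone backslash.

```python
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def decode_block(lines):
    """Block form: every line starts with '|' and keeps its end of line;
    a final line starting with a backslash contributes no end of line."""
    out = []
    for number, raw in enumerate(lines):
        body = raw.lstrip(" \t")            # indentation before the marker
        marker, content = body[:1], body[1:]
        if marker == "|":
            out.append(content + "\n")      # trailing whitespace is kept as-is
        elif marker == "\\" and number == len(lines) - 1:
            out.append(content)
        else:
            raise ValueError("line %d is not a block line" % number)
    return "".join(out)


def unescape(text):
    """Stream form rule d: expand the escapes \\n, \\t and \\\\."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode_stream(lines):
    """Stream form: strip each line, then join (rules a, b, c and e)."""
    result, glue = "", ""
    for raw in lines:
        text = raw.strip()
        trailing = len(text) - len(text.rstrip("\\"))
        joined = trailing % 2 == 1          # odd count: a lone trailing backslash
        if joined:
            text = text[:-1]
        result += glue + unescape(text)
        glue = "" if joined else " "
    return result


print(repr(decode_block(["|This is a block form", "|with two end of line markers."])))
print(repr(decode_stream(["This is a multi line", "   value that wraps\\taround."])))
```

Whether whitespace after an escaped \n at a line break should be folded is left open here, as it is in the thread; this sketch always inserts the single joining space.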
From: Brian I. <briani@ActiveState.com> - 2001-05-19 05:39:07
|
"Clark C . Evans" wrote: > > On Fri, May 18, 2001 at 04:45:06PM -0700, Brian Ingerson wrote: > | > text: > | > |this is my > | > |multi-line > | > | Hopefully, you saw my last comment on this. I now think the '|' should > | be in the first column *after* indentation. > > Thus allowing for... > > text: > |this is my > |multiline? No. This is correct: text: |this is my |multiline? this is not: text: |this is my |multiline? > | I can live without class names for Scalars and Arrays > > Classes will be in. I think we should allow these > permutations since the syntax seems to allow them. > We will need a disclaimer stating that not every > YAML system will be able to preserve the round-trip > information for some combinations, namely: > > a) Having & and * on the same line. > b) Using # with @ or with a scalar > > I don't like the idea of forbidding them, however. > These constructs just are not supported in every > language... oh well. If someone *has* to make use > of these constructs, they will be available at the > parser level. Sounds great. , Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Clark C . E. <cc...@cl...> - 2001-05-19 05:24:38
|
On Fri, May 18, 2001 at 04:45:06PM -0700, Brian Ingerson wrote: | > text: | > |this is my | > |multi-line | | Hopefully, you saw my last comment on this. I now think the '|' should | be in the first column *after* indentation. Thus allowing for... text: |this is my |multiline? Ok. I can rationalize this... | Actually there's no restriction. It would look like this: | | text: |does X align vertically? | | X Right. | > Also, to enable concatination, I think we should | > allow IDs to be shadowed. In other words, | | I support this. Good. | I can live without class names for Scalars and Arrays Classes will be in. I think we should allow these permutations since the syntax seems to allow them. We will need a disclaimer stating that not every YAML system will be able to preserve the round-trip information for some combinations, namely: a) Having & and * on the same line. b) Using # with @ or with a scalar I don't like the idea of forbidding them, however. These constructs just are not supported in every language... oh well. If someone *has* to make use of these constructs, they will be available at the parser level. | Now we could make a distinction between Objects and Hashes. I'd like the distinctions to reflect differences in the YAML syntax... not in how a host language treats the distinctions. Hopefully this makes sense. | Don't leave me without an object model or I'll go Postal % We won't leave you without objects. It is an explicit goal not to have our own object model, like XML has DOM. What a mess. *smile* Best, Clark ----- End forwarded message ----- |
From: Clark C . E. <cc...@cl...> - 2001-05-19 05:04:57
|
I've been reading quite a bit more on RFC822, and some of the information that I stated earlier was only partially correct. This is an attempt to clarify.

1. There are two types of header fields in RFC822, structured and unstructured.
2. In both cases, the header name may only include printable characters (33-126) excluding the colon. Thus, a space is not permitted between the end of the name and the colon.
3. In both cases, a header value may contain any sequence of ASCII characters (1-127), although CR and LF are not significant.
4. In both cases, before any whitespace character, a CRLF pair can be inserted or removed as necessary to wrap the line to the required number of columns (76 in old systems, 250 in newer systems).
5. In both cases, each header line is delimited by a CRLF followed by a printable ascii character in line one. Note that this is consistent.
6. Structured fields add significantly more constraints:
   a. They introduce the notion of "comments" by using parentheses.
   b. They have the concept of a domain literal, which is specific to e-mail requirements.
   c. They have "quoted strings" used to allow the usage of special characters without escaping.
   d. Read the spec... it's tedious.
   Note: In particular (I mis-informed earlier...) consecutive whitespace in quotes is not preserved within structured fields!
7. Structured fields also treat multiple linear space characters (tabs and spaces) as a single space.
8. The header terminates with a CRLFCRLF sequence, otherwise known as a blank line.
9. For RFC 822 compliance, mandatory headers must include: Date, From and either To or BCC.
10. Interesting to note that vCard uses the semi-colon in names to indicate a type hierarchy; it also includes key=value parameters, for example... TEL;WORK;FORMAT=X:
11. Also interestingly, mbox is concatenated RFC822 (header+body) messages, where a "From" line (without the colon) containing an e-mail address and a date is used to indicate the start of a new message.

Impacts:

A. To remain "consistent", we should use the colon as the magic separator between the key and value in a map. Further, we should allow the key to be flush against the colon.
B. Due to the folding constraints (#4 above), YAML will not be valid RFC 822. No way around this without significant changes, and significant changes are not possible.
C. The previous idea that we needed "quoting" to follow RFC 822 was incorrect. We are free to design whatever meaning we require for "quoted strings".
D. The multiple space condensation rules only apply to *Structured* RFC822 headers. Thus, this process may have to be suspended when in the RFC822 headers.
E. The mandatory headers can be used as a guide as to whether the section is RFC822 or YAML.

Summary: A minimal RFC822 support (unstructured only) is going to be a cake walk to implement and will be included in the spec. We may want to re-consider our "a b" technique... although I still like it. I apologize for any misconceptions that I may have propagated earlier.

Kind Regards, Clark |
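The "unstructured only" subset summarized above is small enough to sketch. The Python below illustrates points 2, 4, 5 and 8 under assumptions: the function name is hypothetical, lines are assumed to be separated by CRLF, and stripping the whitespace right after the colon is a convenience of this sketch rather than something stated in the message.

```python
def parse_unstructured_headers(text):
    """Parse unstructured RFC822-style headers into (name, value) pairs.

    - a name uses printable characters 33-126 and no colon (point 2)
    - a line starting with whitespace continues the previous value (point 4)
    - a blank line ends the header section (point 8)
    """
    headers = []
    for line in text.split("\r\n"):
        if line == "":
            break                               # CRLFCRLF: end of the headers
        if line[0] in " \t":
            if not headers:
                raise ValueError("continuation line before any header")
            name, value = headers[-1]
            headers[-1] = (name, value + line)  # unfold: drop the CRLF only
            continue
        name, colon, value = line.partition(":")
        if not colon or not name or any(not 33 <= ord(c) <= 126 for c in name):
            raise ValueError("not a header line: %r" % line)
        headers.append((name, value.lstrip(" \t")))
    return headers


message = (
    "Date: Sun, 13 May 2001 23:48:04 -0400\r\n"
    "Subject: a folded\r\n"
    " value\r\n"
    "\r\n"
    "body text\r\n"
)
print(parse_unstructured_headers(message))
```

Nothing in this sketch touches comments, domain literals or quoted strings, which is exactly the structured-field machinery the message proposes to leave out.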
From: Brian I. <briani@ActiveState.com> - 2001-05-19 00:32:58
|
Oren Ben-Kiki wrote: > > > > I assume the trailing ':' is a typo? > > > > No. See earlier post message for the reasoning. > > > > That leaves "class" as the only problematic issue... > > > > I've read through this briefly, but don't have time to comment yet. > > Let's stick with the original syntax for now. > > I don't know about that; it is easier not putting something in than taking > it out later. It *is* optional. All ':' are optional except to distinguish a multi-line from several single lines. > > > In general, keep in mind that YAML 1.0 will *not* be the final YAML > > spec. It will evolve to YAML 2.0 and so on. For now, let's strive for > > maximum syntactic simplicity. > > That's why I'd rather leave #class out of it until proven necessary. * ingy is trying to suppress postal tendencies * ;) > > While your comment on aesthetics may be true, there is a major > > distinction between what you think a ':' means and my intent. > > > > 1) A ':' is always a key value separator. We agree on that, but each > > want it to have one other meaning. > > 2) You want colon to be a "list bullet" in list context. > > 3) I want ':' to mean '$' for scalar values. And I want it to almost > > always be optional (unless there is ambiguity) > > 4) That said. We can make it the canonical/default form for emitters if > > we wish. > > Actually, I thought of it differently, since I started with the RFC822 frame of > mind. The idea was to combine two concepts: Ahh. Whacking everything with the RFC822 hammer ;) > - Unify Clark's Python-like indentation with RFC822 concept that (more) > indented lines continue a value; > - Make each YAML "element" have an RFC822 header line of its own. > > > 3) Minimal > > Nope. Minimal is: > > > key7 : #class3 @ > > Tom the flea > > Dick the flea > Harry the flea > is not really hairy > > % > > foo : bar > > #class4 % > > FOO : BAR > > #class5 A very classy flea > > #class6 > > |My favorite fleas: > > | Jim > > | Bob > > You don't really need it. 
Since the next line is "more indented", it is a > continuation line. It also works for aligned text: > > |Harry the flea > |is not really hairy Big ouch. > Pretty it isn't, but it is minimal :-) Agree. On both counts. > Here's another option (call it 5): > > If we want to think of ':' as a "scalar marker", we could say that the > syntax is: > > map % > key1 % (id1) > key : value > key2 : value > key3 * (id2) id1 Boggle %l > key4 @ > * id > : value > % > key : value Why switch indent width? > @ > : value > > This is consistent; a value is always prefixed by its marker (@, %, : or *). > No need to write ": %" or for that matter "map: %"; in a map, the syntax is > <key> <value> where the ':' is just one of the options. RFC822 is simply a > top level map with keys having only text values. This might have a chance of general acceptance if you replaced ':' with '$' in all cases. You won't accept that because of the 822 thing. And I couldn't accept it because a key/value separator is visually important if nothing else. > > BTW, I tried switching back to (id) instead of &id - that's consistent with > RFC822's "comment" concept, emphasising the id is not part of the data > model. We could keep it as &id, it doesn't matter much. I can go either way. > Aesthetic is important, or S-expressions would rule over the world. Agreed, but its not a common aesthetic. It's your personal one. My feeling is that it would bother a lot of people, especially if it was required. > > Besides, aesthetics aside, being consistent is also important. This rules > out option 4 for me, even though it looks nice (consistency over aesthetics > :-). Options 1 and 2 are too noise for me... That leaves us with option 3 > (the corrected one, without the ':'), option 5, and my original option 6. 
I see no inconsistencies in #4, if you accept the premises I laid out:

0) ':' is a key/value separator (We could change to '=' if it weren't for self-imposed 822 restrictions)
1) ':' is also the scalar marker (Could still be '$')
2) The scalar marker is optional, except to resolve ambiguity. (Lists of single/multi-lines mixed together is the only example I can think of. If you can think of a way out of that one <without shifting the '|' paragraphs> then we can get rid of the scalar marker)
3) Any instance of ': :' can be collapsed to ':' without loss of meaning.

> Have fun,

I cut myself shaving today. That's about all the fun I've had so far...

Brian

--
perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
From: Brian I. <briani@ActiveState.com> - 2001-05-18 23:48:19
|
"Clark C . Evans" wrote: > How about we define indentation as leading whitespace? In this > case, the minimal indentation for this style would be two. > > text: > |this is my > |multi-line > Hopefully, you saw my last comment on this. I now think the '|' should be in the first column *after* indentation. > | > | > F) I'd like to push for this always starting on the next line if it is a > | > map value. It has no relation to RFC822. > | > | What's the harm in allowing: > | > | text: | Spaces and " and \n, oh my! > | > | ('"' and '\' meaning themselves in this text). > | > | I don't see why we need to make this distinction between multi- and single- > | line text at all. It is bad enough we provide two different quoting > | mechanisms... > > Beacuse then we have to answer questions like... > > text: |does X align vertically? > | X > > I like Brian's restriction on this form. Actually there's no restriction. It would look like this: text: |does X align vertically? | X But this would *not* be the default. > Yes. Option C. For the sequential API, we will have > to expose this, but not in the in-memory representation. > And in the sequential API, it can be wrapped as an > opaque handle (without a string representation). > Under no circumstances do we want this ID to become > "data" as the XML prefix has... less we not have a > reasonable canonical form. > > Also, to enable concatination, I think we should > allow IDs to be shadowed. In other words, > > a : &0001 "This is a value" > b : *0001 > c : &0001 "This is another value" > d : *0001 > > In this case, d->c and b->a. After c, there is no > way to access a by reference. Simple solution, > and this way concatination is still well defined. > I support this. > | That leaves "class" as the only problematic issue. We explicitly > | decided not to talk about it in the conference call. 
It seems to
> | me like there's no way around requiring that this data will survive
> | round-trips, but I also don't see how it is possible to de-serialize
> | "scalar value" into a normal "Java String" if someone attached an
> | "unknown" class to it.
>
> If classes aren't available in the target environment,
> or if the class requested can't be found, then we
> have a slight problem. A reasonable solution is
> to notify the user via a warning, and then create
> an auxiliary yaml-class-map which maps lists/strings/maps
> that have been created by the load with their
> corresponding class name. In this way we keep the
> native structures, but preserve the class names through
> a "coloring archive" on the side.
>
> | So, the idea is to bite the bullet and remove "class"
> | as something specified in the "label line" (BTW, we need
> | to define some terminology here; I'm using "label lines"
> | and "text lines" - or maybe it should be "content lines"?).
> | It turns out that we can still achieve most of the goals
> | of the "class" construct by making the key "#" magical in maps:
> |
> | center: %
> |   #: point
> |   x: 35.3
> |   y: 42.1
>
> Interesting. However, this prevents class names for
> Strings or Lists. Very interesting. What do we do about
> Strings and Lists? Move this into the category of "non-portable"
> constructs, like a & and * on the same line? I'm not sure.
> The "coloring on the side" may be more painful (esp garbage
> collecting), but it at least does not get in the way. Hmm.

I can live without class names for Scalars and Arrays, but Hashes are a must. Most Perl objects map into Hashes. Even Array and Scalar ones can be serialized into Hashes. Hashes work for all Python objects and Javascript too, AFAIK. I'll bet they could work for Java and C++ as well.

Now we could make a distinction between Objects and Hashes. Just have Objects be separate regardless of their implementation. Just pick another symbol. '&' is available if we go back to %(0001) notation.
Don't leave me without an object model or I'll go Postal %\

Still smiling, Brian

--
perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
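The ID shadowing rule quoted in this message (a later `&0001` hides the earlier one, so d->c and b->a) can be sketched in Python. This is only an illustration under stated assumptions: the pre-parsed tuple format and the name `resolve_aliases` are hypothetical and do not come from any YAML draft.

```python
def resolve_aliases(entries):
    """Resolve *id references against the most recent &id definition.

    `entries` is a hypothetical pre-parsed form: a list of
    (key, kind, ident, value) tuples, where kind is "anchor" for
    `&id value` and "alias" for `*id`.
    """
    anchors = {}   # ident -> value of the latest definition seen so far
    resolved = {}
    for key, kind, ident, value in entries:
        if kind == "anchor":
            anchors[ident] = value          # a later &id shadows the earlier one
            resolved[key] = value
        elif kind == "alias":
            if ident not in anchors:
                raise KeyError(f"alias *{ident} used before any &{ident}")
            resolved[key] = anchors[ident]  # binds to the nearest preceding anchor
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return resolved


doc = [
    ("a", "anchor", "0001", "This is a value"),
    ("b", "alias", "0001", None),
    ("c", "anchor", "0001", "This is another value"),
    ("d", "alias", "0001", None),
]
result = resolve_aliases(doc)
```

Because each alias binds to the nearest preceding anchor, two documents that reuse the same IDs can be concatenated without renumbering, which is the property the proposal is after.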
From: Oren Ben-K. <or...@ri...> - 2001-05-18 22:52:52
|
> > I assume the trailing ':' is a typo? > > No. See earlier post message for the reasoning. > > That leaves "class" as the only problematic issue... > > I've read through this briefly, but don't have time to comment yet. > Let's stick with the original syntax for now. I don't know about that; it is easier not putting something in than taking it out later. > In general, keep in mind that YAML 1.0 will *not* be the final YAML > spec. It will evolve to YAML 2.0 and so on. For now, let's strive for > maximum sytactic simplicity. That's why I'd rather leave #class out if it until proven necessary. > Emailed Damian last night. He's preparing for an 11-week world speaking > tour. I'll see him in June at the YAPC (Yet Another Perl Conference) in > Montreal and I'll be sure to pin him down about YAML. BTW, I mentioned > to Clark that I'll probably be speaking about YAML at YAPC :) Now, if that was early July, I might have been able to attend. Oh well. > While your comment on aesthetics may be true, there is a major > distinction between what you think a ':' means and my intent. > > 1) A ':' is always a key value separator. We agree on that, but each > want it to have one other meaning. > 2) You want colon to be a "list bullet" in list context. > 3) I want ':' to mean '$' for scalar values. And I want it to almost > always be optional (unless there is ambiguity) > 4) That said. We can make it the canonical/default form for emitters if > we wish. Actually, I thought of differently, since I started with thr RFC822 frame of mind. The idea was to combine two concepts: - Unify Clark's Python-like indentation with RFC822 concept that (more) indented lines continue a value; - Make each YAML "element" have an RFC822 header line of its own. 
That's why I suggested the syntax: map: % key: value list: @ : text : % key: multi line value : @ : value Etc.; Every element is logically a single RFC822 header line, with indentation playing a dual role (both "continue a text value" and "continue a structured value"). It just happens that in some lines the "field name" is empty, that's all. The ':' isn't a "scalar marker", it is a "YAML element marker". I still don't see why you need a "scalar marker" as such, be it $, : or whatever. I realize that Perl makes such a marker a natural thing to think of, but all it does is clutter the text. (BTW, maybe one day we'd want to allow: list: @ 0 : 0th position 10 : 10th position Etc. Admittedly sparse lists aren't that much of a use case... but the above sure beats having to specify 9 null values, doesn't it? :-) > Consider the following four examples. > > 1) Fully qualified with '$'. Rather verbose... > 2) Fully qualified with ':'. The only real gain here is no " for $40.00. Right. Not much of an improvement. > 3) Minimal Nope. Minimal is: > key7 : #class3 @ > Tom the flea > Dick the flea Harry the flea is not really hairy > % > foo : bar > #class4 % > FOO : BAR > #class5 A very classy flea > #class6 > |My favorite fleas: > | Jim > | Bob > > Note that the only required ':' (besides the key/value ones) is for ': > Harry the flea' You don't really need it. Since the next line is "more indented", it is a continuation line. It also works for aligned text: |Harry the flea |is not really hairy (To clark's question about this: All the indentation, up to and including the '|' starting the line, is removed, and the result is the verbatim text. So it is well defined even if the '|' in the first line isn't on the same column as in the rest of the lines). Pretty it isn't, but it is minimal :-) > 4) Suggested canonical form: The difference between it and my original proposal is that ": %" and ": @" were collapsed into "%" and "@", as a shorthand. 
I'd still put an ID, say, *after* the ':', because it isn't a scalar marker... > The things I don't allow are: > ... > : : a scalar > : #class : a scalar Obviously the second ':' is unnecessary, so I agree with you here. > : % > : @ > : #class % > : #class @ Here's another option (call it 5): If we want to think of ':' as a "scalar marker", we could say that the syntax is: map % key1 % (id1) key : value key2 : value key3 * (id2) id1 key4 @ * id : value % key : value @ : value This is consistent; a value is always prefixed by its marker (@, %, : or *). No need to write ": %" or for that matter "map: %"; in a map, the syntax is <key> <value> where the ':' is just one of the options. RFC822 is simply a top level map with keys having only text values. BTW, I tried switching back to (id) instead of &id - that's consistent with RFC822's "comment" concept, emphasising the id is not part of the data model. We could keep it as &id, it doesn't matter much. Just so we'll have numbers for all of these, call my original proposal option 6: map : % sub map : % &id1 key : value text key : value ref key : * &id2 id1 list key : @ : *id : value : % key : value : @ : value > The problem with ':' as a "list bullet" is that it could not be > optional. And that's too restrictive just to satisfy a personal > aesthetic. Aesthetic is important, or S-expressions would rule over the world. Besides, aesthetics aside, being consistent is also important. This rules out option 4 for me, even though it looks nice (consistency over aesthetics :-). Options 1 and 2 are too noise for me... That leaves us with option 3 (the corrected one, without the ':'), option 5, and my original option 6. I think I like 5 the most... Have fun, Oren Ben-Kiki |
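The rule given in this message for '|' text (all the indentation, up to and including the '|' starting the line, is removed, and the result is the verbatim text) can be sketched as follows. The function name `strip_literal_block` is hypothetical, and joining lines with "\n" without a trailing newline is an assumption; the thread does not settle that detail.

```python
def strip_literal_block(lines):
    """Return the verbatim text of a '|'-prefixed block.

    Each input line may be indented by any amount of spaces or tabs;
    everything up to and including the first '|' is dropped, so the
    '|' characters do not have to sit in the same column.
    """
    out = []
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.startswith("|"):
            raise ValueError(f"not a '|' text line: {line!r}")
        out.append(stripped[1:])   # keep everything after the marker untouched
    return "\n".join(out)


block = [
    "   |Harry the flea",
    "     |is not really hairy",
]
text = strip_literal_block(block)
```

Nothing after the '|' is interpreted, so quotes and backslashes in the text mean themselves.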
From: Clark C . E. <cc...@cl...> - 2001-05-18 21:55:55
|
On Fri, May 18, 2001 at 01:54:24PM -0700, Brian Ingerson wrote: | While your comment on aesthetics may be true, there is a major | distinction between what you think a ':' means and my intent. Not to make things worse, but I don't particularly like ":" being used for two different meanings. I'd rather have the $ back, people can always quote currency amounts: total : "$540.00" No problem here. | 1) Fully qualified with '$'. | | key1 : $ my dog has fleas | key2 : $ "$40.00 for veternarian exam" | key3 : $ | The vet said, "Yes Ingy, | Your dog has fleas." | key4 : $ Ingy said, "Wow, | my dog has fleas!" | key5 : #class1 $ I hate fleas | key6 : #class2 $ | What is your viewpoint | about fleas? | key7 : #class3 @ | $ Tom the flea | $ Dick the flea | $ Harry the flea | is not really hairy | % | foo : bar | #class4 % | FOO : BAR | #class5 $ A very classy flea | #class6 $ | |My favorite fleas: | | Jim | | Bob I like the above. It's clean, despite being a bit "noisy". 5) Minimal key1 : my dog has fleas key2 : $40.00 for veternarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ Tom the flea Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob 6) Canonical key1 : my dog has fleas key2 : $40.00 for veternarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ $ Tom the flea $ Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob Thoughts? I don't mind escaping "$382.00", as this is a small use case anyway, and I'd rather stick with Perl's type indicators. Best, Clark |
From: Brian I. <briani@ActiveState.com> - 2001-05-18 20:57:39
|
Oren Ben-Kiki wrote: > > > > 1. Brian stated that he would invstigate Oren's Syntax > > > and get back with us if it meets Perl's serilization > > > requirements for hard references. If not, specify > > > what alternatives we can use. > > > > I don't think it's that important to investigate. It will probably > > always be a moot point. I will let Data::Denter use it's current scheme > > to deterministically round-trip all Perl data structures. YAML.pm > > probably will have no need for this. It's all acadenic and I have no > > spare time for academics for three more months. (My guess is, yes it > > could be made to work, but would be suboptimal for Perl people) Let's > > leave it at that for now. > > Does that mean we are giving up on Denter using YAML syntax (extended to > handle pointer-to-pointer)? Just for the record, the Perl component for YAML is called YAML.pm. Data::Denter is only of interest to Perl programmers from this point on. It may fondly be remembered as the catalyst for YAML 1.0. And it may keep a greedy eye on the YAML projects treasures, but that's of no concern here. > > I'm going to go over it with a fine-tooth comb, just to see what is involved > in making YAML a superset of it. I guess I'll also have to look at MIME > while I'm at it, with the same comb :-) Beware of the nits! Nasty buggers. ;) > > > On 4 & 5. I don't really like the blank line at the beginning thing > > because people will mess it up or not understand it. And we have many > > heuristic options. > > > > A) Parse lookahead for X-YAML-Version > > B) Option-A rarely needed because as soon as we see a key that is *not* > > RFC822 compliant, we assume YAML. 99% of the time this is the first > > line! > > C) If there is no whitespace allowed before the colon in RFC822, we > > simply make it a requirement in YAML. Or does this break your RFC > > compatability rules? > > > > Just for my own edification, would you please explain the rationale > > behind making YAML RFC822 compliant. 
And do so with one of more specific > > examples. Thanks :) > > Well, for example, suppose that YAML was a "good enough" superset of RFC822. > Then we could just adopt my idea that "blank lines separate top-level maps" > and we wouldn't have to say anything further about RFC822 headers, period. > If one wants to read/write a mail message as a YAML document, then it will > simply work (as long as he sticks to the "safe" constructs there). If one > wants to have a YAML document that has nothing to do with RFC822, that also > works. No need for any special statement about them. I like this approach > best. I think that sounds right, if I understand it correctly. My only contention above was the very first blank line, not the ones separating documents. > > > > " this is the hash\n key for this example :-) " : #class : > > I assume the trailing ':' is a typo? > No. See earlier post message for the reasoning. > > |# My Perl Subroutine > > | > > | sub version { > > | if ($_[0] =~ /\n/) { > > | return \ "\to sender"; > > | } > > | } > > > > Sorry for overloading this example with so many weird things. I'll just > > comment on the multiline semantics: > > > > A) Trailing whitespace is preserved if the transporter preserves it. > > B) The content can always be encoded before transport anyway. > > C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'. > > D) An implicit newline is assumed to be at the end of every line. > > We have to decide what our position is about them, BTW. Is a newline a "\n" > or a "\n\r" - the answer may be different in-memory and in the text file > (and thank you, O nameless DOS/CPM programmer, for inflicting this on us :-) Bastard of Bastards. :( But I think the heuristic is quite simple. Since the newline is implicit, just replace whatever is there with the system's native choice. > > > E) Note that the '|' is one column back from the actual indentation > > level. This is intententional. 
And it will work even if the indent width > > is set to one character wide. (not mandatory, but I like it.) > > Under Python indentation rules, there's no problem indenting the "label" > line by 4 characters and the text lines by 7, or whatever. What you say > about one character indentation, however, implies that the following would > be legal: Yes. It would be legal. > > text: > |multi-line > |text > > I'm not certain I like it. I think Clark should make the call here - > indentation is his baby. I actually don't like it for another subtle reason. Tabs. You couldn't use them properly with this scheme. So let's scrap the backing up one space requirement. And yes, that's my final answer ;) > > I started thinking about it and hit on an issue which Brian may already have > thought about - or will have to very soon, if he's covering YAML.pm :-) The > problem is we haven't defined the data model (or, viewing it differently, > the round-tripping issue). > > In "dynamic" languages such as Perl, JavaScript, Python (and to some extent, > Java), it is natural to map a YAML map to the native hash, a list to a > vector/array, and a scalar value to a simple string. That works admirably > well, as long as the YAML entity hasn't been annotated with an ID or a class > name. > > If one wants to provide a stable-round tripping utility (e.g., suppose I > want to write a YAML pretty printer), where am I to store the ID of a scalar > value? The class of a map? For this use case, it seems my best course of > action is to wrap the native construct (map/list/scalar) in an object which > has an "id", a "class", and a "value". > > There are several options: > > A) Use the native constructs when possible, and only use "wrapper" objects > when there's a need. That makes access pattern unpredictable: do I write > map{key} or map{key}.value? That's my idea. > > B) Always use wrapper objects, and give up on de-serializing YAML into > arbitrary native data structures. 
Big hit on usefulness - if we do this, > Brian will just give up on us :-) You're getting to know me pretty well ;) > > C) Declare that IDs may be re-written arbitrarily, even by pretty printers. > That is, banish them from the data model. I think I agree... > > That leaves "class" as the only problematic issue. We explicitly decided not > to talk about it in the conference call. It seems to me like there's no way > around requiring that this data will survive round-trips, but I also don't > see how it is possible to de-serialize "scalar value" into a normal "Java > String" if someone attached an "unknown" class to it. I've read through this briefly, but don't have time to comment yet. Let's stick with the original syntax for now. In general, keep in mind that YAML 1.0 will *not* be the final YAML spec. It will evolve to YAML 2.0 and so on. For now, let's strive for maximum sytactic simplicity. I think we can special case the semantics of 1.0 without needing to change the current syntax. > > > > > > 12. Brian mentioned that he'd show YAML to one of > > > his Perl friends. (sorry I didn't catch his name) > > > > Damian Conway http://www.csse.monash.edu.au/~damian/ > > His input will be greatly appreciated. Emailed Damian last night. He's preparing for an 11-week world speaking tour. I'll see him in June at the YAPC (Yet Another Perl Conference) in Montreal and I'll be sure to pin him down about YAML. BTW, I mentioned to Clark that I'll probably be speaking about YAML at YAPC :) > > > 15. Clark agreed to write up the "single vs multi" > > > line controversy and post to the list so that > > > it is clearly understood. > > I thought we settled this... Every scalar value is potentially multi-line. > It doesn't seem to cost us anything, or does it? I agree but see below. > > > > 16. We made little progress on the scalar indicator > > > for lists, to colon or not to colon. It wasn't > > > agreed, but Clark thinks this is someone else's > > > monkey. 
If Oren and Brian can't agree within > > > 7 days, Clark will put on the dictator cap. > > > > We traded in the '$' for the ':'. '$' as the last character in a line > > I thought ':' was the first one; it is "as if" it is a normal header, with > the key "just happening" to be empty. This seems more consistent. > > > meant a multiline scalar was to follow. Converting this semantic to the > > ':' leaves us with these represntations: > > > > key1 : @ > > single line > > : > > classless folded > > multi line > > another single line > > and another > > #class &0001 : > > : #class &0001 No, not a mistake. > > > classed multi > > line > > #class &0002 classed single line > > % > > key : value > > @ > > This is an empty list, right? Yup. Just to keep you on your toes :) > > > ~ > > And this is a null? Indeed. > > > #classy % > > key : value > > : even this multi line on the same line > > as a colon thingy works because there > > a little bit of indentation imposed by > > colon. (Although I don't love it) > > This means the following: > > : single line > > Will also work, even though you *really* dislike it. I like them :-) Noted :-) > > > : "Another thingy like above that meets" > > "RFC822 wackiness" > > : > > | 1 > > | 1 1 > > | 1 1 1 > > |Just for completeness :-) > > I think we've said everything there's to be said about this, and whether or > not you find either: > > list: > : One > : Two > : Three > and Four > > Or: > > list: > One > Two > : > Three > and Four > > To be beautiful or ugly is, when all is said and done, a matter of taste. To > you, the extra ':'s are an eyesore; to me it seems strange that the > multi-line value is "more indented"; it seems as though there's structure > involved, when there isn't. I also like being able to do /^:/ in VI to get > to the next entry. While your comment on aesthetics may be true, there is a major distinction between what you think a ':' means and my intent. 1) A ':' is always a key value separator. 
We agree on that, but each want it to have one other meaning. 2) You want colon to be a "list bullet" in list context. 3) I want ':' to mean '$' for scalar values. And I want it to almost always be optional (unless there is ambiguity) 4) That said. We can make it the canonical/default form for emitters if we wish. Consider the following four examples. 1) Fully qualified with '$'. key1 : $ my dog has fleas key2 : $ "$40.00 for veternarian exam" key3 : $ The vet said, "Yes Ingy, Your dog has fleas." key4 : $ Ingy said, "Wow, my dog has fleas!" key5 : #class1 $ I hate fleas key6 : #class2 $ What is your viewpoint about fleas? key7 : #class3 @ $ Tom the flea $ Dick the flea $ Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 $ A very classy flea #class6 $ |My favorite fleas: | Jim | Bob 2) Fully qualified with ':'. The only real gain here is no " for $40.00. key1 : : my dog has fleas key2 : : $40.00 for veternarian exam key3 : : The vet said, "Yes Ingy, Your dog has fleas." key4 : : Ingy said, "Wow, my dog has fleas!" key5 : #class1 : I hate fleas key6 : #class2 : What is your viewpoint about fleas? key7 : #class3 @ : Tom the flea : Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 : A very classy flea #class6 : |My favorite fleas: | Jim | Bob 3) Minimal key1 : my dog has fleas key2 : $40.00 for veternarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 I hate fleas key6 : #class2 What is your viewpoint about fleas? key7 : #class3 @ Tom the flea Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 A very classy flea #class6 |My favorite fleas: | Jim | Bob Note that the only required ':' (besides the key/value ones) is for ': Harry the flea' 4) Suggested canonical form: key1 : my dog has fleas key2 : $40.00 for veternarian exam key3 : The vet said, "Yes Ingy, Your dog has fleas." 
key4 : Ingy said, "Wow, my dog has fleas!" key5 : #class1 : I hate fleas key6 : #class2 : What is your viewpoint about fleas? key7 : #class3 @ : Tom the flea : Dick the flea : Harry the flea is not really hairy % foo : bar #class4 % FOO : BAR #class5 : A very classy flea #class6 : |My favorite fleas: | Jim | Bob So in this last example we always use the optional scalar indicator ':' for all scalars in a list (by default). Note that a #class or &id *always* comes before a %, @, or :. It's just that the ':' is usually optional. The things I don't allow are: key1 : @ : % : @ : : a scalar : #class % : #class @ : #class : a scalar The problem with ':' as a "list bullet" is that it could not be optional. And that's too restrictive just to satisfy a personal aesthetic. , Brian -- perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf ("Just Another %s Hacker",x);}};print JAxH+Perl' |
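The newline heuristic suggested in this message (the newline is implicit, so replace whatever is there with the system's native choice) might look like the sketch below. The function names are hypothetical, and using "\n" as the in-memory form is an assumption rather than something the thread has decided.

```python
import os
import re

# Match CRLF first so that "\r\n" is treated as a single line ending.
_LINE_END = re.compile(r"\r\n|\r|\n")


def normalize_newlines(text, newline="\n"):
    """Replace every CRLF, lone CR, or LF with `newline`."""
    return _LINE_END.sub(lambda _match: newline, text)


def to_native(text):
    """Rewrite all line endings using the platform's own convention."""
    return normalize_newlines(text, os.linesep)


mixed = "a\r\nb\rc\nd"
unified = normalize_newlines(mixed)
```

A loader could call `normalize_newlines` on input and an emitter could call `to_native` on output, so in-memory text never depends on the platform that wrote the file.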
From: Clark C . E. <cc...@cl...> - 2001-05-18 20:32:33
|
| > Sorry for overloading this example with so many weird things. I'll just
| > comment on the multiline semantics:
| >
| > A) Trailing whitespace is preserved if the transporter preserves it.
| > B) The content can always be encoded before transport anyway.
| > C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'.
| > D) An implicit newline is assumed to be at the end of every line.
|
| We have to decide what our position is about them, BTW. Is a newline a "\n"
| or a "\r\n" - the answer may be different in-memory and in the text file
| (and thank you, O nameless DOS/CPM programmer, for inflicting this on us :-)

FYI, XML chose the option to convert, at parse time, "\r\n" and "\r" to "\n". In this way, anyone writing XML tools can use \n and not worry about the platform's specific line ending. XML does not specify how these must be written out, so any of the three common line endings can be used on serialization, depending on the platform.

| > E) Note that the '|' is one column back from the actual indentation
| > level. This is intentional. And it will work even if the indent width
| > is set to one character wide. (not mandatory, but I like it.)
|
| Under Python indentation rules, there's no problem indenting the "label"
| line by 4 characters and the text lines by 7, or whatever. What you say
| about one character indentation, however, implies that the following would
| be legal:
|
| text:
| |multi-line
| |text
|
| I'm not certain I like it. I think Clark should make the call here -
| indentation is his baby.

How about we define indentation as leading whitespace? In this case, the minimal indentation for this style would be two.

text:
 |this is my
 |multi-line
|

| > F) I'd like to push for this always starting on the next line if it is a
| > map value. It has no relation to RFC822.
|
| What's the harm in allowing:
|
| text: | Spaces and " and \n, oh my!
|
| ('"' and '\' meaning themselves in this text).
|
| I don't see why we need to make this distinction between multi- and single-
| line text at all. It is bad enough we provide two different quoting
| mechanisms...

Because then we have to answer questions like...

text: |does X align vertically?
      |     X

I like Brian's restriction on this form.

| Perhaps we should require that a YAML processor must *accept* UTF-16 files.
| That goes well with the "superset" idea. If one wants to write a YAML
| document which is also a mail message, he'll just write it in the default
| UTF-8 encoding (or at least the first MIME part - I still have to read the
| RFC properly).

So we must accept both UTF-8 and UTF-16. Does UTF-16 support then imply that the RFC 822 headers are not present? Or do we upgrade the spec and allow the headers as long as the BOM is there?

| > > 10. Oren mentioned that he was thinking of doing
| > > a Java or Javascript implementation.
|
| I started thinking about it and hit on an issue which Brian may already have
| thought about - or will have to very soon, if he's covering YAML.pm :-) The
| problem is we haven't defined the data model (or, viewing it differently,
| the round-tripping issue).
|
| In "dynamic" languages such as Perl, JavaScript, Python (and to some extent,
| Java), it is natural to map a YAML map to the native hash, a list to a
| vector/array, and a scalar value to a simple string. That works admirably
| well, as long as the YAML entity hasn't been annotated with an ID or a class
| name.

We will officially decree that the ID is NON INFORMATIONAL. We must be very careful about this ID. It smells a lot like an XML prefix tar baby... which completely destroyed any hopes of an XML canonical form.

| If one wants to provide a stable-round tripping utility (e.g., suppose I
| want to write a YAML pretty printer), where am I to store the ID of a scalar
| value? The class of a map?
For this use case, it seems my best course of
| action is to wrap the native construct (map/list/scalar) in an object which
| has an "id", a "class", and a "value".

I must say, I don't like this option.

| A) Use the native constructs when possible, and only use "wrapper" objects
| when there's a need. That makes access pattern unpredictable: do I write
| map{key} or map{key}.value?
|
| B) Always use wrapper objects, and give up on de-serializing YAML into
| arbitrary native data structures. Big hit on usefulness - if we do this,
| Brian will just give up on us :-)
|
| C) Declare that IDs may be re-written arbitrarily, even by pretty printers.
| That is, banish them from the data model.

Yes. Option C. For the sequential API, we will have to expose this, but not in the in-memory representation. And in the sequential API, it can be wrapped as an opaque handle (without a string representation). Under no circumstances do we want this ID to become "data" as the XML prefix has... lest we not have a reasonable canonical form.

Also, to enable concatenation, I think we should allow IDs to be shadowed. In other words,

a : &0001 "This is a value"
b : *0001
c : &0001 "This is another value"
d : *0001

In this case, d->c and b->a. After c, there is no way to access a by reference. Simple solution, and this way concatenation is still well defined.

| That leaves "class" as the only problematic issue. We explicitly
| decided not to talk about it in the conference call. It seems to
| me like there's no way around requiring that this data will survive
| round-trips, but I also don't see how it is possible to de-serialize
| "scalar value" into a normal "Java String" if someone attached an
| "unknown" class to it.

If classes aren't available in the target environment, or if the class requested can't be found, then we have a slight problem.
A resonable solution is to notify the user via a warning, and then create an auxilary yaml-class-map which maps lists/strings/maps that have been created by the load with their corresponding class name. In this way we keep the native structures, but preserve the class names through a "coloring archive" on the side. | So, the idea is to bite the bullet and remove "class" | as something specified in the "label line" (BTW, we need | to define some terminology here; I'm using "label lines" | and "text lines" - or maybe it should be "content lines"?). | It turns out that we can still achive most of the goals | of the "class" construct by making the key "#" magical in maps: | | center: % | #: point | x: 35.3 | y: 42.1 Interesting. However, this prevents class names for Strings or Lists. Very interesting. What do we do about Strings and Lists? Move this into the category of "non-portable" constructs, like a & and * on the same line? I'm not sure. The "coloring on the side" may be more painful (esp garbage collecting), but it at least does not get in the way. Hmm. | The idea is that the "point" class knows it has members "x" | and "y" and that their values must be floating-point numbers; | so not being to declare their type is acceptable. | | When serializing this to a language/system which doesn't | recognize "point", well, "center" will just be a run-of-the-mill | map. The only magic here is that we may require '#' to always | be printed first, but this depends on the protocol we'll be using | for constructing "point" objects when the class is recognized: | | C1) If the interface is such that there has to be a factory | method accepting a map of all the keys, and returning the | constructed object, than this restriction is unnecessary. This | is a good way to do things in Perl/JavaScript object creation | interfaces; e.g. in Perl the method will typically merely bless | the map and be done. I suspect in Python things would | be a bit less elegant. 
Python is very flexible here...

| C2) In Java this may be less efficient (all these map and
| String objects will have to be created and destroyed per each
| point object creation - slooow!). Any more efficient method will
| have to involve tighter interaction between parsing the values
| and creating the object, which means we have to know its class
| when starting to parse each member (e.g., we may be able to
| parse "42.1" directly into a float). I can't see, off the top of
| my head, a reasonable protocol for this right now, but we may
| want to require '#' to be the first key anyway, in case something
| does come up.
|
| My favorite is C2. Its downside is that you can't directly assign
| a type for a list or a scalar; you have to assign it to a
| surrounding map. I foresee a long discussion coming...

And we have not even begun to talk about namespaces, or how to make a name globally unique so that it can be moved across multiple languages. This is especially important for common objects, like Date, Currency, and possibly even Party or Address.

| > > 14. Clark introduced a very short discussion on
| > > the need for a global mechanism to uniquely
| > > identify names in a non-language specific
| > > manner.
|
| Reverse DNS being the easiest way we've ever come up
| with for this. It works directly for accessing class names
| in Java. In C++ you'll have to "register" classes manually
| anyway, so using reverse DNS doesn't gain or cost anything.
| In Perl/JavaScript/Python we may want to set up some automatic
| way to convert "org.yaml.class" into something the native
| type system recognizes...

Python has a package mechanism similar to Java.

| > > 15. Clark agreed to write up the "single vs multi"
| > > line controversy and post to the list so that
| > > it is clearly understood.
|
| I thought we settled this... Every scalar value is potentially multi-line.
| It doesn't seem to cost us anything, or does it?

Good. I'll just document it in the updated spec...
coming soon. Clark |
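The ID shadowing rule proposed in the reply above (a later &0001 hides an earlier one, so d->c and b->a) can be sketched roughly as follows. This is only an illustration in Python, not code from any YAML implementation: the tuple-based entry format and the name resolve_aliases are assumptions invented for the example.

```python
def resolve_aliases(entries):
    """Resolve '*id' aliases against the most recent '&id' anchor.

    entries is a list of (key, marker, payload) tuples (a made-up format):
      marker '&' -> payload is (node_id, value), i.e. an anchored value
      marker '*' -> payload is node_id, i.e. a reference to an anchor
    """
    anchors = {}    # node_id -> most recently anchored value
    resolved = {}   # key -> final value
    for key, marker, payload in entries:
        if marker == "&":
            node_id, value = payload
            anchors[node_id] = value   # a later anchor shadows an earlier one
            resolved[key] = value
        elif marker == "*":
            if payload not in anchors:
                raise KeyError(f"alias *{payload} used before any &{payload}")
            resolved[key] = anchors[payload]
        else:
            raise ValueError(f"unknown marker {marker!r}")
    return resolved


doc = [
    ("a", "&", ("0001", "This is a value")),
    ("b", "*", "0001"),
    ("c", "&", ("0001", "This is another value")),
    ("d", "*", "0001"),
]
result = resolve_aliases(doc)
```

Because an alias only ever looks backwards to the nearest anchor with the same ID, two texts that both use &0001 can be concatenated without their references colliding, which is the property the shadowing rule is meant to give.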
From: Clark C . E. <cc...@cl...> - 2001-05-18 19:41:11
|
----- Forwarded message from Oren Ben-Kiki <or...@ri...> ----- From: Oren Ben-Kiki <or...@ri...> Subject: RE: Meeting Minutes Date: Fri, 18 May 2001 19:18:14 +0200 > > 1. Brian stated that he would invstigate Oren's Syntax > > and get back with us if it meets Perl's serilization > > requirements for hard references. If not, specify > > what alternatives we can use. > > I don't think it's that important to investigate. It will probably > always be a moot point. I will let Data::Denter use it's current scheme > to deterministically round-trip all Perl data structures. YAML.pm > probably will have no need for this. It's all acadenic and I have no > spare time for academics for three more months. (My guess is, yes it > could be made to work, but would be suboptimal for Perl people) Let's > leave it at that for now. Does that mean we are giving up on Denter using YAML syntax (extended to handle pointer-to-pointer)? > > 2. We agreed on Oren's reference (&*) syntax. > > > > 3. We agreed on having an optiona RFC 822 Header, > > this requires that a YAML text without this > > header must begin on line #2. Furthermore, > > if an RFC 822 Header is present, then it must > > include "X-YAML-Version: 1.0" > > > > 4. Brian said that he'd investiage the RFC a bit > > more specifically relating to the productions. > > (I'm not sure if this is necessary... ) I'm going to go over it with a fine-tooth comb, just to see what is involved in making YAML a superset of it. I guess I'll also have to look at MIME while I'm at it, with the same comb :-) > On 4 & 5. I don't really like the blank line at the beginning thing > because people will mess it up or not understand it. And we have many > heuristic options. > > A) Parse lookahead for X-YAML-Version > B) Option-A rarely needed because as soon as we see a key that is *not* > RFC822 compliant, we assume YAML. 99% of the time this is the first > line! 
> C) If there is no whitespace allowed before the colon in RFC822, we > simply make it a requirement in YAML. Or does this break your RFC > compatability rules? > > Just for my own edification, would you please explain the rationale > behind making YAML RFC822 compliant. And do so with one of more specific > examples. Thanks :) Well, for example, suppose that YAML was a "good enough" superset of RFC822. Then we could just adopt my idea that "blank lines separate top-level maps" and we wouldn't have to say anything further about RFC822 headers, period. If one wants to read/write a mail message as a YAML document, then it will simply work (as long as he sticks to the "safe" constructs there). If one wants to have a YAML document that has nothing to do with RFC822, that also works. No need for any special statement about them. I like this approach best. > > 5. We talked at length about multi-line scalar text > > nodes. Thus far, we agreed on option D, due to > > assumed compatibility with RFC 822. Clark agreed > > to verify this compatibility. > > > > 6. Brian stated that the quoting mechanism was not > > good enough for source code or ASCII art, and > > backed option F. However, option F does not > > explicitly preserve trailing whitespace on a > > given line. So Brian suggested using > < > > pairs. Oren suggested using single quotes. > > Clark asked Brian to come up with something > > he likes and propose it. > > Neil and I agree that the normal transport mechanism between Perl and > Python serializer/parsers would definitely *not* be a mailer. And if a > mailer was used, most people wouldn't give a darn about the trailing > whitespace. And if they really did, we could just encode the whole > document anyway. So I now definitely think the best-fit answer is: > > " this is the hash\n key for this example :-) " : #class : I assume the trailing ':' is a typo? 
> |# My Perl Subroutine > | > | sub version { > | if ($_[0] =~ /\n/) { > | return \ "\to sender"; > | } > | } > > Sorry for overloading this example with so many weird things. I'll just > comment on the multiline semantics: > > A) Trailing whitespace is preserved if the transporter preserves it. > B) The content can always be encoded before transport anyway. > C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'. > D) An implicit newline is assumed to be at the end of every line. We have to decide what our position is about them, BTW. Is a newline a "\n" or a "\n\r" - the answer may be different in-memory and in the text file (and thank you, O nameless DOS/CPM programmer, for inflicting this on us :-) > E) Note that the '|' is one column back from the actual indentation > level. This is intententional. And it will work even if the indent width > is set to one character wide. (not mandatory, but I like it.) Under Python indentation rules, there's no problem indenting the "label" line by 4 characters and the text lines by 7, or whatever. What you say about one character indentation, however, implies that the following would be legal: text: |multi-line |text I'm not certain I like it. I think Clark should make the call here - indentation is his baby. > F) I'd like to push for this always starting on the next line if it is a > map value. It has no relation to RFC822. What's the harm in allowing: text: | Spaces and " and \n, oh my! ('"' and '\' meaning themselves in this text). I don't see why we need to make this distinction between multi- and single- line text at all. It is bad enough we provide two different quoting mechanisms... > > This will work the way I intended it 98% of the time. > > > 7. We agreed, after some discussion, that a YAML > > parser must support MIME. We agreed implicity > > that it must support base64 encoding. > > > 8. We didn't discuss this... 
but it should be > > mentioned that to (a) support unicode and > > (b) support RFC 822, our texts must be UTF-8. > > Thus a YAML parser/writer will default to > > UTF-8, although other encoding support is > > optional. Perhaps we should require that a YAML processor must *accept* UTF-16 files. That goes well with the "superset" idea. If one wants to write a YAML document which is also a mail message, he'll just write it in the default UTF-8 encoding (or at least the first MIME part - I still have to read the RFC properly). > > 9. Clark agreed to make a "boostrap" C program > > and upload to source forge. Brian and Neil > > agreed to download and hack at will. > > As I walked to the train station with Neil, he figured out the C > implementation in his head and said he would try to get it done before > bed. > > > 10. Oren mentioned that he was thinking of doing > > a Java or Javascript implementation. I started thinking about it and hit on an issue which Brian may already have thought about - or will have to very soon, if he's covering YAML.pm :-) The problem is we haven't defined the data model (or, viewing it differently, the round-tripping issue). In "dynamic" languages such as Perl, JavaScript, Python (and to some extent, Java), it is natural to map a YAML map to the native hash, a list to a vector/array, and a scalar value to a simple string. That works admirably well, as long as the YAML entity hasn't been annotated with an ID or a class name. If one wants to provide a stable-round tripping utility (e.g., suppose I want to write a YAML pretty printer), where am I to store the ID of a scalar value? The class of a map? For this use case, it seems my best course of action is to wrap the native construct (map/list/scalar) in an object which has an "id", a "class", and a "value". There are several options: A) Use the native constructs when possible, and only use "wrapper" objects when there's a need. 
That makes access pattern unpredictable: do I write map{key} or map{key}.value? B) Always use wrapper objects, and give up on de-serializing YAML into arbitrary native data structures. Big hit on usefulness - if we do this, Brian will just give up on us :-) C) Declare that IDs may be re-written arbitrarily, even by pretty printers. That is, banish them from the data model. That leaves "class" as the only problematic issue. We explicitly decided not to talk about it in the conference call. It seems to me like there's no way around requiring that this data will survive round-trips, but I also don't see how it is possible to de-serialize "scalar value" into a normal "Java String" if someone attached an "unknown" class to it. So, the idea is to bite the bullet and remove "class" as something specified in the "label line" (BTW, we need to define some terminology here; I'm using "label lines" and "text lines" - or maybe it should be "content lines"?). It turns out that we can still achive most of the goals of the "class" construct by making the key "#" magical in maps: center: % #: point x: 35.3 y: 42.1 The idea is that the "point" class knows it has members "x" and "y" and that their values must be floating-point numbers; so not being to declare their type is acceptable. When serializing this to a language/system which doesn't recognize "point", well, "center" will just be a run-of-the-mill map. The only magic here is that we may require '#' to always be printed first, but this depends on the protocol we'll be using for constructing "point" objects when the class is recognized: C1) If the interface is such that there has to be a factory method accepting a map of all the keys, and returning the constructed object, than this restriction is unnecessary. This is a good way to do things in Perl/JavaScript object creation interfaces; e.g. in Perl the method will typically merely bless the map and be done. I suspect in Python things would be a bit less elegant. 
C2) In Java this may be less efficient (all these map and String objects will have to be created and destroyed per each point object creation - slooow!). Any more efficient method will have to involve tighter interaction between parsing the values and creating the object, which means we have to know its class when starting to parse each member (e.g., we may be able to parse "42.1" directly into a float). I can't see, off the top of my head, a reasonable protocol for this right now, but we may want to require '#' to be the first key anyway, in case something does come up. My favorite is C2. It's down side is that you can't directly assign a type for a list or a scalar; you have to assign it to a surrounding map. I forsee a long discussion coming... > > 11. Clark agreed to update the spec with the > > current agreements. > > Send me a note when it's changed. I'll review for you. > > > > > 12. Brian mentioned that he'd show YAML to one of > > his Perl friends. (sorry I didn't catch his name) > > Damian Conway http://www.csse.monash.edu.au/~damian/ His input will be greatly appreciated. > > 13. Clark and Brian discussed the MIME usage. And...? > > 14. Clark introduced a very short discussion on > > the need for a global mechanism to uniquely > > identify names in a non-language specific > > manner. Reverse DNS being the easiest way we've ever come up with for this. It works directly for accessing class names in Java. In C++ you'll have to "register" classes manually anyway, so using reverse DNS doesn't gain or cost anything. In Perl/JavaScript/Python we may want to set up some automatic way to convert "org.yaml.class" into something the native type system recognizes... > > 15. Clark agreed to write up the "single vs multi" > > line controversy and post to the list so that > > it is clearly understood. I thought we settled this... Every scalar value is potentially multi-line. It doesn't seem to cost us anything, or does it? > > 16. 
We made little progress on the scalar indicator > > for lists, to colon or not to colon. It wasn't > > agreed, but Clark thinks this is someone else's > > monkey. If Oren and Brian can't agree within > > 7 days, Clark will put on the dictator cap. > > We traded in the '$' for the ':'. '$' as the last character in a line I thought ':' was the first one; it is "as if" it is a normal header, with the key "just happening" to be empty. This seems more consistent. > meant a multiline scalar was to follow. Converting this semantic to the > ':' leaves us with these represntations: > > key1 : @ > single line > : > classless folded > multi line > another single line > and another > #class &0001 : : #class &0001 > classed multi > line > #class &0002 classed single line > % > key : value > @ This is an empty list, right? > ~ And this is a null? > #classy % > key : value > : even this multi line on the same line > as a colon thingy works because there > a little bit of indentation imposed by > colon. (Although I don't love it) This means the following: : single line Will also work, even though you *really* dislike it. I like them :-) > : "Another thingy like above that meets" > "RFC822 wackiness" > : > | 1 > | 1 1 > | 1 1 1 > |Just for completeness :-) I think we've said everything there's to be said about this, and whether or not you find either: list: : One : Two : Three and Four Or: list: One Two : Three and Four To be beautiful or ugly is, when all is said and done, a matter of taste. To you, the extra ':'s are an eyesore; to me it seems strange that the multi-line value is "more indented"; it seems as though there's structure involved, when there isn't. I also like being able to do /^:/ in VI to get to the next entry. I think we should just let Clark make the call. OK? > > 17. It is nice to have Neil in on the talk! Welcome aboard, Neil. The more the merrier! Have fun, Oren Ben-Kiki ----- End forwarded message ----- |
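The "magic '#' key" idea and the factory protocol (C1) from the forwarded message can be sketched like this. It is a hedged illustration in Python, assuming the text has already been parsed into plain dicts, lists and strings; the FACTORIES registry, the construct function and the Point class are hypothetical names, not part of any agreed design.

```python
class Point:
    """Example class that knows its own members are floating-point."""

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)


# Hypothetical registry: class name -> factory taking a map of the other keys.
FACTORIES = {"point": lambda fields: Point(fields["x"], fields["y"])}


def construct(node, factories=FACTORIES):
    """Build native objects from parsed maps, lists and scalars.

    A map whose '#' key names a registered class is handed to that class's
    factory; any other map is returned as a run-of-the-mill dict.
    """
    if isinstance(node, dict):
        built = {key: construct(value, factories) for key, value in node.items()}
        class_name = built.get("#")
        factory = factories.get(class_name) if isinstance(class_name, str) else None
        if factory is None:
            return built   # unknown or missing class: keep the plain map
        fields = {key: value for key, value in built.items() if key != "#"}
        return factory(fields)
    if isinstance(node, list):
        return [construct(value, factories) for value in node]
    return node            # scalars stay as plain strings


parsed = {"center": {"#": "point", "x": "35.3", "y": "42.1"}}
center = construct(parsed)["center"]
unknown = construct({"#": "circle", "r": "1"})
```

With a factory that receives the whole map, the position of '#' among the keys does not matter, which is the point made under C1. A streaming protocol in the spirit of C2 would need '#' first, and is not shown here.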
From: Brian I. <briani@ActiveState.com> - 2001-05-18 17:03:33
|
"Clark C . Evans" wrote: > > | On 4 & 5. I don't really like the blank line at the beginning thing > | because people will mess it up or not understand it. And we have many > | heuristic options. > | > | A) Parse lookahead for X-YAML-Version > | B) Option-A rarely needed because as soon as we see a key that is *not* > | RFC822 compliant, we assume YAML. 99% of the time this is the first > | line! > | C) If there is no whitespace allowed before the colon in RFC822, we > | simply make it a requirement in YAML. Or does this break your RFC > | compatability rules? > > A&B are good. I don't really care about C, perhaps in the > interest of consistency with both RFC822, but also Python > code, we may not want to require the space before the colon. I see. > > | Just for my own edification, would you please explain the rationale > | behind making YAML RFC822 compliant. And do so with one of more specific > | examples. Thanks :) > > I'm not all that concerned about RFC822 compliance. > > I'm more concerned about consistency since we are going > to allow RFC822 headers. In particular, if someone > sees a few RFC822 lines above and the YAML lines below > the seperating blank line, they will most likely assume > that YAML has the same (or very similar rules). Thus, > those items _common_ in RFC822 should be allowed in YAML. > > There will be a laundry list of RFC822 constructs that > when moved into the YAML section will be illegal. I think I understand better now. Thanks again. > > | Neil and I agree that the normal transport mechanism between Perl and > | Python serializer/parsers would definitely *not* be a mailer. And if a > | mailer was used, most people wouldn't give a darn about the trailing > | whitespace. And if they really did, we could just encode the whole > | document anyway. 
So I now definitely think the best-fit answer is: > | > | " this is the hash\n key for this example :-) " : #class : > | |# My Perl Subroutine > | | > | | sub version { > | | if ($_[0] =~ /\n/) { > | | return \ "\to sender"; > | | } > | | } > > Nice. Is this fairly "optimal" for your purposes? > > | Sorry for overloading this example with so many weird things. > > Not at all. This is good. > > | I'll just comment on the multiline semantics: > | > | A) Trailing whitespace is preserved if the transporter preserves it. > | B) The content can always be encoded before transport anyway. > | C) Nothing is escaped. The content is truly verbatim. A '\' is a '\'. > | D) An implicit newline is assumed to be at the end of every line. > | E) Note that the '|' is one column back from the actual indentation > | level. This is intententional. And it will work even if the indent width > | is set to one character wide. (not mandatory, but I like it.) > | F) I'd like to push for this always starting on the next line if it is a > | map value. It has no relation to RFC822. > | > | This will work the way I intended it 98% of the time. > > One question. How are trailing new lines handled? You may > want to modify "D" so that there is a new line on every | > line, except the last one. Thus to get a trailing new-line, > you'd have to do: I had pretty much given up on that. Since this method isn't foolproof anyway, I'd just have the caveat that *all* lines are assumed to have a trailing newline. > > after : > | this has a > | trailing new line > | > before : > | > | This has a leading new > | line, but not a trailing > | new line. > both: > | > | This has both a leading > | and a trailing new line > | > another : > | this does not have > | a trailing new line, > | nor a leading new line. > > Clear? I think it beats :- as far as readability. Yes it does. And I understand it. But I think it's not obvious enough. Too subtle. It's exactly what I was trying to avoid with the :- thingy. 
People would think that there's just an extra blank line. I'd much prefer something like this: after : |this has a |trailing new line before : | |This has a leading new |line, but not a trailing \new line. both: | |This has both a leading |and a trailing new line another : |this does not have |a trailing new line, \nor a leading new line. Also, you can't put a space after the '|'. It doesn't scale well past these examples. > > | > 9. Clark agreed to make a "boostrap" C program > | > and upload to source forge. Brian and Neil > | > agreed to download and hack at will. > | > | As I walked to the train station with Neil, he figured out the C > | implementation in his head and said he would try to get it done > | before bed. > > Great. I'll focus on the specification today then > rather than laying-in-code. The spec is the most important thing IMO. > > | > 16. We made little progress on the scalar indicator > | > for lists, to colon or not to colon. It wasn't > | > agreed, but Clark thinks this is someone else's > | > monkey. If Oren and Brian can't agree within > | > 7 days, Clark will put on the dictator cap. > | > | We traded in the '$' for the ':'. '$' as the last character in a line > | meant a multiline scalar was to follow. Converting this semantic to the > | ':' leaves us with these represntations: > | > | key1 : @ > | single line > | : > | classless folded > | multi line > | another single line > | and another > | #class &0001 : > | classed multi > | line > | #class &0002 classed single line > | % > | key : value > | @ > | ~ > | #classy % > | key : value > | : even this multi line on the same line > | as a colon thingy works because there > | a little bit of indentation imposed by > | colon. (Although I don't love it) > | : "Another thingy like above that meets" > | "RFC822 wackiness" > | : > | | 1 > | | 1 1 > | | 1 1 1 > | |Just for completeness :-) > > Good deal. 
Your example above, you have two colons:
>
>     " this is the hash\n key for this example :-) " : #class :
>
> Is the second colon a typo, or is it required per this
> proposal?

I'm glad you noticed! It's definitely not a typo. I wouldn't go so far as to say it's mandatory, but I suggest it for the following reasons:

PREMISE: Assuming that the ':' was used to replace the '$':

PREMISE: A #class or &id always used to be followed by %, @, $ or value. $ was optional, but strongly suggested if a multiline started on the next line.

CONCLUSION: In the *absence* of a #class or &id we'd have:

    key : $
        multi
        line

Translated to ':' speak, that's:

    key : :
        multi
        line

Which I suggested should just be collapsed to a single ':'.

I think that covers it.

Cheers, Brian

--
perl -le 'use Inline C=>q{SV*JAxH(char*x){return newSVpvf
("Just Another %s Hacker",x);}};print JAxH+Perl' |
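The '|' and '\' line markers suggested earlier in this message (every '|' line carries an implicit trailing newline, while a '\' line does not) can be sketched as below. This is a rough Python illustration under stated assumptions: the indentation has already been stripped so each line starts with its marker, and the function name join_block_lines is made up for the example.

```python
def join_block_lines(lines):
    """Join verbatim block lines into one string.

    Each line starts with a one-character marker:
      '|'  -> the rest of the line, followed by an implicit newline
      '\\' -> the rest of the line, with no newline added
    Nothing in the content is unescaped: a backslash inside the text
    stays a backslash.
    """
    parts = []
    for line in lines:
        marker, text = line[:1], line[1:]
        if marker == "|":
            parts.append(text + "\n")
        elif marker == "\\":
            parts.append(text)
        else:
            raise ValueError(f"line does not start with '|' or '\\': {line!r}")
    return "".join(parts)


after = join_block_lines(["|this has a", "|trailing new line"])
before = join_block_lines(
    ["|", "|This has a leading new", "|line, but not a trailing", "\\new line."]
)
```

Here `after` ends with a newline and `before` starts with one but does not end with one, matching the "after" and "before" examples in the message.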
From: Clark C . E. <cc...@cl...> - 2001-05-18 16:29:48
|
+-------------------------------------------------------------------------+ | Welcome to YAML (tm) -- WORKING DRAFT 0.19a | +-------------------------------------------------------------------------+ | YAML (tm) is a straight-forward data serilization language, offering an | | alternative to XML where markup (named lists and mixed content) are not | | needed. YAML borrows ideas from rfc822, SAX, C, HTML, Perl, and Python. | | | | * YAML texts are brief and readable. | | * YAML uses your language's native data structures. | | * YAML has a simple stream based interface. | | * YAML has a solid information model. | | * YAML is expressive and extensible. | | * YAML is easy to implement. | | | | YAML is a collaboration between Brian Ingerson (author of Data::Denter | | ), Clark Evans, Oren Ben-Kiki, Sjoerd Visscher, and other members of | | the SML-DEV mailing list. YAML explicitly targets the object | | serilization needs of the Python and Perl communities. Implementations | | will be on their way within the next two weeks. | +-------------------------------------------------------------------------+ | News | +-------------------------------------------------------------------------+ | * 17-MAY-2001: YAML now has a mailing list at SourceForge. | | * 18-MAY-2001: YAML has had it's first meeting. The minutes have | | been sent out to the mailing list, and should be appearing in the | | archives soon. | | | +-------------------------------------------------------------------------+ | Key Concepts | +-------------------------------------------------------------------------+ | YAML is founded on several key concepts from very successful languages. | | | | * YAML uses similar type structure as Perl. In YAML, there there | | are three fundamental structures: scalars ($), maps (%), and lists | | (@). YAML also supports references to enable the serilization of | | graphs. 
Furthermore, each data value can be associated with a class | | name to allow the use of specific data types. | | * YAML uses block scoping similar to Python. In YAML, the extent of | | a node is indicated by its child's nesting level, i.e., what column | | it is in. Block indenting provides for easy inspection of the | | document's structure which helps to identify scope errors. | | * YAML uses similar whitespace handling as HTML. In YAML, sequences | | of spaces, tabs, and carriage return characters are folded into a | | single space during parse. This wonderful technique makes markup | | code readable by enabling indentation and word-wrapping without | | affecting the canonical form of the content. | | * YAML uses similar slash style escape sequences as C. In YAML, \n | | is used to represent a new line, \t is used to represent a tab, and | | \\ is used to represent the slash. In addition, since whitespace is | | folded, YAML uses bash style "\ " to escape additional spaces that | | are part of the content and should not be folded. Lastly, the | | trailing \ is used as a continuation marker, allowing content to be | | broken into multiple lines without introducing unwanted whitespace. | | * YAML allows for a rfc822 compatible header area for comments, | | specific processing instructions, and encoding declarations. This | | provides a flexible and forward looking method to augment the YAML | | parser with other features such as a validator similar to TREX or | | RELAX. Furthermore, this will allow a mail processing system to | | directly use YAML as its input parser. | | * YAML supports binary and formatted text entities with MIME | | multi-part attachments. Each attachment is given an reference | | identifier which can be associated with a location in hierarchical | | YAML content. This allows leaf values which would distrupt the | | in-line structural flow to be handled out of band in a seperate | | block mechanism. 
| | * YAML has a SAX like sequential "C" API. This C library can be | | used to easily construct native-language representations of a YAML | | serilization. The API also show cases a clever substitutability | | technique which allows schema changes to occur at the leaf nodes in | | a backwards compatible manner without breaking older code. This | | brings resiliance to older code, while allowing the structure of | | your data to grow over time. | | | +-------------------------------------------------------------------------+ | Example: Basic | +-------------------------------------------------------------------------+ | Below is an example of an invoice expressed via YAML. Each value's type | | indicated by either percent (map), or an at (list) sign, or an optional | | dollar sign (scalar). The content for each value follows the indicator | | either on the same line for scalars or on subsequent indented lines. | | The content for a map, which is also the starting production, is a list | | of key value paris. Each key and value are seperated by a colon. | | buyer : % | | address : % | | city : Royal Oak | | line one : 458 Wittigen's Way | | line two : Suite #292 | | postal : 48046 | | state : MI | | family name : Dumars | | given name : Chris | | date : 12-JAN-2001 | | comments : | | Mr. Dumars is frequently gone in the morning | | so it is best advised to try things in late | | afternoon. \nIf Joe isn't around, try his house\ | | keeper, Nancy Billsmer @ (734) 338-4338.\n | | delivery : % | | method : UZS Express Overnight | | price : 45.50 | | invoice : 00034843 | | product : @ | | % | | desc : Grade A, Leather Hide Basketball | | id : BL394D | | price : 450.00 | | quantity : 4 | | % | | desc : Super Hoop (tm) | | id : BL4438H | | price : 2,392.00 | | quantity : 1 | | tax : 0.00 | | total : 4237.50 | | | | Since "product" is a list, it only has values and thus is missing the | | key and colon. Also notice that the "comments" scalar is on multiple | | lines. 
Since whitespace is folded, the newline (\n) is escaped | | and the line ending \ is required to keep housekeeper as a single word. | | By default, the serializer will sort map keys, although this isn't a | | requirement of the serialization structure. | +-------------------------------------------------------------------------+ | Example: References and Class Names | +-------------------------------------------------------------------------+ | Below is an example of a YAML document which demonstrates the use of | | references and classes. Immediately after an indicator a class name can | | occur and then within parentheses an optional reference handle. If the | | indicator is a "*", then no further content is allowed, as this | | indicator signifies a reference to another value. The class name may be | | used as a language-specific binding to a particular object or | | type-appropriate class; otherwise it can be considered a comment. The | | production for allowable names and a namespace mechanism have yet to be | | worked out. | | buyer : %person | | comments : | | This is a person object accessible | | through the "buyer" key from the | | top level map. | | family name : Dumars | | given name : Chris | | inline : $(0001) | | This is a folded text entity | | that is associated with a | | reference so that it can be | | re-used later on. | | seller : %person(0002) | | comments: | | This is another person object, only | | that it is given a handle of 0002 as | | well as a class so that it can be | | referred to later. Handles must be | | numeric, and classes cannot start | | with a number. | | family name : Sellers | | given name : Peter | | zzz : | | comments: | | Two of the items in this map are references. | | The first is to the person object "Peter Sellers". | | The second is to the inline text object "This is..." | | The price scalar below is given a class "currency".
        peter : *(0002)
        price : $currency
            23.34
        text : *(0001)

+-------------------------------------------------------------------------+
| Example: Block References and Attachments                               |
+-------------------------------------------------------------------------+
Below is an example of a YAML document which includes the optional
RFC 822-style header, specifically an RFC 2046 multipart header. A YAML
parser must handle these headers to allow for application-specific
processing instructions, and MIME for raw/binary references.

    Date: Sun, 13 May 2001 23:48:04 -0400
    MIME-Version: 1.0
    Content-Type: multipart/related;
        boundary="================================"
    X-YAML-Version: 1.0

    --================================
    Content-Type: text/plain; id="0001"

    XX XXX XXXXX XX XX
    XXX XX X XX X XX
    XX XXXXXX XX X XX
    XX X XX XXXXXXX
    XXXX XXXXX X XX XX

    --================================
    Content-Type: image/gif; id="0002"
    Content-Transfer-Encoding: base64

    DlhGQAAOMAAAICBDaanAJSVAISFP7+/GbOzAJmZAIeHGbMzGbMzGbMzGbMzGbMzGbMzGbM
    CH+Dk1ZGUgd2l0aCBHSU1QACH5BAEKAAYALAAAAAAZAA8AQAR70EgZArlBWHw7Nts1gB6R
    BMlkp4lHJppkNoyW1r5SmcTeV6wUwrFI4VEulSMyRLchhYrYLq4MDKYrm9XuFQuIzLhALA
    +g44FBHybokQGdnivNfhJ8enwFSR12eB4jcWZ3gHeCJQJycXSJEzaIc5SIWz0RADs=

    --================================
    Content-Type: text/x-yaml; id="0003"

    an inline : $(0004)
        This is a folded text entity
        that is associated with a
        reference.
    content :
        comment:
            The cyclic item is a reference
            to the top-level map.
        cyclic : *(0003)
        image : *(0002)
        inline : *(0004)
        raw : *(0001)
        title : This contains multiple references

+-------------------------------------------------------------------------+
| Information Model                                                       |
+-------------------------------------------------------------------------+
The model consists of the map/list/scalar data structures found in
modern programming languages such as ML, Python, Perl, and C. This model
should also be very compatible with relational database tables. Note:
This model lacks classes and references, which are still under
consideration.

Document  The starting production for YAML is a Map.
Map       An unordered sequence of zero or more (Key, Value) tuples,
          where the Key is unique within the sequence and matches the
          Key production.
Value     Exactly one of Scalar, Map, or List.
List      An ordered sequence of zero or more Values.
Scalar    Any type directly serializable through or able to be
          constructed from a sequence of zero or more characters. These
          characters must match the Char production.
-----------------------------------------------------------------------
Default   This is a synthesized attribute of every Value. If the Value
          is a Scalar, then the Default property refers to the Value
          itself. If the Value is a List, then the Default refers to
          the Default property of the first Value in its sequence. If
          the Value is a Map, then Default refers to the Default
          property of the Value in its Pair entry lacking a Key. By
          using Default, a Scalar Value can be substituted with a Map
          or a List Value without breaking older code.

Take careful note that the information model does not admit a "parent"
property of each value. Quite the contrary, YAML may be a graph
structure and is not necessarily a tree.
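The Default attribute lends itself to a small stand-alone function. The
following Python sketch is only an illustration of the rule as worded in
the information model: the function name `yaml_default` is hypothetical,
and representing the "entry lacking a Key" as the empty-string key of a
dict is an assumption that the draft does not settle. It also assumes the
value is not cyclic.

```python
def yaml_default(value):
    """Return the Default scalar of a Scalar, List, or Map (sketch)."""
    if isinstance(value, str):
        # Scalar: the Default is the value itself.
        return value
    if isinstance(value, list):
        # List: the Default of the first Value in the sequence.
        if not value:
            raise ValueError("an empty List has no Default")
        return yaml_default(value[0])
    if isinstance(value, dict):
        # Map: the Default of the entry lacking a Key.
        # ASSUMPTION: that entry is stored under the empty string.
        if "" not in value:
            raise ValueError("this Map has no entry lacking a Key")
        return yaml_default(value[""])
    raise TypeError("expected a str, list, or dict")


# Older code expects "price" to be a Scalar.
old_doc = {"price": "45.50"}
# A newer document grows the leaf into a Map with extra detail.
new_doc = {"price": {"": "45.50", "currency": "USD"}}

# Code that reads through yaml_default() works with both documents.
assert yaml_default(old_doc["price"]) == "45.50"
assert yaml_default(new_doc["price"]) == "45.50"
```

Reading every leaf through such a function is what would let a schema
change at a leaf node go unnoticed by older code.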

+-------------------------------------------------------------------------+
| Mapping To Popular Environments                                         |
+-------------------------------------------------------------------------+
For Python, the top-level object of the internal representation is a
Dictionary, and from there each value, depending upon its indicator, is
either a List, a Dictionary, or a String. It is possible for a schema
mechanism to be included which affords more specific decoding into
classes and types. The default attribute is implemented through a
stand-alone function.

For Perl, the internal representation starts with a top-level hash, and
from there each value, depending upon its indicator, is either a list,
a hash, or a string scalar. Of course, it is also possible for a schema
mechanism to be included which affords more specific decoding. The
default attribute is implemented through a stand-alone function.

Haven't done Java or Javascript since '98, but I remember Strings, Maps
and Lists being Objects. So there shouldn't be any problem in Java.
Javascript is probably in the same boat, but I can't verify since that
book has mysteriously disappeared as well.

ML, C, and C++, all of which lack a built-in, variable-type Map and
List structure, require a specific schema to build an internal
representation. For these languages, a YamlValue type could be created
with sub-types of Scalar, List, and Map. For C++, STL could make the
implementation very quick, especially with iterator support. An
alternative approach would be a class builder... but this, of course,
requires a bit more smarts and a schema system.

Mapping to a relational database will also require some sort of schema
to indicate how to pack/unpack. However, given that a tuple (record) is
easily associated with a Map, and a relation (table) is easily
associated with a List, there should not be that much difficulty.
NULL values will be represented by the lack of a particular map
entry.

+-------------------------------------------------------------------------+
| Serialization Format / BNF                                              |
+-------------------------------------------------------------------------+
This section contains the BNF productions for the YAML syntax. Much to
do...

+-------------------------------------------------------------------------+
| Parser Behavior                                                         |
+-------------------------------------------------------------------------+
This section describes how a parser should parse YAML. Much to do...

+-------------------------------------------------------------------------+
| Emitter Behavior / Canonical Form                                       |
+-------------------------------------------------------------------------+
This section describes how an emitter should write YAML into canonical
form. It includes a specific word-wrapping algorithm: a minimal content
length of 20 characters, doing its best to word-wrap at 76 columns.

+-------------------------------------------------------------------------+
| Implementations                                                         |
+-------------------------------------------------------------------------+
To do... an implementation in C, C++/STL, Python, Java, and ...

+-------------------------------------------------------------------------+
| Credits                                                                 |
+-------------------------------------------------------------------------+
This work is the result of long, thoughtful discussions on the SML-DEV
mailing list. Specific contributors include... (to do)

+-------------------------------------------------------------------------+
| Some thoughts                                                           |
+-------------------------------------------------------------------------+
1. These are very preliminary thoughts on the subject; feedback is very
   welcome.

2. Implementations needed... Clark is happy to write the Python, C, &
   perhaps even a C++ implementation. Any takers?

3. Was thinking hard about using # for a comment indicator, or perhaps
   as a numeric indicator. Benefits? In any case, the BNF should leave
   all of these special characters open to future versions.

+-------------------------------------------------------------------------+
| FAQ                                                                     |
+-------------------------------------------------------------------------+
1. Don't the indicator characters need to be escaped in the content?
   Answer: No.

+-------------------------------------------------------------------------+
| Specific Productions                                                    |
+-------------------------------------------------------------------------+
Char       ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
             | [#x10000-#x10FFFF]
               Any Unicode character, excluding surrogate blocks,
               FFFE, and FFFF. Where Unicode is defined by ISO/IEC
               10646-2000.

Characters ::= Char*
               Zero or more characters.

WhiteChar  ::= #x20 | #x9 | #xD | #xA
               A space, tab, carriage return, or new line, escaped by
               \s, \t, \r, and \n respectively.

Whitespace ::= WhiteChar+
               Any sequence of spaces, tabs, new lines or carriage
               returns.

Indicator  ::= '$' | '%' | '@' | '*'
               The dollar sign indicates a scalar, a percent sign
               indicates a map, an at sign indicates a list, and a star
               represents a reference.

Reserved   ::= WhiteChar | Indicator | [#x21-#x2F] | '/' | [#x3B-#x40]
             | [#x5B-#x5E] | #x60 | [#x7B-#x7F]
               Printable, non-alpha, non-numeric ASCII characters
               excluding the period, colon, underscore, and dash.

Key        ::= (Char - Reserved)*
               One or more non-reserved characters.
+-------------------------------------------------------------------------+
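
As a rough illustration of how the Char and Characters productions could
be checked in code, here is a small Python sketch. The function names
`is_yaml_char` and `matches_characters` are hypothetical and not part of
the draft; only the code point ranges are taken from the productions
above.

```python
def is_yaml_char(ch):
    """True if the single character ch matches the Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, new line, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to, excluding, the surrogates
        or 0xE000 <= cp <= 0xFFFD      # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def matches_characters(text):
    """True if text matches Characters (zero or more Char)."""
    return all(is_yaml_char(ch) for ch in text)


# Meaning of each indicator, as listed in the Indicator production.
INDICATORS = {"$": "scalar", "%": "map", "@": "list", "*": "reference"}

assert matches_characters("line one : 458 Wittigen's Way")
assert not matches_characters("bell\x07")
```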
From: Michael L. <yam...@ya...> - 2001-05-18 16:27:41
Just testing to see if the mailing list works....

=====
Michael Lauzon
Maximum Linux Project, Founder
http://maxlinux.sourceforge.net/
From: Clark C . E. <cc...@cl...> - 2001-05-18 16:01:22
----- Forwarded message from "Clark C . Evans" <cc...@cl...> -----

Date: Fri, 18 May 2001 10:01:25 -0500
From: "Clark C . Evans" <cc...@cl...>
Subject: Re: Meeting Minutes

On Fri, May 18, 2001 at 12:41:20AM -0700, Brian Ingerson wrote:
| > 9. Clark agreed to make a "bootstrap" C program
| >    and upload to source forge. Brian and Neil
| >    agreed to download and hack at will.
|
| As I walked to the train station with Neil, he figured out the C
| implementation in his head and said he would try to get it done
| before bed.

I was wondering if I had been clear about the Iterator and Visitor
interfaces, where the Emitter procedure implements the Visitor
interface and the Parser implements the Iterator. This is a pretty
important aspect of the API, as it allows for both push- and pull-based
streams. If you have any questions about this, let's start a
discussion about the implementation.

Further, if Neil wants to tackle the part of the spec excluding
RFC 822 and MIME, I can work on our implementation of these
specifications. This might be a clean division of work, although
whoever implements first will probably set the memory management
policies. It'd be cool if we could chat briefly about this so we
are on the same page.

Also, I think I mentioned this, but RFC 822 limits us to UTF-8 for our
Unicode version. I suppose we could do UTF-16 for those files without
the RFC 822 header, but this should be optional. If Neil would like,
I can also handle the encoding mess.

Neil, thank you for offering to help!

| Damian Conway http://www.csse.monash.edu.au/~damian/

Danke.

Best,

Clark

P.S. Let us all sign up on the SourceForge mailing list and
     start communicating this way.

----- End forwarded message -----
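
To make the push/pull distinction in the forwarded message concrete, here
is a minimal Python sketch. It is not the planned C API: the event names,
`parse_events`, `PrintingVisitor`, and `pump` are all hypothetical, and
the "parser" merely walks an in-memory dict/list/str structure as a
stand-in for reading YAML text. The iterator is the pull side (the caller
asks for the next event); the visitor is the push side (events are handed
to it one call at a time).

```python
def parse_events(data):
    """Pull side: yield (kind, value) events for a nested structure."""
    if isinstance(data, dict):
        yield ("map_begin", None)
        for key in sorted(data):
            yield ("key", key)
            yield from parse_events(data[key])
        yield ("map_end", None)
    elif isinstance(data, list):
        yield ("list_begin", None)
        for item in data:
            yield from parse_events(item)
        yield ("list_end", None)
    else:
        yield ("scalar", str(data))


class PrintingVisitor:
    """Push side: collects one indented line per event it is given."""

    def __init__(self):
        self.lines = []
        self.depth = 0

    def visit(self, kind, value):
        if kind.endswith("_end"):
            self.depth -= 1
        text = kind if value is None else kind + " " + value
        self.lines.append("    " * self.depth + text)
        if kind.endswith("_begin"):
            self.depth += 1


def pump(iterator, visitor):
    """Connect a pull-based source to a push-based sink."""
    for kind, value in iterator:
        visitor.visit(kind, value)


visitor = PrintingVisitor()
pump(parse_events({"product": ["ball", "hoop"]}), visitor)
print("\n".join(visitor.lines))
```

Because the two sides only share the event vocabulary, a consumer can
either pull events directly from the iterator or implement the visitor
and have events pushed to it.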