From: Oren Ben-K. <or...@ri...> - 2002-05-01 13:41:20
|
Brian Ingerson [mailto:in...@tt...] wrote:
> ysh > ---
> yaml> empty text:
> yaml> empty map: !map
> yaml> empty seq: !seq
> yaml> : empty key
> yaml> empty in in-line map: { : }
> yaml> empty in in-line seq: [ , ]
> yaml> ...
> $VAR1 = {
>     '' => 'empty key',
>     'empty in in-line seq' => [
>         ''
>     ],
>     'empty seq' => [],
>     'empty text' => '',
>     'empty in in-line map' => {
>         '' => ''
>     },
>     'empty map' => {}
> };
>
> Oren, I don't see the ambiguity. The *final* comma (when
> followed by no content) is ignored. That's the way I've
> always understood it. (obviously)

It works like this.

    empty map: []
    map with one entry: [ "" ]
    map with two entries: [ "", "" ]

It is valid to specify an empty string in the simple style, so I can write:

    another map with two entries: [ , "" ]

So, one would expect that it would be possible to use the simple empty string
for the second entry:

    map with ??? entries: [ "", ]

Only this causes the second entry to disappear. Or does it? The productions
are ambiguous - and so is the intent.

We could rule that it is invalid to specify a simple empty string as the last
value in a list, unless it has some properties...

    two entries: [ , !something ]
    one entry: [ , ]

Ugh. I don't like the looks of this much. Hence I didn't consider it as an
alternative...

> > I vote for the second, because it already works... and all the good
> > stuff you mentioned like readability.
>
> Now honestly, I wouldn't mind getting rid of this property. The extra
> comma (as Oren pointed out) is only interesting in multi line
> contexts, which YAML doesn't have.

OK.

> I vote for the first, because it already works... and all the good
> stuff you mentioned like simplicity.

So, we have one "tend towards explicit" and one "tend towards simple". Which
moves the ball to Clark's court... Clark?

Have fun,

    Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2002-05-02 06:32:58
|
Brian Ingerson [mailto:in...@tt...] wrote:
> > So, we have one "tend towards explicit" and one "tend
> > towards simple". Which
> > moves the ball to Clark's court... Clark?
>
> Whoa. I feel like you are trying to sneak something in here, Oren.

Not my intent...

> There are two orthogonal issues here:
>
> 1) Is a trailing comma allowed in inline series?
>
> 2) Can an empty string be specified without quotes? (In *any*
> context)
>
> I'd like to discuss them separately.

However, the problem arises in the interaction between the two.

> > How about...
> >
> >     1. We allow trailing comma within an inline container.
> >     2. We allow blank "values" without quotes
> >     4. The last 'blank' within an inline sequence is
> >        stripped, aka [,,,,] == ['','','',''] (n-1 commas).

My problem is with (4). I don't like it... I think it is a confusing special
case.

> >     Since only one item in a key can be a blank anyway,
> >     the quotes can help provide symmetry. Yea?

As for (3):

> >     3. We disallow blank "keys" without quotes
>
> This is how I have things coded now, except for number 3. I
> see number 3 as a pretty arbitrary restriction. I wouldn't
> emit a key that way, but parsing it is no problem.

I agree. The fewer special cases, the better. Either we allow simple empty
strings ("ses") everywhere, or nowhere...

We have three options: Current (the above minus #3 is the current spec),
Simple (no trailing comma) and Explicit (no ses). I prefer Simple/Explicit
to Current, with a slight tendency towards Explicit. It seems both of you
prefer Current, with Simple being second and Explicit being third. Can we
settle on "Simple", then?

Have fun,

    Oren Ben-Kiki
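[Editorial illustration, not part of the original message.] The difference between the three options can be made concrete with a toy parser. The sketch below is a rough Python illustration only: `parse_inline_seq` and the policy names are hypothetical, it splits naively on commas (so quoted commas are not handled), and it assumes that Explicit still tolerates one trailing comma, which the thread does not actually settle.

```python
def _unquote(token):
    """Strip one pair of matching quotes, if present."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]
    return token


def parse_inline_seq(text, policy):
    """Parse a toy inline sequence such as '[ "", ]' under one policy.

    policy is "current", "simple" or "explicit" (names taken from the
    thread; the exact rules here are an approximation).
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not an inline sequence")
    inner = text[1:-1]
    if inner.strip() == "":
        return []  # '[]' is the empty sequence under every policy
    items = [part.strip() for part in inner.split(",")]
    if policy == "current":
        # Rule (4): a blank after the final comma is stripped.
        if items[-1] == "":
            items.pop()
    elif policy == "simple":
        # No trailing comma: every blank is a simple empty string.
        pass
    elif policy == "explicit":
        # Assumption: one trailing comma is tolerated, other blanks are errors.
        if items[-1] == "":
            items.pop()
        if "" in items:
            raise ValueError("empty strings must be quoted")
    else:
        raise ValueError("unknown policy: " + policy)
    return [_unquote(item) for item in items]


print(parse_inline_seq('[ "", ]', "current"))  # one entry
print(parse_inline_seq('[ "", ]', "simple"))   # two entries
```

The same input yields one entry under Current and two under Simple, which is the ambiguity being argued about.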
From: Neil W. <neilw@ActiveState.com> - 2002-05-02 09:08:16
|
Oren Ben-Kiki [02/05/02 02:34 -0400]:
> > >     1. We allow trailing comma within an inline container.
> > >     2. We allow blank "values" without quotes
> > >     4. The last 'blank' within an inline sequence is
> > >        stripped, aka [,,,,] == ['','','',''] (n-1 commas).
>
> My problem is with (4). I don't like it... I think it is a confusing special
> case.

+1

> We have three options: Current (the above minus #3 is the current spec),
> Simple (no trailing comma) and Explicit (no ses). I prefer Simple/Explicit
> to Current, with a slight tendency towards Explicit. It seems both of you
> prefer Current, with Simple being second and Explicit being third. Can we
> settle on "Simple", then?

Just to reiterate, I prefer Explicit. That's my vote.

This is ugly AND confusing:

    --- #YAML:1.0
    --- #YAML:1.0
    :
    --- #YAML:1.0
    : [,,,]

This is not:

    --- #YAML:1.0 ''
    --- #YAML:1.0
    '': ''
    --- #YAML:1.0
    '': ['','']

Later,
Neil
From: Brian I. <in...@tt...> - 2002-05-02 13:23:02
|
On 02/05/02 02:08 -0700, Neil Watkiss wrote:
> Just to reiterate, I prefer Explicit. That's my vote.
>
> This is ugly AND confusing:
>
>     --- #YAML:1.0
>     --- #YAML:1.0
>     :
>     --- #YAML:1.0
>     : [,,,]
>
> This is not:
>
>     --- #YAML:1.0 ''
>     --- #YAML:1.0
>     '': ''
>     --- #YAML:1.0
>     '': ['','']

Neil has a pretty compelling argument here. Can anyone think of a situation
where this would be a burden? I was thinking:

    ---
    - !map ''
    - !seq ''
    ...

We'd need to be explicit here. But on the other hand, I think we can safely
drop these transfers altogether for empty map and seq. We should use the
following in all cases:

    ---
    - {}
    - []
    ...

Then I can have a much nicer object syntax for Perl:

    ---
    - !perl/Foo::Bar []
    # instead of
    - !perl/Foo::Bar:array
    ...

But back to the matter at hand. Oren, how would EXPLICIT affect the
productions?

Cheers, Brian
From: Clark C . E. <cc...@cl...> - 2002-05-02 17:34:46
|
| >     '': ['','']
|
| Neil has a pretty compelling argument here.

I agree, only that I still would prefer not to have to use ''
for the most common use case...

    name:
      first :
      last  :
      middle:

I like the above "blank" form. If we required quotes then it may
make a user think that quotes are needed for each value, which
isn't quite true. But alas, I'm not sure if this is a big enough
use case, and it only applies to non-inline maps.

|     ---
|     - !perl/Foo::Bar []
|     # instead of
|     - !perl/Foo::Bar:array
|     ...

Ouch. According to the "tree model", this would be passed on as a
"branch" (not as a map or seq), thus making "!perl/Foo::Bar {}"
indistinguishable from "!perl/Foo::Bar []". However, Neil's API makes
this distinction and it is "tree model" since comments and styles are
not reported, so perhaps the tree model should be fixed to include
this distinction. I'm not sure I like this... In any case, either
Neil's API needs to be changed, or we need to fix the information model.

The problem with letting the syntax level distinction of "keyed" vs
"series" into the tree model is that it now enables the application
to "change" the type family from the tree model to the graph model.
In other words, Brian's loader would be using information not in the
family to determine which object is created: "!perl/Foo::Bar" is not
enough information, and the additional knowledge of "keyed" vs "series"
is also required.

I guess we could go all the way, and move the "keyed" vs "series"
distinction all the way into the graph model... I've fought against
this since I believe that a transformation system should be happily
ignorant of the differences since a series can be treated as keyed,
indexed by positive integer. However, perhaps I'm being naive here.

Thoughts?

Best, Clark
From: Clark C . E. <cc...@cl...> - 2002-05-02 23:45:06
|
|     ---
|     - !perl/Foo::Bar []
|     # instead of
|     - !perl/Foo::Bar:array

Ok. The root of the problem with this example is that the family
"perl/Foo::Bar" does not uniquely tell me what kind of native object
is to be used (an array). To make this determination, another piece
of information is required, the kind of node (keyed or series). This
is a divergence from my current understanding as to how YAML works
(aka it puts the distinction in the graph model).

Here is a proposal...

  - Update the graph model to reflect the distinction between
    keyed/series collections.

  - Update the description of "family" so that each family implies
    exactly one kind of node (scalar, keyed, series) and add the
    restriction so that a given family cannot be used on a node
    of the wrong kind. Thus, spell out that in normal circumstances
    (excepting implementation constraints such as out-of-bounds),
    the family, and _only_ the family may be used to determine what
    sort of in-memory representation is used for a given node.

This is the most restrictive solution; it then allows people to
distinguish between keyed/series in the graph model, but then forbids
a given family from use as both keyed and series. This partitioning
is already implied such that a format used for a scalar is not used
for keyed or series. For example "!int" is already limited to only
the scalar, and "!map" may not be used on a scalar.

As such, it would make the following illegal since !seq is a "series"
family and not a "keyed" family. Thus, sparse arrays would have to be
a different family (a good idea anyway).

    --- !seq
    0: one
    1: two

This proposal would also make Brian's short-cut above unworkable since
"perl/Foo::Bar" would have to be used only as a keyed, series, xor
scalar. So, if it were series (array) then the same family could not
be used to mark a keyed (hashtable). In effect, the [] becomes syntax
sugar and would not affect the outcome.

This is the most restrictive stance; and since there seems to be
confusion about our previous ruleset, perhaps this will be the best
solution for maximal interoperability?

Best, Clark
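[Editorial illustration, not part of the original message.] The "one family, one kind" restriction proposed above could be enforced by a small registry. This is a minimal Python sketch under stated assumptions: `family_kind`, `declare_family` and `check_node` are hypothetical names, and only a handful of families are pre-registered for the example.

```python
KINDS = ("scalar", "keyed", "series")

# Hypothetical registry: each family is bound to exactly one node kind.
family_kind = {"int": "scalar", "map": "keyed", "seq": "series"}


def declare_family(family, kind):
    """Bind a family to a kind; rebinding to a different kind is an error."""
    if kind not in KINDS:
        raise ValueError("unknown kind: " + kind)
    known = family_kind.setdefault(family, kind)
    if known != kind:
        raise ValueError("%s is already a %s family" % (family, known))


def check_node(family, kind):
    """Reject a node whose kind does not match its family's declared kind."""
    expected = family_kind.get(family)
    if expected is None:
        raise KeyError("undeclared family: " + family)
    if expected != kind:
        raise ValueError("!%s cannot be used on a %s node" % (family, kind))
    return expected


declare_family("perl/Foo::Bar", "series")
check_node("perl/Foo::Bar", "series")   # fine: "!perl/Foo::Bar []"
# check_node("seq", "keyed") would raise: "--- !seq" on "0: one" is illegal
```

Under this scheme the `[]` in `!perl/Foo::Bar []` carries no extra information, which is the "syntax sugar" point made in the message.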
From: Neil W. <neilw@ActiveState.com> - 2002-05-03 01:31:00
|
Brian Ingerson [02/05/02 06:22 -0700]:
> Then I can have a much nicer object syntax for Perl:
>
>     ---
>     - !perl/Foo::Bar []
>     # instead of
>     - !perl/Foo::Bar:array

I see a flaw with your syntax -- and it's much simpler than the other
objections floating around (sorry, Clark :)

Consider a blessed sparse array versus a blessed hash:

    --- #YAML:1.0 !perl/Foo::Bar
    foo: bar
    baz: com
    --- #YAML:1.0 !perl/Sparse::Array
    0: one
    2: three

I _think_ you'll have to keep the ":array" and ":hash" in the transfer to
tell those apart.

Later,
Neil
From: Brian I. <in...@tt...> - 2002-05-03 04:19:55
|
On 02/05/02 18:30 -0700, Neil Watkiss wrote:
> Brian Ingerson [02/05/02 06:22 -0700]:
> > Then I can have a much nicer object syntax for Perl:
> >
> >     ---
> >     - !perl/Foo::Bar []
> >     # instead of
> >     - !perl/Foo::Bar:array
>
> I see a flaw with your syntax -- and it's much simpler than the other
> objections floating around (sorry, Clark :)
>
> Consider a blessed sparse array versus a blessed hash:
>
>     --- #YAML:1.0 !perl/Foo::Bar
>     foo: bar
>     baz: com
>     --- #YAML:1.0 !perl/Sparse::Array
>     0: one
>     2: three
>
> I _think_ you'll have to keep the ":array" and ":hash" in the transfer to
> tell those apart.

I don't think either of you (Clark and Neil) understand my intent. I'll try
to restate with examples:

The _Parser_ produces for each node:
  - A structure (map, seq, or scalar)
  - A transfer URI
  - A format (but let's ignore that for this argument)

It is up to the _Loader_ to insert (a transformation of) the structure into
the graph in the best way it sees fit. The application using the Loader
might even have a say in how the structure gets loaded.

Let's consider Neil's examples:

    --- #YAML:1.0 !perl/Foo::Bar
    foo: bar
    baz: com

This would be loaded by YAML.pm in one of two ways:

1) If the user (or Foo::Bar module author) has registered a transfer
   callback, then the Loader will call it passing the map and the transfer
   URI. The user routine will transform it into *anything* it wants,
   including:
     - a hash
     - an array
     - a scalar
     - an opaque object created through a constructor
     - a C struct
     - a tribal cave painting on a wall

2) Otherwise I'll punt to something like:
     - bless {foo=>'bar',baz=>'com'}, 'Foo::Bar';
   or possibly (not probable in this case):
     - bless {foo=>'bar',baz=>'com'}, '!http://perl.yaml.org/Foo::Bar';
   The latter will be useful for loading non-Perl transfers in Perl.

    --- #YAML:1.0 !perl/Sparse::Array
    0: one
    2: three

1) If the owner of Sparse::Array cares, she can register a transfer to
   ensure this becomes a sparse array.

2) Otherwise YAML.pm will load this as:
     - bless {0=>'one',2=>'three'}, 'Sparse::Array';
   Or I can get heuristical on your ass and just autodetect it:
     - bless ['one',undef,'three'], 'Sparse::Array';

Back to Clark's concerns:

    --- #YAML:1.0 !seq
    0: one
    2: three

The _Parser_ has no idea that this is a sparse array. It's just a map with
an explicit transfer. The YAML Loader takes it from there. It is a
_convention_ that Loaders try to make this into an array. But a user should
be able to override this and do whatever tickles her mittens.

In general we need to relax on the graph model a bit. C doesn't even have a
graph model for goodness sake. Different languages will have slightly
different internal graph structures. Python doesn't have a bless. That
doesn't mean I won't exploit bless to the max in Perl. Python will have to
figure out its own way to roundtrip my Perl typeglobs!

Let's just say that Loaders and Dumpers should be able to round trip
internal data structures as well as possible, no matter which language they
originated in, or are destined for. All things should be preserved within a
language, and most common things should work between languages. For
instance, all normal hashes should work as dictionaries in Python.

If Perl passes Python a blessed hash of class 'Foo::Bar', the Python code
may or may not be able to use it, but it *must* be able to return everything
intact back to Perl.

This is not anything new to me. It's how I've always expected YAML to work.

Does this help?

Cheers, Brian
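[Editorial illustration, not part of the original message.] The Parser/Loader split described above (registered transfer callback first, generic fallback otherwise) might look roughly like the following in Python. This is a sketch, not YAML.pm's or PyYAML's actual API: `Node`, `Blessed`, `register_transfer` and `load_node` are all hypothetical names, and `Blessed` merely stands in for Perl's `bless`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Node:
    """What the parser hands over: structure kind, transfer URI, raw value."""
    kind: str       # "map", "seq" or "scalar"
    transfer: str   # e.g. "perl/Sparse::Array"
    value: Any


@dataclass
class Blessed:
    """Generic fallback: raw data tagged with its class name."""
    cls: str
    data: Any


_transfers: Dict[str, Callable[[Node], Any]] = {}


def register_transfer(uri, callback):
    """Let the application decide how nodes with this transfer are loaded."""
    _transfers[uri] = callback


def load_node(node):
    """Use a registered callback if there is one, otherwise punt."""
    callback = _transfers.get(node.transfer)
    if callback is not None:
        return callback(node)
    return Blessed(node.transfer, node.value)


def sparse_to_list(node):
    """Example callback: turn an integer-keyed map into a list with gaps."""
    size = max(node.value) + 1 if node.value else 0
    return [node.value.get(i) for i in range(size)]


register_transfer("perl/Sparse::Array", sparse_to_list)

print(load_node(Node("map", "perl/Foo::Bar", {"foo": "bar", "baz": "com"})))
print(load_node(Node("map", "perl/Sparse::Array", {0: "one", 2: "three"})))
```

The point of the design is that the parser never interprets the transfer; only the loader, or a callback supplied by the class owner, decides what native object results.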
From: Clark C . E. <cc...@cl...> - 2002-05-03 04:49:07
|
On Thu, May 02, 2002 at 09:19:43PM -0700, Brian Ingerson wrote:
| In general we need to relax on the graph model a bit. C doesn't even
| have a graph model for goodness sake. Different languages will have
| slightly different internal graph structures.

Ok. Let's look at it this way, the graph model provides a general
guideline as to what is expected to reasonably roundtrip through
a foreign system. Framed in this way, the question is:

    Should the distinction between keyed and series kinds round-trip?

For a tangible example, in your average system, must the following
structure remain a "keyed" structure given a round-trip?

    --- !seq
    0: one
    1: two

Or can a compliant system change the "keyed" structure to a "series"?

    --- !seq
    - one
    - two

| If Perl passes Python a blessed hash of class 'Foo::Bar', the Python
| code may or may not be able to use it, but it *must* be able to return
| everything intact back to Perl.

Ok. Assume that your perl module wrote the following...

    --- !perl/Foo::Bar
    0: one
    1: two

Would it be acceptable if a python program "round-tripped" this
structure and wrote it as...

    --- !perl/Foo::Bar
    - one
    - two

Sorry if this seems irritating, but I'm a consistency freak when it
comes to the underlying model (I'm very flexible on the actual syntax,
however).

Best, Clark
From: Brian I. <in...@tt...> - 2002-05-03 05:31:08
|
On 03/05/02 00:54 -0400, Clark C . Evans wrote:
> On Thu, May 02, 2002 at 09:19:43PM -0700, Brian Ingerson wrote:
> | In general we need to relax on the graph model a bit. C doesn't even
> | have a graph model for goodness sake. Different languages will have
> | slightly different internal graph structures.
>
> Ok. Let's look at it this way, the graph model provides a general
> guideline as to what is expected to reasonably roundtrip through
> a foreign system. Framed in this way, the question is:
>
>     Should the distinction between keyed and series kinds round-trip?

I think that a self-respecting Loader should be able to Load a (darn near)
identical graph to any graph it Dumps. (I say darn near, because some
internal details of a scripting language structure are too low-level for
YAML to be concerned with).

*But*, if down the road, someone else writes a different Perl loader than
YAML.pm, it may dump some structures differently. It may use a different URI
scheme for Perl structures, for instance. This might break compatibility
between our processes on this particular realm, but we would still roundtrip
each other's data. I really don't see more than one Dumper/Loader per
language happening anyway.

Now back to the first point. What makes Dumping/Loading harder is that you
need to defer some operations to the user application. Say I have an opaque
(C-struct) Perl object whose values are only available through accessors and
whose creation depends on a constructor. Say the object class is 'Xyz'. It
is up to Xyz to register both a Dumping and Loading transfer callback
routine to YAML.pm. The dumping transfer chooses which attributes to
serialize and puts them into a hash; it returns the hash along with a
transfer URI of its choosing to YAML.pm. YAML.pm then emits that info.

...

On load the map and the URI get passed back to Xyz's Loading-transfer
routine. The routine calls the constructor with the proper info, sets any
other attributes it needs to and returns the resulting object back to
YAML.pm. The object is then added to the graph.

We have no control over what Xyz does internally. The loader just passes on
the info.

> For a tangible example, in your average system, must the following
> structure remain a "keyed" structure given a round-trip?
>
>     --- !seq
>     0: one
>     1: two
>
> Or can a compliant system change the "keyed" structure to a "series"?
>
>     --- !seq
>     - one
>     - two

I would say "Yes it can".

See, Clark. This whole thing stems from you being a streaming XML geek, and
me being a Dump/Load serialization freak. I assume that 90% of the time, the
same program that Dumps the YAML will Load it. I'm never going to Dump
something in a format that would cause me to load it into a different graph.
So there really is no problem.

Cheers, Brian
From: Clark C . E. <cc...@cl...> - 2002-05-03 06:18:58
|
| > Should the distinction between keyed and series kinds round-trip?
|
| I think that a self-respecting Loader should be able to Load a (darn near)
| identical graph to any graph it Dumps. (I say darn near, because some
| internal details of a scripting language structure are too low-level for
| YAML to be concerned with).

Ahh. There are two types of round-trip: NYN round trip (native language ->
YAML serialization -> native language), and YNY round trip (YAML
serialization -> native language -> YAML serialization). I'm asking, should
the keyed vs series distinction in a YAML serialization round trip YNY since
this is the type of round tripping that allows for a YAML data structure to
move through different native languages.

| Now back to the first point. What makes Dumping/Loading harder is that
| you need to defer some operations to the user application..

Yep, we are on the same page here.

| We have no control over what Xyz does internally. The loader just
| passes on the info.

Right. But I'm concerned about YNY round-tripping, that is my Python
structures can make it through a load/save cycle with your Perl tool. This
is what leads me to discuss the graph model; as this is the guideline for
YNY round-tripping. Of course, you are right, how each native environment
handles the URI is subject to the constraints of that environment. However,
this doesn't mean that further YNY constraints, which help encourage
interoperability between languages, shouldn't hold.

| > Or can a compliant system change the "keyed" structure to a "series"?
| I would say "Yes it can".

Ok. With this take, this implies that the "keyed" / "series" distinction is
not significant and can be dropped in a YNY round-trip. However, you skipped
my second example, which was the opposite, where I was betting you'd answer:
"No it can't." This example implies that the "keyed" / "series" distinction
is significant and should not be changed during a YNY round trip.

I'm having "kittens" due to this discrepancy. Either the distinction is
important and should always be preserved (given a simple YNY round trip) or
it should not be important and its preservation should never be assumed. I
don't want this to be an ad-hoc convention that does it one way for some
URIs and another way for other URIs. This would make writing generic tools
very difficult.

| See, Clark. This whole thing stems from you being a streaming XML geek,
| and me being a Dump/Load serialization freak. I assume that 90% of the
| time, the same program that Dumps the YAML will Load it. I'm never going
| to Dump something in a format that would cause me to load it into a
| different graph. So there really is no problem.

Yes, but our partnership here has been successful since we've worked hard to
balance our concerns. Right now I have a concern based on your usage. In
some cases you are not preserving the "keyed"/"series" distinction and in
other cases you imply that preservation of this distinction is important. We
should do it one way or the other.

Three options:

  1. We have it such that "keyed"/"series" distinction should
     always be preserved in a YNY round-trip. In this case,
     our trusty !seq {0:one, 1:two} example must be set aside
     and we should instead introduce !sparse {0:one, 1:two} for
     the use case of a sparse array.

  2. We have it such that "keyed"/"series" distinction need not
     be preserved in a YNY round-trip. In this case, your example
     of !perl/Foo::Bar {} is ambiguous and you must make your
     type family more explicit.

  3. We be very restrictive. In this case, we state not only
     that "keyed"/"series" distinction must be preserved in a YNY
     round-trip, but also that a given type family can only be used
     for keyed, series, or scalar kinds and not for any other kind.
     For this case, !perl/Foo::Bar would need a more explicit
     family to distinguish between "hashtable" and "array". Further,
     in this case, we would also need a !sparse type.

I like option #3 the best. It is the least flexible, but I think that this
will maximize the chances for round-tripping. I guess #1 is better than #2,
I never really liked a !seq making a sparse array anyway.

Wonkers, Clark

--
Clark C. Evans                  Axista, Inc.
http://www.axista.com           800.926.5525
XCOLLA Collaborative Project Management Software
From: Neil W. <neilw@ActiveState.com> - 2002-05-03 08:31:41
|
Hi guys,

You know what? I need more examples. Brian's last-but-one email convinced me
he was right again. We have Python and Perl YAML bindings. Would it be
possible to come up with an example showing how this affects YAML
interoperability? I haven't seen an example of exactly _why_ the "YNY"
matters all that much.

Let's label some YAML data "Y", and say that Y[i] represents Y at some point
through the following system:

          0            1              2            3
             --------      ----------      --------
    input -> | Perl | ->   | Python | ->   | Perl | -> output
             --------      ----------      --------

At the beginning, Y[0], is the original YAML stream. In my mind, the
following statements MUST be true if all components in this system are YAML
compliant:

  *1) Y[i] !=? Y[j] for i & j such that i != j
 **2) When loaded by a particular loader, Y[i] and Y[j] are "equal" for
      all i and j

  * : !=? means "not necessarily equal to".
  **: By equal, I mean that the data structures must be identical. In other
      words, by being loaded and dumped, I can't have an integer turn into a
      string or a floating point; arrays must stay arrays; "sparse arrays"
      (whatever those may be) must remain sparse arrays.

The obvious thing to notice here is that YAML does not _require_ the Python
implementation to load YAML in the same way as the Perl implementation, as
long as the result of loading Y[i] is the same, no matter what stage it was
grabbed from.

Examples:

Y[0]:
    --- !seq
    0: 1
    2: 3

Y[1]:
    --- [1,~,3]

Y[2]:
    ---
    - 1
    - ~
    - 3

Y[3]:
    --- #YAML:1.0 !seq
    - !int|dec "1"
    - !null|tilde "~"
    - !int|hex "0x03"

See what I mean? These all represent the exact same structure in memory (for
the same loader, that is -- PyYAML differs from Perl of course). But each
stream is totally different at the Y level.

Later,
Neil

Clark C . Evans [03/05/02 02:24 -0400]:
> Three options:
>
>   1. We have it such that "keyed"/"series" distinction should
>      always be preserved in a YNY round-trip. In this case,
>      our trusty !seq {0:one, 1:two} example must be set aside
>      and we should instead introduce !sparse {0:one, 1:two} for
>      the use case of a sparse array.
>
>   2. We have it such that "keyed"/"series" distinction need not
>      be preserved in a YNY round-trip. In this case, your example
>      of !perl/Foo::Bar {} is ambiguous and you must make your
>      type family more explicit.
>
>   3. We be very restrictive. In this case, we state not only
>      that "keyed"/"series" distinction must be preserved in a YNY
>      round-trip, but also that a given type family can only be used
>      for keyed, series, or scalar kinds and not for any other kind.
>      For this case, !perl/Foo::Bar would need a more explicit
>      family to distinguish between "hashtable" and "array". Further,
>      in this case, we would also need a !sparse type.
>
> I like option #3 the best. It is the least flexible, but I think
> that this will maximize the chances for round-tripping. I guess
> #1 is better than #2, I never really liked a !seq making a sparse
> array anyway.
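[Editorial illustration, not part of the original message.] Neil's two constraints (streams may differ, loaded structures must not) can be checked mechanically. The Python sketch below is only an approximation: it skips real parsing and represents each Y[i] as a hypothetical `(family, kind, value)` tuple with implicit types already resolved, and its `load` follows the then-current convention that a keyed node tagged `!seq` is a sparse array.

```python
def load(node):
    """Toy loader: turn a (family, kind, value) node into a native object."""
    family, kind, value = node
    if kind == "scalar":
        if family == "int":
            return int(value, 0)  # base 0 accepts both "1" and "0x03"
        if family == "null":
            return None
        return value
    if kind == "series":
        return [load(item) for item in value]
    # Keyed node: by the old convention, "!seq" on a map is a sparse array.
    if family == "seq":
        size = max(value) + 1 if value else 0
        return [load(value[i]) if i in value else None for i in range(size)]
    return {key: load(item) for key, item in value.items()}


def int_node(text):
    return ("int", "scalar", text)


NULL = ("null", "scalar", "~")

# Y[0]: "--- !seq" over "0: 1" and "2: 3" (a keyed node)
y0 = ("seq", "keyed", {0: int_node("1"), 2: int_node("3")})
# Y[1] and Y[2]: "[1,~,3]" in inline and block style (a series node)
y1 = ("seq", "series", [int_node("1"), NULL, int_node("3")])
# Y[3]: fully explicit, with the last integer written in hex
y3 = ("seq", "series", [int_node("1"), NULL, int_node("0x03")])

assert y0 != y1 and y1 != y3                 # constraint 1: streams may differ
assert load(y0) == load(y1) == load(y3)      # constraint 2: same structure
print(load(y0))
```

Note that the equality of Y[0] with the others depends entirely on the `!seq`-on-a-map convention inside `load`, which is exactly the convention under dispute in this thread.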
From: Clark C . E. <cc...@cl...> - 2002-05-03 15:55:25
|
|            --------      ----------      --------
|   input -> | Perl | ->   | Python | ->   | Perl | -> output
|            --------      ----------      --------

    --- !perl/Foo::Bar []

If the Python loader follows the current "graph" model (which does not
distinguish between keyed and series), it is only required to preserve
"perl/Foo::Bar", and since this node is 'blank' it may write out the node as
follows:

    --- !perl/Foo::Bar ''

This would then violate your constraint #2 "When loaded by a particular
loader, Y[i] and Y[j] are "equal" for all i and j."

I'm not saying that the current "graph" model makes sense... but it is the
basis for the next example:

| Y[0]:
|     --- !seq
|     0: 1
|     2: 3
|
| Y[2]:
|     ---
|     - 1
|     - ~
|     - 3

Ok. This equivalence here is "unobvious"; it depends upon a specific
understanding and convention of how this !seq transfer method works and it
also depends upon the current graph model, which allows for a keyed node to
be equivalent to a series node. Thus, what you have above is more or less
the exception and not the rule. As such, I'm not sure if this exception
"convention" is a good idea since it deviates so far from the other
examples.

|     --- #YAML:1.0 !seq
|     - !int|dec "1"
|     - !null|tilde "~"
|     - !int|hex "0x03"

By contrast, the equivalence above with Y[2] is clearly spelled out in the
specification... and is not problematic.

| See what I mean? These all represent the exact same structure in memory (for
| the same loader, that is -- PyYAML differs from Perl of course). But each
| stream is totally different at the Y level.

What I'm worried about is the "significance" of keyed/series distinction in
one context (!perl/Foo::Bar []) and its "insignificance" in other contexts
(!seq) -- this inconsistency is problematic.

I'll spell out the options once more:

  1. Make the distinction between keyed/series part of the formal
     graph model; in this case, we need another type, say !sparse,
     for sparse arrays and the first two above become non-equivalent.
     That is "--- !seq []" is _never_ equivalent to "--- !seq {}" as is
     currently the convention.

  2. Clearly say that the distinction between keyed/series need not be
     preserved; in this option, Brian's latest syntax sugar,
     "!perl/Foo::Bar []" is unworkable.

  3. The strict union of the above options, where keyed/series is part of
     the graph model, but where family is only applicable to one particular
     kind and not to other kinds. In this way we neither assume that
     !xxx [] is the same as !xxx {} (as in the !seq case) nor do we assume
     that !xxx [] is different from !xxx {} (as in the !perl/Foo::Bar case).

I prefer #3, and #1 (getting rid of all of the !seq conventions) is
preferable to #2. This is a very tough thing to grok since both ways of
using the keyed/series distinction are reasonable given their context. The
problem is that we really don't want both ways or it will cause all kinds of
problems for us down the road.

So, the compromise position is #1:

  - I fix the information model to include keyed/series in the distinction
  - We forget about (and remove from the spec) all of the cuteness
    regarding the !seq transfer method for sparse arrays.
  - We introduce !sparse for sparse arrays.

Ok?

Best, Clark
From: Brian I. <in...@tt...> - 2002-05-03 16:15:28
|
On 03/05/02 12:00 -0400, Clark C . Evans wrote:
> | Y[0]:
> |     --- !seq
> |     0: 1
> |     2: 3
> |
> | Y[2]:
> |     ---
> |     - 1
> |     - ~
> |     - 3
>
> Ok. This equivalence here is "unobvious"; it depends upon a specific
> understanding and convention of how this !seq transfer method works
> and it also depends upon the current graph model, which allows for a
> keyed node to be equivalent to a series node. Thus, what you have
> above is more or less the exception and not the rule. As such, I'm
> not sure if this exception "convention" is a good idea since it
> deviates so far from the other examples.

THERE IS NO GRAPH MODEL! dude ;)

At what level is the equivalence of the above documents (at the
serialization level) important? I think we all agree that good YAML loaders
would Load them the same.

> I'll spell out the options once more:
>
>   1. Make the distinction between keyed/series part of the formal
>      graph model; in this case, we need another type, say !sparse,
>      for sparse arrays and the first two above become non-equivalent.
>      That is "--- !seq []" is _never_ equivalent to "--- !seq {}" as is
>      currently the convention.
>
>   2. Clearly say that the distinction between keyed/series need not be
>      preserved; in this option, Brian's latest syntax sugar,
>      "!perl/Foo::Bar []" is unworkable.
>
>   3. The strict union of the above options, where keyed/series is part of
>      the graph model, but where family is only applicable to one particular
>      kind and not to other kinds. In this way we neither assume that
>      !xxx [] is the same as !xxx {} (as in the !seq case) nor do we assume
>      that !xxx [] is different from !xxx {} (as in the !perl/Foo::Bar case).
>
> I prefer #3, and #1 (getting rid of all of the !seq conventions) is
> preferable to #2. This is a very tough thing to grok since both
> ways of using the keyed/series distinction are reasonable given their
> context. The problem is that we really don't want both ways or it
> will cause all kinds of problems for us down the road.

What problems? That is *specifically* what I want to know. Pretend I'm an
idiot. Spell it out for me...

>
> So, the compromise position is #1:
>
>   - I fix the information model to include keyed/series in the distinction
>   - We forget about (and remove from the spec) all of the cuteness
>     regarding the !seq transfer method for sparse arrays.
>   - We introduce !sparse for sparse arrays.

I can live with #1. But I still don't get it, (or maybe you don't)...

Cheers, Brian
From: Clark C . E. <cc...@cl...> - 2002-05-03 16:55:07
|
| What problems? That is *specifically* what I want to know. Pretend I'm an
| idiot. Spell it out for me...

Ok. Define "same" if they load into identical native structures.

  - Are these the same?
      - !perl/Foo::Bar []
      - !perl/Foo::Bar {}

  - Are these the same?
      - !seq []
      - !seq {}

Right now, according to the graph model in the current specification,
(which is a guideline for native bindings) the answer should be "yes,yes"
since the graph model currently does not distinguish between keyed and
series nodes.

I think Brian would answer "no,yes". And he would assert that each
loader is free to make this decision on its own, regardless of what
other loaders may choose to do.

This "no,yes" answer is where I'm having kittens. If each loader decides
if the difference between keyed and series nodes affects the binding (and
thus what can be round-tripped YNY) we have inconsistency. Yes, as Brian
points out, it would be a small inconsistency. But I loathe inconsistency,
regardless of how small it appears to be.

Thus, I suggest we be a bit more conservative, and dispatch with the
!seq cuteness (for sparse arrays) and answer "no,no". To me, the
consistency is more important, thus here is the proposal:

| >   - I fix the graph model to include keyed/series
| >     in the distinction
| >   - We forget about (and remove from the spec) all of the cuteness
| >     regarding the !seq transfer method and keyed (for sparse arrays).
| >   - We introduce !sparse for sparse arrays if necessary.
From: Brian I. <in...@tt...> - 2002-05-03 18:01:55
|
On 03/05/02 13:00 -0400, Clark C . Evans wrote:
> | What problems? That is *specifically* what I want to know. Pretend I'm an
> | idiot. Spell it out for me...
>
> Ok. Define "same" if they load into identical native structures.
>
>   - Are these the same?
>       - !perl/Foo::Bar []
>       - !perl/Foo::Bar {}
>
>   - Are these the same?
>       - !seq []
>       - !seq {}
>
> Right now, according to the graph model in the current specification
> (which is a guideline for native bindings), the answer should be "yes,yes",
> since the graph model currently does not distinguish between keyed and
> series nodes.
>
> I think Brian would answer "no,yes". And he would assert that each
> loader is free to make this decision on its own, regardless of what
> other loaders may choose to do.

Right.

> This "no,yes" answer is where I'm having kittens. If each loader decides
> if the difference between keyed and series nodes affects the binding (and
> thus what can be round-tripped YNY), we have inconsistency. Yes, as Brian
> points out, it would be a small inconsistency. But I loathe inconsistency,
> regardless of how small it appears to be.

Very small, yes.

> Thus, I suggest we be a bit more conservative, and dispense with the
> !seq cuteness (for sparse arrays) and answer "no,no". To me, the
> consistency is more important, thus here is the proposal:
>
> | > - I fix the graph model to include keyed/series
> | >   in the distinction
> | > - We forget about (and remove from the spec) all of the cuteness
> | >   regarding the !seq transfer method and keyed (for sparse arrays).
> | > - We introduce !sparse for sparse arrays if necessary.

I'm perfectly agreeable to this. I always thought the "!seq {}" was a stupid
pet trick anyway. Make it so.

Cheers, Brian

PS If you ever think of a real world example, I'd be intrigued.
|
From: Brian I. <in...@tt...> - 2002-05-03 18:32:58
|
On 03/05/02 11:01 -0700, Brian Ingerson wrote:
> > Thus, I suggest we be a bit more conservative, and dispense with the
> > !seq cuteness (for sparse arrays) and answer "no,no". To me, the
> > consistency is more important, thus here is the proposal:
> >
> > | > - I fix the graph model to include keyed/series
> > | >   in the distinction
> > | > - We forget about (and remove from the spec) all of the cuteness
> > | >   regarding the !seq transfer method and keyed (for sparse arrays).
> > | > - We introduce !sparse for sparse arrays if necessary.
>
> I'm perfectly agreeable to this. I always thought the "!seq {}" was a stupid
> pet trick anyway. Make it so.

Interesting thought: Why do we need/want !map and !seq at all? I say we
don't. We've taken away all 3 use cases for them:

  - !map ''  # empty map
  - !seq ''  # empty sequence
  - !seq {}  # sparse array

They're not transfers anymore. Let's NUKE em!

---

Speaking of simplification:

  - Can we start calling '[]' a 'sequence' instead of a 'series branch'?
  - Can we start calling '{}' a 'map' instead of a 'keyed branch'?

I'm talking about the syntax model. Remember the good old days when YAML was
just a syntax model, and we didn't have to give every concept 3 names?

If you really want to use the 'branch' terms, don't you think they belong in
the 'tree' model? Get it?

I soon will be presenting YAML to a group of 100 people for 3 hours. If I
have to make them understand that a hash is sometimes a 'map' and sometimes a
'keyed branch', I'll be left with 3 people after 100 seconds.

    yaml->perl->python
    map->hash->dict
    seq->array->array
    scalar->scalar->scalar

This is for the everyday users. (Like me) Leave the nitpicky terms for
implementors (like me) and the streaming app folks (like nobody).

Cheers, Brian
|
From: Clark C . E. <cc...@cl...> - 2002-05-03 18:49:32
|
On Fri, May 03, 2002 at 11:32:45AM -0700, Brian Ingerson wrote:
| Interesting thought: Why do we need/want !map and !seq at all? I say we
| don't. We've taken away all 3 use cases for them:
|
|   - !map ''  # empty map
|   - !seq ''  # empty sequence
|   - !seq {}  # sparse array

Right. {} is empty map, [] is empty sequence, and !sparse {} will
be the sparse array.

| They're not transfers anymore. Let's NUKE em!

Well, we have two options here:

  (1) We allow transfer to be "blank"; in this case,
      !seq, !map, and !string can go away.

  (2) We leave them as transfers, but specify that
      they are the default transfers and cannot be
      explicitly provided.

I guess #1 is cleaner, but does require some work...

| - Can we start calling '[]' a 'sequence' instead of a 'series branch'?
| - Can we start calling '{}' a 'map' instead of a 'keyed branch'?

Yes, I guess if we got rid of those transfers we could fix up this
naming since there would no longer be ambiguity.

| I'm talking about the syntax model. Remember the good old days when YAML was
| just a syntax model, and we didn't have to give every concept 3 names?

Ok.

| If you really want to use the 'branch' terms, don't you think they belong in
| the 'tree' model? Get it?

Perhaps this naming can percolate up through all three models;
a bit of thought is required here. I do agree that having good
names is very important.

|     yaml->perl->python
|     map->hash->dict
|     seq->array->array
|     scalar->scalar->scalar

I think I'm agreeing with you; I think we could try to make
the change to the spec and then come back with any problems.
It'd be hard to know what all the concerns are before changes
would be made.

Best,

Clark
|
From: Clark C . E. <cc...@cl...> - 2002-05-03 18:33:00
|
| > | > - I fix the graph model to include keyed/series
| > | >   in the distinction
| > | > - We forget about (and remove from the spec) all of the cuteness
| > | >   regarding the !seq transfer method and keyed (for sparse arrays).
| > | > - We introduce !sparse for sparse arrays if necessary.
|
| I'm perfectly agreeable to this. I always thought the "!seq {}" was a stupid
| pet trick anyway. Make it so.

Cool.

| PS If you ever think of a real world example, I'd be intrigued.

Ok. Let's be imaginative. Suppose that I'm an employee of
Big Motor Co. and I'm writing an application to handle defects
at particular locations in a steel casting for an engine. A
casting may have over 100 identifiable locations, often
numbered 0,...,100. To save space, I decide to use
a sparse array; and being a good programmer and following the
YAML specification, I use the {} trick.

    castings:
      - model: 427 Hemi
        serial: 29302-23945A
        defects: !seq
          0: Malformed hole
      - model: 427 Hemi
        serial: 29302-23998A
        defects: !seq
          23: Excess shaff
          45: Raised ridge

Ok. Suppose that the above gets saved from my Python program,
and then loaded into a Perl program written by another programmer,
which accumulates some summary information (say it adds a
defect-count). Then this is saved, and re-loaded back into my
Python program. At this point, the first defect above will
be listed as...

    defects:
      - Malformed hole

And will be loaded into Python as a regular array. Now suppose
that my program detects another "defect" to be added to the list,
this time in position 63. So, my code does a "set" on what it
believes is a sparse array... but it isn't, since my sparse array
got replaced with a regular array. Anyway, the assignment
fails, throws an exception and stops the defect reporting process.
This causes the feeding program (attached to a sensor) to have
an "overflow" error in the "incoming defects" buffer, which causes
the entire line to shut down (about $2,000 per minute at
Cleveland Casting).

The programmer is called in, and pulls his hair out,
since his sample data set happened never to have a series of
contiguous defects starting at index 0. It takes him about
2 hours to diagnose, fix, and bring the production back on-line.
Total bill: $240K.

I'm probably not fired... yet.

It may seem contrived, but not really. Subtle bugs like this,
infrequent as they may be, are what nightmares (frequent as they
may be) in Plant Floor Operations are made of.

;) Clark
|
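A rough Python sketch of the failure described above, for anyone who wants to see it happen. Everything here is invented for illustration: `resave_as_series_if_contiguous` stands in for a second program that does not treat the keyed form as significant, and `record_defect` stands in for the code that assumes a sparse container. No real YAML loader is involved.

```python
def resave_as_series_if_contiguous(defects):
    """Hypothetical second loader/dumper.

    It ignores the keyed form and hands back a plain series whenever the
    keys happen to be exactly 0..n-1; otherwise it keeps the keyed data.
    """
    keys = sorted(defects)
    if keys == list(range(len(keys))):
        return [defects[k] for k in keys]
    return dict(defects)


def record_defect(defects, position, description):
    """The original program assumes a sparse (dict-like) container."""
    defects[position] = description


# The two castings from the example, after a trip through the other program.
first = resave_as_series_if_contiguous({0: "Malformed hole"})
second = resave_as_series_if_contiguous({23: "Excess shaff", 45: "Raised ridge"})

record_defect(second, 63, "New defect")      # still keyed, so this works

try:
    record_defect(first, 63, "New defect")   # now a one-element list
    outcome = "stored"
except IndexError:
    outcome = "IndexError"

print(type(first).__name__, type(second).__name__, outcome)
```

Only the casting whose defects start at index 0 and run contiguously is affected, which is why a sample data set can easily miss it.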
From: Brian I. <in...@tt...> - 2002-05-03 18:46:22
|
On 03/05/02 14:38 -0400, Clark C . Evans wrote:
> | > | > - I fix the graph model to include keyed/series
> | > | >   in the distinction
> | > | > - We forget about (and remove from the spec) all of the cuteness
> | > | >   regarding the !seq transfer method and keyed (for sparse arrays).
> | > | > - We introduce !sparse for sparse arrays if necessary.
> |
> | I'm perfectly agreeable to this. I always thought the "!seq {}" was a stupid
> | pet trick anyway. Make it so.
>
> Cool.
>
> | PS If you ever think of a real world example, I'd be intrigued.
>
> Ok. Let's be imaginative. Suppose that I'm an employee of
> Big Motor Co. and I'm writing an application to handle defects
> at particular locations in a steel casting for an engine. A
> casting may have over 100 identifiable locations, often
> numbered 0,...,100. To save space, I decide to use
> a sparse array; and being a good programmer and following the
> YAML specification, I use the {} trick.

I disagree with this case. If you use the !seq trick, both Loaders will do
the same thing. It's only when you create objects of type
!http://bm.com/cast-defect that you might have an issue.

My reply is that the owner of that class must make sure that
both the Perl and Python implementations agree. Either both handle
sparsity or both don't. Both can handle both (sparse and normal).

See. It's the domain's issue, not ours. We just have to do the
!seq/!sparse trick correctly.

Next.

Cheers, Brian
|
From: Clark C . E. <cc...@cl...> - 2002-05-03 19:01:19
|
| My reply is that the owner of that class must make sure that
| both the Perl and Python implementations agree. Either both handle
| sparsity or both don't. Both can handle both (sparse and normal).

In the example, both _did_ handle sparsity; they just did it in
different ways, and this is the essence of the problem that I'm
attempting to address. The Python binding in the example treated
the existence of a keyed structure as significant and used it to
signify a sparse structure. The Perl binding in the example didn't
treat the existence of a keyed structure as significant. This is
the "yes,no" inconsistency that I was illustrating and how it could
cause problems.

| See. It's the domain's issue, not ours. We just have to do the
| !seq/!sparse trick correctly.

Let's talk about !sparse then. Would you allow the series form?

    --- !sparse
    - one

If so, then would you be able to distinguish the in-memory result
of the above structure from the following structure?

    --- !sparse
    0: one

If not, then this !sparse would violate the proposed model, since
the in-memory structure would not reflect that the first is a
"series" and the second is "keyed". Thus, the only good solution
is to fix !sparse so that it only uses the "keyed" form.

Note that your new "perl/Foo::Bar" example uses the distinction,
as both of the below would have *different* in-memory structures
and thus, when you emit, the printer would be able to output the
correct form, "keyed" or "series".

    --- !perl/Foo::Bar []
    --- !perl/Foo::Bar {}

Best,

Clark
|
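To make the !sparse question above concrete, here is a small Python sketch. It assumes a made-up `load_sparse` helper that accepts both forms and a made-up `emit_sparse` that writes the result back out; neither is part of any existing binding.

```python
def load_sparse(node):
    """Hypothetical !sparse loader that accepts both forms.

    A series node arrives as a list, a keyed node as a dict; both end up
    as an index -> value dict in memory.
    """
    if isinstance(node, list):
        return dict(enumerate(node))
    return dict(node)


def emit_sparse(native):
    """Hypothetical emitter: the native dict carries no record of the
    original form, so the only safe choice is to always emit keyed."""
    return {index: native[index] for index in sorted(native)}


series_form = ["one"]     # --- !sparse  /  - one
keyed_form = {0: "one"}   # --- !sparse  /  0: one

from_series = load_sparse(series_form)
from_keyed = load_sparse(keyed_form)

# Identical in memory, so the series/keyed distinction cannot round-trip.
print(from_series == from_keyed, emit_sparse(from_series))
```

Restricting !sparse to the keyed form removes the problem, since then there is only one form to lose.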
From: Neil W. <neilw@ActiveState.com> - 2002-05-03 20:25:02
|
Err... I wrote this a little bit ago, and since your "edict" phone minutes
with Brian, we seem to be agreeing. I think. But I'll click "send" anyway.

Clark C . Evans [03/05/02 15:06 -0400]:
> Note, that your new "perl/Foo::Bar" example uses the distinction,
> as both of the below would have *different* in memory structures
> and thus, when you emit, the printer would be able to output the
> correct form, "keyed" or "series".
>
>     --- !perl/Foo::Bar []
>     --- !perl/Foo::Bar {}

Not necessarily! These might represent the *same* in-memory structure.

As a module author, I want to write a YAML support routine which allows me to
be very flexible in what I accept (i.e. provide different ways for people to
write things in configuration files).

For example, I might write a Date::Appointment module, which stores an
appointment. The following YAML streams all get "parsed" by libyaml, "loaded"
by YAML.pm, then "diddled" by Date::Appointment::LoadHelper(), which is
called by YAML.pm automatically.

    --- #YAML:1.0 !perl/Date::Appointment 1:Jan:2002|Once|Call+Mom

    --- #YAML:1.0 !perl/Date::Appointment
    date: 2002-01-01
    recur: Once
    desc: "Call Mom"

    --- #YAML:1.0 !perl/Date::Appointment
    - 2002-01-01
    - Once
    - Call Mom

Those all turn into an opaque Perl object which exposes some methods. They're
all identical in memory after the "diddling" stage. So they're different at
the libyaml stage, different after they've been loaded by YAML.pm, but then
my Date::Appointment::LoadHelper() turns them all into the same structure.

When emitting, I might choose to always emit the keyed format. Why? Because
otherwise, other implementations of YAML loaders won't display a very nice
picture of the appointment. But I'm free to choose any particular output
format I want, as long as I can read it back into the same structure.

Of course, some formats won't be very useful to other loaders, since they
don't know about "Date::Appointment":

    --- !binary|base64 ]
    JVBERi0xLjIKJcfsj6IKNCAwIG9iago8PC9MZW5ndGggNSAwIFIvRmlsdGVyIC9GbGF0ZURl
    Y29kZT4+CnN0cmVhbQp4nJVZ25LcNg59n6/ot81WubWirlTe1s7FrsqmKvHE+8xWc6aV0aWt
    iyf9HbsfvBABCtBI403KD1PFpkjgAOcAoMNAHcL5H/0tm7vPd58Pyq35P2VzeHt/949fs0MR
    FFmUJYf7hzt1OLothyRMgjQ6ZGEaRPpw33zzs63qv9//fqeKIMuLAjbdn7/597yig1QVRY4r
    ZnyqhmFePkY6gPXicFTR/EukYz2vR1GQxP6AH3oz2N4t50Gc+uWPY2/t6E5JiiDHU1Sg9fzj

So to me, the distinction between different node "kinds" does not round-trip.
Does that make sense? (I realize this example is somewhat contrived, and
probably not even a very good idea :)

Later,
Neil
|
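The load-helper idea above can be sketched in Python along the following lines. This is not Date::Appointment::LoadHelper() or any real YAML.pm hook; `Appointment`, `load_appointment` and `dump_appointment` are names invented here, and the sketch assumes the YAML layer has already handed over a plain string, dict or list with string scalars.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass(frozen=True)
class Appointment:
    when: date
    recur: str
    desc: str


def load_appointment(node):
    """Hypothetical load helper: accept scalar, keyed or series input."""
    if isinstance(node, str):
        # Scalar form, e.g. "1:Jan:2002|Once|Call+Mom"
        raw_date, recur, desc = node.split("|")
        day, month, year = raw_date.split(":")
        when = date(int(year), MONTHS.index(month) + 1, int(day))
        return Appointment(when, recur, desc.replace("+", " "))
    if isinstance(node, dict):
        # Keyed form with date/recur/desc entries
        return Appointment(date.fromisoformat(node["date"]),
                           node["recur"], node["desc"])
    if isinstance(node, list):
        # Series form: [date, recur, desc]
        raw_date, recur, desc = node
        return Appointment(date.fromisoformat(raw_date), recur, desc)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


def dump_appointment(appointment):
    """Always emit the keyed form, whatever form was originally loaded."""
    return {"date": appointment.when.isoformat(),
            "recur": appointment.recur,
            "desc": appointment.desc}


scalar_form = "1:Jan:2002|Once|Call+Mom"
keyed_form = {"date": "2002-01-01", "recur": "Once", "desc": "Call Mom"}
series_form = ["2002-01-01", "Once", "Call Mom"]

loaded = [load_appointment(n) for n in (scalar_form, keyed_form, series_form)]
print(loaded[0] == loaded[1] == loaded[2], dump_appointment(loaded[0]))
```

After the helper has run, nothing in the object records which of the three forms it came from, so the node kind is lost by design.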
From: Clark C . E. <cc...@cl...> - 2002-05-03 22:39:23
|
| >     --- !perl/Foo::Bar []
| >     --- !perl/Foo::Bar {}
|
| Not necessarily! These might represent the *same* in-memory structure.

Well, I think that there is nothing saying that your application
may not interpret them the same; the purpose of the graph model
is to describe what an application should preserve if it is going
to round-trip data between different environments.

| So to me, the distinction between different node "kinds" does not round-trip.
| Does that make sense? (I realize this example is somewhat contrived, and
| probably not even a very good idea :)

Yep. And this is obviously "ok" if it is an application thingy,
and you consider your application as having "modified" the data
in the graph model. However, if these modifications are done, your
application cannot consider itself as faithfully round-tripping data.

;) Clark
|
From: Brian I. <in...@tt...> - 2002-05-03 16:04:41
|
On 03/05/02 01:30 -0700, Neil Watkiss wrote:
> Hi guys,
>
> You know what? I need more examples. Brian's last-but-one email convinced me
> he was right again.
>
> We have Python and Perl YAML bindings. Would it be possible to come up with
> an example showing how this affects YAML interoperability? I haven't seen an
> example of exactly _why_ the "YNY" matters all that much.

A long time ago I remember that the Y3C (YAML 3 Cohorts) made an edict that
we only cared about NYN roundtripping, since YNY roundtripping was nigh
impossible. (As Neil's examples below demonstrate.)

I don't think even Clark would think that total YNYRTing is possible. I think
he just wants to make sure that a sequence doesn't turn into a map at the
serialization level.

I think Clark needs to produce some real world cases where this would cause
problems, or we should just drop the notion of YNYRT forever.

The funny thing is that something like "sparse arrays" really only matters in
the "YAML as Serialization" world, not in the "YAML as XML" or "YAML as
Config" worlds. In other words, the only processes (in the real world) that
would be outputting sparse arrays are systems that would be reading them back
in themselves.

Even in the case where I have some system that is half Perl, half Python, I
don't see the problem with normal sparse arrays. The only gray area is a Perl
object (Foo::Bar) that likes to express itself in a sparse manner and so
wouldn't be interoperable in Python. Big fricking deal! This is *not* going to
be a real world problem. If it is, the fix is simple: 'Foo::Bar' should
express itself differently.

Cheers, Brian

>
> Let's label some YAML data "Y", and say that Y[i] represents Y at some point
> through the following system:
>
>           0           1             2           3
>              --------    ----------    --------
>     input -> | Perl | -> | Python | -> | Perl | -> output
>              --------    ----------    --------
>
> At the beginning, Y[0], is the original YAML stream.
>
> In my mind, the following statements MUST be true if all components in this
> system are YAML compliant:
>
>  *1) Y[i] !=? Y[j] for i & j such that i != j
>
> **2) When loaded by a particular loader, Y[i] and Y[j] are "equal"
>      for all i and j
>
> * : !=? means "not necessarily equal to".
> **: By equal, I mean that the data structures must be identical. In other
>     words, by being loaded and dumped, I can't have an integer turn into a
>     string or a floating point; arrays must stay arrays; "sparse arrays"
>     (whatever those may be) must remain sparse arrays.
>
> The obvious thing to notice here is that YAML does not _require_ the Python
> implementation to load YAML in the same way as the Perl implementation, as
> long as the result of loading Y[i] is the same, no matter what stage it was
> grabbed from.
>
> Examples:
>
> Y[0]:
>
>     --- !seq
>     0: 1
>     2: 3
>
> Y[1]:
>
>     --- [1,~,3]
>
> Y[2]:
>
>     ---
>     - 1
>     - ~
>     - 3
>
> Y[3]:
>
>     --- #YAML:1.0 !seq
>     - !int|dec "1"
>     - !null|tilde "~"
>     - !int|hex "0x03"
>
> See what I mean? These all represent the exact same structure in memory (for
> the same loader, that is -- PyYAML differs from Perl of course). But each
> stream is totally different at the Y level.
>
> Later,
> Neil
>
> Clark C . Evans [03/05/02 02:24 -0400]:
> > Three options:
> >
> > 1. We have it such that "keyed"/"series" distinction should
> >    always be preserved in a YNY round-trip. In this case,
> >    our trusty !seq {0:one, 1:two} example must be set aside
> >    and we should instead introduce !sparse {0:one, 1:two} for
> >    the use case of a sparse array.
> >
> > 2. We have it such that "keyed"/"series" distinction need not
> >    be preserved in a YNY round-trip. In this case, your example
> >    of !perl/Foo::Bar {} is ambiguous and you must make your
> >    type family more explicit.
> >
> > 3. We be very restrictive. In this case, we state not only
> >    that "keyed"/"series" distinction must be preserved in a YNY
> >    round-trip, but also that a given type family can only be used
> >    for keyed, series, or scalar kinds and not for any other kind.
> >    For this case, !perl/Foo::Bar would need a more explicit
> >    family to distinguish between "hashtable" and "array". Further,
> >    in this case, we would also need a !sparse type.
> >
> > I like option #3 the best. It is the least flexible, but I think
> > that this will maximize the chances for round-tripping. I guess
> > #1 is better than #2, I never really liked a !seq making a sparse
> > array anyway.
|
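Neil's two statements about Y[0]..Y[3] can be checked with a toy loader. The sketch below assumes the four streams have already been parsed into plain Python values with string scalars; `load_scalar` and `load_seq` are invented for this note and do not reflect how YAML.pm or PyYAML actually work.

```python
def load_scalar(text):
    """Toy scalar loader: '~' is null, anything else is an integer.

    int(text, 0) picks the base from the prefix, so "3" and "0x03"
    both load as 3.
    """
    if text == "~":
        return None
    return int(text, 0)


def load_seq(node):
    """Toy !seq loader: a keyed node is treated as a sparse array and
    padded with nulls; a series node is loaded item by item."""
    if isinstance(node, dict):
        size = max(node) + 1 if node else 0
        return [load_scalar(node[i]) if i in node else None
                for i in range(size)]
    return [load_scalar(item) for item in node]


Y = [
    {0: "1", 2: "3"},     # Y[0]: --- !seq with keys 0 and 2
    ["1", "~", "3"],      # Y[1]: --- [1,~,3]
    ["1", "~", "3"],      # Y[2]: block sequence
    ["1", "~", "0x03"],   # Y[3]: explicit !int and !null transfers
]

loaded = [load_seq(y) for y in Y]

# Statement 1: the streams themselves need not be equal.
# Statement 2: for one loader, every stage loads to the same structure.
print(Y[0] != Y[1], all(item == loaded[0] for item in loaded))
```

Note that this toy loader only satisfies statement 2 because it deliberately ignores the keyed/series distinction for !seq, which is exactly the behavior under debate in the thread.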
From: Oren Ben-K. <or...@ri...> - 2002-05-02 13:46:41
|
Brian Ingerson [mailto:in...@tt...] wrote:
> Neil has a pretty compelling argument here.

Yup. That's why I tend towards Explicit myself.

> Can anyone think of a situation where this would be a burden?
> I was thinking:
>
>     ---
>     - !map ''
>     - !seq ''
>     ...
>
> We'd need to be explicit here. But on the other hand, I think
> we can safely drop these transfers altogether for empty map and seq.

Yes. This would be a simplification - no longer would map and seq accept any
leaf node.

> We should use the following in all cases:
>
>     ---
>     - {}
>     - []
>     ...
>
> Then I can have a much nicer object syntax for Perl:
>
>     ---
>     - !perl/Foo::Bar []
>     # instead of
>     - !perl/Foo::Bar:array
>     ...

Which is much nicer, isn't it?

> But back to the matter at hand. Oren, how would EXPLICIT
> affect the productions.

I *think* it would just require removing the '?' I added in production 163.
I'd have to verify this, and also go over all the examples, of course.

Have fun,

Oren Ben-Kiki
|