From: Oren Ben-K. <or...@be...> - 2004-09-16 16:47:47
|
Ah, there's nothing like home cooking in a Spanish-descent Jewish family on New Year's (and Passover). I'm stuffed as a turkey with so many delicious things I can't even begin to list them. And, I have a perfect alibi for sitting the last brain storm out :-)

To the issue. I have one main point before responding to specific issues. There have been so many proposals I can't keep track (reminds me of the sald types... but I digress). I think that the last week shows that we can't make any progress by simply throwing proposals up in the air and hoping to hit some magical spot everyone will like. We can keep at it for months getting nowhere.

IMVHO, we should focus on the requirements rather than the proposed solutions. We have worked out a short list of requirements:

R1: Cooking is a simple, table-less syntactical-only operation.

R2: A YAML schema does not transform one private/global/yaml tag to another.

R1 + R2 got us the original !, !! proposal. Brian then added:

R3: Private tags should not have to use !!.

R1+R2+R3 got us survey item #1. People weren't happy; it turned out that there is another requirement:

R4: Quoted scalars are Unicode strings.

This got us to survey item #4. #4 has a single advantage going for it: It is the simplest, cleanest proposal we know of that satisfies *all* of R1 - R4.

People are still unhappy with #4. To me, this says that people have some yet-unphrased requirement(s) that are not being addressed. Instead of getting lost in a maze of twisty little proposals, all different, could you please put your finger on what's wrong with survey item #4 and phrase it as a requirement?

Then, and only then, it would make sense to see what the simplest proposal would be to satisfy the new requirement R5 _in addition_ to R1 - R4. I'm very tired of reading proposals that happily violate some of R1/R2/R3/R4 in order to solve some implicit problem which is only clear to the proposal's author.

Tim was the only one who posted along these lines. He defined two properties of a YAML schema:

"Property I" seems to be, basically, R1 + R2.

"Property S" is less clear to me. Since this is the one Tim feels isn't provided by survey item #4, I'd be very interested in him rephrasing it.

I'll now try to deduce what the missing requirement R5 is. I'm trying to reverse-engineer around 60 messages here while full of a similar number of different dishes, so I'll probably get it wrong... At any rate, it seems to be:

R5(?): Quoted scalars should not have a "concrete" type.

On the off chance that this is a fair description of what bothers people, before we move on to try to create a #143 proposal, I'd like to try to reconcile this with survey item #4. I'll make three points:

1. When people write "123a", they expect it to be a "characters string". If someone here feels that it EVER makes sense for "123a" to be loaded into anything other than a "characters string", please speak up. AFAIR nobody ever suggested even one example saying: well, in this and that case, it makes more sense for "123q" to mean something other than a string, so using tag:yaml.org,2002:str for it prevents me from doing what I want.

Note that YAML's design goal #1 is being human readable. So, show me an example where it is more readable to have "123q" be anything other than a simple string. I don't care much for saying "sure, it's always best for it to be a string, but I want my options open". YAGNI. Show me ONE plausible use case, first.

2. Now, let's look at R4 again. It says: quoted scalars are Unicode strings. Now, a Unicode string is a concrete type. It carries with it a certain semantic baggage. Saying quoted strings should not have a "concrete" type seems to directly contradict R4.

So, it may be that people who want R5 don't really want R4. They want:

R4.1: quoted scalars mean something else than plain scalars.

Quick show of hands: who is for R4, and who is for R4.1? What is your use case?

3. Yes, tag:yaml.org,2002:str is a concrete *TAG*. It is NOT a concrete native data TYPE. Each application is free to load strings into whatever native data type (or, as Brian calls it, "cave drawing") it wants. Again: we are talking about the INTENT of whoever is writing the document, NOT the details of how this intent is processed by a given system. Operationally, you may run atoi on your strings in your application, and divide each integer node by its indentation level - FINE. But a "YAML schema" (INTENT of a YAML document) can't say that '!!str 23' should be put through atoi, and that '!!int 23' nodes should be divided by the indentation level.

So much for defending survey item #4. Now for some "previously prepared fallback positions" :-)

Suppose that "everyone" votes for R4.1 instead of R4. This inevitably brings us to the "plain flag". Nodes without a tag have no tag (== each YAML schema is free to specify a different INTENT for such nodes). But, these nodes know whether they were plain or not (== each YAML schema is allowed to assign a different INTENT to a node with " than to a node without a ").

The simplest thing to do in this case is something along the lines of survey item #3 (and about a dozen mutations thereof). Define *two* "I have no tag" values, one for non-plain scalars, one for all other untagged nodes. These two values can be called QNULL and NULL, % and @, or ~foo and ++bar!@# - that's a secondary, minor issue, which can be settled very quickly if we agree to adopt R4.1 instead of R4. Personally I'll go with " and ! (think about it).

Either way, problem solved. So, bottom line, please answer the following quick survey (one bit answers only :-):
- Are you for R4 or R4.1 + R5?
- Do you have an example of why R4.1 is useful in any way?
- Do you have an R6 in mind?

Sorry for the long post, but I had to read 60 of yours! ;-)

Have fun,

Oren Ben-Kiki |
From: Oren Ben-K. <or...@be...> - 2004-09-16 19:42:21
|
I wrote:
> Show me ONE plausible use case, first.

In IRC, Clark just pointed out to me that Tim did:

> -
>   pattern: "[x-z]foo*"        # match the RE "[x-z]foo*"
> -
>   pattern: !!str "[x-z]foo*"  # match the string "[x-z]foo*"

OK. This leaves us with R4.1 (and therefore R5), and therefore survey item #3. Sigh. He also suggested that we use ! and ? (" is too specific for "foo"):

    - 23     # tag is !
    - {}     # tag is !
    - []     # tag is !
    - "23"   # tag is ?
    - '23'   # tag is ?
    - >-     # tag is ?
      23
    - |-     # tag is ?
      23

Tags must have a suffix so:
- A ! by itself always means ! (regardless of %TAG).
- !! (and !foo!) are invalid
- Tag resolution handles "I have no tag" tags (! and ?) only.

I can live with that. So can Brian. It seems that most people have also grown to accept this in the last day or so. OK, "make it so".

Have fun,

Oren Ben-Kiki |
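The ! / ? assignment listed in this message can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not any parser's actual API: the `Node` class, its `style` names and the `reported_tag` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                  # "scalar", "seq" or "map"
    style: str = "plain"       # scalars only: "plain", "single", "double", "folded", "literal"
    tag: Optional[str] = None  # tag exactly as written in the document, None if absent


def reported_tag(node: Node) -> str:
    """Return the tag a parser would report under the ! / ? scheme of this message."""
    if node.tag is not None:
        # "Tags must have a suffix": a lone "!" is fine, "!!" and "!foo!" are not.
        if len(node.tag) > 1 and node.tag.endswith("!"):
            raise ValueError(f"tag {node.tag!r} has no suffix")
        return node.tag
    # Untagged node: non-plain scalars get "?", everything else gets "!".
    if node.kind == "scalar" and node.style != "plain":
        return "?"
    return "!"
```

Tag resolution would then only ever need to act on nodes whose reported tag is `!` or `?`; any other tag is passed through untouched.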
From: Clark C. E. <cc...@cl...> - 2004-09-17 03:29:29
|
Good enough. If it turns out to be horribly wrong, we will certainly find out by the time the spec is updated and implementations get onto the same page. I'd like to thank everyone for their feedback and serious thoughts on the matter.

On Thu, Sep 16, 2004 at 10:42:07PM +0300, Oren Ben-Kiki wrote:
| OK. This leaves us with R4.1 (and therefore R5), and therefore survey
| item #3. Sigh. He also suggested that we use ! and ? (" is too specific
| for "foo"):
|
|     - 23     # tag is !
|     - {}     # tag is !
|     - []     # tag is !
|     - "23"   # tag is ?
|     - '23'   # tag is ?
|     - >-     # tag is ?
|       23
|     - |-     # tag is ?
|       23
|
| Tags must have a suffix so:
| - A ! by itself always means ! (regardless of %TAG).
| - !! (and !foo!) are invalid
| - Tag resolution handles "I have no tag" tags (! and ?) only.
|
| I can live with that. So can Brian. It seems that most people have also
| grown to accept this in the last day or so. OK, "make it so".
|
| Have fun,
|
| Oren Ben-Kiki |
From: Damian C. <dam...@gm...> - 2004-09-17 13:27:35
|
On Thu, 16 Sep 2004 22:42:07 +0300, Oren Ben-Kiki <or...@be...> wrote:
> > -
> >   pattern: "[x-z]foo*"        # match the RE "[x-z]foo*"
> > -
> >   pattern: !!str "[x-z]foo*"  # match the string "[x-z]foo*"

Yes, except that I would claim that this is surprising enough that better formulations would be

    - pattern: "[x-z]foo*"
    - exact: "[x-z]foo*"

(using different keys for different data), or one of

    - pattern: !re "[x-z]foo*"
    - pattern: "[x-z]foo*"

or

    - pattern: ! "[x-z]foo*"
    - pattern: "[x-z]foo*"

or even

    - pattern: /[x-z]foo*/
    - pattern: ! "/foo: bar+/"
    - pattern: "[x-z]foo*"

using the !-tag to mark the surprising case. The third example uses /.../ tokens as implicitly typed regexps.

Even if regexps are the common case, it is nevertheless a surprising interpretation of a quoted string, worthy of a !-tag. After all, in languages that have regexps as a built-in type (like Perl and JavaScript), the syntax is not "...", it is /.../.

[...Anyways, proceeding with R4.1 assumed...]

> Tags must have a suffix so:
> - A ! by itself always means ! (regardless of %TAG).

+1. Despite suggestions, we do not need to redefine null...

> - !! (and !foo!) are invalid

+1. This means we avoid having silent errors if we stutter on the ! key.

> I can live with that. So can Brian. It seems that most people have also
> grown to accept this in the last day or so. OK, "make it so".

It's OK by me too.

-- Damian
--
Damian Cugley, Alleged Literature http://www.alleged.org.uk/pdc/ |
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-17 00:52:08
|
On Thursday 16 September 2004 12:47 pm, Oren Ben-Kiki wrote:
> R4: Quoted scalars are Unicode strings.
...
> 3. Yes, tag:yaml.org,2002:str is a concrete *TAG*. It is NOT a concrete
> native data TYPE. Each application is free to load strings into
> whatever native data type (or, sa Brian calls it, "cave drawing") it
> wants. Again: we are talking about the INTENT of whoever is writing the
> document, NOT the details of how this intent is processed by a given
> system. Operationally. you may run atoi on your strings in your
> application, and divide each integer node by its indentation level -
> FINE. But a "YAML schema" (INTENT of a YAML document) can't say that
> '!!str 23' should be put through atoi, and that '!!int 23' nodes should
> be divided by the indentation level.

I think this is really at the heart of the matter. Clearly the application can override what yaml.org,2002:str resolves to. So implicit typing can be done on the concrete string despite the dictate. And, I think it's fair to say, that one could make a like argument for yaml.org,2002:map or yaml.org,2002:seq. Obviously we don't want that --concrete should mean concrete. Yet a clean solution seems to call for making them all alike, but then how would we differentiate between the str/map/seq which are to be implicitly typed vs. those that are meant to be strictly what they are stated to be. _That's the basic problem._

(If that seems confusing don't worry about it. It's hard for me to explain, but it'll get clearer with what follows.)

> R4.1: quoted scalars mean something else than plain scalars.

Indeed! And for str, we have a curious solution to the above problem: the plain-style. If we don't use quotes then the scalar is implicitly typed; if we do, it's not implicitly typed. That's the 'something else'. But map and seq don't have this unquoted variation. So in a way R4 works, b/c R4.1 is already a given.

Yet I think you mean something more by R4.1 than just the implicit differentiation --that a quoted str can be implicitly typed to something else. And I agree with you. That's not really a proper distinction. That's what the plain scalar is for. But I'm not necessarily going to stop you, either. It's no sweat off my back. Show me an example where allowing this presents a problem for others. Tim has already shown an example where it could help someone even if, as he said, it seems "blah" unlikely.

So while I largely agree with you, I still feel that proposal #4 simply hasn't "peaked" to the clean, symmetrical solution that is possible. So what's irking me?

R5: Provide a clean solution, just allow us some flexibility to _reasonably_ dirty it up a bit if we want/need.

The only truly clean solution is #1. We all know this. So if we don't adopt it straight out, we have but three alternatives. 1) We can just accept the wart as it is. 2) We can go a bit further and minimize it (which is what the bandaids #3 and #4 and the like are all about). Or 3) we can "practically" heal the wound --meaning, we can actually heal it, while allowing an opening to the other behaviors. This satisfies R5. And we can do this simply by combining the ideas of proposal #2 with the default behavior of #1 and using TAG to rearrange those as suits the user and/or application.

R6: All nodes have a transfer tag.

This is how I felt originally. Then I was convinced otherwise on the grounds that if there's no tag given, then, well, "there is no tag" --it's NULL, moreover, I had been told, NULL is simpler to use in the implementation. So given the two I recanted. Well, honestly I am not satisfied with this. For starters, NULL also implies "no type". That's not right. But worse, it's not actually simpler for implementation b/c now you have to pass _kind_ along too, when before type was all that was needed. So NULL really hasn't bought us anything. (IMHO) R6 is important --whether the tag is explicitly given or not, it has one. So what are the types of the untagged nodes? Exactly what they are now, exactly what YAML spec defines them as: yaml.org,2002:map, yaml.org,2002:seq, yaml.org,2002:str. Those are the three core types, corresponding to the three kinds. It is clear. It is logical.

R7: Pass only the information that is necessary, no more, no less.

This is really the source of the ugly asymmetries of the past bandaids, in that we haven't been able to concisely determine what to pass from parser to resolver. It has been narrowed down to _kind_ plus _type_, being NULLs or ! for untagged items, except for the odd man out which is then ? or " or whatever.... Face it they're all hacks. The problem is we simply haven't isolated the proper info. If you reread the first paragraph of this post, in it lies the answer to this. The missing piece of info is simply: To Implicit or Not To Implicit. Of course, we have known it all along, in fact we had been trying to stomp it out b/c we had mistakenly thought it was wrapped up with style of scalars. But it's not. It applies to all. So instead of (kind, type, context, content), we pass (type, implicit?, context, content). The important difference between what we have now and this solution is that the implicit flag isn't just a plain-scalar flag. Rather it applies to any node, and says to the resolver to try implicit typing, failing that, use the supplied type.

Okay, so now we can put it all together. We set aside _kind_ once again and let type do the work, and add in our missing implicit flag. I will denote the flag ? for implicit and ! for not implicit:

The default behavior, a la #1:

    ---
    - 23    # tag:yaml.org,2002:str, ?
    - "23"  # tag:yaml.org,2002:str, ?
    - []    # tag:yaml.org,2002:seq, ?
    - {}    # tag:yaml.org,2002:map, ?

So all the types that we would expect apply to nodes that are not tagged. But an implicit flag goes with them. Now we can make any of these concrete by adding a !.

    ---
    - ! "23"  # tag:yaml.org,2002:str, !

Behavior like #4 can be achieved with a simple TAG header. We define four special TAG keywords to access the untagged variations. This is a borrowing from proposal #2, but it is limited to the TAG directive itself. We can name these whatever you'd like, but for now I'll just label them STR, QSTR, MAP and SEQ. Here's an example:

    %TAG QSTR tag:yaml.org,2002:str
    ---
    - 23    # tag:yaml.org,2002:str, ?
    - "23"  # tag:yaml.org,2002:str, !
    - []    # tag:yaml.org,2002:seq, ?
    - {}    # tag:yaml.org,2002:map, ?

Someone might complain that this leaves too much flexibility --that QSTR, or any of the others, can now be made to be anything, not just implicit vs. explicit. Well, you can limit this if you prefer to allow only '%TAG QSTR !'. But why bother? This bit of flexibility seems to me just the right amount of rope --we're not too far off the ground for it to be in any way worrisome.

I could go on. There is at least one thing further to point out, but I'll save it. I'm sure you get the general picture from here. Basically we can have the clean #1 solution, plus clearly expected defaults for non-tags, and a simple tag line can get us prior behavior should we want/need it, so compatibility transition is rather minor.

T.

P.S. Hmm... You can number this one whatever you like. But if you ask me #9t will work --it sure feels like that many ;)

--
( o _ カラチ
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain |
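The (type, implicit?) pair proposed in this message can be sketched in Python as follows. This is a rough illustration under stated assumptions, not part of the proposal: `Event`, `parse_scalar`, `resolve` and the `qstr_explicit` switch (standing in for a '%TAG QSTR !' header) are hypothetical names, and a bare integer regexp stands in for whatever implicit typing a real resolver would apply.

```python
import re
from typing import NamedTuple

STR = "tag:yaml.org,2002:str"
INT = "tag:yaml.org,2002:int"


class Event(NamedTuple):
    type: str       # every node carries a type (R6)
    implicit: bool  # True means "?" (try implicit typing), False means "!"
    content: str


def parse_scalar(text: str, quoted: bool, bang: bool = False,
                 qstr_explicit: bool = False) -> Event:
    """Build the event for an otherwise untagged scalar.

    bang:          the scalar was written with a leading "! "
    qstr_explicit: assumption, models a '%TAG QSTR !' header being in effect
    """
    implicit = not bang and not (quoted and qstr_explicit)
    return Event(STR, implicit, text)


def resolve(event: Event) -> str:
    """Try implicit typing when allowed; failing that, use the supplied type."""
    if event.implicit and re.fullmatch(r"[-+]?[0-9]+", event.content):
        return INT
    return event.type
```

With these definitions, `resolve(parse_scalar("23", quoted=True))` takes the implicit path as in the default (#1-like) listing, while passing `bang=True` or `qstr_explicit=True` keeps the node at `tag:yaml.org,2002:str`.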
From: Damian C. <dam...@gm...> - 2004-09-17 13:11:01
|
On Thu, 16 Sep 2004 19:47:41 +0300, Oren Ben-Kiki <or...@be...> wrote:
> I don't care much for saying "sure, its always best for
> it to be a string, but I want my options open". YAGNI.

+1 on YAGNI.

> Quick show of hands: who is for R4, and who is for R4.1? What is your
> use case?

I prefer R4 to R4.1. We are making a special case for quoted text only because it is so 'obvious' that quoted or folded text is literal text (which we represent as a Unicode string). If one wants quoted text to mean something other than what it looks like, that is surprising -- and one indicates surprise with ! or a specific !-tag.

Contrariwise, if it is not compellingly 'obvious' that quoted text means a Unicode string, then arguably quotation marks are not compelling enough to be worth having a special exception for, which would imply that neither R4.1 nor R4 apply.

> 3. Yes, tag:yaml.org,2002:str is a concrete *TAG*. It is NOT a concrete
> native data TYPE. Each application is free to load strings into
> whatever native data type it wants.

Agree. And again, if you need some strings to be loaded differently, this is surprising, and using a !-tag to mark this seems acceptable to me.

-- Damian
--
Damian Cugley, Alleged Literature http://www.alleged.org.uk/pdc/ |
From: Oren Ben-K. <or...@be...> - 2004-09-17 15:05:59
|
On Friday 17 September 2004 16:10, Damian Cugley wrote:
> > Quick show of hands: who is for R4, and who is for R4.1? What is
> > your use case?

Since we have a use case, let's make this less abstract. If you have a schema that implicitly types like this:

    .../pattern: !re
    { .../x, .../y, .../z }: !!int

Would you prefer this to mean this (R4):

    - pattern: [x-z]foo*       # !re
    - pattern: ! "[x-z]foo*"   # Also !re
    - pattern: "[x-z]foo*"     # !!str - invalid
    - x: 12                    # !!int
      y : ! "23"               # Also !!int
      z : "34"                 # !!str - invalid

Or this (R4.1):

    - pattern: [x-z]foo*       # !re
    - pattern: ! "[x-z]foo*"   # Also !re
    - pattern: "[x-z]foo*"     # ALSO !re
    - x: 12                    # !!int
      y : ! "23"               # Also !!int
      z : "34"                 # ALSO !!int

> I prefer R4 to R4.1.

Me too; the above examples show why. However, if "everyone" sees the above examples and still prefers R4.1... I can live with it.

Have fun,

Oren Ben-Kiki |
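The difference between the two readings in this message can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the `SCHEMA` table keyed by the last path component, the `resolve` function, and the idea of describing each scalar by two booleans (`quoted`, and `bang` for a leading "! ") instead of parsing real YAML.

```python
from typing import Tuple

# Hypothetical schema from the message: .../pattern is !re, .../x, y, z are !!int.
SCHEMA = {"pattern": "!re", "x": "!!int", "y": "!!int", "z": "!!int"}


def resolve(key: str, quoted: bool, bang: bool, rule: str) -> Tuple[str, bool]:
    """Return (tag, valid) for an otherwise untagged scalar stored under `key`.

    rule is "R4" (quoted scalars are strings) or "R4.1" (the schema may type
    quoted scalars too). A leading "! " always hands the node to the schema.
    """
    expected = SCHEMA[key]
    if rule == "R4" and quoted and not bang:
        # The quotes fix the tag to !!str, which this schema then rejects.
        return "!!str", False
    return expected, True


for rule in ("R4", "R4.1"):
    for key, quoted, bang in [("pattern", False, False),
                              ("pattern", True, True),
                              ("pattern", True, False),
                              ("z", True, False)]:
        print(rule, key, quoted, bang, resolve(key, quoted, bang, rule))
```

The only rows that differ between the two rules are the quoted scalars without a "!", which is exactly the case the survey question is about.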
From: Tim H. <tim...@co...> - 2004-09-17 16:26:31
|
Oren Ben-Kiki wrote:

>On Friday 17 September 2004 16:10, Damian Cugley wrote:
>
>>>Quick show of hands: who is for R4, and who is for R4.1? What is
>>>your use case?
>
>Since we have a use case, let's make this less abstract. If you have a
>schema that implicitly types like this:
>
>    .../pattern: !re
>    { .../x, .../y, .../z }: !!int
>
>Would you prefer this to mean this (R4):
>
>    - pattern: [x-z]foo*       # !re
>    - pattern: ! "[x-z]foo*"   # Also !re
>    - pattern: "[x-z]foo*"     # !!str - invalid
>    - x: 12                    # !!int
>      y : ! "23"               # Also !!int
>      z : "34"                 # !!str - invalid
>
>Or this (R4.1):
>
>    - pattern: [x-z]foo*       # !re
>    - pattern: ! "[x-z]foo*"   # Also !re
>    - pattern: "[x-z]foo*"     # ALSO !re
>    - x: 12                    # !!int
>      y : ! "23"               # Also !!int
>      z : "34"                 # ALSO !!int
>
>>I prefer R4 to R4.1.
>
>Me too; the above examples show why. However, if "everyone" sees the
>above examples and still prefers R4.1... I can live with it.

I prefer 4.1 to 4, and the above examples just reinforce that. Odd. I wonder if I was just dropped on my head as a child?

-tim |
From: Oren Ben-K. <or...@be...> - 2004-09-17 20:02:20
|
On Friday 17 September 2004 19:26, Tim Hochberg wrote:
> >Would you prefer this to mean this (R4):
> >    z : "34"    # !!str - invalid
> >Or this (R4.1):
> >    z : "34"    # ALSO !!int
> I prefer 4.1 to 4, and the above examples just reinforce that. Odd. I
> wonder if I was just dropped on my head as a child?

Perhaps :-)

So far we have Damian for R4, you for R4.1. I think Brian is for R4.1 as well, and that Clark is mostly neutral. I like R4 better myself, but I can live with R4.1. So far it seems R4.1 is it. Oh well...

Have fun,

Oren Ben-Kiki |
From: Tim H. <tim...@co...> - 2004-09-17 16:24:17
|
Oren Ben-Kiki wrote:

>Ah, there's nothing like home cooking in a Spanish-descent Jewish family
>on new years (and Passover). I'm stuffed as a turkey with so many
>delicious things I can't even begin to list them. And, I have a perfect
>alibi for sitting the last brain storm out :-)

Ah. So Jealous... My response to this is a bit delayed; this message got stuck somewhere and didn't arrive till after all the replies. I'm in somewhat of a time crunch, so this answer will be shorter than this probably deserves.

>To the issue. I have one main point before responding to specific
>issues. There have been so many proposals I can't keep track (remind me
>of the sald types... by I digress). I think that the last week shows
>that we can't make any progress by simply throwing proposals up in the
>air and hoping to hit some magical spot everyone will like. We can keep
>at it for months getting nowhere.
>
>IMVHO, we should focus on the requirements rather than the proposed
>solutions. We have worked out a short list of requirements:
>
>R1: Cooking is a simple, table-less syntactical-only operation.
>
>R2: A YAML schema does not transform one private/global/yaml tag to
>another.
>
>R1 + R2 got us the the original !, !! proposal. Brian then added:
>
>R3: Private tags should not have to use !!.
>
>R1+R2+R3 got us survey item #1. People weren't happy; it turned out that
>there is another requirement:
>
>R4: Quoted scalars are Unicode strings.
>
>This got us to survey item #4. #4 has a single advantage going for it:
>It is the simplest, cleanest proposal we know of that satisfies *all*
>of R1 - R4.

This is a good summary, although I'll have a nit to pick in a minute. I think it does point out the areas of disagreement nicely.

>People are still unhappy with #4. To me, this says that people have some
>yet-unphrased requirement(s) that are not being addressed. Instead of
>getting lost in a maze of twisty little proposals, all different, could
>you please put your finger on what's wrong with survey item #4 and
>phrase it as a requirement?

We disagree about the R4 requirement. I'm not sure this is the perfect embodiment of the version of R4 that lives in my head, but:

R4': A scalar's quotedness is part of its context. An application doing implicit typing will generally use that bit of context to tag quoted scalars as unicode strings. An application with a schema can use this bit of context however it wants, but there are two primary options: ignore it or treat it as string.

Taken literally, this probably leads right back to the plain scalar tag, but #4q (==#4.1, right?) is essentially the same, while papering over the wart. I think that this may be the only area of disagreement.

>Then, and only then, it would make sense to see what the simplest
>proposal would be to satisfy the new requirement R5 _in addition_ to R1
>- R4. I'm very tired of reading proposals that happily violate some
>R1/R2/R3/R4, in order to solve some implicit problem which only clear
>to the proposal's author.
>
>Tim was the only one who posted along these lines. He defined two
>properties of a YAML schema:
>
>"Property I" seems to be, basically, R1 + R2.

Yep. Maybe even just R2. Basically, Property I is just R2 as applied to implicit typing.

>"Property S" is less clear to me. Since this is the one Tim feels isn't
>provided by survey item #4, I'd be very interested in him rephrasing
>it.

Property S is just R2 as applied to non-implicit typing. The reason they ended up separate is mostly a result of a misinterpretation of something you (Oren) said. I was interpreting some of your comments on only filling in NULL values during implicit typing, to only apply to implicit typing not other schema. For this reason, property S was separated out as the new part. It now appears from R2, and a subsequent example you sent out that we actually agree on R2, which is just the two properties lumped together and thus you agree with property S whether you know it or not.

Of course the application can mangle the tags however it wants once it gets hold of them, but let's call that a transform not a schema. I don't know if that's good terminology, but it works for me.

>I'll now try to deduce what the missing requirement R5 is. I'm trying to
>reverse-engineer around 60 messages here while full of a similar number
>of different dishes, so I'll probably get it wrong... At any rate, it
>seems to be:
>
>R5(?): Quoted scalars should not have a "concrete" type.

Sure. That sounds good. Note that it conflicts with R4 as you've phrased it, but not with my modified R4'.

>On the off chance that this is a fair description of what bothers
>people, before we move on to try to create a #143 proposal, I'd like to
>try to reconcile this with survey item #4. I'll make three points:
>
>1. When people write "123a", they expect it to be a "characters string".
>If someone here feels that it EVER makes sense for "123a" to be loaded
>into anything else than a "characters string", please speak up. AFAIR
>nobody ever suggested even one example saying: well, in this and that
>case, it makes more sense for "123q" to mean something other than a
>string, so using tag:yaml.org,2002:str for it prevents me from doing
>what I want.

Anything that might require quotes for escape purposes, but that I want to interpret as something else based on a schema. I believe that the regex example already came up. I'm sure I could come up with others if necessary. No time right now though.

The application can usually still work, it just can't use a simple schema; it has to break out the big guns and use a transform.

>Note that YAML's design goal #1 is being human readable. So, show me an
>example where it is more readable to have "123q" be anything than a
>simple string. I don't care much for saying "sure, its always best for
>it to be a string, but I want my options open". YAGNI. Show me ONE
>plausible use case, first.
>
>2. Now, let's look at R4 again. It says: quoted scalars are Unicode
>strings. Now, a Unicode string is a concrete type. It carries with it a
>certain semantics baggage. Saying quoted strings should not have a
>"concrete" type seems to directly contradict R4.

Yes.

>So, it may be that people who want R5 don't really want R4. They want:
>
>R4.1: quoted scalars mean something else than plain scalars.

Ooops. I guess I should read the whole message before replying. Yes, R4.1 is equivalent to R4'.

>Quick show of hands: who is for R4, and who is for R4.1? What is your
>use case?

I'm for 4.1. Use case later if required.

>3. Yes, tag:yaml.org,2002:str is a concrete *TAG*. It is NOT a concrete
>native data TYPE. Each application is free to load strings into
>whatever native data type (or, sa Brian calls it, "cave drawing") it
>wants. Again: we are talking about the INTENT of whoever is writing the
>document, NOT the details of how this intent is processed by a given
>system. Operationally. you may run atoi on your strings in your
>application, and divide each integer node by its indentation level -
>FINE. But a "YAML schema" (INTENT of a YAML document) can't say that
>'!!str 23' should be put through atoi, and that '!!int 23' nodes should
>be divided by the indentation level.
>
>So much for defending survey item #4. Now for some "previously prepared
>fallback positions" :-)
>
>Suppose that "everyone" votes for R4.1 instead of R4. This inevitably
>brings us to the "plain flag". Nodes without a tag have no tag (== each
>YAML schema is free to specify a different INTENT for such nodes). But,
>these nodes know whether they were plain or not (== each YAML schema is
>allowed to assign a different INTENT to a node with " than to anode
>without a ").
>
>The simplest thing to do in this case is something along the lines of
>survey item #3 (and about a dozen mutations thereof). Define *two* "I
>have no tag" values, one for non-plain scalars, one for all other
>untagged nodes. These two values can be called QNULL and NULL, % and @,
>or ~foo and ++bar!@# - that's a secondary, minor issue, which can be
>settled very quickly if we agree to adopt R4.1 instead of R4.
>Personally I'll go with " and ! (think about it).

I contend that #4q/#11 are marginally cleaner than #3 since in the former quoted scalars are the odd-type-out, while in the latter plain scalars are. This has the consequence that "! []" and "[]" are the same under #4q/#11 by construction, but they differ under #3 which is a little messy if you think about it. I concede that it's a minor point.

>Either way, problem solved. So, bottom line, please answer the following
>quick survey (one bit answers only :-):
>- Are you for R4 or R4.1 + R5?

R4.1 + R5

>- Do you have an example of why R4.1 is useful in any way?

Yes.

>- Do you have an R6 in mind?

No

-tim |
From: Brian I. <in...@tt...> - 2004-09-18 18:12:05
|
On 17/09/04 18:05 +0300, Oren Ben-Kiki wrote:
> On Friday 17 September 2004 16:10, Damian Cugley wrote:
> > > Quick show of hands: who is for R4, and who is for R4.1? What is
> > > your use case?
>
> Since we have a use case, let's make this less abstract. If you have a
> schema that implicitly types like this:
>
>     .../pattern: !re
>     { .../x, .../y, .../z }: !!int
>
> Would you prefer this to mean this (R4):
>
>     - pattern: [x-z]foo*       # !re

My mail server went to Mars for two days, so there may be more mail trickling in about this, but I'd like to point out that the above example is invalid: you can't start a plain scalar with '['.

And this is the key to why this example is important. Quotes are *needed* as an escape mechanism. Can't do it without them. But the application requires that this be an !re. We *need* implicit typing.

Now you could say "just use the ! to get implicit typing", but is that really fair to the application's owner? It depends. If there are just one or two overridings that need to happen for every 1000 values, then it is fair. But put on another pair of glasses. If the application's main data type is !re, then they will have to use a ! every time that quoting is needed for escaping reasons. And that is unfair, IMO.

...

I liked what Onoma said about passing exactly the information that is needed. It seems for scalars we need:

- tag (or absence of tag)
- quoted?
- content
- path (implicitly by nature)

For collections we need:

- tag (or absence of tag)
- content
- path (implicit by nature)

We also need "kind" by nature; to know what, er, kind of node is being reported. Know what I mean?

So it's looking to me like we need a "plain scalar flag" after all. Which leaves us with R4.1. It has been shown that the flag can be passed as a special variation of the tag. I don't think that "kind" should be smushed into the tag, since kind wasn't a wart in the first place.

Cheers, Brian |
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-21 09:27:15
|
I suppose the minimal response to my last post was due to my overly verbose explanation on an aged thread. So here's a brief summary. Please see my previous post if you want the detailed reasoning behind these points. R5: Provide a clean solution, just allow some flexibility to _reasonably_ dirty it up a bit if want/need. R6: All nodes have a transfer tag. R7: Pass only the information that is necessary, no more, no less. These stipulations lead to the following solution: The default behavior, is a la #1. --- - 23 # tag:yaml.org,2002:str, ? - "23" # tag:yaml.org,2002:str, ? - [] # tag:yaml.org,2002:seq, ? - {} # tag:yaml.org,2002:map, ? All the expected types ally to nodes that are not tagged. Notice two pieces of information are being passed --an implicit flag goes with the tag. The flag is _not an actual question mark_, but simply a boolen flag meaning IMPLICIT ON. Now we can make any of these concrete by adding a !. --- - ! "23" # tag:yaml.org,2002:str, ! IMPLICT OFF. To get #4 behavior add TAG line: %TAG QSTR ! --- - 23 # tag:yaml.org,2002:str, ? - "23" # tag:yaml.org,2002:str, ! - [] # tag:yaml.org,2002:seq, ? - {} # tag:yaml.org,2002:map, ? TAG keywords are STR, QSTR, MAP and SEQ. These are borrowings from #2, but are limited to use with the TAG directive itself. These special TAG directives may be limited to just changing explicit vs. implicit, as shown here, if that is thought best --though I don't think it would be a problem to allow them full TAG flexability. The KEY POINT and important difference between what we have now and this solution is that the implicit flag isn't just a plain-scalar flag. Rather it applies to any node, and says to the resolver to try implicit typing, failing that, use the supplied type. T. P.S.S. 
Just yesterday I had a further notion: If the two most common behaviors by far and away will be #1 and #4, which I expect to be the case, and the '%YAML 1.1' directive is to become commonplace too, then the above TAG line might be further briefly implied by an optional argument on the YAML directive itself, like '%YAML 1.1 -q' --albeit that may be too brief. |
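The resolver rule stated in the message above ("try implicit typing, failing that, use the supplied type") might look roughly like the following. This is a minimal sketch, not the proposal's actual mechanics: the function names, the `qstr_off` parameter standing in for the effect of the `%TAG QSTR !` directive, and the single integer rule are all assumptions made for illustration.

```python
import re

STR = "tag:yaml.org,2002:str"
INT = "tag:yaml.org,2002:int"

# Hypothetical implicit typing table: one rule that recognises decimal integers.
IMPLICIT_RULES = [(re.compile(r"[-+]?[0-9]+"), INT)]


def implicit_flag(quoted, bang, qstr_off=False):
    """Flag the parser would pass along with the tag (True means IMPLICIT ON).

    bang     -- the node was written with a bare '!'
    qstr_off -- assumed effect of '%TAG QSTR !': quoted scalars become explicit
    """
    if bang:
        return False
    if quoted and qstr_off:
        return False
    return True


def resolve(content, tag, implicit):
    """Try implicit typing when the flag is on; otherwise use the supplied tag."""
    if implicit:
        for pattern, candidate in IMPLICIT_RULES:
            if pattern.fullmatch(content):
                return candidate
    return tag


# Default (#1-like): 23 and "23" resolve the same way.
plain = resolve("23", STR, implicit_flag(quoted=False, bang=False))
quoted_default = resolve("23", STR, implicit_flag(quoted=True, bang=False))
# With the directive (#4-like): "23" stays a string.
quoted_qstr = resolve("23", STR, implicit_flag(quoted=True, bang=False, qstr_off=True))
```

Under these assumptions the directive changes only the flag the parser reports; the resolver itself is the same in both modes.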
From: Oren Ben-K. <or...@be...> - 2004-09-21 19:22:07
|
On Tuesday 21 September 2004 12:27, trans. (T. Onoma) wrote: > These stipulations lead to the following solution: The default > behavior, is a la #1. > ... > To get #4 behavior add TAG line: > > %TAG QSTR ! This is unacceptable to many people. R4(.1) requires that by default there will be a difference between 23 and "23", without the need for directives. I appreciate the cleanliness of #1, but it just isn't going to happen. Have fun, Oren Ben-Kiki |
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-21 19:41:59
|
On Tuesday 21 September 2004 03:21 pm, Oren Ben-Kiki wrote: > On Tuesday 21 September 2004 12:27, trans. (T. Onoma) wrote: > > These stipulations lead to the following solution: The default > > behavior, is a la #1. > > ... > > To get #4 behavior add TAG line: > > > > %TAG QSTR ! > > This is unacceptable to many people. Are you speaking for many people? > R4(.1) requires that by default > there will be a difference between 23 and "23", without the need for > directives. > > I appreciate the cleanliness of #1, but it just isn't going to happen. And the rest of it? T. |
From: Clark C. E. <cc...@cl...> - 2004-09-21 20:01:00
|
On Tue, Sep 21, 2004 at 03:41:52PM -0400, trans. (T. Onoma) wrote: | On Tuesday 21 September 2004 03:21 pm, Oren Ben-Kiki wrote: | > On Tuesday 21 September 2004 12:27, trans. (T. Onoma) wrote: | > > These stipulations lead to the following solution: The default | > > behavior, is a la #1. | > > ... | > > To get #4 behavior add TAG line: | > > | > > %TAG QSTR ! | > | > This is unacceptable to many people. | | Are you speaking for many people? Oren and I prefer #1; however, enough people spoke up on the list to keep the distinction between 23 and '23'. Some decisions have to be made and be final. This is one of them. | > R4(.1) requires that by default | > there will be a difference between 23 and "23", without the need for | > directives. | > | > I appreciate the cleanliness of #1, but it just isn't going to happen. | | And the rest of it? Just because the parser will report ? and ! doesn't mean your applications have to treat them the same. See Tim's post about how to configure your parser; I'm sure Syck has, or will have, a similar mechanism. But we've got closure on this issue. It may not be the perfect solution, but given what we know at this time, it is a local maximum, and it's important we continue with implementation. Cheers! Clark P.S. Speaking of implementations, the CVS version of the parser I'm writing now accepts multi-line plain scalars per the specification. In a week I'll probably add collections. |
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-21 20:32:56
|
On Tuesday 21 September 2004 04:00 pm, you wrote:
> Oren and I prefer #1, however, enough people spoke up on the list
> to keep the distinction between 23 and '23'. Some decisions have
> to be made and be final. This is one of them.

I know; if you recall, I was one of those people. I felt the idea of #1 was right, but was against it only because I didn't have a simple way back if I wanted it. A simple TAG directive solves that.

> | > R4(.1) requires that by default
> | > there will be a difference between 23 and "23", without the need for
> | > directives.
> | >
> | > I appreciate the cleanliness of #1, but it just isn't going to happen.
> |
> | And the rest of it?
>
> Just because the parser will report ? and ! doesn't mean your
> applications have to treat them the same. See Tim's post about
> how to configure your parser, I'm sure Syck has, or will have
> a similar mechanism.

Well, that's not my point at all. My point was that the untagged nodes are the YAML types but with an implicit indication, not separate types. The current standing proposal makes them different and mixes in kind. It's fairly ugly and very redundant. Consider the information being passed from the parser:

  type, kind
  ----------
  ?, scalar
  !, scalar
  !, sequence
  !, mapping
  tag:yaml.org,2002:str, scalar
  tag:yaml.org,2002:map, mapping
  tag:yaml.org,2002:seq, sequence
  !mystr, scalar
  !mymap, mapping
  !myseq, sequence

Compared to what I am proposing:

  type, implicit-flag
  --------------------------
  tag:yaml.org,2002:str, ?
  tag:yaml.org,2002:map, ?
  tag:yaml.org,2002:seq, ?
  tag:yaml.org,2002:str, !
  tag:yaml.org,2002:map, !
  tag:yaml.org,2002:seq, !
  !mystr, !
  !mymap, !
  !myseq, !

> But, we've got closure on this issue. It may not be the perfect
> solution; but given what we know at this time, it is a local
> maximum, and it's important we continue with implementation.

Sure.

T. |
From: Clark C. E. <cc...@cl...> - 2004-09-22 16:31:28
|
(very late response, sorry) On Tue, Sep 21, 2004 at 04:32:48PM -0400, trans. (T. Onoma) wrote: | Well, that's not my point at all. My point was that the untagged | nodes are the YAML types but with an implicit indication, not | separate types. The current standing proposal makes them different | and mixes in kind. It's fairly ugly and very redundant. Yes, and this is by design. The January specification implies exactly this 'implicit-flag' and getting rid of it was my express goal. ... This ugly flag-wart was forcing itself into the signature of my otherwise pretty API and I didn't like it. In particular, the idea of the flag is that it should only be used during tag resolution, but, as soon as the flag is there, people naturally want to use it _after_ tag resolution. For all practical purposes this makes the real 'tag' a (tag, plain-scalar) pair and this is not a thought I want to entertain. The standing agreement lets the parser 'bake' the flag into the tag as one of two special values. The nice thing about this proposal is that after tag resolution, you just have a single tag... no flag. So the physical translation to an API better matches the intent of the construct. Admittedly, I don't like the idea of needing the 'kind' to differentiate between '!' and '!', however, the kind is already in the model. Further, an implementation could arguably do !str, !seq and !map without violating this proposal, but alas, this is why we need implementation feedback. I hope this explains the rationale. There is more than one way to solve this problem, the current agreement may work perfectly or it may be not-quite-right. In either case, the final decision will happen after implementers have reported back their experiences. Cheers, Clark |
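The "baked" arrangement Clark describes, where the parser reports `?` or `!` in place of a tag and resolution then leaves a single tag with no flag, could be sketched as follows. The function names and the one integer rule are illustrative assumptions; the thread does not define a concrete API, and real implicit typing would cover more types.

```python
import re

STR = "tag:yaml.org,2002:str"
SEQ = "tag:yaml.org,2002:seq"
MAP = "tag:yaml.org,2002:map"
INT = "tag:yaml.org,2002:int"

# The node kind selects the default type when no explicit tag was given.
DEFAULT_TAG = {"scalar": STR, "sequence": SEQ, "mapping": MAP}


def implicit_scalar_tag(content):
    """Hypothetical implicit typing: recognise decimal integers only."""
    return INT if re.fullmatch(r"[-+]?[0-9]+", content) else None


def resolve(reported_tag, kind, content=None):
    """Turn the parser's report ('?', '!' or an explicit tag) into one final tag."""
    if reported_tag == "?" and kind == "scalar":
        # Plain scalar: implicit typing is allowed, falling back to str.
        return implicit_scalar_tag(content) or STR
    if reported_tag in ("?", "!"):
        # No implicit typing: the kind alone decides the tag.
        return DEFAULT_TAG[kind]
    return reported_tag  # an explicit tag is passed through unchanged


plain_23 = resolve("?", "scalar", "23")
quoted_23 = resolve("!", "scalar", "23")
```

Note that `kind` is needed to tell one `!` from another, which is the dependency Clark says he dislikes but accepts because the kind is already in the model.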
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-21 23:40:12
|
On Tuesday 21 September 2004 04:00 pm, Clark C. Evans wrote: > Oren and I prefer #1, however, enough people spoke up on the list > to keep the distinction between 23 and '23'. Some decisions have > to be made and be final. This is one of them. Oh, and BTW, with this proposal, you can still make the default behavior #4, and use the TAG directive to get #1 instead. (Although I think that backwards, nonetheless...) T. -- ( o _ カラチ // trans. / \ tra...@ru... I don't give a damn for a man that can only spell a word one way. -Mark Twain |
From: Tim H. <tim...@co...> - 2004-09-22 00:28:04
|
trans. (T. Onoma) wrote: >On Tuesday 21 September 2004 04:00 pm, Clark C. Evans wrote: > > >>Oren and I prefer #1, however, enough people spoke up on the list >>to keep the distinction between 23 and '23'. Some decisions have >>to be made and be final. This is one of them. >> >> > >Oh, and BTW, with this proposal, you can still make the default behavior #4, >and use the TAG directive to get #1 instead. (Although I think that >backwards, nonetheless...) > > Combo schemes seem loopy to me. #1 is nice because it has a simple model. #4q and #3 are nice because they do everything that people want done. If YAML supports #3/4q, it's trivial at the application level to treat SCALAR_NULL and QUOTED_NULL the same, so applications that want #1 behaviour can get it easily. For this reason, the sole advantage for #1 is its simpler model, which makes it simple to explain and reason about. Don't get me wrong, that's a real advantage, but it doesn't extend to significantly easier coding at the application level. A combo scheme that switches between, for example, #1 and #4q behaviour based on some directive is more complex than either. This is true both at a theoretical and at a practical level, since some set of users is saddled with adding more tags to their YAML documents or else they mysteriously fail (and think for a minute about the kind of failures this would cause). Therefore, I believe that the various combo schemes are worse /in every way/ than just #4q or #3 by themselves. -tim [And I still believe that 4q is marginally better than 3, but that's a different debate] |
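Tim's claim that an application can recover #1 behaviour from a #3/#4q parser with almost no code can be illustrated with a one-function sketch. It assumes that SCALAR_NULL and QUOTED_NULL are the placeholder tags reported for untagged plain and untagged quoted scalars respectively; that reading, the string values, and the function name are assumptions for illustration only.

```python
# Placeholder values for the two "no tag given" reports named in the message.
SCALAR_NULL = "SCALAR_NULL"   # assumed: untagged plain scalar
QUOTED_NULL = "QUOTED_NULL"   # assumed: untagged quoted scalar


def effective_tag(reported_tag, want_survey_1=False):
    """Collapse the quoted/plain distinction when the application wants #1."""
    if want_survey_1 and reported_tag == QUOTED_NULL:
        return SCALAR_NULL
    return reported_tag


# A #4q-style application keeps the distinction; a #1-style one drops it.
kept = effective_tag(QUOTED_NULL)
collapsed = effective_tag(QUOTED_NULL, want_survey_1=True)
```

The reverse direction is not available: a parser that only ever reported one value could not give an application the distinction back, which is the asymmetry behind the argument.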
From: Brian I. <in...@tt...> - 2004-09-22 04:46:41
|
On 21/09/04 16:00 -0400, Clark C. Evans wrote: > On Tue, Sep 21, 2004 at 03:41:52PM -0400, trans. (T. Onoma) wrote: > | On Tuesday 21 September 2004 03:21 pm, Oren Ben-Kiki wrote: > | > On Tuesday 21 September 2004 12:27, trans. (T. Onoma) wrote: > | > > These stipulations lead to the following solution: The default > | > > behavior, is a la #1. > | > > ... > | > > To get #4 behavior add TAG line: > | > > > | > > %TAG QSTR ! > | > > | > This is unacceptable to many people. > | > | Are you speaking for many people? > > Oren and I prefer #1, however, enough people spoke up on the list > to keep the distinction between 23 and '23'. Some decisions have > to be made and be final. This is one of them. > > | > R4(.1) requires that by default > | > there will be a difference between 23 and "23", without the need for > | > directives. > | > > | > I appreciate the cleanliness of #1, but it just isn't going to happen. > | > | And the rest of it? > > Just because the parser will report ? and ! doesn't mean your > applications have to treat them the same. See Tim's post about > how to configure your parser, I'm sure Syck has, or will have > a similar mechanism. > > But, we've got closure on this issue. It may not be the perfect > solution; but given what we know at this time, it is a local > maximum, and it's important we continue with implementation. +1 This issue is not closed for all time. But it is closed for now. Further hair splitting will not help us get implementations sooner. Cheers, Brian |