From: Oren Ben-K. <or...@be...> - 2004-09-10 23:54:00
|
Clark made the following observation:

- Given we allow to type a node by context (== path), typing it by its
  content alone becomes rare. Practically nonexistent in fact.

This makes it possible to reconsider the usefulness of distinguishing
between "23" and 23. Take the following example:

    --- !scatter-graph
    - x : 1        # looks like an int
      y: 4.5
      label: Hi
      point-size-in-pixels: 2
    - x : 12.4
      y: 17.0
      label: 8.5   # looks like a float
      point-size-in-pixels: 4
    ...

The schema (== application == intent of the author == etc.) says that
'x' and 'y' are floats, a label is a string, and point-size-in-pixels
is an integer. Note this solves the "looks like an int/float problem".
The schema says that 'x' is always a float, so it doesn't care if it
matches the int regexp (and likewise for the label).

*This is the typical case*.

Sure, there may be a case where someone will write a mixed list where
entries may be ints, floats or strings, without any way of knowing
which is which without looking inside each entry - but this is SO rare:

    --- !mixed-list
    - 1          # An int
    - !!float 2  # Looks like an int...
    - 2.0        # OK, a float
    - 2.5        # A float
    - Hi         # A string
    - "8.5"      # Looks like a float,
                 # but quoting saves the day
    ...

With this in mind, suppose we remove all distinction between plain and
non-plain scalars. That's certainly as consistent as it gets. What are
the implications?

The first example stays the same! The schema uses the path _anyway_. It
_has_ to, in order to allow 'x : 1' to be valid. In this schema, the
regexp for each type is used for _validation_ (if at all) rather than
detection... in practical terms, it might be just calling atof(). So
the regexp for string matches all floats, and the regexp for floats
matches all ints, and *all is well* because it's the _path_ that is
used to determine the type.

The mixed list, however, suffers a bit. Surprisingly little,
actually... In this case, regexps _are_ used for detection, so any case
of "wrong auto-detection" needs to be tagged. And under proposal #4,
quoting _doesn't_ save the day:

    --- !mixed-list
    - 1            # An int
    - !!float 2    # Oops, looks like an int.
    - 2.0          # OK, a float
    - 2.5          # Also a float
    - Hi           # A string
    - !!str "8.5"  # Oops, looks like a float,
                   # and quoting doesn't help...
    ...

My additional observation (leading to proposal #4) is that if '?str' and
'?var' are unified as above, it is no longer useful to have three
different unspecified tags. The main sense of having '?' tags was
distinguishing between '?str' and '?var'. After all, the parser reports
the kind of the node _anyway_ (it has to), so it is quite sufficient to
have just one "I have no tag" indication.

What did we get?

- If a node has no tag, the parser reports it has no tag.

- Each node's style, indentation, comments, etc. MAY be discarded by the
  parser. Or it might not. However...

- The native data type for nodes with a tag is determined by the tag and
  only by the tag. This is a restriction on "what is a YAML schema". An
  application may cheat, but then it wouldn't be using a YAML schema.

- The native data type for nodes without a tag is determined by the
  node's context (path to the node) and its content, and only by the
  context and content. Again, this is a restriction on "what is a YAML
  schema".

  It is expected that the vast majority of the schemas will use only the
  context ( { x : ... => float, y : ... => float, label : ... => string).
  "Mixed" data structures are very rare. It is OK to use "content" in
  such cases (e.g., a list of shapes where { x : ..., y : ..., r : ... }
  => circle, { x : ..., y : ..., w : ..., h : ... } => rectangle).

- Every native data type must have a tag associated with it. Another
  restriction on "what is a YAML schema".

- Hence, there MUST be a (possibly implicit) step that (possibly
  implicitly) provides a tag for each node having no tag. Let's call this
  step "(automatic) tag completion".

- The '!' and '!!' tag-specifiers are now valid. There's no longer a
  restriction that the "suffix" of a %TAG be non-empty.

- !!str, !!map and !!seq move out of the spec.

What did we gain (compared to #3++)?

- No special cases. Hence:

- No pesky issues about why style bleeds into the info model. It
  doesn't.

- No pesky issues about how %TAG relates to a simple ! and !!. They are
  treated as usual.

- No pesky issues about transformations. There aren't any.

Why? Because automatically computing a value that wasn't explicitly
specified is completely different from taking a value that _was_
explicitly given and _replacing_ it by another value.

Sure, some people might say it is still equivalent in some deep
philosophical level - let them, I think the above is good enough an
answer and if they don't buy it, well, tough, I don't have another
answer for them :-)

What did we lose (compared to #3++)?

- The ability to use quoting as a way of enforcing a scalar to be
  interpreted as a string in a mixed-list. Which NEVER happens in any
  config file or data serialization file I can think of (off the top of
  my head). I guess there's some guy in China that actually uses
  mixed-list this way... fine, he'll have to use !!str.

All in all, I think this is more than a fair trade. We trade all the
downsides that were so deeply discussed with just one small one that is
never (well, hardly ever :-) seen in practice.

Thoughts?

Have fun,

    Oren Ben-Kiki
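The difference between typing by context and typing by content, as described in the message above, can be sketched roughly as follows. This is only a minimal illustration and not any parser's actual API: `PATH_SCHEMA`, `type_by_path`, `type_by_content` and the two regexps are hypothetical names invented for the example, and scalars are assumed to arrive as plain strings with an optional tag.

```python
import re

# Hypothetical schema: the key (path) alone decides the native type.
PATH_SCHEMA = {
    "x": float,
    "y": float,
    "label": str,
    "point-size-in-pixels": int,
}

def type_by_path(key, text):
    """Context typing: the content is only validated, never used for detection."""
    return PATH_SCHEMA[key](text)  # raises ValueError on invalid content

# Simplified regexps, assumed for illustration only.
INT_RE = re.compile(r"[-+]?\d+$")
FLOAT_RE = re.compile(r"[-+]?\d+\.\d*$")

def type_by_content(text, tag=None):
    """Content typing for a mixed list: an explicit tag wins, else regexps detect."""
    if tag == "!!float":
        return float(text)
    if tag == "!!str":
        return text
    if INT_RE.match(text):
        return int(text)
    if FLOAT_RE.match(text):
        return float(text)
    return text

# A scatter-graph point: 'x: 1' becomes a float, 'label: 8.5' stays a string.
point = {
    key: type_by_path(key, text)
    for key, text in [("x", "1"), ("y", "4.5"), ("label", "8.5"),
                      ("point-size-in-pixels", "2")]
}

# The mixed list under proposal #4: wrong auto-detection has to be tagged.
mixed = [
    type_by_content("1"),
    type_by_content("2", "!!float"),
    type_by_content("2.0"),
    type_by_content("Hi"),
    type_by_content("8.5", "!!str"),
]
```

In the path-driven case no tags are needed at all; only the mixed list needs `!!float` and `!!str` to override detection.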
From: Clark C. E. <cc...@cl...> - 2004-09-11 01:20:30
|
+1 This is very nice. In summary:

  - style is solely 'presentation' detail
  - the missing tag is reported as NULL

I was happy to go with the previous proposal (using ?str, etc.);
however, T.Onoma and Sean do have a good point -- the plain scalar wart
is irritating, so perhaps it should go. However, not in the direction
of putting 'style' into the representation model.

On Sat, Sep 11, 2004 at 02:53:55AM +0300, Oren Ben-Kiki wrote:
| - Given we allow to type a node by context (== path), typing it by its
| content alone becomes rare. Practically nonexistent in fact.

I've written four major applications in the last three years, all of
them using YAML. None of them use implicits, nor the distinction
between "2" and 2. All of them have their own custom 'schema' where I
validate the node's content, cast it to the native type I want, and
often add missing/implicit values.

Funny thing is, I was the one arguing _for_ plain scalar flags a few
years back. Amazing how a bit of real-life experience will change your
opinion. As I remember, Brian sees everything as a Perl scalar, where
the distinction between integers and strings is more how you use them
than anything intrinsic.

| This makes it possible to reconsider the usefulness of distinguishing
| between "23" and 23. Take the following example

Nice examples. If you have counterexamples, in a real-life context
where you can't live without a plain scalar flag, it'd be great to see
them.

| My additional observation (leading to proposal #4) is that if '?str' and
| '?var' are unified as above, it is no longer useful to have three
| different unspecified tags. The main sense of having '?' tags was
| distinguishing between '?str' and '?var'. After all, the parser reports
| the kind of the node _anyway_ (it has to), so it is quite sufficient to
| have just one "I have no tag" indication.

Ok. I concede that ?tags aren't needed in this case; a NULL tag does
work if you only have 3 values. So, this makes one change: the tag in
the graph representation model is optional. Ok. (See below before you
start commenting, Oren.)

| - If a node has no tag, the parser reports it has no tag.

Right.

| - Each node's style, indentation, comments, etc. MAY be discarded by the
| parser. Or it might not. However...

It is presentation detail; parsers should be encouraged to report this
extra detail in a 'presentation' shadow or such thing; and an emitter
should take such a presentation shadow. But in reality, round-tripping
is over-rated. What we need are really good formatters that configure
a presenter based on the content of an incoming data set. We have lots
of good presentation styles; it'd be nice if programmers had a clean
configuration file format to use them. In other words, a very simple
pretty-printer 'hint' file would look like:

    ---
    default:
      scalar: plain
      indent: 3
      margin: 65
    styles:
      some/code: block
      almost/always/illegal/characters: quoted
      some/paragraph: fold
    ...

Just because style is not in the representation model does not mean
it's removed from the presentation model...

| (oren goes on about constraints and stuff)

It's simple:

  - The YAML Processor reports what's in the Graph Representation
    Model, optionally, with stuff from the Presentation Model in a
    clearly distinguished way.

  - The Application does whatever it wishes; if it is a Good, YAML
    Compliant program, it ignores the Presentation and Serialization
    stuff that isn't in the Graph Representation.

  - It can load the data into Native objects, process records in a
    streaming manner using a sequential API, or whatever it wishes.

That's it.

| - The '!' and '!!' tag-specifiers are now valid. There's no longer a
| restriction that the "suffix" of a %TAG be non-empty.
|
| - !!str, !!map and !!seq move out of the spec.
|
| - No pesky issues about why style bleeds into the info
| model. It doesn't.
|
| - No pesky issues about how %TAG relates to a simple ! and !!.
| They are treated as usual.
|
| - No pesky issues about transformations. There aren't any.

Or rather, they are all in the Application's space, after the Graph
Representation Model has been delivered to the Application.

| What did we lose (compared to #3++)?
|
| - The ability to use quoting as a way of enforcing a scalar to be
| interpreted as a string in a mixed-list. Which NEVER happens in any
| config file or data serialization file I can think of (off the top of
| my head). I guess there's some guy in China that actually uses
| mixed-list this way... fine, he'll have to use !!str.

Frankly, this 'feature' can actually be confusing and reduce
human-friendly behavior. While an experienced programmer may want to
have a difference between "23" and 23, your average user doesn't. One
of my four applications is a timesheet system (still in use) where
people at my workplace fill out timecards using YAML and email them
every week. The hardest thing to explain:

    - this is a string
    - 23        # is an integer
    - "32"      # is not an integer, it's a string
    - 23.0      # is also not an integer, it is a float
    - 2004      # is an integer, not a date
    - 34 hours  # isn't an integer?

In the end, I made everything strings and wrote my own custom
validator. In every case I used the position of the node in the
document to determine what I was expecting. This implicit stuff didn't
help at all, and the 23 vs "23" actually caused problems.

| All in all, I think this is more than a fair trade. We trade all the
| downsides that were so deeply discussed with just one small one that is
| never (well, hardly ever :-) seen in practice.

Absolutely.

Now, on to your other items that I moved...

| - The native data type for nodes with a tag is determined by the tag and
| only by the tag. This is a restriction on "what is a YAML schema". An
| application may cheat, but then it wouldn't be using a YAML schema.
|
| - The native data type for nodes without a tag is determined by the
| node's context (path to the node) and its content, and only by the
| context and content. Again, this is a restriction on "what is a YAML
| schema".
|
| It is expected that the vast majority of the schemas will use only the
| context ( { x : ... => float, y : ... => float, label : ... => string).
| "Mixed" data structures are very rare. It is OK to use "content" in
| such cases (e.g., a list of shapes where { x : ..., y : ..., r : ... }
| => circle, { x : ..., y : ..., w : ..., h : ... } => rectangle).
|
| - Every native data type must have a tag associated with it. Another
| restriction on "what is a YAML schema".
|
| - Hence, there MUST be a (possibly implicit) step that (possibly
| implicitly) provides a tag for each node having no tag. Let's call this
| step "(automatic) tag completion".

These steps are good if the goal is to create a 'faithful',
almost-but-not-quite 1-1 correspondence between the input YAML and a
Native Binding. This seems to be a nice foundation for a simple schema
language. However, for the average user, this stuff isn't interesting.

For example, the 'validation' stage of my programs (a timesheet program
for employees) adds nodes, fixes up content ("12 and a half hours"
becomes 12.5), etc. When a version change happens I even... god
forbid... restructure the graph. In all four programs I've written,
the limited process you are talking about above is, well, too limited.

| Why? because automatically computing a value that wasn't explicitly
| specified is completely different from taking a value that _was_
| explicitly given and _replacing_ it by another value.
|
| Sure, some people might say it is still equivalent in some deep
| philosophical level - let them, I think the above is good enough an
| answer and if they don't buy it, well, tough, I don't have another
| answer for them :-)

The only thing that is important for interoperability is that
applications respect the information model (Graph Representation) and
don't use Presentation or Serialization attributes to drive processing.
That's all. Most applications will mix filling in the tags with
validation or whatnot. All of this detail undermines this fundamental
point... of not mixing. I don't care if they want to load into a Native
structure, or if they want to use cave drawings, or if they think that
all of their data is to be transformed to web pages.

Your insistence on keeping this half-baked 'resolution' phase is
counter-productive. Luckily, this proposal you've put forth doesn't
depend on it. ;)

Cheers,

Clark

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557

Prometheus Research: Transforming Data Into Knowledge
  - Research Exchange Database
  - Survey & Assessment Technologies
  - Software Tools for Researchers
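The pretty-printer 'hint' file idea from the message above could be consumed along these lines. This is a rough sketch only: `HINTS` and `style_for` are hypothetical names, and the nesting of the keys (scalar, indent and margin under `default`, per-path entries under `styles`) is an assumption about how the example file was laid out.

```python
# Hypothetical in-memory form of the hint file; the key nesting is assumed.
HINTS = {
    "default": {"scalar": "plain", "indent": 3, "margin": 65},
    "styles": {
        "some/code": "block",
        "some/paragraph": "fold",
    },
}

def style_for(path, hints=HINTS):
    """Return the scalar style for a node path, falling back to the default."""
    return hints["styles"].get(path, hints["default"]["scalar"])
```

For instance, `style_for("some/code")` picks the block style, while any path without a hint falls back to the default scalar style. The style stays a presentation matter, chosen by path, and never reaches the representation model.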
From: David H. <dav...@bl...> - 2004-09-11 01:31:32
|
Suggestion: in addition to #4, let's have a really easy way to specify
that a scalar is a string (which was after all the main purpose of
distinguishing quoted from plain scalars).

Allow '$' as a tag-specifier that always means "tag:yaml.org,2002:str".
This is an incompatible change because it implies that '$' is no longer
allowed as the first character of a plain scalar. Since we are breaking
compatibility anyway with YAML 1.1, now is the time to do things like
this.

> --- !mixed-list
> - 1 # An int
> - !!float 2 # Looks like an int...
> - 2.0 # OK, a float
> - 2.5 # A float
> - Hi # A string
  - $ 8.5 # A string

(If the value was intended to be a monetary amount in dollars, you
don't want it to be implicitly typed as a float anyway, and you can
always write '$...'.)

--
David Hopwood <dav...@bl...>
From: Tim H. <tim...@co...> - 2004-09-11 01:33:54
|
Oren Ben-Kiki wrote:

[Chop]

>My additional observation (leading to proposal #4) is that if '?str' and
>'?var' are unified as above, it is no longer useful to have three
>different unspecified tags. The main sense of having '?' tags was
>distinguishing between '?str' and '?var'. After all, the parser reports
>the kind of the node _anyway_ (it has to), so it is quite sufficient to
>have just one "I have no tag" indication.
>
>What did we get?
>
>- If a node has no tag, the parser reports it has no tag.
>
>- Each node's style, indentation, comments, etc. MAY be discarded by the
>parser. Or it might not. However...
>
>- The native data type for nodes with a tag is determined by the tag and
>only by the tag. This is a restriction on "what is a YAML schema". An
>application may cheat, but then it wouldn't be using a YAML schema.
>

Who can argue with that?

>- The native data type for nodes without a tag is determined by the
>node's context (path to the node) and its content, and only by the
>context and content. Again, this is a restriction on "what is a YAML
>schema".
>

I'm still of the opinion (whee, an opinion a whole 5 hours old) that
*if* the ?str/?var distinction goes away, the typing based on matching
regexes to values should be tossed. The usefulness goes down and the
weirdness goes up ("2.0"->float?!) to the point that I think we'd be
better off without. This gets even worse when you consider !null and
!bool values, although it'd be no great loss to not implicitly type on
boolean values.

>It is expected that the vast majority of the schemas will use only the
>context ( { x : ... => float, y : ... => float, label : ... => string).
>"Mixed" data structures are very rare. It is OK to use "content" in
>such cases (e.g., a list of shapes where { x : ..., y : ..., r : ... }
>=> circle, { x : ..., y : ..., w : ..., h : ... } => rectangle).
>

OK.

>- Every native data type must have a tag associated with it. Another
>restriction on "what is a YAML schema".
>
>- Hence, there MUST be a (possibly implicit) step that (possibly
>implicitly) provides a tag for each node having no tag. Let's call this
>step "(automatic) tag completion".
>
>- The '!' and '!!' tag-specifiers are now valid. There's no longer a
>restriction that the "suffix" of a %TAG be non-empty.
>
>- !!str, !!map and !!seq move out of the spec.
>
>What did we gain (compared to #3++)?
>
>- No special cases. Hence:
>
>- No pesky issues about why style bleeds into the info model. It
>doesn't.
>
>- No pesky issues about how %TAG relates to a simple ! and !!. They are
>treated as usual.
>
>- No pesky issues about transformations. There aren't any.
>
>Why? because automatically computing a value that wasn't explicitly
>specified is completely different from taking a value that _was_
>explicitly given and _replacing_ it by another value.
>
>Sure, some people might say it is still equivalent in some deep
>philosophical level - let them, I think the above is good enough an
>answer and if they don't buy it, well, tough, I don't have another
>answer for them :-)
>

All attractive stuff.

>What did we lose (compared to #3++)?
>
>- The ability to use quoting as a way of enforcing a scalar to be
>interpreted as a string in a mixed-list. Which NEVER happens in any
>config file or data serialization file I can think of (off the top of
>my head). I guess there's some guy in China that actually uses
>mixed-list this way... fine, he'll have to use !!str.
>

I use this. In fact I used it today. It's not hard to avoid though,
since I have an informal schema that I can apply to force stuff to
string. I was just being lazy, so the schema is living in the implicit
typing. Part of the great thing about YAML though is that it enables
laziness. Or perhaps I should say "makes things easier"? However, if we
get some easy way to specify schemas one of these days, this would be no
big loss.

>All in all, I think this is more than a fair trade. We trade all the
>downsides that were so deeply discussed with just one small one that is
>never (well, hardly ever :-) seen in practice.
>
>Thoughts?
>

One issue that just occurred to me is how you will treat !null values
(not Null tags). It's a fairly common requirement to accept string or
Null. But how are you going to spell Null in this case? A schema doesn't
help much, since "~" could be either a string or Null. There are a
couple of different workarounds: if implicit typing is ditched
completely, then Null gets spelled "!null ~", which seems a little odd.
It would be a bit nicer if you could spell it as just "!null", with an
empty value.

On the other hand, with implicit typing it would be necessary to spell
"~", "Null", "NULL" and "null" as "!str ~", etc. Not the end of the
world, but it's more confusing to have to special case certain strings
than to have to special case Null.

The upshot of that is: with ?str/?var you can avoid tags for most common
cases, with or without a schema. Without them, you cannot avoid tags
for most common cases, even with a schema. And being without ?str/?var
but with content typing is in some respects the worst of both worlds.

-tim
From: Clark C. E. <cc...@cl...> - 2004-09-11 03:51:52
|
On Fri, Sep 10, 2004 at 06:33:47PM -0700, Tim Hochberg wrote:
| All attractive stuff.
|
| >What did we lose (compared to #3++)?
| >
| >- The ability to use quoting as a way of enforcing a scalar to be
| >interpreted as a string in a mixed-list. Which NEVER happens in any
| >config file or data serialization file I can think of (off the top of
| >my head). I guess there's some guy in China that actually uses
| >mixed-list this way... fine, he'll have to use !!str.
|
| I use this. In fact I used it today. It's not hard to avoid though,
| since I have an informal schema that I can apply to force stuff to
| string. I was just being lazy so the schema is living in the implicit
| typing. Part of the great thing about YAML though is that it enables
| laziness. Or perhaps I should say "makes things easier"?. However, if we
| get some easy way to specify schemas one of these days this would be no
| big loss.

Your application can still use implicit typing; it is just that, when
your regex says a node is a number and you want it to be a string, you
need to explicitly type the node. Luckily, the ! tag is freed by this
proposal, so one could use this private tag to indicate a 'string'
value (to turn off the escaping). This actually makes sense as well,
since the ! actually means to tag something. So, this isn't really a
workaround at all. It is actually just good practice.

| One issue that just occurred to me is how will you treat !null values
| (not Null tags). It's a fairly common requirement to accept string or
| Null. But how are you going to spell Null in this case. A schema doesn't
| help much, since "~" could be either a string or Null. There are a
| couple different workaround, if implicit typing is ditched completely,
| then Null gets spelled "!null ~" which seems a little odd. It would be a
| bit nicer if you could spell it as just "!null", with an empty value.

Would this work?

    - !!null

Or your 'default' implicit typer could continue to use ~, and if you
want a '~' string, use ! ~

| On the other hand, with implicit typing it would be necessary to spell
| "~", "Null", "NULL" and "null" as "!str ~", etc. Not the end of the
| world, but it's more confusing to have to special case certain strings
| than to have to special case Null.

Hmm.

| The upshot of that is: with ?str/?var you can avoid tags for most common
| cases, with or without a schema. Without them, you cannot avoid tags
| for most common cases, even with a schema. And without ?str/?var, but
| with content typing is in some respects the worst of both worlds.

Yes, but with the ! private tag, which your application is free to
treat as a string... the difference is just in syntax.

    - "23"  # proposal 3
    - ! 23  # proposal 4

The primary advantage of proposal #4 is that it decouples indicating
that something is a string from the style used. This way both can do
their job independently without colliding.

Kind Regards,

Clark
From: Oren Ben-K. <or...@be...> - 2004-09-11 04:31:02
|
On Saturday 11 September 2004 04:33, Tim Hochberg wrote:
> I'm still of the opinion (whee, an opinion a whole 5 hours old) that
> *if* the ?str/?var distinction goes away the typing based matching
> regex's to values should be tossed.

Fine. Never use it in your schemas. Other people (like Onoma) really
want this. Also, the principle is "by content" - you want to be able to
implicitly type '{ x : ..., y : ... }' as a point and '{ r : ...,
i : ... }' as a complex number.

> One issue that just occurred to me is how will you treat !null values
> (not Null tags). It's a fairly common requirement to accept string or
> Null.

So? Your schema says "/foo/bar/label" is implicitly typed as a string
unless it is the single character '~', in which case it is implicitly
typed as a null. This is the sort of regexp-based typing that you said
isn't useful :-)

> ... with implicit typing it would be necessary to
> spell "~", "Null", "NULL" and "null" as "!str ~", etc.

Then use the "!" trick:

    ---
    foo:
      bar:
        label: ~     # null
    ---
    foo:
      bar:
        label: ! ~   # string
    ---
    foo:
      bar:
        label: text  # string
    ...

> Not the end of
> the world, but it's more confusing to have to special case certain
> strings than to have to special case Null.

Well... your user has to have _some_ way of saying "this is a null". It
is your choice whether he has to write '!!null' for a null or '! ~' for
a string-which-is-~.

> The upshot of that is: with ?str/?var you can avoid tags for most
> common cases, with or without a schema. Without them, you cannot
> avoid tags for most common cases, even with a schema.

The question hinges on whether this is "most" cases or "rare" cases.
And, whether the "!" trick is good enough for these cases.

Have fun,

    Oren Ben-Kiki
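The schema rule for "/foo/bar/label" and the "!" trick described above might look roughly like this in application code. It is a sketch under assumptions: `resolve_label` is a hypothetical helper, scalars are passed as text, and `tag` is `None` when the parser reported no tag.

```python
def resolve_label(text, tag=None):
    """Hypothetical rule for /foo/bar/label: untagged '~' is null, else a string."""
    if tag == "!":
        # The "!" trick: an explicitly marked scalar stays a string, even '~'.
        return text
    if tag is None and text == "~":
        return None
    return text
```

With this rule, `label: ~` loads as a null, `label: ! ~` as the one-character string, and `label: text` as an ordinary string, matching the three documents in the message.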
From: Tim H. <tim...@co...> - 2004-09-11 19:51:38
|
Oren Ben-Kiki wrote:

>On Saturday 11 September 2004 04:33, Tim Hochberg wrote:
>
>>I'm still of the opinion (whee, an opinion a whole 5 hours old) that
>>*if* the ?str/?var distinction goes away the typing based matching
>>regex's to values should be tossed.
>>
>
>Fine. Never use it in your schemas. Other people (like Onoma) really
>want this. Also, the principle is "by content" - you want to be able to
>implicitly type '{ x : ..., y : ... }' as a point and '{ r : ...,
>i : ... }' as a complex number.
>

OK. Somehow the above seems different, less hacky, than typing off of a
regex of a scalar. It's probably simpler/cleaner to allow any content to
be used than to try to differentiate structural content like the above
from text content. So consider me satisfied with the context and content
rule.

>>One issue that just occurred to me is how will you treat !null values
>>(not Null tags). It's a fairly common requirement to accept string or
>>Null.
>>
>
>So? Your schema says "/foo/bar/label" is implicitly typed as a string
>unless it is the single character '~', in which case it is implicitly
>typed as a null. This is the sort of regexp-based typing that you said
>isn't useful :-)
>

Yeah, and I find the above rule fairly unpleasant.

>>... with implicit typing it would be necessary to
>>spell "~", "Null", "NULL" and "null" as "!str ~", etc.
>>
>
>Then use the "!" trick:
>
> ---
> foo:
>   bar:
>     label: ~ # null
> ---
> foo:
>   bar:
>     label: ! ~ # string
> ---
> foo:
>   bar:
>     label: text # string
> ...
>

Also unpleasant. Null is what wants to be marked as different, not the
string '~'.

>>Not the end of
>>the world, but it's more confusing to have to special case certain
>>strings than to have to special case Null.
>>
>
>Well... your user have to have _some_ way of saying "this is a null". It
>is your choice whether he has to write '!!null' for a null or '! ~' for
>a string-which-is-~.
>

Right. And since in theory I control the schema, I can enforce my
possibly warped sense of aesthetics on the world, or at least my users,
and make them write !!null. This seems like it'll work.

>>The upshot of that is: with ?str/?var you can avoid tags for most
>>common cases, with or without a schema. Without them, you cannot
>>avoid tags for most common cases, even with a schema.
>>
>
>The question hinges on whether this is "most" cases or "rare" cases.
>And, whether the "!" trick is good enough for these cases.
>

<ponder>

I think I've lost track of the state of things because something isn't
making sense. Wasn't the point of take #4 that untagged values have NULL
tags? And isn't ! the explicit null tag? So isn't the ! tag now a no-op?

<dig/scrutinize>

No, by default the ! tag just cooks to itself ('!'), while the NULL tag
is presumably some special beast. So:

    ---
    - "This has no tag"          # Tag == NULL
    - ! "This has an empty tag"  # Tag == !

So presumably a schema can do different things for these two tags,
including always treating NULL tagged scalars as strings, and implicitly
typing on ! tagged scalars. The exact opposite of what happens by
default. This makes the Null|String issue I described above much
prettier, IMO. It doesn't matter to me if you like it, since assuming my
interpretation above is correct, I can do it locally without stepping on
anyone else's toes.

    ---
    # Schema says that these are all String|Null depending on the tag
    # (NULL->string, !->implicit, which must be null).
    - ~       # string
    - Null    # string
    - ! ~     # Null
    - ! Null  # Null

Is that right? If so, that's cool.

-tim
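The inverted schema suggested at the end of the message above (no tag means string, '!' opts in to implicit typing) could be sketched like this. The names `NULL_SPELLINGS` and `resolve_inverted` are hypothetical, and raising an error for a '!' scalar that is not a null spelling is an assumption drawn from the "which must be null" remark.

```python
# Spellings of null mentioned in the thread.
NULL_SPELLINGS = {"~", "Null", "NULL", "null"}

def resolve_inverted(text, tag=None):
    """Untagged scalars are always strings; '!' requests implicit typing."""
    if tag is None:
        return text
    if tag == "!":
        if text in NULL_SPELLINGS:
            return None
        # Assumed behaviour: in this schema a '!' scalar must be a null.
        raise ValueError(f"'!' scalar {text!r} is not a null spelling")
    raise ValueError(f"unknown tag {tag!r}")
```

Here `~` and `Null` without a tag stay strings, while `! ~` and `! Null` become nulls, which is the opposite of the default behaviour discussed earlier in the thread.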
From: David H. <dav...@bl...> - 2004-09-11 20:56:12
|
Tim Hochberg wrote:
> Oren Ben-Kiki wrote:
>> On Saturday 11 September 2004 04:33, Tim Hochberg wrote:
>>> One issue that just occurred to me is how will you treat !null values
>>> (not Null tags). It's a fairly common requirement to accept string or
>>> Null.
[...]
>> ---
>> foo:
>>   bar:
>>     label: ~ # null
>> ---
>> foo:
>>   bar:
>>     label: ! ~ # string
>> ---
>> foo:
>>   bar:
>>     label: text # string
>> ...
>
> Also unpleasant. Null is what wants to be marked as different, not the
> string '~'.

How about getting rid of ~, always writing a null node as !!null (assuming
the default prefix for !!), and having its content be the empty string?
Then you would never have to implicitly type nulls.

--
David Hopwood <dav...@bl...>
From: Clark C. E. <cc...@cl...> - 2004-09-11 21:28:14
|
On Sat, Sep 11, 2004 at 09:56:10PM +0100, David Hopwood wrote:
| Tim Hochberg wrote:
| > Also unpleasant. Null is what wants to be marked as
| > different, not the
| > string '~'.
|
| How about getting rid of ~, always writing a null node as !!null (assuming
| the default prefix for !!), and having its content be the empty string?
| Then you would never have to implicitly type nulls.

Null values occur quite a bit in database dumps, etc. With the last
post (default tags for ! and !!), we'd have...

    !   # tag:yaml.org,2002:str
    !!  # tag:yaml.org,2002:null

This makes a very clean notation for two very common items, the
empty string '' and the null value. It is a bit of magic to wrap
up into the !tag proposal, but at least the shorthands are all
explicit and part of the same mechanism.

Thoughts?

Clark
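The proposed defaults for a bare ! and !! could be expressed as a small shorthand-expansion step, sketched below. `expand_tag` is a hypothetical helper, not part of any YAML library; expanding `!!name` with the "tag:yaml.org,2002:" prefix assumes the default prefix for !! mentioned in the quoted text, and leaving all other tags untouched is a simplification.

```python
YAML_PREFIX = "tag:yaml.org,2002:"

def expand_tag(shorthand):
    """Expand a tag shorthand under the proposed defaults for bare ! and !!."""
    if shorthand == "!":
        return YAML_PREFIX + "str"      # bare !  -> string
    if shorthand == "!!":
        return YAML_PREFIX + "null"     # bare !! -> null
    if shorthand.startswith("!!"):
        return YAML_PREFIX + shorthand[2:]  # assumed default prefix for !!
    return shorthand                    # private and other tags left as written
```

Under this sketch an empty scalar tagged `!` would be the empty string and one tagged `!!` would be the null value, with `!!float` and friends going through the same mechanism.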
From: Damian C. <dam...@gm...> - 2004-09-13 13:30:28
|
On Sat, 11 Sep 2004 17:28:09 -0400, Clark C. Evans <cc...@cl...> wrote:
> Null values occur quite a bit in database dumps, etc. With the last
> post (default tags for ! and !!), we'd have...
>
> ! # tag:yaml.org,2002:str
> !! # tag:yaml.org,2002:null
>
> This makes a very clean notation for two very common items, the
> empty string '' and the null value.

The syntax allows for a plain value to be of zero length? I guess
that means if I have something daft like

    ---
    zum:
    - ''
    -
    ...

it will end up with an empty value tagged ?str (if we assume quotes
imply stringiness) and another empty value tagged ?dwim, which (if we
are in the mood for implicit types) we could decide means a null, thus
requiring no special tags. Thus we get {'zum': ['', None]}.

This is kind of nasty for the parser though; it has to know when '-'
on a line on its own is a null and when it introduces a multi-line
value:

    ---
    zum:
    -
      biff: pow
    -
    ...

And I would have to go through the grammar in detail to convince
myself there are no ambiguous cases...

I don't much like having ! and !! as special nameless tags. There is a
risk that people will accidentally attach them to the following value,
causing the appearance of a private tag. Having both ! and !! as
special keywords also raises the question of whether people will
stutter on the keyboard and type ! when they mean !! or vice versa.

-- Damian

--
Damian Cugley, Alleged Literature
http://www.alleged.org.uk/pdc/
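The interpretation in the message above (quotes imply a string, an empty plain scalar means null) can be sketched as follows. `resolve_scalar` and its `quoted` flag are hypothetical: they assume the parser reports whether a scalar was quoted, which is exactly the style information the thread is debating.

```python
def resolve_scalar(text, quoted):
    """Quoted scalars are strings; an empty plain scalar is treated as null."""
    if quoted:
        return text
    if text == "":
        return None
    return text

# The 'zum' document: one quoted empty scalar, one plain empty scalar.
doc = {"zum": [resolve_scalar("", quoted=True), resolve_scalar("", quoted=False)]}
```

The resulting `doc` is the `{'zum': ['', None]}` structure mentioned in the message, and it only works if style is visible to the application.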
From: Clark C. E. <cc...@cl...> - 2004-09-13 14:15:13
|
(top posting)

Enough people have stated that they really want "23" and 23 to
be reported differently (aka keeping the wart) that this entire
thread is rendered moot, I think.

On Mon, Sep 13, 2004 at 02:30:24PM +0100, Damian Cugley wrote:
| The syntax allows for a plain value to be of zero length?

Sure.

| ---
| zum:
| - ''
| -
| ...

In the context of the thread (no warts) these would be reported the
same. In the alternative I prefer, this document is equivalent to,

    ---
    zum:
    - ""
    - ! ""
    ...

| it will end up with an empty value tagged ?str (if we assume quotes
| imply stringiness) and another empty value tagged ?dwim, which (if we
| are in the mood for implicit types) we could decide means a null, thus
| requiring no special tags. Thus we get {'zum': ['', None]}.

Sure. If you wish to interpret ! via regex and transform it into
various other types, then yes.

| This is kind of nasty for the parser though; it has to know when '-'
| on a line on its own is a null and when it introduces a multi-line value

It isn't nasty. All plain scalars would be reported differently,
including multi-line ones. There isn't an issue.

| I don't much like having ! and !! as special nameless tags. There is a
| risk that people will accidentally attach them to the following value,
| causing the appearance of a private tag. Having both ! and !! as
| special keywords also raises the question of whether people will
| stutter on the keyboard and ! when they mean to !! or vice versa.

We strictly don't need !!, but in any proposal, warts or no warts,
the ! is needed as a toggle. In the no-wart case, the ! explicitly
marks a node as a string. In the wartish variety, the ! would mark
any scalar as ! (which happens to be the same tag reported when a
plain scalar lacks a tag). Or, in the ?str ?var variety, if a ?var
node was serialized, and it contained an ASCII bell, it'd be written
as ! "\a" with special magic to unpack ! "" as a ?var.

In any case, there is no clear winner; it's a matter of how you
want your wart, over easy or scrambled.

Clark
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 02:15:33
|
IMPORTANT!

I _now_ know exactly why there is such disagreement about this. Clark's
"observation" and Brian's examples have made it abundantly clear. The
problem is simply this:

    _Statically vs. Dynamically Typed Language_.

It occurred to me right off the bat when you said:

> - Given we allow to type a node by context (== path), typing it by its
>   content alone becomes rare. Practically non existant in fact.

Because for me, using Ruby, that's completely false!!!!

I put two and two together with what Oren had said earlier about how the
code would work for typing. I had given some Ruby examples and he responded
"No, it's more like this..." and rattled off some C code.

So now it makes much more sense. You see, I use YAML for config files for
everything. When they get loaded into my programs they don't go through any
static-based assignments. It's all dynamic! So style is very very important.

> This makes it possible to reconsider the usefulness of distinguishing
> between "23" and 23. Take the following example:

So you see, this distinction, for me, is vital. Otherwise I would have to
go very far out of my way, and do non-Ruby-like things.

> The schema (== application == intent of the author == etc.) says that
> 'x' and 'y' are floats, a label is a string, and point-size-in-pixels
> is an integer. Note this solves the "looks like an int/float problem".
> The schema says that 'x' is always a float, so it doesn't care if it
> matches the int regexp (and likewise for the label).
>
> *This is the typical case*.

For me, it is not the typical case.

> Sure, there may be a case where someone will write a mixed list where
> entries may be ints, floats or strings, without any way of knowing
> which is which without looking inside each entry - but this is SO rare:

All the time! Almost every time!

>     --- !mixed-list
>     - 1           # An int
>     - !!float 2   # Looks like an int...
>     - 2.0         # OK, a float
>     - 2.5         # A float
>     - Hi          # A string
>     - "8.5"       # Looks like a float,
>                   # but quoting saves the day
>     ...
>
> With this in mind, suppose we remove all distinction between plain and
> non-plain scalars. That's certainly as consistent as it gets. What are
> the implications?

That's why I can't do it. If this were to happen I would really just have
to stop using YAML. It just wouldn't jive anymore.

> [snip]
> What did we get?
>
> - If a node has no tag, the parser reports it has no tag.
>
> - Each node's style, indentation, comments, etc. MAY be discarded by the
>   parser. Or it might not. However...
>
> - The native data type for nodes with a tag is determined by the tag and
>   only by the tag. This is a restriction on "what is a YAML schema". An
>   application may cheat, but then it wouldn't be using a YAML schema.
>
> - The native data type for nodes without a tag is determined by the
>   node's context (path to the node) and its content, and only by the
>   context and content. Again, this is a restriction on "what is a YAML
>   schema".
>
>   It is expected that the vast majority of the schemas will use only the
>   context ( { x : ... => float, y : ... => float, label : ... => string).
>   "Mixed" data structures are very rare. It is OK to use "content" in
>   such cases (e.g., a list of shapes where { x : ..., y : ..., r : ... }
>   => circle, { x : ..., y : ..., w : ..., h : ... } => rectangle).
>
> - Every native data type must have a tag associated with it. Another
>   restriction on "what is a YAML schema".
>
> - Hence, there MUST be a (possibly implicit) step that (possibly
>   implicitly) provides a tag for each node having no tag. Lets call this
>   step "(automatic) tag completion".
>
> - The '!' and '!!' tag-specifiers are now valid. There's no longer a
>   restriction that the "suffix" of a %TAG be non-empty.
>
> - !!str, !!map and !!seq move out of the spec.

All of these points derive from my proposal as well --the only difference
being the addition of the style factor. _You_ can just ignore the style if
you don't want to use it. But _I_ can't use it if it's not available. Don't
pretty-print _my_ documents, if you have a problem with them (albeit 99.9%
of the time it won't even matter).

> What did we gain (compared to #3++)?
>
> - No special cases. Hence:
>
> - No pesky issues about why style bleeds into the info model. It
>   doesn't.
>
> - No pesky issues about how %TAG relates to a simple ! and !!. They are
>   treated as usual.

Ditto.

> - No pesky issues about transformations. There aren't any.
>
>   Why? because automatically computing a value that wasn't explicitly
>   specified is completely different from taking a value that _was_
>   explicitly given and _replacing_ it by another value.

The same mechanics will be used. So it may be distinguishable, but quite
practically they happen concurrently. But okay sure. It works both ways.

>   Sure, some people might say it is still equivalent in some deep
>   philosophical level - let them, I think the above is good enough an
>   answer and if they don't buy it, well, tough, I don't have another
>   answer for them :-)

I do. And I have conveyed this too many times now. I now understand why you
feel it is unimportant. What you did not see is why it is important to
others. I hope the above observations about dynamic vs static make that
understandable now.

> What did we lose (compared to #3++)?
>
> - The ability to use quoting as a way of enforcing a scalar to be
>   interpreted as a string in a mixed-list. Which NEVER happens in any
>   config file or data serialization file I can think of (off the top of
>   my head). I guess there's some guy in China that actually uses
>   mixed-list this way... fine, he'll have to use !!str.

That guy lives in Florida currently ;) I wonder if anyone in China knows
about YAML.

> All in all, I think this is more than a fair trade. We trade all the
> downsides that were so deeply discussed with just one small one that is
> never (well, hardly ever :-) seen in practice.

It's a trade-off that would cause many to simply stop using YAML. Your
perspective is limited to your needs. Please think about it.

Thanks,
T.

--
( o _
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
|
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 02:35:27
|
On Friday 10 September 2004 10:15 pm, trans. (T. Onoma) wrote: > Clark's > "observation" and Brian's examples have made it abundantly clear. oops! I meant Oren's examples. Sorry!!! -- ( o _ // trans. / \ tra...@ru... I don't give a damn for a man that can only spell a word one way. -Mark Twain |
From: Clark C. E. <cc...@cl...> - 2004-09-11 03:15:58
|
On Fri, Sep 10, 2004 at 10:15:23PM -0400, trans. (T. Onoma) wrote:
| _Statically vs. Dynamically Typed Language_.

I fail to see how this is relevant, I code in both C and Python.

| It occurred to me right-off the bat when you said:
|
| > - Given we allow to type a node by context (== path), typing it by its
| >   content alone becomes rare. Practically non existant in fact.
|
| Because for me, using Ruby, that's completely false!!!!

Could you show some examples. Talking about concrete things helps.

What this is referring to is how much "help" an external schema
can be. Is it possible, for your config files, to put expected
typing information into a schema file?

    /timelog/*/hours: tag:yaml.org,2002:int
    /expense/*/amount: tag:yaml.org,2002:currency
    /wild:
      '[-+]?(0|[1-9][0-9,]*)': tag:yaml.org,2002:int
      'true|false': tag:yaml.org,2002:bool

I'm sure with all of this effort, we could come up with a nice
mechanism like this in just a few weeks or so. A python implementation
would be quite easy as well.

| So now it makes much more sense. You see, I use YAML for config files for
| everything. When they get loaded into my programs they don't go through any
| static based assignments. Its all dynamic! So style is very very important.

Besides the string vs. variant distinction, what other needs do styles
help with? Steve talks about using > vs | to change data typing
decisions... but I haven't the foggiest idea of an actual use case,
the ones hinted at seem more dangerous than useful. ;)

| > This makes it possible to reconsider the usefulness of distinguishing
| > between "23" and 23. Take the following example:
|
| So you see, this distinguishment, for me, is vital. Otherwise I would
| have to go very far out of my way, and do non-Ruby like things.

Ok. In most cases, I have a simple verifier that goes through my file
and converts strings that need to be integers into integers. It would
not be too hard to convert this into a schema of sorts, like the one
above.

| > The schema (== application == intent of the author == etc.) says that
| > 'x' and 'y' are floats, a label is a string, and point-size-in-pixels
| > is an integer. Note this solves the "looks like an int/float problem".
| > The schema says that 'x' is always a float, so it doesn't care if it
| > matches the int regexp (and likewise for the label).
| >
| > *This is the typical case*.
|
| For me, it is not the typical case.

Specifics would be very helpful.

| > Sure, there may be a case where someone will write a mixed list where
| > entries may be ints, floats or strings, without any way of knowing
| > which is which without looking inside each entry - but this is SO rare:
|
| All the time! Almost every time!

Ok. How about this process:

  (a) your application runs every scalar reported with NULL
      through a regex matcher (see above) that's similar to
      the one today

  (b) the '!' private type is a string (forced)

So, where one would "quote" the value to make sure it is a string,
you'd use ! instead.

| >     --- !mixed-list
| >     - 1           # An int
| >     - !!float 2   # Looks like an int...
| >     - 2.0         # OK, a float
| >     - 2.5         # A float
| >     - Hi          # A string
| >     - "8.5"       # Looks like a float,

        - ! 8.5

| All of these points derive from my proposal as well --the only difference
| being the addition of the style factor. _You_ can just ignore the style if
| you don't want to use it.

That is not great logic. Let's try to solve the problem you have
without using styles. Styles are really meant to be presentation
level stuff, if you can use them to determine type, then it means
pretty printers and editors have to be super-careful when they
are used.

| > - The ability to use quoting as a way of enforcing a scalar to be
| >   interpreted as a string in a mixed-list. Which NEVER happens in any
| >   config file or data serialization file I can think of (off the top of
| >   my head). I guess there's some guy in China that actually uses
| >   mixed-list this way... fine, he'll have to use !!str.
|
| That guy lives in Florida currently ;) I wonder if anyone in
| China knows about YAML.

With this proposal, tagging something as a string would only
require a single ! and, surprisingly, using the tag mechanism!

    - "23"
    - ! 23

And... the _same_ number of characters. Clear meaning, the "" is
used when you want to escape, not for its side-effect of making the
value a string. ;)

| > All in all, I think this is more than a fair trade. We trade all the
| > downsides that were so deeply discussed with just one small one that is
| > never (well, hardly ever :-) seen in practice.
|
| Its a trade off that would cause many to simply stop using YAML. Your
| perespctive is limited to your needs. Please think about it.

Look, Oren and I are very firm about not putting style into the
Representation Model. Just like we are very firm about not putting
key order in the Representation Model. Same rationale.

However, this doesn't mean that we don't want to make your
life super-friendly. So, what are the other options?

Clark
|
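The schema file and the (a)/(b) process above could look roughly like this in Python. It is a sketch, not an existing implementation: the rule tables, the `complete_tag` name, and the idea that the parser hands over a path string plus the reported tag are all assumptions, and `currency` is simply the tag name used in the example schema.

```python
import re
from fnmatch import fnmatchcase

YAML = "tag:yaml.org,2002:"

# Hypothetical schema: path patterns first, then a content-based fallback.
PATH_RULES = [
    ("/timelog/*/hours", YAML + "int"),
    ("/expense/*/amount", YAML + "currency"),
]
WILD_RULES = [
    (re.compile(r"[-+]?(0|[1-9][0-9,]*)"), YAML + "int"),
    (re.compile(r"true|false"), YAML + "bool"),
]


def complete_tag(path, text, tag=None):
    """Return a full tag for the scalar `text` found at `path`.

    `tag` is what the parser reported: None for an untagged node,
    "!" for the bare private tag, or an already complete tag.
    """
    if tag == "!":
        return YAML + "str"             # step (b): '!' forces a string
    if tag is not None:
        return tag                      # explicit tags always win
    for pattern, result in PATH_RULES:  # the context (path) decides first
        # Caveat: fnmatch's '*' also matches '/', so it can span levels.
        if fnmatchcase(path, pattern):
            return result
    for regex, result in WILD_RULES:    # step (a): regex on the content
        if regex.fullmatch(text):
            return result
    return YAML + "str"                 # nothing matched: plain string


print(complete_tag("/timelog/0/hours", "8"))
print(complete_tag("/mixed/1", "23"))
print(complete_tag("/mixed/2", "23", tag="!"))
print(complete_tag("/mixed/3", "Hi"))
```

Under these assumptions `23` resolves to an int, `! 23` to a string, and no scalar style is consulted at any point.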
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 03:45:13
|
On Friday 10 September 2004 11:15 pm, Clark C. Evans wrote:
> Could you show some examples. Talking about concrete things helps.

Mind you that this is a _very small_ example with mostly expected elements.
But my system also takes plugins, of almost any variety --so I have no idea
what they might consist of. Under your proposal I would have to add schemas
for all these components. And plugin authors would have to too. Plus I'd have
to extend the plugin system to deal with schemas, all to type some integers
and floats as needed. Ugh. (BTW- a couple of these are also getting emitted
like crap b/c style is being forgotten.)

--- &id001 !!model
meta: !!meta
  title: Fireworks
tables:
  - &id002 !!table
    parent: *id001
    name: body
    tags: []
    tag:
    data: {}
    style: {}
    scale: &id003
      - 100%
    form: false
    sections:
      - &id004 !!section
        parent: *id002
        name: ''
        tags: []
        tag:
        data: {}
        style: {}
        scale: *id003
        repeat: true
        repeat_length: 2
        cells:
          - &id005 !!cell
            parent: *id004
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 0
            y: 0
            col: 0
            row: 0
            colspan: 1
            rowspan: 1
            width: 100%
            alignment: left
            content: " entries "
            elements:
              - !!general
                parent: *id005
                name: entries
                tags: []
                tag:
                data:
                  value:
                    - "Testing "
                    - "Testing Again... "
                style:
                  font-weight: bold
                  font-size: 9pt
                  font-family: "sans-serif, helvetica"
                literal: entries
                size: 0
      - !!section
        parent: *id002
        name: ''
        tags: []
        tag:
        data: {}
        style: {}
        scale: *id003
        repeat: true
        repeat_length: 0
        cells: []
  - &id006 !!table
    parent: *id001
    name: main
    tags:
      - table
    tag: table
    data: {}
    style:
      background: "#eeeeee"
      padding: 6px
      width: 800px
    scale: &id007
      - "200"
      - auto
      - "200"
    form: false
    sections:
      - &id008 !!section
        parent: *id006
        name: ''
        tags: []
        tag:
        data: {}
        style: {}
        scale: *id007
        repeat: false
        repeat_length: 0
        cells:
          - &id009 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 0
            y: 0
            col: 0
            row: 0
            colspan: 1
            rowspan: 1
            width: "200"
            alignment: center
            content: " fimg "
            elements:
              - !!general
                parent: *id009
                name: fimg
                tags: []
                tag:
                data:
                  value: "<img src=\"../../images/banana-right.jpg\">"
                style: {}
                literal: fimg
                size: 0
          - &id010 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 7
            y: 0
            col: 1
            row: 0
            colspan: 1
            rowspan: 1
            width:
            alignment: left
            content: " title "
            elements:
              - !!general
                parent: *id010
                name: title
                tags: []
                tag:
                data: {}
                style:
                  font-weight: bold
                  font-size: 16pt
                  font-family: "sans-serif, helvetica"
                literal: title
                size: 0
          - &id011 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 25
            y: 0
            col: 2
            row: 0
            colspan: 1
            rowspan: 1
            width: "200"
            alignment: center
            content: " menu "
            elements:
              - !!general
                parent: *id011
                name: menu
                tags: []
                tag:
                data:
                  value: "<a href=\"../happy/\"><b>TS Weblands</b></a><br/>
                    <a href=\"../masstransit/\">Masstransit</a><br/>
                    <a href=\"../st-johns-alum/\">SJ2000</a><br/>
                    <br>
                    <a href=\"../../jupzeus/\"><b>JupiterZeus</b></a><br/>"
                style:
                  font-size: 10pt
                  background: yellow
                  border: solid yellow 4px
                  font-family: sans-serif
                literal: menu
                size: 0
          - &id012 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 0
            y: 2
            col: 0
            row: 1
            colspan: 2
            rowspan: 1
            width:
            alignment: left
            content: " body "
            elements:
              - !!general
                parent: *id012
                name: body
                tags: []
                tag:
                data: {}
                style: {}
                literal: body
                size: 0
          - &id013 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 25
            y: 2
            col: 2
            row: 1
            colspan: 1
            rowspan: 1
            width: "200"
            alignment: left
            content: " google "
            elements:
              - !!general
                parent: *id013
                name: google
                tags: []
                tag:
                data: {}
                style: {}
                literal: google
                size: 0
          - &id014 !!cell
            parent: *id008
            name: ''
            tags: []
            tag:
            data: {}
            style: {}
            x: 0
            y: 6
            col: 0
            row: 2
            colspan: 3
            rowspan: 1
            width:
            alignment: left
            content: " copyright "
            elements:
              - !!general
                parent: *id014
                name: copyright
                tags: []
                tag:
                data: {}
                style:
                  font-size: 6pt
                  font-family: "sans-serif, helvetica"
                literal: copyright
                size: 0

--
( o _
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
|
From: Clark C. E. <cc...@cl...> - 2004-09-11 04:21:30
|
On Fri, Sep 10, 2004 at 11:44:55PM -0400, trans. (T. Onoma) wrote:
| On Friday 10 September 2004 11:15 pm, Clark C. Evans wrote:
| > Could you show some examples. Talking about concrete things helps.
|
| Mind you that this is a _very small_ example with mostly expected elements.

Thanks, specific comments:

  - this proposal does _not_ say you can't do implicit typing,
    in fact, you can continue to use the same rules you've
    always used; and

  - you could load the private type ! into a string, and then
    tag nodes that don't regex correctly

I looked at the attached file, you have only 6 lines that would
be affected, they all look like:

|       - "200"

You'd change them to:

|       - ! 200

| But my system also takes plugins, of almost any variety --so I have no
| idea what they might consist of. Under your proposal I would have to add
| schemas for all these components. And plugin authors would have to too.
| Plus I'd have to extend the plugin system to deal with schemas, all to
| type some integers and floats as needed. Ugh.

Well, this is no worse than today; only you'd get the %TAG mechanism
and, you'd only "quote" things when you need to, and your intent to
ensure that 200 is a string would be quite clear due to the ! tag.

Cheers!

Clark

P.S. Migration is another issue, but _why can assume that things
     that don't start with %YAML 1.1 are all %YAML 1.0 and this
     sort of policy could be built-in.
|
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 04:36:19
|
On Saturday 11 September 2004 12:21 am, Clark C. Evans wrote:
> I looked at the attached file, you have only 6 lines that would
> be affected, they all look like:
> |       - "200"
>
> You'd change them to:
> |       - ! 200

Okay, but I wouldn't be the one to change them. They are mostly generated by
a program. It's up to the emitter to get them right. Which leads to a good
question: how will emitters handle these changes? Will an emitter understand
my schema?

--
( o _
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
|
From: Clark C. E. <cc...@cl...> - 2004-09-11 04:54:24
|
On Sat, Sep 11, 2004 at 12:36:13AM -0400, trans. (T. Onoma) wrote:
| On Saturday 11 September 2004 12:21 am, Clark C. Evans wrote:
| > I looked at the attached file, you have only 6 lines that would
| > be affected, they all look like:
| > |       - "200"
| >
| > You'd change them to:
| > |       - ! 200
|
| Okay, but I wouldn't be the one to change them.

Of course not. This is why we will bump the version number. ;)

| They are mostly generated by a program. It's up to the emitter to
| get them right.

Correct. And the emitter will probably need the same mapping of types
to regex to make sure that it knows when something has to be explicitly
typed. If we formalise the standard one, it should be built-in.

| Which leads to a good question: how will emitters handle these changes?

Well, as far as versions go, if syck isn't pumping out a version number,
it might have to start doing so (or it could use a comment) so that
it can identify 1.0 vs 1.1 documents, as the behavior is different.

| Will an emitter understand my schema?

We'll have to follow through on David's idea of a complete review of
the YAML Tag repository, probably with _why's help, to make sure they
reflect current practice. For this standard set of regular expressions,
it can probably be 'built-in' as a parser/emitter option. In the long
run, some way to 'register' a schema would work, and then your app could
use whichever set of regex stuff it needs.

Clark
|
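The emitter side of this can be sketched as follows. The regex list is an assumed stand-in for the "standard" mapping, which the thread leaves to a future specification, and `emit_scalar` is a hypothetical helper rather than syck's or any other library's API. Quoting of strings with special characters is deliberately left out.

```python
import re

# Assumed stand-in for the standard implicit-typing regexes.
IMPLICIT = [
    re.compile(r"[-+]?(0|[1-9][0-9]*)"),   # int
    re.compile(r"[-+]?[0-9]+\.[0-9]*"),    # float
    re.compile(r"true|false"),             # bool
]


def emit_scalar(value):
    """Render one native value as a YAML scalar under the '!' proposal."""
    if isinstance(value, bool):            # check bool before int: bool is an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if value == "":
        return '! ""'                      # empty string, as opposed to a null
    # A string that a loader would mistake for another type gets the '!' tag.
    if any(rx.fullmatch(value) for rx in IMPLICIT):
        return "! " + value
    return value


for item in ["200", 200, "auto", "8.5", True]:
    print("- " + emit_scalar(item))
```

The point of the sketch is that the emitter consults the same regexes as the loader, so a string such as "200" round-trips as `! 200` without relying on quoting style.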
From: David H. <dav...@bl...> - 2004-09-11 18:28:39
|
trans. (T. Onoma) wrote:
>>Sure, there may be a case where someone will write a mixed list where
>>entries may be ints, floats or strings, without any way of knowing
>>which is which without looking inside each entry - but this is SO rare:
>
> All the time! Almost every time!
>
>>    --- !mixed-list
>>    - 1           # An int
>>    - !!float 2   # Looks like an int...
>>    - 2.0         # OK, a float
>>    - 2.5         # A float
>>    - Hi          # A string
>>    - "8.5"       # Looks like a float,
>>                  # but quoting saves the day
>>    ...
>>
>>With this in mind, suppose we remove all distinction between plain and
>>non-plain scalars. That's certainly as consistent as it gets. What are
>>the implications?
>
> That's why I can't do it. If this were to happened I would really just have to
> stop using YAML. It just wouldn't jive anymore.

Would being able to use `8.5 in this example help?

--
David Hopwood <dav...@bl...>
|
From: Clark C. E. <cc...@cl...> - 2004-09-11 02:53:40
|
On Sat, Sep 11, 2004 at 02:31:26AM +0100, David Hopwood wrote:
| Suggestion: in addition to #4, let's have a really easy way to specify
| that a scalar is a string (which was after all the main purpose of
| distinguishing quoted from plain scalars).

I totally agree with the idea. This was also the show-stopper question
that Brian asked this afternoon on IRC. I answered !!str and the result
was... Hung Jury. ;)

...

Nothing stopping one from taking,

    ---
    - unspecified   # NULL
    - 23            # NULL
    - ! 23          # !
    - !!str xxx     # tag:yaml.org,2002:str

and automatically transforming via Application regex/rename rules to,

    ---
    - unspecified   # tag:yaml.org,2002:str
    - 23            # tag:yaml.org,2002:int
    - ! 23          # tag:yaml.org,2002:str
    - !!str xxx     # tag:yaml.org,2002:str
    ...

Perhaps the %TAG format could sprout a default-specific,

    %TAG !prefix! URI default-specific

    %TAG !! tag:yaml.org,2002: str
    ---
    - !! a string   # tag:yaml.org,2002:str
    - !!int 23      # tag:yaml.org,2002:int

In any case, since the rule you're talking about is providing
a 'tag', it should probably use ! which means 'tag'. ;)

| Allow '$' as a tag-specifier that always means "tag:yaml.org,2002:str".

Not a bad idea, let's try.

| > --- !mixed-list
| > - 1           # An int, via application regex
| > - !!float 2   # Looks like an int...
| > - 2.5         # A float, via application regex
| > - Hi          # A string, via application regex
|   - $ 8.5       # tag:yaml.org,2002:str via parser cooking
    - $34.40      # A currency, via application regex

Ok. A typical use case for '$' is specifying accounting information in
dollars. I actually have this in my timesheet data... in about 200+
timesheets, there are about 40 with expenses reported, and of them 2
that would be affected:

    amount: $ 15.69
    amount: $ 293

About half of the remaining expenses use $ followed immediately by
a number, like $45.34 -- so, we already treat '# comment' and '#data'
as different things, I don't see why '$ ' can't be made a special
indicator.

So, for these two items, I'd get a string value of '15.69' and '293',
that's not horrible, as I know it is an 'amount' and further, the
other half of these expense reports forgot the '$' sign ;)

Looks good to me. It's actually similar to the single bar | in
proposal #3 that turns other non-plain styles into ?var

Other possibilities:

  - ` the backtick is currently a reserved character, no issues at all
  - @ this is also reserved, no issues at all
  - ^ is not reserved (similar to $) but is lighter weight

Hmm. I think we're doing tagging... so why not just use ! or !!
I think that's better than these options.

Best,

Clark
|
From: David H. <dav...@bl...> - 2004-09-11 18:21:49
|
Clark C. Evans wrote:
> On Sat, Sep 11, 2004 at 02:31:26AM +0100, David Hopwood wrote:
> | Suggestion: in addition to #4, let's have a really easy way to specify
> | that a scalar is a string (which was after all the main purpose of
> | distinguishing quoted from plain scalars).
>
> I totally agree with the idea. This was also the show-stopper question
> that Brian asked this afternoon on IRC. I answered !!str and the result
> was... Hung Jury. ;)
[...]
> In any case, since the rule you're talking about is providing
> a 'tag', it should probably use ! which means 'tag'. ;)
>
> | Allow '$' as a tag-specifier that always means "tag:yaml.org,2002:str".
>
> Not a bad idea, let's try.
>
> | > --- !mixed-list
> | > - 1           # An int, via application regex
> | > - !!float 2   # Looks like an int...
> | > - 2.5         # A float, via application regex
> | > - Hi          # A string, via application regex
> |   - $ 8.5       # tag:yaml.org,2002:str via parser cooking
>     - $34.40      # A currency, via application regex
>
> Ok. A typical use case for '$' is specifying accounting information in
> dollars.

Yes, that's a problem. How about ` (backtick) instead:

    --- !mixed-list
    - 1           # An int, via application regex
    - !!float 2   # Looks like an int...
    - 2.5         # A float, via application regex
    - Hi          # A string, via application regex
    - `8.5        # tag:yaml.org,2002:str via parser cooking
    - $34.40      # A currency, via application regex

(Whitespace between ` and the value would be optional.)

Of course style makes no difference, so you could also write
"1", !!float "2", "2.5", "Hi", `"8.5", "$34.40", etc.

> Other possibilities:
>   - ` the backtick is currently a reserved character, no issues at all
>   - @ this is also reserved, no issues at all
>   - ^ is not reserved (similar to $) but is lighter weight
>
> Hmm. I think we're doing tagging... so why not just use ! or !!
> I think that's better than these options.

I think ` looks better than !, and the fact that you can omit the whitespace
helps. We need some better examples.

--
David Hopwood <dav...@bl...>
|
-- David Hopwood <dav...@bl...> |
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 04:03:10
|
On Friday 10 September 2004 11:15 pm, Clark C. Evans wrote:
> However, this doesn't mean that we don't want to make your
> life super-friendly. So, what are the other options?

Well, I just don't want to have to go out of my way to get floats and integers
and dates and bools, etc. --basic types. The only way I see doing it with
what you're proposing is by being explicit (!!int, !!float) or by making a
schema. Which means more work for me.

I see why you want to do it. It's one variable simpler and it fulfills your
expectation of presentation vs. representation.

I just worry simple config files using numbers and such types (plus highly
adaptable applications too) will get a cold shoulder.

But I get the feeling you guys are set on it. So let's see what happens. I
don't want to force anyone's arm. If everyone else is for it then I'm willing
to stand back and see how it pans out.

If anyone else has problems with Oren's and Clark's proposal, then speak up
now, b/c I'm done arguing. I said my piece and I have other fish to fry.

Peace,
T.

--
( o _
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
|
From: Clark C. E. <cc...@cl...> - 2004-09-11 04:31:47
|
On Sat, Sep 11, 2004 at 12:03:03AM -0400, trans. (T. Onoma) wrote:
| On Friday 10 September 2004 11:15 pm, Clark C. Evans wrote:
| > However, this doesn't mean that we don't want to make your
| > life super-friendly. So, what are the other options?
|
| Well, I just don't want to have to go out of my way to get floats
| and integers and dates and bools, etc. --basic types. The only way
| I see doing it with what you're proposing is by being explicit
| (!!int, !!float) or by making a schema. Which means more work for me.

I hope the last post was helpful.

| I see why you want to do it. It's one variable simpler and it fulfills
| your expectation of presentation vs. representation.
| I just worry simple config files using numbers and such types
| (plus highly adaptable applications too) will get a cold shoulder.

We will have to have a _separate_ specification describing a standard
set of regular expressions that map to common YAML types in a way that
gets the most coverage. For the exceptional cases we have !!tags and,
as I pointed out, this sort of standard could specify that ! gets
_transformed_ to tag:yaml.org,2002:

| But I get the feeling you guys are set on it. So let's see what happens.
| I don't want to force anyone's arm. If everyone else is for it then
| I'm willing to stand back and see how it pans out.

Look; at every turn we've spent lots of time considering the issues you
have raised, from tagging, and now your dislike of certain warts. Your
last example was helpful. Specific, concrete use cases are what we
need to work through these issues.

| If anyone else has problems with Oren's and Clark's proposal, then speak up
| now, b/c I'm done arguing. I said my piece and I have other fish to fry.

As Brian said, it'll be a while before implementations happen and
specifications get updated. People come up with new ideas, etc.

Cheers,

Clark
|
From: trans. (T. Onoma) <tra...@ru...> - 2004-09-11 04:25:22
|
On Friday 10 September 2004 11:15 pm, Clark C. Evans wrote:
> However, this doesn't mean that we don't want to make your
> life super-friendly. So, what are the other options?

That's what I like about you Clark. You at least are willing to vie for the
alternatives :)

So here's one:

Along with proposal #4, you create the YAML Schema Repository, to which
schemas can be added, but there will also be a few core schemas that
implementations are generally expected to make available. The schema
language, for now, can be just what you have demonstrated, basically TAG
directives using regexp (but it should be a valid YAML doc). It can be
improved over time. But let's just get something out there with YAML 1.1 that
deals with this.

The first three schemas are called something like 'raw', 'empty' and
'standard'. 'raw' does less than nothing, not even escaping (we can add a
!!raw type too), 'empty' does nothing except the escaping (this is default
behavior, so this is not strictly needed, but we should spec it anyway), and
'standard' does the things we've come to expect from YAML 1.0 (ints,
floats, bools, etc).

Then we add a %SCHEMA directive. Very simple:

    %SCHEMA standard
    ---
    - bla bla

How about it?

T.

P.S. Can I have my !!rel type? :)

--
( o _
// trans.
/ \ tra...@ru...

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
|
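A %SCHEMA directive along these lines could be sketched as below. Everything here is hypothetical: no such directive exists in YAML, the `SCHEMAS` registry and its regexes are placeholders for whatever a schema repository would define, and 'raw' is omitted. The sketch only shows how a named schema could select the implicit-typing rules, with 'empty' as the default.

```python
import re

# Hypothetical registry keyed by the schema names proposed above.
SCHEMAS = {
    "empty": [],  # no implicit typing: every untagged scalar stays a string
    "standard": [
        (re.compile(r"[-+]?(0|[1-9][0-9]*)"), int),
        (re.compile(r"[-+]?[0-9]+\.[0-9]+"), float),
        (re.compile(r"true|false"), lambda s: s == "true"),
    ],
}


def select_schema(text):
    """Return the rules named by a leading %SCHEMA line, defaulting to 'empty'."""
    for line in text.splitlines():
        if line.startswith("---"):
            break                      # directives end at the document marker
        if line.startswith("%SCHEMA "):
            return SCHEMAS[line.split()[1]]
    return SCHEMAS["empty"]


def load_scalar(text, rules):
    """Convert one untagged scalar using the first rule whose regex matches."""
    for regex, convert in rules:
        if regex.fullmatch(text):
            return convert(text)
    return text


document = "%SCHEMA standard\n---\n- 23\n- 8.5\n- bla bla\n"
rules = select_schema(document)
values = [load_scalar(s, rules) for s in ["23", "8.5", "bla bla"]]
print(values)  # [23, 8.5, 'bla bla']
```

Without the directive the same scalars would all load as strings, which matches the description of 'empty' as the default behavior.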
From: Clark C. E. <cc...@cl...> - 2004-09-11 04:46:37
|
On Sat, Sep 11, 2004 at 12:25:09AM -0400, trans. (T. Onoma) wrote:
| Along with proposal #4, you create the YAML Schema Repository, to
| which schemas can be added, but there will also be a
| few core schemas that implementations are generally expected to
| make available. The schema language, for now, can be just what you
| have demonstrated, basically TAG directives using regexp (but it
| should be a valid YAML doc). It can be improved over time. But let's
| just get something out there with YAML 1.1 that deals with this.

Sounds good. It will require some work, some patience will be required.

| The first three schemas are called something like 'raw', 'empty' and
| 'standard'. 'raw' does less than nothing, not even escaping (we can add
| a !!raw type too),

I'm going to forget you even mentioned 'raw' ;)

| standard, which does the things we've come to expect from YAML 1.0
| (floats, bools, etc).

*nod*

| Then we add a %SCHEMA directive. Very simple:
|
|     %SCHEMA standard
|     ---
|     - bla bla

I can completely see how declaring this mechanism is different from
giving your 'root' node a specific tag. But, would that suffice?
(getting directives by Brian is, well, a long process)

| How about it?

Well, the set of regex matches to tag:yaml.org,2002: has turned a bit
informal, and as David suggested, it would be nice to turn that back
around into something concrete.

| P.S. Can I have my !!rel type? :)

How about proposing it once more in the form of a docbook file that
can be added to the repository? For example,

    http://cvs.sourceforge.net/viewcvs.py/yaml/spec/int.dbk?view=markup

Cheers,

Clark
|