From: Clark C. E. <cc...@cl...> - 2004-09-04 20:50:13
Ok. Here is a suggestion, call it #9. It incorporates several ideas floating around:

- It uses the Python/Ruby style of name resolution, as suggested by T.Onoma and Why. That is, you check for a local (aka private) package first, next you check built-in packages, and failing that, an exception is raised.
- It incorporates David's suggestion of limiting built-in types to only words (but allowing the '/'). This helps reduce the chance of collisions: you can be sure that resolution of built-in packages will always fail if you use names like "Perl::Package" or "com.company.JavaPackage", etc.
- It also incorporates David's suggestion of using 'implicit-plain' and 'implicit-not-plain' tags to make implicits easier to grok; this happens to put some very nice makeup on an ugly wart.
- It follows T.Onoma's request that he be able to specify a private tag that is _not_ subject to default %TAG cooking. It makes it possible to _expressly_ disable cooking no matter what %TAGs are present.
- It allows people to use YAML tags in most cases without problems, but if they really want to be super-safe they would need to use explicit %TAG-based typing.
- It provides a model for Brian's notion that the Application is the final authority on what each node's tag is; that is, the proposal formalizes ambiguity.
- It incorporates, for the first time, a rationalization of how implicit typing should be done, which is still poorly defined and explained in the specification.

First, let me review/define the types of 'serialization' tags:

- Global tags are those that are globally unique. Traditionally, these have been URIs; that is, they start with a word followed by a colon and use only URI characters. Strictly speaking, Perl::Packages happen to match this production, so they could also be considered global even though they are not URIs.
- Private tags are those that have meaning only local to a given processing environment. They are convenient to use, but may conflict with other uses.
  Therefore, they should be used carefully, but in 99% of cases there just isn't a problem with collisions.
- Magical tags are those which are explicitly provided, but happen to be neither Global nor Private. Magic tags are not strictly necessary, since a combination of global and private tags would suffice for many purposes.
- Missing tags are those that are not provided in the YAML syntax. These have traditionally been called "implicit" tags, but please use "missing" instead, as it is far clearer.

Then we define a process, called 'Cooking', which is done by the parser and is purely a syntax-only operation on a Document's tags. The cooking process uses the %TAG directive to change magical tags into either Global tags or Ambiguous tags (defined below). This is done without any application involvement and is completely defined by the YAML specification.

- Ambiguous tags are Magical tags which do not become 'Global' during the cooking process. Missing tags are also Ambiguous, and the Cooking process gives them the following names:

    plain scalar     -> !implicit-plain
    non-plain scalar -> !implicit-scalar
    mapping          -> !implicit-mapping
    sequence         -> !implicit-sequence

Therefore, after the 'Cooking' process every node has a non-empty tag, which is either Global, Private, or Ambiguous. While it is not strictly necessary to give mappings and sequences non-empty tags, it is done for consistency.

Then we have another process, called 'Resolution', which converts Ambiguous tags into either Global or Private tags. Unlike cooking, this is an application-directed process, probably carried out by the YAML Processor via given instructions. The information used by the resolution process is restricted to that provided in the YAML Representational Model. In particular, 'Resolution' should be viewed as a transformation of the YAML graph: the result of resolution _is_ a different YAML document, albeit one that will typically be directly related to the source document plus schema information.
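The Cooking step is mechanical enough to sketch. The Python below is a hypothetical model, not code from any YAML implementation. It assumes that a tag-specifier arrives with its leading '!', that '!!name' marks a private tag, that "a word followed by a colon" marks a global tag, that a handle is written 'handle^name' (as in the !cce^int example in the tables), and that a %TAG tagging entity expands to a tag: URI. The class and method names are invented for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# "A word followed by a colon": URIs such as http://yaml.org, and also Perl::Package.
GLOBAL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.+-]*:")


class Cooker:
    """Hypothetical model of #9 cooking: a purely syntactic pass over tags."""

    def __init__(self) -> None:
        self.default: Optional[str] = None  # from '%TAG entity:' (no handle)
        self.handles: Dict[str, str] = {}   # from '%TAG entity: handle'

    def add_tag_directive(self, entity: str, handle: Optional[str] = None) -> None:
        entity = entity.rstrip(":")
        if handle is None:
            self.default = entity
        else:
            self.handles[handle] = entity

    def cook(self, spec: Optional[str], kind: str) -> Tuple[str, str]:
        """Return (category, tag); kind is 'plain', 'scalar', 'mapping' or 'sequence'."""
        if spec is None:                  # missing tag -> ambiguous implicit-*
            return "A", "implicit-" + kind
        body = spec[1:]                   # drop the leading '!'
        if body.startswith("!"):          # '!!name' is private and never cooked
            return "P", body[1:]
        if GLOBAL_RE.match(body):         # already global: passed through
            return "G", body
        if "^" in body:                   # 'handle^name'
            handle, name = body.split("^", 1)
            if handle in self.handles:
                return "G", "tag:%s:%s" % (self.handles[handle], name)
        elif self.default is not None:    # a default namespace applies
            return "G", "tag:%s:%s" % (self.default, body)
        return "A", body                  # magic tag that stays ambiguous
```

Cooking never consults the application; whatever comes out in category "A" is left for Resolution.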
Note that 'Resolution' does not in any way affect Global or Private tags. Thus, one can provide a private or global tag, and no matter how the resolution process is defined, it will be passed through unchanged.

The last stage of processing, 'Recognition', usually happens during loading, where each node's tag is used to "find" an appropriate native data type and construct the appropriate binding. If a tag is not 'recognized' during this process, it is an error.

    states:   { O: Original, C: Cooked, R: Resolved }
    category: { G: Global, P: Private, _: Missing, M: Magic, A: Ambiguous, '*': Depends }

In a more concrete form:

    ---                  # OCR  After-Cooking
    - !http://yaml.org   # GGG  http://yaml.org
    - !Perl::Package     # GGG  Perl::Package
    - !!private          # PPP  private
    -                    # _A*  implicit-plain
    - ''                 # _A*  implicit-scalar
    - !int               # MA*  int
    ...

    %TAG clarkevans.com,2004:   # default namespace
    ---                  # OCR  After-Cooking
    - !http://yaml.org   # GGG  http://yaml.org
    - !Perl::Package     # GGG  Perl::Package
    - !!private          # PPP  private
    -                    # _A*  implicit-plain
    - ''                 # _A*  implicit-scalar
    - !int               # MGG  tag:clarkevans.com,2004:int
    ...

    %TAG clarkevans.com,2004: cce
    ---                  # OCR  After-Cooking                  Resolve?
    - !http://yaml.org   # GGG  http://yaml.org                No
    - !Perl::Package     # GGG  Perl::Package                  No
    - !!private          # PPP  private                        No
    -                    # _A*  implicit-plain                 Yes
    - ''                 # _A*  implicit-scalar                Yes
    - !cce^int           # MGG  tag:clarkevans.com,2004:int    No
    - !int               # MA*  int                            Yes
    ...

Basically, this proposal, which we can call #9 if you wish, is much like #8, except that the default is not private; instead, resolution is the process of:

- check for private matches; if none,
- check for any regex-based matches;
- use matches from tag:yaml.org,2004, namely !str, !map, !seq for the implicit-* tags;
- otherwise, raise an exception.

So, it attempts to blend the 'implicit' mechanism with the !unambiguous tags. If people use !ambiguous tags... well, that's their choice: possibly enough rope so they can do cool things, or perhaps enough rope to hang themselves. But in any event, using ambiguous tags (implicit, or non-private non-global tags) _is_ recognized as a transformation of the YAML document and treated appropriately.

Cheers! Clark
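Combining the name-resolution order from the first bullet (local first, then built-in, then an exception) with the four-step list above, #9 resolution of a single Ambiguous tag could look roughly like this. The function name, the illustrative BUILT_IN subset, and the shapes of the `private` and `rules` arguments are assumptions for the sketch. Only Ambiguous tags are passed in, since resolution leaves Global and Private tags untouched.

```python
import re
from typing import Dict, Optional

YAML = "tag:yaml.org,2004:"  # namespace as written in the #9 message
BUILT_IN = {"str", "int", "float", "bool", "null", "map", "seq"}  # illustrative subset
IMPLICIT_DEFAULTS = {"implicit-plain": "str", "implicit-scalar": "str",
                     "implicit-mapping": "map", "implicit-sequence": "seq"}


def resolve_9(tag: str, value: Optional[str], private: Dict[str, str], rules: list) -> str:
    """Hypothetical #9 resolution of one Ambiguous tag.

    private maps ambiguous names to the application's own tags;
    rules is a list of (compiled regex, tag) pairs for plain scalars.
    """
    if tag in private:                                   # 1. local (private) names first
        return private[tag]
    if tag == "implicit-plain" and value is not None:    # 2. regex-based matches
        for pattern, resolved in rules:
            if pattern.match(value):
                return resolved
    if tag in BUILT_IN:                                  # built-in word types, e.g. !int
        return YAML + tag
    if tag in IMPLICIT_DEFAULTS:                         # 3. !str, !map, !seq fallbacks
        return YAML + IMPLICIT_DEFAULTS[tag]
    raise LookupError("cannot resolve ambiguous tag %r" % tag)  # 4. give up
```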
From: Oren Ben-K. <or...@be...> - 2004-09-04 21:44:51
On Saturday 04 September 2004 23:50, Clark C. Evans wrote: > Ok. Here is a suggestion, call it #9... I like the terminology, but I don't understand the proposal itself, exactly. Can you formalize it a bit (which syntax forms are interpreted as which category of tags)? How does the classification into global, private, and magic interact with %TAG? Have fun, Oren Ben-Kiki
From: David H. <dav...@bl...> - 2004-09-04 23:54:39
Define #7a as follows: - start with #7 - change directives to per-stream instead of per-document as in #8, but allow %YAML at the start of each document for compatibility. - change the separator in %TAG directives to whitespace as in #8: tag-directive ::= "%TAG" WS taggingEntity ":" spec_first [ WS handle ] [ WS comment] - allow any non-space, non-control, ASCII character in a tag. - change the default-default %TAG to "tag:private.yaml.org,2002:" - add !unspecified-plain, !unspecified-scalar, !unspecified-mapping and !unspecified-sequence, with corresponding entries in the tag repository. These are referred to as "unspecified tags". - the YAML version stays at 1.0. Although the cut^paste mechanism is no longer part of the spec and %TAG has been added, there is no advantage in marking YAML documents conforming to the new spec, because they are already marked by the presence of %TAG or the absence of ^. There is nothing preventing a parser from continuing to accept cut^paste temporarily. Technically the addition of !unspecified-* might change how unspecified tags are reported, but that is an issue of API versioning rather than YAML versioning. - make some terminology changes to the spec as described below. Clark C. Evans wrote: > Ok. Here is a suggestion, call it #9. It incorporates several ideas > floating around: > > - It uses the Python/Ruby style of name resolution, as suggested > by T.Onoma and Why. That is, you check for a local (aka private) > package first, next you check built-in packages, and failing that, > an exception is raised. Complicated. Why is it needed? (Note that a particular YAML implementation may do something like this as part of *recognizing* types, but I don't think it's a good idea for resolution.) > - It incorporates David's suggestion of limiting built-in types to > only words (but allowing the '/').
This helps reduce the chance of > collisions, you can be sure that resolution of built-in packages > will always fail if you use names like "Perl::Package" or > "com.company.JavaPackage", etc. #7, #8 and #7a don't need this. > - It also incorporates David's suggestion of using 'implicit-plain' > and 'implicit-not-plain' tags to make implicits easier to grok; > this happens to put some very nice makeup on an ugly wart. #7a also uses this idea (with 4 separate tags as in #9). > - It follows T.Onoma's request that he be able to specify a > private tag that is _not_ subject to default %TAG cooking. #7a allows this. > It makes it possible to _expressly_ disable cooking no matter > what %TAGs are present. I don't see why this is useful; it is not needed to satisfy Onoma's request. > - It allows people to use YAML tags in most cases without problems, > but if they really want to be super-safe they would need > to use explicit %TAG based typing. So do #7, #8 and #7a. > - It provides a model for Brian's notion that the Application > is the final authority on what each node's tag is; that is, > the proposal formalizes ambiguity. To some extent, the application is the final authority on what each node's tag is for *any* possible tagging scheme, because a YAML implementation could allow the application to *change* the tags however it likes before recognition. > - It incorporates, for the first time, a rationalization of > how implicit typing should be done; which is still poorly > defined and explained in the specification. #7a does this to the same extent. > First, let me review/define the types of 'serialization' tags: > > - Global tags are those that are globally unique, traditionally, > these have been URIs; that is, they start with a word followed by a > colon and use only URI characters. Strictly speaking, Perl::Packages > happen to match this production, so they could also be considered > global even though they are not URIs.
> > - Private tags are those that have meaning local only to a given > processing environment. They are convenient to use, but may conflict > with other uses. Therefore, they should be used carefully but, in > 99% of cases, there just isn't a problem with collisions. In #7a, private tags are just tags in the "tag:private.yaml.org,2002:" domain. Since that is what they are in the current spec, we get an additional API compatibility advantage for free. > - Magical tags are those which are explicitly provided, but happen > to not be Global nor Private. It is not necessary that magic > tags be used; as a combination of global or private tags would > suffice for many purposes. A string in a YAML document that specifies a tag is not a tag. <http://www.usc.edu/schools/annenberg/asc/projects/comm544/library/images/336.html> We should probably call it something like a tag-specifier. > - Missing tags are those that are not provided in the YAML > syntax. These have traditionally been called "implicit" tags, > but please use "missing" instead, as it is far more clear. "Unspecified tag" sounds clearer to me, and fits in with "tag-specifier" if we use that. (If a node does not have a tag-specifier, the result of parsing is an unspecified tag.) > Then, we define a process, called 'Cooking', which is done by the parser > and is purely a syntax-only operation on a Document's tags. The cooking > process uses the %TAG directive to change magical tags into either > Global tags, or Ambiguous tags (defined below). This is done without > any application involvement and is completely defined by the YAML > specification. > > - Ambiguous tags are Magical tags which do not become 'Global' during > the cooking process. #7a has no ambiguous tags in this sense, and does not need them.
> They are also Missing tags, with the following > names (provided by the Cooking process): > > plain scalar -> !implicit-plain > non-plain scalar -> !implicit-scalar > mapping -> !implicit-mapping > sequence -> !implicit-sequence #7a has these (named !unspecified-*), but apart from being treated differently by a resolver, they are just ordinary tags in the YAML domain. > Therefore, the result of the 'Cooking' process is a non-empty > tag having either Global, Private, or Ambiguous tags. While > it is not strictly necessary to give mappings and sequences > non-empty tags, it is done for consistency. > > Then, we have another process, called 'Resolution' converts Ambiguous > tags into either Global or Private tags. In #7a, resolution converts unspecified tags (only) into specified tags. An application is free to do other transformations on a graph that may contain unspecified tags, but only the process of converting unspecified -> specified is called resolution. Resolution is not required to specify *all* tags (for example, tags within a document extension that is not recognised by the application may remain unspecified). > Unlike cooking, this is an > application-directed process; probably carried out by the YAML Processor > via given instructions. The information used by the resolution process > is restricted to that provided in the YAML Representational Model. In > particular, 'Resolution' should be viewed as a transformation of the > YAML graph, the result of resolution _is_ a different YAML document, > albeit one that will typically be directly related to the source > document plus schema information. Note that 'Resolution' does not in > any way affect Global nor Private tags. Thus, one can provide a private > or global tag, and no matter how the resolution process is defined, it > will be passed through unchanged. 
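Under #7a, resolution is narrower: it only turns unspecified tags into specified ones and passes every other tag through untouched. A rough Python sketch, assuming the unspecified-* tags are reported as ordinary yaml.org tags (tag:yaml.org,2002:unspecified-plain and so on) and using a deliberately simplified, hypothetical set of plain-scalar rules rather than the repository's real ones:

```python
import re
from typing import Optional

YAML = "tag:yaml.org,2002:"

# Hypothetical, deliberately simplified implicit-typing rules for plain scalars.
PLAIN_RULES = [
    (re.compile(r"^[-+]?[0-9]+$"), YAML + "int"),
    (re.compile(r"^[-+]?[0-9]*\.[0-9]+$"), YAML + "float"),
    (re.compile(r"^(?:~|null|)$"), YAML + "null"),
]

# Non-plain unspecified tags fall back to the generic kinds.
FALLBACK = {
    YAML + "unspecified-scalar": YAML + "str",
    YAML + "unspecified-mapping": YAML + "map",
    YAML + "unspecified-sequence": YAML + "seq",
}


def resolve(tag: str, value: Optional[str] = None) -> str:
    """Turn an unspecified-* tag into a specified one; leave all others alone."""
    if tag == YAML + "unspecified-plain":
        for pattern, resolved in PLAIN_RULES:
            if value is not None and pattern.match(value):
                return resolved
        return YAML + "str"
    return FALLBACK.get(tag, tag)  # specified tags pass through unchanged
```

This particular resolver happens to specify every unspecified tag; #7a would equally allow one that leaves some of them unspecified.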
> The last stage of processing, 'Recognition' usually happens during > loading, where each node's tag is used to "find" an appropriate native > data type and construct the appropriate binding. If a tag is not > 'recognized' during this process, it is an error. All this is the same between #7a and #9. -- David Hopwood <dav...@bl...>
From: Clark C. E. <cc...@cl...> - 2004-09-05 00:24:52
On Sun, Sep 05, 2004 at 12:54:29AM +0100, David Hopwood wrote: | - change the default-default %TAG to "tag:private.yaml.org,2002:" Hmm. | - add !unspecified-plain, !unspecified-scalar, !unspecified-mapping | and !unspecified-sequence, with corresponding entries in the tag | repository. These are referred to as "unspecified tags". I love unspecified. Wonderful word. | Clark C. Evans wrote: | >Ok. Here is a suggestion, call it #9. It incorporates several ideas | >floating around: | > | > - It uses the Python/Ruby style of name resolution, as suggested | > by T.Onoma and Why. That is, you check for a local (aka private) | > package first, next you check built-in packages, and failing that, | > an exception is raised. | | Complicated. Why is it needed? (Note that a particular YAML implementation | may do something like this as part of *recognizing* types, but I don't | think it's a good idea for resolution.) I'm viewing resolution as a 'transformation' of the YAML document, from one representation to another. This is quite unavoidable given the ambiguity in plain scalar implicit typing rules. We already have ambiguity; I've resigned myself to it. I just want to make it manageable. Your notion of assigning 'unassigned' tags makes this very workable. Rather than fight the process, why not just structure it, and put bounds around it so that a 'schema' or some other formal mechanism can be provided later that formalizes the operation. | > - It provides a model for Brian's notion that the Application | > is the final authority of what each node's tag is; that is, | > the proposal formalizes ambiguity. | | To some extent, the application is the final authority on what each node's | tag is for *any* possible tagging scheme, because a YAML implementation | could allow the application to *change* the tags however it likes before | recognition. This process of changing tags is what I'm calling 'resolution'.
| >First, let me review/define the types of 'serialization' tags: | > | > - Global tags are those that are globally unique, traditionally, | > these have been URIs; that is, they start with a word followed by a | > colon and use only URI characters. Strictly speaking, Perl::Packages | > happen to match this production, so they could also be considered | > global even though they are not URIs. | > | > - Private tags are those that have meaning local only to a given | > processing environment. They are convenient to use, but may conflict | > with other uses. Therefore, they should be used carefully but, in | > 99% of cases, there just isn't a problem with collisions. | | In #7a, private tags are just tags in the "tag:private.yaml.org,2002:" | domain. Since that is what they are in the current spec, we get an | additional API compatibility advantage for free. I admit, this is nice. Hmm. | > - Magical tags are those which are explicitly provided, but happen | > to not be Global nor Private. It is not necessary that magic | > tags be used; as a combination of global or private tags would | > suffice for many purposes. | | A string in a YAML document that specifies a tag is not a tag. | We should probably call it something like a tag-specifier. Perfect. | > - Missing tags are those that are not provided in the YAML | > syntax. These have traditionally been called "implicit" tags, | > but please use "missing" instead, as it is far more clear. | | "Unspecified tag" sounds clearer to me, and fits in with "tag-specifier" | if we use that. (If a node does not have a tag-specifier, the result of | parsing is an unspecified tag.) +1 | >Then, we define a process, called 'Cooking', which is done by the parser | >and is purely a syntax-only operation on a Document's tags. The cooking | >process uses the %TAG directive to change magical tags into either | >Global tags, or Ambiguous tags (defined below).
This is done without | >any application involvement and is completely defined by the YAML | >specification. | > | > - Ambiguous tags are Magical tags which do not become 'Global' during | > the cooking process. | | #7a has no ambiguous tags in this sense, and does not need them. I'm willing to go with #7a, but the idea of ambiguous tags hel I'm not sure | | > They are also Missing tags, with the following | > names (provided by the Cooking process): | > | > plain scalar -> !implicit-plain | > non-plain scalar -> !implicit-scalar | > mapping -> !implicit-mapping | > sequence -> !implicit-sequence | | #7a has these (named !unspecified-*), but apart from being treated | differently by a resolver, they are just ordinary tags in the YAML domain. Well, T. and others have talked quite a bit about their schema changing tags. Really, this process is _just_ a generic transformation; in that way #9 is too limiting, since one may even want to add nodes, etc. | > Therefore, the result of the 'Cooking' process is a non-empty | > tag having either Global, Private, or Ambiguous tags. While | > it is not strictly necessary to give mappings and sequences | > non-empty tags, it is done for consistency. | > | >Then, we have another process, called 'Resolution' converts Ambiguous | >tags into either Global or Private tags. | | In #7a, resolution converts unspecified tags (only) into specified tags. | | An application is free to do other transformations on a graph that | may contain unspecified tags, but only the process of converting | unspecified -> specified is called resolution. Resolution is not required | to specify *all* tags (for example, tags within a document extension that | is not recognised by the application may remain unspecified). Ok. So one could reasonably define 'resolution' as just fixing up unspecified tags. And then something else which converts private tags to other private tags is a transform. Thanks David. It is a big help having you throw up other alternatives.
Cheers! Clark

-- 
Clark C. Evans, Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550, mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
- Research Exchange Database
- Survey & Assessment Technologies
- Software Tools for Researchers
From: Clark C. E. <cc...@cl...> - 2004-09-05 01:00:01
David, Three comments: - The key insight in your proposal is that missing tags can be modeled as a "syntax shorthand" for unspecified-*. This makes the whole definition of tag resolution quite beside the point. - Both your and my proposals are too complicated; we are trying to 'define' a stage after parsing and before 'tag recognition'. However, this stage is simple to describe: it is a YAML Transform. An application is free to do _anything_ it wishes as long as it follows the YAML Representation Model. In fact, it may choose to do _nothing_, leave the !unspecified tags as they are, and build a digital signature. Or it may wish to do quite a bit in this transform before loading: adding default values, re-arranging the graph to upgrade from an older version, etc. - I don't like the idea of a default-default tag. I'd much rather say that tag-specifiers like !! are simply not subject to default tag globalization. Also, if I put !tag in my document, I expect it to tell me "tag" in my model, and it would be really unexpected for things to be cooked without me explicitly asking for it. Having to turn this off sucks. My proposal is better here *wink* - We should take most of what you wrote and add it as an 'informational' appendix explaining a 'recommended' default transformation between the parse result and the graph that is eventually recognized and then loaded. Cheers! Clark
From: Oren Ben-K. <or...@be...> - 2004-09-05 05:32:18
David, thanks for merging #7 with the "1.1" features. I was going to do it but you beat me to the punch. Nicely done. On Sunday 05 September 2004 03:59, Clark C. Evans wrote: > - The key insight in your proposal is that missing tags can be > modeled as a "syntax shorthand" for unspecified-*. This makes > the whole definition of tag resolution quite beside the point. Clark, the whole extra-transformation step is, IMVHO, an unnecessary complication. Yes, it is nice saying that unspecified tags are treated as if written "!unspecified-<stuff>", and that "tag resolution" transforms these into some other tags. Call this "tag specification" instead of "resolution" and everything is just fine. But the way I see it this is a minor wording change in the spec (instead of talking about a wart-ish 'plain scalar' bit). It's a far cry from inventing a whole new way for the parser and the application to interact and doing generic application-driven transformations as a standard part of YAML processing! That's completely uncalled for; I didn't see any tangible advantage coming out of this added complication. I see plenty of potential for confusion, though. > - Both your and my proposals are too complicated, #9 sure is. #7 (#7a to incorporate the '1.1-ish' features) is as simple as it gets. One simple syntactical mechanism, period. > we are trying to > 'define' a stage after parsing and before 'tag recognition'. _You_ are. Needlessly (IMVHO). > - I don't like the idea of a default-default tag. I'd much rather say > that tag-specifiers like !! are simply not subject to default tag > globalization. The whole _point_ of #7 is that they _are_. So, in the context of #7, David's proposal of a default-default %TAG makes tons of sense, and as he points out, it increases backward compatibility again. David makes an excellent point that #7(a) doesn't require going to YAML 1.1 since it is so backward compatible with the current state of affairs.
The only serious incompatibility is the new directive syntax, and the fact that we stick the directives at the start of the document. However, as he pointed out, this is trivially auto-detected. Since %TAG is a new directive, we could simply deprecate the use of old-style %YAML:1.0 tags and be done. No need for migrating to %YAML 1.1! One more advantage of the #7 proposal: it's simply the simplest _and_ most backward-compatible choice on the table. And if the extra "!" is that bad, well, there's always #8. I really don't see the point of #9. > - We should take most of what wrote and add it as a 'informational' > appendix The need for such an appendix is a clear indication of an overly complex proposal. The spec should stand on its own. Continuing the polishing of #7, I suggest that the spec say that: - Tags starting with '!scheme:' _should_ be valid URIs under that scheme. This would allow a parser to emit a warning if someone writes "!Bit::Vector" instead of "!!Bit::Vector". - Tags starting with "![^!]" _should_ be valid tags defined in the yaml.org repository. This "future-proofs" our spec. As long as we don't ever use, say, "!foo$bar" in the yaml.org repository, we'll be free to use this syntax in YAML 1.1 to employ some yet-unforeseen cooking mechanism. Have fun, Oren Ben-Kiki
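Oren's two "should" rules lend themselves to parser warnings rather than errors. A minimal Python sketch, assuming a small hypothetical list of known URI schemes and an illustrative subset of yaml.org repository names (neither list comes from the spec; the function name is invented):

```python
import re
from typing import List

# Illustrative subset of yaml.org repository names (assumption, not the full list).
YAML_REPOSITORY = {"str", "int", "float", "bool", "null", "map", "seq", "binary", "timestamp"}
# URI schemes this sketch treats as known (assumption).
KNOWN_SCHEMES = {"tag", "http", "https", "urn", "ftp", "mailto"}
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def lint_tag_specifier(spec: str) -> List[str]:
    """Return warnings for a tag-specifier under the two 'should' rules."""
    if spec.startswith("!!"):
        return []                     # private tags are not checked
    body = spec[1:]
    m = SCHEME_RE.match(body)
    if m:                             # '!scheme:...' should be a valid URI
        if m.group(1).lower() not in KNOWN_SCHEMES:
            return ["%s: not a URI under a known scheme; did you mean '!%s'?" % (spec, spec)]
        return []
    if body not in YAML_REPOSITORY:   # '![^!]...' should be a yaml.org tag
        return ["%s: not defined in the yaml.org repository" % spec]
    return []
```

For example, "!Bit::Vector" draws a warning suggesting "!!Bit::Vector", while "!int" and "!!Bit::Vector" pass silently.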
From: Clark C. E. <cc...@cl...> - 2004-09-05 06:44:34
On Sun, Sep 05, 2004 at 08:32:12AM +0300, Oren Ben-Kiki wrote: | Clark, the whole extra-transformation step is, IMVHO, an unnecessary | complication. Yes, it is nice saying that unspecified tags are treated | as if written "!unspecified-<stuff>", and that "tag resolution" | transforms these into some other tags. The whole extra transformation step is what everyone else here is talking about. It's where they convert their implicit tags into ones that their environment can bind. It needs to be discussed in the proposal regardless of whether the language makes it into the spec; that's a _different_ issue entirely. | But the way I see it this is a minor wording change in the spec | (instead of talking about a wart-ish 'plain scalar' bit). Actually, it impacts one of the diagrams to make it simpler. The notion of 'tag resolution' is just gone. This isn't a small wording change. | #9 sure is. #7 (#7a to incorporate the '1.1-ish' features) is as | simple as it gets. One simple syntactical mechanism, period. Agreed that #9 seemed complicated. I was not clear that the last half of the proposal was completely in "application" land and not in the YAML spec. I hope I made this clear in #9a. | > - I don't like the idea of a default-default tag. I'd much rather say | > that tag-specifiers like !! are simply not subject to default tag | > globalization. | | The whole _point_ of #7 is that they _are_. So, in the context of #7, | David's proposal of a default-default %TAG makes tons of sense, and | as he points out, it increases backward compatibility again. Having !sometag reported to my application as tag:private.yaml.org,2002:sometag is really hackish. I put in !sometag, I should get 'sometag'. | The need for such an appendix is a clear indication of an overly complex | proposal. The spec should stand on its own. Once again, it isn't clear from reading the spec how one would 'resolve' unspecified-mapping into a Python 'dict'. Perhaps this isn't necessary. Best, Clark
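One way to picture that last step in Python: once resolution has turned unspecified-mapping into a specified tag (tag:yaml.org,2002:map is assumed here), recognition is just a lookup from tag to native constructor, and an unrecognized tag is an error, matching the 'Recognition' stage described earlier in the thread. The registry and the recognize() function are hypothetical, not part of any spec or library.

```python
from typing import Any, Callable, Dict

YAML = "tag:yaml.org,2002:"

# Hypothetical registry from resolved tags to Python constructors.
CONSTRUCTORS: Dict[str, Callable[[Any], Any]] = {
    YAML + "str": str,
    YAML + "int": int,
    YAML + "float": float,
    YAML + "map": dict,   # e.g. a resolved unspecified-mapping
    YAML + "seq": list,
}


def recognize(tag: str, value: Any) -> Any:
    """Bind a node to a native Python value; unknown tags are an error."""
    constructor = CONSTRUCTORS.get(tag)
    if constructor is None:
        raise LookupError("unrecognized tag: " + tag)
    return constructor(value)
```

Which tags map to which constructors is language- and API-specific, which is why it sits outside the tagging proposals themselves.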
From: David H. <dav...@bl...> - 2004-09-05 18:14:38
Clark C. Evans wrote: > On Sun, Sep 05, 2004 at 08:32:12AM +0300, Oren Ben-Kiki wrote: > | Clark, the whole extra-transformation step is, IMVHO, an unnecessary > | complication. Yes, it is nice saying that unspecified tags are treated > | as if written "!unspecified-<stuff>", and that "tag resolution" > | transforms these into some other tags. > > The whole extra transformation step is what everyone else here is > talking about. !unspecified-<stuff> is independent of %TAG. It's almost a coincidence that we're discussing them at the same time. > | > - I don't like the idea of a default-default tag. I'd much rather say > | > that tag-specifiers like !! are simply not subject to default tag > | > globalization. > | > | The whole _point_ of #7 is that they _are_. So, in the context of #7, > | David's proposal of a default-default %TAG makes tons of sense, and > | as he points out, it increases backward compatibility again. > > Having !sometag reported to my application as > tag:private.yaml.org,2002:sometag is really hackish. I put in !sometag, > I should get 'sometag'. There is a way to have "!!sometag" produce "sometag", but I hesitate to suggest yet another proposal so quickly. Oh well, never mind: - start with #7a - change %TAG to %PRE, and allow the prefix to be "". Include the scheme name in the prefix, i.e. no implicit "tag:" - I also suggest swapping the handle and prefix in %PRE, because I think that's easier to read. A ! separator is needed to avoid ambiguity between an empty prefix and an empty handle. More precisely, pre-directive ::= "%PRE" WS [handle "!" [WS]] [prefix] [WS comment] handle ::= [a-zA-Z0-9_]* - the directive "%PRE prefix" (no !) changes the prefix used when there is no ! or : in a tag-specifier. By default this is "tag:yaml.org,2002:". - the default-default prefix is changed back to "", i.e. equivalent to "%PRE !". Let's refer to this as #7b. 
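The #7b rules are precise enough to sketch in Python and check against David's examples. This is a hypothetical model, not a reference implementation. The class and method names are invented, a trailing comment is assumed to begin with whitespace followed by '#', and the "possibly invalid URI" warning uses an assumed list of known schemes.

```python
import re
from typing import Dict, Optional, Tuple

HANDLE_RE = re.compile(r"^([A-Za-z0-9_]*)!(.*)$")  # handle ::= [a-zA-Z0-9_]*, then "!"
KNOWN_SCHEMES = {"tag", "http", "https", "urn", "ftp", "mailto"}  # assumption


class PreDirectives:
    """Hypothetical model of proposal #7b's %PRE directive and cooking rules."""

    def __init__(self) -> None:
        # Prefix for tag-specifiers with neither '!' nor ':' (e.g. '!int').
        self.default_prefix = "tag:yaml.org,2002:"
        # Handle -> prefix. The empty handle serves '!!name'; its
        # default-default is "", i.e. equivalent to "%PRE !".
        self.handles: Dict[str, str] = {"": ""}

    def apply(self, line: str) -> None:
        """Apply one '%PRE ...' directive line."""
        parts = line.split(None, 1)
        if not parts or parts[0] != "%PRE":
            raise ValueError("not a %PRE directive: " + line)
        arg = parts[1] if len(parts) > 1 else ""
        arg = re.sub(r"(^|\s+)#.*$", "", arg).strip()  # drop a trailing comment
        m = HANDLE_RE.match(arg)
        if m:                                  # "%PRE handle! prefix"
            self.handles[m.group(1)] = m.group(2).strip()
        else:                                  # "%PRE prefix"
            self.default_prefix = arg

    def cook(self, spec: str) -> Tuple[str, Optional[str]]:
        """Return (tag, warning) for a tag-specifier such as '!int'."""
        if not spec.startswith("!"):
            raise ValueError("tag-specifier must start with '!': " + spec)
        body = spec[1:]
        m = HANDLE_RE.match(body)
        if m:                                  # '!!name' or '!handle!name'
            handle, suffix = m.groups()
            if handle not in self.handles:
                raise KeyError("undefined handle %r" % handle)
            return self.handles[handle] + suffix, None
        if ":" in body:                        # taken verbatim as a URI
            scheme = body.split(":", 1)[0].lower()
            return body, (None if scheme in KNOWN_SCHEMES else "possibly invalid URI")
        if "!" in body:
            raise ValueError("malformed tag-specifier: " + spec)
        return self.default_prefix + body, None
```

For instance, after applying the four %PRE lines from the second example, cook("!xsd!decimal") returns ("http://www.w3.org/2001/XMLSchema#decimal", None).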
Examples:

    ---
    - !int                       # tag:yaml.org,2002:int
    - !!foo                      # foo
    - !!Bit::Vector              # Bit::Vector
    - !!java.lang.Exception      # java.lang.Exception
    - !Bit::Vector               # Bit::Vector (warning: possibly invalid URI)
    - !java.lang.Exception       # tag:yaml.org,2002:java.lang.Exception
    - !tag:example.com,2004:bar  # tag:example.com,2004:bar
    - !xsd!decimal               # error, undefined handle 'xsd'

    %PRE tag:notyaml.org,2004:
    %PRE ! tag:example.org,2004:
    %PRE xsd! http://www.w3.org/2001/XMLSchema#
    %PRE priv!
    ---
    - !int                       # tag:notyaml.org,2004:int
    - !!foo                      # tag:example.org,2004:foo
    - !!Bit::Vector              # tag:example.org,2004:Bit::Vector
    - !!java.lang.Exception      # tag:example.org,2004:java.lang.Exception
    - !Bit::Vector               # Bit::Vector (warning: possibly invalid URI)
    - !java.lang.Exception       # tag:notyaml.org,2004:java.lang.Exception
    - !tag:example.com,2004:bar  # tag:example.com,2004:bar
    - !xsd!decimal               # http://www.w3.org/2001/XMLSchema#decimal
    - !priv!mytag                # mytag

Effects:
- loses the backward compatibility advantage of reporting private tags as "tag:private.yaml.org,2002:..."
- removes the ugliness of reporting private tags as "tag:private.yaml.org,2002:..."
- the tag: URI scheme no longer has any special status (although it would still be recommended)
- tags are no longer necessarily URI references
- referring to private tags after globalization is still possible
- essentially as simple as #7a.

> | The need for such an appendix is a clear indication of an overly complex
> | proposal. The spec should stand on its own.
>
> Once again, it isn't clear from reading the spec how one would
> 'resolve' unspecified-mapping into a python 'dict'. Perhaps this
> isn't necessary.

I don't think it's necessary. The spec is language-independent; this mapping is language- and API-dependent. It's the kind of thing that should go in an "Implementors' Guidelines" document, not the spec.

David Hopwood <dav...@bl...>
From: Clark C. E. <cc...@cl...> - 2004-09-05 07:00:53
On Sun, Sep 05, 2004 at 12:54:29AM +0100, David Hopwood wrote:
| - change the default-default %TAG to "tag:private.yaml.org,2002:"

I'm not sure how this affects things.

    ---
    - !int JMP

is cooked to,

    ---
    - !tag:private.yaml.org,2002:int JMP

I thought this was backwards compatible? Thanks. Clark
From: David H. <dav...@bl...> - 2004-09-05 18:17:28
|
Clark C. Evans wrote:
> On Sun, Sep 05, 2004 at 12:54:29AM +0100, David Hopwood wrote:
> | - change the default-default %TAG to "tag:private.yaml.org,2002:"
>
> I'm not sure how this affects things.
>
> ---
> - !int JMP
>
> is cooked to,
>
> ---
> - !tag:private.yaml.org,2002:int JMP
>
> I thought this was backwards compatible?

No, in #7a (and #7 and #7b), the default prefix only affects tag-specifiers
that start with "!!". So "!int" still gets cooked to "tag:yaml.org,2002:int".

--
David Hopwood <dav...@bl...>
|
From: Clark C. E. <cc...@cl...> - 2004-09-05 00:12:48
|
summary:

  This is the ninth-pass draft, based on the sixth-pass draft, and
  incorporates what I imagine to be Brian's way of thinking about
  this sort of stuff. This draft incorporates:

  - David's notion of !implicit-plain and !implicit-scalar
  - Onoma/Why suggestion of local then built-in resolution order
  - Oren/Clark's need for an explicit model of what's going on
  - Onoma's recognition that private types are different from
    ambiguous types (although he may not have said it that way).

  In addition, this draft assumes "YAML 1.1" (see Clark's post).
  In a nutshell, this means:

  - YAML version bumped to 1.1 to reflect incompatible changes.
  - Directives per stream instead of per document.
  - One directive per line (with optional trailing comment).
  - Use spaces in directives.

  # note: None of this has been approved by Brian yet. Also, the
  # YAML 1.1 notion has not received any feedback yet. It isn't
  # crucial for this proposal, though.

  One method of typing is to use 'local' tags in your YAML document,
  which specify that a tag's binding does not come from the set of
  globally unique names. In essence, such a tag only names a type
  specific to a particular application or language and is not intended
  for sharing. For small teams or controlled contexts, this is perfectly
  reasonable; many programming languages work this way, so these sorts
  of types need to be supported.

  The 'global' method of typing uses URIs, in particular tagURIs
  where possible, to identify a particular data type. This method
  is needed in situations where work is spread across different
  teams, platforms, languages, and applications. Agreement is often
  hard to get without explicit mechanisms to prevent name collision.

  Since 'global' typing is often painful to write, we introduce a
  syntax-only mechanism, %TAG. This mechanism allows a tagURI to be
  broken into two chunks: the first (longer) half of the tag, containing
  the taggingEntity, is moved up into the declaration and given a
  handle.
  The second (shorter) half is then written in each tag, together with
  the handle that links it to the longer half. The combining of the
  parts is done by the parser, so the application always sees full
  tagURIs.

  Since tags are often tedious to write, they are frequently implied by
  the context or by a regular expression, omitted altogether (which we
  call 'missing'), or given short-hand forms which cannot be interpreted
  by a YAML Processor on its own. We call these tags "ambiguous" tags.
  An ambiguous tag is always rewritten into a global or local tag during
  tag 'resolution' by the application, before the given node is loaded
  into a native data structure.

  This proposal introduces 2 mechanisms for 'fixing-up' tags, and five
  classes of tags that are involved. The first mechanism, called
  "cooking", is done by the parser without any feedback from the
  application. The second mechanism, called tag resolution, is done by
  the application before loading into memory. The first pass, cooking,
  yields three types of tags: 'local', 'global', and 'ambiguous'. The
  second pass converts all 'ambiguous' tags into 'local' or 'global'
  tags.

  The second fix-up, resolution, is considered by the YAML
  specification to be a "transformation" from one YAML document to
  another. Thus, this transformation should only use information that
  is in the Representation Model (including where the node is, its
  kind, and its tag). In particular, styles, key order, and other
  information should not be used in tag resolution.

syntax:

  - We open up the !tag production to allow URI characters, but _only_
    the uric characters from RFC 2396, additionally allowing '[' and
    ']' since these characters are used in IPv6 addresses.

    In particular, characters forbidden from tags include | \ ^ {} `
    and non-printable characters. While the ^ character may occur
    in a !tag appearing in a YAML document, the ^ character is
    magical and is not considered part of the tag.
  - We introduce a new directive, %TAG, which provides a way to shorten
    the data entry of tagURIs. In particular,

      declaration := "%TAG" WS taggingEntity ":" spec_first [ WS handle ]

    where 'taggingEntity' refers to the same production in the tagURI
    specification and WS is white space. The taggingEntity is either
    a domain or an email address followed by the minting date; see the
    tagURI specification for details. The 'spec_first' refers to zero
    or more non-space characters (it is optional).

    The 'handle' refers to a sequence of one or more word characters
    [a-zA-Z0-9_]. The handle may also be omitted; this case is called
    the 'default prefix', and the handle is considered to be the empty
    string ''. In a YAML document, each handle must be unique via
    string comparison.

  - We extend the !tag syntax production to allow a single '^'
    character, which is otherwise forbidden (see above). The syntax for
    this special case is:

      taguri := '!' handle '^' spec_second

    In this circumstance, the 'handle' _must_ appear as a handle in one
    of the stream's directives. The 'spec_second' is zero or more
    non-space characters, with the restriction that spec_first or
    spec_second (or both) must be at least one character long.

  - Tags from the YAML type library are hereby limited to a single
    word. This limitation allows common tags to be used in the
    'ambiguous' case alongside other tag sets that have a custom
    'resolution' mechanism (aka schema), while significantly lessening
    the chance of conflict.

cooking rules:

  - For every special tag having a '^', the parser will join the
    information specified in the declaration with the node's tag;
    such nodes will be treated as if they had been tagged:

      cooked := "!tag:" taggingEntity ":" spec_first spec_second

    Note that the 'handle' is not included in this information; it is
    considered a detail of the Presentation model, and should not
    appear in tools that work only with the Serialization or
    Representation models.
    Thus, the 'handle' is _not_ part of the core YAML information
    model; it is merely a syntax-level trick to ease the burden of
    typing and human reading.

    Also note that while other URI schemes may appear in a tag, this
    cooking mechanism purposefully constructs tagURIs; that is,
    globally unique identifiers lacking protocol or access semantics.

  - Missing tags are also cooked. If the node's tag is not provided,
    then the cooking process provides a default 'missing' tag:

      mapping      -> !missing-mapping
      sequence     -> !missing-sequence
      plain scalar -> !missing-plain
      other scalar -> !missing-scalar

    Note that this cooking is done as part of the 'parse' process,
    and that it is impossible to distinguish between a missing tag
    and one that was explicitly given the corresponding missing-* tag.
    After this cooking, these tags are said to be 'ambiguous'.

    In particular, the Serialization and Representation models DO NOT
    have a flag that distinguishes between plain and other scalar
    forms. Other than this magical cooking process (which is entirely
    syntactic short-hand), the difference between a plain scalar and a
    quoted or block scalar is purely presentational.

  - Tags starting with a '!' (that is, written as !!name) are
    considered "local" types, and are passed through the cooking
    process without further ado.

  - Tags starting with a word followed by a colon are considered
    "global" types, and are also passed through the cooking process.
    Note that "global" tags include URIs, as well as Perl package
    names such as Perl::Package.

  - If the document has a default prefix (a directive with an empty
    handle), then all remaining tags are cooked according to the "^"
    rule above, using the taggingEntity and the spec_first from the
    directive with the empty handle.

    Otherwise, these tags are passed through the cooking process
    untouched. In this case they are called 'ambiguous' tags.

resolution:

  - Resolution is optional. In particular, every YAML document has
    a valid YAML Representation before resolution.
    The resolution process is a transformation from one YAML
    Representation to another. While the result after resolution may
    technically be a different YAML document, it is intended that the
    application maintain 'semantic' equivalence.

  - Only tags which are 'ambiguous' should be affected by the
    resolution process. In particular, global and local tags pass
    through this stage untouched. Resolution is successful when all
    ambiguous tags have been converted to global or local tags.

  - The application has first say on how ambiguous tags are handled.
    However, the application should only take into account information
    from the YAML Representation Model when making its determination;
    that is, it should not use key order, syntax styles, comments,
    directives, or any other presentation or sequential model attribute
    when choosing what tags to modify. Of course, the placement of the
    node in the graph, regular expression analysis, and the actual tag
    name may be used during this transformation.

  - Any 'ambiguous' tags which remain after the application has
    finished its schema-specific resolution follow a standard
    procedure. First, the following missing tags are rewritten:

      - !missing-sequence -> 'tag:yaml.org,2002:seq'
      - !missing-mapping  -> 'tag:yaml.org,2002:map'
      - !missing-scalar   -> 'tag:yaml.org,2002:str'

  - Any remaining 'missing-plain' tags are then matched by the parser
    against the regular expressions of any YAML types from the YAML
    Type Repository it knows about. This is inherently a fuzzy process,
    but a processor should make a good-faith effort to resolve as many
    YAML Types as it can. All 'missing-plain' tags left after this are
    mapped to 'tag:yaml.org,2002:str'.

  - The YAML processor then may choose to match any remaining ambiguous
    tags against types it knows about from the YAML Type Repository.
    In particular, it could choose to map !int to
    'tag:yaml.org,2002:int', or, if the YAML processor doesn't know
    about int, it may just pass.
  - At this point, if any ambiguous tags remain, they are converted to
    local tags. So, !bing becomes !!bing.

recognition:

  - This process happens after resolution, and simply 'looks up' the
    tag requested on each node among the native data types supported
    by the local environment.

  - If any tags are 'unrecognized', they should be reported via a
    warning message, although the YAML Processor may choose to
    continue by loading them as a string, mapping or sequence. This
    is entirely up to the Application.

design:

  - This change set takes into account that !tags are just like
    missing tags; the application should have a say in what they mean.
    However, this say should eventually be formalized into a schema,
    and thus it requires a 'formal underpinning'. This proposal calls
    this process 'resolution' and grounds any such process as a
    standard YAML Transformation. In effect, this allows any YAML
    Schema to be mathematically described. Thus, while the process is
    inherently 'ambiguous' and must be left open to the Application, it
    is properly constrained to fit within the intent of YAML's model.
    In particular, a %SCHEMA or other directive could later be added
    as an optional "formal" specification of how this process should
    be carried out.

  - It is helpful to view an empty tag as simply a syntax short-hand
    for a particular tag. You could almost say that this removes the
    'plain scalar' wart. The information model produced by the parse
    is then formal before resolution is done, and can be used to
    generate an identical document on output. In particular, it even
    allows one to specify on 'emit' which syntax form to use, quoted or
    plain, although this would be a specific transformation.

  - The %TAG shorthand is just a simple syntax trick; it is a
    directive so as to clearly show that magic is about to happen.

  - The "^" character was chosen because it is not included in
    RFC 2396's uric production (aka the tagURI's specific part), and
    it doesn't look like any of our other indicators.
    This character _was_ used for the previous cut^paste mechanism,
    but that mechanism is deprecated.

  - We use the tagURI specification (http://taguri.org) to define the
    unique URIs. This follows previous versions of the YAML spec. The
    tagURI is used because it does not imply access semantics and
    defines an easily 'mint-able' unique identifier.

  - We purposefully named the directive TAG since it corresponds to
    the tagURI. If at a later date we decide on another mechanism, say
    one based on HTTP schema access, we can add a directive for it
    independently and, if appropriate, phase out this one.

  - While this mechanism does not explicitly allow quick usage of YAML
    types, it allows global tags to be used via the long form or with
    a prefix^. The limitation of YAML type tags to a single word allows
    them to be used in ambiguous places with a small risk of collision.

compatibility:

  - This proposal is introduced as part of the YAML 1.1 set of
    changes. Documents explicitly marked as YAML:1.0 must be parsed
    according to the old rules.

  - Documents lacking a %YAML directive should be assumed to be
    YAML 1.1. However, the processor should make a reasonable attempt
    to identify that a YAML 1.0 document is being parsed. When this
    discovery is made "too late", the processor should emit an error
    and abort parsing.

  - This proposal uses the same ^ character as the older cut^paste
    mechanism. This older syntax trick is not compatible with this
    proposal, and is deprecated. During the transition period, we
    recommend that parsers keep the old cut^paste logic, with an
    appropriate warning, unless there is a %TAG directive; in that
    case, the usage above is implied.

  - The magical cooking rules in the core specification are also
    deprecated by this specification. Since the current version of the
    specification allows neither tags "looking like a URI" nor the
    %TAG directive, either of these can be used to identify newer
    style YAML documents.
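The transition rules in this compatibility section can be sketched as a small decision function for how a '^' inside a tag should be read. This is a hypothetical helper, not part of the proposal: the name caret_mode, passing the stream's directives as plain strings, and warning only when neither %YAML:1.0 nor %TAG is present are assumptions drawn from the bullets above.

```python
import warnings


def caret_mode(directives):
    """Say how '^' inside a tag is read during the YAML 1.1 transition.

    `directives` holds the stream's directive lines as strings, e.g.
    "%YAML:1.0" or "%TAG clarkevans.com,2004: cce" (an assumed input form).
    Returns "cut-paste" (old YAML 1.0 meaning) or "handle" (new %TAG meaning).
    """
    declared = {}
    for line in directives:
        # Accept both "%YAML:1.0" (1.0 style) and "%YAML 1.0" (spaces).
        name, _, value = line.replace(":", " ", 1).partition(" ")
        declared[name] = value.strip()
    if declared.get("%YAML") == "1.0":
        return "cut-paste"                  # explicitly 1.0: old rules apply
    if "%TAG" in declared:
        return "handle"                     # a %TAG directive implies new usage
    warnings.warn("'^' in a tag read with the deprecated cut^paste logic")
    return "cut-paste"
```

Note that an explicit "%YAML 1.1" without any %TAG still takes the warning branch in this sketch, because the text names only the %TAG directive as the switch to the new meaning.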
  When read with these semantics:

  - !!private tags remain private and are not subject to cooking; this
    is quite nice, since PyYAML will be completely unaffected by this
    proposal.

  - In the absence of explicit application intervention, !word tags
    get mapped to their corresponding YAML types in the type library,
    if they happen to be available. So most users will be unaffected
    by this change as well.

  - The other tags, which are less frequently used, will require
    parser-specific or application-specific routing; but the proposal
    allows this to happen, so the practical impact should be minimal.

example:

  The following document,

    %TAG bar.com,2004:timesheet/ meet
    --- !tag:baz.com,2004:mixed/list   # global tag
    - event: !meet^meeting             # magic tag
        where: office                  # missing tag
        date: 2004-09-09               # missing tag
        duration: !!int 1:00           # local tag
        text: boring                   # missing tag
      shape: !ellipse                  # magic tag
        width: !float 10               # ambiguous tag
        height: 5                      # missing tag
    ...

  after 'parsing', but before 'resolution', could be serialized as
  follows, with the _same_ Representational Model:

    --- !tag:baz.com,2004:mixed/list                # global tag
    - event: !tag:bar.com,2004:timesheet/meeting    # global tag
        where: !implicit-plain office               # ambiguous tag
        date: !implicit-plain "2004-09-09"          # ambiguous tag
        duration: !!int 1:00                        # local tag
        text: !implicit-plain 'boring'              # ambiguous tag
      shape: !ellipse                               # ambiguous tag
        width: !float "10"                          # ambiguous tag
        height: !implicit-plain '5'                 # ambiguous tag
    ...

  After 'resolution', assuming that the application allowed for the
  'default' processing, we get:

    --- !tag:baz.com,2004:mixed/list                # global tag
    - event: !tag:bar.com,2004:timesheet/meeting    # global tag
        where: !tag:yaml.org,2002:str office        # global tag
        date: !tag:yaml.org,200 "2004-09-09"        # global tag
        duration: !!int 1:00                        # local tag
        text: !tag:yaml.org,2002:str 'boring'       # global tag
      shape: !!ellipse                              # local tag
        width: !tag:yaml.org,2002:float "10"        # global tag
        height: !tag:yaml.org,2002:int '5'          # global tag
    ...
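The cooking and default-resolution steps walked through in this example can be sketched in Python. This is a hypothetical sketch, not part of the proposal: the function names, the (kind, tag, value) node model, and the regular expressions standing in for the YAML Type Repository (including an assumed date pattern for timestamp) are illustrative assumptions. It uses the !missing-* names from the cooking rules where the example writes !implicit-*, and it models only the 'default' processing, leaving out the application's own schema-specific step.

```python
import re

# Assumed node model: (kind, tag, value). kind is "map", "seq", "plain"
# (unquoted scalar) or "scalar" (quoted/block scalar). tag is the text after
# the single leading "!" indicator ("!!int" -> "!int"), or None if missing.
WORD = "[A-Za-z0-9_]"
CARET_TAG = re.compile(rf"({WORD}*)\^(\S*)")      # handle ^ spec_second
GLOBAL_TAG = re.compile(rf"{WORD}+:")             # a word followed by a colon
MISSING = {"map": "missing-mapping", "seq": "missing-sequence",
           "plain": "missing-plain", "scalar": "missing-scalar"}
YAML = "tag:yaml.org,2002:"
DEFAULT_MISSING = {"missing-sequence": YAML + "seq",
                   "missing-mapping": YAML + "map",
                   "missing-scalar": YAML + "str"}
# Illustrative stand-ins for Type Repository regexps (assumptions).
PLAIN_PATTERNS = [(re.compile(r"[-+]?[0-9]+"), YAML + "int"),
                  (re.compile(r"[-+]?[0-9]*\.[0-9]+"), YAML + "float"),
                  (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), YAML + "timestamp")]
KNOWN_WORDS = {"int", "float", "str", "timestamp", "seq", "map"}


def parse_tag_directives(lines):
    """Map each handle ('' = default prefix) to tag:taggingEntity:spec_first."""
    prefixes = {}
    for line in lines:
        parts = re.split(r"\s#", line, maxsplit=1)[0].split()  # drop comment
        if len(parts) not in (2, 3) or parts[0] != "%TAG" or ":" not in parts[1]:
            raise ValueError(f"malformed directive: {line!r}")
        handle = parts[2] if len(parts) == 3 else ""
        if handle in prefixes:
            raise ValueError(f"duplicate handle: {handle!r}")
        prefixes[handle] = "tag:" + parts[1]   # parts[1] is entity:spec_first
    return prefixes


def cook(tag, kind, prefixes):
    """Parser-only cooking; returns (category, cooked tag)."""
    if tag is None:                          # missing: named by node kind
        return "ambiguous", MISSING[kind]
    caret = CARET_TAG.fullmatch(tag)
    if caret:                                # magic: !handle^spec_second
        handle, spec_second = caret.groups()
        if handle not in prefixes:
            raise ValueError(f"undeclared handle: {handle!r}")
        return "global", prefixes[handle] + spec_second
    if tag.startswith("!"):                  # written !!name: local, never cooked
        return "local", tag
    if GLOBAL_TAG.match(tag):                # URIs, Perl::Package, ...
        return "global", tag
    if "" in prefixes:                       # default prefix cooks the rest
        return "global", prefixes[""] + tag
    return "ambiguous", tag


def resolve_default(category, tag, value=""):
    """Standard fallback for tags the application left ambiguous."""
    if category != "ambiguous":
        return tag                           # global and local pass through
    if tag in DEFAULT_MISSING:
        return DEFAULT_MISSING[tag]
    if tag == "missing-plain":
        for pattern, resolved in PLAIN_PATTERNS:
            if pattern.fullmatch(value):
                return resolved
        return YAML + "str"
    if tag in KNOWN_WORDS:                   # single-word YAML library types
        return YAML + tag
    return "!" + tag                         # leftovers go local: !bing -> !!bing


if __name__ == "__main__":
    prefixes = parse_tag_directives(["%TAG bar.com,2004:timesheet/ meet"])
    nodes = [("seq", "tag:baz.com,2004:mixed/list", ""),
             ("map", "meet^meeting", ""), ("plain", None, "office"),
             ("plain", None, "2004-09-09"), ("plain", "!int", "1:00"),
             ("plain", None, "boring"), ("map", "ellipse", ""),
             ("plain", "float", "10"), ("plain", None, "5")]
    for kind, tag, value in nodes:
        print(tag, "->", resolve_default(*cook(tag, kind, prefixes), value))
```

Under this model the outputs line up with the resolved document above, with local tags printed without their leading '!' indicator (so '!int' stands for !!int and '!ellipse' for !!ellipse). The date resolves to tag:yaml.org,2002:timestamp only because of the assumed date pattern. Passing "%TAG mydefault,2002:" instead makes cook() turn 'ellipse' and 'float' into global tags while missing tags stay ambiguous, as in the last example.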
Of course, then during recognition, '!ellipse' and "!int" would have
to be available, or it'd be an exception.

Alternatively, if the application wanted to rewrite the 'ambiguous'
tag 'ellipse', it could have output:

    --- !tag:baz.com,2004:mixed/list                # global tag
    - event: !tag:bar.com,2004:timesheet/meeting    # global tag
        where: !tag:yaml.org,2002:str office        # global tag
        date: !tag:yaml.org,200 "2004-09-09"        # global tag
        duration: !!int 1:00                        # local tag
        text: !tag:yaml.org,2002:str 'boring'       # global tag
      shape: !http://somewhere.tld/bing/elipse      # global tag
        width: !tag:yaml.org,2002:float "10"        # global tag
        height: !tag:yaml.org,2002:int '5'          # global tag
    ...

If the original document had a _default_

    %TAG mydefault,2002:

then after parsing one would have gotten:

    --- !tag:baz.com,2004:mixed/list                # global tag
    - event: !tag:bar.com,2004:timesheet/meeting    # global tag
        where: !implicit-plain office               # ambiguous tag
        date: !implicit-plain "2004-09-09"          # ambiguous tag
        duration: !!int 1:00                        # local tag
        text: !implicit-plain 'boring'              # ambiguous tag
      shape: !tag:mydefault,2002:ellipse            # global tag
        width: !tag:mydefault,2002:float "10"       # global tag
        height: !implicit-plain '5'                 # ambiguous tag
    ...

My wrists hurt too much to type more...

Clark

On Sat, Sep 04, 2004 at 04:50:09PM -0400, Clark C. Evans wrote:
| Ok. Here is a suggestion, call it #9. It incorporates several ideas
| floating around:
|
| - It uses the Python/Ruby style of name resolution, as suggested
|   by T.Onoma and Why. That is, you check for a local (aka private)
|   package first, next you check built-in packages, and failing that,
|   an exception is raised.
|
| - It incorporates David's suggestion of limiting built-in types to
|   only words (but allowing the '/'). This helps reduce the chance of
|   collisions; you can be sure that resolution of built-in packages
|   will always fail if you use names like "Perl::Package" or
|   "com.company.JavaPackage", etc.
|
| - It also incorporates David's suggestion of using 'implicit-plain'
|   and 'implicit-not-plain' tags to make implicits easier to grok;
|   this happens to put some very nice makeup on an ugly wart.
|
| - It follows T.Onoma's request that he be able to specify a
|   private tag that is _not_ subject to default %TAG cooking.
|   It makes it possible to _expressly_ disable cooking no matter
|   what %TAGs are present.
|
| - It allows people to use YAML tags in most cases without problem,
|   but if they really want to be super-safe they would need
|   to use explicit %TAG based typing.
|
| - It provides a model for Brian's notion that the Application
|   is the final authority of what each node's tag is; that is,
|   the proposal formalizes ambiguity.
|
| - It incorporates, for the first time, a rationalization of
|   how implicit typing should be done, which is still poorly
|   defined and explained in the specification.
|
| First, let me review/define the types of 'serialization' tags:
|
| - Global tags are those that are globally unique. Traditionally,
|   these have been URIs; that is, they start with a word followed by a
|   colon and use only URI characters. Strictly speaking, Perl::Packages
|   happen to match this production, so they could also be considered
|   global even though they are not URIs.
|
| - Private tags are those that have meaning local only to a given
|   processing environment. They are convenient to use, but may conflict
|   with other uses. Therefore, they should be used carefully but, in
|   99% of cases, there just isn't a problem with collisions.
|
| - Magical tags are those which are explicitly provided, but happen
|   to be neither Global nor Private. It is not necessary that magic
|   tags be used, as a combination of global or private tags would
|   suffice for many purposes.
|
| - Missing tags are those that are not provided in the YAML
|   syntax. These have traditionally been called "implicit" tags,
|   but please use "missing" instead, as it is far more clear.
|
| Then, we define a process, called 'Cooking', which is done by the parser
| and is purely a syntax-only operation on a Document's tags. The cooking
| process uses the %TAG directive to change magical tags into either
| Global tags or Ambiguous tags (defined below). This is done without
| any application involvement and is completely defined by the YAML
| specification.
|
| - Ambiguous tags are Magical tags which do not become 'Global' during
|   the cooking process. They are also Missing tags, with the following
|   names (provided by the Cooking process):
|
|     plain scalar     -> !implicit-plain
|     non-plain scalar -> !implicit-scalar
|     mapping          -> !implicit-mapping
|     sequence         -> !implicit-sequence
|
| Therefore, after the 'Cooking' process every node has a non-empty
| tag, which is either Global, Private, or Ambiguous. While
| it is not strictly necessary to give mappings and sequences
| non-empty tags, it is done for consistency.
|
| Then, another process, called 'Resolution', converts Ambiguous
| tags into either Global or Private tags. Unlike cooking, this is an
| application-directed process, probably carried out by the YAML Processor
| via given instructions. The information used by the resolution process
| is restricted to that provided in the YAML Representational Model. In
| particular, 'Resolution' should be viewed as a transformation of the
| YAML graph; the result of resolution _is_ a different YAML document,
| albeit one that will typically be directly related to the source
| document plus schema information. Note that 'Resolution' does not in
| any way affect Global or Private tags. Thus, one can provide a private
| or global tag, and no matter how the resolution process is defined, it
| will be passed through unchanged.
|
| The last stage of processing, 'Recognition', usually happens during
| loading, where each node's tag is used to "find" an appropriate native
| data type and construct the appropriate binding.
| If a tag is not 'recognized' during this process, it is an error.
|
|   states:   { O: Original, C: Cooked, R: Resolved }
|   category: { G: Global, P: Private, _: Missing,
|               M: Magic, A: Ambiguous, '*': Depends }
|
| In a more concrete form,
|
|   ---                  # OCR  After-Cooking
|   - !http://yaml.org   # GGG  http://yaml.org
|   - !Perl::Package     # GGG  Perl::Package
|   - !!private          # PPP  private
|   -                    # _A*  implicit-plain
|   - ''                 # _A*  implicit-scalar
|   - !int               # MA*  int
|   ...
|
|   %TAG clarkevans.com,2004:   #default namespace
|   ---                  # OCR  After-Cooking
|   - !http://yaml.org   # GGG  http://yaml.org
|   - !Perl::Package     # GGG  Perl::Package
|   - !!private          # PPP  private
|   -                    # _A*  implicit-plain
|   - ''                 # _A*  implicit-scalar
|   - !int               # MGG  tag:clarkevans.com,2004:int
|   ...
|
|   %TAG clarkevans.com,2004: cce
|   ---                  # OCR  After-Cooking                  Resolve?
|   - !http://yaml.org   # GGG  http://yaml.org                No
|   - !Perl::Package     # GGG  Perl::Package                  No
|   - !!private          # PPP  private                        No
|   -                    # _A*  implicit-plain                 Yes
|   - ''                 # _A*  implicit-scalar                Yes
|   - !cce^int           # MGG  tag:clarkevans.com,2004:int    No
|   - !int               # MA*  int                            Yes
|   ...
|
| Basically, this proposal, which we can call #9 if you wish,
| is much like #8, only that the default is not private; it is
| the process of:
|   - check for private matches, if not,
|   - check for any 'regex' based matches
|   - use matches from tag:yaml.org,2004,
|     namely !str, !map, !seq for implicit-s
|   - raise an exception.
|
| So, it attempts to blend the 'implicit' mechanism with the
| !unambiguous tags. If people use !ambiguous tags... well,
| that's their choice; possibly enough rope so they can do
| cool things, or perhaps enough rope to hang themselves;
| but, in any event, using ambiguous tags (implicit, or
| non-private non-global tags) _is_ recognized as a transformation
| of the YAML document and treated appropriately.
|
| Cheers!
|
| Clark

--
Clark C. Evans                   Prometheus Research, LLC.
                                 http://www.prometheusresearch.com/
    office: +1.203.777.2550
    mobile: +1.203.444.0557
    Prometheus Research: Transforming Data Into Knowledge
    - Research Exchange Database
    - Survey & Assessment Technologies
    - Software Tools for Researchers
|
From: T. O. <tra...@ru...> - 2004-09-05 03:31:50
|
On Saturday 04 September 2004 08:12 pm, Clark C. Evans wrote:
> summary:
>
> This is the ninth-pass draft, based on the sixth-pass draft, and
> incorporates what I imagine to be Brian's way of thinking about
> this sort of stuff. This draft incorporates:
>
> - David's notion of !implicit-plain and !implicit-scalar
> - Onoma/Why suggestion of local then built-in resolution order
> - Oren/Clark's need for an explicit model of what's going on
> - Onoma's recognition that private types are different from
>   ambiguous types (although he may not have said it that way).
>
> In addition, this draft assumes "YAML 1.1" (see Clark's post).
> In a nutshell, this means:
>
> - YAML version bumped to 1.1 to reflect incompatible changes.
> - Directives per stream instead of per document.
> - One directive per line (with optional trailing comment)
> - Use spaces in directives.
>
> # note: None of this has been approved by Brian yet. Also, the
> # YAML 1.1 notion has not received any feedback yet. It isn't
> # crucial for this proposal, though.
>
> One method of typing is to use 'local' tags in your YAML document,
> which specify that a tag's binding does not come from the set of
> globally unique names. It does, in essence only name a type specific
> to a particular application or language and is not intended for
> sharing. For small teams or controlled contexts, this is a perfectly
> reasonable logic; one used by many programming languages and thus
> support of these sorts of types is needed.
>
> The 'global' method of typing, uses URIs, in particular tagURIs
> where possible, to identify a particular data type. This method
> is needed in situations where work is spread across different
> teams, platforms, languages, and applications. Agreement is often
> hard to get without explicit mechanisms to prevent name collision.
>
> Since 'global' typing is often painful to write, we introduce a
> syntax-only mechanism, %TAG.
> This mechanism allows a tagURI to be
> broken into chunks, the first (longer) half of the tag, containing
> the taggingEntity, is moved up into the declaration and given a
> handle. The second (shorter) half is then used within each tag
> together with the handle that links it to the longer half. The
> combining of the parts is done by the parser, so the application
> always sees full tagURIs.
>
> Since tags are often tedious to write, many times they are implied by
> the context or by a regular expression; may be omitted
> altogether, which we call missing; or given short-hand forms which
> cannot be interpreted by a YAML Processor. We call these tags
> "ambiguous" tags. An ambiguous tag is always rewritten into global
> or local tag during tag 'resolution' by the application before
> loading the given node into a native data structure.

This paragraph was a bit hard to read.

> This proposal introduces 2 mechanisms for 'fixing-up' tags, and five
> classes of tags that are involved. The first mechanism is called
> "cooking", this is done by the parser without any feedback from the
> application. The second mechanism is called tag resolution, and is
> done by the application before loading into memory. The result of
> the first pass, cooking, gives three types of tags: 'local',
> 'global', and 'ambiguous'. The result of the second pass converts
> all 'ambiguous' tags into 'local' or 'global' tags.
>
> The second fix-up, resolution, is considered by the YAML
> specification as a "transformation" from one YAML document to
> another. And thus, this transformation should only use information
> that is in the Representation Model (including where the node is,
> its kind, and its tag). In particular, styles, key order, and other
> information should not be used in tag resolution.

Hmm. Maybe a technical name for "cooking" could be tag 'substitution',
vs. tag 'resolution' (or 'tag transformation'). We could give
resolution a common name too, like "cooling".
Just a thought.

> syntax:
>
> - We open up the tag mechanism !tag to allow _only_ characters
>   containing uric characters from RFC2396, additionally allowing
>   '[' and ']' since these characters are used in IPv6 addresses.
>
>   In particular, characters forbidden from tags include | \ ^ {} `
>   and non-printable characters. While the ^ character may occur
>   in a !tag appearing in a YAML document, the ^ character is
>   magical and is not considered part of the tag.
>
> - We introduce a new directive 'tag' which provides a way
>   to shorten the data entry of tagURIs. In particular,
>
>     declaration := "%TAG" WS taggingEntity ":" spec_first [ WS handle ]
>
>   Where 'taggingEntity' refers to the same production in the tagURI
>   specification and WS is white space. The taggingEntity refers to
>   either a domain or email address followed by the minting date;
>   see tagURI specification for details. The 'spec_first' refers to zero
>   or more non-space characters (it is optional).
>
>   The 'handle' refers to a sequence of one or more word characters
>   [a-zA-Z0-9_]. Optionally, the handle may be missing, this case is
>   called the 'default prefix' and the handle is considered to be the
>   empty string ''. In a YAML document, each handle must be unique via
>   string comparison.
>
> - We extend the !tag syntax production to allow a single '^' character,
>   which is in the reserved characters above, the syntax for this
>   special case is,
>
>     taguri := '!' handle '^' spec_second
>
>   In this circumstance, the 'handle' _must_ appear as a handle in one
>   of the stream's directives. The 'spec_second', is zero or more
>   non-space characters; with the restriction that either spec_first or
>   spec_second (or both) must be at least one character.
>
> - Tags from the YAML type library are hereby limited to a single word,
>   this limitation allows common tags to be used in the 'ambiguous'
>   case, with other tag sets with a custom 'resolution' mechanism (aka
>   schema); so that the chance for conflict is significantly lessened.
>
> cooking rules:
>
> - For every special tag having a '^', the parser will do special
>   cooking to join the information specified in the declaration
>   together with the node's tag, such nodes will be treated as if
>   they had been tagged,
>
>     cooked := "!tag:" taggingEntity ":" spec_first spec_second
>
>   Note that the 'handle' is not included in this information, it is
>   considered a detail of the Presentation model, and should not occur
>   in tools that comply with the Serialization nor Representation
>   models. Thus, the 'handle' is _not_ part of the core YAML
>   information model, it is merely a syntax-level trick to ease the
>   burden of typing and human reading.
>
>   Also note while other URI schemes may appear in a tag, this cooking
>   mechanism purposefully constructs tagURIs; that is, globally unique
>   identifiers lacking protocol or access semantics.
>
> - Missing tags are also cooked. If the node's tag is not provided,
>   then the cooking process provides a default 'missing' tag.
>
>     mapping      -> !missing-mapping
>     sequence     -> !missing-sequence
>     plain scalar -> !missing-plain
>     other scalar -> !missing-scalar

I guess 'unspecified' is better. It still strikes me as odd that we have
!str for what's not missing, but plain and scalar for what is. It seems
like !str should be !scalar, but I suppose it's too late to worry about
that. Hence, 'unspecified-plain' and 'unspecified-str'?

> Note that this cooking is done as part of the 'parse' process,
> and that it is impossible to distinguish between a missing tag,
> and one that was actually tagged missing. After this cooking,
> these tags are said to be 'ambiguous'.
> > In particular, the Serial and Representation models DO NOT have > a flag that distinguishes between plain and other scalar forms. > Other than this magical cooking process (which is entirely > syntactic shorthand), the differences between a plain and a > quoted or block scalar are purely presentational. > > - Tags starting with a '!' are considered "local" types, and are > passed through the cooking process without further ado. > > - Tags starting with a word followed by a colon are considered > "global" types, and are also passed through the cooking process. > Note that "global" tags include URIs, as well as Perl package > names, such as Perl::Package. That's a nice, straightforward distinction (IMHO): !local vs. !global. > - If the document has a default prefix (a directive with an empty > handle), then all remaining tags are cooked according to the "^" > rule above, using the taggingEntity and the spec_first from > the directive with the empty handle. > > Otherwise, these tags are passed through the cooking process > untouched. In this case they are called 'ambiguous' tags. > > resolution: > > - Resolution is optional. In particular, every YAML document has > a valid YAML Representation before resolution. The resolution > process is a transformation process from one YAML Representation > to another. While the result after resolution may technically be > a different YAML document, it is intended that the application > maintain 'semantic' equivalence. > > - Only tags which are 'ambiguous' should be affected by the resolution > process. In particular, global and local tags pass through this > stage untouched. Resolution is successful when all ambiguous tags > have been converted to global tags or local tags. > > - The application has first say on how ambiguous tags are handled.
> However, the application should only take into account information > from the YAML Representation Model when making its determination; > that is, it should not use key order, syntax styles, comments, > directives, or any other presentation or sequential model attribute > when choosing what tags to modify. Of course, placement of the node > in the graph, regular expression analysis, and the > actual tag name may be used during this transformation. Interesting, this sounds familiar. Is there another stipulation like this somewhere? Something to do with !omap? > - Any 'ambiguous' tags which remain after the application > has finished its schema-specific resolution follow a standard > procedure. First, the following missing-tags are rewritten: > > - !missing-sequence -> 'tag:yaml.org,2002:seq' > - !missing-mapping -> 'tag:yaml.org,2002:map' > - !missing-scalar -> 'tag:yaml.org,2002:str' > > - The 'missing-plain' tags, if any still remain, are processed by > the parser against any regular expressions in any YAML types > from the YAML Type Repository it knows about. This is inherently > a fuzzy process, but a processor should make a good-faith effort to > resolve as many YAML Types as it can. All remaining 'missing-plain' > tags are mapped to 'tag:yaml.org,2002:str'. Implicit typing? > - The YAML processor then may choose to match any remaining ambiguous > tags against types it knows about from the YAML Type Repository. > In particular, it could choose to map !int to 'tag:yaml.org,2002:int', > or, if the YAML processor doesn't know about int, it may just pass. > > - At this point, if any ambiguous tags remain, they are converted to > local tags. So, !bing becomes !!bing. Well, it certainly is more complicated (-1), but it is well defined and does the job (+2). > recognition: > > - This process happens after resolution, and simply 'looks up' > the native data types, supported by the local environment, that > correspond to those requested via the tag property.
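The standard fallback procedure above can be sketched as a single function applied to each tag the application left ambiguous. This is an illustration under stated assumptions, not part of the proposal: tags are handled in written form (including the node's leading '!'), the two regular expressions and the `KNOWN` set are hypothetical stand-ins for whatever a processor actually takes from the YAML Type Repository, and `default_resolve` is an invented name.

```python
import re

YAML = "!tag:yaml.org,2002:"  # written form, including the node's '!' indicator

# First step of the standard procedure: fixed rewrites for missing tags.
MISSING = {
    "!missing-sequence": YAML + "seq",
    "!missing-mapping": YAML + "map",
    "!missing-scalar": YAML + "str",
}

# Hypothetical stand-ins for repository regular expressions.
IMPLICIT = [
    (re.compile(r"[-+]?[0-9]+\Z"), YAML + "int"),
    (re.compile(r"[-+]?[0-9]*\.[0-9]+\Z"), YAML + "float"),
]

# Single-word repository types this hypothetical processor knows about.
KNOWN = {"int", "float", "str", "bool"}

GLOBAL = re.compile(r"!\w+:")  # '!' followed by a word and a colon


def default_resolve(tag, value):
    """Apply the fallback steps to one node tag left ambiguous by the application."""
    if tag.startswith("!!") or GLOBAL.match(tag):
        return tag  # local and global tags pass through untouched
    if tag in MISSING:
        return MISSING[tag]
    if tag == "!missing-plain":
        for pattern, resolved in IMPLICIT:
            if pattern.match(value):
                return resolved
        return YAML + "str"  # unmatched plain scalars become strings
    if tag[1:] in KNOWN:
        return YAML + tag[1:]  # e.g. !int -> !tag:yaml.org,2002:int
    return "!" + tag  # leftovers become local tags: !bing -> !!bing
```

The ordering mirrors the proposal: fixed rewrites first, then regular-expression matching on plain scalars, then single-word repository lookups, and finally demotion of anything left to a local tag.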
> > - If any tags are 'unrecognized', they should be reported via a > warning message, although the YAML Processor may choose to > continue by loading them as a string, mapping or sequence. > This is entirely an option of the Application. > > design: > > - This change set takes into account that !tags are just like missing > tags; the application should have a say in what they mean. However, > this say should eventually be formalized into a schema, thus it > requires a 'formal underpinning'. This proposal calls this process > 'resolution' and grounds any such process as a standard YAML > Transformation. In effect, this allows any YAML Schema to be > mathematically described. Thus, while the process is inherently > 'ambiguous' and must be left open to the Application, it is properly > constrained to fit within the intent of YAML's model. In particular, > a %SCHEMA or other directive could later be added as an optional > "formal" specification of how this process should be carried out. I think this really puts the whole tag system on a firm foundation. Nice. > - It is helpful to view an empty tag as simply a syntax shorthand for a > particular tag. You could almost think that this removes the 'plain > scalar' wart. The default information model from the parse is then > formal before resolution is done, and can be used to generate an > identical document on output. In particular, it allows one to > even specify on 'emit' what syntax form to use, quoted or plain, > although this would be a specific transformation. > > - The %TAG shorthand is actually just a simple syntax trick, and > is a directive to clearly show that magic is about to happen. > > - The "^" character was chosen because it is not included > in RFC2396's uric production (aka taguri's specific), and > it doesn't look like any of our other indicators. This > character _was_ used for the previous cut^paste mechanism, > but that mechanism is deprecated.
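The character rule from the syntax section can be checked mechanically. This sketch encodes RFC 2396's uric production (reserved, unreserved, and %-escapes) plus the '[' and ']' the proposal adds; it shows that '^', like '|', '\', '{', '}' and '`', can never be part of a tag, which is what lets the parser treat it as magical. The function name `is_valid_tag` is hypothetical.

```python
import string

# RFC 2396: uric = reserved | unreserved | escaped
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")
# The proposal additionally allows '[' and ']' (used in IPv6 address literals).
TAG_CHARS = RESERVED | UNRESERVED | set("[]")
HEX = set(string.hexdigits)


def is_valid_tag(text):
    """Return True if text is non-empty and built only from the allowed characters."""
    if not text:
        return False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":  # escaped = "%" hex hex
            if i + 2 >= len(text) or text[i + 1] not in HEX or text[i + 2] not in HEX:
                return False
            i += 3
        elif ch in TAG_CHARS:
            i += 1
        else:
            return False  # e.g. space, | \ ^ { } ` or a non-printable character
    return True
```

Because '^' is rejected here, a written tag such as `!meet^meeting` can be split on its single '^' without ambiguity before any character validation is applied.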
> > - We use the tagURI specification (http://taguri.org) to define the > unique URIs. This follows previous versions of the YAML spec. > The tagURI is used because it does not imply access semantics > and defines an easily 'mint-able' unique identifier. > > - We purposefully named the directive TAG since it corresponds to the > tagURI. If at a later date we decide on another mechanism, > say one based on HTTP schema access, we can add a new directive > independently and, if appropriate, phase out this directive. > > - While this mechanism does not explicitly allow a quick usage > of YAML types, it allows for global tags to be used via > long form or with a prefix^. The limitation of tags to a > single word allows them to be used in ambiguous places with > a small risk of collision. Tags beginning with 'yaml:' could become yaml.org,2002: (hmmm, maybe 'yaml.org,1.1:' ?) > compatibility: > > - This proposal is introduced as part of the YAML 1.1 set of changes. > Documents explicitly marked as YAML:1.0 must be parsed according > to the old rules. > > - Documents lacking a %YAML directive should be assumed to be > YAML 1.1. However, the processor should make a reasonable > attempt to identify that a YAML 1.0 document is being parsed. > When this discovery is made "too late", the processor should > emit an error and abort parsing. > > - This proposal uses the same ^ character as the older cut^paste > mechanism. This older syntax trick is not compatible with this > proposal, and is deprecated. During the transition period, we > recommend parsers keep the old cut^paste logic, with an > appropriate warning, unless there is a %TAG directive, in which > case the usage above is implied. > > - The magical cooking rules in the core specification are also > deprecated with this specification.
Since the current version > of the specification does not allow tags "looking like a URI" or > the %TAG directive, either of these can be used to identify newer > style YAML documents. When read with these semantics: > > - !!private tags remain private and are not subject to cooking; > this is quite nice, since PyYAML will be completely unaffected by > this proposal. > > - In the absence of explicit application intervention, > !word tags get mapped to their corresponding YAML types > in the type library, if they happen to be available. Also, > most users will be unaffected by this change. > > - The other tags, which are less frequently used, will require > parser-specific or application-specific routing; but the > proposal allows this to happen, so the practical impact > should be minimal. > > example: > > The following document, > > %TAG bar.com,2004:timesheet/ meet > --- !tag:baz.com,2004:mixed/list # global tag > - event: !meet^meeting # magic tag > where: office # missing tag > date: 2004-09-09 # missing tag > duration: !!int 1:00 # local tag > text: boring # missing tag > shape: !ellipse # ambiguous tag > width: !float 10 # ambiguous tag > height: 5 # missing tag > ... > > After 'parsing' but before 'resolution', it could be serialized, > with the _same_ Representational Model, as: > > --- !tag:baz.com,2004:mixed/list # global tag > - event: !tag:bar.com,2004/timesheet/meeting # global tag > where: !implicit-plain office # ambiguous tag > date: !implicit-plain "2004-09-09" # ambiguous tag > duration: !!int 1:00 # local tag > text: !implicit-plain 'boring' # ambiguous tag > shape: !ellipse # ambiguous tag > width: !float "10" # ambiguous tag > height: !implicit-plain '5' # ambiguous tag > ...
> > After 'resolution', assuming that the application allowed for > the 'default' processing, we get: > > --- !tag:baz.com,2004:mixed/list # global tag > - event: !tag:bar.com,2004/timesheet/meeting # global tag > where: !tag:yaml.org,2002:str office # global tag > date: !tag:yaml.org,200 "2004-09-09" # global tag > duration: !!int 1:00 # local tag > text: !tag:yaml.org,2002:str 'boring' # global tag > shape: !!ellipse # local tag > width: !tag:yaml.org:float "10" # global tag > height: !tag:yaml.org:int '5' # global tag > ... > > Of course, then during recognition, '!ellipse' and "!int" would > have to be there or it'd be an exception. Alternatively, if > the application wanted to rewrite the 'ambiguous' tag 'ellipse' > it could have output: > > --- !tag:baz.com,2004:mixed/list # global tag > - event: !tag:bar.com,2004/timesheet/meeting # global tag > where: !tag:yaml.org,2002:str office # global tag > date: !tag:yaml.org,200 "2004-09-09" # global tag > duration: !!int 1:00 # local tag > text: !tag:yaml.org,2002:str 'boring' # global tag > shape: !http://somewhere.tld/bing/elipse # global tag > width: !tag:yaml.org:float "10" # global tag > height: !tag:yaml.org:int '5' # global tag > ... > > If the original document had a _default_ %TAG mydefault,2002: then > after parsing one would have gotten: > > --- !tag:baz.com,2004:mixed/list # global tag > - event: !tag:bar.com,2004/timesheet/meeting # global tag > where: !implicit-plain office # ambiguous tag > date: !implicit-plain "2004-09-09" # ambiguous tag > duration: !!int 1:00 # local tag > text: !implicit-plain 'boring' # ambiguous tag > shape: !tag:mydefault,2002:ellipse # global tag > width: !tag:mydefault,2002:float "10" # global tags > height: !implicit-plain '5' # ambiguous tag > ... > > My wrists hurt too much to type more... I'm amazed you can type that much and so well! Nice work. I especially like how it clarifies _all_ the possibilities. This proposal may indeed prove to have wings. -- T. |
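The parse-time classification in proposal #9 (the comment column of the examples above) can be sketched as one function per node. This is a hypothetical illustration, not any parser's API: `parse_tag` is an invented name, tags are returned in written form, untagged nodes get the `missing-*` names from the cooking rules (the worked example prints the plain case as `!implicit-plain`), and "a word followed by a colon" is read as Python's `\w+:`.

```python
import re

GLOBAL = re.compile(r"\w+:")  # "a word followed by a colon"

# Placeholder tags for untagged nodes, per the cooking rules (spelling normalized).
MISSING = {
    "mapping": "!missing-mapping",
    "sequence": "!missing-sequence",
    "plain": "!missing-plain",
    "scalar": "!missing-scalar",
}


def parse_tag(written, kind, handles, default=None):
    """Return (tag, category) as the parser would report it under proposal #9.

    written: tag text after the node's '!' indicator, or None if untagged.
    kind:    'mapping', 'sequence', 'plain' or 'scalar'; used only if untagged.
    handles: {handle: (taggingEntity, spec_first)} from %TAG directives.
    default: (taggingEntity, spec_first) of the default prefix, if any.
    """
    if written is None:
        return MISSING[kind], "ambiguous"
    if "^" in written:  # magic tag: expand through its handle
        handle, _, spec_second = written.partition("^")
        entity, spec_first = handles[handle]
        return f"!tag:{entity}:{spec_first}{spec_second}", "global"
    if written.startswith("!"):  # written as !!name
        return "!" + written, "local"
    if GLOBAL.match(written):
        return "!" + written, "global"
    if default is not None:  # cook with the default prefix
        entity, spec_first = default
        return f"!tag:{entity}:{spec_first}{written}", "global"
    return "!" + written, "ambiguous"
```

Passing `default=("mydefault,2002", "")` reproduces the last variant above, where `!ellipse` and `!float` become global tags while untagged plain scalars stay ambiguous.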
From: Clark C. E. <cc...@cl...> - 2004-09-05 06:25:23
|
summary: This is draft 9a, based on the 9th pass draft. I think with this draft I finally 'grok' what Brian is saying: The Application is in _complete_ control of how a YAML document gets loaded into native language types. I agree. But that said, I still want a succinct mechanism for specifying globally unique names in YAML. a nasty wart: Our current information model, especially section 3.3 (Completeness), is poor thinking or, at the very least, too complicated. Much of the complexity emerged from the 'implicit' typing of nodes having the plain scalar style. When this section was written, we pictured the parser producing 'untagged' nodes as having an empty tag. However, to allow an application to do cool things with plain scalars, we pictured a "hack" flag that was present for scalars that signaled if the node had come from a plain scalar. This visualization bugged us, because a property of the Presentation Model (the style of scalar nodes) seemed to be "bleeding" into the rest of the model. It gave Oren and me a very uncomfortable feeling. However, it was _clearly_ correct to allow applications to type their data differently depending if the scalar was plain or not. So, things were not good. And a good part of Section 3.3 is an attempt to limit the impact of this ugly hack. So, we invented this new thing, "tag resolution", where those pesky empty tags were filled in, so to speak. Also, right after tag resolution, we pictured that pesky 'isPlainStyle' flag going bye bye. It was a wart. a new hope: Luckily, David has shown us an alternative path. The distinction between untagged plain scalars and untagged quoted scalars _can_ after all be reported without introducing this ugly wart flag -- without the "bleeding" of presentation information into the representation model. Thank goodness. So here is how it is done. We start by having 4 built-in tags specified by the YAML specification.
These tags are, unspecified-mapping unspecified-sequence unspecified-implicit unspecified-scalar Plus a parsing rule, so that the following, --- plain: - 'single' - "double" - | literal - > folded is simply syntax sugar for, --- !unspecified-mapping { !unspecified-implicit "plain": !unspecified-sequence [ !unspecified-scalar "single", !unspecified-scalar "double", !unspecified-scalar "literal\n", !unspecified-scalar "folded\n" ] } That is, both of the documents above have exactly the same YAML Representation. What this also tells us is that what we thought was a very special thing -- tag resolution -- isn't so special after all. Since the parser's results can always have tags filled-in, and deliver content in the exact structure of the 'Node Graph Representational Model', we do not need to worry about tag resolution. No bleeding at all! No special treatment required. impacts: - The most important impact of this change is that tag 'resolution' need not be detailed in the specification 'proper' -- in fact, it is completely up to the application, as Brian has been insisting. Resolution is no longer special; it is simply a transformation of the input graph to produce another graph, where the target graph is the one that is actually loaded into native data structures. While this should be explained in the specification, the only "limit" placed on the transformation is that serialization and presentation features should not be used during this process. - We can still allow a user to distinguish between plain and quoted scalars; but _only_ if they have not specified a tag for the node. Once a tag is provided, the syntax sugar doesn't happen, so a parser would not report the difference between a plain and a single-quoted node for tagged nodes. This is sensible; it allows people to change styles without worrying about changing the YAML Representation of the data.
For example, a really-really-long tag-free plain scalar could be converted to a double quoted string by providing the 'unspecified-implicit' tag. - It allows two scalars with the same content to be used in a mapping; for example, { 3: "number 3", '3': "string 3" } is perfectly legal since the first key is !unspecified-implicit and the second key has a tag of !unspecified-scalar. This is expected behavior; in one system the first key may be converted to an integer, and the second key into a string, so that there are no duplicate keys. However, on a system with only strings, the resulting document would be in error. This inconsistency was difficult to explain, but it is easy now. In the system with only strings, the 'resolution' process (which is after parsing, before loading) created a duplicate key, and thus is in error. The document itself is _not_ in error. So, the seeming inconsistency is moved from the document using 'implicit' types into the domain of the application, whose transform created an invalid YAML Representation Graph. This is where we'd expect the error, and it is now in the correct spot. - In the same light, if we allowed the behavior above, the previous line of thinking had problems with the document { 3: 'number 3', 3: 'string 3' }. Clearly the document itself is invalid; however, it was previously unclear where the problem was. Now it is obvious: the equivalent representation after the syntax sugar is expanded has a duplicate key, ('unspecified-implicit', '3'). Another problem successfully averted. - We also move 'str', 'map' and 'seq' tags out of the specification. They properly belong in the YAML Tag Repository. This matches expectations: while a YAML document may contain an 'unspecified-mapping', we shouldn't require that a native binding actually have a hashtable or dictionary implementation. It could, with a very strict resolution transform, map the unspecified mapping onto a COBOL record.
I always wondered 'why' these tags were special and other built-in types were not. - I can't think of any downsides; this syntax sugar rocks. on globalization: While creating global tags is what started this whole discussion this month, it is related to the above wart removal. The proposed %TAG mechanism is also syntax shorthand to help build long tags. Since we already had a tag 'resolution' process, it seemed logical to move this globalization operation there as well. Brian objected, and rightly so. The tag "resolution" process is actually a full-blown transformation from one YAML representation to another. While it may happen subtly, and without actual code, it is logically converting one document (which is ambiguous due to its unspecified tags and whatnot) to another which more closely resembles what the application needs. It is completely unrelated to tag globalization. Tag globalization is a syntax-rewrite trick that happens at a much lower level; it is not a transformation of the parser's output. "Cooking", so to speak, can (and probably should) happen at parse time. syntax: - We open up the tag mechanism !tag to allow _only_ characters from RFC2396's uric production, additionally allowing '[' and ']' since these characters are used in IPv6 addresses. In particular, characters forbidden from tags include | \ ^ {} ` and non-printable characters. While the ^ character may occur in a !tag appearing in a YAML document, the ^ character is magical and is not considered part of the tag. - We introduce a new directive 'tag' which provides a way to shorten the data entry of tagURIs. In particular, declaration := "%TAG" WS taggingEntity ":" spec_first [ WS handle ] Where 'taggingEntity' refers to the same production in the tagURI specification and WS is white space. The taggingEntity refers to either a domain or email address followed by the minting date; see the tagURI specification for details.
The 'spec_first' refers to zero or more non-space characters (it is optional). The 'handle' refers to a sequence of one or more word characters [a-zA-Z0-9_]. Optionally, the handle may be missing; this case is called the 'default prefix' and the handle is considered to be the empty string ''. In a YAML document, each handle must be unique via string comparison. - We extend the !tag syntax production to allow a single '^' character, which is among the characters excluded above; the syntax for this special case is, taguri := '!' handle '^' spec_second In this circumstance, the 'handle' _must_ appear as a handle in one of the stream's directives. The 'spec_second' is zero or more non-space characters, with the restriction that either spec_first or spec_second (or both) must be at least one character. parsing rules: - For every special tag having a '^', the parser will join the information specified in the declaration together with the node's tag; such nodes will be treated as if they had been tagged, cooked := "!tag:" taggingEntity ":" spec_first spec_second Note that the 'handle' is not included in the parser output; it is considered a detail of the Presentation model, and should not occur in tools that comply with either the Serialization or Representation models. Thus, the 'handle' is _not_ part of the core YAML information model, it is merely a syntax-level trick to ease the burden of typing and human reading. Also note while other URI schemes may appear in a tag, this cooking mechanism purposefully constructs tagURIs; that is, globally unique identifiers lacking protocol or access semantics. - If a node's tag is not provided, the parser gives it one: mapping -> !unspecified-mapping sequence -> !unspecified-sequence plain scalar -> !unspecified-implicit other scalar -> !unspecified-scalar In particular, the Serial and Representation models DO NOT have a flag that distinguishes between plain and other scalar forms.
Further, just because a node is reported with an 'unspecified-implicit' tag does not mean it was written with the plain style. The tag could have been provided by the document author. Other than this parsing rule, which applies _only_ when a tag is missing, the differences between a plain and a quoted or block scalar are purely presentational. - Tags starting with a '!' are considered "local" types, and are passed through without further ado. Thus, these tags are not subject to default %TAG prefix expansion. - Tags starting with a word followed by a colon are considered "global" types, and are also passed on through. Note that "global" tags include URIs, as well as Perl package names, such as Perl::Package. - If the document has a default prefix (a directive with an empty handle), then all remaining tags are cooked according to the "^" rule above, using the taggingEntity and the spec_first from the directive with the empty handle. - Otherwise, these tags are passed through the cooking process untouched. resolution: - Resolution refers to a process after the application has been provided a valid YAML Representation, and before the application has loaded this representation into native data structures. - An application may choose to alter the input document in any way it sees fit, provided that it only uses information provided in the YAML Representation model for this transformation. In particular, style information, key order, and other presentation or serialization attributes should not be used to guide the transformation process. - In particular, if the application chooses to use types from the YAML Type Repository, it may choose to use a helper document transformation which the parser may provide. helper: - A YAML parser may wish to provide a 'helper' transformation which fills in unspecified tags, and converts short 'local' tags which seem to refer to YAML types to their global variety.
- Unspecified tags could be converted as follows: - !unspecified-sequence -> 'tag:yaml.org,2002:seq' - !unspecified-mapping -> 'tag:yaml.org,2002:map' - !unspecified-scalar -> 'tag:yaml.org,2002:str' - The 'unspecified-implicit' tags, if any still remain, are processed by the parser against any regular expressions in any YAML types from the YAML Type Repository it knows about. This is inherently a fuzzy process, but a processor should make a good-faith effort to resolve as many YAML Types as it can. All remaining 'unspecified-implicit' tags are mapped to 'tag:yaml.org,2002:str'. - The YAML processor then may choose to match any remaining local tags against types it knows about from the YAML Type Repository. In particular, it could choose to map !int to 'tag:yaml.org,2002:int', or, if the YAML processor doesn't know about int, it may just pass. # the rest is similar to proposal #9, I may finish later -- Clark C. Evans Prometheus Research, LLC. http://www.prometheusresearch.com/ office: +1.203.777.2550 mobile: +1.203.444.0557 Prometheus Research: Transforming Data Into Knowledge - Research Exchange Database - Survey & Assessment Technologies - Software Tools for Researchers |
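The duplicate-key argument in draft 9a can be made concrete by treating a mapping key's identity as the pair (tag, content). A minimal sketch under that assumption: the style names, `fill_tag`, `build_mapping` and the string-only resolution `strings_only` are all hypothetical, and tags are written here without the node's '!' indicator.

```python
# Draft 9a parsing rule: an explicit tag wins; otherwise the style picks one.
UNSPECIFIED = {
    "mapping": "unspecified-mapping",
    "sequence": "unspecified-sequence",
    "plain": "unspecified-implicit",
    "single": "unspecified-scalar",
    "double": "unspecified-scalar",
    "literal": "unspecified-scalar",
    "folded": "unspecified-scalar",
}


def fill_tag(tag, style):
    """Return the node's tag, supplying the 'unspecified-' tag when none was given."""
    return tag if tag is not None else UNSPECIFIED[style]


def build_mapping(pairs):
    """Build a mapping whose key identity is (tag, content); duplicates are errors."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def strings_only(key):
    """Hypothetical resolution for a system whose only scalar type is str."""
    _tag, content = key
    return ("tag:yaml.org,2002:str", content)
```

With this model, `{ 3: "number 3", '3': "string 3" }` builds without error because the two keys differ by tag; rebuilding it after `strings_only` raises, which places the error in the resolution step, while `{ 3: 'number 3', 3: 'string 3' }` fails before any resolution happens.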
From: Clark C. E. <cc...@cl...> - 2004-09-05 14:35:03
|
a shorter version | parsing rules: | | - For every special tag having a '^', the parser will join the | information specified in the declaration together with the node's | tag; such nodes will be treated as if they had been tagged, | | cooked := "!tag:" taggingEntity ":" spec_first spec_second | | Note that the 'handle' is not included in the parser output; it is | considered a detail of the Presentation model, and should not occur | in tools that comply with either the Serialization or Representation | models. Thus, the 'handle' is _not_ part of the core YAML | information model, it is merely a syntax-level trick to ease the | burden of typing and human reading. | | Also note while other URI schemes may appear in a tag, this cooking | mechanism purposefully constructs tagURIs; that is, globally unique | identifiers lacking protocol or access semantics. | | - If a node's tag is not provided, the parser gives it one: | | mapping -> !unspecified-mapping | sequence -> !unspecified-sequence | plain scalar -> !unspecified-implicit | other scalar -> !unspecified-scalar | | In particular, the Serial and Representation models DO NOT have a | flag that distinguishes between plain and other scalar forms. | Further, just because a node is reported with an | 'unspecified-implicit' tag does not mean it was written with the | plain style. The tag could have been provided by the document | author. Other than this parsing rule, which applies _only_ when a | tag is missing, the differences between a plain and a quoted or block | scalar are purely presentational. Note these 'unspecified-' tags are described in the YAML specification and are _not_ part of the YAML Tag Library. tag:yaml.org,2002:str, tag:yaml.org,2002:map, and tag:yaml.org,2002:seq are all moved out of the specification. | | - Tags starting with a word followed by a colon are considered | "global" types, and are also passed on through.
Note that "global" | tags include URIs, as well as Perl package names, such as Perl::Package. There are no semantics attached to global tags; in particular, they need not be URIs. | | - Tags starting with a '!' are considered "local" types and are not subject to the default %TAG; they are otherwise equivalent to those tags written without the leading '!'. !!no-default -> !no-default | - If the document has a default prefix (a directive with an empty | handle), then all remaining tags are cooked according to the "^" | rule above, using the taggingEntity and the spec_first from | the directive with the empty handle. | - Otherwise, these tags are reported to the application exactly as they appear; no default-default %TAG; all %TAGs are explicit. The YAML 'process' has nothing to do with tag 'resolution'; unspecified tags are actually concretely represented via a local tag, which can be interpreted by the application in a manner it wishes. example: This particular document, %TAG yaml.org,2002: yaml %TAG clarkevans.com,2004:bing/ # default --- bing: - !int wobble - !!int JMP - !yaml^int 23 - 23 plain: - 'single' - "double" - |- literal - > folded Would be parsed as the following, --- !unspecified-mapping !unspecified-implicit "bing": !unspecified-sequence - !tag:clarkevans.com,2004:bing/int "wobble" - !int "JMP" - !tag:yaml.org,2002:int "23" - !unspecified-implicit "23" !unspecified-implicit "plain": !unspecified-sequence - !unspecified-scalar "single" - !unspecified-scalar 'double' - !unspecified-scalar "literal" - !unspecified-scalar "folded\n" This would be parsed (a second time) as exactly the same document since all of the tags are provided, the !! doesn't occur, and since there isn't a %TAG directive. compatibility: This is completely up to the application to provide. Clark |
From: Clark C. E. <cc...@cl...> - 2004-09-05 14:48:16
|
ignore the previous one, please | parsing rules: | | - For every special tag having a '^', the parser will join the | information specified in the declaration together with the node's | tag; such nodes will be treated as if they had been tagged, | | cooked := "!tag:" taggingEntity ":" spec_first spec_second | | Note that the 'handle' is not included in the parser output; it is | considered a detail of the Presentation model, and should not occur | in tools that comply with either the Serialization or Representation | models. Thus, the 'handle' is _not_ part of the core YAML | information model, it is merely a syntax-level trick to ease the | burden of typing and human reading. | | Also note while other URI schemes may appear in a tag, this cooking | mechanism purposefully constructs tagURIs; that is, globally unique | identifiers lacking protocol or access semantics. | | - If a node's tag is not provided, the parser gives it one: | | mapping -> !unspecified-mapping | sequence -> !unspecified-sequence | plain scalar -> !unspecified-implicit | other scalar -> !unspecified-scalar | | In particular, the Serial and Representation models DO NOT have a | flag that distinguishes between plain and other scalar forms. | Further, just because a node is reported with an | 'unspecified-implicit' tag does not mean it was written with the | plain style. The tag could have been provided by the document | author. Other than this parsing rule, which applies _only_ when a | tag is missing, the differences between a plain and a quoted or block | scalar are purely presentational. Note these 'unspecified-' tags are described in the YAML specification and are _not_ part of the YAML Tag Library. tag:yaml.org,2002:str, tag:yaml.org,2002:map, and tag:yaml.org,2002:seq are all moved out of the specification. | | - Tags starting with a word followed by a colon are considered | "global" types, and are also passed on through.
Note that "global" | tags include URIs, as well as Perl package names, such as Perl::Package; there are no semantics attached to global tags and, in particular, they need not be URIs. | | - Tags starting with a '!' are considered "private" tags and | are passed on without further ado. In particular, they are | not subject to %TAG globalization. | | - If the document has a default prefix (a directive with an empty | handle), then all remaining tags are cooked according to the "^" | rule above, using the taggingEntity and the spec_first from | the directive with the empty handle. | - Otherwise, these tags are reported to the application exactly as they appear; no default-default %TAG; all %TAGs are explicit. The YAML 'process' has nothing to do with tag 'resolution'; unspecified tags are actually concretely represented via a local tag, which can be interpreted by the application in a manner it wishes. example: This particular document, %TAG yaml.org,2002: yaml %TAG clarkevans.com,2004:bing/ # default --- bing: - !int wobble - !!int JMP - !yaml^int 23 - 23 plain: - 'single' - "double" - |- literal - > folded Would be parsed as the following, --- !unspecified-mapping !unspecified-implicit "bing": !unspecified-sequence - !tag:clarkevans.com,2004:bing/int "wobble" - !!int "JMP" - !tag:yaml.org,2002:int "23" - !unspecified-implicit "23" !unspecified-implicit "plain": !unspecified-sequence - !unspecified-scalar "single" - !unspecified-scalar 'double' - !unspecified-scalar "literal" - !unspecified-scalar "folded\n" This would be parsed (a second time) as exactly the same document since all of the tags are provided, the !! doesn't occur, and since there isn't a %TAG directive. compatibility: This is completely up to the application to provide. Clark |
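The revised parsing rules can be sketched as a single function, and the example then doubles as a check of the "parsed a second time" claim: every reported tag is a fixed point when re-parsed with no %TAG directives. This sketch makes several assumptions. `parse_node_tag` is a hypothetical name, tags are returned in written form (with the node's '!'), the `%TAG` table is assumed to be already parsed into `{handle: (taggingEntity, spec_first)}` with `''` as the default prefix, and "a word followed by a colon" is read as Python's `\w+:`.

```python
import re

GLOBAL = re.compile(r"\w+:")  # "a word followed by a colon"

UNSPECIFIED = {
    "mapping": "!unspecified-mapping",
    "sequence": "!unspecified-sequence",
    "plain": "!unspecified-implicit",
    "scalar": "!unspecified-scalar",
}


def parse_node_tag(written, kind, handles):
    """Return the tag the parser reports, in written form ('!' + tag).

    written: tag text after the node's '!' indicator, or None if untagged.
    kind:    'mapping', 'sequence', 'plain' or 'scalar'; used only if untagged.
    handles: {handle: (taggingEntity, spec_first)}; '' is the default prefix.
    """
    if written is None:
        return UNSPECIFIED[kind]
    if "^" in written:  # explicit handle: expand it
        handle, _, spec_second = written.partition("^")
        entity, spec_first = handles[handle]
        return f"!tag:{entity}:{spec_first}{spec_second}"
    if written.startswith("!") or GLOBAL.match(written):
        return "!" + written  # private and global tags pass through untouched
    if "" in handles:  # default prefix
        entity, spec_first = handles[""]
        return f"!tag:{entity}:{spec_first}{written}"
    return "!" + written  # no default-default %TAG
```

The fixed-point property holds because a cooked tag re-parses as global, a private `!!int` re-parses as private, and an `unspecified-` tag has nothing to expand once the directives are gone.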
From: T. O. <tra...@ru...> - 2004-09-05 15:59:59
|
Wow! T. |
From: Damian C. <pd...@al...> - 2004-09-05 16:59:18
|
On Sunday, Sep 5, 2004, Clark C. Evans wrote: > However, it was _clearly_ correct to allow applications to type > their data differently depending if the scalar was plain or not. I suppose this is true. I have been caught out by implicit typing in PyYAML: a hyphen used as a key was converted to the number zero, and even quoting the offending scalar did not suppress implicit typing :-( In the end my application had to test for zero explicitly and map it back to the hyphen... Of course (a) hyphen is no longer a Boolean constant, and (b) I should be able to switch off implicit typing, but that's another story. > We start by having 4 built-in tags specified by the YAML > specification. These tags are, > > unspecified-mapping > unspecified-sequence > unspecified-implicit > unspecified-scalar Why not use shorter names, like map, seq, naked, and quoted? > - We can still allow a user to distinguish between plain and > quoted scalars; but _only_ if they have not specified a tag for > the node. I agree. Once you have determined the desired type to convert the node to, there is no need for second-guessing based on syntax. > For example, a really-really-long > tag-free plain scalar could be converted to a double quoted string > by providing the 'unspecified-implicit' tag. It would be a bit weird, but I guess it should be allowed. -- Damian -- Damian Cugley, Alleged Literature http://www.alleged.org.uk/pdc/ |
From: Damian C. <pd...@al...> - 2004-09-05 17:28:50
|
On Sunday, Sep 5, 2004, Clark C. Evans wrote:
> - Tags starting with a word followed by a colon are considered
>   "global" types, and are also passed on through. Note that "global"
>   tags include URIs, as well as Perl package names, such as
>   Perl::Package

I am worried that we are allowing global tags from different sources to be mixed. In other words, the tags specified as

    - !Banjo::Player Jack O'Leary
    - !http://example.com/2012/bar 12345
    - !std::string 12345
    - !FFFF::127.0.0.1 Bonko!

are passed to the calling application as 'global' tags. While I agree it is unlikely that there will be name clashes between Perl package names and, say, C++ classes, there are no guarantees that this is the case, or that future tag systems will not invent a notation using colons.

The only way to *ensure* no collisions ever occur is to restrict one's choice of tags to a single system, such as URIs. Tag URIs make this easy enough that there should be no excuses. So if you must use Perl (or C++) identifiers as tags, then either they have to be counted as local tags, or else transformed into URIs through some convention like tag:yaml.org,2004:perl:Banjo::Player or tag:yaml.org,2004:c++:Banjo::Player.

Another reason not to consider Perl (or any language's) type names as global tags is that they encode low-level information that should not really be used in YAML document streams shared between unrelated entities. In other words, *if* you care about your tags being globally unique, you should not be using tags that imply a choice of programming language for the YAML processor.

OK, it would not be a fatal problem for language-specific info to be encoded into tags; other users can ignore it and treat tags as opaque identifiers nevertheless. But it would be inelegant.

-- Damian |
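A minimal sketch of the kind of convention Damian mentions for wrapping a language-specific type name in a tag URI. The `tag:yaml.org,2004:<language>:<name>` layout is copied from his two examples; the function name is hypothetical, and no such convention was agreed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "tag:yaml.org,2004:<language>:<type_name>" into `out`.
   Returns 0 on success, -1 if the result would not fit in `out`. */
static int language_type_to_tag(const char *language, const char *type_name,
                                char *out, size_t out_size) {
    int n = snprintf(out, out_size, "tag:yaml.org,2004:%s:%s",
                     language, type_name);
    /* snprintf returns the length it wanted to write; a value >= out_size
       means the output was truncated. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Because the language name becomes a fixed path segment, a Perl `Banjo::Player` and a C++ `Banjo::Player` map to distinct tags, which is the collision Damian is worried about.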
From: Oren Ben-K. <or...@be...> - 2004-09-05 17:30:54
|
On Sunday 05 September 2004 09:24, Clark C. Evans wrote:
> This is draft 9a, based on the 9th pass draft. I think with this
> draft I finally 'grok' what Brian is saying:

This completely ignores my use case. There's just one extra point I _must_ make:

> The most important impact of this change is that tag 'resolution'
> need not be detailed in the specification 'proper' -- in fact, it
> is completely up to the application, as Brian has been insisting.
> Resolution is no longer special, it is simply a transformation of
> the input graph to produce another graph, where the target graph is
> the one that is actually loaded into native data structures. While
> this should be explained in the specification; the only "limits"
> placed on the transformation is that serialization and presentation
> features should not be used during this process.

So, in effect, tags now mean "whatever the current piece of code says they mean"? For example, my application can now convert all !foo.com,2003:bar/baz to !baz.com,2004/foo/bar in sequence entries with prime indices, provided the word "chaos" doesn't appear in the document an even number of times. Lovely.

I now understand how Brian feels when he thinks both of us went completely ethereal. I have no idea at all what you are trying to achieve, and why. Reading your proposal, I see something like this:

1. If a tag isn't specified, it's converted to "unspecified-mapping/sequence/scalar/plain". Fine.

2. The YAML specification intentionally does not say anything about how tags are to be used. Specifically, issues of uniqueness, mixing and matching schemas, migrating "private" documents to "public" documents and so on are beyond the scope of the specification.

3. Oh, BTW, here's something nice: you can use %TAG to have a prefixing shorthand for some arbitrary set of tags. Since apps can do anything they want with tags, there's no guarantee that using this means anything in particular - for example, two nodes with the same %TAG-based tag may have completely different types - but hey, it's neat, and you _can_ use it for good deeds if you feel like it. Or you can just ignore it (most people do).

All I can say is: -1000.

Could someone please tell me again what's wrong with #7? Ah, the extra "!", right. Well, using proposal #9 to solve this is like nuking your home to get rid of a fly.

Have fun,

Oren Ben-Kiki |
From: David H. <dav...@bl...> - 2004-09-05 19:34:16
|
Clark C. Evans wrote:
> summary:
>
> This is draft 9a, based on the 9th pass draft. I think with this
> draft I finally 'grok' what Brian is saying:
>
>     The Application is in _complete_ control of how a YAML
>     document gets loaded into native language types.
>
> I agree. But that said, I still want a succinct mechanism for
> specifying globally unique names in YAML.
>
> a nasty wart:
>
> Our current information model, especially section 3.3 (Completeness),
> is poor thinking, or, at the very least, is too complicated. Much of
> the complexity emerged from the 'implicit' typing of nodes having the
> plain scalar style. When this section was written, we pictured the
> parser producing 'untagged' nodes as having an empty tag. However, to
> allow an application to do cool things with plain scalars, we
> pictured a "hack" flag that was present for scalars that signaled if
> the node had come from a plain scalar.
>
> This visualization bugged us, because a property of the Presentation
> Model (the style of scalar nodes) seemed to be "bleeding" into the
> rest of the model. It gave Oren and me a very uncomfortable feeling.
> However, it was _clearly_ correct to allow applications to type
> their data differently depending if the scalar was plain or not.
>
> So, things were not good. And a good part of Section 3.3 is an
> attempt to limit the impact of this ugly hack. So, we invented this
> new thing, "tag resolution", where those pesky empty tags were filled
> in, so to speak. Also, right after tag resolution, we pictured that
> pesky 'isPlainStyle' flag going bye bye. It was a wart.
>
> a new hope:
>
> Luckily, David has shown us an alternative path. The distinction
> between untagged plain scalars and untagged quoted scalars _can_
> after all be reported without introducing this ugly wart flag --
> without the "bleeding" of presentation information into the
> representation model. Thank goodness. So here is how it is done.
>
> We start by having 4 built-in tags specified by the YAML
> specification. These tags are,
>
>     unspecified-mapping
>     unspecified-sequence
>     unspecified-implicit
>     unspecified-scalar

Several versions of these names have been suggested. This version isn't right because plain scalars ("unspecified-implicit") are also scalars. "unspecified-plain" and "unspecified-nonplain" would probably be better.

> Plus a parsing rule, so that the following,
>
>     ---
>     plain:
>       - 'single'
>       - "double"
>       - |
>         literal
>       - >
>         folded
>
> is simply syntax sugar for,
>
>     --- !unspecified-mapping {
>       !unspecified-implicit "plain":
>         !unspecified-sequence [
>           !unspecified-scalar "single",
>           !unspecified-scalar "double",
>           !unspecified-scalar "literal\n",
>           !unspecified-scalar "folded"
>         ]
>     }
>
> That is, both of the documents above have exactly the same YAML
> Representation. What this also tells us is that what we thought
> was a very special thing -- tag resolution -- isn't so special after
> all. Since the parser's results can always have tags filled in, and
> deliver content in the exact structure of the 'Node Graph
> Representational Model', we do not need to worry about tag
> resolution. No bleeding at all! No special treatment required.
>
> impacts:
>
> - The most important impact of this change is that tag 'resolution'
>   need not be detailed in the specification 'proper' -- in fact, it
>   is completely up to the application, as Brian has been insisting.
>   Resolution is no longer special, it is simply a transformation of
>   the input graph to produce another graph, where the target graph is
>   the one that is actually loaded into native data structures. While
>   this should be explained in the specification, the only "limits"
>   placed on the transformation are that serialization and presentation
>   features should not be used during this process.
>
> - We can still allow a user to distinguish between plain and
>   quoted scalars; but _only_ if they have not specified a tag for the
>   node. Once a tag is provided, the syntax sugar doesn't happen, so
>   a parser would not report the difference between a plain or single
>   quoted node for tagged nodes. This is a sensible thing: it allows
>   people to change styles without worrying about changing the YAML
>   Representation of the data. For example, a really-really-long
>   tag-free plain scalar could be converted to a double quoted string
>   by providing the 'unspecified-implicit' tag.
>
> - Allows for two scalars with the same content to be used in a mapping;
>   for example, { 3: "number 3", '3': "string 3" } is perfectly legal
>   since the first key is !unspecified-implicit and the second key has
>   a tag of !unspecified-scalar. This is expected behavior; in one
>   system the first key may be converted to an integer, and the
>   second key into a string, so that there are no duplicate keys.
>
>   However, on a system with only strings, the resulting document
>   would produce an error. This inconsistency was difficult to
>   explain, but it is easy now. In the system with only strings, the
>   'resolution' process (which is after parsing, before loading)
>   created a duplicate key, and thus is in error. The document itself
>   is _not_ in error. So, the seeming inconsistency is moved from the
>   document using 'implicit' types into the domain of the application
>   whose transform created an invalid YAML Representation Graph. This is
>   where we'd expect the error, and it is now in the correct spot.
>
> - In the same light, if we allowed the behavior above, the
>   previous line of thinking had problems with the document
>   { 3: 'number 3', 3: 'string 3' }. Clearly the document itself
>   is invalid; however, it was previously unclear where the problem
>   was. Now it is obvious: the equivalent representation
>   after the syntax sugar is expanded has a duplicate key,
>   ('unspecified-implicit', '3'). Another problem successfully
>   averted.
>
> - We also move 'str', 'map' and 'seq' tags out of the specification.
>   They properly belong in the YAML Tag Repository. This matches
>   expectations: while a YAML document may contain an
>   'unspecified-mapping', we shouldn't require that a native binding
>   actually have a hashtable or dictionary implementation. It could,
>   with a very strict resolution transform, map the unspecified
>   mapping onto a COBOL record. I always wondered 'why' these
>   tags were special and other built-in types were not.
>
> - I can't think of any downsides; this syntax sugar rocks.

Good explanation. I hadn't realized just how bad the previous situation was :-), because I never really had a mental model of a resolution process working on different input and output infosets.

> on globalization:
>
> While creating global tags is what started this whole discussion
> this month, it is related to the above wart removal. The proposed
> %TAG mechanism is also syntax shorthand to help build long tags.
> Since we already had a tag 'resolution' process, it seemed logical
> to move this globalization operation there as well.

Er, actually we told you all along that was a bad idea. Oren wrote:

# NO. You are mixing globalization ("cooking") with resolution ("implicit")
# again.
#
# - Input: YAML tags as in the document, some missing.
#
# - GLOBALIZATION/COOKING: Purely syntactical, no tables, no schema, looks
#   at directives.
#
# - Intermediate1: Globalized tags, plus private tags, and some missing tags.
#
# - RESOLUTION/IMPLICIT: Schema-driven, tables, what not.
#
# - Intermediate2: All tags are known (some global, some private).
#
# - RECOGNITION: Per platform (language, app, etc.).
#
# - Intermediate3: Which native data structure to use for each tag.
#
# - CONSTRUCTION: Per platform (language, app, etc.)
#
# - Final result: Actual native data structures.
#
# Please stop pushing RESOLUTION features into the GLOBALIZATION phase.
> Brian objected, and rightly so. The tag "resolution" process is
> actually a full-blown transformation from one YAML representation to
> another. While it may happen subtly, and without actual code, it is
> logically converting one document (which is ambiguous due to its
> unspecified tags and whatnot) to another which more closely resembles
> what the application needs. It is completely unrelated to tag
> globalization.
>
> While tag globalization is a syntax-rewrite trick, it happens at a
> much lower level; it is not a transformation of the parser's output.
> "Cooking", so to speak, can (and probably should) happen at parse time.

Now you get it :-)

With that out of the way, we can concentrate on defining the simplest possible mechanism for "cooking" that satisfies the requirements, which is #7 or its variants. #9a is no simpler, is less backward-compatible, and doesn't seem to have any other advantages over #7a or #7b.

> - The 'unspecified-plain' tag, if any still remain, is processed by
>   the parser against any regular expressions in any YAML types
>   from the YAML Type Repository it knows about. This is inherently
>   a fuzzy process; but, a processor should make good and try to
>   resolve as many YAML Types as it can. All remaining
>   'unspecified-plain' tags are mapped to 'tag:yaml.org,2002:str'

This is Implementors' Guide stuff, not specification (especially since it may change as the repository changes).

> - The YAML processor then may choose to match any remaining local
>   tags against types it knows about from the YAML Type Repository.
>   In particular, it could choose to map !int to 'tag:yaml.org,2002:int',
>   or, if the YAML processor doesn't know about int, it may just pass.

No, no, and thrice no. !int et al. are too important to leave to a fuzzily defined process outside the specification. In #7a, !int *always* means "tag:yaml.org,2002:int". In #7b, !int *always* means "int" appended to the stream's no-!-prefix.
This is predictable and unsurprising, which is a good thing for core types. If you want a local type that may or may not get converted to "tag:yaml.org,2002:int" later, use !!integer or something else.

(OTOH, if you really want it to work the way you've just described, then under #7b you can write "%PRE # undefine yaml.org prefix" at the start of the stream. Plenty of rope...)

--
David Hopwood <dav...@bl...> |
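The duplicate-key argument in Clark's draft 9a, as quoted by David, can be checked mechanically if mapping keys are compared by (tag, content). Under that assumption `{ 3: "number 3", '3': "string 3" }` has no duplicates as parsed, a hypothetical strings-only resolution step creates one, and `{ 3: 'number 3', 3: 'string 3' }` is already a duplicate before any resolution. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A mapping key in the representation: its tag plus its content. */
typedef struct {
    const char *tag;
    const char *value;
} map_key;

/* Returns 1 if any two keys share both tag and content, else 0. */
static int has_duplicate_keys(const map_key *keys, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(keys[i].tag, keys[j].tag) == 0 &&
                strcmp(keys[i].value, keys[j].value) == 0)
                return 1;
    return 0;
}

/* A hypothetical "strings only" resolution transform: every scalar key,
   plain or quoted, is retagged as tag:yaml.org,2002:str. */
static void resolve_strings_only(map_key *keys, size_t n) {
    for (size_t i = 0; i < n; i++)
        keys[i].tag = "tag:yaml.org,2002:str";
}
```

The point of the sketch is where the error appears: the first check passes on the parsed document, and only the application's transform produces the duplicate.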
From: Oren Ben-K. <or...@be...> - 2004-09-05 17:42:08
|
On Sunday 05 September 2004 09:44, Clark C. Evans wrote:
> | But the way I see it this is a minor wording change in the spec
> | (instead of talking about a wart-ish 'plain scalar' bit).
>
> Actually, it impacts one of the diagrams to make it simpler. The
> notion of 'tag resolution' is just gone. This isn't a small
> wording change... I hope I made this clear in #9a.

Yes, it upgraded the A-bomb (#9) to an H-bomb (#9a) :-) I still think we just need a fly-swatter (#7).

> Having !sometag reported to my application as
> tag:private.yaml.org,2002:sometag is really hackish. I put in !sometag,
> I should get 'sometag'.

1. In the context of #7, if you specify !tag, you get !yaml.org,2004:tag.

2. I suppose you meant: if I put in "!!tag", I should get "!tag". No; in the context of #7, if you put in "!<handle>!tag", you get "<value-of-handle>tag", and that also holds for "!!tag", which gives "<value-of-empty-handle>tag".

> ---
> - !int JMP
>
> is cooked to,
>
> ---
> - !tag:private.yaml.org,2002:int JMP
>
> I thought this was backwards compatible?

PLEASE... we are talking about #7 here, not #8. "!int" is cooked to "!yaml.org,2002:int". "!!private" is cooked to "<value-of-empty-handle>private", where "<value-of-empty-handle>" is either:

- specified in a %TAG directive, or
- some default value (David suggests "tag:private.yaml.org,2002:"; originally the proposal was ""; both have pros and cons, and the choice is secondary to the proposal).

This *IS* backward compatible.

BTW, just for my curiosity: I haven't seen a single post that gives any downside to #7 (or #7a), other than "We don't like the extra !". Does _anyone_ have any other problem with #7 other than the extra "!"? No credit for problems that also appear in #8 and #9 :-)

Have fun,

Oren Ben-Kiki |
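Oren's description of #7 cooking can be sketched as below. Several details are assumptions rather than proposal text: "!int" cooks to "tag:yaml.org,2002:int" (David's #7a reading; Oren writes it without the "tag:" scheme), the empty handle defaults to David's suggested "tag:private.yaml.org,2002:" unless a %TAG overrides it, an undeclared named handle is treated as an error, and verbatim global tags are not handled. The type, function, and example prefix "tag:example.com,2004:" are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One %TAG directive: a handle (possibly empty) and its prefix value. */
typedef struct {
    const char *handle;
    const char *prefix;
} tag_directive;

/* Assumed defaults for "!name" and for the empty handle in "!!name". */
#define CORE_PREFIX          "tag:yaml.org,2002:"
#define DEFAULT_EMPTY_HANDLE "tag:private.yaml.org,2002:"

/* Cook a tag written after its leading '!' under the #7 reading:
     "int"         -> CORE_PREFIX "int"
     "!name"       -> <value of empty handle> "name"
     "handle!name" -> <value of handle> "name"
   Returns 0 on success, -1 for an undeclared handle or a short buffer. */
static int cook7(const char *written, const tag_directive *dirs,
                 size_t ndirs, char *out, size_t out_size) {
    const char *bang = strchr(written, '!');
    const char *prefix;
    const char *name;
    int n;

    if (bang == NULL) {                  /* no handle: core type */
        prefix = CORE_PREFIX;
        name = written;
    } else {
        size_t handle_len = (size_t)(bang - written);
        name = bang + 1;
        prefix = (handle_len == 0) ? DEFAULT_EMPTY_HANDLE : NULL;
        for (size_t i = 0; i < ndirs; i++) {
            if (strlen(dirs[i].handle) == handle_len &&
                strncmp(dirs[i].handle, written, handle_len) == 0) {
                prefix = dirs[i].prefix; /* an explicit %TAG wins */
                break;
            }
        }
        if (prefix == NULL)
            return -1;                   /* named handle never declared */
    }
    n = snprintf(out, out_size, "%s%s", prefix, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The rule needs no lookup tables beyond the directives themselves, which is the "purely syntactical" property Oren wants for cooking.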
From: Damian C. <dam...@gm...> - 2004-09-05 20:37:12
|
On Sun, 5 Sep 2004 20:42:03 +0300, Oren Ben-Kiki <or...@be...> wrote:
> BTW, just for my curiosity. I haven't seen a single post that gives any down
> side to #7 (or #7a), other than "We don't like the extra !". Does _anyone_ have
> any other problem with #7 other than the extra "!"? No credit for problems
> that also appear in #8 and #9 :-)

I like #7, #7a, and #8 well enough, with a slight preference for #7. Proposal #9 seems very complicated compared to its immediate predecessors.

I'm not sure there is much to choose between !int, !!circle versus !!int, !circle. The cooking process defined in proposal #7 is simpler to describe. The search & replace needed to convert to using a global tagspace is less ambiguous ('convert !!bar to !foo!bar', as opposed to 'convert !bar to !foo^bar, unless bar contains ^ or is a URI').

There is not a huge difference between !prefix!specific versus !prefix^specific, which is the main other difference between the two. I think the former looks neater, and it has the advantage of being a disjoint syntax from the old cut^paste.

There are other features introduced with version #9. One is this idea of nodes lacking a tag being assigned one based on the shape of the node. This is OK. I imagine some C programmers will be annoyed they now have to change some code like

    if (!tag) {
        ....
    }

to

    if (strcmp(tag, UNSPECIFIED_SCALAR) == 0
            || strcmp(tag, UNSPECIFIED_IMPLICIT) == 0) {
        ...
    }

but implementers in other languages probably won't notice the difference. I think this aspect of proposal #9 would combine with proposal #7 without too many problems.

-- Damian |
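Damian's C example could be wrapped in a small helper so call sites keep a single test. The macro names follow his snippet and the string values are the tag names from draft #9; the helper name is hypothetical, and also accepting NULL is an assumption added here for parsers that still report a missing tag.

```c
#include <assert.h>
#include <string.h>

/* Tag names from draft #9; macro names follow Damian's snippet. */
#define UNSPECIFIED_SCALAR   "unspecified-scalar"
#define UNSPECIFIED_IMPLICIT "unspecified-implicit"

/* Before #9, an untagged scalar had no tag, so code tested (!tag).
   Under #9 every node carries a tag, so "untagged scalar" becomes a
   test for one of the two built-in scalar tags. NULL is accepted too,
   so the same helper works with a parser that still omits the tag. */
static int is_untagged_scalar(const char *tag) {
    return tag == NULL ||
           strcmp(tag, UNSPECIFIED_SCALAR) == 0 ||
           strcmp(tag, UNSPECIFIED_IMPLICIT) == 0;
}
```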