From: T. S. <tra...@ru...> - 2004-09-07 05:15:32
> I'm still not clear how the '?' tags work. Could you explain
> a bit more?

Sorry I couldn't get back to you sooner. Internet has been down all day. I'm currently on dial-up.

The ? mechanism was intended to separate unspecified tags from specified tags --obviously. But it also served as a way for other tagspaces to inherit the YAML tagspace. I realize now that this particular attempt was too hackish --and too limited.

Still, I think inheritable tagspaces would be useful. One could define a new tagspace based on two or more others (name clashes notwithstanding --that's the price of this freedom). This kind of behavior can certainly be informally implemented by the application --in which case it lies outside the scope of YAML's specification. The crucial distinction here is that the behavior could be formalized and controllable from the document level.

Yes, this is a complication --b/c it is a rich new transformative feature, not just a syntactical substitute. Nonetheless it is simply a derivation of a very well accepted, cornerstone practice of OOP: class inheritance.

-T.

> On Sun, Sep 05, 2004 at 11:08:52PM -0400, T. Onoma wrote:
> | Summary:
> |
> |   This is the tenth-pass draft, primarily based on the
> |   seventh-pass series and the ninth-pass series.
> |
> |   The focus of this draft is the formalization of the
> |   YAML tag system, and its requirements for native type
> |   resolution (proper word?) in conformance to the YAML 1.1
> |   specification, which this draft defines.
> |
> |   # Note: None of this has been approved by Brian yet. Also, the
> |   # YAML 1.1 notion has not received any feedback yet. It isn't
> |   # crucial for this proposal, though.
> |
> | Claims:
> |
> |   - The Application is in _complete_ control of how a YAML
> |     document gets loaded into native language types.
> |
> |   - The current tag system is limited, and some aspects of it
> |     are simply hackish.
> |     Notably:
> |
> |     - Complexity emerging from the 'implicit' typing of nodes
> |       having the plain scalar style, as outlined in section 3.3
> |       (Completeness).
> |
> |     - Apparent "bleeding" of properties of the Presentation
> |       Model (the style of scalar nodes) into the rest of the model.
> |
> |     - Hackish attempts, in parts of Section 3.3, to limit the impact
> |       of the above-mentioned flaw, namely "tag resolution".
> |
> |     - The old cut-and-paste tag shortcut is insufficient in its
> |       ability to handle mixtures of different global tags.
> |
> |
> | Corollaries:
> |
> |   - It is _clearly_ correct to allow applications to type their
> |     data according to scalar decoration, i.e. plain or not.
> |
> |   - YAML's Type Repository is especially useful for interoperability
> |     between variant platforms, but it is no _more_ (or less)
> |     important than an application's native types.
> |
> |
> | Solution Overview:
> |
> |   - Remove all forms of tag shorthands and prefixing. We can leave the
> |     cut-and-paste mechanism for backward compatibility. If at a
> |     later time it is deemed worthless and unnecessary, it can be removed.
> |
> |   - Introduce a new directive, %TAG, that associates a <handle> with
> |     a tagURI <prefix>.
> |
> |   - There are two primary species of tag, namely specified and unspecified.
> |     Specified tags begin with an exclamation mark, '!'. Unspecified tags
> |     are implied and generally not written, but can be. When they are
> |     written, they begin with a question mark, '?'.
> |
> |   - There are two general variations of tags.
> |
> |     - Global tags are those that are globally unique. Traditionally,
> |       these have been URIs; that is, they start with a word followed by a
> |       colon and use only URI characters.
> |       Strictly speaking, Perl::Packages
> |       happen to match this production, so they could also be considered
> |       global even though they are not URIs.
> |
> |     - Local tags are all other tags. They only have meaning
> |       according to a given processing environment. They do not need to be
> |       globally unique and therefore must be used cautiously in document
> |       sharing scenarios.
> |
> |   - There are only four (built-in) unspecified local tags:
> |
> |       ?unspecified-mapping
> |       ?unspecified-sequence
> |       ?unspecified-plain-scalar
> |       ?unspecified-decorated-scalar
> |
> |     Every node _without_ a specific tag implies a tag of an
> |     unspecified kind according to its presentational context.
> |
> |   - Global tags can be "unspecified" as well, in which case they
> |     are termed "inherent". Like local unspecified tags, these
> |     are usually not written.
> |
> |   - Parsing the tags of a document begins with "cooking",
> |     or more formally 'tag formalization'. Cooking does two things:
> |
> |     - Adds in the corresponding literal forms of all missing unspecified tags.
> |
> |     - All handles are replaced with the tagURIs from the TAG directives.
> |       They are made inherent if that option is specified.
> |
> |   - After parsing, another process is applied called "tag specification",
> |     which I will call "distilling". This is inherently a higher-order process
> |     --a transformation, and involves:
> |
> |     - Transforming unspecified tags into specified tags
> |
> |     - Transforming local tags into global tags
> |
> |     The exact transformations are defined by the application.
> |
> |
> | Solution Details:
> |
> |   Tag Substitution:
> |
> |   An example of the parsing rule is as follows:
> |
> |     ---
> |     plain:
> |       - 'single'
> |       - "double"
> |       - |
> |         literal
> |       - >
> |         folded
> |
> |   is simply syntax sugar for,
> |
> |     --- ?unspecified-mapping {
> |       ?unspecified-plain-scalar "plain":
> |         ?unspecified-sequence [
> |           ?unspecified-decorated-scalar "single",
> |           ?unspecified-decorated-scalar "double",
> |           ?unspecified-decorated-scalar "literal\n",
> |           ?unspecified-decorated-scalar "folded"
> |         ]
> |     }
> |
> |   Both of the documents above have exactly the same YAML
> |   Representation.
> |
> |   - We open up the tag mechanism !tag to allow any non-space
> |     characters to be used. However, the resulting tag must be
> |     valid according to the requirements of the URI scheme used.
> |     The following characters are marked as 'unwise' in RFC2396,
> |     regardless of the URI scheme:
> |
> |       { } | \ ^ [ ] `
> |
> |     (However, [ and ] are expected to be used in certain URIs in
> |     the future).
> |
> |     These characters will provide an 'escape hatch' for current and
> |     future extensions to YAML. With this change, any URI can be
> |     directly used as a !tag. We really can't use {} or [] since they
> |     signify mappings and lists. The \ character is used for escaping,
> |     and we use | to signify block and the backtick looks too much
> |     like the single quote to be useful. This leaves the ^ delimiter,
> |     which was already used for the older cut^paste mechanism.
> |
> |   - Tags specified in the YAML Repository under the yaml.org tagURI shall
> |     be limited to:
> |
> |       word-char ( '/' | word-char | '#' )*
> |
> |     This allows us to use any of the various non-word ASCII chars to
> |     introduce additional tag processing mechanisms while still allowing
> |     yaml.org tags to contain hierarchy and fragments.
> |     Note: This does not
> |     endorse the use of hierarchy and fragments in yaml.org tags, just allows
> |     their use in the future in case it is discovered to be necessary.
> |
> |   - We introduce a new directive 'TAG' which provides a way to shorten
> |     the data entry of tagURIs. In particular,
> |
> |       declaration := "%TAG" [ WS handle ] WS taggingEntity ":" spec_first [ WS "(?)" ]
> |
> |     Where 'taggingEntity' refers to the same production in the tagURI
> |     specification and WS is white space. The taggingEntity refers to
> |     either a domain or email address followed by the minting date;
> |     see tagURI specification for details. The 'spec_first' refers to zero
> |     or more non-space characters (it is optional).
> |
> |     The 'handle' refers to a sequence of one or more word characters
> |     [a-zA-Z0-9_] or "!". Optionally the handle can be missing; this
> |     case is called the 'default prefix', in which case the handle is
> |     considered to be the empty string ''. In a YAML document,
> |     each handle must be unique via string comparison.
> |
> |   - We extend the !tag mechanism to allow a single '!' character,
> |     which is in the reserved characters above; the syntax for this
> |     special case is,
> |
> |       taguri := '!' handle spec_second
> |
> |     In this circumstance, the 'handle' _must_ appear as a handle in one
> |     of the stream's directives. The 'spec_second' is zero or more
> |     non-space characters, with the restriction that either spec_first or
> |     spec_second (or both) must be at least one character.
> |
> |   - The optional "(?)" on the end of the TAG directive indicates that
> |     tags with matching handles should be "cooked" to be unspecified,
> |     changing the '!' to '?'.
> |
> |
> |   Tag Resolution:
> |
> |   - Resolution refers to a process after the application has been
> |     provided a valid YAML Representation, and before the application
> |     has loaded this representation into native data structures.
> |
> |   - An application may choose to alter the input document in any way it
> |     sees fit, provided that it only uses information provided in the YAML
> |     Representation model for this transformation. In particular, style
> |     information, key order, and other presentation or serialization
> |     attributes should not be used to guide the transformation process.
> |
> |   - In particular, if the application chooses to use types from
> |     the YAML Type Repository, it may choose to use a helper
> |     document transformation which the parser may provide.
> |
> |   - A YAML parser may wish to provide a 'helper' transformation
> |     which fills in unspecified tags, and converts short 'local'
> |     tags which seem to refer to YAML types to their global variety.
> |
> |   - Unspecified tags could be converted as follows:
> |
> |     - ?unspecified-sequence -> 'tag:yaml.org,2002:seq'
> |     - ?unspecified-mapping -> 'tag:yaml.org,2002:map'
> |     - ?unspecified-decorated-scalar -> 'tag:yaml.org,2002:str'
> |
> |   - The 'unspecified-plain' tag, if any still remain, is processed by
> |     the parser against any regular expressions in any YAML types
> |     from the YAML Type Repository it knows about. This is inherently
> |     a fuzzy process; but a processor should make a good effort to
> |     resolve as many YAML Types as it can.
> |     All remaining
> |     'unspecified-plain' tags are mapped to 'tag:yaml.org,2002:str'.
> |
> |   - For "unspecified" global tags created from the "(?)" option on a TAG declaration,
> |     the portion of the tag between 'tag:' and ':' is transformed to 'yaml.org,{year}';
> |     and local tags are transformed into global tags of the form
> |     'yaml.org,{year}:tagname', and the "?" is replaced with a "!".
> |
> |   - The YAML processor then may choose to match any remaining local
> |     tags against types it knows about from the YAML Type Repository.
> |     In particular, it could choose to map !int to 'tag:yaml.org,2002:int',
> |     or, if the YAML processor doesn't know about int, it may just pass.
> |
> |
> | Implications:
> |
> |   - Since the parser's results can always have tags filled-in, and
> |     deliver content in the exact structure of the 'Node Graph
> |     Representational Model', we do not need to worry about tag
> |     resolution. No "bleeding" at all!
> |
> |
> | Many of the ideas from #7 and #9 apply ...
> |
>
> --
> Clark C. Evans Prometheus Research, LLC.
> http://www.prometheusresearch.com/
> o office: +1.203.777.2550
> ~/ , mobile: +1.203.444.0557
> //
> (( Prometheus Research: Transforming Data Into Knowledge
> \\ ,
> \/ - Research Exchange Database
> /\ - Survey & Assessment Technologies
> ` \ - Software Tools for Researchers
> ~ *
>
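The "cooking" step and the helper resolution described in the quoted draft can be sketched roughly as follows. This is only an illustration of the idea, not an implementation of any real YAML library: the `Node` class, the function names `cook` and `resolve`, and the single integer pattern standing in for the Type Repository's regular expressions are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

YAML = "tag:yaml.org,2002:"


@dataclass
class Node:
    kind: str                   # "mapping", "sequence" or "scalar"
    value: object               # list of (key, value) node pairs, list of nodes, or str
    tag: Optional[str] = None   # None: no tag was written in the document
    plain: bool = True          # scalars only: plain style vs. quoted/block style


def children(node):
    """Yield the direct child nodes of a collection node."""
    if node.kind == "mapping":
        for key, val in node.value:
            yield key
            yield val
    elif node.kind == "sequence":
        yield from node.value


def cook(node):
    """'Cooking': write out the unspecified tag implied by each untagged node."""
    if node.tag is None:
        if node.kind == "scalar":
            style = "plain" if node.plain else "decorated"
            node.tag = f"?unspecified-{style}-scalar"
        else:
            node.tag = f"?unspecified-{node.kind}"
    for child in children(node):
        cook(child)
    return node


# Fixed conversions listed in the draft.
FIXED = {
    "?unspecified-sequence": YAML + "seq",
    "?unspecified-mapping": YAML + "map",
    "?unspecified-decorated-scalar": YAML + "str",
}

# Assumed stand-in for the Type Repository's regular expressions.
PLAIN_PATTERNS = [(re.compile(r"[-+]?[0-9]+"), YAML + "int")]


def resolve(node):
    """Helper transformation: turn unspecified tags into yaml.org global tags."""
    if node.tag in FIXED:
        node.tag = FIXED[node.tag]
    elif node.tag == "?unspecified-plain-scalar":
        node.tag = next(
            (tag for pattern, tag in PLAIN_PATTERNS if pattern.fullmatch(node.value)),
            YAML + "str",  # everything unmatched falls back to str
        )
    for child in children(node):
        resolve(child)
    return node
```

Explicitly tagged nodes pass through both steps untouched, which matches the draft's point that only unspecified tags are filled in; what happens to local tags such as !int is left to the application here.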
From: Sean O'D. <se...@ce...> - 2004-09-07 16:12:52
On Tuesday 07 September 2004 07:42, Clark C. Evans wrote:
>
> Clearly, if one 'transforms' the graph on the way in, the 'schema'
> is being altered, so if one wishes to round-trip, a reverse transform
> on the way out would be required to convert the data structure back.
> But this, once again, is an application-specific thingy.

Schemas, I thought, are for validation and wouldn't alter the graph in any way. A set of transformation rules would be something different from a schema, and it would probably do nothing more than re-tag nodes, so it should be re-emittable all the way back from the fully loaded state. Re-tagged nodes could be loaded with native language data types, depending on what sort of programming interface the loader gives the programmer.

Sean O'Dell
From: Clark C. E. <cc...@cl...> - 2004-09-07 16:44:56
On Tue, Sep 07, 2004 at 09:01:49AM -0700, Sean O'Dell wrote:
| Schemas, I thought, are for validation and wouldn't alter the
| graph in any way.

I agree, but people have been using "schema" to also describe the idea that implicit types would be "filled-in" by the schema. Technically, you could say the schema is an identity transform that is capable of raising exceptions.

| A set of transformation rules would be something different from a
| schema, and it would probably do nothing more than re-tag nodes,
| so it should be re-emittable all the way back from the fully
| loaded state. Re-tagged nodes could be loaded with native language
| data types, depending on what sort of programming interface the
| loader gives the programmer.

Yes. And re-tagging nodes is a bona fide transformation, and one would rightly want to define this retagging process as a rewrite of the YAML Representation to an equivalent form. If one were clever, each one of these transforms could be described in such a way that it would be reversible, so that it creates an isomorphism between the two spaces. But in any case, this transform will be application specific, and could get quite involved.

Cheers!

Clark
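One way to picture a reversible retagging is a one-to-one tag table whose inverse is applied on the way out. The sketch below assumes a toy graph of `(tag, value)` tuples; the table contents (apart from the !int example taken from the draft) and the names `invert` and `retag` are hypothetical, chosen only for illustration.

```python
# Hypothetical retagging table: local tag -> global tag.
TO_GLOBAL = {
    "!int": "tag:yaml.org,2002:int",
    "!str": "tag:yaml.org,2002:str",
}


def invert(table):
    """Build the reverse table, refusing tables that are not one-to-one."""
    inverse = {new: old for old, new in table.items()}
    if len(inverse) != len(table):
        raise ValueError("retagging table is not one-to-one, so it cannot be reversed")
    return inverse


def retag(node, table):
    """Return a copy of the graph with tags rewritten; structure and values are kept."""
    tag, value = node
    if isinstance(value, list):  # collection node: recurse into children
        value = [retag(child, table) for child in value]
    return (table.get(tag, tag), value)


doc = ("!list", [("!int", "1"), ("!str", "a")])
loaded = retag(doc, TO_GLOBAL)                 # transform on the way in
emitted = retag(loaded, invert(TO_GLOBAL))     # reverse transform on the way out
```

The round trip only returns the original graph if the input does not already contain any of the target tags; otherwise the reverse pass would rewrite tags the forward pass never touched, which is part of why such transforms can "get quite involved".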
From: Sean O'D. <se...@ce...> - 2004-09-07 17:14:30
On Tuesday 07 September 2004 09:44, Clark C. Evans wrote:
> On Tue, Sep 07, 2004 at 09:01:49AM -0700, Sean O'Dell wrote:
> | Schemas, I thought, are for validation and wouldn't alter the
> | graph in any way.
>
> I agree, but people have been using "schema" to also describe the idea
> that implicit types would be "filled-in" by the schema. Technically,
> you could say the schema is an identity transform, that is capable
> of raising exceptions.

So, I'm thinking:

  parse -> load scalars and resolve tags -> *transform taggings -> *validate -> load native types

Does that sound right?

Sean O'Dell
From: David H. <dav...@bl...> - 2004-09-07 18:26:14
Sean O'Dell wrote:
> On Tuesday 07 September 2004 07:42, Clark C. Evans wrote:
>
>>Clearly, if one 'transforms' the graph on the way in, the 'schema'
>>is being altered, so if one wishes to round-trip, a reverse transform
>>on the way out would be required to convert the data structure back.
>>But this, once again, is an application-specific thingy.
>
> Schemas, I thought, are for validation and wouldn't alter the graph in any
> way. A set of transformation rules would be something different from a
> schema, and it would probably do nothing more than re-tag nodes, so it should
> be re-emittable all the way back from the fully loaded state.

As it happens, once you have the mechanisms needed to validate a graph, very little additional mechanism is needed to transform it. A schema language essentially parses the graph, yielding a parse tree in which each parse tree node corresponds to a graph node. At that point you have all the information needed to fill in unspecified types: for each node, the schema production tells you what kind of node it "should" be.

In simple cases, all you then need is a mapping from schema productions to tags, which is used to change any unspecified tags. In more complicated cases, the API could call an application-provided function depending on the production for each node.

--
David Hopwood <dav...@bl...>
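The "simple case" above, validation that records a production for each node followed by a production-to-tag mapping, might look something like this. The schema notation, the `invoice` example, and the names `validate`, `fill_tags`, `SCHEMA`, and `TAGS` are all invented for the sketch; no real schema language for YAML is implied.

```python
import re
from dataclasses import dataclass
from typing import Optional

YAML = "tag:yaml.org,2002:"


@dataclass
class Node:
    value: object                      # str, list of Node, or dict of str -> Node
    tag: Optional[str] = None          # None stands for an unspecified tag
    production: Optional[str] = None   # filled in by validate()


# Hypothetical schema: production name -> (kind, spec).
SCHEMA = {
    "invoice": ("map", {"id": "number", "items": "item-list"}),
    "item-list": ("seq", "name"),
    "number": ("scalar", r"[0-9]+"),
    "name": ("scalar", r".+"),
}

# Mapping from schema productions to tags.
TAGS = {
    "invoice": YAML + "map",
    "item-list": YAML + "seq",
    "number": YAML + "int",
    "name": YAML + "str",
}


class SchemaError(ValueError):
    pass


def validate(node, production, schema=SCHEMA):
    """Check the node against a production and record which production matched."""
    kind, spec = schema[production]
    if kind == "scalar":
        if not (isinstance(node.value, str) and re.fullmatch(spec, node.value)):
            raise SchemaError(f"scalar does not match production {production!r}")
    elif kind == "seq":
        if not isinstance(node.value, list):
            raise SchemaError(f"expected a sequence for {production!r}")
        for child in node.value:
            validate(child, spec, schema)
    else:  # "map"
        if not isinstance(node.value, dict) or set(node.value) != set(spec):
            raise SchemaError(f"keys do not match production {production!r}")
        for key, child in node.value.items():
            validate(child, spec[key], schema)
    node.production = production


def fill_tags(node, tags=TAGS):
    """Replace unspecified tags using the production found during validation."""
    if node.tag is None:
        node.tag = tags[node.production]
    if isinstance(node.value, dict):
        kids = node.value.values()
    elif isinstance(node.value, list):
        kids = node.value
    else:
        kids = ()
    for child in kids:
        fill_tags(child, tags)
```

The more complicated case would replace the `TAGS` lookup with an application-provided callback keyed on the production.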
From: Sean O'D. <se...@ce...> - 2004-09-07 19:19:07
On Tuesday 07 September 2004 11:26, David Hopwood wrote:
> Sean O'Dell wrote:
> > On Tuesday 07 September 2004 07:42, Clark C. Evans wrote:
> >>Clearly, if one 'transforms' the graph on the way in, the 'schema'
> >>is being altered, so if one wishes to round-trip, a reverse transform
> >>on the way out would be required to convert the data structure back.
> >>But this, once again, is an application-specific thingy.
> >
> > Schemas, I thought, are for validation and wouldn't alter the graph in
> > any way. A set of transformation rules would be something different from
> > a schema, and it would probably do nothing more than re-tag nodes, so it
> > should be re-emittable all the way back from the fully loaded state.
>
> As it happens, once you have the mechanisms needed to validate a graph,
> very little additional mechanism is needed to transform it. A schema
> language essentially parses the graph, yielding a parse tree in which each
> parse tree node corresponds to a graph node. At that point you have all the
> information needed to fill in unspecified types: for each node, the schema
> production tells you what kind of node it "should" be. In simple cases, all
> you then need is a mapping from schema productions to tags, which is used
> to change any unspecified tags. In more complicated cases, the API could
> call an application-provided function depending on the production for each
> node.

Actually, validation has branches and conditions that don't map very well to a simple transformation. The parser would do the parsing and a module somewhere between the parser and loader would validate the data against a template of conditions which determine whether or not the data is in the right format and values are within tolerances, and so forth.

The other way around is more true. If you have transformation, you have the foundation for validation, although you'd probably want to add more functionality.

Sean O'Dell
From: David H. <dav...@bl...> - 2004-09-08 02:40:09
Sean O'Dell wrote:
> On Tuesday 07 September 2004 11:26, David Hopwood wrote:
>>Sean O'Dell wrote:
>>
>>>On Tuesday 07 September 2004 07:42, Clark C. Evans wrote:
>>>
>>>>Clearly, if one 'transforms' the graph on the way in, the 'schema'
>>>>is being altered, so if one wishes to round-trip, a reverse transform
>>>>on the way out would be required to convert the data structure back.
>>>>But this, once again, is an application-specific thingy.
>>>
>>>Schemas, I thought, are for validation and wouldn't alter the graph in
>>>any way. A set of transformation rules would be something different from
>>>a schema, and it would probably do nothing more than re-tag nodes, so it
>>>should be re-emittable all the way back from the fully loaded state.

In general, without having validated the graph against a schema (and as a side effect found which nodes correspond to which schema productions), you don't have enough information to apply a useful transformation. Maybe in simple cases you can do a transformation just by looking at each node individually, but that's very limited.

>>As it happens, once you have the mechanisms needed to validate a graph,
>>very little additional mechanism is needed to transform it. A schema
>>language essentially parses the graph, yielding a parse tree in which each
>>parse tree node corresponds to a graph node.

I may not have been clear: this parsing has nothing to do with parsing of the YAML presentation. Consider a BNF grammar: it can be used to both validate an input sequence, and (if the grammar is unambiguous) parse it into a tree. More generally, a grammar can take a sequence, tree, graph, or combination of these as input, and only if the data matches the schema, output a tree that describes the data in terms of the schema productions. A class of grammars that is particularly well-suited to this are the Parsing Expression Grammars; see <http://en.wikipedia.org/wiki/Parsing_expression_grammar>.
>>At that point you have all the
>>information needed to fill in unspecified types: for each node, the schema
>>production tells you what kind of node it "should" be. In simple cases, all
>>you then need is a mapping from schema productions to tags, which is used
>>to change any unspecified tags. In more complicated cases, the API could
>>call an application-provided function depending on the production for each
>>node.

IOW, parsing the graph, which is needed for validation, already does almost everything you need in order to transform the graph.

> Actually, validation has branches and conditions that don't map very well to a
> simple transformation. The parser would do the parsing and a module
> somewhere between the parser and loader would validate the data against a
> template of conditions which determine whether or not the data is in the
> right format and values are within tolerances, and so forth.

The grammar defined using the schema language expresses the necessary branches and conditions. Look at RELAX NG for an example: <http://www.relaxng.org/compact-tutorial-20030326.html>. It's not entirely suitable for YAML, but the principle is the same.

--
David Hopwood <dav...@bl...>
From: Clark C. E. <cc...@cl...> - 2004-09-07 14:42:12
On Mon, Sep 06, 2004 at 10:15:21PM -0700, T. Sawyer wrote:
| this is a complication --b/c it is a rich new transformative feature,
| not just a syntactical substitute. Nonetheless it is simply a derivation
| of a very well accepted, cornerstone practice of OOP: class inheritance.

The proposals put forth by you and Sean are both very useful graph rewrite rules -- not simple syntax tricks. While they may not have a home in the specification proper, they seem to be a wonderful beginning of a schema/transformation language.

I think this is related to my other post, 'the rise of an Imp', since such a schema language would rightly be a 'transform' applied to a generic YAML graph after it has been parsed, but before it is converted into native objects:

  Presentation     Serialization     Representation      Native
   Character    ->     Event      ->      Node       ->   Data
    Stream             Tree               Graph         Structure
            (parse)          (compose)         (construct)

Basically, I see these 'transformation' languages fitting in between "compose" and "construct". That is, they would take as input a random-access generic binding of YAML objects, "transform" the graph as needed, and then the restructured graph is passed on to the construction phase before it is given to the application.

This is an optional step, and while we may 'mention' this in the spec, it certainly won't get its own diagram or much more than a few lines of "you may do xxx" just to reinforce the idea of where something like this fits in. This stage need not even 'change' the graph; it may just validate the graph, in which case, if the graph passes, the stage is a no-op.

Clearly, if one 'transforms' the graph on the way in, the 'schema' is being altered, so if one wishes to round-trip, a reverse transform on the way out would be required to convert the data structure back. But this, once again, is an application-specific thingy.

Cheers,

Clark
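The placement of the optional stage, between "compose" and "construct", can be sketched with toy stand-ins for each phase. Nothing here is a real YAML API: the "documents" are just whitespace-separated scalars, and `parse`, `compose`, `construct`, `tag_scalars`, `require_nonempty`, and `load` are hypothetical names used only to show the ordering.

```python
INT = "tag:yaml.org,2002:int"
STR = "tag:yaml.org,2002:str"


def parse(stream):
    """Character stream -> events (toy: one event per whitespace-separated token)."""
    return stream.split()


def compose(events):
    """Events -> node graph (toy: a flat list of (tag, value) nodes, tags unspecified)."""
    return [(None, text) for text in events]


def construct(graph):
    """Node graph -> native data."""
    return [int(value) if tag == INT else value for tag, value in graph]


def tag_scalars(graph):
    """Transform stage: fill in unspecified tags; the graph shape is unchanged."""
    def pick(value):
        return INT if value.isascii() and value.isdigit() else STR
    return [(tag or pick(value), value) for tag, value in graph]


def require_nonempty(graph):
    """Validation stage: raises on failure, otherwise a no-op."""
    if not graph:
        raise ValueError("document is empty")
    return graph


def load(stream, stages=()):
    """parse -> compose -> [optional graph stages] -> construct."""
    graph = compose(parse(stream))
    for stage in stages:
        graph = stage(graph)
    return construct(graph)


print(load("1 two 3", [tag_scalars, require_nonempty]))  # [1, 'two', 3]
print(load("1 two 3"))                                   # ['1', 'two', '3']
```

Because every stage takes a graph and returns a graph, a validator and a transformer plug in the same way, and leaving the stage list empty gives the plain three-phase pipeline.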