Thread: [Yaml-core] Re: Proposal

> I'm still not clear how the '?' tags work.  Could you explain
> a bit more?

Sorry I couldn't get back to you sooner. Internet has been down all day.
I'm currently on dial-up.

The ? mechanism was intended to separate unspecified tags from specified
tags, obviously. But it also served as a way for other tagspaces to inherit
the YAML tagspace. I realize now that this particular attempt was too
hackish, and too limited.

Still, I think inheritable tagspaces would be useful. One could define a
new tagspace based on two or more others (name clashes notwithstanding;
that's the price of this freedom). This kind of behavior can certainly be
implemented informally by the application, in which case it lies outside
the scope of YAML's specification.  The crucial distinction here is that
the behavior could be formalized and controlled from the document level.

Yes, this is a complication, because it is a rich new transformative
feature, not just a syntactic substitute. Nonetheless, it is simply a
derivation of a well-accepted cornerstone practice of OOP: class inheritance.
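
To make the idea a bit more concrete, here is a minimal sketch in Python. It assumes a tagspace is nothing more than a mapping from short tag names to global tags, and that an earlier parent wins a name clash. Neither assumption comes from the YAML spec, and the example.com tags and the helper names are made up for illustration.

```python
from collections import ChainMap

# Hypothetical tagspaces: plain dicts from short tag name to global tag.
YAML_TAGSPACE = {
    "int": "tag:yaml.org,2002:int",
    "str": "tag:yaml.org,2002:str",
}
APP_TAGSPACE = {
    "int": "tag:example.com,2004:bigint",   # clashes with the YAML 'int'
    "point": "tag:example.com,2004:point",
}


def inherit(*parents):
    """Build a tagspace from parent tagspaces; earlier parents win clashes."""
    return ChainMap(*parents)


def clashes(*parents):
    """Return the names defined in more than one parent tagspace."""
    seen, dupes = set(), set()
    for space in parents:
        for name in space:
            (dupes if name in seen else seen).add(name)
    return dupes


derived = inherit(APP_TAGSPACE, YAML_TAGSPACE)
```

Here derived resolves 'point' and 'int' from the application tagspace and falls back to the YAML tagspace for 'str', while clashes() reports which names were shadowed.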

-T.

> On Sun, Sep 05, 2004 at 11:08:52PM -0400, T. Onoma wrote:
> | Summary:
> |
> |   This is the tenth-pass draft, primarily based on the
> |   seventh-pass series and the ninth-pass series.
> |
> |   The focus of this draft is the formalization of the
> |   YAML tag system, and its requirements for native type
> |   resolution (proper word?) in conformance to the YAML 1.1
> |   specification, which this draft defines.
> |
> | # Note: None of this has been approved by Brian yet. Also, the
> | # YAML 1.1 notion has not received any feedback yet. It isn't
> | # crucial for this proposal, though.
> |
> | Claims:
> |
> |   - The Application is in _complete_ control of how a YAML
> |     document gets loaded into native language types.
> |
> |   - The current tag system is limited, and some aspects of it
> |     are simply hackish. Notably:
> |
> |     - Complexity emerging from the 'implicit' typing of nodes
> |       having the plain scalar style, as outlined in section 3.3
> |       (Completeness).
> |
> |     - Apparent "bleeding" of properties of the Presentation
> |       Model (the style of scalar nodes) into the rest of the model.
> |
> |     - Hackish attempts, in parts of Section 3.3, to limit the impact
> |       of the above-mentioned flaw, namely "tag resolution".
> |
> |     - The old cut-and-paste tag shortcut is insufficient in its
> |       ability to handle mixtures of different global tags.
> |
> |
> | Corollaries:
> |
> |   - It is _clearly_ correct to allow applications to type their
> |     data according to scalar decoration, i.e. plain or not.
> |
> |   - YAML's Type Repository is especially useful for interoperability
> |     between variant platforms, but it is no _more_ (or less)
> |     important than an application's native types.
> |
> |
> | Solution Overview:
> |
> |   - Remove all forms of tag shorthands and prefixing. We can leave the
> |     cut-and-paste mechanism for backward compatibility. If at a
> |     later time it is deemed worthless and unnecessary, it can be removed.
> |
> |   - Introduce a new directive, %TAG, that associates a <handle> with
> |     a tagURI <prefix>.
> |
> |   - There are two primary species of tag, namely specified and
> |     unspecified. Specified tags begin with an exclamation mark, '!'.
> |     Unspecified tags are implied and generally not written, but can
> |     be. When they are written, they begin with a question mark, '?'.
> |
> |   - There are two general variations of tags.
> |
> |     - Global tags are those that are globally unique. Traditionally,
> |       these have been URIs; that is, they start with a word followed by
> |       a colon and use only URI characters. Strictly speaking,
> |       Perl::Packages happen to match this production, so they could
> |       also be considered global even though they are not URIs.
> |
> |     - Local tags are all other tags. They only have meaning
> |       according to a given processing environment.  They do not need to be
> |       globally unique and therefore must be used cautiously in document
> |       sharing scenarios.
> |
> |   - There are only four (built-in) unspecified local tags:
> |
> |       ?unspecified-mapping
> |       ?unspecified-sequence
> |       ?unspecified-plain-scalar
> |       ?unspecified-decorated-scalar
> |
> |     Every node _without_ a specific tag implies a tag of an
> |     unspecified kind according to its presentational context.
> |
> |   - Global tags can be "unspecified" as well, in which case they
> |     are termed "inherent". Like local unspecified tags these
> |     are usually not written.
> |
> |   - Parsing the tags of a document begins with "cooking",
> |     or more formally 'tag formalization'. Cooking does two things:
> |
> |     - Adds in the corresponding literal forms of all missing
> |       unspecified tags.
> |
> |     - Substitutes each handle with the tagURI from its TAG directive.
> |       The resulting tags are made inherent if that option is specified.
> |
> |   - After parsing, another process is applied, called "tag specification",
> |     which I will call "distilling". This is inherently a higher order
> |     process --a transformation, and involves:
> |
> |     - Transforming unspecified tags into specified tags
> |
> |     - Transforming local tags into global tags
> |
> |     The exact transformations are defined by the application.
> |
> |
> | Solution Details:
> |
> |   Tag Substitution:
> |
> |     An example of the parsing rule is as follows:
> |
> |        ---
> |        plain:
> |          - 'single'
> |          - "double"
> |          - |
> |            literal
> |          - >
> |            folded
> |
> |     is simply syntactic sugar for,
> |
> |        --- ?unspecified-mapping {
> |               ?unspecified-plain-scalar "plain":
> |                   ?unspecified-sequence [
> |                      ?unspecified-decorated-scalar "single",
> |                      ?unspecified-decorated-scalar "double",
> |                      ?unspecified-decorated-scalar "literal\n",
> |                      ?unspecified-decorated-scalar "folded"
> |                   ]
> |            }
> |
> |     Both of the documents above have exactly the same YAML
> |     Representation.
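
As a rough illustration of the substitution above, the "fill in the missing unspecified tags" half of cooking could be sketched in Python as follows. The Node class and the cook() function are hypothetical names of mine, not part of any YAML library, and a mapping's keys and values are simply flattened into one children list to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str                     # "mapping", "sequence" or "scalar"
    style: str = "plain"          # scalars: plain, single, double, literal, folded
    tag: Optional[str] = None     # None means no tag was written
    children: list = field(default_factory=list)


def cook(node: Node) -> Node:
    """Fill in the unspecified tag implied by kind and scalar style (in place)."""
    if node.tag is None:
        if node.kind == "mapping":
            node.tag = "?unspecified-mapping"
        elif node.kind == "sequence":
            node.tag = "?unspecified-sequence"
        elif node.style == "plain":
            node.tag = "?unspecified-plain-scalar"
        else:
            node.tag = "?unspecified-decorated-scalar"
    for child in node.children:
        cook(child)
    return node


# The example document: one plain key whose value is a sequence of four scalars.
doc = Node("mapping", children=[
    Node("scalar", "plain"),
    Node("sequence", children=[
        Node("scalar", "single"),
        Node("scalar", "double"),
        Node("scalar", "literal"),
        Node("scalar", "folded"),
    ]),
])
cook(doc)
```

Nodes that already carry a written tag are left untouched, which matches the idea that only missing tags get an unspecified form.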
> |
> |   - We open up the tag mechanism !tag to allow any non-space
> |     characters to be used. However, the resulting tag must be
> |     valid according to the requirements of the URI scheme used.
> |     The following characters are marked as 'unwise' in RFC2396,
> |     regardless of the URI scheme:
> |
> |     { } | \ ^ [ ] `
> |
> |     (However, [ and ] are expected to be used in certain URIs in
> |     the future).
> |
> |     These characters will provide an 'escape hatch' for current and
> |     future extensions to YAML.  With this change, any URI can be
> |     directly used as a !tag.  We really can't use {} or [] since they
> |     signify mappings and lists. The \ character is used for escaping,
> |     we use | to signify block scalars, and the backtick looks too much
> |     like the single quote to be useful.  This leaves the ^ delimiter,
> |     which was already used for the older cut^paste mechanism.
> |
> |   - Tags specified in the YAML Repository under the yaml.org tagURI
> |     shall be limited to:
> |
> |         word-char ( '/' | word-char | '#' )*
> |
> |      This allows us to use any of the various non-word ASCII chars to
> |      introduce additional tag processing mechanisms while still allowing
> |      yaml.org tags to contain hierarchy and fragments. Note: This does
> |      not endorse the use of hierarchy and fragments in yaml.org tags; it
> |      just allows their use in the future in case it is discovered to be
> |      necessary.
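
The yaml.org restriction above is small enough to check with a regular expression. A minimal sketch in Python, assuming word-char means [a-zA-Z0-9_] (the definition the proposal gives for handles); the function name is mine:

```python
import re

# word-char ( '/' | word-char | '#' )*, with word-char assumed to be [a-zA-Z0-9_]
YAML_ORG_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_/#]*")


def is_valid_yaml_org_tag(name: str) -> bool:
    """True if name fits the proposed production for yaml.org tags."""
    return YAML_ORG_TAG.fullmatch(name) is not None
```

A name such as 'seq/ordered#v2' would pass, while anything starting with or containing a character like '^' would be rejected and so stays free for future tag processing mechanisms.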
> |
> |   - We introduce a new directive 'TAG' which provides a way to shorten
> |     the data entry of tagURIs.  In particular,
> |
> |       declaration := "%TAG" [ WS handle ] WS taggingEntity ":" spec_first [ WS "(?)" ]
> |
> |     Where 'taggingEntity' refers to the same production in the tagURI
> |     specification and WS is white space. The taggingEntity refers to
> |     either a domain or email address followed by the minting date;
> |     see the tagURI specification for details.  The 'spec_first' refers to
> |     zero or more non-space characters (it is optional).
> |
> |     The 'handle' refers to a sequence of one or more word characters
> |     [a-zA-Z0-9_] or "!".  Optionally the handle can be missing; this
> |     case is called the 'default prefix', in which case the handle is
> |     considered to be the empty string ''.  In a YAML document,
> |     each handle must be unique via string comparison.
> |
> |   - We extend the !tag mechanism to allow a single '!' character,
> |     which is in the reserved characters above. The syntax for this
> |     special case is,
> |
> |        taguri := '!' handle spec_second
> |
> |     In this circumstance, the 'handle' _must_ appear as a handle in one
> |     of the stream's directives.  The 'spec_second' is zero or more
> |     non-space characters, with the restriction that either spec_first or
> |     spec_second (or both) must be at least one character.
> |
> |   - The optional "(?)" on the end of the TAG directive indicates that
> |     tags with matching handles should be "cooked" to be unspecified,
> |     changing the '!' to '?'.
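
Here is one way the two productions above might be read, as a hedged Python sketch rather than a reference implementation. Several things in it are my assumptions: taggingEntity is simplified to "authority,date", the longest declared handle wins when more than one could match (the draft does not say how handle and spec_second are split), and a cooked tag is written as '!' or '?' followed by the full tagURI. The example.com directive and all names are illustrative only.

```python
import re
from typing import NamedTuple


class TagDirective(NamedTuple):
    handle: str        # '' is the default prefix
    entity: str        # taggingEntity, e.g. "yaml.org,2002"
    spec_first: str
    unspecified: bool  # True when the directive ends with "(?)"


# declaration := "%TAG" [ WS handle ] WS taggingEntity ":" spec_first [ WS "(?)" ]
TAG_DIRECTIVE = re.compile(
    r"%TAG"
    r"(?:\s+(?P<handle>!|[A-Za-z0-9_]+))?"
    r"\s+(?P<entity>[^\s,:]+,\d{4}(?:-\d{2}){0,2})"   # simplified taggingEntity
    r":(?P<spec_first>\S*?)"
    r"(?P<unspecified>\s+\(\?\))?"
)


def parse_directive(line: str) -> TagDirective:
    """Parse a single %TAG line into its parts."""
    m = TAG_DIRECTIVE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a %TAG directive: {line!r}")
    return TagDirective(m["handle"] or "", m["entity"], m["spec_first"],
                        m["unspecified"] is not None)


def expand(tag: str, directives: list) -> str:
    """Expand '!' handle spec_second into a cooked tag ('!' or '?' + tagURI)."""
    if not tag.startswith("!"):
        raise ValueError(f"not a '!' tag: {tag!r}")
    rest = tag[1:]
    matches = [d for d in directives if rest.startswith(d.handle)]
    if not matches:
        raise ValueError(f"no TAG directive declares a handle for {tag!r}")
    d = max(matches, key=lambda d: len(d.handle))  # assumption: longest handle wins
    spec_second = rest[len(d.handle):]
    if not (d.spec_first or spec_second):
        raise ValueError(f"{tag!r}: spec_first and spec_second are both empty")
    marker = "?" if d.unspecified else "!"
    return f"{marker}tag:{d.entity}:{d.spec_first}{spec_second}"


directives = [
    parse_directive("%TAG ! yaml.org,2002:"),
    parse_directive("%TAG geo example.com,2004:geo (?)"),
]
```

With these two directives, expand("!!int", directives) yields a specified yaml.org tag, while expand("!geo/point", directives) is cooked to an unspecified '?' tag because of the "(?)" option.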
> |
> |
> |   Tag Resolution:
> |
> |    - Resolution refers to a process after the application has been
> |      provided a valid YAML Representation, and before the application
> |      has loaded this representation into native data structures.
> |
> |    - An application may choose to alter the input document in any way it
> |      sees fit, provided that it only uses information provided in the
> |      YAML Representation model for this transformation.  In particular,
> |      style information, key order, and other presentation or
> |      serialization attributes should not be used to guide the
> |      transformation process.
> |
> |    - In particular, if the application chooses to use types from
> |      the YAML Type Repository, it may choose to use a helper
> |      document transformation which the parser may provide.
> |
> |    - A YAML parser may wish to provide a 'helper' transformation
> |      which fills in unspecified tags, and converts short 'local'
> |      tags which seem to refer to YAML types to their global variety.
> |
> |    - Unspecified tags could be converted as follows:
> |
> |      - ?unspecified-sequence            -> 'tag:yaml.org,2002:seq'
> |      - ?unspecified-mapping             -> 'tag:yaml.org,2002:map'
> |      - ?unspecified-decorated-scalar    -> 'tag:yaml.org,2002:str'
> |
> |    - The 'unspecified-plain' tag, if any still remain, is processed by
> |      the parser against any regular expressions in any YAML types
> |      from the YAML Type Repository it knows about.  This is inherently
> |      a fuzzy process, but a processor should make a good effort to
> |      resolve as many YAML Types as it can.  All remaining
> |      'unspecified-plain' tags are mapped to 'tag:yaml.org,2002:str'
> |
> |    - For "unspecified" global tags created from the "(?)" option on a TAG
> |      declaration, the portion of the tag between 'tag:' and ':' is
> |      transformed to 'yaml.org,{year}'; local tags are transformed
> |      into global tags of the form 'yaml.org,{year}:tagname', and the
> |      "?" is replaced with a "!".
> |
> |    - The YAML processor then may choose to match any remaining local
> |      tags against types it knows about from the YAML Type Repository.
> |      In particular, it could choose to map !int to 'tag:yaml.org,2002:int',
> |      or, if the YAML processor doesn't know about int, it may just pass.
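
A 'helper' transformation along the lines described above might look like the following Python sketch. It covers only the fixed unspecified-tag table, plain-scalar matching, and known local tags; the "(?)" rule for global tags is left out. The regular expressions are a small illustrative subset that I chose, not the actual patterns from the YAML Type Repository, and the names are hypothetical.

```python
import re

# Fixed conversions listed in the proposal.
UNSPECIFIED = {
    "?unspecified-sequence": "tag:yaml.org,2002:seq",
    "?unspecified-mapping": "tag:yaml.org,2002:map",
    "?unspecified-decorated-scalar": "tag:yaml.org,2002:str",
}

# Illustrative subset only; real Type Repository patterns are richer.
PLAIN_PATTERNS = [
    ("tag:yaml.org,2002:null", re.compile(r"~|null")),
    ("tag:yaml.org,2002:bool", re.compile(r"true|false")),
    ("tag:yaml.org,2002:int", re.compile(r"[-+]?[0-9]+")),
    ("tag:yaml.org,2002:float", re.compile(r"[-+]?[0-9]+\.[0-9]*")),
]

# Local tags this hypothetical processor happens to know about.
KNOWN_LOCAL = {
    "!int": "tag:yaml.org,2002:int",
    "!str": "tag:yaml.org,2002:str",
}


def resolve(tag: str, value: str = "") -> str:
    """Map a cooked tag (plus scalar text, if any) to a global tag."""
    if tag in UNSPECIFIED:
        return UNSPECIFIED[tag]
    if tag == "?unspecified-plain-scalar":
        for global_tag, pattern in PLAIN_PATTERNS:
            if pattern.fullmatch(value):
                return global_tag
        return "tag:yaml.org,2002:str"   # everything else falls back to str
    return KNOWN_LOCAL.get(tag, tag)     # unknown tags just pass through
```

Note that resolve() looks only at the tag and the scalar text, which keeps it within the rule that presentation details beyond the cooked tag should not guide the transformation.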
> |
> |
> | Implications:
> |
> |    - Since the parser's results can always have tags filled in, and
> |      deliver content in the exact structure of the 'Node Graph
> |      Representational Model', we do not need to worry about tag
> |      resolution.  No "bleeding" at all!
> |
> |
> |   Many of the ideas from #7 and #9 apply ...
> |
>
> --
> Clark C. Evans                      Prometheus Research, LLC.
>                                     http://www.prometheusresearch.com/
>     o                               office: +1.203.777.2550
>   ~/ ,                              mobile: +1.203.444.0557
>  //
> ((   Prometheus Research: Transforming Data Into Knowledge
>  \\  ,
>    \/    - Research Exchange Database
>    /\    - Survey & Assessment Technologies
>    ` \   - Software Tools for Researchers
>     ~ *
>
