From: Clark C. E. <cc...@cl...> - 2004-09-04 00:13:20
Thanks Oren.  A few exceptions/notes:

  - !tag values are now open to any non-space character, excepting
    the hat (^), which has a specific meaning, the backslash (\), and
    the bar (|); tags beginning with an exclamation (!) have specific
    meaning.

  - We do not, in fact, require that a !tag use any particular URI
    scheme, nor do we enforce compliance with any scheme; we do,
    however, enforce compliance of the tagURI within a %TAG directive.

  - Tags which "look like URIs" are defined as those starting with a
    scheme name, followed by a colon, followed by something that is
    not a colon.  This is reasonable since neither URLs nor URNs
    (RFC 2141) have two adjacent colons.  Things that "look like
    URIs" are not subject to globalization with a 'default' %TAG.

Cheers!

Clark

On Sat, Sep 04, 2004 at 12:02:20AM +0300, Oren Ben-Kiki wrote:
| summary: |
|   This is the eighth-pass draft, based on the sixth-pass. This pass
|   primarily incorporates the "!!" concept to solve the problem raised
|   by Onoma:
|
|     - !!int is cooked as tag:yaml.org,2002:int
|
|   In addition, this draft assumes "YAML 1.1" (see Clark's post).
|   In a nutshell, this means:
|
|     - YAML version bumped to 1.1 to reflect incompatible changes.
|     - Directives per stream instead of per document.
|     - One directive per line (with optional trailing comment)
|     - Use spaces in directives.
|
|   # note: None of this has been approved by Brian yet. Also, the
|   #       YAML 1.1 notion has not received any feedback yet. It isn't
|   #       crucial for this proposal, though.
|
|   One method of tag globalization is to use 'private' tags in your YAML
|   document, and use a transformation of sorts (either explicit, or
|   implicit by the application) to convert one's tags to a globally
|   unique variety. This method is perfect for small teams where
|   interoperability isn't a huge problem, and who do not wish to pay the
|   price of mixing and matching globalized tags.
|
|   The other method is an XML-namespace-like mechanism where a tagURI
|   can be broken into chunks: the first (longer) half of the tag,
|   containing the taggingEntity, is moved up into the declaration and
|   given a handle. The second (shorter) half is then used within each
|   tag together with the handle that links it to the longer half.
|   The combining of the parts is done by the parser, so the application
|   always sees full tagURIs.
|
|   It is important to keep in mind that documents written using 'private'
|   tags may later require a tagURI namespace, in order to participate
|   in a "mix and match" type of document. It is therefore necessary to
|   be able to easily attach a namespace to a document that wasn't
|   written with namespaces in mind, by the simple action of adding a
|   directive to the document rather than by going through the document
|   line by line.
|
|   Finally, there exist a number of "common" tags that are useful in
|   most applications. Such tags should be easy to integrate with both
|   "private" and tagURI-based tags, without forcing the document to
|   carry additional "noisy" directives. When a document is migrated
|   from using 'private' tags to using a namespace, the "common" tags
|   must be unaffected.
|
|   This proposal provides all the above features so that the first class
|   of people, who do not require globally unique tags, need not be
|   burdened by them.
|
| syntax: |
|   - We open up the tag mechanism !tag to allow any non-space
|     characters to be used. However, the resulting tag must be
|     valid according to the requirements of the URI scheme used.
|
|     The following characters are marked as 'unwise' in RFC2396,
|     regardless of the URI scheme:
|
|       { } | \ ^ [ ] `
|
|     (However, [ and ] are expected to be used in certain URIs in
|     the future).
|
|     These characters will provide an 'escape hatch' for current and
|     future extensions to YAML. With this change, any URI can be
|     directly used as a !tag.
|     We really can't use {} or [] since they signify mappings and
|     lists. The \ character is used for escaping, we use | to
|     signify block, and the backtick looks too much like the single
|     quote to be useful. This leaves the ^ delimiter, which was
|     already used for the older cut^paste mechanism.
|
|   - We introduce a new directive 'tag' which provides a way
|     to shorten the data entry of tagURIs. In particular,
|
|       declaration := "%TAG" WS taggingEntity ":" spec_first [ WS handle ]
|
|     Where 'taggingEntity' refers to the same production in the tagURI
|     specification and WS is white space. The taggingEntity refers to
|     either a domain or email address followed by the minting date;
|     see tagURI specification for details. The 'spec_first' refers to zero
|     or more non-space characters (it is optional).
|
|     The 'handle' refers to a sequence of one or more word characters
|     [a-zA-Z0-9_]. Optionally the '^' and handle can be missing; this
|     case is called the 'default prefix' and the handle is considered
|     to be the empty string ''. In a YAML document, each handle
|     must be unique via string comparison.
|
|   - We extend the !tag mechanism to allow a single '^' character,
|     which is in the reserved characters above. The syntax for this
|     special case is,
|
|       taguri := '!' handle '^' spec_second
|
|     In this circumstance, the 'handle' _must_ appear as a handle in one
|     of the stream's directives. The 'spec_second' is zero or more
|     non-space characters, with the restriction that either spec_first or
|     spec_second (or both) must be at least one character.
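The two productions above can be sketched in Python roughly as follows.
This is only an illustration, not part of the proposal: the names
TagDirective, parse_tag_directive and split_handle_tag are made up for
this sketch, and the taggingEntity pattern is a deliberately loose
assumption (an authority without ':' or ',', a comma, and a date); the
real tagURI production is stricter.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TagDirective(NamedTuple):
    tagging_entity: str  # e.g. "bar.com,2004"
    spec_first: str      # may be empty
    handle: str          # '' means this is the default prefix


# declaration := "%TAG" WS taggingEntity ":" spec_first [ WS handle ]
# Assumption: taggingEntity is approximated as authority "," date.
_DIRECTIVE = re.compile(
    r"^%TAG[ \t]+"
    r"(?P<entity>[^\s:,]+,\d{4}(?:-\d{2}(?:-\d{2})?)?)"
    r":(?P<spec_first>\S*)"
    r"(?:[ \t]+(?P<handle>[A-Za-z0-9_]+))?"
    r"[ \t]*(?:#.*)?$"  # optional trailing comment
)

# taguri := '!' handle '^' spec_second
_HANDLE_TAG = re.compile(r"^!(?P<handle>[A-Za-z0-9_]+)\^(?P<spec_second>\S*)$")


def parse_tag_directive(line: str) -> TagDirective:
    """Parse one %TAG line; raise ValueError if it does not fit the grammar."""
    m = _DIRECTIVE.match(line)
    if m is None:
        raise ValueError(f"not a valid %TAG directive: {line!r}")
    return TagDirective(m["entity"], m["spec_first"], m["handle"] or "")


def split_handle_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Return (handle, spec_second) for a '!handle^rest' tag, else None."""
    m = _HANDLE_TAG.match(tag)
    return (m["handle"], m["spec_second"]) if m else None
```

With the directives from the example document, parsing
"%TAG bar.com,2004:timesheet/ meet" gives the handle "meet", while
"%TAG foo.com,2004:shape/ # default" gives the empty handle, i.e. the
default prefix. The uniqueness check on handles is left out here.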
|
| semantics: |
|   - For every special tag having a '^', the parser will do special
|     cooking to join the information specified in the declaration
|     together with the node's tag; such nodes will be treated as if
|     they had been tagged,
|
|       cooked := "!tag:" taggingEntity ":" spec_first spec_second
|
|     Note that the 'handle' is not included in this information; it is
|     considered a detail of the Presentation model, and should not occur
|     in tools that comply with the Serialization or Representation
|     models. Thus, the 'handle' is _not_ part of the core YAML
|     information model; it is merely a syntax-level trick to ease the
|     burden of typing and human reading.
|
|     Also note that while other URI schemes may appear in a tag, this
|     cooking mechanism purposefully constructs tagURIs; that is, globally
|     unique identifiers lacking protocol or access semantics.
|
|   - Tags not containing '^' which "look like a URI" are considered to be
|     URIs and passed through as-is. Therefore 'tag:' and 'http:' URIs
|     are unaffected by default prefixing. For this purpose, a tag "looks
|     like a URI" if it starts with a scheme (regexp: [a-zA-Z0-9,+\-]+)
|     followed by a single ':', a non-':', non-space character, and an
|     arbitrary suffix.
|
|   - Tags that start with '!' are considered to belong to yaml.org. Thus
|     "!!foo" is interpreted as if it has been written
|     "!tag:yaml.org,2002:foo".
|
|   - If the document has a default prefix (a directive with an empty
|     handle), then all other tags are cooked according to the '^' rule
|     above, using the taggingEntity and the spec_first from the directive
|     with the empty handle.
|
|   - If the document has no default prefix (a directive with an empty
|     handle), then all other tags are passed-through uncooked.
|
| design: |
|   - We are using the directive syntax, because it gives a clear
|     indication that 'magic' is about to happen. Also, it localizes all
|     of the declarations up-front.
|     By using a directive, we set the precedent that other directive
|     mechanisms may be added for other 'magical' needs if they show as
|     much rationale as this one. This also allows us to easily identify
|     which documents depend upon this magic.
|
|   - This change makes private, uncooked tags the default, removing
|     a ton of 'magic' from the average use cases; this should make
|     YAML easier to grok and configure.
|
|   - The "^" character was chosen because it is not included
|     in RFC2396's uric production (aka taguri's specific), and
|     it doesn't look like any of our other indicators. This
|     character _was_ used for the previous cut^paste mechanism,
|     but that mechanism is deprecated.
|
|   - We use tagURI specification (http://taguri.org) to define the
|     unique URIs. This follows previous versions of the YAML spec.
|     The tagURI is used because it does not imply access semantics
|     and defines an easily 'mint-able' unique identifier.
|
|   - We purposefully named the directive TAG since it corresponds to the
|     tagURI. If at a later date and time, we decide on another mechanism,
|     say one based on HTTP schema access, we can add this directive
|     independently, and, if appropriate, phase out this directive.
|
|   - We created a special shorthand for yaml.org tags to allow them to be
|     used freely in directive-less 'private' documents and survive a
|     migration to a default-prefix 'globalized' document unharmed.
|
| compatibility: |
|   - This proposal is introduced as part of the YAML 1.1 set of changes.
|     Documents explicitly marked as YAML:1.0 must be parsed according
|     to the old rules.
|
|   - Documents lacking a %YAML directive should be assumed to be
|     YAML 1.1. However, the processor should make a reasonable
|     attempt to identify that a YAML 1.0 document is being parsed.
|     When this discovery is made "too late", the processor should
|     emit an error and abort parsing.
|
|   - This proposal reverses the meaning of the "!!" notation.
|     In the old specification, this notation was used for private
|     tags. In this specification, it is used for yaml.org tags.
|
|   - This proposal uses the same ^ character as the older cut^paste
|     mechanism. This older syntax trick is not compatible with this
|     proposal, and is deprecated. During the transition period, we
|     recommend parsers keep the old cut^paste logic, with an
|     appropriate warning, unless there is a %TAG directive; in this
|     case, the usage above is implied.
|
|   - The magical cooking rules in the core specification are also
|     deprecated with this specification. Since the current version
|     of the specification does not allow tags "looking like a URI" and
|     the %TAG parameter, either of these can be used to identify newer
|     style YAML documents. When read with these semantics, all old-style
|     tags become private types, as they were literally typed.
|
|     It is recommended that an exception be thrown when a tag is
|     not found, and that a command line option be given to provide a
|     set of type handlers that meets the requirements of the old
|     resolution mechanism for "YAML Tags".
|
|     PyYAML must switch to using "Class" for its private tags, and
|     thus start to serialize as !Class. I'm not sure how to handle
|     Syck; this is a big discussion.
|
| example: |
|   The following document,
|
|     %TAG bar.com,2004:timesheet/ meet
|     %TAG foo.com,2004:shape/ # default
|     --- !tag:baz.com,2004:mixed/list
|     - event: !meet^meeting
|         where: office
|         time: 2004-09-09 10:00:00
|         duration: !!int 1:00
|         text: boring
|       shape: !ellipse
|         width: !!float 10
|         height: 5
|     - event: !meet^meeting
|         where: office
|         time: 2004-09-09 10:00:00
|         duration: !!int 1:00
|         text: boring
|       shape: !rectangle
|         width: !!float 10
|         height: 5
|     ...
|
|   would differ in the Presentation Model, but would be identical in the
|   Serialization and Representation model with,
|
|     --- !tag:baz.com,2004:mixed/list
|     - event: !tag:bar.com,2004:timesheet/meeting
|         where: office
|         time: 2004-09-09 10:00:00
|         duration: !tag:yaml.org,2002:int 1:00
|         text: boring
|       shape: !tag:foo.com,2004:shape/ellipse
|         width: !tag:yaml.org,2002:float 10
|         height: 5
|     - event: !tag:bar.com,2004:timesheet/meeting
|         where: office
|         time: 2004-09-09 10:00:00
|         duration: !tag:yaml.org,2002:int 1:00
|         text: boring
|       shape: !tag:foo.com,2004:shape/rectangle
|         width: !tag:yaml.org,2002:float 10
|         height: 5
|     ...
|
| Migration: |
|   ---
|   # old tag                 # Should become...
|   - !int 23                 # Must change to !!int
|   - !!old-private           # Must change to !old-private
|   - !perl/Foo::Bar          # OK, !perl/Foo::Bar, "perl/" can go.
|   - !python/tuple           # OK, !python/tuple, "python/" can go.
|   - !htsql.org,2004/request # OK, htsql.org,2004/request, should be %TAG.
|   ...
|
| "Private": |
|   ---
|   - !foo                 # foo (private)
|   - !!int 10             # tag:yaml.org,2002:int
|   - !http://foo.com/bar  # http://foo.com/bar
|   ...
|
| "Globalized": |
|   %TAG d9...@ho...,2004-09-02:
|   ---
|   - !foo                 # tag:d9...@ho...,2004-09-02:foo
|   - !!int 10             # tag:yaml.org,2002:int
|   - !http://foo.com/bar  # http://foo.com/bar
|   ...
|
|   %TAG d9...@ho...,2004-09-02: me
|   %TAG bar.com,2004-09-02: them
|   ---
|   - !foo                 # foo (private)
|   - !me:foo              # tag:d9...@ho...,2004-09-02:foo
|   - !them:baz            # tag:bar.com,2004-09-02:baz
|   - !!int 23             # tag:yaml.org,2002:int
|   ...
|
|   %TAG d9...@ho...,2004-09-02: # default
|   %TAG bar.com,2004-09-02:them
|   ---
|   - !foo                 # tag:d9...@ho...,2004-09-02:foo
|   - !them:baz            # tag:bar.com,2004-09-02:baz
|   - !!int 23             # tag:yaml.org,2002:int
|   ---
|   - !foo                 # tag:d9...@ho...,2004-09-02:foo
|   ...
|
| Have fun,
|
|     Oren Ben-Kiki

--
Clark C. Evans                  Prometheus Research, LLC.
                                http://www.prometheusresearch.com/
    o                           office: +1.203.777.2550
  ~/ ,                          mobile: +1.203.444.0557
 //
((   Prometheus Research: Transforming Data Into Knowledge
 \\  ,
   \/    - Research Exchange Database
   /\    - Survey & Assessment Technologies
   ` \   - Software Tools for Researchers
    ~ *
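
For readers who want to experiment, the cooking rules under "semantics"
in the quoted proposal can be sketched in Python along these lines. This
is a rough illustration under stated assumptions, not a reference
implementation: cook_tag is a hypothetical name, the directives are
passed in as a plain dict from handle to (taggingEntity, spec_first)
with '' as the default-prefix handle, and the result is returned without
the leading '!', i.e. as the application would see it. The order in
which the rules are tried follows the order of the bullets, which the
proposal does not state as normative.

```python
import re

YAML_PREFIX = "tag:yaml.org,2002:"

# Scheme regexp as given in the proposal, then a single ':' followed by a
# character that is neither ':' nor whitespace.
_LOOKS_LIKE_URI = re.compile(r"^[a-zA-Z0-9,+\-]+:[^:\s]")


def cook_tag(tag, directives):
    """Resolve a '!...' tag against a {handle: (entity, spec_first)} dict."""
    if not tag.startswith("!"):
        raise ValueError(f"not a tag: {tag!r}")
    body = tag[1:]

    # Rule 1: '!handle^spec_second' joins with the matching directive.
    if "^" in body:
        handle, spec_second = body.split("^", 1)
        if handle not in directives:
            raise ValueError(f"undeclared handle: {handle!r}")
        entity, spec_first = directives[handle]
        if not (spec_first or spec_second):
            raise ValueError("spec_first and spec_second are both empty")
        return f"tag:{entity}:{spec_first}{spec_second}"

    # Rule 2: things that look like URIs pass through as-is.
    if _LOOKS_LIKE_URI.match(body):
        return body

    # Rule 3: '!!foo' belongs to yaml.org.
    if body.startswith("!"):
        return YAML_PREFIX + body[1:]

    # Rule 4: remaining tags use the default prefix, if one is declared.
    if "" in directives:
        entity, spec_first = directives[""]
        return f"tag:{entity}:{spec_first}{body}"

    # Rule 5: otherwise the tag stays private and uncooked.
    return body
```

Using the two directives from the example document, "!meet^meeting"
comes out as tag:bar.com,2004:timesheet/meeting and "!ellipse" as
tag:foo.com,2004:shape/ellipse, while "!foo" with no directives at all
stays the private tag foo.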