From: Clark C . E. <cc...@cl...> - 2001-05-18 16:29:48
+-------------------------------------------------------------------------+
| Welcome to YAML (tm) -- WORKING DRAFT 0.19a                             |
+-------------------------------------------------------------------------+
YAML (tm) is a straightforward data serialization language, offering an
alternative to XML where markup (named lists and mixed content) is not
needed. YAML borrows ideas from rfc822, SAX, C, HTML, Perl, and Python.

  * YAML texts are brief and readable.
  * YAML uses your language's native data structures.
  * YAML has a simple stream-based interface.
  * YAML has a solid information model.
  * YAML is expressive and extensible.
  * YAML is easy to implement.

YAML is a collaboration between Brian Ingerson (author of Data::Denter),
Clark Evans, Oren Ben-Kiki, Sjoerd Visscher, and other members of the
SML-DEV mailing list. YAML explicitly targets the object serialization
needs of the Python and Perl communities. Implementations will be on
their way within the next two weeks.
+-------------------------------------------------------------------------+
| News                                                                    |
+-------------------------------------------------------------------------+
  * 17-MAY-2001: YAML now has a mailing list at SourceForge.
  * 18-MAY-2001: YAML has had its first meeting. The minutes have
    been sent out to the mailing list, and should be appearing in the
    archives soon.

+-------------------------------------------------------------------------+
| Key Concepts                                                            |
+-------------------------------------------------------------------------+
YAML is founded on several key concepts from very successful languages.

  * YAML uses a type structure similar to Perl's. In YAML, there are
    three fundamental structures: scalars ($), maps (%), and lists
    (@). YAML also supports references to enable the serialization of
    graphs.
    Furthermore, each data value can be associated with a class
    name to allow the use of specific data types.
  * YAML uses block scoping similar to Python's. In YAML, the extent of
    a node is indicated by the nesting level of its children, i.e., the
    column they start in. Block indenting makes the document's
    structure easy to inspect, which helps to identify scope errors.
  * YAML uses whitespace handling similar to HTML's. In YAML, sequences
    of spaces, tabs, new lines, and carriage returns are folded into a
    single space during parsing. This wonderful technique makes markup
    code readable by enabling indentation and word-wrapping without
    affecting the canonical form of the content.
  * YAML uses backslash-style escape sequences similar to C's. In YAML,
    \n is used to represent a new line, \t is used to represent a tab,
    and \\ is used to represent the backslash itself. In addition,
    since whitespace is folded, YAML uses bash-style "\ " to escape
    additional spaces that are part of the content and should not be
    folded. Lastly, a trailing \ is used as a continuation marker,
    allowing content to be broken into multiple lines without
    introducing unwanted whitespace.
  * YAML allows for an rfc822-compatible header area for comments,
    specific processing instructions, and encoding declarations. This
    provides a flexible and forward-looking method to augment the YAML
    parser with other features such as a validator similar to TREX or
    RELAX. Furthermore, this will allow a mail processing system to
    directly use YAML as its input parser.
  * YAML supports binary and formatted text entities with MIME
    multi-part attachments. Each attachment is given a reference
    identifier which can be associated with a location in hierarchical
    YAML content. This allows leaf values which would disrupt the
    in-line structural flow to be handled out of band in a separate
    block mechanism.
  * YAML has a SAX-like sequential "C" API. This C library can be
    used to easily construct native-language representations of a YAML
    serialization. The API also showcases a clever substitutability
    technique which allows schema changes to occur at the leaf nodes in
    a backwards-compatible manner without breaking older code. This
    brings resilience to older code, while allowing the structure of
    your data to grow over time.

+-------------------------------------------------------------------------+
| Example: Basic                                                          |
+-------------------------------------------------------------------------+
Below is an example of an invoice expressed via YAML. Each value's type
is indicated by a percent sign (map), an at sign (list), or an optional
dollar sign (scalar). The content for each value follows the indicator,
either on the same line for scalars or on subsequent indented lines.
The content for a map, which is also the starting production, is a list
of key/value pairs. Each key and value are separated by a colon.

    buyer : %
        address : %
            city : Royal Oak
            line one : 458 Wittigen's Way
            line two : Suite #292
            postal : 48046
            state : MI
        family name : Dumars
        given name : Chris
    date : 12-JAN-2001
    comments :
        Mr. Dumars is frequently gone in the morning
        so it is best advised to try things in late
        afternoon. \nIf Joe isn't around, try his house\
        keeper, Nancy Billsmer @ (734) 338-4338.\n
    delivery : %
        method : UZS Express Overnight
        price : 45.50
    invoice : 00034843
    product : @
        %
            desc : Grade A, Leather Hide Basketball
            id : BL394D
            price : 450.00
            quantity : 4
        %
            desc : Super Hoop (tm)
            id : BL4438H
            price : 2,392.00
            quantity : 1
    tax : 0.00
    total : 4237.50

Since "product" is a list, it only has values and thus is missing the
key and colon. Also notice that the "comments" scalar is on multiple
lines.
Since whitespace is folded, the new line (\n) is escaped and the
line-ending \ is required to keep housekeeper as a single word.
By default, the serializer will sort map keys, although this isn't a
requirement of the serialization structure.
+-------------------------------------------------------------------------+
| Example: References and Class Names                                     |
+-------------------------------------------------------------------------+
Below is an example of a YAML document which demonstrates the use of
references and classes. Immediately after an indicator a class name can
occur, and then within parentheses an optional reference handle. If the
indicator is a "*", then no further content is allowed, as this
indicator signifies a reference to another value. The class name may be
used as a language-specific binding to a particular object or
type-appropriate class; otherwise it can be considered a comment. The
production for allowable names and a namespace mechanism have yet to be
worked out.

    buyer : %person
        comments :
            This is a person object accessible
            through the "buyer" key from the
            top-level map.
        family name : Dumars
        given name : Chris
    inline : $(0001)
        This is a folded text entity
        that is associated with a
        reference so that it can be
        re-used later on.
    seller : %person(0002)
        comments:
            This is another person object, only
            that it is given a handle of 0002 as
            well as a class so that it can be
            referred to later. Handles must be
            numeric, and classes cannot start
            with a number.
        family name : Sellers
        given name : Peter
    zzz : %
        comments:
            The first two items in this map are references.
            The first is to the person object "Peter Sellers".
            The second is to the inline text object "This is..."
            The price scalar below is given a class "currency".
        peter : *(0002)
        price : $currency
            23.34
        text : *(0001)
+-------------------------------------------------------------------------+
| Example: Block References and Attachments                               |
+-------------------------------------------------------------------------+
Below is an example of a YAML document which includes the optional
rfc822-style header, specifically an rfc2046 multipart header. A YAML
parser must handle these headers to allow for application-specific
processing instructions, and MIME for raw/binary references.

    Date: Sun, 13 May 2001 23:48:04 -0400
    MIME-Version: 1.0
    Content-Type: multipart/related;
        boundary="================================"
    X-YAML-Version: 1.0

    --================================
    Content-Type: text/plain; id="0001"

    XX XXX XXXXX XX XX
    XXX XX X XX X XX
    XX XXXXXX XX X XX
    XX X XX XXXXXXX
    XXXX XXXXX X XX XX

    --================================
    Content-Type: image/gif; id="0002"
    Content-Transfer-Encoding: base64

    DlhGQAAOMAAAICBDaanAJSVAISFP7+/GbOzAJmZAIeHGbMzGbMzGbMzGbMzGbMzGbMzGbM
    CH+Dk1ZGUgd2l0aCBHSU1QACH5BAEKAAYALAAAAAAZAA8AQAR70EgZArlBWHw7Nts1gB6R
    BMlkp4lHJppkNoyW1r5SmcTeV6wUwrFI4VEulSMyRLchhYrYLq4MDKYrm9XuFQuIzLhALA
    +g44FBHybokQGdnivNfhJ8enwFSR12eB4jcWZ3gHeCJQJycXSJEzaIc5SIWz0RADs=

    --================================
    Content-Type: text/x-yaml; id="0003"

    an inline : $(0004)
        This is a folded text entity
        that is associated with a
        reference.
    content : %
        comment:
            The cyclic item is a reference
            to the top-level map.
        cyclic : *(0003)
        image : *(0002)
        inline : *(0004)
        raw : *(0001)
        title : This contains multiple references

+-------------------------------------------------------------------------+
| Information Model                                                       |
+-------------------------------------------------------------------------+
The model uses the map/list/scalar data structures found in modern
programming languages such as ML, Python, Perl, and C. This model should
also be very compatible with relational database tables. Note: This
model lacks classes and references, which are still under consideration.

Document  The starting production for YAML is a Map.
Map       An unordered sequence of zero or more (Key, Value) tuples,
          where the Key is unique within the sequence and matches the
          Key production.
Value     Exactly one of Scalar, Map, or List.
List      An ordered sequence of zero or more Values.
Scalar    Any type directly serializable through or able to be
          constructed from a sequence of zero or more characters. These
          characters must match the Char production.
-----------------------------------------------------------------------
Default   This is a synthesized attribute of every Value. If the Value
          is a Scalar, then the Default property refers to the Value
          itself. If the Value is a List, then the Default refers to
          the Default property of the first Value in its sequence. If
          the Value is a Map, then Default refers to the Default
          property of the Value in its Pair entry lacking a Key. By
          using Default, a Scalar Value can be substituted with a Map
          or a List Value without breaking older code.

Take careful note that the information model does not admit a "parent"
property of each value. Quite the contrary, YAML may be a graph
structure and is not necessarily a tree.
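
The Default rule above can be sketched in Python. This is only a sketch
under assumptions, not part of the draft: values are plain str, list, and
dict objects; the pair "lacking a Key" is stored under the empty string
(NO_KEY is a made-up name); and an empty List, or a Map without such a
pair, yields None, a case the draft does not define. Since YAML may be a
graph, visited nodes are tracked so that a cycle cannot loop forever.

```python
NO_KEY = ""  # assumption: how a map stores the pair that lacks a Key


def default(value, _seen=None):
    """Return the Default of a value made of str, list, and dict objects."""
    seen = set() if _seen is None else _seen
    if isinstance(value, str):
        return value                  # a Scalar is its own Default
    if id(value) in seen:
        return None                   # cycle: no scalar can be reached
    seen.add(id(value))
    if isinstance(value, list):
        # Default of the first Value in the sequence (None if empty)
        return default(value[0], seen) if value else None
    if isinstance(value, dict):
        # Default of the Value in the pair lacking a Key (None if absent)
        return default(value[NO_KEY], seen) if NO_KEY in value else None
    raise TypeError(f"not a YAML value: {value!r}")


# Older code reads the price as a scalar; newer data wraps it in a map.
old_price = "45.50"
new_price = {NO_KEY: "45.50", "currency": "USD"}
print(default(old_price), default(new_price))
```

Older code that expects a scalar keeps working as long as it calls
default() on whatever it reads. The "currency" key is invented for the
example.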
+-------------------------------------------------------------------------+
| Mapping To Popular Environments                                         |
+-------------------------------------------------------------------------+
For Python, the internal representation starts with a top-level
Dictionary, and from there each value, depending upon its indicator,
can be a List, a Dictionary, or a String. It is possible for a schema
mechanism to be included which allows for more specific decoding into
classes and types. The default attribute is implemented through a
stand-alone function.

For Perl, the internal representation starts with a top-level hash, and
from there each value, depending upon its indicator, can be a list, a
hash, or a string scalar. Of course, it is also possible for a schema
mechanism to be included which allows for more specific decoding. The
default attribute is implemented through a stand-alone function.

Haven't done Java or Javascript since '98, but I remember Strings, Maps
and Lists being Objects. So there shouldn't be any problem in Java.
Javascript is probably in the same boat but I can't verify since that
book has mysteriously disappeared as well.

ML, C, and C++, all of which lack a built-in, variable-type Map and
List structure, require a specific schema to build an internal
representation. For these languages, a YamlValue type could be created
with sub-types of Scalar, List, and Map. For C++, STL could make the
implementation very quick, especially with iterator support. An
alternative approach would be a class builder... but this, of course,
requires a bit more smarts and a schema system.

Mapping to a relational database will also require some sort of schema
to indicate how to pack/unpack. However, given that a tuple (record) is
easily associated with a Map, and a relation (table) is easily
associated with a List, there should not be that much difficulty.
NULL values will be represented by the lack of a particular map
entry.
+-------------------------------------------------------------------------+
| Serialization Format / BNF                                              |
+-------------------------------------------------------------------------+
This section contains the BNF productions for the YAML syntax. Much to
do...
+-------------------------------------------------------------------------+
| Parser Behavior                                                         |
+-------------------------------------------------------------------------+
This section describes how a parser should parse YAML. Much to do...
+-------------------------------------------------------------------------+
| Emitter Behavior / Canonical Form                                       |
+-------------------------------------------------------------------------+
This section describes how an emitter should write YAML into canonical
form. It includes a specific word-wrapping algorithm: a minimal content
length of 20 characters, and the emitter does its best to word-wrap at
76 columns.
+-------------------------------------------------------------------------+
| Implementations                                                         |
+-------------------------------------------------------------------------+
To do... an implementation in C, C++/STL, Python, Java, and ...
+-------------------------------------------------------------------------+
| Credits                                                                 |
+-------------------------------------------------------------------------+
This work is the result of long, thoughtful discussions on the SML-DEV
mailing list. Specific contributors include... (to do)
+-------------------------------------------------------------------------+
| Some thoughts                                                           |
+-------------------------------------------------------------------------+
 1. These are very preliminary thoughts on the subject; feedback is
    very welcome.
 2. Implementations needed... Clark is happy to write the Python, C,
    and perhaps even a C++ implementation. Any takers?
 3. Was thinking hard about using # for a comment indicator, or perhaps
    as a numeric indicator. Benefits? In any case, the BNF should leave
    all of these special characters open to future versions.

+-------------------------------------------------------------------------+
| FAQ                                                                     |
+-------------------------------------------------------------------------+
 1. Don't the indicator characters need to be escaped in the content?
    Answer: No.

+-------------------------------------------------------------------------+
| Specific Productions                                                    |
+-------------------------------------------------------------------------+
Char       ::= #x9 | #xA | #xD            Any unicode character,
               | [#x20-#xD7FF]            excluding surrogate blocks,
               | [#xE000-#xFFFD]          FFFE, and FFFF, where
               | [#x10000-#x10FFFF]       unicode is defined by
                                          ISO/IEC 10646-2000.
Characters ::= Char*                      Zero or more characters.
WhiteChar  ::= #x20 | #x9 | #xD | #xA     A space, tab, new line or
                                          carriage return, escaped by
                                          \s, \t, \n, and \r
                                          respectively.
Whitespace ::= WhiteChar+                 Any sequence of spaces, tabs,
                                          new lines or carriage
                                          returns.
Indicator  ::= '$' | '%' | '@' | '*'      The dollar sign indicates a
                                          scalar, a percent sign
                                          indicates a map, an at sign
                                          indicates a list, and a star
                                          represents a reference.
Reserved   ::= WhiteChar | Indicator      Printable, non-alpha,
               | [#x21-#x2C] | '/'        non-numeric ASCII characters
               | [#x3B-#x40]              excluding the period, colon,
               | [#x5B-#x5E] | #x60       underscore, and dash.
               | [#x7B-#x7F]
Key        ::= (Char - Reserved)*         One or more non-reserved
                                          characters.
+-------------------------------------------------------------------------+
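
Whitespace folding and the escapes listed under WhiteChar can be sketched
in Python as follows. Several points are assumptions that this draft does
not settle: a trailing \ also swallows the indentation of the next line
(which is what keeps "housekeeper" whole in the basic example), an unknown
escape is treated as an error, and leading or trailing whitespace is
folded rather than trimmed. The names unfold, ESCAPES, and TOKEN are made
up for illustration.

```python
import re

# Escapes from the WhiteChar production, plus "\\" and the bash-style
# "\ " described under Key Concepts.
ESCAPES = {"s": " ", " ": " ", "t": "\t", "n": "\n", "r": "\r", "\\": "\\"}

# One token is either a trailing "\" with its line break (and, by
# assumption, the next line's indentation), a one-character escape,
# or a run of whitespace.
TOKEN = re.compile(r"\\\r?\n[ \t]*|\\(.)|[ \t\r\n]+")


def unfold(raw):
    """Fold whitespace and resolve escapes in raw scalar text."""
    def replace(match):
        token = match.group(0)
        if not token.startswith("\\"):
            return " "                # whitespace run folds to one space
        if match.group(1) is None:
            return ""                 # continuation marker joins the lines
        try:
            return ESCAPES[match.group(1)]
        except KeyError:
            raise ValueError(f"unknown escape: {token!r}") from None
    return TOKEN.sub(replace, raw)


# Text shaped like the "comments" scalar of the basic example.
raw = "try his house\\\n    keeper, Nancy Billsmer\n    @ (734) 338-4338.\\n"
print(repr(unfold(raw)))
```

A single pass over the tokens keeps escaped characters out of the folding
step, so an escaped space or new line survives while ordinary indentation
and line breaks collapse.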