Thread: [Yaml-core] Loader details in the spec

I think we need to articulate in the spec the multi-level loader details
we've been talking about, both to give a reference to language-module
implementers and the libyaml maintainer, and also to show users more
specifically what YAML supports.  We need to give examples all the
way down to the native types in one or two of our reference platforms
(I'm using Python below).  If language-specific information in the spec
offends people, we should at least give hints about recommended default
types.

I also think we need to decouple the vocabulary for a user-level
loading/dumping tool from "Parser/Loader/Emitter/Dumper" in the
diagram.  Those four actually convert from one model to another, but
a user-level tool does something different: it exports/imports a
native representation *within* a level.  Since the user doesn't care
what YAML calls its model-thunking routines, I'll use the terms
"load" and "dump" for the user-level tools and let others worry about
what to call the inter-model converters.

In this scenario, "native model" truly disappears because "native" is a
concept orthogonal to the models.

With this in mind, here's a starting point, based on Clark's 4-model
proposal with a level 0 added.  Obviously the models and specifics are
in flux, but we can get the outline down now and update the specifics
later.  (Oh dear, I'm adding pages...)

=====
YAML has five models for representing data, arranged linearly
from lowest (closest to the I/O stream) to highest (most abstract).
Sometimes these models are called "levels".  Any model may have a
user-accessible loader tool that exports the data from that model into a
language-specific native data structure the application can operate on.
Likewise, the model may have a dumper tool that does the opposite.  We
say *may* because the language implementation (e.g., PyYAML) is not
required to support a loader/dumper for every level, but if it does,
the loader/dumper should support the following features at minimum.
A loader may be a method or function you call that returns a native
data structure, but exactly how you invoke it is up to the language
module.

Although each model has its own internal data structure (not normally 
accessible to the user), YAML has only one persistent structure, the
"stream" contained in a .yml file.  So when the application requests
data at a particular level, YAML must go *through* the lower models
to read it from the stream.

MODEL: Sub-syntax
LEVEL: 0
LOADER READS: input stream
LOADER RETURNS: a string containing the entire document
TYPICAL USES: not many
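
A minimal Python sketch of what a level-0 loader could look like.  The
name load_level0 is hypothetical; nothing here is an existing PyYAML
call.  It just hands back the raw text of the stream unchanged.

```python
import io

def load_level0(stream):
    """Return everything readable from the stream as a single string."""
    return stream.read()

# Any file-like object works; StringIO stands in for an open .yml file.
text = load_level0(io.StringIO("name: Mike\nlangs: [en, eo]\n"))
```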

MODEL: Syntax
LEVEL: 1
LOADER READS: sub-syntax model
LOADER RETURNS: 
 - each scalar as a string.
 - each [] collection as a list of pairs (0-based index, value)
 - each {} collection as a list of pairs (key, value)
 - key order and duplicate keys are preserved for collections
 - no information about type family
 - anchors/aliases are not resolved, they are just strings with "&" and
   "*" prefixes.
 - comments?  Should there be an intermediate level that returns
   comments?  But how will the application distinguish between a
   scalar and a comment if both are strings?  Counting on the "#" prefix
   would be unreliable.
TYPICAL USES: 
  - most of the serial-model uses can also be done here
TYPE INFORMATION:
  - the alternate type-family loader mentioned in the serial model
    could also operate here
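
To make the list above concrete, here is roughly what a level-1 loader
might hand back in Python.  This is only a sketch: the variable and
helper names are made up, and telling a [] collection from a {}
collection by looking at the keys is my assumption, not something the
models define.

```python
# Hypothetical level-1 result for this fragment (anchor &m is defined
# elsewhere in the document):
#   host: example.org
#   host: backup.example.org
#   ports: [80, 443]
#   mirror: *m
syntax_doc = [
    ("host", "example.org"),
    ("host", "backup.example.org"),      # duplicate key preserved
    ("ports", [(0, "80"), (1, "443")]),  # [] collection: (index, value) pairs
    ("mirror", "*m"),                    # alias left unresolved, as a string
]

def is_sequence(pairs):
    """True if the pair list looks like a [] collection (keys are 0, 1, 2, ...)."""
    return [key for key, _ in pairs] == list(range(len(pairs)))

# Key order and duplicates survive because nothing is put into a dictionary.
keys = [key for key, _ in syntax_doc]
```

Note that an empty collection is ambiguous under this convention, so a
real loader would probably need some other marker for the collection
kind.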

MODEL: Serial
LEVEL: 2
LOADER READS: syntax model
LOADER RETURNS:
 - same as syntax model but...
 - anchors/aliases are resolved
 - no comments
TYPICAL USES:
 - A configuration file where all values are considered strings.
   The application handles type conversion itself.  The application
   may want to validate "12345" before converting it to a number, or
   may want to raise a custom validation error.
 - An application that needs key order and duplicate keys preserved.
TYPE INFORMATION:
 - there may be an alternate loader method that returns a parallel
   structure containing the type family of each scalar (as a string)
   rather than the value itself.  But how to encode the type family
   of collections?  If you make it the collection's value, there's no
   place to list the type families of the collection's children.
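
The configuration-file use case might look like this in Python.  It is
a sketch under assumptions: ConfigError and get_int are hypothetical
application code, and serial_doc stands in for whatever a level-2
loader would return (all scalars still strings, duplicate keys kept).

```python
class ConfigError(ValueError):
    """Custom validation error raised by the application, not by YAML."""

def get_int(pairs, key, lo, hi):
    """Look up key in a list of (key, value) pairs and validate it as an integer."""
    values = [value for k, value in pairs if k == key]
    if len(values) != 1:
        raise ConfigError(f"expected exactly one {key!r}, found {len(values)}")
    text = values[0]
    # Validate the string before converting, so the error message is ours.
    if not (text.isascii() and text.isdigit()):
        raise ConfigError(f"{key!r} must be a whole number, got {text!r}")
    number = int(text)
    if not lo <= number <= hi:
        raise ConfigError(f"{key!r} must be between {lo} and {hi}, got {number}")
    return number

serial_doc = [
    ("port", "12345"),
    ("host", "example.org"),
    ("host", "backup.example.org"),  # duplicate key still visible at this level
]
port = get_int(serial_doc, "port", 1, 65535)
```

Because the duplicate "host" is still visible here, the application
can decide for itself whether that is an error.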

MODEL: General
LEVEL: 3
LOADER READS: serial model
LOADER RETURNS: 
 - I don't know how this is different from the serial model.

MODEL: Functional (aka graph)
LEVEL: 4
LOADER READS: general model
LOADER RETURNS:
 - same as general model but...
 - each scalar is resolved to its implicit/explicit type
 - each [] collection is converted to a list.  order is preserved.
   duplicate keys are not an issue since YAML generated the keys.
 - each {} collection is converted to a dictionary.  caveats:
     - key order is not guaranteed.  Python dictionaries destroy
       key order and it cannot be reconstructed.  PHP arrays preserve
       key order, but the loader may intentionally randomize the order
       to prevent you from exploiting this.
     - duplicate keys are dropped.  Should this be done silently, or
       with a warning, or with an error, or whichever the user chooses?
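
One way the "whichever the user chooses" option could be sketched in
Python.  The function names and the on_duplicate parameter are
hypothetical, and keeping the first value rather than the last is my
assumption; the question of which one to keep is still open.

```python
import warnings

def to_dict(pairs, on_duplicate="error"):
    """Convert (key, value) pairs to a dict with a selectable duplicate-key policy."""
    if on_duplicate not in ("error", "warn", "silent"):
        raise ValueError(f"unknown policy: {on_duplicate!r}")
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "error":
                raise ValueError(f"duplicate key: {key!r}")
            if on_duplicate == "warn":
                warnings.warn(f"duplicate key dropped: {key!r}")
            continue  # keep the first value, drop the later one
        result[key] = value
    return result

def to_list(pairs):
    """Convert (index, value) pairs to a plain list, keeping their order."""
    return [value for _, value in pairs]

pairs = [("host", "a"), ("host", "b"), ("port", "80")]
quiet = to_dict(pairs, on_duplicate="silent")
```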

The loader may give the user options to select from a variety of
alternate native representations.

The dumpers would generally operate in reverse.  Probably they would
need to choose one or two output styles for all scalars, sequences and
maps.  Choosing specific output styles beyond that really requires a schema.

-- 
-Mike (Iron) Orr, ir...@ms...  (if mail problems: ms...@oz...)
   http://iron.cx/     English * Esperanto * Russkiy * Deutsch * Espan~ol
