[Yaml-core] A compromise position (two models, two interfaces)

According to our website, YAML is first and foremost a "native
object" serialization language, especially for languages like Perl
and Python.  We should stick hard to that goal.

That said, we can't stop users from relying on the "side effects"
that our serialization format and parser create: ordered keys and
duplicate keys.  Thus we have two options: duck and ignore that they
are doing it, or try to give them _some_ accommodation.  As an
added "bonus" we become much more XML compatibilish, and this
could help adoption in those circles.

Now, we can either create another model for this, or we could
phrase the "serial" model in such a way that it allows both
of these effects to exist.  Quite clearly, key order comes
for free with the serial model.  And since node equality
can't be determined at this layer without type resolution,
we could even allow duplicate keys.  This has the added
advantage of clarifying that parsers _must not_ try to
eliminate duplicate keys.  In this way, tools written for
the serial model must handle duplicate keys and
preserve key order.

Then, at our more abstract level, the "graph" model, we simply
reject those instances of the serial model which have duplicate
keys, and we clearly state that key order is not preserved.
So these thingys can co-exist (without creating a new model)
just through clarifications of the existing model.
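
A minimal sketch of how the two levels could relate, in Python.  The
names here (to_graph, DuplicateKeyError) are hypothetical and not part
of any YAML implementation, and the sketch assumes keys have already
been resolved to hashable native values:

```python
class DuplicateKeyError(ValueError):
    """A serial-model mapping that has no graph-model equivalent."""


def to_graph(serial_pairs):
    """Lift a serial-model mapping into the graph model.

    serial_pairs is an ordered sequence of (key, value) pairs in which
    duplicates may appear.  The result is a plain mapping with unique
    keys; callers should treat its order as meaningless.
    """
    graph = {}
    for key, value in serial_pairs:
        if key in graph:
            # The graph model rejects what the serial model tolerates.
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        graph[key] = value
    return graph


# Serial model: both order and duplicates are visible to the tool.
serial_ok = [("a", 1), ("b", 2)]
serial_dup = [("a", 1), ("a", 2)]
```

Here to_graph(serial_ok) yields an ordinary mapping, while
to_graph(serial_dup) raises, so the same serial data is valid at one
level and rejected at the other.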

At this point, an Application has two choices.  It can use
the YAML Parser only, and take its information from the
serial model; or it can use the YAML Loader and restrict
itself to maps without duplicates, where key order carries
no information... implementations may even want to scramble the
keys just to make it crystal clear.

I would be happy with this compromise -- but it means 
following the policy Neil laid out:

   1.  If an Application wants to use key order as 
       information and/or accept duplicate keys, then it
       must use the Parser interface directly (which
       provides output in the serial model).  

   2.  If an Application agrees with the generic model,
       where key order is not significant and duplicate
       keys are not allowed, then it may use the Loader
       interface (which provides the graph model).

To implementers this means the following rule to help
keep things straight:

   Your Loader interface should not in any way be using
   key order or allowing for duplicate keys in the stuff
   it generates.   Ideally, this means the loader should
   only pass the application or object type constructors
   a fully-created random-access hashtable (ideally with
   pre-scrambled keys).   It should also not provide any
   loading "hooks" which break this process (and hence
   enable the user to accidentally violate the model).
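
One way a loader could follow this rule, sketched in Python.  The
function name load_mapping and its rng parameter are assumptions made
for illustration, not an existing API.  Because Python dictionaries
remember insertion order, the pairs are shuffled before the
dictionary is built:

```python
import random


def load_mapping(serial_pairs, rng=random):
    """Build the hashtable a Loader would hand to the application.

    Duplicate keys are rejected, and the source order is deliberately
    discarded so that nothing downstream can come to depend on it.
    """
    pairs = list(serial_pairs)
    keys = [key for key, _ in pairs]
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate keys are not allowed by the Loader")
    rng.shuffle(pairs)  # pre-scramble: iteration order is now arbitrary
    return dict(pairs)


# The application sees only the finished mapping, never the pairs.
config = load_mapping([("host", "example.org"), ("port", 8080)])
```

Passing a seeded random.Random instance as rng makes the scrambling
reproducible in tests without exposing the original order.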

Thus, people have it both ways.  If they want to treat YAML
as Syntax, they can use the Parser interface.  If they want
to use YAML as a Serialization Language (with its model) then
they use the Loader interface.   There will be reasons to 
use both interfaces... 

I know it sounds draconian... but I think this is a good
compromise position that allows for both models but still
keeps YAML's vision as a serialization language for 
modern scripting languages pure.

Best,

Clark