From: Clark C. Evans <cce@cl...>  2002-09-09 00:57:03

The YAML model is layered into three models: graph, serial, and syntax.

The graph model is the most basic. It says:

  - YAML has abstract thingys called Nodes.
  - Nodes have a Type Family.
  - Nodes are either Scalars or Collections.
  - Scalars have a sequence of zero or more Characters.
  - Collections are a mapping from Nodes to Nodes.

The definition of "mapping" follows the usual mathematical definition
of a function. A function has two sets of nodes, domain and range,
such that each node in the domain is associated with exactly one node
in the range. It also comes with the constraint that no two nodes in
the domain of a function are Equal. The common "hashtable" provides
exactly the requirements of a mapping, as does an "array" once you
consider the domain as a set of nodes with the type family integer.

The serial model adds to the graph model the following:

  - An anchor is needed to mark nodes which occur more than once in
    the structure (identity needed).
  - An alias is used to reference subsequent occurrences of an
    anchored node.
  - Collections are given a key ordering.

This model adds the items required when flattening the graph model
into a sequence. It provides a total ordering across all nodes (the
order in which they arrive in a stream, or their position within a
file).

The syntax model adds to the serial model stuff for readability:

  - Comments are added.
  - Each scalar node gets a node style (block, etc.).
  - Each node may have an additional "format", so that the integer
    type family can be represented in hex, for example.

In hindsight, the name "generic" model is really not very good, as all
of the above models can have a "generic" implementation. Thus, I'm
going to revert to calling the top level model the "graph" model, as
it was called a long time ago.

A few more notes:

  - The current "graph" model uses Map/List/Scalar instead of
    Collection/Scalar.
    I've teetered back and forth between these alternative views;
    Collection/Scalar is the most general, so it should probably be
    used at this level (besides, map/list can be a type family).
  - Format is in the serial model as an efficiency (understanding
    that the loader would probably need the format). It properly
    belongs in the syntax model, as it is there for "readability".

I hope this helps "clarify" things somewhat.

The question is "why have a model?"

We have these models so we can discuss the minimum requirements of a
particular layer. For example, an implementation of the graph model
*may* preserve comments or key order if it desires. However, an
implementation of the graph model need not do this. And as such, an
application program shouldn't count on key order being preserved. To
make sure that applications don't shoot themselves in the foot, it's
probably best that an implementation of the graph model doesn't
support features of the other models without a serious switch/warning.

The point of having these models is so that we can then formally
define operations on them (schema validation, path expressions,
transformations) in a generic manner which doesn't need to take any
particular implementation into account. If we don't have the models,
everyone will have different assumptions about what's there and what's
not.

Why is the graph model so strict?

Well, the graph model is essentially a graph where each collection is
a mathematical function. By leveraging this absolute and formal
definition we make it possible (plausible) that in the short run
(within a year or two) schemas, transforms, and other generic
operations can be formalized and given a rigorous definition with
properties (such as NP-completeness) proven.
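
To make "each collection is a mathematical function" a bit more
concrete, here is a minimal Python sketch. It is only an illustration,
not part of any YAML implementation: the class names Scalar and
Collection, the helper seq, and the type family names "str", "int",
"map" and "seq" are all assumptions made up for the example. It also
assumes the graph has no cycles, since this naive equality check would
not terminate on one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """A node with a type family and zero or more characters."""
    type_family: str  # hypothetical family names such as "str" or "int"
    value: str = ""


class Collection:
    """A node that maps key nodes to value nodes, like a function."""

    def __init__(self, type_family, pairs=()):
        self.type_family = type_family
        # Stored as a list only for convenience; the position of a pair
        # is not part of the graph model and is ignored by __eq__.
        self._pairs = []
        for key, value in pairs:
            self.add(key, value)

    def add(self, key, value):
        # No two nodes in the domain may be Equal.
        if any(k == key for k, _ in self._pairs):
            raise ValueError("duplicate key in collection")
        self._pairs.append((key, value))

    def get(self, key):
        # Each node in the domain is associated with exactly one value.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __eq__(self, other):
        if not isinstance(other, Collection):
            return NotImplemented
        if self.type_family != other.type_family:
            return False
        if len(self._pairs) != len(other._pairs):
            return False
        # Keys are unique on both sides, so equal sizes plus "every pair
        # has a match" means the two functions are the same.
        return all(
            any(k == ok and v == ov for ok, ov in other._pairs)
            for k, v in self._pairs
        )


def seq(*items):
    """An "array": a collection whose domain is integer scalars."""
    pairs = [(Scalar("int", str(i)), item) for i, item in enumerate(items)]
    return Collection("seq", pairs)


x, y = Scalar("str", "x"), Scalar("str", "y")
one, two = Scalar("int", "1"), Scalar("int", "2")
a = Collection("map", [(x, one), (y, two)])
b = Collection("map", [(y, two), (x, one)])
print(a == b)  # True: key order is not part of the graph model
```

Note that nothing here records key order, anchors, styles, or
comments; in the layering above those would belong to the serial and
syntax models.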

Yes, it is possible to add more things to this model, but for each
thing we add, an expectation is created that we can check for this
thing's existence (graph level schema), find this property (ypath),
and preserve these properties during a manipulation (transform
library). Something seemingly as simple as a "key order" completely
changes the mathematical properties of the graph model, and I'd like
to keep it as clean as possible.

What is the impact of the graph model?

Well, it just says that for the best interoperability, your program
should only use features of the graph model and not of the lower level
models.

Why the serial model?

Unfortunately, the graph model is only good if you have random access
to a YAML text. In the real world, your documents often come
sequentially ordered as files or over a socket or through events. This
model imposes the necessary items that emerge here, but none more.
Thus, there may be tools which work at both the graph and serial
models, or those which work at either but not both.

Why the syntax model?

For one example, the "schema" document could have a "syntax model"
portion with items like which integer format to use (colors use hex,
for example), what scalar style to use, or some instructions to be
injected as comments. Basically, a "syntax fork" of a schema is
probably a very valuable thingy.

Are you sure you got it right?

Quite. Perhaps too confident, but the only way to know for sure is to
start building generic tools which can be used across many
applications. The generic tools are the customers of the models... not
really the applications.

Why should an application follow the model?

It's a promise. A promise that if they use YAML and follow the model
(with some help from the parser implementations and user level
documentation) then generic YAML tools will be useful to them.

What are YAML tool applications?

  - Being able to use your data from multiple languages without a DOM
    or yaml-specific data object.
  - Validating whether a given document fits a particular structure
    (scheme) so that your application doesn't have to do it. This
    would be done through a YAML schema.
  - Transforming a given document from one scheme to another when
    migrating data between applications.
  - Running reports, described in YAML, which summarize information,
    do calculations, etc.

I hope this helps.

Clark