From: Steve Howell <showell@zi...> - 2002-09-09 01:08:16

----- Original Message -----
From: "Clark C. Evans" <cce@...>

> The YAML model is layered into three models: graph, serial, and syntax.
>
> The graph model is the most basic. It says:
>
>  - YAML has abstract thingys called Nodes.
>  - Nodes have a Type Family.
>  - Nodes are either Scalars or Collections.
>  - Scalars have a sequence of zero or more Characters.
>  - Collections are a mapping from Nodes to Nodes.
>
> The definition of "mapping" follows the usual mathematical
> definition of a function. A function has two sets of nodes,
> domain and range, such that each node in the domain is
> associated with exactly one node in the range. It also
> comes with the constraint that no two nodes in the domain
> of a function are Equal.
>
> The common "hashtable" provides exactly what a mapping
> requires, as does an "array" once you consider its domain
> as a set of nodes with the type family integer.
>
> The serial model adds the following to the graph model:
>
>  - An anchor is needed to mark nodes which occur
>    more than once in the structure (identity needed).
>  - An alias is used to reference subsequent occurrences
>    of an anchored node.
>  - Collections are given a key ordering.
>
> This model adds the items required when flattening the
> graph model into a sequence. It provides a total ordering
> across all nodes (the order in which they arrive in a
> stream, or their position within a file).
>
> The syntax model adds readability features to the serial model:
>
>  - Comments are added.
>  - Each scalar node gets a node style (block, etc.).
>  - Each node may have an additional "format", so that the
>    integer type family can be represented in hex, for example.
>
> In hindsight, "generic" is not a very good name for the top
> level model, since each of the above models can have a "generic"
> implementation. Thus, I'm going to revert to calling the
> top level model the "graph" model, as it was called a long
> time ago.
>
> A few more notes:
>
>  - The current "graph" model uses Map/List/Scalar instead
>    of Collection/Scalar.
>    I've teetered back and forth between these alternative
>    views; Collection/Scalar is the most general, so it should
>    probably be used at this level (besides, map/list can be
>    type families).
>
>  - Format is in the serial model as an efficiency (understanding
>    that the loader would probably need the format). It properly
>    belongs in the syntax model, as it is there for "readability".
>
> I hope this helps "clarify" things somewhat. The question
> is "why have a model?"
>
> We have these models so we can discuss the minimum requirements
> of a particular layer. For example, an implementation of the
> graph model *may* preserve comments or key order if it desires.
> However, an implementation of the graph model need not do this,
> and as such, an application program shouldn't count on key order
> being preserved. To make sure that applications don't shoot
> themselves in the foot, it's probably best that an implementation
> of the graph model not support features of the other models
> without a serious switch/warning.
>
> The point of having these models is so that we can then
> formally define operations on them (schema validation,
> path expressions, transformations) in a generic manner which
> doesn't need to take any particular implementation into account.
> If we don't have the models, everyone will have different
> assumptions about what's there and what's not.
>
> Why is the graph model so strict?
>
> Well, the graph model is essentially a graph where each
> collection is a mathematical function. By leveraging this
> absolute and formal definition, we make it possible (plausible)
> that in the short run (within a year or two) schemas, transforms,
> and other generic operations can be formalized and given a
> rigorous definition, with properties (such as NP-completeness)
> proven.
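Clark's point that every graph-model collection is a mathematical function can be made concrete with a small Python sketch. The class names `Scalar` and `Collection`, the helpers `mapping` and `sequence`, and the type family strings ("str", "int", "map", "seq") are hypothetical illustrations, not part of any YAML specification or library. One simplifying assumption: two scalars count as Equal only when they have the same type family and the same characters.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    family: str  # type family, e.g. "str" or "int" (hypothetical names)
    chars: str   # a sequence of zero or more characters


@dataclass(frozen=True)
class Collection:
    family: str                          # e.g. "map" or "seq" (hypothetical names)
    pairs: FrozenSet[Tuple[Node, Node]]  # the function, stored as unordered (key, value) pairs

    def __post_init__(self) -> None:
        keys = [key for key, _ in self.pairs]
        # A function: no two nodes in the domain may be Equal.
        if len(keys) != len(set(keys)):
            raise ValueError("two Equal keys: this collection is not a function")


Node = Union[Scalar, Collection]


def mapping(family: str, items: Dict[Node, Node]) -> Collection:
    """Build a collection from a Python dict (hypothetical helper)."""
    return Collection(family, frozenset(items.items()))


def sequence(items: List[Node]) -> Collection:
    """An array viewed as a mapping whose domain is integer-family nodes."""
    return Collection("seq", frozenset(
        (Scalar("int", str(index)), node) for index, node in enumerate(items)))


if __name__ == "__main__":
    name, tools = Scalar("str", "name"), Scalar("str", "tools")
    a = mapping("map", {name: Scalar("str", "Clark"),
                        tools: sequence([Scalar("str", "ypath")])})
    b = mapping("map", {tools: sequence([Scalar("str", "ypath")]),
                        name: Scalar("str", "Clark")})
    print(a == b)  # True: the graph model has no key order
```

Because `pairs` is a set rather than a list, two mappings that differ only in key order compare equal. Key order first appears in the serial model. The Equal used here is deliberately naive. A fuller treatment would probably compare scalars in a canonical form, so that hex and decimal spellings of the same integer are Equal.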
>
> Yes, it is possible to add more things to this model, but
> for each thing we add, an expectation is created that we can
> check for this thing's existence (graph level schema), find
> this property (ypath), and preserve these properties during
> a manipulation (transform library).
>
> Something seemingly as simple as a "key order" completely
> changes the mathematical properties of the graph model,
> and I'd like to keep it as clean as possible.
>
> What is the impact of the graph model?
>
> Well, it just says that for the best interoperability, your
> program should only use features of the graph model and
> not of the lower level models.
>
> Why the serial model?
>
> Unfortunately, the graph model is only good if you have
> random access to a YAML text. In the real world, your
> documents often arrive sequentially, as files, over a
> socket, or through events. This model imposes the items
> that are necessary here, but no more.
>
> Thus, there may be tools which work at both the graph
> and serial models, or tools which work at one but not
> the other.
>
> Why the syntax model?
>
> For one example, the "schema" document could have a
> "syntax model" portion specifying items such as which integer
> format to use (colors use hex, for example), which
> scalar style to use, or instructions to be injected
> as comments. Basically, a "syntax fork" of a schema
> is probably a very valuable thingy.
>
> Are you sure you got it right?
>
> Quite. Perhaps I'm too confident, but the only way to know
> for sure is to start building generic tools which can be
> used across many applications. The generic tools are
> the customers of the models... not really the applications.
>
> Why should an application follow the model?
>
> It's a promise. A promise that if they use YAML and follow
> the model (with some help from the parser implementations and
> user level documentation), then generic YAML tools will be
> useful to them.
>
> What are YAML tool applications?
>
> Being able to use your data from multiple languages without
> a DOM or YAML-specific data object.
>
> Validating whether a given document fits a particular structure
> (scheme), so that your application doesn't have to do it. This
> would be done through a YAML schema.
>
> Transforming a given document from one scheme to another
> when migrating data between applications.
>
> Running reports, described in YAML, which summarize information,
> do calculations, etc.
>
> I hope this helps.
>
> Clark

The text of this email would make an excellent introduction to a document that describes the YAML model more rigorously, and which includes more real-world examples of the YAML model in use, helping people to solve software problems.

Cheers,
Steve
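The serial model's anchors and aliases follow from one observation. When a graph is flattened into a sequence, a node reached more than once must be written in full the first time and referenced afterwards. Below is a minimal Python sketch of that flattening step. It assumes that plain dicts and lists stand in for collections and that anything else is a scalar. Node identity is Python object identity (`id()`). The event tuples ("map-start", "alias", and so on) and the numeric anchor names are hypothetical, not the event names of the YAML specification or of any parser. Dict insertion order plays the role of the key ordering that the serial model adds.

```python
from typing import Any, Dict, List, Tuple

Event = Tuple[str, ...]  # hypothetical event shape: (kind, optional anchor/value)


def count_occurrences(node: Any, counts: Dict[int, int]) -> None:
    """Count how many times each collection is reached, by identity."""
    if not isinstance(node, (dict, list)):
        return
    counts[id(node)] = counts.get(id(node), 0) + 1
    if counts[id(node)] > 1:
        return  # already explored; stopping here also handles cycles
    if isinstance(node, dict):
        children = [part for pair in node.items() for part in pair]
    else:
        children = node
    for child in children:
        count_occurrences(child, counts)


def serialize(root: Any) -> List[Event]:
    """Flatten a graph into events, anchoring collections that occur twice or more."""
    counts: Dict[int, int] = {}
    count_occurrences(root, counts)
    anchors: Dict[int, str] = {}
    events: List[Event] = []

    def walk(node: Any) -> None:
        if not isinstance(node, (dict, list)):
            events.append(("scalar", str(node)))
            return
        key = id(node)
        if key in anchors:
            events.append(("alias", anchors[key]))  # a subsequent occurrence
            return
        anchor = ""
        if counts[key] > 1:  # occurs more than once: identity is needed
            anchor = str(len(anchors) + 1)
            anchors[key] = anchor
        if isinstance(node, dict):
            events.append(("map-start", anchor))
            for k, v in node.items():  # the key ordering is chosen here
                walk(k)
                walk(v)
            events.append(("map-end",))
        else:
            events.append(("seq-start", anchor))
            for item in node:
                walk(item)
            events.append(("seq-end",))

    walk(root)
    return events


if __name__ == "__main__":
    shared = ["red", "green"]
    for event in serialize({"palette": shared, "favorites": shared}):
        print(event)
```

For `{"palette": shared, "favorites": shared}`, the shared list is emitted once as `("seq-start", "1")`, and its second occurrence becomes `("alias", "1")`. Only collections are tracked for identity, because Python may reuse one string object for equal literals, and that would produce spurious anchors on scalars.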