[Yaml-core] formalism in YAML

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Sean, let me respond to your inquiry with a few definitions:

  graph:           A collection of nodes and edges.   (FOLDOC)
  node:            A point or vertex in a graph.      (FOLDOC)
  directed graph:  A graph with one-way edges.        (FOLDOC)
  connected graph: A graph such that there is a path between any pair 
                   of nodes via zero or more other nodes.  (FOLDOC)
  colored graph:   A graph together with a colouring function which
                   assigns to each node in the graph a colour.  (NIST DADS)
  labeled graph:   A graph which has labels associated with 
                   each edge or each vertex.  (NIST DADS)
  function:        A rule of correspondence betweeen two sets such
                   that there is a unique element in the second set
                   assigned to each element in the first set. (Websters)
                   (A more boring definition from FOLDOC is below) 
  rooted graph:    A directed graph, with a special 'root' node
                   such that every node is connected, at least
                   indirectly, through a series of edges 
                   starting at the root.  (my definition)

In YAML we use these terms with exactly these meanings!  
I think of a YAML graph as the 'rooted' variety, that is
a directed graph which is connected.   Further, I picture
a YAML graph has having coloured nodes (the data type)
and each edge being labeled.  A YAML graph has three
sorts of nodes (vertexes):

  (a) scalar values, where the node has only 'incoming' 
      edges visualized as arrows pointing to it;

  (b) sequences (arrays); where the 'values' of the sequence
      are pointed to by outgoing edges labeled using 
      subsequent integers

  (c) mappings (hashtables); where the 'values' of the mapping
      are pointed to by outgoing edges labeled using, a set
      of unique 'keys'.

      The only 'tricky' thing about this visualization is
      that the keys are not just limited to strings, but
      can in fact be other nodes, including sequences and
      hashtables.   This is a slightly mind-bending 
      trick; but it is needed to model complex keys which
      are being found with increasing regularity as
      dictionary keys.

Now to address your questions...

On Mon, Oct 20, 2003 at 07:24:44PM -0700, Sean O'Dell wrote:
| Graphs are also "related" values; such as the result set of a function, 
| generally not from an arbitrary data set such as in YAML (aside from 
| specific specifications).

I think you are using graph in the following sense:

    1. A diagram that exhibits a relationship, often functional, between
       two sets of numbers as a set of points having coordinates
       determined by the relationship. Also called plot.
    2. A pictorial device, such as a pie chart or bar graph, used to
       illustrate quantitative relationships. Also called chart.

          (American Heritage Dictionary)

While this is a good definition, this is not the definition
typically used in 'computer science' or 'abstract math' speak.
The definition of 'graph' in this sense is much more fluid
and fundamental.   While a graph (especially directed graphs)
are most often functional in computer systems, they typically
do not have integers or real numbers for vertexes.

| specifications).  The purpose of a graph is to show relationship.  You would 
| draw up a graph to illustrate how the data in a series of YAML nodes are 
| related, but the series itself is not a graph. 

Ahh, but the point of using a graph *is* to show relationship
between objects -- that is what computer science is all about.
Two objects are connected when they have something 'in common'

For example:

   - first: clark
     last: evans
   - first: oren
     last: ben-kiki
   - first: brian
     last: ingerson

This specific 'series' is illustrating a relation between three
people; and the subordinate mappings, draw a linkage from a
shared anchor 'first' or 'last' to each one of these people.
It is very much a visual picture.  Very much a graph.  In
fact, we even have a way to specify a 'path' in the graph:

    /1/first  -> 'oren'

In YPath, a path named with 'edge labels' can be used to 
uniquely locate any node in a YAML graph from its root.
This is both simple and powerful.  

If you need more convincing, read up a bit on Petri-Nets, as
these are used heavily in manufacturing to solve all sorts
of scheduling constraints on the production line.   It converts
what is essentially an 'algebraic' problem (non-numerical)
into a picture that humans can peek at, poke, and sometimes
even solve.   

| A YAML element is a value and its associated name, which may be qualified 
| with a type identifier and a namespace.  A YAML element may be a simple 
| string value, an array of elements or an associative hash of elements.  YAML 
| elements are grouped into sets and may be arranged according to a particular 
| specification or arbitrarily.  A YAML document is a file or other storage 
| unit which contains a set of elements embodied as a YAML stream.  A YAML 
| stream is a series of Unicode characters which represent the YAML elements, 
| plus delimiting characters such as spaces and newlines character.

Well, two problems:

  1) You use element instead of node (or vertex).  Node is just 
     right for YAML (see above).

  2) Your definition of element is just plain wrong for YAML
     (although it does match how XML uses the word) -- you seem
     to be trying to attach the 'label' and the 'value' of
     a node together (mixing edges and vertexes).  Besides 
     not accurately modeling native data structures, it 
     actually prevents some very useful graphs (or you can still
     represent things, but it takes more elements).  

The rest of your paragaph seems like an alternative way of writing
what I've done, rather than rewriting (which I've done again and again
and again and again based on lots of people's feedback),  I could
actually use low-level help with verifying that it reads well and
makes sense.   So, for example, if you could pick out particular
sentances and explain why they don't make sense or how they could
be imprroved, this woudl help greatly.  Or, alternatively, if you
could suggest a better "outline" this could also be helpful.

Thank you so much for thinking about this stuff.  It is always nice
to have people following along and poking us so we make sure to 
cross all of the eyes and dots to the Ts.

Best,

Clark

PS.  Following is the more formal, boring definition of function, 
to which both our sequence (array) and mapping (hash) satisfy:

      If D and C are sets (the domain and codomain) then a 
      function f from D to C, normally written "f : D -> C" 
      is a subset of D x C such that: 

      1. For each d in D there exists some c in C such 
         that (d,c) is an element of f. I.e. the function 
         is defined for every element of D. 

      2. For each d in D, c1 and c2 in C, if both (d,c1) and 
         (d,c2) are elements of f then c1 = c2. I.e. the function 
         is uniquely defined for every element of D.