## [Yaml-core] Re: formalism in YAML

 [Yaml-core] Re: formalism in YAML From: Sean O'Dell - 2003-10-21 06:31:08 ```On Monday 20 October 2003 10:33 pm, Clark C. Evans wrote: > Sean, let me respond to your inquiry with a few definitions: > > graph: A collection of nodes and edges. (FOLDOC) > node: A point or vertex in a graph. (FOLDOC) > directed graph: A graph with one-way edges. (FOLDOC) > connected graph: A graph such that there is a path between any pair > of nodes via zero or more other nodes. (FOLDOC) > colored graph: A graph together with a colouring function which > assigns to each node in the graph a colour. (NIST DADS) > labeled graph: A graph which has labels associated with > each edge or each vertex. (NIST DADS) > function: A rule of correspondence betweeen two sets such > that there is a unique element in the second set > assigned to each element in the first set. (Websters) > (A more boring definition from FOLDOC is below) > rooted graph: A directed graph, with a special 'root' node > such that every node is connected, at least > indirectly, through a series of edges > starting at the root. (my definition) > > In YAML we use these terms with exactly these meanings! > I think of a YAML graph as the 'rooted' variety, that is > a directed graph which is connected. Further, I picture > a YAML graph has having coloured nodes (the data type) > and each edge being labeled. A YAML graph has three > sorts of nodes (vertexes): > > (a) scalar values, where the node has only 'incoming' > edges visualized as arrows pointing to it; > > (b) sequences (arrays); where the 'values' of the sequence > are pointed to by outgoing edges labeled using > subsequent integers > > (c) mappings (hashtables); where the 'values' of the mapping > are pointed to by outgoing edges labeled using, a set > of unique 'keys'. > > The only 'tricky' thing about this visualization is > that the keys are not just limited to strings, but > can in fact be other nodes, including sequences and > hashtables. This is a slightly mind-bending > trick; but it is needed to model complex keys which > are being found with increasing regularity as > dictionary keys. I just don't see why it has to be that complicated. The programmers who look at YAML for their needs already understand scalar, arrays and hashes. Why not just say "this is a set of strings, arrays and hashes?" > On Mon, Oct 20, 2003 at 07:24:44PM -0700, Sean O'Dell wrote: > | Graphs are also "related" values; such as the result set of a function, > | generally not from an arbitrary data set such as in YAML (aside from > | specific specifications). > > I think you are using graph in the following sense: > > 1. A diagram that exhibits a relationship, often functional, between > two sets of numbers as a set of points having coordinates > determined by the relationship. Also called plot. > 2. A pictorial device, such as a pie chart or bar graph, used to > illustrate quantitative relationships. Also called chart. > > (American Heritage Dictionary) > > While this is a good definition, this is not the definition > typically used in 'computer science' or 'abstract math' speak. > The definition of 'graph' in this sense is much more fluid > and fundamental. While a graph (especially directed graphs) > are most often functional in computer systems, they typically > do not have integers or real numbers for vertexes. I must have been under a rock for in my 20 years of programming then =) ; I've never heard any programmers use graph for anything except to show visually the relationship between related values. > | specifications). The purpose of a graph is to show relationship. You > | would draw up a graph to illustrate how the data in a series of YAML > | nodes are related, but the series itself is not a graph. > > Ahh, but the point of using a graph *is* to show relationship > between objects -- that is what computer science is all about. > Two objects are connected when they have something 'in common' > > For example: > > - first: clark > last: evans > - first: oren > last: ben-kiki > - first: brian > last: ingerson > > This specific 'series' is illustrating a relation between three > people; and the subordinate mappings, draw a linkage from a > shared anchor 'first' or 'last' to each one of these people. > It is very much a visual picture. Very much a graph. In > fact, we even have a way to specify a 'path' in the graph: > > /1/first -> 'oren' > > In YPath, a path named with 'edge labels' can be used to > uniquely locate any node in a YAML graph from its root. > This is both simple and powerful. But that's not a relationship between data. All three people in the array are arbitrary values; they have no relationship. The array itself might be the output of a search for colleagues, and then if you created an analysis of them, that might be a graph. But any collection of values is going to be arbitrary, so you wouldn't really extract a graph from them; and you wouldn't really call the data set itself a graph. > If you need more convincing, read up a bit on Petri-Nets, as > these are used heavily in manufacturing to solve all sorts > of scheduling constraints on the production line. It converts > what is essentially an 'algebraic' problem (non-numerical) > into a picture that humans can peek at, poke, and sometimes > even solve. Then the data set is a group of related values, because they're all the result of a single function operating on a set of input values. The resulting data set can be extracted into a picture for human analysis; this is your graph. Data grouped in a YAML document are not necessarily related in any way except that they belong in the document. If they *are* related, then their relationship is defined above the YAML level. > | A YAML element is a value and its associated name, which may be qualified > | with a type identifier and a namespace. A YAML element may be a simple > | string value, an array of elements or an associative hash of elements. > | YAML elements are grouped into sets and may be arranged according to a > | particular specification or arbitrarily. A YAML document is a file or > | other storage unit which contains a set of elements embodied as a YAML > | stream. A YAML stream is a series of Unicode characters which represent > | the YAML elements, plus delimiting characters such as spaces and newlines > | character. > > Well, two problems: > > 1) You use element instead of node (or vertex). Node is just > right for YAML (see above). I prefer node too. > 2) Your definition of element is just plain wrong for YAML > (although it does match how XML uses the word) -- you seem > to be trying to attach the 'label' and the 'value' of > a node together (mixing edges and vertexes). Besides > not accurately modeling native data structures, it > actually prevents some very useful graphs (or you can still > represent things, but it takes more elements). With or without a label; the verbage can be modified. I forgot about non-labelled data. > The rest of your paragaph seems like an alternative way of writing > what I've done, rather than rewriting (which I've done again and again > and again and again based on lots of people's feedback), I could > actually use low-level help with verifying that it reads well and > makes sense. So, for example, if you could pick out particular > sentances and explain why they don't make sense or how they could > be imprroved, this woudl help greatly. Or, alternatively, if you > could suggest a better "outline" this could also be helpful. I'll re-write it and post the rewrite myself. Use portions, discard portions, or discard it all. =) > Thank you so much for thinking about this stuff. It is always nice > to have people following along and poking us so we make sure to > cross all of the eyes and dots to the Ts. > > Best, > > Clark > > PS. Following is the more formal, boring definition of function, > to which both our sequence (array) and mapping (hash) satisfy: > > If D and C are sets (the domain and codomain) then a > function f from D to C, normally written "f : D -> C" > is a subset of D x C such that: > > 1. For each d in D there exists some c in C such > that (d,c) is an element of f. I.e. the function > is defined for every element of D. > > 2. For each d in D, c1 and c2 in C, if both (d,c1) and > (d,c2) are elements of f then c1 = c2. I.e. the function > is uniquely defined for every element of D. YAML is made up of arbitrary sets of data. I can't argue that this is the definition of a function, but YAML data are not the results of a function. It's all totally arbitrary. Sean O'Dell ```