From: Oren BenKiki <orenbk@ri...>  20021129 12:32:51

I'd like to revive the YPATH thread. Before jumping into details, however, I'd like to establish some common ground first. We have talked a lot about "the graph" we are going to use when we define our generic tools. From my talks with Clark and Brian it seems that regardless of the formal defintion, not everyone has the same notion of what this graph looks like. As the purpose of YPATH is to describe paths in the graph, this has lead to some misunderstandings. So, a brief informal tour into graph theory and its application to YAML... Mathematically, a directed graph is a collection of vertices and edges: G = (V, E), where each edge has a source and target vertice (E is a subset of V x V). Applied to YAML, each node is obviously a vertice, and edges are defined from collections to contained nodes. In addition, both the edges and the vertices may have arbitrary properties. For YAML, the node (vertice) properties are defined in the spec  type family, kind, format, alias, style, etc. There are some subtelties involved, best described using examples. The most trivial nonempty document has just one node: # Document 1  &HELLO "hello" ... And in a graph form there's only one vertice (let's call it after the alias): # Graph 1 HELLO The next step is a simple collection: # Document 2  &MAP &K1 "k1" : &V1 "v1" &K2 "k2" : &V2 "v2" ... Note that K1 and K2 are nodes like any others, and hence deserve their own graph vertices. In graph form this would NOT be: # Graph 2.1 MAP  V1 \  \ K1 / \ K2 V2 The above is emphatically _not_ a "directed graph". I suppose that graph theory has some way to handle this, but my bet is that the way to do it is to convert it to some normalised graph form, so we might as well save ourselves the trouble and start with one. Another way the graph form can NOT take is: # Graph 2.2 MAP  K1  V1 \ \ K2  V2 To show why this doesn't work, let's use a slightly more complex example: # Document 3  &MAP1 &K1 "k1": &MAP2 &K2 "k2" : &V2 "v2" *K2: &V3 "v3" ... The above is a valid YAML document. When we try to cast it into graph form, we run into trouble: # Graph 3.1 MAP1  K1  MAP2  K2  V2 \______________/ \ \ V3 This does not preserve the information that V2 is only accessible through K2 when K2 is accessed through MAP2, and that V3 is only accessible through K2 when K2 is accessed directly through MAP1. (For the nitpickers: for scalar nodes, as K2 is above, we could simply introduce a copy of K2  call it K3  and resolve the problem this way. However, YAML allows for collection keys, for a good reason, so this doesn't solve the problem in general). To fully preserve the document semantics the graph must look like: # Graph 3.2 DOC  (P0)  K1 \ \ \ \ (P2) MAP1 / \ \ / \ \ V3 K2  (P1)  V2 This requires the introduction of "virtual vertices" for each key/value pair (P0  P2). Note that for the original simple example, the graph would look like: # Graph 2.3 MAP  (P0)  V1 \ \ \ \ (P1) K1 / \ / \ K2 V2 Which is exactly the same as graph 2.1, but with vertices added to "normalize" the graph. This seems like the simplest thing that can possibly work. To be able to describe every YAML document as a graph, we also need to handle sequence collections: # Document 4  &SEQ  &V1 "v1"  &V2 "v2" ... The simplest possible way to cast this to a normal directed graph is: # Graph 4.1 SEQ  V1 \ \ V2 This graph, however, does not preserve the index associated with each entry. Similarly to the case for mappings, there are several options. The index can be associated with the edge (similarly to graph 2.1): # Graph 4.2 SEQ  V1 \  \ 0 / \ 1 V2 It can be viewed as a virtual vertice for each value (similarly to graph 2.2): # Graph 4.3 SEQ  (0)  V1 \ \ (1)  V2 Or it can be viewed as a virtual key (similarly to graph 2.3): # Graph 4.4 SEQ  (P0)  V1 \ \ \ \ (P1) (0) / \ / \ (1) V2 While the latter view is the most complex one, it has a big advantage of allowing sequences and mappings to be treated uniformly. YPATH must allow us to select a value nodes in a mapping based on the value of the key node. For example, it should be possible to obtain the value "1" out of the following document by asking for the value associated with the key "x":  x: 1 y: 2 ... Similarly, it should be possible to obtain the value "a" out of the following document by asking for the sequence entry with the index 0:   a  b ... If we use graph 4.4 to describe the second document, we can use the exact same YPATH mechanism in both cases. So much for graph theory. By definition, YPATH is a mechanism for selecting paths (sequence of edges) in the graph. Hence we must to agree on what this graph looks like. As you can see, while it is possible to view any YAML document as a graph, the result isn't as obvious as we would have like it to be (all these "virtual vertices"). This raises a metaissue. Maybe we should rethink the name "graph model"... perhaps the term "functional model" is more suitable. The problem is, I'm not aware of a construct that is similar to a "graph" but that uses a "function" instead of a "vertice" as its basic building block... Nevertheless, that's the best fit for YAML data (and, by implication, for most data represented in computer programs). Surely someone did some work on this? The closest I can come is some vague recollection of the "web" database model. That was mostly forgotten and then reimplemented by OODBs. I never saw a good formalization of that... Thoughts? At any rate, assuming we stick with a graph as the basic construct, I suggest that we use the view demonstrated by graphs 2.3, 3.2 and 4.4 as a common ground when discussing YPATH. This view does not promote or constrain the YPATH syntax/semantics in any way. It is just to allow everyone to define what the proposed syntax/semantics mean in an unambiguos way. Have fun, Oren BenKiki 