Re: [Yaml-core] A beginning of a YPath spec

Hi, Trans.

Thanks for your feedback. I'll snip the areas not commented on and get
to the nitty gritty.

>> All parts of a YAML document are represented in its representation
>> graph, and YPath should be able to access any node inside it.
>> Moreover, it should be possible to write YPath expressions that access
>> exactly one node inside it chosen by the user.
>> One difference between YAML and most other data formats is that it can
>> represent composite keys inside mappings. YPath should be able to
>> access the node for such keys, and it should be able to access the
>> node for its matching value inside the mapping. Finally, YPath should
>> be able to access any node within composite keys.
>
> Nice. I'm glad to see this is the first requirement.
>

I thought it would be necessary. Most use cases would be "Find value
X in mapping Y where key is Z", but some people may be interested in Z
(or what's in Z). A lot of the existing YPath syntax didn't cater for
this.

>> 1.1.5   YPath should not be restrictive
>>
>> YPath expressions should be based on the full set of data in the
>> representational graph model. For example, all nodes have a tag, so
>> expressions should be able to target particular nodes based on that
>> tag. In addition, regular expressions should be a part of the
>> language; simple equality checking would not be flexible enough for
>> many users.
>
> Only thing to note about regular expressions is that they can differ
> somewhat between implementations. That's not a big deal since the main
> parts are the same but it's something to be aware of. I'm currently
> working on a second tier of support for YES to satisfy Ingy's use
> cases using a variation on Cobol Edited Pictures. That might be useful
> here too and we can standardize that notation.
>

I might have to take the important part of the Python regex syntax
(the language I'm most comfortable with), and run it by you and others
to see if there are major disagreements. Unicode support is going to
be tricky. Would something like "[Α-Ωα-ω]+" throw up a problem in
implementations?
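
As a small data point for Python only: Python 3's `re` module accepts this pattern on `str` input and compares the ranges by code point. The sketch below shows one consequence that a spec would need to address: an accented letter such as U+03CC (omicron with tonos) lies outside both ranges. The helper name `is_greek_word` is made up for illustration, and other regex engines may behave differently.

```python
import re

# The character class from the message: two code-point ranges,
# U+0391..U+03A9 (capitals) and U+03B1..U+03C9 (lowercase).
GREEK_WORD = re.compile("[Α-Ωα-ω]+")


def is_greek_word(text):
    """Return True if every character of text lies in one of the two ranges."""
    return GREEK_WORD.fullmatch(text) is not None


# "logos" without accents: every letter is inside the lowercase range.
print(is_greek_word("\u03bb\u03bf\u03b3\u03bf\u03c2"))  # True

# Same word with an accented omicron (U+03CC), which is above U+03C9.
print(is_greek_word("\u03bb\u03cc\u03b3\u03bf\u03c2"))  # False
```

So the range syntax works here, but what counts as "a Greek letter" depends on how the ranges are written and on Unicode normalization of the input.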

I would like to look at the second tier of support (if it doesn't
violate any confidentiality agreements).

>> 1.1.6   YPath results are also YAML documents
>>
>> Sounds crazy, but think about it. If YPath returns node-sets, then one
>> can add a node representing the node-set, with references to its
>> content nodes. Ergo: the YPath result is also a representation graph.
>> This allows one to “Construct” it into a native structure.
>> Alternatively, by “Serializing” and “Presenting”, one can make another
>> YAML document. See 3.1 of the YAML spec.
>
> Any data can be represented as YAML, so I'm not sure how this sounds
> crazy. I expect the results to be a sequence of node objects. If I
> wanted to do `results.to_yaml` no problem. So I take it you mean
> that results of `to_yaml` here should give a standard output
> across all implementations of YPath?
>

I think what I meant to get across is that the output of YPath
expressions generally will be YAML documents, rather than simple
strings. Are they going to be identical across implementations? Maybe
if canonical form is used.

>
> I could see that working a few ways b/c you might want all nodes from
> all documents, or you might want the matching nodes grouped by
> document. Or, in the case of a schema the schema should only apply to
> a matching document --this is the one aspect of YES I'm still not
> clear on how to handle.
>

I'm not clear on it either. YPath could have a "stream[x]/..." (x in 0
to n-1, where n is the number of docs) as part of the syntax. Or the
processor could be given a separate command line argument. This could
include "all documents matching a YES schema".

>> 7.      The nodes in an XML document have a clear hierarchical relationship
>> between them. For example, one element can contain another or the
>> reverse, but it is impossible for two elements to contain each other
>> at the same time. In contrast, it is possible for nodes to contain
>> themselves through the use of anchors and aliases. This makes it easy
>> to map “child” or “descendant” relationships, but less easy to find
>> “parent” or “ancestor” relationships. For this reason, YPath does not
>> support these types of axes.
>
> Not sure I fully understand this point. By parent and ancestor
> do you mean relationships via anchors and references?
>

Yes. To be more precise (using "iff" as "if and only if"):

node A is a child of node B iff:

(a) B is a sequence, and A is an element in the sequence B, or:
(b) B is a mapping, and A is a key and/or a value in the mapping B.

node A is a parent of node B iff B is a child of A.

node A is a descendant of node B iff A is a child of B, or there exist
nodes X1, X2, X3... Xn such that:

(a) X1 is a child of B; and,
(b) Xy+1 is a child of Xy for y in {1, 2, ... n-1}; and,
(c) A is a child of Xn.

node A is an ancestor of node B iff node B is a descendant of node A.

Nodes - sequences and mappings - need to record what other nodes they
contain, so finding children and descendants should be easy to do;
just descend through the tree. However, YAML implementations may not
record the reverse relationship of ancestors and parents.

It could be supported, but I would prefer to make finding ancestors
and parents an optional part of the spec, in order to reduce overhead.
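
The definitions above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Node` class and the function names are hypothetical and are not taken from any YAML library. It shows why the child direction is cheap (each node already holds its content) while the parent direction needs an extra index built by walking the whole graph.

```python
class Node:
    """Hypothetical stand-in for a representation-graph node."""

    def __init__(self, tag, value):
        self.tag = tag      # "!!seq", "!!map" or a scalar tag such as "!!str"
        self.value = value  # list of Nodes, list of (key, value) pairs, or a scalar


def children(node):
    """Elements of a sequence, or the keys and values of a mapping."""
    if node.tag == "!!seq":
        return list(node.value)
    if node.tag == "!!map":
        return [n for pair in node.value for n in pair]
    return []


def descendants(node):
    """Nodes reachable in one or more child steps, each listed once."""
    seen, result = set(), []
    stack = list(reversed(children(node)))
    while stack:
        current = stack.pop()
        if id(current) in seen:  # aliases can create cycles
            continue
        seen.add(id(current))
        result.append(current)
        stack.extend(reversed(children(current)))
    return result


def parent_index(root):
    """Map id(node) to its list of parents; costs one full walk from root."""
    nodes = {id(root): root}
    for n in descendants(root):
        nodes[id(n)] = n
    parents = {}
    for node in nodes.values():
        for child in children(node):
            entry = parents.setdefault(id(child), [])
            if node not in entry:
                entry.append(node)
    return parents


leaf = Node("!!str", "x")
inner = Node("!!seq", [leaf])
root = Node("!!map", [(Node("!!str", "k"), inner)])
inner.value.append(root)  # an alias back to root: root now contains itself

print(len(descendants(root)))                # 4 (key, inner, leaf, root)
print(parent_index(root)[id(leaf)] == [inner])  # True
```

Note that with the alias in place, `root` is its own descendant and also has a parent, which is exactly the situation that cannot occur in XML.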

>> 1.4     Implementation
>>
>> A limited implementation can be found here: http://pyyaml.org/browser/trunk/TestingSuite/ypath.yml?rev=71
>
> What about a section for set operations? Will YPath support union,
> intersection, etc in the notation? Or will that have to be handled by
> the underlying programming language?
>

Oh, yes! Union, Intersection, and other set operations will be supported.
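
One detail such operations would have to settle is whether two nodes count as "the same" by identity or by content. The sketch below assumes identity (an assumption, not something decided in this thread) and uses plain Python lists as stand-ins for nodes; the function names are hypothetical.

```python
def union(a, b):
    """Nodes in a or b, compared by identity, keeping first-seen order."""
    seen, out = set(), []
    for node in list(a) + list(b):
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


def intersection(a, b):
    """Nodes of a that are also (by identity) in b, each listed once."""
    ids_b = {id(node) for node in b}
    seen, out = set(), []
    for node in a:
        if id(node) in ids_b and id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


x = ["a"]  # two stand-in nodes with equal content...
y = ["a"]  # ...but distinct identity

print(len(union([x], [y])))         # 2: equal content, different nodes
print(len(union([x], [x])))         # 1: the same node twice
print(len(intersection([x], [y])))  # 0
```

Comparing by content instead would merge `x` and `y`, so the choice affects results whenever a document repeats a value without using an alias.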

> Also, things like first-child, last-child, etc. I wonder how these
> might be handled.
>

First-child and last-child would only be valid for sequences; mappings
are unordered. So I'd use indices of 0 and -1 respectively.
Provisional YPath syntax: [0], [-1] - subject to change!

As for sibling relationships (another thing I was leery of borrowing
from XPath), I'd have:

A is a next sibling of B iff there exists a sequence C and an integer
n such that:

C[n] = B and C[n+1] = A

A is a previous sibling of B iff B is a next sibling of A.

But I'd prefer to have this as an optional part of the spec.
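
A minimal sketch of the sibling definition, using Python lists as sequences and dicts as stand-in nodes (all names here are hypothetical). It also illustrates why the relationship is awkward: a node shared through an alias can sit in several sequences, so it may have more than one next sibling.

```python
def next_siblings(node, sequences):
    """Every A such that C[n] is node and C[n+1] is A for some C in sequences."""
    found = []
    for seq in sequences:
        for n in range(len(seq) - 1):
            if seq[n] is node:
                found.append(seq[n + 1])
    return found


def previous_siblings(node, sequences):
    """Every B such that node is a next sibling of B."""
    found = []
    for seq in sequences:
        for n in range(1, len(seq)):
            if seq[n] is node:
                found.append(seq[n - 1])
    return found


shared = {"name": "shared"}  # imagine this node is anchored and aliased
a = {"name": "a"}
b = {"name": "b"}
seq1 = [shared, a]
seq2 = [shared, b]

print(seq1[0] is shared, seq1[-1] is a)                       # True True
print([n["name"] for n in next_siblings(shared, [seq1, seq2])])  # ['a', 'b']
print([n["name"] for n in previous_siblings(a, [seq1, seq2])])   # ['shared']
```

The first line of output corresponds to the provisional [0] and [-1] indices; the search over all sequences is the part that needs information an implementation may not keep.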

> I know syntax isn't the main focus here, but it's hard to stay away
> from when we get into the nitty gritty of these things. One of those
> that I've been thinking about are tag references using prefixed `!`,
> e.g. //!!str for all strings in a document (assuming `//` means what
> it does in XPath).
>
Not a bad idea. I'll add it.
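
To make the idea concrete, here is a rough sketch of what evaluating //!!str could amount to, assuming `//` means "the root and all its descendants" as in XPath. The `Node` class and `select_by_tag` are hypothetical names for illustration, not an existing API, and the syntax itself is still provisional.

```python
class Node:
    """Hypothetical stand-in for a representation-graph node."""

    def __init__(self, tag, value):
        self.tag = tag      # "!!seq", "!!map" or a scalar tag such as "!!str"
        self.value = value  # list of Nodes, list of (key, value) pairs, or a scalar


def select_by_tag(root, tag):
    """Return root and every descendant whose tag equals `tag`, each once."""
    matches, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:  # guard against alias cycles
            continue
        seen.add(id(node))
        if node.tag == tag:
            matches.append(node)
        if node.tag == "!!seq":
            kids = list(node.value)
        elif node.tag == "!!map":
            kids = [n for pair in node.value for n in pair]  # keys and values
        else:
            kids = []
        stack.extend(reversed(kids))
    return matches


doc = Node("!!map", [
    (Node("!!str", "name"), Node("!!str", "YPath")),
    (Node("!!str", "version"), Node("!!int", 1)),
    # a composite key: a sequence used as a mapping key
    (Node("!!seq", [Node("!!str", "a"), Node("!!str", "b")]),
     Node("!!str", "composite key value")),
])

print([n.value for n in select_by_tag(doc, "!!str")])
# ['name', 'YPath', 'version', 'a', 'b', 'composite key value']
```

Because keys are visited like any other child, the strings inside the composite key are found too, which matches the first requirement discussed at the top of this message.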

> Super great start on this, btw. Looks very promising. I'd recommend
> setting up some type of dev project for it where a reference design
> can be developed.
>

That would be lovely. Keep the ideas coming!

Best regards,
Peter

-- 
Email: pet...@gm...
WWW: http://www.pkmurphy.com.au/