Thread: RE: [Yaml-core] Thinking about YPATH

Rolf Veen has asked why use '/' for separating path segments. Good question.
I did it "for historical reasons" - people are just used to it for "paths".
At any rate, the choice of characters is less important than the basic
approach.

Clark C . Evans wrote:
> Hmm.  I think we could use [] as the existence test (like XPATH).

I think RegExps already have a syntax for that. One reason I think we should
base YPATH on regexps - just tailoring the matching operators to graph paths
- is that many issues were already hashed to death in regexps. The existence
test is one. I don't have the Perl book handy at home; I suspect that
looking at the regexp section there would be very educational.

> I was thinking that /xxx would only match on keys and
> not on values.  If you want to match on values, I think
> using the exists operator would be ideal.

There is a difference between matching the value and matching the key that
has that value...

> # /two matches "world"
> ---
> one: hello
> two: world 
> ...

If so, what matches "two"? I agree there's an ambiguity... perhaps we should
write something like

/=12
matches
--- 12

To disambiguate leaf matching from key matching. Hmmm.

> | /transfer(int)
> 
> Close.  This would be /[transfer("int")]  since transfers
> belongs in the predicate.

The way I see it, the whole thing is a predicate... I don't see why I need
to make the distinction. In "/key", "key" is a predicate which happens to
match any !str-typed node with the value "key". See below...

> | Question#2: Does
> | 
> | /transfer(ip)
> | match
> | --- 1.2.3.4
> 
> I think so.  If the engine/parser doesn't have the "ip"
> method registered, then transfer("ip") will be an error.

That's not that nice... I may still want to be able to match on nodes with
explicit "!ip" transfer, even if the engine doesn't recognize it. Perhaps we
need two "transfer()" operators, one for explicit and one for implicit? This
is getting sticky, fast...

> No wildcard transfer methods. 

I suppose it makes some sense, except for one possible very useful use case
- suppose I want to match any HTML element, it would be very nice to be able
to write:

/transfer("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd^")

Matching any transfer by prefix. Again, that's pretty straight-forward for
explicit transfers, but is hell for implicit ones. The swamp is getting
deeper...
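
A minimal sketch of the easy half of this, prefix matching on explicit
transfers only. The Node class and match_transfer_prefix() are hypothetical
names made up for illustration, not part of any YAML or YPATH API, and
implicit transfers are deliberately left out since resolving them depends
on what the engine recognizes:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: a value plus an optional explicit transfer."""
    value: str
    explicit_transfer: Optional[str] = None  # None means the type is implicit


def match_transfer_prefix(nodes: List[Node], prefix: str) -> List[Node]:
    """Keep the nodes whose explicit transfer starts with the given prefix.

    Nodes with only an implicit transfer never match here; handling them
    would require the engine's own type resolution.
    """
    return [
        node for node in nodes
        if node.explicit_transfer is not None
        and node.explicit_transfer.startswith(prefix)
    ]
```

A node such as "--- 1.2.3.4" with no explicit "!ip" would be skipped by
this sketch, which is exactly the sticky case.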

> | Question #4: Given we have series/sequences, should we add regexp
> | operators >int, >=int, <int, <=int, !=int?
> 
> Yes, but in the predicate []

Predicates again... See below.

> | Question #5: Is this set ordered? Think of it... it is trickier
> | than it seems at first glance.
> 
> I was thinking the result of a YPath expression is an
> unordered stack of nodes, that I'll call a context.
> 
> As for ordering... I think we can add some
> sort of "ORDER BY".  Having the nodes ordered implicitly
> causes a lot of work by the xpath processor that is often
> unnecessary in XML (and probably impossible in our case).

This seems like an area we'll be spending quite a bit of time hammering out
the details of. Shudder.

> | Question #6: How about prefixes? Do we allow this:
> | 
> | /transfer("http://company.tld/whatever^type")/transfer("^another-type")
> 
> This could work.

I hope so. Imagine the pain matching HTML elements otherwise. In fact, even
*with* it.

> | So, sticking with absolute paths, it seems as though parent() is well
> | defined.

IMPORTANT: In this context parent() is well defined for general graphs!!!

> XPATH has the notion of "axis", thus each path segment
> is (axis '::')? node-expr -- I'm not sure if this fits us.

Neither do I... the syntax is interesting however. Food for thought.

> Is it possible to use ".." for parent? and "." for
> current node?  I know both of these are regex... however.

Right. Also, they are hardly enough.

> | In a sequence, we need to specify next()/prev() entry in a series:
> | ...
> | Likewise we can define 'before()' and 'after()'.
>
> Hmm.  Unlike XML our tree is not ordered (although it
> can be ordered by the keys).  This requires some thought.

Sequences *are* ordered even in the graph model!

> | Question #7: Using any of these function() operators (except for
> | transfer()) means that the path is no longer "simple". This has
> | implications in terms of how easily it can be implemented, whether it
> | can be used in a streaming application, etc. I think we should formally
> | define random-access vs. streaming paths.
> 
> Right.

Yes. Sigh. Another complication.

> | Question #8: There are many functions which can be cast in the form
> | <direction>(<distance>)...
> 
> Hmm.

Couldn't phrase it better myself :-)

> | Question #9: Do we do it the Perl way ("there's more than one way")
> | or the Python way ("there's the right way")...
> 
> I think ".." is fine, "..." and "...." tip the scale.

I guess we'll have to see how it all fits together before we get into this
sort of detail.

> | Question #10: obviously the above provides a natural syntax for them.
[Relative paths - not starting with '/']
> | However, in a relative path, going up above the starting point isn't
> | well-defined in the graph model.

This is the only case where working in the graph model gives you trouble,
and only in the limited case where you go up "above" your starting point.

> | So... how do we handle this? Define two
> | classes of relative paths, one which are safe in a graph model and
> | ones that aren't?
>
> The result of a YPath evaluation is an unordered set of contexts
> (a context is a stack of nodes, aka path).
> 
> So, to evaluate a "relative" path, one must pass in a context,
> not just a node.  Note that any node is a context with itself
> as the "top" node.

Hmmm. So this way you could, "logically" anyway, evaluate a YPATH piece by
piece - each step giving the "context" of matching nodes to the next step.

I see potential difficulties with this, mainly that it would raise the
expectation that this "context" would be a manipulable piece of any
implementation (e.g., being able to give a context as an argument in the
YPATH API). Did this happen with XPATH? Is it truly necessary?
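
The piece-by-piece evaluation described above can be sketched roughly as
follows. Everything here is an assumption made for illustration: the graph
encoding, the names child(), parent() and evaluate(), and the choice to
represent a context as a tuple of node ids with the "top" node last. It is
not a proposed API:

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical encoding: node id -> {key: child node id}.
Graph = Dict[str, Dict[str, str]]
# A context is a stack of nodes (a path); the last entry is the "top" node.
Context = Tuple[str, ...]
Step = Callable[[Graph, Context], Set[Context]]


def child(key: str) -> Step:
    """Step that descends from the top node through the given key."""
    def step(graph: Graph, ctx: Context) -> Set[Context]:
        target = graph.get(ctx[-1], {}).get(key)
        return {ctx + (target,)} if target is not None else set()
    return step


def parent() -> Step:
    """Step that pops the top node; undefined (no result) at the stack bottom."""
    def step(graph: Graph, ctx: Context) -> Set[Context]:
        return {ctx[:-1]} if len(ctx) > 1 else set()
    return step


def evaluate(graph: Graph, start: Context, steps: Iterable[Step]) -> Set[Context]:
    """Apply each step to every context produced by the previous step."""
    contexts: Set[Context] = {start}
    for step in steps:
        contexts = {new for ctx in contexts for new in step(graph, ctx)}
    return contexts


# Node "x" is reachable under two keys, so it has two possible parents paths.
graph: Graph = {"root": {"a": "x", "b": "x"}, "x": {"name": "leaf"}}
result = evaluate(graph, ("root",), [child("a"), child("name")])
```

Because parent() only pops the stack, it stays well defined in a graph as
long as the caller passed in a full context; starting from the bare node
("x",) and going up yields nothing, which is the "above the starting point"
case.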

>| Question #11: How do we handle !include?
> 
> Good question.  #include is non-trivial (see XML Include)
> 
> | Hmmm. YPATH isn't that simple after all, it seems.
> Neither is !include :-)
> 
> Nope. YPATH is a 6 month project.  And it's best to define it
> as we implement it so that we can always play with it...

Seems that way. We do need to agree on the basics. We seem to have two
alternative approaches as to the "infrastructure" used.

You focus on the incremental nature of YPATH - working segment by segment,
using context as the glue between consecutive steps, and employing patterns
as a slave mechanism applied separately at each step.

I wanted to tackle the whole monster in one fell swoop by saying "YPATH is a
kind of regexp". In this view '/' is just a regexp operator, and
*everything* is a pattern. To formalize a bit: each <graph, starting node>
pair defines an (enumerable) set of paths (walks through the graph) starting
at that point. The YPATH regexp pattern is a boolean function which says
"match" or "doesn't match" to each of these paths. Each such path ends at
some point ("existence" is trivially handled by going down-then-up, etc.).
Thus the function defines a set of nodes - those that are at the end of
paths that "match". This is the result of applying the YPATH to the graph
(and starting point).
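
A rough sketch of this formalization, under loudly stated assumptions: the
graph encoding and the names walks() and ypath_match() are invented here,
each walk is flattened to a string of "/key" segments so that Python's
ordinary re module can play the role of the boolean function, and walks are
cut off at max_steps because a graph with cycles has infinitely many:

```python
import re
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical encoding: node id -> list of (key, child node id) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def walks(graph: Graph, start: str,
          max_steps: int) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Enumerate (keys followed, end node) for every walk of bounded length."""
    stack: List[Tuple[Tuple[str, ...], str]] = [((), start)]
    while stack:
        keys, node = stack.pop()
        yield keys, node
        if len(keys) < max_steps:
            for key, child in graph.get(node, []):
                stack.append((keys + (key,), child))


def ypath_match(graph: Graph, start: str, pattern: str,
                max_steps: int = 8) -> Set[str]:
    """Return the end nodes of all walks whose "/key/key" string matches."""
    regex = re.compile(pattern)
    return {
        node
        for keys, node in walks(graph, start, max_steps)
        if regex.fullmatch("".join("/" + key for key in keys))
    }


# A small graph with a cycle (n3 points back at root).
graph: Graph = {
    "root": [("one", "n1"), ("two", "n2"), ("sub", "n3")],
    "n3": [("two", "n4"), ("back", "root")],
}
direct = ypath_match(graph, "root", r"/two")
anywhere = ypath_match(graph, "root", r"(/[^/]+)*/two")
```

Here direct holds only the node under the top-level "two" key, while
anywhere also picks up the "two" under "sub". The sketch matches on keys
only; matching on values or transfers would need a richer alphabet than
plain key strings.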

Now, regexps are a thoroughly investigated field... both in theory and
practice (e.g., there's a well-known syntax for specifying regexps we can
build on instead of inventing new ways to do existence tests, ORs, ANDs,
partial matches; there are excellent libraries implementing various types of
regular expressions which we may be able to build on; and so on).

I'm well aware that XPATH does it your way :-) and I'll admit to being
somewhat naive about the implications of using regexps vs. "context
refinements" as the basic mechanism. I suspect the two are equivalent at
some level - though the syntax would probably end up being different...

I think the issue deserves some discussion, before we delve into the
nitty-gritty of whether we say /.../ or /down(>0)/ or whatever.

Thoughts?

Have fun,

    Oren Ben-Kiki
