On Fri, Mar 08, 2002 at 01:53:47PM -0500, Oren Ben-Kiki wrote:
| Rolf Veen has asked why use '/' for separating path segments. Good question.
| I did it "for historical reasons" - people are just used to it for "paths".
| At any rate, the choice of characters is less important than the basic
Also, from a regex perspective, the slash '/' is traditionally
used as a top level separator, s/this/that/g
| > Hmm. I think we could use [] as the existence test (like XPATH).
| I think RegExps already have a syntax for that. One reason I think we should
| base YPATH on regexps - just tailor the matching operators with graph paths
| - is that many issues were already hashed to death in regexps. Existence
| test is one. I don't have the perl book handy at home; I suspect that
| looking at the regexp section there would be very educational.
As much as I'd love to do that, I think that it would
be a big research project. I'd rather have two libraries,
the first is a simple incremental path syntax, and the second
is regular expressions where we can use an existing library like
the wonderful perl compatible regular expression library (pcre).
The other thing is that regular expressions are very poor with
arithmetic operations, and if we want YAML to be used for business
(accounting, bookkeeping, timekeeping) we want to have a full suite
of mathematical operators (preferably using an existing library).
Thus, I'd like to distinguish between a few "sub-parts" of YPATH:
1. The path segment approach, graph oriented navigation,
primarily based on key values.
2. Regular expressions, primarily applied on the *contents*
of a particular leaf node. Most map keys will be fixed by
a schema, thus using regular expressions on them makes little
sense... however, built-in regex support for leaves is cool.
3. Full support for boolean and arithmetic operations.
To this end, I was thinking the path segment approach could
largely borrow from XPath. IMHO, XPath is one of the big
successes of the W3C and we'd do best to emulate it rather
than invent our own. For what it does, XPath is amazing.
Furthermore, many people know it. Now... that said the only
real "big" blemish in XPath is the set-equivalence operator,
which is denoted = and is true when two node sets have one
or more overlapping nodes. This is not intuitive: the operator
is good, but using the '=' indicator is a poor choice at best.
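To make the complaint concrete, here is a minimal Python sketch of how
XPath 1.0 compares two node-sets. Nodes are modeled as plain strings
(their string-values); that modeling and the function names are
assumptions for illustration, not part of any YPATH proposal.

```python
# Sketch of XPath 1.0 node-set comparison semantics.
# Assumption: each node is represented by its string-value.

def nodeset_eq(left, right):
    """'=' is true if ANY node on the left equals ANY node on the right."""
    return any(a == b for a in left for b in right)

def nodeset_ne(left, right):
    """'!=' is true if ANY pair differs; it is not the negation of '='."""
    return any(a != b for a in left for b in right)

a = ["hello", "world"]
b = ["world", "yaml"]

# Both comparisons hold at once for these two sets.
both_true = nodeset_eq(a, b) and nodeset_ne(a, b)
```

With these definitions both "a = b" and "a != b" are true for the same
pair of sets, which is why spelling the operator as '=' surprises people.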
XPath distinguishes between (1) and (3) by using the
predicate notation []. IMHO, this will work wonderfully.
When I get time (in a week or so), I'll map out the
architecture of an XPath processor... and how it could
be modified for YPath.
| > I was thinking that /xxx would only match on keys and
| > not on values. If you want to match on values, I think
| > using the exists operator would be ideal.
| There is a difference between matching the value and matching the key that
| has that value...
Absolutely. Keys will be rather "static" and we don't
need regular expressions for them. Values, on the other
hand, are exactly what pcre is made for! Since they
are leaf values... we can use pcre directly without
| > # /two matches "world"
| > ---
| > one: hello
| > two: world
| > ...
| If so, what matches "two"? I agree there's an ambiguity...
| perhaps we should write something like
I'm not interested in matching "two". Most of the time
I want the value... that said, I think "two" would be
somehow stuffed into the node's _context_. How context
is structured will have to be explored.
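One possible shape for such a context, as a rough Python sketch. The
Context class, its fields, and the child() helper are all hypothetical;
nothing here is settled, it only shows the key travelling alongside the
matched value.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Context:
    """Hypothetical context record: the node plus how we got to it."""
    node: Any
    key: Optional[str] = None            # key under which the node was found
    parent: Optional["Context"] = None   # context of the enclosing map

def child(ctx, key):
    """Select the value under `key`, remembering the key in the new context."""
    if isinstance(ctx.node, dict) and key in ctx.node:
        return Context(node=ctx.node[key], key=key, parent=ctx)
    return None

doc = {"one": "hello", "two": "world"}
hit = child(Context(doc), "two")   # the "/two" step from the example above
```

Here hit.node is the value "world", while hit.key still says "two" for
anyone who needs it.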
| > Close. This would be /[transfer("int")] since transfers
| > belong in the predicate.
| The way I see it, the whole thing is a predicate... I don't see why I need
| to make the distinction. in "/key", "key" is a predicate which happens to
| match any !str-typed node with the value "key". See below...
The path expression returns a node-set. The predicate returns
a boolean value. They are different. You want to separate the
path engine from the predicate engine or it'll be a mess...
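A small Python sketch of that separation, assuming nodes are plain
dicts. The function names and the XPath-like expression in the comment
are illustrative assumptions, not proposed YPATH syntax.

```python
def select(nodes, key):
    """Path engine: map a node-set to a new node-set by following `key`."""
    return [n[key] for n in nodes if isinstance(n, dict) and key in n]

def has_child(key):
    """Predicate engine: build a node -> bool test; it never returns nodes."""
    return lambda n: isinstance(n, dict) and key in n

def filter_nodes(nodes, predicate):
    """Glue: keep only the nodes for which the predicate is true."""
    return [n for n in nodes if predicate(n)]

doc = {"invoice": {"total": 42, "paid": True}, "draft": {"total": 7}}

# Roughly "*[paid]/total" in XPath-like notation (hypothetical for YPATH).
children = list(doc.values())
result = select(filter_nodes(children, has_child("paid")), "total")
```

The predicate only ever answers yes or no; the path step is the only
thing that produces nodes, so the two can be implemented separately.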
| Hmmm. So this way you could, "logically" anyway, evaluate a YPATH piece by
| piece - each step giving the "context" of matching nodes to the next step.
Right. It is recursive descent. You actually need generators
to implement this efficiently when you add the union operator
into the mix.
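For illustration, a minimal generator-based sketch in Python, assuming
nodes are nested dicts. The step/path/union helpers are hypothetical
names, and this union does not remove duplicates.

```python
from itertools import chain

def step(key):
    """Build a step: a function from a stream of nodes to a stream of children."""
    def run(nodes):
        for n in nodes:
            if isinstance(n, dict) and key in n:
                yield n[key]
    return run

def path(*steps):
    """Compose steps; each step consumes the previous step's output lazily."""
    def run(nodes):
        for s in steps:
            nodes = s(nodes)
        return nodes
    return run

def union(*paths):
    """Union operator: chain the results of several sub-paths."""
    def run(nodes):
        nodes = list(nodes)  # each branch needs its own pass over the input
        return chain.from_iterable(p(nodes) for p in paths)
    return run

doc = {"a": {"b": 1, "c": 2}}
expr = path(step("a"), union(step("b"), step("c")))
results = list(expr([doc]))
```

Each step only pulls nodes from the previous one on demand. The union
is the one place this sketch has to buffer its input, since every
branch must see the same context nodes.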
| I see potential difficulties with this, mainly that it would raise the
| expectation that this "context" would be a manipulatable piece of any
| implementation (e.g., being able to give a context as an argument in the
| YPATH API). Did this happen with XPATH? Is it truly necessary?
A context is always necessary; it is usually just an opaque handle.
| You focus on the incremental nature of YPATH - working segment by segment,
| using context as the glue between consecutive steps, and employing patterns
| as a slave mechanism applied separately at each step.
Right. This has a simple and relatively efficient implementation
without requiring a schema.
| I wanted to tackle the whole monster in one fell swoop by saying "YPATH is a
| kind of regexp". In this view '/' is just a regexp operator, and
| *everything* is a pattern. To formalize a bit: each <graph, starting node>
| define an (enumerable) set of paths (walks through the graph) starting at
| that point. The YPATH regexp pattern is a boolean function which says
| "match" or "doesn't match" to each of these paths. Each such path ends at
| some point ("existence" is trivially handled by going down-then-up, etc.).
| Thus the function defines a set of nodes - these that are at the end of
| paths that "match". This is the result of applying the YPATH to the graph
| (and starting point).
I don't think the two views are mutually exclusive. I know
I can implement the second view though.... and I have experience
doing it (with XPath).
| Now, regexps are a thoroughly investigated field... both in theory and
| practice (e.g., there's a well-known syntax for specifying regexps we can
| build on instead of inventing new ways to do existence tests, ORs, ANDs,
| partial matches; there are excellent libraries implementing various types of
| regular expressions which we may be able to build on; and so on).
Right. And I'd rather just use regexps on leaf values and not
make a research project... although research projects are fun.
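As a rough sketch of what "regexps on leaf values" could look like,
here is a Python version. Python's re module stands in for pcre, and
the leaves() and match_leaves() helpers are hypothetical names.

```python
import re

def leaves(node):
    """Yield every scalar leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for v in node.values():
            yield from leaves(v)
    elif isinstance(node, list):
        for v in node:
            yield from leaves(v)
    else:
        yield node

def match_leaves(node, pattern):
    """Return the string leaves whose content matches `pattern`."""
    rx = re.compile(pattern)
    return [v for v in leaves(node) if isinstance(v, str) and rx.search(v)]

doc = {"one": "hello", "two": "world", "n": [1, "wood"]}
hits = match_leaves(doc, r"^wo")
```

The keys are never handed to the regex engine; only leaf contents are,
so an off-the-shelf library does all the pattern work.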
| I'm well aware that XPATH does it your way :-) and I'll admit to being
| somewhat naive about the implications of using regexps vs. "context
| refinements" as the basic mechanism. I suspect the two are equivalent at
| some level - though the syntax would probably end up being different...
Right. Perhaps a different syntax, perhaps not. There *is* a
difference between node trees that are part of the predicate
and those that are not. In the former, you are only testing
for existence; in the latter you are returning a node set.
There are also many other differences between the two which
| I think the issue deserves some discussion, before we delve into the
| nitty-gritty of whether we say /.../ or /down(>0)/ or whatever.
Yep. Unfortunately, I don't have a ton of time. Could we just
focus on the parser for a while? YPATH is going to be hard
to work out without an active implementation...