[Yaml-core] Thinking about YPATH

Assuming we've settled on YAC#22... I could update the spec this weekend.

As for YPATH, I want to suggest the skeleton of an approach. It has some
rough edges...

The basic approach is, of course, a '/' separated path. A single '/'
specifies the root document. I'll ignore paths without a leading '/' for now
- we need to consider them in the light of both !include and the need for
relative path names. At any rate...

Let's consider each YPATH to be a regexp pattern. The '/' would be an
operator which says "look for a child node matching the rest of the regexp".
This makes it easy to write things like

/pat1(/pat2|/pat3/pat4)/pat5

Etc.

Easy cases:

/value
matches
--- value

/12 matches
--- 12

However, we need to allow regular expressions. So in YPATH there are more
characters which need to be quoted: /, *, etc. So:

/val.*e
matches
--- value

/"/value"
matches
--- /value
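
To make the quoting rule concrete, here is a minimal Python sketch of how a
path might be split into segments. It assumes that double quotes protect a
'/' (as in the example above) and it ignores backslash escapes and
parenthesized groups entirely. The name split_ypath is made up for
illustration; nothing here is settled syntax.

```python
def split_ypath(path):
    """Split an absolute path on '/' separators, leaving quoted text intact.

    Assumption: only double quotes protect a '/'; backslash escapes and
    regexp groups such as (/a|/b) are not handled in this sketch.
    """
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    if path == "/":
        return []  # the root document: no segments to descend through
    segments, current, in_quotes = [], [], False
    for ch in path[1:]:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "/" and not in_quotes:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quote in path")
    segments.append("".join(current))
    return segments
```

With this, split_ypath('/"/value"') yields the single segment '"/value"',
while split_ypath('/a/2/3') yields 'a', '2' and '3'.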

OK, that's easy enough. Now let's get trickier. There's the transfer method
issue. YPATH should allow specifying just the transfer type:

/!int
matches
--- !int 12

Question #1:

Oops, '!' is already used by regexp. Should we bite the bullet and use

/transfer(int)

instead? This has the advantage it makes quoting transfers easier, as in

/transfer("http://...")

As opposed to:

/!http:\/\/...

I'll assume function-style syntax for all operations for now... we can see
later whether we can use special markers (I kind of doubt that).

Question #2: Does

/transfer(ip)
match
--- 1.2.3.4

Remember, your YPATH engine may not know about the 'ip' implicit type.

Question #3: Does

/transfer(http:)
match
--- !int 12

That is, are wild cards allowed in the transfer pattern? This makes a lot of
trouble (depending on the answer to question #2, a pattern in the transfer
method may be extremely hard to match against).

Question #4: Given we have series/sequences, should we add regexp operators
>int, >=int, <int, <=int, !=int?

/>1
matches
---
- not this
- and not this
- but this
- and this
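
As a rough illustration of what such operators could mean, here is a small
Python sketch that selects sequence entries by comparing their index. It
assumes zero-based indices (which is what the example above implies) and
that the segment is a single comparison; index_predicate and select_entries
are hypothetical names.

```python
import operator
import re

# Comparison operators proposed for sequence indices.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "!=": operator.ne,
}


def index_predicate(segment):
    """Turn a segment such as '>1' into a test on a sequence index."""
    m = re.fullmatch(r"(>=|<=|!=|>|<)(\d+)", segment)
    if m is None:
        raise ValueError(f"not an index comparison: {segment!r}")
    compare, bound = _OPS[m.group(1)], int(m.group(2))
    return lambda index: compare(index, bound)


def select_entries(sequence, segment):
    """Return the entries whose zero-based index passes the comparison."""
    keep = index_predicate(segment)
    return [item for index, item in enumerate(sequence) if keep(index)]
```

Applied to the four-entry sequence above, select_entries(seq, ">1") keeps
the last two entries.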

That seems to be it as long as we are matching against a top-level leaf
value... correct?

OK, branches. First, going down the hierarchy is a matter of specifying
patterns matching the keys:

/a/2/3
matches
---
a:
 - no match
 - no match
 -
   3: pattern matches this value

The result of a match becomes a set of nodes:

/*/1/*
matches
---
a:
  1: this value
b:
  1: and this value
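
Here is a minimal Python sketch of that descent, with documents modelled as
plain dicts and lists. It assumes each segment is a regexp that must match
the whole key (or the sequence index, written as a string), and that a bare
'*' means "any key". The names children and select are made up, and this
sketch returns the values found under the last matched key.

```python
import re


def children(node):
    """List (key, child) pairs; keys and indices are compared as strings."""
    if isinstance(node, dict):
        return [(str(key), value) for key, value in node.items()]
    if isinstance(node, list):
        return [(str(index), value) for index, value in enumerate(node)]
    return []  # leaf values have no children


def select(root, segments):
    """Descend one level per segment and return every node reached."""
    current = [root]
    for segment in segments:
        # Assumption: a bare '*' is a wildcard rather than a regexp.
        pattern = ".*" if segment == "*" else segment
        current = [
            child
            for node in current
            for key, child in children(node)
            if re.fullmatch(pattern, key)
        ]
    return current
```

For the first document, select(doc, ["a", "2", "3"]) returns the single
matching value. For the second, select(doc, ["*", "1"]) returns both values.
The result is a list only because that is convenient in Python.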

Question #5: Is this set ordered? Think of it... it is trickier than it seems
at first glance.

Question #6: How about prefixes? Do we allow this:

/transfer("http://company.tld/whatever^type")/transfer("^another-type")

I think the above covers all "simple", absolute paths. The tricky parts
start when we deal with relative paths. Any other issues you can see?

Note that specifying an absolute path establishes a parent even in a graph
model:

/a/b
for:
---
  a: &B b
  c: *B
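
One way to picture this, as a Python sketch: a match is not just the node
but the trail of steps taken from the root to reach it. The aliased node is
then one object with two different trails, and each trail has an obvious
parent. Scalar, walk and parent are hypothetical names, and the sketch
follows exact keys only (no patterns).

```python
class Scalar:
    """A leaf node with identity, so an alias can point at the same object."""

    def __init__(self, value):
        self.value = value


def walk(root, keys):
    """Follow exact keys from the root; return the (key, node) steps taken."""
    trail = [(None, root)]
    for key in keys:
        _, node = trail[-1]
        trail.append((key, node[key]))
    return trail


def parent(trail):
    """The parent is simply the trail with its last step removed."""
    if len(trail) < 2:
        raise ValueError("the root has no parent")
    return trail[:-1]


shared = Scalar("b")              # stands in for the anchored node &B
doc = {"a": shared, "c": shared}  # 'c: *B' aliases the very same node

via_a = walk(doc, ["a"])
via_c = walk(doc, ["c"])
```

Here via_a and via_c end at the same object, but they record different keys,
so the path taken tells us which occurrence was meant.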

So, sticking with absolute paths, it seems as though parent() is well
defined. ancestor(), descendant() are also easy; "child()" is merely '*' in
a different guise:

/*/b/parent()
matches
---
this: b
and this: b
not this: c

Going up isn't enough, we also need to go "sideways". Moving sideways to a
different key is easy:

/*/b/parent()/c
matches:
---
a: b
c: this key

In a sequence, we need to specify next()/prev() entry in a series:

/*/b/parent()/prev()
matches
---
- this
- b

Likewise we can define 'before()' and 'after()'.

Question #7: Using any of these function() operators (except for transfer())
means that the path is no longer "simple". This has implications in terms of
how easily it can be implemented, whether it can be used in a streaming
application, etc. I think we should formally define random-access vs.
streaming paths.
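
As a starting point for such a definition, here is a Python sketch that
classifies a path by the functions it uses. It assumes the function names
mentioned in this message are the complete set, and that text inside double
quotes never counts as a call. FUNCTIONS, functions_used and is_simple are
made-up names, and a real definition would need more than a name check.

```python
import re

# Assumption: the function names floated so far are the whole vocabulary.
FUNCTIONS = (
    "transfer", "parent", "ancestor", "descendant", "child",
    "next", "prev", "before", "after", "up", "down",
)
_CALL = re.compile(r"\b(" + "|".join(FUNCTIONS) + r")\(")


def functions_used(path):
    """Return the set of known function names called in the path."""
    # Blank out quoted strings so that names inside them are ignored.
    unquoted = re.sub(r'"[^"]*"', '""', path)
    return set(_CALL.findall(unquoted))


def is_simple(path):
    """A path is 'simple' here if transfer() is the only function it uses."""
    return functions_used(path) <= {"transfer"}
```

Under these assumptions /a/2/3 and /transfer("http://...") count as simple,
while /*/b/parent()/prev() does not.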

Question #8: There are many functions which can be cast in the form
<direction>(<distance>), such as:

up(1) - parent()
up(>0) - ancestor()
up(>=0) - ancestor-or-self

Etc. We should probably offer the general form... does using '>' '>=' '<'
'<=' make sense to you in this context? Does this mean that: next(>3&<10)
should be allowed? next(>3)&next(<10) would be allowed anyway, I guess, but
is less clear I think.
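
To see how the general form might behave, here is a Python sketch of a
distance argument that allows '&' between comparisons, applied to up(). It
assumes a bare number means "exactly that distance", that distance 0 is the
node itself, and that the current position is given as the list of nodes
from the root down. distance_predicate and up are illustrative names only.

```python
import operator
import re

_OPS = {
    "": operator.eq,   # a bare number means an exact distance
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "!=": operator.ne,
}


def distance_predicate(spec):
    """Parse a spec such as '1', '>0' or '>3&<10' into a test on distances."""
    tests = []
    for part in spec.split("&"):
        m = re.fullmatch(r"(>=|<=|!=|>|<|)(\d+)", part.strip())
        if m is None:
            raise ValueError(f"bad distance: {part!r}")
        tests.append((_OPS[m.group(1)], int(m.group(2))))
    return lambda distance: all(op(distance, n) for op, n in tests)


def up(trail, spec):
    """Select nodes above the current one; trail runs from root to current."""
    accept = distance_predicate(spec)
    # Distance 0 is the current node, 1 its parent, and so on upward.
    return [node for d, node in enumerate(reversed(trail)) if accept(d)]
```

With a trail of root, a, b, c, the call up(trail, "1") gives b alone and
up(trail, ">0") gives b, a and root, nearest first. The combined form
">3&<10" accepts distances 4 through 9.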

Question #9: Do we do it the Perl way ("there's more than one way") or the
Python way ("there's the right way"). Example: do we allow /../ as a
shorthand for /up(1)/? How about /.../ as a shorthand for /down(>0)/, or
/..../ as a shorthand for /up(>0)/? Remember * is already a shorthand for
down(1)...

Any other issues with absolute paths?

About relative paths...

Question #10: obviously the above provides a natural syntax for them.
However, in a relative path, going up above the starting point isn't
well-defined in the graph model. So... how do we handle this? Define two
classes of relative paths, ones which are safe in a graph model and ones that
aren't?

Question #11: How do we handle !include? Brian suggested that we just take
relative paths and apply them to the "sequence of documents", as in:

!include some-url#>2&<5

This makes intuitive sense. The wording would be tricky. Also, what happens
if I include:

!include some-url#/*/b

Do I get a sequence? That ties to the question of order above... Does it
still make sense to define both !include-seq and !include-leaf? Is there
sense in defining !include-map?

Hmmm. YPATH isn't that simple after all, it seems. Neither is !include :-)

Have fun,

    Oren Ben-Kiki