From: Oren Ben-K. <or...@ri...> - 2002-03-08 08:27:42
Assuming we've settled on YAC#22... I could update the spec this weekend.

As for YPATH, I want to suggest the skeleton of an approach. It has some rough edges...

The basic approach is, of course, a '/' separated path. A single '/' specifies the root document. I'll ignore paths without a leading '/' for now - we need to consider them in the light of both !include and the need for relative path names.

At any rate... Let's consider each YPATH to be a regexp pattern. The '/' would be an operator which says "look for a child node matching the rest of the regexp". This makes it easy to write things like

    /pat1(/pat2|/pat3/pat4)/pat5

Etc. Easy cases:

    /value matches
        --- value

    /12 matches
        --- 12

However, we need to allow regular expressions. So in YPATH there are more characters which need to be quoted: /, *, etc. So:

    /val.*e matches
        --- value

    /"/value" matches
        --- /value

OK, that's easy enough. Now let's get trickier. There's the transfer method issue. YPATH should allow specifying just the transfer type:

    /!int matches
        --- !int 12

Question #1: Oops, '!' is already used by regexp. Should we bite the bullet and use

    /transfer(int)

instead? This has the advantage that it makes quoting transfers easier, as in

    /transfer("http://...")

As opposed to:

    /!http:\/\/...

I'll assume function-style syntax for all operations for now... we can see later whether we can use special markers (I kind of doubt that).

Question #2: Does

    /transfer(ip) match
        --- 1.2.3.4

Remember, your YPATH engine may not know about the 'ip' implicit type.

Question #3: Does

    /transfer(http:) match
        --- !int 12

That is, are wild cards allowed in the transfer pattern? This makes a lot of trouble (depending on the answer to question #2, a pattern in the transfer method may be extremely hard to match against).

Question #4: Given we have series/sequences, should we add regexp operators >int, >=int, <int, <=int, !=int?
    />1 matches
        ---
        - not this
        - and not this
        - but this
        - and this

That seems to be it as long as we are matching against a top-level leaf value... correct?

OK, branches. First, going down the hierarchy is a matter of specifying patterns matching the keys:

    /a/2/3 matches
        ---
        a:
          - no match
          - no match
          - 3: pattern matches this value

The result of a match becomes a set of nodes:

    /*/1/* matches
        ---
        a:
          1: this value
        b:
          1: and this value

Question #5: Is this set ordered? Think of it... it is trickier than it seems at first glance.

Question #6: How about prefixes? Do we allow this:

    /transfer("http://company.tld/whatever^type")/transfer("^another-type")

I think the above covers all "simple", absolute paths. The tricky parts start when we deal with relative paths. Any other issues you can see?

Note that specifying an absolute path establishes a parent even in a graph model:

    /a/b for:
        ---
        a: &B b
        c: *B

So, sticking with absolute paths, it seems as though parent() is well defined. ancestor() and descendant() are also easy; "child()" is merely '*' in a different guise:

    /*/b/parent() matches
        ---
        this: b
        and this: b
        not this: c

Going up isn't enough, we also need to go "sideways". Moving sideways to a different key is easy:

    /*/b/parent()/c matches:
        ---
        a: b
        c: this key

In a sequence, we need to specify the next()/prev() entry in a series:

    /*/b/parent()/prev() matches
        ---
        - this
        - b

Likewise we can define 'before()' and 'after()'.

Question #7: Using any of these function() operators (except for transfer()) means that the path is no longer "simple". This has implications in terms of how easily it can be implemented, whether it can be used in a streaming application, etc. I think we should formally define random-access vs. streaming paths.

Question #8: There are many functions which can be cast in the form <direction>(<distance>), such as:

    up(1)   - parent()
    up(>0)  - ancestor()
    up(>=0) - ancestor-or-self()

Etc. We should probably offer the general form...
does using '>', '>=', '<', '<=' make sense to you in this context? Does this mean that:

    next(>3&<10)

should be allowed?

    next(>3)&next(<10)

would be allowed anyway, I guess, but is less clear I think.

Question #9: Do we do it the Perl way ("there's more than one way") or the Python way ("there's the right way")? Example: do we allow /../ as a shorthand for /up(1)/? How about /.../ as a shorthand for /down(>0)/, or /..../ as a shorthand for /up(>0)/? Remember * is already a shorthand for down(1)...

Any other issues with absolute paths?

About relative paths...

Question #10: Obviously the above provides a natural syntax for them. However, in a relative path, going up above the starting point isn't well-defined in the graph model. So... how do we handle this? Define two classes of relative paths, ones which are safe in a graph model and ones that aren't?

Question #11: How do we handle !include? Brian suggested that we just take relative paths and apply them to the "sequence of documents", as in:

    !include some-url#>2&<5

This makes intuitive sense. The wording would be tricky. Also, what happens if I include:

    !include some-url#/*/b

Do I get a sequence? That ties to the question of order above... Does it still make sense to define both !include-seq and !include-leaf? Is there sense in defining !include-map?

Hmmm. YPATH isn't that simple after all, it seems. Neither is !include :-)

Have fun,

    Oren Ben-Kiki
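
To make the "simple" absolute paths above a bit more concrete, here is a minimal Python sketch of one possible reading. It is not part of any spec. The function names (children, match_simple_path) are hypothetical, the document is assumed to be already loaded into plain dicts and lists, and several choices are assumptions made only for illustration: a bare '*' segment is treated as "any child", sequence entries are addressed by 0-based index, and the result is returned in document order (which quietly picks one answer to Question #5). Quoting, grouping such as (/pat2|/pat3), transfer(), the comparison operators, function-style steps, and matching a pattern against a leaf's own value are all left out.

```python
import re


def children(node):
    """Yield (label, child) pairs for a map or sequence node.

    Labels are map keys or 0-based sequence indices, rendered as strings
    so that a regexp can be matched against them.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield str(key), value
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield str(index), value
    # Scalars have no children, so nothing is yielded for them.


def match_simple_path(root, path):
    """Return the list of nodes selected by a '/'-separated path of regexps.

    Assumption: each segment is a regexp that must match a whole key or
    index; a bare '*' stands for "any child". Splitting on '/' is naive
    and does not honour quoting or parenthesised alternatives.
    """
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    segments = [s for s in path.split("/")[1:] if s]
    nodes = [root]  # a single '/' selects the root document
    for segment in segments:
        pattern = re.compile(".*" if segment == "*" else segment)
        nodes = [
            child
            for node in nodes
            for label, child in children(node)
            if pattern.fullmatch(label)
        ]
    return nodes


if __name__ == "__main__":
    doc = {"a": ["no match", "no match", {3: "pattern matches this value"}]}
    print(match_simple_path(doc, "/a/2/3"))

    doc = {"a": {1: "this value"}, "b": {1: "and this value"}}
    print(match_simple_path(doc, "/*/1"))
```

Note that every step works on a whole set of nodes, which is why the ordering question comes up as soon as a pattern such as '*' can match more than one child.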