From: Oren Ben-K. <or...@ri...> - 2002-08-05 07:10:30
|
Clark C . Evans [mailto:cc...@cl...] wrote: > summary: > > > This is a brief introduction to YPATH as inspired by XPATH, the > XML Path node selection language. XPATH has a very rich history > and is, IMHO, one of the better things to emerge from XML land. Nice work, Clark. It will take a bit of work to get the details exactly right but I like the approach of YPATH's output being a list of *paths* (a better name, I think, than "contexts"). It makes things clear and simple all around. here are some first-impression observations: - Why do you need the { key: ~, value: *ROOT } pair? It is the first entry in *every* path, and as such it seems it can be omitted. Further, I want to be able to "apply-path-to-graph(path, graph)". Placing a reference to the root at the start of the path kills this useful ability. - Structured keys will make this... interesting... Probably something like ".../[predicate-on-key]/..." would work. You'd probably need to store the entire key inside the resulting path. - Note it is OK to put in the resulting paths a reference to any immutable node in the graph (which would cover most scalars and, hopefully, most structured keys). Mutable stuff needs to be copied, I'm afraid... - There are issues of quoting to be resolved... you wrote: > - item: [] > role: > > This describes a predicate. The predicate acts as > a "filter" on the current selected contexts, knocking > out those that don't return true. Within a predicate > strings are treated as literals, unless prefixed with > ./ or / or some other way that identifies a subordinate > path expression. Which implies some sort of different quoting/interpretation rules for strings inside predicates. I'm not certain what you have in mind, but I think the rules should be the same inside and outside predicates. - You also need predicates matching on type; Boolean operations (and/or/etc.); relative positions in lists; and probably other embellishments. Have fun, Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2002-08-05 15:21:02
|
Clark C . Evans [mailto:cc...@cl...] wrote:

> | - Why do you need the { key: ~, value: *ROOT } pair? It is the
> | first entry in *every* path, and as such it seems it can be
> | omitted. Further, I want to be able to
> | "apply-path-to-graph(path, graph)". Placing a reference to the
> | root at the start of the path kills this useful ability.
>
> You need to have the root node stored since inside of a YPATH you
> may need to jump up to the "top" of the tree within a predicate.

I don't see how it follows... I re-thought about it, however, and I see
there's not much point in applying to one graph the results of applying a
YPATH to another graph. What would be interesting would be to convert the
results of a YPATH into a simpler YPATH, for example /a/../b/c into /b/c.

That said, there is still a question of how to represent a path returned by
the YPATH engine. Your answer was "a sequence of { key: ..., value: ... }
pairs". Which leaves open the questions of:

- How to represent going into a list (one possible answer -
  { key: ~, value: <index> }).

- What happens when selecting a *key node* rather than a value node?

- And, of course - what happens with structured keys.

> | > - item: []
> | >   role: >
> | >     This describes a predicate. The predicate acts as
> | >     a "filter" on the current selected contexts, knocking
> | >     out those that don't return true. Within a predicate
> | >     strings are treated as literals, unless prefixed with
> | >     ./ or / or some other way that identifies a subordinate
> | >     path expression.
> |
> | Which implies some sort of different quoting/interpretation
> | rules for strings inside predicates. I'm not certain what you
> | have in mind, but I think the rules should be the same inside
> | and outside predicates.
>
> The quoting rules are the same. The interpretation will be different
> as anything that must be a "path" inside a predicate must start with
> a './' or '/' or '../' or some other path-ish marker.

I see; as opposed to the start of the whole YPATH expression that may start
with a word (key value). Is this distinction really called for? Can foo[bar]
mean anything else other than /foo[./bar]?

BTW, this touches on David's "default is string" issue. Obviously in YPATH
string values containing a '/' would have to be quoted.

Have fun,

    Oren Ben-Kiki
From: Clark C . E. <cc...@cl...> - 2002-08-05 16:20:36
|
On Mon, Aug 05, 2002 at 06:22:19PM +0300, Oren Ben-Kiki wrote:
| - How to represent going into a list (one possible answer -
|   { key: ~, value: <index> }).

For the 3rd item in a list, anchored &NODE, the matching
segment would be:

    { key: <index>, value: *NODE }

| - What happens when selecting a *key node* rather than a value node?

Neither keys nor values are selected. The result is a stack
of key/value pairs; thus the { key: ..., value: ... } notation.

| - And, of course - what happens with structured keys.

Yes, this is problematic on many fronts. It requires that the
result "segment" become a three tuple key/value/switch where switch
says if the key or the value is "selected" for child recursion.

It's also problematic in how to denote this within the path
syntax. Perhaps we could use ^ as a replacement for / when
navigating down the key side of things...

    /key^sub-key/value

| > The quoting rules are the same. The interpretation will be different
| > as anything that must be a "path" inside a predicate must start with
| > a './' or '/' or '../' or some other path-ish marker.
|
| I see; as opposed to the start of the whole YPATH expression that may start
| with a word (key value). Is this distinction really called for? Can foo[bar]
| mean anything else other than /foo[./bar]?

Well, it could mean a predicate evaluated from a string "bar" which is
always true. The current Python implementation uses foo[bar]
to test for the existence of bar; but really it doesn't hurt in this
case to write foo[./bar] and it is probably more clear.

| BTW, this touches on David's "default is string" issue. Obviously in YPATH
| strings values containing a '/' would have to be quoted.

Yep. It would be cool to have the implicit rules the same for
both ypath and yaml... but we diverge here rather quickly so
it's probably not going to work perfectly. ;(

Clark
From: Oren Ben-K. <or...@ri...> - 2002-08-05 16:59:08
|
Clark C . Evans [mailto:cc...@cl...] wrote: > | - How to represent going into a list (one possible answer - > | { key: ~, value: <index> }). > > > For the 3rd item in a list, anchored &NODE, the matching > segment would be: > > { key: <index>, value *NODE } Ah. Right. But... > | - What happens when selecting a *key node* rather than a value node? > > Neither key nor values are selected. The result is a stack > of key/value pairs; thus the { key: ..., value: ... } notation. *Boggles*. I can't use YPATH to select a key node? I don't think this is reasonable. One *must* be able to use YPATH to specify *any* node in the graph, key nodes included. I suggest that a trailing '/' would mean "the specified node is a key, the selected node is its value" while a lack of a trailing '/' would mean "just select the specified node". This way, given: --- &ROOT &A a: &B b /a would select the A node itself and /a/ would select the B node. I think that's nice and intuitive. As for how this would be reflected in the results, maybe like this: /a: - { key: ~, value: *ROOT } - { key: *A } # Key node selected /a/ (or /a/b): - { key: ~, value: *ROOT } - { key: *A, value: *B } # Value node selected Or maybe even like this: /a: - { role: collection, value: *ROOT } - { role: key, value: *A } /a/ (or /a/b): - { role: branch, value: *ROOT } - { role: key, value: *A } - { role: value, value: *B } Where 'role' can be 'branch', 'index', 'key', or 'value'. Hmmm. This is nicely extendible for handling things like external references... { role: xref, value: *URL } followed by { role: whatever, ... }. And it allows for selecting keys and values without any special case (unlike the key/value pair approach). Thoughts? > | - And, of course - what happens with structured keys. > > Yes, this is problematic on many fronts. It requires that the > result "segment" become a three tuple key/value/switch where switch > says if the key or the value is "selected" for child recursion. Why? 
It seems much simpler to say that the path contains references to the key node, regardless of whether it is a scalar or a collection. For example: --- &ROOT points: &POINTS &CENTER { x: 1, y: 2 } : center &LEFT { x: 2, y: 1 } : top-left YPATH: /points/[x = 1 & y = 2]/ result: - { key: ~, value: *ROOT } - { key: points, value: *POINTS } - { key: *CENTER, value: center } or maybe: - { role: branch, value: *ROOT } - { role: key, value: points } - { role: branch, value: *POINTS } - { role: key, value: *CENTER } - { role: leaf, value: center } Seems much simpler all around. > It's also problematic in how to denote this within the path > syntax. Perhaps we could use ^ as a replacement for / when > navigating down the key side of things... > > /key^sub-key/value Yuck. What is wrong in treating it as a normal predicate (using []) as I demonstrated above? > | Can foo[bar] > | mean anything else other than /foo[./bar]? > > Well, it could mean a predicate evaluated from a string "bar" which is > always true. What is the point of a "predicate evaluated from a string"? > The current Python implementation uses foo[bar] > to test for the existence of bar; but really it doesn't hurt in this > case to write foo[./bar] and it is probably more clear. But writing /points/[x = 1 & y = 2] is so much nicer than writing /points/[./x = 1 & ./y = 2], don't you think? Let's formalize this a bit. '[' ... ']' is a predicate meaning "true if the YPATH inside, relative to the current node, exists". "foo" is a predicate meaning "true if the current node's value is foo". "<predicate1> / <predicate2>" means "look for a value satisfying predicate 2 that is under a key satisfying predicate 1. Assuming & and | are used for "and" and "or", "/foo[bar]" is a shorthand for "/foo & [./bar]". This makes "/points/[x = 2 & y = 1]" a natural way to handle structured keys... Have fun, Oren Ben-Kiki |
From: <sh...@zi...> - 2002-08-05 17:35:24
|
> --- &ROOT
> points: &POINTS
>   &CENTER { x: 1, y: 2 } : center
>   &LEFT { x: 2, y: 1 } : top-left
> YPATH: /points/[x = 1 & y = 2]/
> result:
>   - { key: ~, value: *ROOT }
>   - { key: points, value: *POINTS }
>   - { key: *CENTER, value: center }

Why does the YPATH result include all the nodes on the way down to the
"answer", instead of just the answer? I admit total ignorance about XPATH,
but in YAML it seems like a simple query should return simple results.
What am I missing here?

Shouldn't the result just be:

  - { key: *CENTER, value: center }
From: Clark C . E. <cc...@cl...> - 2002-08-05 17:56:30
|
On Mon, Aug 05, 2002 at 10:35:45AM +0000, sh...@zi... wrote:
| > --- &ROOT
| > points: &POINTS
| >   &CENTER { x: 1, y: 2 } : center
| >   &LEFT { x: 2, y: 1 } : top-left
| > YPATH: /points/[x = 1 & y = 2]/
| > result:
| >   - { key: ~, value: *ROOT }
| >   - { key: points, value: *POINTS }
| >   - { key: *CENTER, value: center }
|
| Why does the YPATH result include all the nodes on the way
| down to the "answer", instead of just the answer?

Great question. The answer is that a node may appear more than
once in the graph, and oftentimes when you are processing, the
*meaning* of the node depends upon how you got to it.

| YAML it seems like a simple query should return simple results.
| What am I missing here?

A few answers:

  - Nothing says you can't ignore the context and just
    snag the value, e.g., result[-1]['value'] -- in order
    to find the value a ypath processor will have to build
    this context anyway.

  - In the current YPath implementation, there is a flag
    which discards the context information and only returns
    the value of the lowest node. This "discard" is even
    the default behavior. But alas, the full context is
    more informative for our discussion.

  - YPATH is really a building block for other languages,
    such as a query language. And in these languages, the
    context of a match is just as important as the value.

  - YPATH isn't a query language, it's a language for
    selecting pathways within a YAML graph. It differs
    from XPATH in this regard (YAML forces this since
    YAML is a graph, not a tree).

Yea?

Clark
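A minimal Python sketch of the point about contexts, assuming a context is
simply a list of { key, value } segments as written in this thread. The
function name `child_contexts` and the sample data are hypothetical and are
not taken from the actual YPath implementation. It shows one node reachable
under two keys, so only the context tells the two matches apart, and it shows
the result[-1]['value'] shortcut for callers who only want the value.

```python
# One shared node, aliased under two keys (like "one: &1 {}" and "two: *1").
shared = {}
root = {"one": shared, "two": shared}


def child_contexts(node):
    """Return one context (list of key/value segments) per child of `node`."""
    return [
        [{"key": None, "value": node}, {"key": key, "value": value}]
        for key, value in node.items()
    ]


contexts = child_contexts(root)

# Callers who do not care about the route can discard it.
values = [ctx[-1]["value"] for ctx in contexts]

# The last key of each context records how the node was reached.
keys = [ctx[-1]["key"] for ctx in contexts]
```

Both entries of `values` are the very same object, while `keys` differs; that
difference is the information a bare list of values would lose.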
From: Clark C . E. <cc...@cl...> - 2002-08-05 17:44:21
|
On Mon, Aug 05, 2002 at 08:00:38PM +0300, Oren Ben-Kiki wrote: | > Neither key nor values are selected. The result is a stack | > of key/value pairs; thus the { key: ..., value: ... } notation. | | *Boggles*. I can't use YPATH to select a key node? I don't think this is | reasonable. One *must* be able to use YPATH to specify *any* node in the | graph, key nodes included. This is hard to explain. Since YAML is a graph, we really can't select a node without a context and have it be meaningful. For example: data: &ROOT one: &1 {} two: *1 path: /* result: - /one # [{key: ~, value: *ROOT}, { key: one, value: *1 } ] - /two # [{key: ~, value: *ROOT}, { key: two, value: *1 } ] In both cases, the same node "*1" was at the end of the context. In XPATH this isn't possible since nodes only occur once in the tree, thus YPATH must, of necessity be a different sort of animal. | I suggest that a trailing '/' would mean "the specified node is a key, the | selected node is its value" while a lack of a trailing '/' would mean "just | select the specified node". This way, given: Ok. Let's use the word "primary" to indicate for each segment in a context if it is the key or the value that is subject to recursion. By default, it is the value that is subject to recursion. | --- &ROOT | &A a: &B b | | /a would select the A node itself and /a/ would select the B node. No. By default you want to use the key to select the value... So, you'd want /a to select *B | /a: | - { key: ~, value: *ROOT } | - { key: *A } # Key node selected Ok. | /a/ (or /a/b): | - { key: ~, value: *ROOT } | - { key: *A, value: *B } # Value node selected No. /a would give the context above. /a/ would be an error /a/b would return nothing, since the value of /a, *B isn't a mapping, and thus doesn't have a key "b". | /a: | - { role: collection, value: *ROOT } | - { role: key, value: *A } Hmm. Ok. 
| | /a/ (or /a/b): | - { role: branch, value: *ROOT } | - { role: key, value: *A } | - { role: value, value: *B } | | Where 'role' can be 'branch', 'index', 'key', or 'value'. Hmmm. This is | nicely extendible for handling things like external references... { role: | xref, value: *URL } followed by { role: whatever, ... }. And it allows for | selecting keys and values without any special case (unlike the key/value | pair approach). Thoughts? Too complicated. Why be that extensible? | > | - And, of course - what happens with structured keys. | > | > Yes, this is problematic on many fronts. It requires that the | > result "segment" become a three tuple key/value/switch where switch | > says if the key or the value is "selected" for child recursion. | | Why? It seems much simpler to say that the path contains references to the | key node, regardless of whether it is a scalar or a collection. For example: | | --- &ROOT | points: &POINTS | &CENTER { x: 1, y: 2 } : center | &LEFT { x: 2, y: 1 } : top-left | YPATH: /points/[x = 1 & y = 2]/ | result: | - { key: ~, value: *ROOT } | - { key: points, value: *POINTS } | - { key: *CENTER, value: center } | | Seems much simpler all around. The problem happens right after /points/ I need to know where to go. There is a missing selector there, so let's add the * since this seems to be what you are after. /points/* result: - - { key: ~, value *ROOT } - { key: points, value *POINTS } - { key: *CENTER, value: center } - - { key: ~, value *ROOT } - { key: points, value *POINTS } - { key: *LEFT, value: top-left } Ok. Now we want to filter this result. So we add a predicate. [x=1] lets say. By default, the recursion happens down the value end of things, and since neither terminal value in the two above are collections, ./x doesn't match anything, and thus, nothing is selected. Let's introduce some character ? for now, to inform of recursion down into the key rather than the value. [./x=1] # goes down the value side... 
[?/x=1] # goes down the key side...

So, /points/*[?/x=1] could select what you had above.

| > It's also problematic in how to denote this within the path
| > syntax. Perhaps we could use ^ as a replacement for / when
| > navigating down the key side of things...
| >
| >     /key^sub-key/value
|
| Yuck. What is wrong in treating it as a normal predicate (using []) as I
| demonstrated above?

You are mixing the roles of the predicate and the path segment.
They are very different creatures; orthogonal issues. Suppose,
for example, instead of selecting *CENTER you wanted all y values
with an x coordinate of 1.

    /points/*/?y[../x=1]

Let's split this into three chunks:

    /points/*   you already know
    ?y          says to recurse into the key rather than the value
                and select the "y" branch
    ../x        means go up one, and go down the "x" branch

Thus, the above would return:

    - - { key: ~, value: *ROOT }
      - { key: points, value: *POINTS }
      - { key: *CENTER, value: center }
      - { key: y, value: 2 }

| What is the point of a "predicate evaluated from a string"?

If you go into Python or most weakly typed languages, non-empty
strings are always considered to be true values.

| > The current Python implementation uses foo[bar]
| > to test for the existence of bar; but really it doesn't hurt in this
| > case to write foo[./bar] and it is probably more clear.
|
| But writing /points/[x = 1 & y = 2] is so much nicer than writing
| /points/[./x = 1 & ./y = 2], don't you think?

You haven't addressed how I can indicate to recurse down the *keys*
instead of the *values*. XPATH doesn't have this concern, but YPATH does.

| Let's formalize this a bit. '[' ... ']' is a predicate meaning "true if the
| YPATH inside, relative to the current node, exists". "foo" is a predicate
| meaning "true if the current node's value is foo". "<predicate1> /
| <predicate2>" means "look for a value satisfying predicate 2 that is under a
| key satisfying predicate 1.

I'd be explicit here and just use [.=foo]; simple enough.
[foo] by itself should probably just be an error.

| Assuming & and | are used for "and" and "or", "/foo[bar]" is a shorthand for
| "/foo & [./bar]". This makes "/points/[x = 2 & y = 1]" a natural way to
| handle structured keys...

/foo[bar] *cannot* be short for /foo & [./bar]; & can only appear
within a predicate and not within a path segment.

    predicates: Evaluate to true/false, they are expressions
    segments:   Evaluate to a context -- stack of key/values

Totally different animals. A predicate *filters*, a segment *selects*;
they are as different as SELECT and WHERE in SQL.

;) Clark
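The select/filter split argued for above can be sketched in Python. This is
only an illustration under stated assumptions: a mapping with structured keys
is stored as a list of (key, value) pairs because Python dicts cannot use
dicts as keys, and the helper names `select_children` and `key_x_equals` are
hypothetical, not part of any YPATH implementation. The segment `*` extends
each context; the predicate `[?/x=1]` only keeps or drops contexts and never
adds to them.

```python
# "points" mapping with structured keys, stored as (key, value) pairs.
points = [
    ({"x": 1, "y": 2}, "center"),
    ({"x": 2, "y": 1}, "top-left"),
]
root = {"points": points}


def select_children(context):
    """Segment '*': one longer context per child pair of the last value."""
    node = context[-1]["value"]
    if isinstance(node, dict):
        pairs = list(node.items())
    elif isinstance(node, list):
        pairs = node  # assumed to be a list of (key, value) pairs
    else:
        pairs = []  # scalars have no children
    return [context + [{"key": k, "value": v}] for k, v in pairs]


def key_x_equals(expected):
    """Predicate '[?/x=expected]': test the key side of the last segment."""
    def predicate(context):
        key = context[-1]["key"]
        return isinstance(key, dict) and key.get("x") == expected
    return predicate


start = [{"key": None, "value": root}]
at_points = start + [{"key": "points", "value": root["points"]}]

selected = select_children(at_points)                    # /points/*
filtered = [c for c in selected if key_x_equals(1)(c)]   # /points/*[?/x=1]
```

The filter leaves the surviving context exactly as the segment produced it,
which is the SELECT versus WHERE distinction in miniature.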
From: Clark C . E. <cc...@cl...> - 2002-08-05 18:04:43
|
I have to retire for the day. I have a better explanation in my head... give me another day or so to get it out as text. Clark |
From: Oren Ben-K. <or...@ri...> - 2002-08-06 08:55:56
|
Clark C . Evans [mailto:cc...@cl...] wrote: > | I suggest that a trailing '/' would mean "the specified > | node is a key, the > | selected node is its value" while a lack of a trailing '/' > | would mean "just > | select the specified node". This way, given: > > Ok... > > | --- &ROOT > | &A a: &B b > | > | /a would select the A node itself and /a/ would select the B node. > > No. By default you want to use the key to select the value... > So, you'd want /a to select *B So what is it, OK or not OK? :-) I think we have some core conceptual difference in how we grasp YPATH. The way I see it, YPATH works as a series of "steps" where at each step you are at some node and move ahead to another node. Eventually you end up at some node that is the selected one. The end result of evaluating the YPATH is a direct path from the root node to the selected node. When the node may be reached via several paths, the return path is the one traversed during the YPATH evaluation. In this view, what is between '/'s in a YPATH expression is a "node selector" - it is an instruction on how to select the next node from the set of nodes reachable from the current node. This means there's no inherent distinction between: > predicates: Evaluate to true/false, they are expressions > segments: Evaluate to a context -- stack of key/values > > Totally different animals. A predicate *filters* a segment *selects* > they are as different as SELECT and WHERE in SQL. I don't see why I need more than one animal - selector (predicate?) - that is applied at each point along the path to select the next node in the path. What does one gain by having two separate animals? It seems needless complexity to me. In fact I still can't wrap my head about how these are different, exactly, in your view. I'll try to elaborate my view on this and maybe given that you could better explain to me how your notion works. In my view, at each point we are at some "current node" and have the next "selector" to look at. 
Now, when the current node of a YPATH is a value node, there are only two options. Either we stay there (the YPATH is done and we have selected this node), or, if the value node is a collection, we can go into it. When the current node is a key node, we have *three* alternatives. We can stay there, selecting the key node. BTW, I don't see how this is done in your notion - and I refuse to give up the ability to select the key node itself. At any rate, another option (the most common step) is to move to the value node associated with the key node. And, if the key node is a collection, we can also move into it. So, for value nodes, it is sufficient to use a '/' to indicate the next selector. Either there is such a '/' and we move "in", or there isn't and we are done. But for key nodes we need two markers. The consistent thing would be to use '/' to move into the key and something else to move into the value (say, ':'). However, this may not be human-friendly. It is nicer to humans to make '/' mean "go into the node" for values and "go into the value node" for keys, with '?' meaning "go into the key node" for keys (as this is a rare operation). And, for extra credit, say that if the final node selected is a key YPATH automatically moves to the value associated with it - unless it is followed by a final trailing '?'. Either way, the [] selector says "a node such that the YPATH specified in the [] is reachable from it". Seems pretty simple to me and for the life of me I can't see why it is a completely different animal from "a node whose scalar value is 'foo'" or "the parent node of this node" or any other "selector". As for Boolean operations, again I see no difference from "a node such that both this YPATH or that YPATH are reachable from it" from "a node such that its name is 'foo' or its name is 'bar'". Example: DOC: &ROOT &POINTS points: &PMAP ? 
      &POINT { &X x: &ONE 1, &Y y: &TWO 2}
    : &VMAP { &NAME name: &CENTER center }
  &NAMES names: &NMAP
    *CENTER: *POINT

# "Theoretically correct" syntax:
YPATH 1: /names:/center:/x:
YPATH 2: /points:/[x=1]/y
YPATH 3: /points:/[x=1]:/name:

# "Human intuitive" syntax:
YPATH 1: /names/center/x
YPATH 2: /points/[?x=1]?y?
YPATH 3: /points/[?x=1]/name

# Results:
Result 1: [ *ROOT, *NAMES, *NMAP, *CENTER, *POINT, *X, *ONE ]
Result 2: [ *ROOT, *POINTS, *PMAP, *POINT, *Y ]
Result 3: [ *ROOT, *POINTS, *PMAP, *POINT, *VMAP, *NAME, *CENTER ]

Now, representing results as a simple sequence of nodes isn't complete
because there are two possible step directions in a result path from a key
node (into the key and to the associated value). Hence we need to associate
at least one bit with each transition. Given there are N nodes in a result
path there are N-1 such bits. Using a convention that '/' means "into the
node itself" and ':' means "to the value associated with the key node", the
exact complete description of the above paths would be:

Result 1: [ *ROOT, '/', *NAMES, ':', *NMAP, '/', *CENTER, ':', *POINT,
            '/', *X, ':', *ONE ]
Result 2: [ *ROOT, '/', *POINTS, ':', *PMAP, '/', *POINT, '/', *Y ]
Result 3: [ *ROOT, '/', *POINTS, ':', *PMAP, '/', *POINT, ':', *VMAP,
            '/', *NAME, ':', *CENTER ]

Of course, it is possible to represent the path as a set of pairs, in two
ways:

Result 2:
  - { node: *ROOT, step-to-next: to-inside }
  - { node: *POINTS, step-to-next: to-value }
  - { node: *PMAP, step-to-next: to-inside }
  - { node: *POINT, step-to-next: to-inside }
  - { node: *Y, step-to-next: none }

Or:
  - { step-from-prev: none, node: *ROOT }
  - { step-from-prev: to-inside, node: *POINTS }
  - { step-from-prev: to-value, node: *PMAP }
  - { step-from-prev: to-inside, node: *POINT }
  - { step-from-prev: to-inside, node: *Y }

OK. I think the above more-or-less conveys my grasp of how this should work.
Of course, there are various alternative syntax forms and path representations that would go with it; the above just gives some examples. Also, there are additional operators needed: '//' can be used for "move to any node inside this node" for a value (or "any node inside the value associated with this node" for a key), '??' would mean "move to any node inside this node" for a key node. Using Boolean operators in any steps would also be possible (since selectors are just Boolean functions anyway). And so on. Have fun, Oren Ben-Kiki |
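The two result encodings described in this message (a flat list with '/' and
':' between nodes, and a list of { node, step-to-next } pairs) carry the same
information, which a short Python sketch can make concrete. The function name
`to_pairs` is hypothetical, and plain strings stand in for the anchored nodes;
this is an illustration of the N nodes / N-1 step bits bookkeeping, not a
proposed API.

```python
STEP_NAMES = {"/": "to-inside", ":": "to-value"}


def to_pairs(flat):
    """Convert [node, step, node, step, ..., node] into node/step pairs."""
    nodes = flat[0::2]
    steps = flat[1::2]
    if len(nodes) != len(steps) + 1:
        raise ValueError("expected N nodes separated by N-1 steps")
    pairs = []
    for i, node in enumerate(nodes):
        # The last node has no outgoing step.
        step = STEP_NAMES[steps[i]] if i < len(steps) else "none"
        pairs.append({"node": node, "step-to-next": step})
    return pairs


# "Result 2" from the message, with strings standing in for the aliases.
result_2 = ["ROOT", "/", "POINTS", ":", "PMAP", "/", "POINT", "/", "Y"]
pairs = to_pairs(result_2)
```

The step-from-prev form is the same list with each step shifted one entry
later, so either pair layout can be derived from the flat one.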
From: Clark C . E. <cc...@cl...> - 2002-08-06 13:56:21
|
On Tue, Aug 06, 2002 at 11:57:09AM +0300, Oren Ben-Kiki wrote: | > | --- &ROOT | > | &A a: &B b | > | | > | /a would select the A node itself and /a/ would select the B node. | > | > No. By default you want to use the key to select the value... | > So, you'd want /a to select *B | | So what is it, OK or not OK? :-) Not OK. /a does not select A, it would select B. | I think we have some core conceptual difference in how we grasp YPATH. | | The way I see it, YPATH works as a series of "steps" where at each step you | are at some node and move ahead to another node. Eventually you end up at | some node that is the selected one. The end result of evaluating the YPATH | is a direct path from the root node to the selected node. When the node may | be reached via several paths, the return path is the one traversed during | the YPATH evaluation. Yes; but I'd normalize the path so that stuff like /*/../*/.. just returns the root node. | selector" - it is an instruction on how to select the next node from the set | of nodes reachable from the current node. This means there's no inherent | distinction between: Exactly. And the stuff between the / starts with a segment which selects the next direction to go, and is followed by zero or more predicates which can be used to filter the result set but which do not select anything. BTW, you completely cut out an example showing the difference between these two without addressing it. | | > predicates: Evaluate to true/false, they are expressions | > segments: Evaluate to a context -- stack of key/values | > | > Totally different animals. A predicate *filters* a segment *selects* | > they are as different as SELECT and WHERE in SQL. | | I don't see why I need more than one animal - selector (predicate?) - that | is applied at each point along the path to select the next node in the path. | What does one gain by having two separate animals? It seems needless | complexity to me. 
In fact I still can't wrap my head about how these are | different, exactly, in your view. path_segment = selector ( '[' predicate ']' )* selector = key | '*' | '.' | '..' path = '/'? path_segment ( '/' path_segment)* Selections done within the selector are included in the output, but selections/computations done within the predicate are not. | In my view, at each point we are at some "current node" and have the next | "selector" to look at. | | Now, when the current node of a YPATH is a value node, there are only two | options. Either we stay there (the YPATH is done and we have selected this | node), or, if the value node is a collection, we can go into it. Ok. | When the current node is a key node, we have *three* alternatives. We can | stay there, selecting the key node. BTW, I don't see how this is done in | your notion - and I refuse to give up the ability to select the key node | itself. At any rate, another option (the most common step) is to move to the | value node associated with the key node. And, if the key node is a | collection, we can also move into it. Well, we need some way to designate to select the key, I was proposing using '?' for this purpose; just as "." selects the current node, ".." select the parent, "?" moves back to the key. Thus, /a selects B /a/? selects A Since selecting the key is rather rare, this is probably ok. | However, this may not be human-friendly. It is nicer to humans to make '/' | mean "go into the node" for values and "go into the value node" for keys, | with '?' meaning "go into the key node" for keys (as this is a rare | operation). And, for extra credit, say that if the final node selected is a | key YPATH automatically moves to the value associated with it - unless it is | followed by a final trailing '?'. Perfect. | Either way, the [] selector says "a node such that the YPATH specified in | the [] is reachable from it". 
Yes, that is the essential difference: within [] you are interested
in stuff like reachability; but also comparisons and general
expression evaluation... stuff that is not necessarily a node.

| DOC: &ROOT
|   &POINTS points: &PMAP
|     ? &POINT { &X x: &ONE 1, &Y y: &TWO 2}
|     : &VMAP { &NAME name: &CENTER center }
|   &NAMES names: &NMAP
|     *CENTER: *POINT
|
| # "Theoretically correct" syntax:
| YPATH 1: /names:/center:/x:
| YPATH 2: /points:/[x=1]/y
| YPATH 3: /points:/[x=1]:/name:
|
| # "Human intuitive" syntax:
| YPATH 1: /names/center/x
| YPATH 2: /points/[?x=1]?y?
| YPATH 3: /points/[?x=1]/name

YPATH 1:
  path: /names/center/x
  read:
    - Start at the root
    - Choose the value for the "names" key,
    - Choose the value for the "center" key,
    - Choose the value for the "x" key
YPATH 2:
  path: /points/*/?/y[../x=1]
  read:
    - Start at the root
    - Choose the value for the "points" key
    - Select each value in the results
    - Select the current value's key
    - Select the value for the "y" key
    - Filter this expression by:
        - going up to the current node's parent (*POINT)
        - select the value for the "x" key
        - if the current node is equal to one, the
          current node passes the filter
YPATH 3:
  path: /points/*/name[../?/x=1]

| # Results:
| Result 1: [ *ROOT, *NAMES, *NMAP, *CENTER, *POINT, *X, *ONE ]
| Result 2: [ *ROOT, *POINTS, *PMAP, *POINT, *Y ]
| Result 3: [ *ROOT, *POINTS, *PMAP, *POINT, *VMAP, *NAME, *CENTER ]

I was thinking of Steve's complaint about complexity. How about we
have a YPATH return a sequence of components: (a) the root node,
(b) the path taken, (c) the result. In this way, those who want
to ignore one or more components can easily do so.

Result 1:
  - root: *ROOT
    path: [ *NAMES, *CENTER, *X ]
    node: *ONE

Note that in your results for 2 and 3, the node *X doesn't occur...
this is because it is in a predicate.
| Now, representing results as a simple sequence of nodes isn't complete | because there are two possible step directions in a result path from a key | node (into the key and to the associated value). Hence we need to associate | at least one bit with each transition. Given there are N nodes in a result | path there are N-1 such bits. Right. | Of course, there are various alternative syntax forms and path | representations that would go with it; the above just gives some examples. Nods. But also I think that you see the distinct | Also, there are additional operators needed: '//' can be used for "move to | any node inside this node" for a value (or "any node inside the value | associated with this node" for a key), '??' would mean "move to any node | inside this node" for a key node. Hmm. The traverse operator is indeed less simple. *boggles* Clark |
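The grammar given in this message (path_segment = selector followed by zero
or more bracketed predicates; path = optional '/' then segments separated by
'/') can be sketched as a small Python splitter. This is a rough illustration
under assumptions, not the actual YPath parser: the names `parse_path` and
`parse_segment` are hypothetical, input is assumed well-formed, nested
brackets inside a predicate are not handled, and '?' is accepted as a
selector simply because any text before the first '[' is taken as the
selector. The one subtlety it demonstrates is that a '/' inside [...] belongs
to the predicate and must not split the path.

```python
def parse_segment(segment):
    """Split 'selector[pred1][pred2]' into its selector and predicates."""
    selector, _, rest = segment.partition("[")
    predicates = []
    while rest:
        predicate, _, rest = rest.partition("]")
        predicates.append(predicate)
        if rest.startswith("["):
            rest = rest[1:]
    return {"selector": selector, "predicates": predicates}


def parse_path(text):
    """Return (is_absolute, segments), splitting on '/' outside brackets."""
    absolute = text.startswith("/")
    body = text[1:] if absolute else text
    segments, current, depth = [], "", 0
    for ch in body:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            segments.append(current)
            current = ""
        else:
            current += ch
    segments.append(current)
    return absolute, [parse_segment(s) for s in segments]


absolute, segments = parse_path("/points/*/?/y[../x=1]")
```

Keeping predicates as unparsed strings is a deliberate simplification; a real
evaluator would parse them as expressions, which is exactly where they differ
from selectors.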
From: Clark C . E. <cc...@cl...> - 2002-08-06 14:23:49
|
| Well, we need some way to select the key, I was
| proposing using '?' for this purpose; just as "." selects the
| current node, ".." selects the parent, "?" moves back to the key.
| Thus,
|
|   /a     selects B
|   /a/?   selects A
|
| Since selecting the key is rather rare, this is probably ok.

Thus, /a is actually then equivalent to /*[?=a]

| | DOC: &ROOT
| |   &POINTS points: &PMAP
| |     ? &POINT { &X x: &ONE 1, &Y y: &TWO 2}
| |     : &VMAP { &NAME name: &CENTER center }
| |   &NAMES names: &NMAP
| |     *CENTER: *POINT
| |
| YPATH 1:
|   path: /names/center/x
| YPATH 2:
|   path: /points/*/?/y[../x=1]
| YPATH 3:
|   path: /points/*/name[../?/x=1]
    alternate: /points/*[?/x=1]/name

I wonder if there is a cleaner syntax for managing keys
which keeps the "simple" YPath's "simple".

Best,

Clark
|
From: Clark C . E. <cc...@cl...> - 2002-08-06 14:41:19
|
| | Well, we need some way to designate to select the key, I was | | proposing using '?' for this purpose; just as "." selects the | | current node, ".." select the parent, "?" moves back to the key. | | Thus, | | | | /a selects B | | /a/? selects A | | | | Since selecting the key is rather rare, this is probably ok. | | Thus, /a is actually then equivalent to /*[?=a] And for shortness, we could let /[...] be equivalent to /*[...] Thus, /a is short for /[?=a] -- I can buy that. But I think that this syntax is throwing us for a loop. How about we move to a more verbose syntax for a spell, perhaps one using YAML so that the structure is more clear? --- what: / is: select: current-node from: root --- what: /a is: select: value-node from: select: child-pairs from: root where: - operator: equals rhs: select: key-node from: current-node lhs: a Yes, it's verbose, but it could probably help. Also, it is rather canonical... | | | | DOC: &ROOT | | | &POINTS points: &PMAP | | | ? &POINT { &X x: &ONE 1, &Y y: &TWO 2} | | | : &VMAP { &NAME name: &CENTER center } | | | &NAMES names: &NMAP | | | *CENTER: *POINT | | | | | YPATH 1: | | path: /names/center/x | | YPATH 2: | | path: /points/*/?/y[../x=1] alternate: /points/*/?[x=1]/y | | YPATH 3: | | path: /points/*/name[../?/x=1] | | alternate: /points/*[?/x=1]/name alternate: /points/[?/x=1]/name On Tue, Aug 06, 2002 at 11:57:09AM +0300, Oren Ben-Kiki wrote: | # "Human intuitive" syntax: | YPATH 1: /names/center/x | YPATH 2: /points/[?x=1]?y? | YPATH 3: /points/[?x=1]/name It seems that we are close. You seem to be abbreviating ?/x as ?x -- I understand, but don't think that this short-cut is wise... and I can't explain why yet. As for Path #2, I can't figure out how I could possibly interpret what you wrote... 
---
what: /names/center/x
is:
  select: value-node
  from:
    select: value-node
    from:
      select: value-node
      from:
        select: child-pairs
        from: root
        where:
          -
            operator: equals
            rhs:
              select: key-node
              from: current-node
            lhs: names
      where:
        -
          operator: equals
          rhs:
            select: key-node
            from: current-node
          lhs: center
    where:
      -
        operator: equals
        rhs:
          select: key-node
          from: current-node
        lhs: x
|
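The nested select/from/where form above can be read as a chain of "pick the value whose key equals this name" steps. The following is only a rough sketch of that reading in Python, not anything proposed in the thread: the Node class, step() and evaluate() are hypothetical names, and the graph is the sample document rebuilt by hand (a mapping is modelled as a list of key/value pairs so that a mapping such as *POINT can itself be a key).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: nodes are graph vertices
class Node:
    anchor: str                                  # anchor name from the sample document
    value: object = None                         # scalar payload; None for a mapping
    pairs: list = field(default_factory=list)    # (key_node, value_node) pairs


def step(node, name):
    """select value-node from child-pairs of `node` where key-node equals `name`."""
    return [v for k, v in node.pairs if k.value == name]


def evaluate(root, path):
    """Evaluate a path made only of plain key segments, e.g. /names/center/x."""
    current = [root]
    for name in path.strip("/").split("/"):
        current = [v for n in current for v in step(n, name)]
    return current


# The sample document from the thread, built node by node.
ONE, TWO = Node("ONE", 1), Node("TWO", 2)
POINT = Node("POINT", pairs=[(Node("X", "x"), ONE), (Node("Y", "y"), TWO)])
CENTER = Node("CENTER", "center")
VMAP = Node("VMAP", pairs=[(Node("NAME", "name"), CENTER)])
PMAP = Node("PMAP", pairs=[(POINT, VMAP)])   # a mapping used as a key
NMAP = Node("NMAP", pairs=[(CENTER, POINT)])
ROOT = Node("ROOT", pairs=[(Node("POINTS", "points"), PMAP),
                           (Node("NAMES", "names"), NMAP)])

print([n.anchor for n in evaluate(ROOT, "/names/center/x")])
```

This deliberately ignores predicates, the '?' key axis and '*'; it only covers the /names/center/x case spelled out in the verbose form.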
From: Brian I. <in...@tt...> - 2002-08-06 18:50:27
|
On 06/08/02 10:07 -0400, Clark C . Evans wrote: > On Tue, Aug 06, 2002 at 11:57:09AM +0300, Oren Ben-Kiki wrote: > | When the current node is a key node, we have *three* alternatives. We can > | stay there, selecting the key node. BTW, I don't see how this is done in > | your notion - and I refuse to give up the ability to select the key node > | itself. Oren, I really wish you would give up here. I don't see a key being an actual node in a mapping, any more than an (implicit) index number is a node in a sequence. Cheers, Brian |
From: Oren Ben-K. <or...@ri...> - 2002-08-06 17:08:54
|
First, fun as this is... I think we should settle the "what is a string" issue first so I can update the spec. At any rate... Clark C . Evans [mailto:cc...@cl...] wrote: > | The way I see it, YPATH works as a series of "steps" where > | at each step you > | are at some node and move ahead to another node. Eventually > | you end up at > | some node that is the selected one. The end result of > | evaluating the YPATH > | is a direct path from the root node to the selected node. > | When the node may > | be reached via several paths, the return path is the one > | traversed during > | the YPATH evaluation. > > Yes; but I'd normalize the path so that stuff like /*/../*/.. > just returns the root node. Yes; That's what I meant by saying "the direct path" from the root to the node. > | selector" - it is an instruction on how to select the next > | node from the set > | of nodes reachable from the current node. This means > | there's no inherent distinction between: > > Exactly. And the stuff between the / starts with a segment which > selects the next direction to go, and is followed by zero or more > predicates which can be used to filter the result set but which > do not select anything. I just don't get it. Please explain why selecting on the value of a key is not just another form of the general operation of "filtering a result set". In my view both are a way to select a particular set of zero or more nodes out of a set of candidate nodes. I don't care whether I'm selecting them according to having a specific value, the existence of paths starting in them, having a particular type family, or whatever other criteria we come up with. You said: > path_segment = selector ( '[' predicate ']' )* > selector = key | '*' | '.' | '..' > path = '/'? path_segment ( '/' path_segment)* > > Selections done within the selector are included in the output, > but selections/computations done within the predicate are not. I still don't get it. 
You artificially divided the set of selectors into two: everything that is
based on paths (what you call predicate) and everything that isn't
(presumably what you call selector). You then, for no obvious reason,
require that there be exactly one from the 'selector' group, with an
optional set from the second group (is this really what you had in mind?).
It all looks completely arbitrary and complicated.

In my view, there's no such distinction. "[relative-path]" is a selector.
"foo" is a selector. I can have one. I can have the other. I can have both.
I can have two of one and three of the other, as in:

  /(foo&[?x=1])|(bar&[?y=3])|(baz&([?x=1]&[?y=2]))/x

That is:

  path = '/' path_segment
         ( ( '/' | '?' ) path_segment )*
  path_segment = or_cond
  or_cond = and_cond
          | and_cond '|' or_cond
  and_cond = simple_cond
           | simple_cond '&' simple_cond
  simple_cond = '(' path_segment ')'
              | value_cond
              | type_cond
              | reachable_cond
              | rel_cond
  value_cond = simple_value    # Checks node value
             | '*'             # Matches any value.
             | regexp          # The above may be one...
             | compare_cond    # > < etc. for numbers & dates
             | ...
  type_cond = '!' simple_value # Specific type
            | '!' '*'          # Seems prudent
            | '!' regexp       # The above may be one...
            | ...
  reachable_cond = '[' or_path ']'
  or_path = and_path '|' and_path
  and_path = path
           | path '&' path
  rel_cond = '..'              # parent
           | ??                # descendent
           | ??                # ancestor
           | '@' number        # @-1 : prev seq sibling?

OK, I don't know what to write for 'descendent' and 'ancestor' above, and
some of the details are surely wrong. Nevertheless, I think this is the
cleanest, simplest way to go about it. No artificial separation to classes
of selectors. For example, I see no big difference between /foo[?x/>2] and
/x/>2; you seem to consider comparisons to be something limited to
"predicates" because "they aren't nodes". I'm rather baffled by this.

> BTW, you completely cut out an example showing the difference
> between these two without addressing it.

Do you mean:

> Let's introduce some character ?
for now, to inform > of recursion down into the key rather than the value. > > [./x=1] # goes down the value side... > [?/x=1] # goes down the key side... Yes, sorry about that. I think my proposed syntax (and approach) results in simpler YPATH expressions (to read, write and execute). Compare: Mine: YPATH 1: /names/center/x YPATH 2: /points/[?x=1]?y? YPATH 3: /points/[?x=1]/name Yours: YPATH 1: /names/center/x YPATH 2: /points/*/?/y[../x=1] YPATH 3: /points/*/name[../?/x=1] You have said yourself: > I wonder if there is a cleaner syntax for managing > keys which keeps the "simple" YPath's "simple". There is. See above :-) > Note that in your results for 2 and 3, the node *X doesn't > occur... this is because it is in a predicate. (I'd say: in a reachable path selector). Correct. > I was thinking of Steve's complaint about complexity. How about > we have an Xpath return a sequence of components: (a) the root node, > (b) the path taken, (c) the result. In this way, those who want > to ignore one or more components can easily do so. > > Result 1: > - > root: *ROOT > path: [ *NAMES, *CENTER, *X ] > node: *ONE You have omitted some nodes from the path (NMAP and POINT). At any rate, using a simple sequence also works: Result 1: [ *ROOT, '/', *NAMES, ':', *NMAP, '/', *CENTER, ':', *POINT, '/', *X, ':', *ONE ] The root is the first member, the selected node is the final member. Pretty easy to get to them if you aren't interested in anything else. > | Of course, there are various alternative syntax forms and path > | representations that would go with it; the above just gives > some examples. > > Nods. But also I think that you see the distinct ? It seems something got clipped here... > Hmm. The traverse operator is indeed less simple. *boggles* I think that if you use /<something>/ to represent it, rather than '//', then it actually becomes pretty simple to include in the scheme. 
My scheme, that is :-> > How about we move > to a more verbose syntax for a spell, perhaps one using YAML so > that the structure is more clear? > > --- > what: / > is: > select: current-node > from: root > --- > what: /a > is: > select: value-node > from: > select: child-pairs > from: root > where: > - > operator: equals > rhs: > select: key-node > from: current-node > lhs: a Are you *certain* you don't mean YPATH to be a YQUERY? The above sure smells of it. Ugh! > Yes, it's verbose, but it could probably help. Also, > it is rather canonical... Nice notion and it demonstrates why your approach is complex as, well... Consider: what: /a is: - select: root # selects from keys-of if a collection, # or from the node itself if a scalar. - select-[from-keys-of]-current-node-if: # The selection criteria value-is: a Isn't that *so much* simpler? > It seems that we are close. You seem to be abbreviating > ?/x as ?x -- I understand, but don't think that this > short-cut is wise... and I can't explain why yet. I don't know. I keep getting the feeling that you and I are talking about very different approaches - something basic is different. I don't know if I can phrase it in a single sentence, though. > As for Path #2, I can't figure out how I could possibly > interpret what you wrote... # Note: x/1 rather than x=1. My mistake! what: /points/[?x/1]?y? is: # / - select: root - select-from-[keys-of]-current-node: # /_points_ value-is: points # /points_/_ - select-value-associated-with-current-node - select-[from-keys-of]-current-node-if: # /points/_[_..._]_ path-is-reachable-from-here: # /points/[_?_...] - select: here - select-[from-keys-of]-current-node-if: # /points/[?_x_...] - value-is: x # /points/[?x_/_...] 
- select-value-associated-with-current-node - select-[from-keys-of]-current-node-if: # /points/[?x/_1_] value-is: 1 # /points/[?x/1]_?_ - select-[from-keys-of]-current-node-if: # /points/[?x/1]?_y_ value-is: y # /points/[?x/1]?y_?_ # DO NOT select-value-associated-with-current-node. # If there wasn't a trailing ?, it would have been # added automatically at the end. You can _see_ the YPATH engine going through the path. Have fun, Oren Ben-Kiki |
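As a side illustration of the flat result sequence proposed earlier in this message (nodes interleaved with '/' and ':' edge markers), here is a minimal Python sketch of how a consumer might pull out the root, the selected node, and the edge types. The function name split_result is made up for this example, and plain strings stand in for the *ALIAS node references.

```python
def split_result(detailed):
    """Split [node, edge, node, edge, ..., node] into its parts.

    Nodes sit at even positions and edge markers at odd positions, so a
    path with N nodes carries N-1 edge markers.
    """
    nodes = detailed[0::2]
    edges = detailed[1::2]
    if len(edges) != len(nodes) - 1:
        raise ValueError("expected alternating node/edge entries")
    return {"root": nodes[0], "node": nodes[-1], "nodes": nodes, "edges": edges}


# Result 1 from the message, with strings in place of aliases.
result_1 = ["ROOT", "/", "NAMES", ":", "NMAP", "/", "CENTER", ":",
            "POINT", "/", "X", ":", "ONE"]

parts = split_result(result_1)
print(parts["root"], parts["node"])
```

Anyone who only cares about the root and the selected node reads the first and last entries; the edge markers are there for those who need to know whether a step went into a key ('/') or across to its value (':').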
From: Clark C . E. <cc...@cl...> - 2002-08-07 04:30:53
|
On Tue, Aug 06, 2002 at 08:10:12PM +0300, Oren Ben-Kiki wrote:
| First, fun as this is... I think we should settle the "what is a string"
| issue first so I can update the spec.

Option A:
  summary: Leave as is (only starting with alpha, numeric, underscore)
  good:
    - Simple rule
    - No spec changes
  bad:
    - Requires quoting of paths, which is a pretty big use-case
  votes:
    - Brian

Option B:
  summary: Allow them to start with /.\ (did I miss any?)
  good:
    - allows paths without quoting
  bad:
    - a bit more complicated
    - we need a new comment special key, ; instead of //
  votes:
    -

Option C:
  summary: Require quoting of strings that begin with a digit
  good:
    - makes string quoting rules dead simple
  bad:
    - IP addresses and street addresses need to be quoted

Overall, if we are indeed going to consider "configuration" as a
serious use case, option B is preferred -- having to quote paths
is somewhat tedious; furthermore I think option B would be better
for transform/query/schema definitions which would use something
ypathish on almost every node... The complexity is that it's
somewhat less intuitive... what needs to be quoted again?

Brian's comment:
| All in all, I would vote to not add any leading characters to denote
| string. Most programming languages seem to use [A-Za-z0-9_] to mean
| "word character". I like keeping the ability to say that: "Any string
| that doesn't begin with a word character, needs to be quoted."
| In other words, I'd like to keep this rule simple to remember.

I think adding /.\ inside [] list probably wouldn't hurt _that_ badly,
and the up side seems nice... not having to quote paths.

Tom Harris concurring:
| Can I second this? I have just started using YAML by stealth, dumping some
| log files in YAML. As I adopt the position that it is easier to ask for
| forgiveness than permission (to use a new data format) I can't really put
| 'import yaml' at the head of my scripts so that I can use the YAML dumper.
| The only thing that I had to write was a string quoter, and some of the
| strings are paths. I appreciate the rule being simple, and if possible
| easily evaluated without using regular expressions, just common string
| operations. Strings quoted unnecessarily are just noise.

Nods. I added option C for you. The fact is, what needs to be
quoted is already complicated. Assuming these are strings, the following
is a list of unquoted vs quoted:

  - '2002-01-02'   # quoted due to date
  - 02-02-02       # unquoted, doesn't match date
  - '34'           # quoted, because of integer
  - 23 L           # unquoted
  - 2.3.4          # unquoted
  - '2.3'          # quoted due to float

So... I put forth that the quoting rules are _already_ quite
complicated; the current PyYaml actually has it wrong in a few cases
(I just fixed a few tonight in the repository). We got this
complicated because people wanted to not have to quote IP addresses
and street addresses. Stopping just short of paths seems like it is
breaking our trend to favor ease of data entry and "looks".

So. I put in my vote for adding paths to the string. But I'm swayed
easily to roll back things to just starting with [a-zA-Z_] and forcing
IP addresses and street addresses to be quoted, option C.

Best,

Clark
|
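To make the quoted-vs-unquoted list in this message concrete, here is a rough Python sketch of the kind of string quoter Tom describes. It is an illustration only: the three patterns are assumptions chosen to reproduce the six examples above, not the implicit-typing rules of the YAML draft or of PyYaml, and needs_quotes is a hypothetical name.

```python
import re

# Assumed patterns, picked only to match the six examples in the message.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")   # e.g. 2002-01-02
INT = re.compile(r"[-+]?\d+")             # e.g. 34
FLOAT = re.compile(r"[-+]?\d+\.\d+")      # e.g. 2.3


def needs_quotes(text):
    """Return True if a string would otherwise be read as a date, int or float."""
    return any(p.fullmatch(text) for p in (DATE, INT, FLOAT))


def emit(text):
    """Quote the string only when leaving it plain would change its type."""
    return f"'{text}'" if needs_quotes(text) else text


for s in ["2002-01-02", "02-02-02", "34", "23 L", "2.3.4", "2.3"]:
    print(emit(s))
```

The sketch covers only the digit-leading cases from the list; it says nothing about strings starting with /, . or \, which is exactly the open question between options A, B and C.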
From: Clark C . E. <cc...@cl...> - 2002-08-07 06:04:54
|
| > path_segment = selector ( '[' predicate ']' )*
| > selector = key | '*' | '.' | '..'
| > path = '/'? path_segment ( '/' path_segment)*
| >
| > Selections done within the selector are included in the output,
| > but selections/computations done within the predicate are not.
|
| I still don't get it. You artificially divided the set of selectors to two;
| everything that is based on paths (what you call predicate) and everything
| that doesn't (presumably what you call selector). You then, for no obvious
| reason, require that there would be at exactly one from the 'selector'
| group, with an optional set from the second group (is this really what you
| had in mind?). It all looks completely arbitrary and complicated.

I kinda like what you are doing below, let me throw out a simpler
abstraction first, however.

  path = ( root '/'
         | relative ''
         ) segment+
  segment = axis predicate*
  axis = self '.'
       | key '?'
       | parent '..'
       | children '*' (value-side)
       | descendent-or-self '/' (value-side)
       | key-children '?*' (key-side)
       | key-descendent-or-self
       | ancestors

For a syntax, use / to split segments, and []
to mark each predicate (implicit _and_ between
each predicate).

Thus,
  /a becomes root() children() such that key() equals 'a'
             /      *          [          ?     =      'a' ]
             /*[?=a]

So, the distinction I'm trying to make between the 'selector'
and 'predicate' is that the selector chooses which direction
to go, and the predicate filters this choice based on direct
comparisons or any arbitrary expression.

Now, we can have the /a "short-hand", but the compiled form
would look more like the thingy on the right.

So... you are sort of right in that the selector "a"
is really both an axis plus a predicate. Thanks for
opening my eyes to this... it's clear now.

That said, I don't like your notion of putting predicates
outside of the [] brackets. Let's keep the stuff inside
the [] more "general case" and the stuff to the left of
the predicates, the selector, for our syntax-sugar.
| In my view, there's no such distinction. "[relative-path]" is a selector. | "foo" is a selector. I can have one. I can have the other. I can have both. | I can have two of one and three of the other, as in: | | /(foo&[?x=1])|(bar&[?y=3])|(baz&([?x=1]&[?y=2]))/x | | That is: | | path = '/' path_segment | ( ( '/' | '?' ) path_segment )* | path_segment = or_cond | or_cond = and_cond | | and_cond '|' or_cond | and_cond = simple_cond | | simple_cond '&' simple_cond | simple_cond = '(' path_segment ')' | | value_cond | | type_cond | | reachable_cond | | rel_cond | value_cond = simple_value # Checks node value | | '*' # Matches any value. | | regexp # The above may be one... | | compare_cond # > < etc. for numbers & dates | | ... | type_cond = '!' simple_value # Specific type | | '!' '*' # Seems prudent | | '!' regexp # The above may be one... | | ... | reachable_cond = '[' or_path ']' | or_path = and_path '|' and_path | and_path = path | | path '&' path | rel_cond = '..' # parent | | ?? # descendent | | ?? # ancestor | | '@' number # @-1 : prev seq sibling? | | OK, I don't know what to write for 'descendent' and 'ancestor' above, and | some of the details are surely wrong. Nevertheless, I think this is the | cleanest, simplest way to go about it. No artificial separation to classes | of selectors. For example, I see no big difference between /foo[?x/>2] and | /x/>2; you seem to consider comparisons to be something limited to | "predicates" because "they aren't nodes". I'm rather baffled by this. /foo[?x/=2] /foo/x/>2 These would be different (I'm sort of guessing on your syntax). In the former, you are selecting the "foo" node only if it passes the given filter. In the latter case, you are selecting foo's child, "x" if it passes the filter >2 ... there is a distinction between paths you use within a predicate but don't select, and those that are not in the predicate. 
If you re-write these in the canonical form I gave above: /foo[?x/=2] /*[?=foo][./*[?=x]=2] /foo/x/>2 /*[?=foo]/*[?=x][.>2] The former returns nodes from the first level in the search, while the latter returns nodes from the second level in the search. | Yes, sorry about that. I think my proposed syntax (and approach) results in | simpler YPATH expressions (to read, write and execute). And also ambiguous; most of your examples assume one match and I think they fall apart in more complicated cases. ... I think it comes down to this: if it is in brackets, its only effect on the result is to filter... paths appearing in the brackets are discarded once their filtering purpose is over. Paths not in the brackets are part of the result... | Mine: | YPATH 1: /names/center/x | YPATH 2: /points/[?x=1]?y? | YPATH 3: /points/[?x=1]/name Ok. I'm still very uncertain what the above is *supposed* to do, let alone what the particular results below mean: | Result 1: [ *ROOT, *NAMES, *NMAP, *CENTER, *POINT, *X, *ONE ] | Result 2: [ *ROOT, *POINTS, *PMAP, *POINT, *Y ] | Result 3: [ *ROOT, *POINTS, *PMAP, *POINT, *VMAP, *NAME, *CENTER ] | > Note that in your results for 2 and 3, the node *X doesn't | > occur... this is because it is in a predicate. | | (I'd say: in a reachable path selector). Correct. Not in many of your examples... | what: /points/[?x/1]?y? | is: | # / | - select: root | - select-from-[keys-of]-current-node: | # /_points_ | value-is: points | # /points_/_ - - select-value-associated-with-current-node + - select-value-associated-with-current-PAIR | - select-[from-keys-of]-current-node-if: | # /points/_[_..._]_ | path-is-reachable-from-here: | # /points/[_?_...] - - select: here + select: keys | - select-[from-keys-of]-current-node-if: | # /points/[?_x_...] | - value-is: x | # /points/[?x_/_...] 
| - select-value-associated-with-current-node | - select-[from-keys-of]-current-node-if: | # /points/[?x/_1_] | value-is: 1 *NO MATCH* You need an operator here to match a value instead of a key; I suggest .=x, rewriting your expression: /points[?x/*[.=1]]?y? | # /points/[?x/1]_?_ | - select-[from-keys-of]-current-node-if: | # /points/[?x/1]?_y_ | value-is: y | # /points/[?x/1]?y_?_ | # DO NOT select-value-associated-with-current-node. | # If there wasn't a trailing ?, it would have been | # added automatically at the end. | | You can _see_ the YPATH engine going through the path. Kinda, but I think you have a lot of syntax sugar here. Can we find a syntax for discussing this that is easier to grok? You shot down my verbose one cuz you said it looked too querish. You try. Create a syntax where all structure is explicit using YAML. Please? ;) Clark |
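The shorthand-to-canonical rewriting used in this message (/a becoming /*[?=a]) can be sketched mechanically for the simplest case. The Python below is only an illustration under a strong assumption: every segment is a plain key name, with no predicates, '.', '..', '*' or '?' steps. The function name canonical is hypothetical.

```python
def canonical(path):
    """Expand plain-key shorthand into the canonical axis-plus-predicate form.

    Each segment `name` becomes `*[?=name]`: take the children axis, then
    keep only the pairs whose key equals `name`.
    """
    segments = [s for s in path.split("/") if s]
    return "".join(f"/*[?={s}]" for s in segments) or "/"


print(canonical("/a"))
print(canonical("/names/center/x"))
```

As the follow-up discussion notes, a real expansion is harder than this, because a shorthand step may have to adapt to whether the current node is a scalar, a key or a collection.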
From: Oren Ben-K. <or...@ri...> - 2002-08-07 07:08:26
|
Brian Ingerson [mailto:in...@tt...] wrote: > > | When the current node is a key node, we have *three* > > | alternatives. We can > > | stay there, selecting the key node. BTW, I don't see how > > | this is done in > > | your notion - and I refuse to give up the ability to > > | select the key node itself. > > Oren, > > I really wish you would give up here. I don't see a key being > an actual > node in a mapping, any more than an (implicit) index number > is a node in a sequence. I beg to differ. In Perl 6, Java, (Python?) and many others one is allowed to place an arbitrary object as a mapping key, including complex objects such as, well, complex numbers, points, user records (with first and last name sub-fields), or whatever else you want. Our information model is explicit in stating that a key in a mapping *is* a node like any other, and our syntax allows for it. It follows we should be able to select such nodes using YPATH. Have fun, Oren Ben-Kiki |
From: Brian I. <in...@tt...> - 2002-08-07 08:03:35
|
On 07/08/02 10:09 +0300, Oren Ben-Kiki wrote: > Brian Ingerson [mailto:in...@tt...] wrote: > > > | When the current node is a key node, we have *three* > > > | alternatives. We can > > > | stay there, selecting the key node. BTW, I don't see how > > > | this is done in > > > | your notion - and I refuse to give up the ability to > > > | select the key node itself. > > > > Oren, > > > > I really wish you would give up here. I don't see a key being > > an actual > > node in a mapping, any more than an (implicit) index number > > is a node in a sequence. > > I beg to differ. In Perl 6, Java, (Python?) and many others one is allowed > to place an arbitrary object as a mapping key, including complex objects > such as, well, complex numbers, points, user records (with first and last > name sub-fields), or whatever else you want. Our information model is > explicit in stating that a key in a mapping *is* a node like any other, and > our syntax allows for it. It follows we should be able to select such nodes > using YPATH. I'm still not sold. Just because a key is complex doesn't mean I'll want to query it. But I'll suspend my judgement for now until we get something a little more fine tuned to discuss. Cheers, Brian |
From: Oren Ben-K. <or...@ri...> - 2002-08-07 10:05:39
|
Clark C . Evans [mailto:cc...@cl...] wrote: > I kinda like what you are doing below, let me throw out a simpler > abstraction first, however. > > path = ( root '/' > | relative '' > ) segment+ > segment = axis predicate* > axis = self '.' > | key '?' > | parent '..' > | children '*' (value-side) > | descendent-or-self '/' (value-side) > | key-children '?*' (key-side) > | key-descendent-or-self > | ancestors Hmmm. You've got a point here; I should have cast my BNF in this form, except I had only two "axes" - '/' for "value-side" descendent-or-self and '?' for "key-side" descendent-or-self. You are probably right this isn't enough. The above BNF is buggy; for example, you'd write "<whatever>/.." and not "<whatever>.." that would result from the above. But the principle is OK. So, we both agree that: - The axis determines the set of nodes being examined. - It is followed by a predicate (what I called a 'selector') that filters this set. It now boils down to what sets we define and what are predicates on a set. It gets tricky because we want a nice syntax. If we keep it strict it gets ugly fast. Specifically, '/' seems to mean: - If the current node is a collection, the set of nodes is its keys. - If the current node is a scalar key node, the set of nodes is its value, and if *that* is a collection, its keys. - If the current node is a scalar, no match. Quite a mess! But that's what people expect. As you have shown, using sane axes that don't contain "if this then that" in their definition results in a horrid syntax. We'll have to be creative in defining the set of axes to work around this. It won't be easy. What we need to do is enumerate two things. First, the set of distinct, sensible axes (self, keys-of-this-collection, value-of-this-key-node, etc.). We should define some syntax for each - probably using a somewhat verbose scheme. Then we should define shorthand similar to '/' above for combinations that are "useful". I think we both agree here. 
I was indeed too hasty saying it is possible to resolve this in one simple syntax. Thanks for pointing this out. > For a syntax, use / to split segments, and [] > to mark each predicate (implicit _and_ between > each predicate). You lost me. Implicit? how? I think you are using 'predicate' here in a different way than you did above. > So, the distinction I'm trying to make between the 'selector' > and 'predicate' is that the selector chooses which direction > to go, and the predicate filters this choice based on direct > comparisons or any arbitrary expression. We should be careful with terminology here. I used 'selector' to mean what you defined here as 'predicate' and I'm using 'axis' to describe what you define here as 'selector'. Your BNF above also uses 'axis', *presumably* in the same sense. Let's try to settle wording here before we get completely confused. How about: - axis: specify a set of candidate nodes - predicate: Boolean condition used to filter nodes - selector: No such thing. Using it for either of the above is too confusing. Would that work for you? At any rate... > Thus, > /a becomes root() children() such that key() equals 'a' > / * [ ? = 'a' ] > /*[?=a] > > Now, we can have the /a "short-hand", but the compiled form > would look more like the thingy on the right. Well, we'll have to invest a lot of thought in the "canonical" syntax. An important point you seem to ignore is that shortcuts like '/' above are *powerful* in the sense they adapt to different documents; /a will match both: --- &THIS a --- a: &AND_THIS b ... So, converting such a shorthand to canonical form is *hard*. Consider the /*/a path applied to: --- simple: &MATCH1 a complex: { a: &MATCH2 b } ... A user would expect two matches, but each of these matches would require a different "canonical" step. 
Therefore /a, in canonical form, should look something like an enumeration of all the possibilities I gave defining the '/' shortcut above, *not* just a single possibility that happens to apply at a given moment. In a word: YIKES! > So... you are sort of right in that the selector "a" > is really both an axis plus a predicate. Thanks for > opening my eyes to this... it's clear now. Yes, and it isn't a pretty sight. It will take a lot of work to find something reasonable here. > That said, I don't like your notion of putting predicates > outside of the [] brackets. Let's keep the stuff inside > the [] more "general case" and the stuff to the left of > the predicates, the selector, for our syntax-sugar. We keep coming back to this. I am *really* missing something basic here. In my view, "a" is a predicate that says "a scalar node whose value is equal to 'a'". "[<path>]" is a predicate that says a "node such that <path> is reachable from it". This <path> can make full use of all YPATH predicates. Yes, *of course* the path matched within the [] is *not* taken to be part of the output. But in *every other respect*, this path obeys the same rules as any other YPATH. I keep getting the feeling that you think that there is some other difference. To better explain my view: *both* are a *function* that takes a path to some current node and a candidate node, and returns "yes" if the candidate node satisfies the predicate and "no" otherwise. It follows that the "is-path-reachable" predicate doesn't insert anything into the final result of the YPATH expression, since *none of the predicates* do such insertions; these insertions are done by the YPATH engine itself *if* the predicates say it is OK to do so. For example, you have written: > /foo[?x/=2] First, this doesn't make sense, since it means "a scalar node whose value is 'foo' and that also contains a key named x whose value is 2". Obviously a node can't be both a scalar and a collection at the same time. Right? 
I'm also guilty of writing such a notation, and I apologize for it :-) Let's pretend this was written as /foo/[../x/>2], which can work. > /foo/x/>2 > > These would be different (I'm sort of guessing on your > syntax). In the former, you are selecting the "foo" node > only if it passes the given filter. Here we go again. I'm at the root node. '/' tells me to look at some set of nodes and apply some predicate(s) on them to select the next node in the final resulting path. 'foo' is just another *predicate*. It selects all nodes that are scalar nodes and whose value is 'foo'. It isn't special in any way. I do *NOT* select the "foo" node only if it passes some [] filter. I select *any* of the nodes in the set that satisfy *both* the "foo" predicate and the "[]" predicate. There's absolutely nothing whatsoever special about '/foo' as compared to '/[?x/>2]' or as compared to '/!int' or as compared to any other predicate, simple or complex, about the value, the type, reachable paths, or anything else. The expression is better written as /( foo & [../x/>2] ). As a side note, I'm thinking that we don't need the '&' sign at all; a list of predicates would mean "and" and '|' would mean "OR". This would make /foo[../x/>2] legal and have the same semantics as the version with the '&'. This expression would match: --- foo: &MATCH bar x: 3 ... > In the latter case, you are selecting foo's child, "x" if it > passes the filter >2 ... No. The path is "/foo/x/>2". - Start at the root node. - From the set of nodes defined by '/' relative to this node, select the scalar node whose value is 'foo'. - From the set of nodes defined by '/' relative to this node, select the scalar node whose value is 'x'. - From the set of nodes defined by '/' relative to this node, select the scalar node whose value is '>2'. - (Implicit) If this node is a key node, select the value node associated with it. 
If you think about it for a second, foo's child "x" can't pass the filter ">2" because it is a string whose value is "x". It is the value associated with the "x" node - that is being selected and has to pass the filter. > there is a distinction between paths you use within > a predicate but don't select, and those that are not in the predicate. There's no such thing as "paths used outside a predicate". There are path *segments* used outside a predicate. And of course there's a distinction between segments inside [] and segments outside it. Segments inside [] are part of a complex Boolean expression used to filter out nodes and other than such filtering do not contribute anything to the final result. Segments outside [] predicates contribute nodes to the final result. But that is the *only* difference. > If you re-write these in the canonical form I gave above: > > /foo[?x/=2] /*[?=foo][./*[?=x]=2] > /foo/x/>2 /*[?=foo]/*[?=x][.>2] > > The former returns nodes from the first level in the search, > while the second returns nodes from the second level in the search. Sigh. Again you disregard key nodes as being true nodes. I think it is obvious from the syntax that /foo[?x/>2] selects a node in the first level of the search and "/foo /x />2" selects a node from the *third* level of the search. --- foo: x: > 3 It is the "3" that is selected. Right? :-) In fact, if you count nodes rather than steps, you'll see there are two additional mapping nodes, so the total path has 5 nodes: --- !map &N1 !str &N2 foo: !map &N3 !str &N4 x: !str &N5 3 > | ... I think my proposed syntax (and approach) results in > | simpler YPATH expressions (to read, write and execute). > > And also ambiguous; most of your examples assume one match > and I think they fall apart in more complicated cases. Right. Sorry. I was over-optimistic; I haven't considered that we'll need many shorthand forms for the simpler basic axis forms. It does need more work. > ... 
>
> I think it comes down to this, if it is in brackets, its only
> effect on the result is to filter...
> paths appearing in the brackets
> are discarded once their filtering purpose is over.
> Paths not in the brackets are part of the result...

*Obviously*. We are in violent agreement here. But that is the only
distinction. Other than insertion of path edges to the end result, all axes
and predicates should be available and identical inside and outside [].

> | Mine:
> | YPATH 1: /names/center/x
> | YPATH 2: /points/[?x=1]?y?
> | YPATH 3: /points/[?x=1]/name
>
> Ok. I'm still very uncertain what the above is *suppose* to do,

I think the step-by-step sequence I gave at the end shows that...
Admittedly my definition of the semantics of '/' was naive. The true
semantics is much more complex (as I've defined above).

> let alone what the particular results below mean:
>
> | Result 1: [ *ROOT, *NAMES, *NMAP, *CENTER, *POINT, *X, *ONE ]

It means:

- We have a nodes graph as per the YAML information model.
- We started at the *ROOT node.
- The *ROOT node has an edge linking it to the *NAMES node. Move along it.
- The *NAMES node has an edge linking it to the *NMAP node. Move along it.
- The *NMAP node has an edge linking it to the *CENTER node. Move along it.
- The *CENTER node has an edge linking it to the *POINT node. Move along it.
- The *POINT node has an edge linking it to the *X node. Move along it.
- The *X node has an edge linking it to the *ONE node. Move along it.

That is: A path through the graph. The more detailed form of the above:

    Result 1: [ *ROOT, '/', *NAMES, ':', *NMAP, '/', *CENTER, ':',
                *POINT, '/', *X, ':', *ONE ]

Means:

- We have a nodes graph as per the YAML information model.
- We started at the *ROOT node.
- The *ROOT node has an edge linking it to the *NAMES node. This edge is of
  the type "link from a collection to a contained key node". Move along it.
- The *NAMES node has an edge linking it to the *NMAP node.
  This edge is of the type "link from a key node to the value associated
  with it". Move along it.
- The *NMAP node has an edge linking it to the *CENTER node. This edge is
  of the type "link from a collection to a contained key node". Move along
  it.
- The *CENTER node has an edge linking it to the *POINT node. This edge is
  of the type "link from a key node to the value associated with it". Move
  along it.
- The *POINT node has an edge linking it to the *X node. This edge is of
  the type "link from a collection to a contained key node". Move along it.
- The *X node has an edge linking it to the *ONE node. This edge is of the
  type "link from a key node to the value associated with it". Move along
  it.

The same path, with an annotation of the type of each edge.

> Can we find a syntax for discussing this that is easier
> to grok? You shot down my verbose one cuz you said it
> looked to querish. You try. Create a syntax where all
> structure is explict using YAML. Please?

What makes it complex is the fact that things like '/' get mapped to a
selection between several basic operations, sometimes even between
different *series* of basic operations. Assuming we want the syntax to work
at the basic operation level, we'll have to do something like this:

    YPATH: # sequence of segments
      - # Single segment:
        axis-alternatives: # here can be more than one
          - [ <basic-step>, ... ]
        predicate: # Boolean expression tree
          type: <predicate-type>
          <param-name>: <param-value>
          ...

Example: /(a|b)

    YPATH:
      - axis-alternatives: # This is what '/' means. Yikes!
          - [ keys-of-collection ]
          # This is a multi-step. First, get the value node
          # associated with the key node, then get all the
          # keys contained in the collection node you have got.
          - [ value-of-key-node, keys-of-collection ]
          # This is similar, for when the value node is a scalar.
          - [ value-of-key-node, scalar-value-node-itself ]
        predicate:
          type: or-of-predicates
          predicates: # Being OR-ed
            - type: value-of-scalar-node-is
              value: a
            - type: value-of-scalar-node-is
              value: b
      # Unless there is a trailing '?' or something,
      # There is an implicit final step as follows:
      - axis-alternatives:
          - [ value-of-key-node ]
          - [ scalar-value-node-itself ]
        predicate:
          type: true

I think that does the trick. What the YPATH engine does is maintain a set
of prefix paths that potentially match the full Path expression.

- It starts with the prefix set being just [ [ *START-NODE ] ]. This will
  be the *ROOT unless Path is called relative to a specific node (e.g.,
  from a [] predicate).
- For each segment in the YPATH,
  - For each potential prefix in the prefix set,
    - Remove path prefix from the prefix set.
    - For each alternative axis,
      - Compute the resulting node set given the prefix path;
      - For each node in the (possibly empty) node set,
        - Apply predicate(s) to node;
        - If it didn't pass, continue;
        - Add [ <prefix>, node ] to the prefix set;
        - Or, if the axis was parent or ancestor, remove some nodes from
          the prefix instead.
- Prefix set now contains all matched paths.

Note that "apply predicates" may invoke another instance of the YPATH
engine relative to the current node (for [] predicates). The computed
matched paths are discarded; all that is of interest is whether there were
any.

Whew. This thing is tricky.

Have fun,

    Oren Ben-Kiki
|
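
The prefix-set loop described in the message above can be sketched in
Python roughly as follows. This is a sketch under stated assumptions, not
the actual engine: KeyNode, the three basic-step functions, scalar_is,
evaluate, SLASH and FINAL are hypothetical names; the graph is modelled
with plain dicts and scalars (sequences are ignored); and the parent and
ancestor axes, which would shorten a prefix rather than extend it, are left
out.

```python
class KeyNode:
    """A key treated as a node in its own right (hypothetical helper)."""

    def __init__(self, mapping, key):
        self.mapping, self.key = mapping, key

    @property
    def value(self):
        return self.mapping[self.key]


# Basic steps: each takes a prefix path (a list of nodes) and returns the
# list of paths it leads to.
def keys_of_collection(path):
    node = path[-1]
    if isinstance(node, dict):
        return [path + [KeyNode(node, k)] for k in node]
    return []


def value_of_key_node(path):
    node = path[-1]
    return [path + [node.value]] if isinstance(node, KeyNode) else []


def scalar_value_node_itself(path):
    # Stays on the current node, so the path is not extended.
    node = path[-1]
    return [] if isinstance(node, (dict, list, KeyNode)) else [path]


def scalar_is(value):
    """value-of-scalar-node-is: compares a key's or scalar's value."""
    return lambda n: (n.key if isinstance(n, KeyNode) else n) == value


def evaluate(segments, start):
    """segments: list of (axis_alternatives, predicate) pairs."""
    prefixes = [[start]]
    for alternatives, predicate in segments:
        matched = []
        for prefix in prefixes:
            for steps in alternatives:      # each alternative is a multi-step
                paths = [prefix]
                for step in steps:
                    paths = [q for p in paths for q in step(p)]
                matched.extend(p for p in paths if predicate(p[-1]))
        prefixes = matched                  # old prefixes are dropped
    return prefixes


# What '/' expands to, and the implicit final step.
SLASH = [[keys_of_collection],
         [value_of_key_node, keys_of_collection],
         [value_of_key_node, scalar_value_node_itself]]
FINAL = ([[value_of_key_node], [scalar_value_node_itself]], lambda n: True)

graph = {"a": 1, "b": {"c": 2}, "d": 3}
a_or_b = lambda n: scalar_is("a")(n) or scalar_is("b")(n)
result = evaluate([(SLASH, a_or_b), FINAL], graph)        # /(a|b)
nested = evaluate([(SLASH, scalar_is("b")),
                   (SLASH, scalar_is("c")), FINAL], graph)  # /b/c
```

In this model every node visited, key nodes included, ends up in the
matched path, so a two-step expression such as /b/c yields a path of five
nodes, in line with the node count given earlier in the message.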
From: Clark C . E. <cc...@cl...> - 2002-08-05 13:19:30
|
On Mon, Aug 05, 2002 at 10:11:56AM +0300, Oren Ben-Kiki wrote:
| Clark C . Evans [mailto:cc...@cl...] wrote:
| > summary: >
| >
| >   This is a brief introduction to YPATH as inspired by XPATH, the
| >   XML Path node selection language. XPATH has a very rich history
| >   and is, IMHO, one of the better things to emerge from XML land.
|
| Nice work, Clark. It will take a bit of work to get the details exactly
| right but I like the approach of YPATH's output being a list of *paths* (a
| better name, I think, than "contexts"). It makes things clear and simple all
| around.

It certainly makes the role of a YPATH language clear and distinct
from a YQUERY or CYATL or other "graph building" language.

| - Why do you need the { key: ~, value: *ROOT } pair? It is the first entry
| in *every* path, and as such it seems it can be omitted. Further, I want to
| be able to "apply-path-to-graph(path, graph)". Placing a reference to the
| root at the start of the path kills this useful ability.

You need to have the root node stored since inside of a YPATH
you may need to jump up to the "top" of the tree within a predicate.

    input:
      - name: One
        test: xxx
      - name: Two
        test: yyy
      - name: Three
        test: xxx
    path: /*/name[../test=/0/test]
    returns:
      - /0/name   # One
      - /2/name   # Three

| - Structured keys will make this... interesting... Probably something like
| ".../[predicate-on-key]/..." would work. You'd probably need to store the
| entire key inside the resulting path.

Hmm. Well, the construct the current implementation returns is
really a stack of (key/value) pairs detailing the current path;
as such just a pointer to the key is needed (for the in-memory
representation). As for the path representation of a context
having structured keys... yes, this is a very tough bugger. ;(

| - Note it is OK to put in the resulting paths a reference to any immutable
| node in the graph (which would cover most scalars and, hopefully, most
| structured keys). Mutable stuff needs to be copied, I'm afraid...
Well, if it is used as a key, it really can't be mutable... let's
keep on this issue as to how to handle structured keys; the
solution may not be pretty.

| - There are issues of quoting to be resolved... you wrote:
|
| > - item: []
| >   role: >
| >     This describes a predicate. The predicate acts as
| >     a "filter" on the current selected contexts, knocking
| >     out those that don't return true. Within a predicate
| >     strings are treated as literals, unless prefixed with
| >     ./ or / or some other way that identifies a subordinate
| >     path expression.
|
| Which implies some sort of different quoting/interpretation rules for
| strings inside predicates. I'm not certain what you have in mind, but I
| think the rules should be the same inside and outside predicates.

The quoting rules are the same. The interpretation will be different,
as anything that must be a "path" inside a predicate must start with
a './' or '/' or '../' or some other path-ish marker. The current
Python implementation is a bit fuzzy here... I don't think making the
interpretations uniform is needed, as the two constructs are so
dramatically different from each other.

| - You also need predicates matching on type; Boolean operations
| (and/or/etc.); relative positions in lists; and probably other
| embellishments.

Yes, functions, more operators, and a slurry of other useful stuffs.

Best,

Clark
|
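
The /*/name[../test=/0/test] example in the message above can be reproduced
with a small Python sketch. It assumes a context is a stack of (key, value)
pairs whose first entry is (None, root), as the message describes; the
helper names (children, child, parent, root, values, as_path) are
hypothetical, the path is evaluated by hand rather than parsed, and this is
not the current Python implementation.

```python
data = [
    {"name": "One", "test": "xxx"},
    {"name": "Two", "test": "yyy"},
    {"name": "Three", "test": "xxx"},
]


def children(ctx):
    """'*': every child context of the node on top of the stack."""
    _, node = ctx[-1]
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        items = ()
    return [ctx + [(k, v)] for k, v in items]


def child(ctx, key):
    """A named step such as 'name' or '0'."""
    return [c for c in children(ctx) if c[-1][0] == key]


def parent(ctx):
    """'..': drop the top of the stack."""
    return ctx[:-1]


def root(ctx):
    """A leading '/' inside a predicate: only possible because the root
    pair is stored at the bottom of every context."""
    return ctx[:1]


def values(ctxs):
    return [c[-1][1] for c in ctxs]


def as_path(ctx):
    return "/" + "/".join(str(k) for k, _ in ctx[1:])


start = [(None, data)]
selected = []
for item in children(start):                       # /*
    for name in child(item, "name"):               # /name
        lhs = child(parent(name), "test")          # ../test
        rhs = [t for zero in child(root(name), 0)  # /0/test
               for t in child(zero, "test")]
        # The predicate holds if any left value equals any right value.
        if any(a == b for a in values(lhs) for b in values(rhs)):
            selected.append(name)

paths = [as_path(c) for c in selected]
```

The root() helper is the reason the (None, root) pair sits at the bottom of
each stack: without it, the /0/test side of the predicate could not be
evaluated from inside a context.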
From: Robert B. <rh...@bi...> - 2002-08-05 21:11:14
|
On Mon, Aug 05, 2002 at 09:30:25AM -0400, Clark C . Evans wrote:
> On Mon, Aug 05, 2002 at 10:11:56AM +0300, Oren Ben-Kiki wrote:
> | Clark C . Evans [mailto:cc...@cl...] wrote:
> | > summary: >
> | >
> | >   This is a brief introduction to YPATH as inspired by XPATH, the
> | >   XML Path node selection language. XPATH has a very rich history
> | >   and is, IMHO, one of the better things to emerge from XML land.
> |
> | Nice work, Clark. It will take a bit of work to get the details exactly
> | right but I like the approach of YPATH's output being a list of *paths* (a
> | better name, I think, than "contexts"). It makes things clear and simple all
> | around.
>
> It certainly makes the role of a YPATH language clear and distinct
> from a YQUERY or CYATL or other "graph building" language.

Yes, no. I don't know :-)

Yes, it may be a good idea to have a small, clear, standalone YPath. But:
the pattern I see is that XPath is a good motor to select nodes out of a
structured document AND to trigger some action. In XQuery it is the
construction of some XML snippet. In XPathScript (-> AxKit) it is a bit of
Perl code to do nifty things; in XSLT it's the template body.

I see here a common pattern and I wonder whether Yaml offers an opportunity
to homogenize things.

Just my few, worthless stocks.

\rho
|