RE: [Yaml-core] getting started on ypath

Fun as this is, I think we should settle the "what is a string" issue first
so I can update the spec.

At any rate...

Clark C . Evans [mailto:cc...@cl...] wrote:
> | The way I see it, YPATH works as a series of "steps" where at each
> | step you are at some node and move ahead to another node. Eventually
> | you end up at some node that is the selected one. The end result of
> | evaluating the YPATH is a direct path from the root node to the
> | selected node. When the node may be reached via several paths, the
> | return path is the one traversed during the YPATH evaluation.
> 
> Yes; but I'd normalize the path so that stuff like  /*/../*/..
> just returns the root node. 

Yes; that's what I meant by saying "the direct path" from the root to the
node.
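
A minimal sketch of that normalization, in Python. It assumes the traversed
path is recorded as a list of concrete steps (wildcards already resolved to
the key actually visited), which is an illustration and not something the
spec defines. The function name `normalize` is hypothetical.

```python
def normalize(steps):
    """Collapse '.' and '..' steps into a direct path from the root."""
    direct = []
    for step in steps:
        if step == "..":
            if direct:              # assumption: '..' at the root stays at the root
                direct.pop()
        elif step != ".":
            direct.append(step)
    return direct

# '/*/../*/..' visits a child and returns to its parent, twice,
# so the direct path is empty: the root itself.
print(normalize(["a", "..", "b", ".."]))
print(normalize(["points", "0", "..", "1", "x"]))
```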

> | selector" - it is an instruction on how to select the next node from
> | the set of nodes reachable from the current node. This means there's
> | no inherent distinction between:
> 
> Exactly.  And the stuff between the / starts with a segment which
> selects the next direction to go, and is followed by zero or more
> predicates which can be used to filter the result set but which
> do not select anything.

I just don't get it. Please explain why selecting on the value of a key is
not just another form of the general operation of "filtering a result set".
In my view both are a way to select a particular set of zero or more nodes
out of a set of candidate nodes. I don't care whether I'm selecting them
according to having a specific value, the existence of paths starting in
them, having a particular type family, or whatever other criteria we come up
with.

You said:

> path_segment = selector ( '[' predicate ']' )*
> selector     = key | '*' | '.' | '..'
> path         = '/'? path_segment ( '/' path_segment)*
> 
> Selections done within the selector are included in the output,
> but selections/computations done within the predicate are not.

I still don't get it. You artificially divided the set of selectors into two:
everything that is based on paths (what you call a predicate) and everything
that isn't (presumably what you call a selector). You then, for no obvious
reason, require exactly one from the 'selector' group, followed by an
optional set from the second group (is this really what you had in mind?).
It all looks completely arbitrary and complicated.

In my view, there's no such distinction. "[relative-path]" is a selector.
"foo" is a selector. I can have one. I can have the other. I can have both.
I can have two of one and three of the other, as in:

/(foo&[?x=1])|(bar&[?y=3])|(baz&([?x=1]&[?y=2]))/x

That is:

path = '/' path_segment
       ( ( '/' | '?' ) path_segment )*
path_segment = or_cond
or_cond = and_cond
        | and_cond '|' or_cond
and_cond = simple_cond
         | simple_cond '&' simple_cond
simple_cond = '(' path_segment ')'
            | value_cond
            | type_cond
            | reachable_cond
            | rel_cond
value_cond = simple_value # Checks node value
           | '*'          # Matches any value.
           | regexp       # The above may be one...
           | compare_cond # > < etc. for numbers & dates
           | ...
type_cond = '!' simple_value # Specific type
          | '!' '*'          # Seems prudent
          | '!' regexp       # The above may be one...
          | ...
reachable_cond = '[' or_path ']'
or_path = and_path
        | and_path '|' or_path
and_path = path
         | path '&' path
rel_cond = '..' # parent
         | ??   # descendent
         | ??   # ancestor
         | '@' number # @-1 : prev seq sibling?

OK, I don't know what to write for 'descendent' and 'ancestor' above, and
some of the details are surely wrong. Nevertheless, I think this is the
cleanest, simplest way to go about it: no artificial separation into classes
of selectors. For example, I see no big difference between /foo[?x/>2] and
/x/>2; you seem to consider comparisons to be something limited to
"predicates" because "they aren't nodes". I'm rather baffled by this.

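One way to picture "everything is a selector" is as a set of conditions that
compose freely with '&' and '|'. The Python sketch below is only an
illustration under stated assumptions: candidates are the (key, value)
entries of a mapping, `key_is` stands in for a simple value condition on the
key node, and `has_entry` stands in for a reachable-path condition such as
[?x=1]. All function names and the sample data are hypothetical.

```python
def key_is(expected):
    """Simple value condition: the candidate's key equals `expected`."""
    return lambda key, value: key == expected

def has_entry(name, expected):
    """Stand-in for a reachable-path condition such as [?x=1]."""
    return lambda key, value: isinstance(value, dict) and value.get(name) == expected

def both(a, b):
    """The '&' combinator."""
    return lambda key, value: a(key, value) and b(key, value)

def either(a, b):
    """The '|' combinator."""
    return lambda key, value: a(key, value) or b(key, value)

def select(mapping, cond):
    """Keep the entries of `mapping` that satisfy `cond`."""
    return {k: v for k, v in mapping.items() if cond(k, v)}

root = {
    "foo": {"x": 1, "y": 2},
    "bar": {"x": 1, "y": 2},
    "baz": {"x": 1, "y": 2},
}

# (foo&[?x=1])|(bar&[?y=3])|(baz&([?x=1]&[?y=2]))
cond = either(
    both(key_is("foo"), has_entry("x", 1)),
    either(
        both(key_is("bar"), has_entry("y", 3)),
        both(key_is("baz"), both(has_entry("x", 1), has_entry("y", 2))),
    ),
)
selected = select(root, cond)
print(sorted(selected))
```

Every condition has the same shape, so none of them needs a special
"selector" or "predicate" role in order to be combined with the others.
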
> BTW, you completely cut out an example showing the difference
> between these two without addressing it.

Do you mean:

> Let's introduce some character ? for now, to inform 
> of recursion down into the key rather than the value.
> 
>   [./x=1]   # goes down the value side...
>   [?/x=1]   # goes down the key side...

Yes, sorry about that. I think my proposed syntax (and approach) results in
simpler YPATH expressions (to read, write and execute). Compare:

Mine:
    YPATH 1: /names/center/x
    YPATH 2: /points/[?x=1]?y?
    YPATH 3: /points/[?x=1]/name

Yours:
    YPATH 1: /names/center/x
    YPATH 2: /points/*/?/y[../x=1]
    YPATH 3: /points/*/name[../?/x=1]

You have said yourself:
> I wonder if there is a cleaner syntax for managing 
> keys which keeps the "simple" YPath's "simple".

There is. See above :-)

> Note that in your results for 2 and 3, the node *X doesn't
> occur... this is because it is in a predicate.

(I'd say: in a reachable path selector). Correct.

> I was thinking of Steve's complaint about complexity.  How about
> we have an Xpath return a sequence of components: (a) the root node,
> (b) the path taken, (c) the result.  In this way, those who want
> to ignore one or more components can easily do so.
> 
>   Result 1:
>    - 
>     root: *ROOT
>     path: [ *NAMES, *CENTER, *X ] 
>     node: *ONE

You have omitted some nodes from the path (NMAP and POINT). At any rate,
using a simple sequence also works:

Result 1: [ *ROOT, '/', *NAMES, ':', *NMAP, '/', *CENTER, ':', *POINT, '/',
*X, ':', *ONE ]

The root is the first member, the selected node is the final member. Pretty
easy to get to them if you aren't interested in anything else.
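
As a small Python illustration of how such a flat sequence would be consumed
(the plain strings are placeholders for the aliased nodes *ROOT, *NAMES and
so on; they are assumptions made for the example, not real node objects):

```python
# Placeholder strings stand in for the aliased nodes of Result 1.
result = ["ROOT", "/", "NAMES", ":", "NMAP", "/", "CENTER", ":", "POINT",
          "/", "X", ":", "ONE"]

root = result[0]            # the root is the first member
selected = result[-1]       # the selected node is the final member
nodes = result[0::2]        # nodes sit at the even positions
separators = result[1::2]   # '/' and ':' markers sit between them

print(root, selected)
print(nodes)
```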

> | Of course, there are various alternative syntax forms and path
> | representations that would go with it; the above just gives
> | some examples.
> 
> Nods.  But also I think that you see the distinct

? It seems something got clipped here...

> Hmm.  The traverse operator is indeed less simple.  *boggles*

I think that if you use /<something>/ to represent it, rather than '//',
then it actually becomes pretty simple to include in the scheme. My
scheme, that is :->

> How about we move
> to a more verbose syntax for a spell, perhaps one using YAML so
> that the structure is more clear?
> 
> ---
> what: /
> is:
>    select: current-node
>    from: root
> ---
> what: /a
> is:
>    select: value-node
>    from:
>        select: child-pairs
>        from: root
>    where:
>      - 
>        operator: equals
>        rhs: 
>           select: key-node
>           from: current-node
>        lhs: a

Are you *certain* you don't mean YPATH to be a YQUERY? The above sure smells
of it. Ugh!

> Yes, it's verbose, but it could probably help.  Also,
> it is rather canonical...

Nice notion, and it demonstrates just how complex your approach is.
Consider:

what: /a
is:
    - select: root
    # selects from keys-of if a collection,
    # or from the node itself if a scalar.
    - select-[from-keys-of]-current-node-if:
        # The selection criteria
        value-is: a

Isn't that *so much* simpler?

> It seems that we are close.  You seem to be abbreviating
> ?/x as ?x -- I understand, but don't think that this 
> short-cut is wise... and I can't explain why yet.

I don't know. I keep getting the feeling that you and I are talking about
very different approaches - something basic is different. I don't know if I
can phrase it in a single sentence, though.

> As for Path #2, I can't figure out how I could possibly
> interpret what you wrote...  

# Note: x/1 rather than x=1. My mistake!
what: /points/[?x/1]?y?
is:
# /
    - select: root
    - select-[from-keys-of]-current-node-if:
# /_points_
        value-is: points
# /points_/_
    - select-value-associated-with-current-node
    - select-[from-keys-of]-current-node-if:
# /points/_[_..._]_
        path-is-reachable-from-here:
# /points/[_?_...]
            - select: here
            - select-[from-keys-of]-current-node-if:
# /points/[?_x_...]
                value-is: x
# /points/[?x_/_...]
            - select-value-associated-with-current-node
            - select-[from-keys-of]-current-node-if:
# /points/[?x/_1_]
                value-is: 1
# /points/[?x/1]_?_
    - select-[from-keys-of]-current-node-if:
# /points/[?x/1]?_y_
        value-is: y
# /points/[?x/1]?y_?_
    # DO NOT select-value-associated-with-current-node.
    # If there wasn't a trailing ?, it would have been
    # added automatically at the end.

You can _see_ the YPATH engine going through the path.
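
The same step-by-step reading can be sketched in Python for the simpler /a
case, including the trailing-? rule. This is only a rough sketch under
assumptions: mappings are plain dicts, anything else is treated as a scalar,
and the function and parameter names are hypothetical.

```python
def select_if_value_is(node, wanted):
    """Select from the keys of a mapping, or the node itself if it is a scalar."""
    candidates = list(node) if isinstance(node, dict) else [node]
    return [c for c in candidates if c == wanted]

def evaluate_key_step(root, wanted, trailing_question=False):
    """Evaluate '/<wanted>' (or '/<wanted>?') against `root`."""
    current = root                                # - select: root
    keys = select_if_value_is(current, wanted)    # - select-[from-keys-of]-...-if
    if trailing_question or not isinstance(current, dict):
        return keys                               # stay on the key side
    return [current[k] for k in keys]             # implicit value step at the end

doc = {"a": 1, "b": 2}
print(evaluate_key_step(doc, "a"))                          # value side
print(evaluate_key_step(doc, "a", trailing_question=True))  # key side
```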

Have fun,

	Oren Ben-Kiki