summary: >
This is a brief introduction to YPATH as inspired by XPATH, the
XML Path node selection language. XPATH has a very rich history
and is, IMHO, one of the better things to emerge from XML land.
This is meant as a starting point for discussion. I "cowboyed"
an implementation of this that is included in the Python
distribution, but that doesn't mean it should be final.
Indeed... a lot of thought needs to go into this thingy.
xpath:
why: >
This implementation is modeled somewhat after XPATH. Although
I disagree wholeheartedly with where xpath20 is going (it's
moving more towards a full-blown query language), I think that
the original xpath is rather clean and inspiring.
specification: |
http://www.w3.org/TR/xpath
http://www.w3.org/TR/xpath20req
http://www.w3.org/TR/xpath20/
history: >
XPATH was originally part of a larger XML Stylesheet (XSL)
specification that was later broken into three specifications:
XPATH at the bottom, XSLT dependent upon XPATH, and then XSL-FO.
This split of XSL occurred primarily because XPointer (now
somewhat defunct) had a similar set of requirements. As it
turns out, XML Query and many other specifications now rely
directly upon XPath without requiring XSLT.
operation: >
XPath is what I call a node selection language. Each expression
"marks" a set of nodes within an XML tree. In an abstract sense,
XPath doesn't actually modify, merge, or otherwise generate
nodes -- it merely acts as a selector.
imperfections: >
Of course, since it is part of the XML family of specifications, it
is imperfect. Namely, due to XML's clouded information model,
how XPath behaves is also clouded. Furthermore, the characterization
above isn't quite perfect, since XPath can return scalar values
based on computations on an input tree; count(//.) for example
returns a count of the number of nodes in a given tree.
The current proposal for 2.0 is an absolute mess. They've
included a bunch of XML Query features (building an output tree)
that just don't belong in such a low-level tool. Bad mistake;
it will truly haunt them in the future. Note that James Clark,
the original creator/visionary of XPath/XSLT, is no longer part
of this unnecessary expansionist agenda.
overview: >
Without a clear vision for YPATH, we will quickly get ourselves
into very deep and muddy territory. We have the advantage here of
being able to leverage the fantastic work of the XML community, in
that we can pick and choose what we like from XPATH and leave out
those things we dislike. In particular, we must have a clear line
as to what is part of YPath and what is out-of-scope. Please
consider the following as my first take at this scope, the specific
challenges with regard to our scope, and a proposed implementation.
vision: >
YPATH should be a node selection language. It should take a YAML
graph (starting at a root node) and an expression and return a
set of nodes which match the expression.
contexts: >
Unlike XML, YAML is a graph, thus a node may appear more than once
in a given result. As such, it's not adequate to just return the
nodes which match; one must really return the *context* which matches.
By context I mean a stack of key/value pairs leading from the root node
(with a null key) to the selected node, where the index is used as the key
for arrays. Thus YPATH, unlike XPATH, should return a set of contexts
rather than a set of nodes.
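contexts-sketch: |
One way to picture a context is the rough Python sketch below. It is an
illustration only, not the code from the Python distribution: the names
Context and child_contexts are made up here, and the YAML graph is assumed
to be loaded as plain dicts, lists and scalars.

```python
from typing import Any, List, Tuple

# A context is a stack of (key, value) pairs from the root to the match.
Context = List[Tuple[Any, Any]]


def child_contexts(ctx: Context) -> List[Context]:
    """Extend a context by one step for each child of its last value."""
    _, node = ctx[-1]
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))  # the index serves as the key
    else:
        items = []                     # scalars have no children
    return [ctx + [(key, value)] for key, value in items]


# The same node reachable by two routes, as with a YAML alias.
shared = {"given": "Sammy"}
root = {"first": shared, "again": shared}
root_ctx: Context = [(None, root)]     # the root pair has a null key

for ctx in child_contexts(root_ctx):
    print([key for key, _ in ctx])
# prints [None, 'first'] and then [None, 'again']
```

Both contexts end at the very same node, which is why returning the bare
node would lose the information about how it was reached.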
wrinkles: >
I haven't the foggiest idea how to do structured keys
with YPATH yet...
components: >
A YPath expression is composed recursively of two sorts
of constructs. The first is a "segment" which details how
the YPath processor should move within the graph. Segments
are nested using a slash (/) for a divider. The second is a
"predicate" which evaluates to a true/false value. If the
predicate is true, then the node selected by the nearest segment
is allowed through.
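components-sketch: |
The segment/predicate split described above might look roughly like this
in Python. This is a sketch under assumptions: split_segment and
allow_through are hypothetical names, nested brackets are not handled, and
the predicate is written by hand as a Python function rather than parsed
from the expression text.

```python
import re


def split_segment(text):
    """Split 'segment[predicate]' into (segment, predicate or None)."""
    match = re.fullmatch(r"([^\[\]]*)(?:\[(.*)\])?", text)
    if match is None:
        raise ValueError("malformed segment: %r" % text)
    return match.group(1), match.group(2)


def allow_through(contexts, predicate):
    """Keep only the contexts for which the predicate is true."""
    return [ctx for ctx in contexts if predicate(ctx)]


players = [
    {"given": "Sammy", "family": "Sosa"},
    {"given": "Ken", "family": "Griffey"},
    {"given": "Mark", "family": "McGwire"},
]
root = {"player": players}

# Contexts the segment "*" would select under /player.
selected = [
    [(None, root), ("player", players), (index, player)]
    for index, player in enumerate(players)
]

segment, predicate_text = split_segment("*[./family=Sosa]")
# Hand-written stand-in for the predicate text "./family=Sosa".
kept = allow_through(selected, lambda ctx: ctx[-1][1].get("family") == "Sosa")
print(segment, predicate_text, [ctx[-1][0] for ctx in kept])
# prints: * ./family=Sosa [0]
```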
input: |
--- &ROOT
player: &PLAYER
- &SOSA
given: Sammy
family: Sosa
- &KEN
given: Ken
family: Griffey
- &MARK
given: Mark
family: McGwire
examples:
-
path: /
result:
-
- { key: ~, value: *ROOT }
-
path: /nothing
result: ~
-
path: /player
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
-
path: /player/0
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
-
path: /player/*
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 1 , value: *KEN }
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 2 , value: *MARK }
-
path: /player/*/given
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
- { key: given , value: Sammy }
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 1 , value: *KEN }
- { key: given , value: Ken }
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 2 , value: *MARK }
- { key: given , value: Mark }
-
path: /player/.
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
-
path: /player/*[./family=Sosa]
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
-
path: //*[./family=Sosa]
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
-
path: //*[../family=Sosa]
result:
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
- { key: given , value: Sammy }
-
- { key: ~ , value: *ROOT }
- { key: player , value: *PLAYER }
- { key: 0 , value: *SOSA }
- { key: family , value: Sosa }
basics:
- item: /
role: >
The slash selects the (~, root node) pair and
acts as a divider between path segments.
- item: key
role: >
This selects the (key,value) pair directly
under the current collection. If key is an
integer this selects the nth item in a given
list as well.
- item: .
role: >
This selects the current pair once again, useful
for obtaining a pointer to the current context
directly inside a predicate (denoted by [])
- item: ..
role: >
This selects the parent of the current pair.
- item: //
role: >
This traverses all descendant nodes and executes
the remaining path expression on each one, returning
the union.
- item: []
role: >
This describes a predicate. The predicate acts as
a "filter" on the current selected contexts, knocking
out those that don't return true. Within a predicate
strings are treated as literals, unless prefixed with
./ or / or some other way that identifies a subordinate
path expression.
- item: =
role: >
Valid only within a predicate. It compares the contents
of a node's value with a string or with the contents of
another node's value.
notes: >
An open issue is how to treat this when neither the rhs
nor the lhs is a constant. XPath treats it as the
"intersects" operator in that case; I think it should
return an error instead, keeping the intersect
operator distinct.
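basics-sketch: |
To make the simpler items above concrete, here is a small Python sketch
that handles only "/", key, "*", "." and "..". It is not the evaluator
from the Python package; the names children, step and select are
hypothetical, "//" and predicates are left out, keys are matched by their
string form, duplicate contexts are not merged, and a path with no match
yields an empty list.

```python
def children(ctx):
    """One new context per child of the context's last value."""
    _, node = ctx[-1]
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)        # list index acts as the key
    else:
        items = []
    return [ctx + [(key, value)] for key, value in items]


def step(ctx, segment):
    """Apply a single path segment to a single context."""
    if segment == ".":
        return [ctx]
    if segment == "..":
        return [ctx[:-1]] if len(ctx) > 1 else []
    if segment == "*":
        return children(ctx)
    return [c for c in children(ctx) if str(c[-1][0]) == segment]


def select(root, path):
    """Return the contexts matched by a simplified YPath expression."""
    contexts = [[(None, root)]]        # start at the (~, root) pair
    for segment in path.split("/"):
        if segment == "":
            continue                   # leading slash or empty piece
        contexts = [new for ctx in contexts for new in step(ctx, segment)]
    return contexts


doc = {
    "player": [
        {"given": "Sammy", "family": "Sosa"},
        {"given": "Ken", "family": "Griffey"},
        {"given": "Mark", "family": "McGwire"},
    ]
}

for ctx in select(doc, "/player/*/given"):
    print([key for key, _ in ctx], ctx[-1][1])
```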
implementation: >
The current Python package includes a ypath expression
evaluator which does this stuff. Instead of returning
a long list of pairs (as above), the test suite returns
an "absolute" YPATH for each context matched.
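implementation-sketch: |
Turning a context into an "absolute" path could be as simple as joining
its keys, as in the sketch below. The exact format printed by the test
suite is not specified here, so this is an assumption; absolute_path is a
hypothetical name, and keys that themselves contain a slash are not
escaped.

```python
def absolute_path(ctx):
    """Join the keys of a context, skipping the root's null key."""
    return "/" + "/".join(str(key) for key, _ in ctx[1:])


sosa = {"given": "Sammy", "family": "Sosa"}
players = [sosa]
root = {"player": players}

print(absolute_path([(None, root)]))
print(absolute_path([(None, root), ("player", players), (0, sosa)]))
# prints "/" and then "/player/0"
```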
notes: >
This is meant as a conversation starter. Please play with
the Python implementation to get a "feel" for it. What is
very important when thinking about this is remembering scope.
The goal of this language is *NOT* to build an output. Building
an output is the goal of a query or transform language. This
is a component in such a larger tool. Think of this tool as
a magic marker: you press down the marker at the root node,
and use it to highlight a selection of paths from the root
node down into the leaves. This tool "colors" the graph
if you wish to use the term...
salutation: >
Share & Enjoy,
Clark
--
Clark C. Evans Axista, Inc.
http://www.axista.com 800.926.5525
XCOLLA Collaborative Project Management Software