From: Michael B. <mbn...@mb...> - 2003-02-20 12:01:17
> mmm, finally what you suggest:
>
> notices/notice[.//sencence &= 'house']/author
>
> is the same as i suggest:
>
> /notices/notice//sencence[. &= 'house']../../author
>
> and it doesn't return the same number of authors than sencences, so we
> don't know which author corrsponds to each sentence.
>
> I think the solution is what you have said, a nested query. You get all
> notices that contains sencences with the word 'house', and for each notice,
> you query for the sencences. Then you can compound the result.
>
> The problem with this solution is that if you have an additional level in
> the tree, for example, the first level tag is newspaper, and this newspaper
> has a publication year, and you want to show this year together with the
> sencences, you have to make three loops:
>
> 1) Get the newspapers that have sentences that contain the word 'house'.
> 2) For each newspaper, get the notices that have sentences that contain the
> word 'house'.
> 3) For each notice, get the sencences that contain the word 'house'.
>
> And finally, we can compound the result.
>
> But i don't see any other solution.
>

I think what's behind all this is a rather important fact. XPath isn't intended as a self-sufficient query language. It's a way of describing the location of any specified part of any XML document, or, a little more precisely, of specifying any set of such locations that can be described by a single complete XPath expression. As such it is a vital tool in any mechanism for querying (or manipulating) XML, but to do more than meet that basic function it has to either be extended or supplemented by other tools. As I see it, eXist works both by extension and supplementation of XPath.
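
To make the nested query from the quoted message concrete, here is a minimal sketch of the two-loop version (notices, then sentences). It is only an illustration in Python using the standard library's xml.etree.ElementTree, not eXist's own API. The sample data are invented, the element names are taken from the quoted expressions (including the spelling "sencence"), and contains_word is a hypothetical helper that only roughly approximates the `&=` operator.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample data; element names follow the quoted expressions.
SAMPLE = """\
<notices>
  <notice>
    <author>Ann</author>
    <body>
      <sencence>The house is old.</sencence>
      <sencence>Nobody lives there.</sencence>
      <sencence>A house needs people.</sencence>
    </body>
  </notice>
  <notice>
    <author>Bob</author>
    <body>
      <sencence>The garden is large.</sencence>
    </body>
  </notice>
  <notice>
    <author>Cid</author>
    <body>
      <sencence>Every house has a door.</sencence>
    </body>
  </notice>
</notices>
"""


def contains_word(element, word):
    """Hypothetical stand-in for `&=`: whole-word, case-insensitive match."""
    text = "".join(element.itertext())
    pattern = r"\b%s\b" % re.escape(word)
    return re.search(pattern, text, re.IGNORECASE) is not None


def authors_with_sentences(root, word):
    """Outer loop picks notices; inner loop picks their matching sentences."""
    pairs = []
    for notice in root.findall("notice"):
        hits = [s for s in notice.iter("sencence") if contains_word(s, word)]
        if not hits:
            continue  # this notice has no matching sentence
        author = notice.findtext("author")
        for sentence in hits:
            pairs.append((author, "".join(sentence.itertext()).strip()))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = authors_with_sentences(root, "house")
for author, sentence in pairs:
    print(author, "|", sentence)
```

Because the author is looked up once per notice and then attached to each matching sentence of that notice, every sentence in the result carries its own author, which is the pairing a single location path could not deliver. A newspaper level with a publication year would add one more enclosing loop of the same shape.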

The XPath extensions Wolfgang has so far implemented, however, are confined (and I would say correctly so, in keeping with key principles of XML) to filling out what many regard as a serious and unjustifiable shortcoming of XPath in the current spec: standard XPath is great for navigating document structure, but pretty weak for matching text content or attribute values. Wolfgang's extensions (which, he would be the first to acknowledge, were influenced by others, Howard Katz in particular) address those weaknesses in a way that strikes me as wholly continuous with the existing spec and certainly worthy of serious consideration by the W3C for incorporation into future revisions.

However, "all" they do is give the standard XPath syntax more power to specify its targets more precisely in terms of textual content. They don't attempt to extend or modify the approach XPath takes to getting to those targets. In particular, they don't alter the inability of an XPath engine to "backtrack" and start over if an expression fails near its right-hand end. This is the result of (W3C) design decisions, made for complex technical reasons which are mainly inspired by the stress in XML specification circles on relatively easy implementability.

So that's where what I call "supplementing" XPath comes in. eXist offers nested queries for precisely this purpose, to prune and/or merge nodesets via repeated applications of separate XPath expressions, and in association with Cocoon it can use XSLT pipelining, DOM methods and/or SAX filter chains to achieve the same ends. In my uses of eXist, where for reasons I won't go into here I need to minimize dependence on Java, but where my queries are structurally quite similar to those Mario wants to do, I use eXist for the "grunt work" of retrieval, then prune/merge its result sets by using the same sort of XSLT/SAX/DOM tools, but in C/C++ libraries called from Perl.

I don't regard the need to do this as in any sense a limitation of eXist. It is, arguably, a limitation of XPath as currently specified, but then the implementation implications of building a significant backtracking capacity into XPath would be pretty severe, and I'm not convinced they are justified. Even when implemented and debugged, an XPath engine capable of retrieving Mario's desired result via a single expression might end up gobbling far more resources than the apparently more laborious triple iteration currently required.

In short, eXist, in the best traditions of application design and Open Source, uses existing (sorry!) tools to create a new tool that allows us to do desirable but previously impossible things. But it is itself just a tool among others, and not every tool has or should have aspirations to become a Swiss Army Knife. Indeed many other once trim and lean tools have become bloated and obese as a result of such ambitions.

I would like to see Wolfgang expand the implementation of XPath to incorporate more of the spec, especially the ability to use all axes, but I for one wouldn't be too keen to encourage him to take extension any further. In my view, it's enough to give us ready integration between eXist and other tools also at our disposal.

Michael Beddow