Re: [Exist-open] Question on text:highlight-matches

Marcus: as Pierrick has pointed out, it looks as though all you need is a
match marker that your post-query xslt can pick up and munge to meet your
needs, and so I really should have pointed you, as Pierrick has now done, to
the way to turn on the serializer's match-tagging options instead of
suggesting you strike out off-piste in uncertain weather conditions.

Some people wonder, in that case, what the point of the
text:highlight-matches() function is, since it's so difficult to use unless
your desired result node is the parent of the text node that contains the
matches. One answer is in the hint given in the docs about circumstances
where the serialization-time option doesn't work. These mainly arise when
the result being returned is derived by further XQuery processing, maybe
against a constructed intermediate node-set, of the initial results obtained
by matches on the search terms. In such a case, the nodes the serializer
gets to see may well have lost the match-marking info they picked up in the
initial processing phase, and you need to intercept them straight after that
phase and mark them yourself in a way that will propagate into the final
output.

For those who do need to take the route of a custom node filter, here is
one addition to the sketch map Pierrick has already provided earlier in
this thread:

[PB]> a function which takes an
> element (let's keep it simple ;-) as its argument.

As a hint to progress beyond the point where the design of such a function
has to stop being simple (because you need to pass in and filter a node and
all its various children) take a look at the function Patrik Nyman posted
here last week (originally written by David Sewell for the TEI-L) which is
an implementation of a custom filter showing how filtering can (=has to) be
tackled recursively.

Since your docs look highly "data-centric" you aren't ever likely to need to
meet the specific use-case David's code addresses, but nevertheless the
basic approach is what you would need to adapt if you wanted the maximum
level of control over match-tagging.

To non-TEI people it may not be apparent why David's filter is needed at
all, so a brief oversimplifying intro may help.

Suppose we have somewhere in a document a structure like
<p>The text <pb/> of a paragraph in a <pb/> print edition with very small
pages</p>
The task is to extract the first <pb/> node together with (only, but all)
the nodes that come between it and the next <pb/> node, while nevertheless
returning the full structural environment of those nodes. It's the latter
requirement that makes the filtering necessary, because we want the <p>
element, but we want only a subset of its descendant nodes, which are
bounded by empty <pb/> elements rather than being enclosed in an
easy-to-process single element of their own. Here (I said this was
oversimplified...) that means we want to output
<p><pb/>of a paragraph in a </p>

David's function achieves that by passing in (arguments 1 and 2) the ids of
the start and end <pb/>s and (argument 3) a node which is known to be an
ancestor node of the innermost element that contains both the start and end
<pb/>s. It then recurses over the tree of which argument 3 is the root,
examining each node, doing whatever is appropriate for that node, then
calling itself with a new container node parameter until the terminating
condition for the recursion is satisfied. As the recursion unrolls, the
desired filtered subtree is output.
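
To make the shape of that recursion concrete, here is a minimal sketch in
Python. It is not David's XQuery and uses nothing eXist-specific. It assumes
the page breaks carry a plain id attribute, and the names extract_page and
walk are hypothetical, chosen for illustration. For brevity it tracks
"are we between the two <pb/>s?" in a small state flag rather than passing a
new container node on each call.

```python
from xml.dom import minidom


def extract_page(container, start_id, end_id):
    """Return a filtered copy of `container` that holds the start <pb/> and
    every node up to (but not including) the end <pb/>, keeping ancestors."""
    doc = minidom.Document()
    state = {"inside": False}  # True while we are between the two <pb/>s

    def walk(node):
        # Text nodes are atoms here: kept whole or dropped whole.
        if node.nodeType == node.TEXT_NODE:
            return doc.createTextNode(node.data) if state["inside"] else None
        if node.nodeType != node.ELEMENT_NODE:
            return None  # comments etc. are ignored in this sketch
        if node.tagName == "pb":
            pb_id = node.getAttribute("id")
            if pb_id == start_id:
                state["inside"] = True
            elif pb_id == end_id:
                state["inside"] = False
                return None
        was_inside = state["inside"]
        copy = doc.createElement(node.tagName)
        for name, value in node.attributes.items():
            copy.setAttribute(name, value)
        # Recurse over the children in document order.
        kept = [c for c in map(walk, node.childNodes) if c is not None]
        for child in kept:
            copy.appendChild(child)
        # Keep an element if it has surviving content, or if it is an
        # empty element that sits between the two page breaks.
        return copy if (kept or was_inside) else None

    return walk(container)


src = ('<p>The text <pb id="p1"/> of a paragraph in a <pb id="p2"/>'
       ' print edition with very small pages</p>')
root = minidom.parseString(src).documentElement
print(extract_page(root, "p1", "p2").toxml())
```

The enclosing <p> survives because one of its descendants survives, which
is the "full structural environment" requirement described above.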

Now David's code, being designed in the limit case to process a whole TEI
document, needs to handle many cases not relevant to match-marking needs,
but the smallest item it ever needs to handle is an entire text node treated
as an atom. Hence, it doesn't attempt to examine or manipulate the content
of any text node it comes across -- it simply passes the node through.
For the custom match-marking case, by contrast, it would be necessary to
pass the text nodes (only) through the text:highlight-matches() function,
with an appropriate callback, as and when they are encountered in the
recursive pass.
That would allow text:highlight-matches() to do its stuff on those nodes,
while allowing them to be output in their full structural environment, which
is what you were aiming for.
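
As a rough illustration of that variation, the sketch below (again Python,
not XQuery) copies a tree recursively and hands every text node to a
marking step that calls a callback for each match. The marking step is only
a stand-in for what text:highlight-matches() would do inside eXist: it does
a plain substring match, and the names mark_matches and wrap_in_mark, and
the <mark> element, are assumptions made up for this example.

```python
import re
from xml.dom import minidom


def mark_matches(node, term, wrap, doc):
    """Return a list of new nodes copying `node`; in text nodes, each
    occurrence of `term` is replaced by whatever `wrap` returns."""
    if node.nodeType == node.TEXT_NODE:
        out = []
        # With a capturing group, re.split keeps the matched terms,
        # which land at the odd indexes of the result.
        pieces = re.split("(" + re.escape(term) + ")", node.data)
        for i, piece in enumerate(pieces):
            if piece == "":
                continue
            out.append(wrap(doc, piece) if i % 2 else doc.createTextNode(piece))
        return out
    if node.nodeType != node.ELEMENT_NODE:
        return []  # comments etc. are dropped in this sketch
    copy = doc.createElement(node.tagName)
    for name, value in node.attributes.items():
        copy.setAttribute(name, value)
    # Recurse: elements are rebuilt, text nodes go through the marking step.
    for child in node.childNodes:
        for new_child in mark_matches(child, term, wrap, doc):
            copy.appendChild(new_child)
    return [copy]


def wrap_in_mark(doc, matched_text):
    """Hypothetical callback: wrap one match in a <mark> element."""
    mark = doc.createElement("mark")
    mark.appendChild(doc.createTextNode(matched_text))
    return mark


src = ('<entry><title>Small pages</title>'
       '<note>pages and <hi>more pages</hi></note></entry>')
root = minidom.parseString(src).documentElement
marked = mark_matches(root, "pages", wrap_in_mark, minidom.Document())[0]
print(marked.toxml())
```

Because the callback decides what a marked match looks like, the same walk
can emit whatever marker a later XSLT step expects.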

Michael Beddow
