From: Michael B. <mbe...@mb...> - 2007-05-10 09:53:42
Marcus: as Pierrick has pointed out, it looks as though all you need is a match marker that your post-query XSLT can pick up and munge to meet your needs. So I really should have pointed you, as Pierrick has now done, to the way to turn on the serializer's match-tagging options, instead of suggesting you strike out off-piste in uncertain weather conditions.

Some people wonder, in that case, what the point of the text:highlight-matches() function is, since it's so difficult to use unless your desired result node is the parent of the text node that contains the matches. One answer is in the hint given in the docs about circumstances where the serialization-time option doesn't work. These mainly arise when the result being returned is derived by further XQuery processing, maybe against a constructed intermediate node-set, of the initial results obtained by matches on the search terms. In such a case, the nodes the serializer gets to see may well have lost the match-marking info they picked up in the initial processing phase, and you need to intercept them straight after that phase and mark them yourself in a way that will propagate into the final output.

For those who do need to take the route of a custom node filter, here is one addition to the sketch map Pierrick has already provided earlier in this thread:

[PB]
> a function which takes an
> element (let's keep it simple ;-) as its argument.

As a hint for progressing beyond the point where the design of such a function has to stop being simple (because you need to pass in and filter a node and all its various children), take a look at the function Patrik Nyman posted here last week (originally written by David Sewell for TEI-L). It is an implementation of a custom filter showing how filtering can (= has to) be tackled recursively.
Since your docs look highly "data-centric", you aren't ever likely to need to meet the specific use-case David's code addresses, but nevertheless the basic approach is what you would need to adapt if you wanted the maximum level of control over match-tagging.

To non-TEI people it may not be apparent why David's filter is needed at all, so a brief oversimplifying intro may help. Suppose we have somewhere in a document a structure like

    <p>The text <pb/> of a paragraph in a <pb/> print edition with very small pages</p>

The task is to extract the first <pb/> node together with (only, but all) the nodes that come between it and the next <pb/> node, while nevertheless returning the full structural environment of those nodes. It's the latter requirement that makes the filtering necessary, because we want the <p> element, but we want only a subset of its descendant nodes, which are bounded by empty <pb/> elements rather than being enclosed in an easy-to-process single element of their own. Here (I said this was oversimplified...) that means we want to output

    <p><pb/>of a paragraph in a</p>

David's function achieves that by passing in (arguments 1 and 2) the ids of the start and end <pb/>s and (argument 3) a node which is known to be an ancestor node of the innermost element that contains both the start and end <pb/>s. It then recurses over the tree of which argument 3 is the root, examining each node, doing whatever is appropriate for that node, then calling itself with a new container node parameter until the terminating condition for the recursion is satisfied. As the recursion unrolls, the desired filtered subtree is output.

Now David's code, being designed in the limit case to process a whole TEI document, needs to handle many cases not relevant to match-marking needs, but the smallest item it ever needs to handle is an entire text node treated as an atom.
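To make the shape of that recursion concrete, here is a minimal sketch of a milestone filter. It is written in Python with the standard library's minidom, not XQuery, and it is not David's code: the function names, the `state` flag used to track position, and the `id` attributes on the <pb/> elements are all assumptions made for illustration.

```python
from xml.dom import minidom


def milestone_filter(node, start_id, end_id, state):
    """Return a filtered copy of `node`, or None if nothing in it is kept.

    `state["inside"]` records, in document order, whether the walk is
    currently between the start and end milestones (hypothetical design).
    """
    if node.nodeType == node.TEXT_NODE:
        # Text nodes are atoms: copied whole when in range, dropped otherwise.
        return node.cloneNode(False) if state["inside"] else None
    if node.nodeType != node.ELEMENT_NODE:
        return None
    if node.tagName == "pb":
        pb_id = node.getAttribute("id")
        if pb_id == start_id:
            state["inside"] = True
            return node.cloneNode(False)  # keep the opening milestone
        if pb_id == end_id:
            state["inside"] = False
            return None  # the closing milestone itself is not output
    was_inside = state["inside"]
    copy = node.cloneNode(False)  # shallow copy: element plus attributes
    for child in node.childNodes:
        kept = milestone_filter(child, start_id, end_id, state)
        if kept is not None:
            copy.appendChild(kept)
    # Keep the element if it lies in range or encloses something kept,
    # so the structural environment of the kept nodes survives.
    return copy if (was_inside or copy.hasChildNodes()) else None


def extract_page(container, start_id, end_id):
    """Filter the subtree rooted at `container` (an ancestor of both <pb/>s)."""
    return milestone_filter(container, start_id, end_id, {"inside": False})


doc = minidom.parseString(
    '<p>The text <pb id="a"/> of a paragraph in a <pb id="b"/>'
    " print edition with very small pages</p>"
)
print(extract_page(doc.documentElement, "a", "b").toxml())
# prints: <p><pb id="a"/> of a paragraph in a </p>
```

The mutable flag is just the simplest way to carry "are we between the milestones yet?" through a recursive walk in Python; a purely functional XQuery version would have to test each node's position relative to the two <pb/>s instead.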
Hence, it doesn't attempt to examine or manipulate the content of any text node it comes across -- it simply passes the node through. For the custom match-marking case, by contrast, it would be necessary to pass the text nodes (only) through the text:highlight-matches() function and an appropriate callback as and when they are encountered in the recursive pass. That would allow text:highlight-matches() to do its stuff on those nodes, while allowing them to be output in their full structural environment, which is what you were aiming for.

Michael Beddow
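The text-node step described above can be sketched as follows, again in Python with minidom rather than XQuery. The `mark_matches` callback is a hypothetical stand-in for whatever text:highlight-matches() and its callback would do in eXist; its name, its signature and the <mark> wrapper element are assumptions, not the real API. It also assumes a non-empty search term.

```python
import re
from xml.dom import minidom


def mark_matches(text_node, term):
    """Hypothetical callback: split a text node around `term`, wrap hits in <mark>."""
    doc = text_node.ownerDocument
    # With one capturing group, re.split puts the matches at odd indexes.
    pieces = re.split("(" + re.escape(term) + ")", text_node.data)
    nodes = []
    for i, piece in enumerate(pieces):
        if not piece:
            continue  # skip empty strings produced at the edges
        if i % 2 == 1:
            mark = doc.createElement("mark")
            mark.appendChild(doc.createTextNode(piece))
            nodes.append(mark)
        else:
            nodes.append(doc.createTextNode(piece))
    return nodes


def copy_with_marks(node, term):
    """Recursively copy `node`; only text nodes go through the callback."""
    if node.nodeType == node.TEXT_NODE:
        return mark_matches(node, term)
    copy = node.cloneNode(False)  # shallow copy keeps element and attributes
    for child in node.childNodes:
        for new_child in copy_with_marks(child, term):
            copy.appendChild(new_child)
    return [copy]


doc = minidom.parseString(
    "<item><name>red apple</name><note>an apple a day</note></item>"
)
marked = copy_with_marks(doc.documentElement, "apple")[0]
print(marked.toxml())
```

Every element is rebuilt around its (possibly marked-up) children, so the match markers end up inside the full structural environment rather than only in the parent of the matching text node.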