From: Roy W. <gar...@ya...> - 2010-10-11 16:05:08
On 07/10/2010 18:21, Joe Wicentowski wrote:
> Hi Greg,
>
>> True, but experience shows that the reverse performs much better, eg.
>> //h1[ft:query(., $query)][@class eq 'subject']
>
> Great point. In principle, though, Wolfgang's Performance Tuning
> article states that filters should be in order of greatest selectivity
> (http://exist-db.org/tuning.html#N1028C). So if only 10% of your h1
> elements have a @class predicate with the matching value, while 90%
> have a hit on the query, the class filter would be best first. In
> deciding on the order of your filters, you need to "guess" about the
> selectivity of each one, based on your knowledge of the data and the
> likely queries.
>
> Of course, this assumes the Lucene and range indexes have identical
> efficiency, and I don't know if that's the case.
>
> (Roy - either way, you'll want a lucene index on h1, and a range index
> on @class.)
>
> Cheers,
> Joe

Thanks for the feedback.

OK, to apply this to my case: I can't create an index on h1 alone, because
class="subject" can also appear on other elements with different tags (my
source is a bit messy!). So it could be:

    <content>
      <h1 class="subject">Better flood protection for Bala</h1>
    </content>

    <content>
      <div id="item-title">
        <p class="subject">Weeding out of water invaders</p>
      </div>
    </content>

So my solution is to index content and query it like this:

    collection("/db/coll")//content[ft:query(., '"flood protection"')]/descendant::*[@class='subject']

This isn't quite right, though: it returns the subject elements of every
content element that matches, even when the phrase occurs somewhere else in
that content. What I really want is to run ft:query only within the elements
where class="subject", but that doesn't sound possible in a single pass.

It looks like I'd be better off extracting the "subject" elements, normalizing
them elsewhere in each document, and creating an index on that.

--
Roy
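To make the difference between the two filter orders concrete, here is a small plain-XQuery sketch. It is only an illustration under stated assumptions: contains() stands in for eXist's ft:query so it runs without a Lucene index, the data is inlined rather than read from collection("/db/coll"), and the extra body paragraph in the second content element is hypothetical, added so that the phrase occurs outside the subject.

```xquery
(: Hypothetical sample data based on the two content elements above.
   The final <p> is invented so "flood protection" appears outside a subject. :)
let $docs := (
  <content>
    <h1 class="subject">Better flood protection for Bala</h1>
  </content>,
  <content>
    <div id="item-title">
      <p class="subject">Weeding out of water invaders</p>
    </div>
    <p>New flood protection schemes are planned.</p>
  </content>
)
return
  <results>
    <!-- Match on the whole content element, then pick its subjects
         (the shape of the query above, with contains() for ft:query) -->
    <filter-content-first>{
      $docs[contains(., 'flood protection')]/descendant::*[@class = 'subject']
    }</filter-content-first>
    <!-- Select the subject elements first, then match only their own text -->
    <filter-subject-only>{
      $docs/descendant::*[@class = 'subject'][contains(., 'flood protection')]
    }</filter-subject-only>
  </results>
```

The first result should contain both the h1 and the "Weeding out of water invaders" paragraph, because the second content element matches on its body text; the second should contain only the h1. In eXist the second shape would use ft:query(., ...) in place of contains(), which presumably needs a Lucene index covering the subject elements themselves, and that is exactly what the messy element names make awkward.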