## Re: [saxon] performance of running many key/set operations in saxon

Re: [saxon] performance of running many key/set operations in saxon
From: James A. Robinson - 2008-02-11 18:58:11

Ah, well, I'm afraid I did simplify the problem a tiny bit. We've got a case where fragments (figures) can be associated with journals, volumes, issues, etc. But the way I read your suggestion, it sounds very much like the solution I ended up using: creating a composite key which can indicate value/no-value as appropriate down the hierarchy. It's just that in my case I reduced it to a single key which artificially represents all the tiers.

> role
> role + journal
> role + journal + volume
> role + journal + volume + issue
> etc
>
> then when your input contains a @volume but not @issue, you
> know you need the role+journal+volume key
>
> I don't think you need individual keys and intersects here...

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson jim.robinson@...
Stanford University HighWire Press http://highwire.stanford.edu/
+1 650 7237294 (Work) +1 650 7259335 (Fax)
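A minimal Python sketch of the single composite-key idea discussed above, not the XSLT actually used in the thread. It assumes a plain dict stands in for an `xsl:key`, that each figure carries hypothetical `role`, `journal`, `volume`, and `issue` fields, and that an empty string is an assumed "no value" marker for tiers a figure is not attached to. Because every tier is encoded in one key value, a single lookup replaces a chain of per-tier keys and intersects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical hierarchy tiers, from least to most specific.
TIERS = ("role", "journal", "volume", "issue")

CompositeKey = Tuple[str, ...]
Record = Dict[str, str]


def composite_key(attrs: Record) -> CompositeKey:
    """Build one key value that covers every tier.

    Missing tiers become "" (an assumed 'no value' marker), so a
    role+journal figure and a role+journal+volume figure get distinct keys.
    """
    return tuple(attrs.get(tier, "") for tier in TIERS)


def build_index(figures: List[Record]) -> Dict[CompositeKey, List[Record]]:
    """Rough analogue of an xsl:key: map each composite key to its figures."""
    index: Dict[CompositeKey, List[Record]] = defaultdict(list)
    for fig in figures:
        index[composite_key(fig)].append(fig)
    return index


def lookup(index: Dict[CompositeKey, List[Record]], request: Record) -> List[Record]:
    """A single key lookup; no intersect of per-tier results is needed.

    A request with @volume but no @issue naturally yields the
    role+journal+volume key, because the issue slot is "".
    """
    return index.get(composite_key(request), [])


if __name__ == "__main__":
    figures = [
        {"id": "f1", "role": "cover", "journal": "J1"},
        {"id": "f2", "role": "cover", "journal": "J1", "volume": "5"},
        {"id": "f3", "role": "cover", "journal": "J1", "volume": "5", "issue": "2"},
    ]
    idx = build_index(figures)
    request = {"role": "cover", "journal": "J1", "volume": "5"}
    print([f["id"] for f in lookup(idx, request)])
```

The design choice is the same trade-off described in the thread: one slightly artificial key that encodes the whole hierarchy, in exchange for avoiding several key lookups followed by set operations.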

Re: [saxon] performance of running many key/set operations in saxon
From: James A. Robinson - 2008-02-11 18:54:48

Thanks very much for the detailed response. I think I'll try to reproduce the original logic when I get a chance later this week, and take a look at the explain output to see if it matches up with the possible trouble areas you outline.

Jim

> The intersect operator in Saxon is implemented using a sort/merge approach:
> the two input sequences are sorted into document order, and then both are
> scanned, picking out nodes that appear in both sets. So the cost is at least
> linear with the number of nodes in the two input sequences, which in your
> case might be fairly large.
>
> With the code as shown above, it looks as if the sorting of the input
> sequences is unnecessary, because the result of the key() function is
> already sorted. However, it's possible that once you rearrange the code into
> a recursive function call, Saxon loses the information that the inputs are
> sorted, and does an unnecessary re-sort, which will add to the cost. You
> should be able to determine whether there is such a sort by looking at the
> "explain" output. The cost is also going to depend heavily on whether the
> variables holding the intermediate results are materialized in memory or
> whether it can all be pipelined - that can depend on so many factors I would
> really need to run the code to see how it behaves.
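To make the sort/merge description in the quoted reply concrete, here is a minimal Python sketch of that general technique. It is an illustration only, not Saxon's implementation: nodes are modelled as integer document-order positions (an assumption), and the hypothetical `already_sorted` flag stands for the case where the inputs are known to be sorted, as key() results are, so the sort step can be skipped.

```python
from typing import List, Sequence


def sort_merge_intersect(a: Sequence[int], b: Sequence[int],
                         already_sorted: bool = False) -> List[int]:
    """Intersect two node sequences using a sort/merge approach.

    Nodes are modelled as integer document-order positions (an assumption
    for illustration). When already_sorted is True, both inputs must be
    sorted and duplicate-free, like the result of a key() lookup.
    """
    if not already_sorted:
        # O(n log n) step: this is the cost an unnecessary re-sort adds.
        a = sorted(set(a))
        b = sorted(set(b))
    result: List[int] = []
    i = j = 0
    # Linear scan over both sorted inputs, keeping nodes present in both.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result
```

Even with the sort skipped, the merge loop touches every node in both inputs in the worst case, which is why the cost stays at least linear in the sizes of the two sequences.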