## saxon-help

 Re: [saxon] performance of running many key/set operations in saxon From: James A. Robinson - 2008-02-11 18:54:48 ```Thanks very much for the detailed response. I think I'll try to reproduce the original logic when I get a chance later this week, and take a look at the explain output to see if it matches up with the possible trouble areas you outline. Jim > The intersect operator in Saxon is implemented using a sort/merge approach: > the two input sequences are sorted into document order, and then both are > scanned, picking out nodes that appear in both sets. So the cost is at least > linear with the number of nodes in the two input sequences, which in your > case might be fairly large. > > With the code as shown above, it looks as if the sorting of the input > sequences is unnecessary, because the result of the key() function is > already sorted. However, it's possible that once you rearrange the code into > a recursive function call, Saxon loses the information that the inputs are > sorted, and does an unnecessary re-sort, which will add to the cost. You > should be able to determine whether there is such a sort by looking at the > "explain" output. The cost is also going to depend heavily on whether the > variables holding the intermediate results are materialized in memory or > whether it can all be pipelined - that can depend on so many factors I would > really need to run the code to see how it behaves. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - James A. Robinson jim.robinson@... Stanford University HighWire Press http://highwire.stanford.edu/ +1 650 7237294 (Work) +1 650 7259335 (Fax) ```
 Re: [saxon] performance of running many key/set operations in saxon From: James A. Robinson - 2008-02-11 18:58:11 ```Ah, well I'm afraid I did simplify the problem a tiny bit. We've got a case where fragments (figures) can be associated with journals, volumes, or issues, etc. But the way I read your suggestion, it sounds very much like the solution I ended up using: creating a composite key which can indicate value/no-value as appropriate down the hierarchy. It's just in my case I reduced it to a single key which artificially represents all the tiers. > role > role + journal > role + journal + volume > role + journal + volume + issue > etc > > then when your input contains a @volume but not @issue, you > know you need the role+journal+volume key > > I don't think you need individual keys and intersects here... - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - James A. Robinson jim.robinson@... Stanford University HighWire Press http://highwire.stanford.edu/ +1 650 7237294 (Work) +1 650 7259335 (Fax) ```
 Re: [saxon] performance of running many key/set operations in saxon From: James A. Robinson - 2008-02-15 15:21:36 ```Well, I'm pretty embarassed about this, but I'm having trouble getting the 20-seconds response times again. After reading Dr. Kay's response I decided to run some tests and look at the explain output. Since I didn't have the original stylesheet on hand (having had rewritten it with the faster single-xs:string key approach), I had recoded the work from memory. I was finding this new code was taking about 990 ms to run, so only about 200 ms slower than the single key lookup. This was far more in line with what I would have expected. I had the original code dug up from backup tapes, and have found that code does run slower, just not at the 20 second range. It does run about 5 seconds though. So either I hyperinflated the numbers in my head when writing up the problems, or my machine was particularlly busy doing something when I ran the test and I just didn't realize it... So I'm left with three stylesheets. Call them slow (5 sec), faster (990 ms), and fastest (770 ms). The difference be the first two and the last is that the code is completely different, the fastest stylesheet uses the more efficent approach of building a single key, a compound xs:string value. The first two stylesheets are nearly identical, but with one crucial difference. In both stylesheets I have a helper function which kicks off the recursive key/intersect operation. In the faster (990 ms) one I coded up the call to the recursive function as: In the slowest (5 sec) one I had coded up this call as: The difference of course is that first key() call in the faster stylesheet returns a smaller set of items, sometimes a much smaller set, than the original code which simply selected everything on the first pass. On Fri, 08 Feb 2008 11:36:58 -0800 I wrote: < < ... < < I had two thoughts on how to solve the problem, both involving < xsl:key/fn:key lookups. One solution was to write a function which < could, when given a or a element, compute an the < same xs:string identifier: < < ... < < The other solution I thought might work involved building a separate < xsl:key for each indexing element < < ... < < and then write a simple recursive function to use the itersection < operator to winnow out the entries which wasn't in each key table. < < ... < < It turns out that on my particular computer, for 111 lookups into a < compound document of 671 c:child elements, the "single xs:string as a < key method" takes about 770 milliseconds to run, where the recursive < function method takes about 20 seconds. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - James A. Robinson jim.robinson@... Stanford University HighWire Press http://highwire.stanford.edu/ +1 650 7237294 (Work) +1 650 7259335 (Fax) ```
 Re: [saxon] performance of running many key/set operations in saxon From: Andrew Welch - 2008-02-15 15:44:47 ```Instead of: \$keys[position() ne 1] you might want to use: subsequence(\$keys, 2) although they may well end up generating the same code... (it did trim a few hundeths off of my sudoku solver though) -- Andrew Welch http://andrewjwelch.com Kernow: http://kernowforsaxon.sf.net/ ```
 Re: [saxon] performance of running many key/set operations in saxon From: Michael Kay - 2008-02-15 15:58:54 ```> Instead of: > > \$keys[position() ne 1] > > you might want to use: > > subsequence(\$keys, 2) > > although they may well end up generating the same code... They should be 100% equivalent in recent Saxon releases. I tend to write remove(\$keys, 1) as the shortest of the trio. Michael Kay http://www.saxonica.com/ ```
 Re: [saxon] performance of running many key/set operations in saxon From: James A. Robinson - 2008-02-15 15:52:29 ```It appears that both optimize out to a specialized 'remove' function: < OPT ====================================== < OPT : At line 87 of file:/Users/jimr/Desktop/sample-merge/sample-merge.faster.xsl < OPT : Rewrote Filter Expression as: < OPT ====== Expression after rewrite ====== < < < < I like your suggestion though, as it seems *clearer* to me to use your technique instead of the [position() != 1] (which I started using back in XSLT 1.0 days and never stopped to reconsider). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - James A. Robinson jim.robinson@... Stanford University HighWire Press http://highwire.stanford.edu/ +1 650 7237294 (Work) +1 650 7259335 (Fax) ```