Thank you so much.  Your quick responsiveness is almost inhuman considering the volume of work/email you must process daily!

 

Steve

 

 

From: Michael Kay [mailto:mike@saxonica.com]
Sent: Tuesday, November 11, 2008 5:43 PM
To: 'Mailing list for the SAXON XSLT and XQuery processor'
Subject: Re: [saxon] .net components, performance question

 

I have now finally established why the stylesheet runs so much faster when it is known in advance that all nodes will be untyped. It is not, as I thought, because atomizing the nodes is significantly faster, or because the logic for doing a sequence comparison is slower than a singleton comparison in the case where the sequence turns out to be a singleton. Rather, it is because when nodes are untyped, a dedicated "comparer" is allocated at compile time, whose task is to compare strings using the Unicode codepoint collation; whereas when it is not known what type the nodes will be, a generic "comparer" is allocated at compile time, which then does some complex run-time decision making to decide how to perform the comparison, and (crucially) ends up choosing a less-than-optimum strategy.
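
To illustrate the distinction between a dedicated and a generic comparer, here is a minimal Java sketch. It is not Saxon's code: the class, interface, and method names are hypothetical, and the numeric branch is only a stand-in for the "complex run-time decision making" mentioned above.

```java
// Hypothetical sketch; none of these names are Saxon classes.
final class ComparerSelectionSketch {

    interface EqualityComparer {
        boolean equal(Object a, Object b);
    }

    // Dedicated comparer: the operands are known in advance to be untyped,
    // so they are always compared as strings, with no type inspection.
    static final EqualityComparer STRING_COMPARER =
            (a, b) -> a.toString().equals(b.toString());

    // Generic comparer: the operand types are only discovered at run time,
    // so every call pays for a decision before the comparison itself.
    static final EqualityComparer GENERIC_COMPARER = (a, b) -> {
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() == ((Number) b).doubleValue();
        }
        return a.toString().equals(b.toString());
    };

    // "Compile-time" choice: made once, from static type information.
    static EqualityComparer choose(boolean operandsKnownUntyped) {
        return operandsKnownUntyped ? STRING_COMPARER : GENERIC_COMPARER;
    }
}
```

The point of the sketch is only that the choice made once up front determines how much work each individual comparison has to do.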

 

It actually relates to the problem described here:

 

http://saxonica.blogharbor.com/blog/_archives/2006/8/13/2226871.html

 

(I enjoyed the title of that blog...)

 

In fact, I actually describe the bug in the blog posting! "That means implementing a comparesEqual() method in the collator that's separate from the compare() method, and changing ValueComparisons to use this method rather than calling the general compare() method and testing the result against zero."

 

But on this path, I'm not using a ValueComparison; I'm using code that still calls the general compare() method, which, because of the UTF-16 problem described in the blog posting, looks at the characters in the string one by one rather than doing a string compare.
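
The difference between the two methods can be sketched in Java as follows. This is an illustration under assumptions, not Saxon's actual collator: the class name is hypothetical, and it assumes the UTF-16 problem is that comparing strings by UTF-16 code unit does not give codepoint order once surrogate pairs are involved, so an ordering compare has to walk the strings, while an equality test does not.

```java
// Hypothetical sketch; not the Saxon implementation.
final class CodepointCompareSketch {

    // Ordering comparison in Unicode codepoint order. String.compareTo works on
    // UTF-16 code units, which sorts surrogate pairs (U+10000 and above) before
    // U+E000..U+FFFF, so the strings are walked one codepoint at a time instead.
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are identical).
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Equality needs no ordering, so a plain string comparison is enough.
    static boolean comparesEqual(String a, String b) {
        return a.equals(b);
    }
}
```

For an equality test such as `@x = 'abcd'`, calling compare() and testing the result against zero gives the same answer as comparesEqual(), but does the slower character-by-character walk to get it.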

 

Once identified, the main problem, which is choosing an efficient strategy for doing the comparisons, turns out to be quite easy to fix. There's still a small overhead because the decision making is done at run time rather than at compile time, but that's almost unnoticeable.

 

It's also not all that surprising that the overhead of doing this low-level manipulation of strings should be higher on the .NET platform than on Java.

 

Meanwhile, until the fix appears in 9.2, please note that doing [string(@x) eq 'abcd'] can be significantly faster than [@x = 'abcd'].

 

Michael Kay

http://www.saxonica.com/