Re: [saxon] impact of cardinality checks and implicit casting


 [saxon] impact of cardinality checks and implicit casting From: James A. Robinson - 2007-01-10 04:37:47
```
First, I wanted to say thank you to David Carlisle and Dr. Kay for
pointing out good solutions to the 'adding a namespace' issue I was
having with QName attribute values, and to Dr. Kay for also explaining
what was going on inside Saxon.

Today I've been having fun using the Saxon 'explain' feature. I *love*
the -e flag available in the XQuery command line. For XSLT it looks
like one has to enable 'saxon:explain="yes"' on a per template basis,
does anyone know if there is any way to 'explain everything' for a
stylesheet?

Some code I was looking at today dealt with solutions for pulling out
the distinct non-ASCII codepoints in the text() of a document. I
finally got it boiled down to an XPath selection grabbing the distinct
codepoints above 127, and I spent some time looking at what Saxon says
it is doing to process the expression. There are small differences
depending on how I cast things, and I was wondering if there is any
practical difference between these two evaluated functions?

The 1st expression is letting Saxon do all the casting, and therefore
appears to be doing at least some work which is unnecessary. However,
various attempts at casting, while changing the 'explain' output,
don't necessarily appear to indicate any *better* solution.

I'm left unsure whether or not there is any *real* difference between
the two. E.g., is there any point in casting the returned $c to
xs:integer? :)

1) for $c in distinct-values(
     for $t in //text() return string-to-codepoints($t))
   return if ($c > 127) then $c else ()

[
let $codepoints[refCount=1] as xs:integer* :=
  treat as xs:integer
    convert untyped atomic items to xs:integer
      filter []
        function distinct-values
          for $t as text() in
            path /
              /
              descendant::text()
          return
            function string-to-codepoints
              checkCardinality (zero or one)
                convert untyped atomic items to xs:string
                  atomize
                    $t
        operator singleton gt
          .
          127
]

2) for $c in distinct-values(
     for $t in //text() return string-to-codepoints(xs:string($t)))
   return if ($c > 127) then xs:integer($c) else ()

[
let $codepoints[refCount=1] as xs:integer* :=
  treat as xs:integer
    convert untyped atomic items to xs:integer
      filter []
        function distinct-values
          for $t as text() in
            path /
              /
              descendant::text()
          return
            function string-to-codepoints
              cast as xs:string
                atomize singleton
                  $t
        operator singleton gt
          .
          127
]

xsl:character-map to translate non-ASCII Unicode codepoints to
numeric character references when outputting text.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       jim.robinson@...
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)
```
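A minimal sketch of how a generator stylesheet along these lines might look. It assumes the namespace-alias technique suggested by the `oxsl` prefix and the `string="{concat('&#', . ,';')}"` fragment quoted elsewhere in the thread. The character-map name `ncr`, the `xsl:output` declaration, and the overall layout are hypothetical, not the poster's actual stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:oxsl="uri.temp-output-namespace"
    exclude-result-prefixes="xs">

  <!-- Literal oxsl:* elements are written to the result as xsl:* elements. -->
  <xsl:namespace-alias stylesheet-prefix="oxsl" result-prefix="xsl"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Distinct codepoints above 127 found in the document's text nodes. -->
    <xsl:variable name="codepoints" as="xs:integer*" select="
      for $c in distinct-values(
        for $t in //text() return string-to-codepoints(xs:string($t)))
      return if ($c > 127) then $c else ()"/>

    <oxsl:stylesheet version="2.0">
      <!-- 'ncr' is a hypothetical name for the generated character map. -->
      <oxsl:output use-character-maps="ncr"/>
      <oxsl:character-map name="ncr">
        <!-- One mapping per codepoint: the character itself maps to
             its numeric character reference, e.g. &#233; -->
        <xsl:for-each select="$codepoints">
          <oxsl:output-character
              character="{codepoints-to-string(.)}"
              string="{concat('&amp;#', ., ';')}"/>
        </xsl:for-each>
      </oxsl:character-map>
    </oxsl:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

The generated stylesheet contains only the output declaration and the character map, so it would still need its own templates (or an import) before it transforms anything.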
 Re: [saxon] impact of cardinality checks and implicit casting From: Florent Georges - 2007-01-10 09:07:53
```
"James A. Robinson" wrote:

  Hi

> Today I've been having fun using the Saxon 'explain'
> feature. I *love* the -e flag available in the XQuery
> command line. For XSLT it looks like one has to enable
> 'saxon:explain="yes"' on a per template basis, does anyone
> know if there is any way to 'explain everything' for a
> stylesheet?

  From what I understand from the sources, there is no way to
set the explain property of an instruction other than by setting
@saxon:explain. So you would not be able to write your own
transformer that sets that property on all top-level
instructions.

  Regards,

--drkm
```
 Re: [saxon] impact of cardinality checks and implicit casting From: Michael Kay - 2007-01-10 09:16:59
```
> > Today I've been having fun using the Saxon 'explain'
> > feature. I *love* the -e flag available in the XQuery
> > command line. For XSLT it looks like one has to enable
> > 'saxon:explain="yes"' on a per template basis, does anyone
> > know if there is any way to 'explain everything' for a
> > stylesheet?
>
> From what I understand from the sources, there is no way to
> set the explain property of an instruction but by setting
> @saxon:explain.

Yes, there's no obvious way of doing it today. I assumed, I think,
that the output would usually be unmanageably large. But it wouldn't
be too difficult to write a stylesheet that transforms the XSLT input
by adding saxon:explain="yes" to every xsl:template and xsl:function.

Michael Kay
```
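The preprocessing stylesheet described above could be sketched roughly as follows. This is one possible shape, not something posted in the thread: an identity transform with a single override that adds the attribute. It assumes the Saxon extension namespace `http://saxon.sf.net/`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for xsl:template and xsl:function in the input stylesheet:
       copy the element, then add saxon:explain="yes". An existing
       saxon:explain attribute is replaced, because the attribute
       added last wins. -->
  <xsl:template match="xsl:template | xsl:function">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="saxon:explain">yes</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Run against a stylesheet module, this produces a copy with the attribute set everywhere. Modules pulled in through xsl:include or xsl:import are separate documents, so each would need the same treatment.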
 Re: [saxon] impact of cardinality checks and implicit casting From: David Carlisle - 2007-01-10 09:31:52
```
Not sure whether it's any quicker but you could write

  for $t in //text() return string-to-codepoints($t))
  return if ($c > 127) then $c else ()

as

  string-to-codepoints(/)[.>127]

David
```
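As written, the shorter form has no distinct-values call, so a character that occurs several times is reported several times. The sketch below puts the two side by side, with distinct-values wrapped around the shorter form. That wrapper is an assumption added here to match the original expression, and the variable names `long` and `short` are illustrative only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Long form from the original post. -->
    <xsl:variable name="long" as="xs:integer*" select="
      for $c in distinct-values(
        for $t in //text() return string-to-codepoints($t))
      return if ($c > 127) then $c else ()"/>

    <!-- Short form. The string value of the document node is the
         concatenation of its text nodes, so one call covers them all.
         distinct-values is added (an assumption) so that repeated
         characters are reported once, as in the long form. -->
    <xsl:variable name="short" as="xs:integer*" select="
      distinct-values(string-to-codepoints(/)[. > 127])"/>

    <!-- Print both sequences so they can be compared by eye. -->
    <xsl:value-of select="'long: ', $long, '&#10;short:', $short, '&#10;'"/>
  </xsl:template>

</xsl:stylesheet>
```

The order of the values returned by distinct-values is implementation-dependent, so the two lines should list the same codepoints but are not guaranteed to list them in the same order.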
 Re: [saxon] impact of cardinality checks and implicit casting From: Michael Kay - 2007-01-10 09:53:30
```
> The 1st expression is letting Saxon do all the casting,
> and therefore appears to be doing at least some work which is
> unnecessary. However, various attempts at casting, while
> changing the 'explain' output, don't necessarily appear to
> indicate any *better* solution.
>
> I'm left unsure whether or not there is any *real*
> difference between the two. E.g., is there any point in
> casting the returned $c to xs:integer? :)
>
> 1) for $c in distinct-values(
>      for $t in //text() return string-to-codepoints($t))
>    return if ($c > 127) then $c else ()
>
> [
> let $codepoints[refCount=1] as xs:integer* :=
>   treat as xs:integer
>     convert untyped atomic items to xs:integer
>       filter []
>         function distinct-values
>           for $t as text() in
>             path /
>               /
>               descendant::text()
>           return
>             function string-to-codepoints
>               checkCardinality (zero or one)
>                 convert untyped atomic items to xs:string
>                   atomize
>                     $t
>         operator singleton gt
>           .
>           127
> ]

The first part of the explain output here:

> let $codepoints[refCount=1] as xs:integer* :=
>   treat as xs:integer
>     convert untyped atomic items to xs:integer

appears to relate to . I'm a little puzzled by the presence of the
"convert" and "treat", it seems static analysis should make them
unnecessary (but there's always room for improvement!)

The expansion of your expression is:

> filter []
>   function distinct-values
>     for $t as text() in
>       path /
>         /
>         descendant::text()
>     return
>       function string-to-codepoints
>         checkCardinality (zero or one)
>           convert untyped atomic items to xs:string
>             atomize
>               $t
>   operator singleton gt
>     .
>     127

The only inefficiency here is the "checkCardinality (zero or one)"
which seems unnecessary: the system should know that when you atomize
a text node (unlike atomizing an element or attribute), the result is
always a singleton. But it's not a big cost.

> filter []
>   function distinct-values
>     for $t as text() in
>       path /
>         /
>         descendant::text()
>     return
>       function string-to-codepoints
>         cast as xs:string
>           atomize singleton
>             $t
>   operator singleton gt
>     .
>     127

This version, in which you did the cast to string explicitly, is
probably a fraction more efficient, but I doubt it's measurable. The
xs:integer($c) seems to have been optimized away because the system
has worked out that $c is always going to be an integer anyway.

Any inefficiencies in optimizing this are minute in comparison with
the costs you are incurring by generating and compiling a stylesheet
each time you transform an input document. If you're looking for
performance, then writing a custom Writer in Java to handle the final
serialization would be a much better approach.

Michael Kay
http://www.saxonica.com/

> ]
>
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
> xmlns:xs="http://www.w3.org/2001/XMLSchema"
> xmlns:oxsl="uri.temp-output-namespace">
>
> select="
> for $c in distinct-values(
> for $t in //text() return
> string-to-codepoints(xs:string($t)))
> return if ($c > 127) then $c else ()" />
>
> xsl:character-map to translate non-ASCII Unicode codepoints to
> numeric character references when outputting text.
>
> string="{concat('&#', . ,';')}" />
```
 Re: [saxon] impact of cardinality checks and implicit casting From: David Carlisle - 2007-01-10 10:54:53
```
Incidentally, I'm not sure that you need a character map for this. By
the time you've broken out every character's code point, you can just
output the syntax directly without setting up the tables for the
serialiser.

  string-join(for $c in string-to-codepoints(/) return
    if ($c < 127) then codepoints-to-string($c)
    else concat('&#',$c,';'),'')

David
```
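Embedded in a stylesheet, the direct approach might look like the sketch below. The choice of the text output method and the US-ASCII encoding are assumptions made here so that the serialiser writes the generated `&#...;` strings literally; they are not part of the original suggestion.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output: nothing is escaped, so '&#233;' is written as-is. -->
  <xsl:output method="text" encoding="US-ASCII"/>

  <xsl:template match="/">
    <!-- Walk every codepoint of the document's string value. Codepoints
         below 127 are copied through as characters; everything else is
         written as a numeric character reference. Inside the attribute,
         '<' and '&' have to be escaped as &lt; and &amp;. -->
    <xsl:value-of select="
      string-join(
        for $c in string-to-codepoints(/)
        return if ($c &lt; 127)
               then codepoints-to-string($c)
               else concat('&amp;#', $c, ';'),
        '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Two things to keep in mind with this form. It works on the string value of the document, so all markup is dropped and only the text comes out. The test is `< 127` rather than the original `> 127`, so codepoint 127 itself is also written as a reference.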