Thanks. This turned out to be a very simple bug internally; I've logged it at https://saxonica.plan.io/issues/1636 and have committed a patch.

The bug occurs when the contents of a java.lang.StringBuffer are appended to an instance of Saxon's LargeStringBuffer. Saxon doesn't make much use of java.lang.StringBuffer, preferring its own FastStringBuffer class. In this case the StringBuffer gets into the system from the third-party Unicode normalization library code invoked by the normalize-unicode() function. The failure doesn't happen if byte code generation is switched off (and therefore not in HE mode or with -opt:9), but that's only because of an inefficiency in the interpreted case, where Saxon quite unnecessarily converts the StringBuffer to a String.
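
To make the failure mode concrete, here is a minimal sketch of an append method that dispatches on the concrete CharSequence type. It is not Saxon's source: the class names AppendDispatchSketch and SketchBuffer are hypothetical, java.lang.StringBuilder stands in for FastStringBuffer, and the toString() fallback is just one plausible way to handle unrecognised types, not necessarily what the committed patch does.

```java
// Illustrative only: this is NOT Saxon's LargeStringBuffer. All names are hypothetical.
class AppendDispatchSketch {

    /** A toy buffer whose append() recognises only some CharSequence implementations. */
    static class SketchBuffer {
        private final StringBuilder data = new StringBuilder();
        private final boolean withFallback;

        SketchBuffer(boolean withFallback) {
            this.withFallback = withFallback;
        }

        void append(CharSequence s) {
            if (s instanceof String) {
                data.append((String) s);
            } else if (s instanceof StringBuilder) {
                // Stand-in for the buffer class the code base normally uses.
                data.append((StringBuilder) s);
            } else if (withFallback) {
                // Defensive fallback: every CharSequence can be copied via toString().
                data.append(s.toString());
            } else {
                // A type the dispatch did not anticipate, e.g. java.lang.StringBuffer.
                throw new IllegalArgumentException("Unknown kind of CharSequence");
            }
        }

        int length() {
            return data.length();
        }

        @Override
        public String toString() {
            return data.toString();
        }
    }

    public static void main(String[] args) {
        // A third-party library handing back a StringBuffer, as in the report.
        CharSequence fromLibrary = new StringBuffer("a\u2014b");

        SketchBuffer strict = new SketchBuffer(false);
        try {
            strict.append(fromLibrary);
        } catch (IllegalArgumentException e) {
            System.out.println("strict buffer: " + e.getMessage());
        }

        // Converting to String first (as the interpreted path happens to do) avoids the failure.
        strict.append(fromLibrary.toString());
        System.out.println("strict buffer after toString(): length " + strict.length());

        SketchBuffer lenient = new SketchBuffer(true);
        lenient.append(fromLibrary);
        System.out.println("lenient buffer: length " + lenient.length());
    }
}
```

In this sketch the strict buffer rejects the StringBuffer but accepts the same text once it has been converted to a String, which mirrors why the interpreted path masks the problem.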

Saxon's LargeStringBuffer is used in a TinyTree when the size exceeds 65K bytes. That's why reducing the size prevented the bug manifesting.

Michael Kay
Saxonica

On 13/10/2012 12:04, Imsieke, Gerrit, le-tex wrote:
We experienced a strange bug and created a repro which is attached.

This is what happens:

$ saxon-EE-9.4.0.6 -s:test.html -xsl:test.xsl -it:main
java.lang.IllegalArgumentException: Unknown kind of CharSequence
        at net.sf.saxon.tree.tiny.LargeStringBuffer.append(LargeStringBuffer.java:147)
        at net.sf.saxon.tree.tiny.TinyTree.appendChars(TinyTree.java:405)


As far as I can tell, the following conditions must be met in order for the bug to manifest:

– The total text length needs to exceed a certain threshold. In the attached XHTML file, remove a (non-ignorable) character and the bug will disappear.

– The text must contain multi-byte UTF-8 characters. The example contains interspersed U+2014 chars. The bug was reproduced with U+2013 and U+00AD but not with U+007D.

– It was reproduced with Saxon EE 9.4.0.4 and EE 9.4.0.6, but not with the corresponding HE versions or the EE versions in unlicensed mode. It couldn’t be reproduced with EE 9.2 and 9.3.

– The bug appears when using normalize-unicode(), but the actual choice of the Unicode normalization form doesn’t seem to matter.

– The bug disappears when using -opt:9 instead of -opt:10

– The bug doesn’t appear when transforming without storing the intermediate document in a variable (as seen when calling Saxon without -it:main).
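
For reference, the UTF-8 widths of the characters mentioned in the list above can be checked with a few lines of Java. This is only a sketch; the class name Utf8WidthSketch and the helper utf8Length are hypothetical and not part of Saxon or the repro.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking how many UTF-8 bytes a code point needs.
class Utf8WidthSketch {

    /** Number of bytes the given code point occupies when encoded as UTF-8. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The code points named in the report.
        int[] tested = {0x2014, 0x2013, 0x00AD, 0x007D};
        for (int cp : tested) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

U+2014 and U+2013 each take three bytes and U+00AD takes two, while U+007D is a single-byte ASCII character.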

I’m using Saxon on Windows / Cygwin. Please see below for Java version and Saxon invocation.

Gerrit


$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode)


$ cat `which saxon-EE-9.4.0.6`
#!/bin/bash
java \
   -cp 'c:/Programme/saxon/EE-9.4.0.6/saxon9ee.jar;c:/cygwin/usr/share/xml/tagsoup-1.2.jar;c:/Programme/xml-commons-resolver/resolver.jar;c:/Programme/xml-commons-resolver/' \
   -Dfile.encoding=UTF8 \
   -Xmx1024m -Xss512k \
   com.saxonica.Transform \
   -x:org.apache.xml.resolver.tools.ResolvingXMLReader \
   -y:org.apache.xml.resolver.tools.ResolvingXMLReader \
   -r:org.apache.xml.resolver.tools.CatalogResolver \
   -strip:ignorable \
   -l \
   -expand:off \
   -opt:10 \
   "$@"




_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help