This is now logged here

and a patch has been committed.

(The new regex engine introduces new classes for handling Unicode strings and these are also being used for operations such as substring() and translate() that are sensitive to surrogate pairs; the bug arose in the course of this change.)
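
To illustrate the kind of surrogate-pair sensitivity involved, here is a minimal sketch in plain Java. It is not Saxon's code. The class CodePointSubstring and both of its methods are hypothetical names made up for this example; only the java.lang.String methods are real. It counts positions in code points rather than UTF-16 chars, and clamps the requested range to the string so that addressing one position past the end yields an empty string instead of an IndexOutOfBoundsException.

```java
// Hypothetical illustration only; not part of Saxon.
final class CodePointSubstring {

    // XPath-style substring(s, start, length) for integer arguments:
    // positions are 1-based and counted in Unicode code points.
    static String substring(String s, int start, int length) {
        long total = s.codePointCount(0, s.length());
        // Selected positions p satisfy start <= p < start + length,
        // clamped to the positions that actually exist (1..total).
        long from = Math.max((long) start, 1L);
        long to = Math.min((long) start + length, total + 1);
        if (from >= to) {
            return ""; // nothing selected, e.g. start is beyond the end
        }
        // Convert code point positions to UTF-16 char offsets.
        int begin = s.offsetByCodePoints(0, (int) (from - 1));
        int end = s.offsetByCodePoints(begin, (int) (to - from));
        return s.substring(begin, end);
    }

    // Mirrors (string-to-codepoints(substring($input, $current, 1)), 0)[1]:
    // the code point at the given position, or 0 as a terminator.
    static int codePointOrZero(String input, int current) {
        String c = substring(input, current, 1);
        return c.isEmpty() ? 0 : c.codePointAt(0);
    }
}
```

With an input of "a" followed by one non-BMP character, positions 1 and 2 return the two code points and position 3 returns the 0 terminator, which is the case that failed in the report below.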

Michael Kay

On 19 Apr 2013, at 18:47, Gunther Rademacher wrote:

Thank you for Saxon 9.5!
I have started to use it, replacing, but my test suite is showing a few issues. Will try to isolate them as much as possible. The first one is a problem with the substring function, when addressing beyond the end of a string containing non-UCS-2 chars. A REx generated parser contains this line of XQuery
     let $c0 := (string-to-codepoints(substring($input, $current, 1)), 0)[1]
or this one of XSLT
     <xsl:variable name="c0" select="(string-to-codepoints(substring($input, $current, 1)), 0)[1]"/>
for accessing codepoints individually, with a 0 terminator. Either of them fails when going for the terminator, when the input contains non-BMP characters. This is reproducible by
When run from the XQuery command line, the stack trace is
     java.lang.IndexOutOfBoundsException: endIndex=2; sequence size=1
        at net.sf.saxon.regex.GeneralUnicodeString.substring(
        at net.sf.saxon.functions.Substring.substring(
        at net.sf.saxon.functions.Substring.evaluateItem(
        at net.sf.saxon.functions.Substring.evaluateItem(
        at net.sf.saxon.expr.Expression.iterate(
        at net.sf.saxon.expr.FunctionCall.preEvaluate(
        at net.sf.saxon.expr.FunctionCall.typeCheck(
        at net.sf.saxon.functions.Substring.typeCheck(
        at net.sf.saxon.expr.parser.ExpressionVisitor.typeCheck(
        at net.sf.saxon.query.XQueryExpression.<init>(
        at net.sf.saxon.query.QueryParser.makeXQueryExpression(
        at net.sf.saxon.query.StaticQueryContext.compileQuery(
        at net.sf.saxon.Query.compileQuery(
        at net.sf.saxon.Query.doQuery(
        at net.sf.saxon.Query.main(
Best regards
saxon-help mailing list archived at