This is now logged here

https://saxonica.plan.io/issues/1735

and a patch has been committed.

(The new regex engine introduces new classes for handling Unicode strings and these are also being used for operations such as substring() and translate() that are sensitive to surrogate pairs; the bug arose in the course of this change.)

Michael Kay
Saxonica


On 19 Apr 2013, at 18:47, Gunther Rademacher wrote:

Thank you for Saxon 9.5!
 
I have started to use it, replacing 9.4.0.7, but my test suite is showing a few issues. I will try to isolate them as much as possible. The first one is a problem with the substring function when addressing beyond the end of a string that contains characters outside the BMP (i.e. not representable in UCS-2). A REx-generated parser contains this line of XQuery
 
     let $c0 := (string-to-codepoints(substring($input, $current, 1)), 0)[1]
 
or this one of XSLT
 
     <xsl:variable name="c0" select="(string-to-codepoints(substring($input, $current, 1)), 0)[1]"/>
 
for accessing codepoints individually, with a 0 terminator. Either of them fails when going for the terminator, when the input contains non-BMP characters. This is reproducible by
 
     substring('&#x10000;',2,1)
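
For reference, the behaviour the parser relies on can be sketched in plain Java. This is only an illustration, not Saxon's code: the class CodepointSubstring and its two methods are hypothetical names, and the sketch covers only integer arguments. It counts positions in code points rather than UTF-16 chars, so a start position beyond the end yields the empty string and the 0 terminator is returned.

```java
// Hypothetical helper for illustration only; this is NOT Saxon's implementation.
final class CodepointSubstring {

    /**
     * Substring counted in Unicode code points, with a 1-based start,
     * following the XPath rule that position p is selected when
     * start <= p < start + length (integer arguments only).
     */
    static String substring(String s, int start, int length) {
        int total = s.codePointCount(0, s.length());
        // Clamp the requested range to the positions that actually exist.
        long first = Math.max((long) start, 1L);
        long endExclusive = Math.min((long) start + length, (long) total + 1);
        if (first >= endExclusive) {
            return ""; // nothing selected, e.g. start is beyond the end
        }
        // Convert code point positions to UTF-16 offsets.
        int from = s.offsetByCodePoints(0, (int) first - 1);
        int to = s.offsetByCodePoints(from, (int) (endExclusive - first));
        return s.substring(from, to);
    }

    /** Mirrors (string-to-codepoints(substring($input, $current, 1)), 0)[1]. */
    static int codepointAt(String input, int current) {
        String c = substring(input, current, 1);
        return c.isEmpty() ? 0 : c.codePointAt(0);
    }
}
```

With this sketch, CodepointSubstring.substring("\uD800\uDC00", 2, 1) is the empty string, which is the result expected from the XPath expression above.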
 
When run from the XQuery command line, the stack trace is
 
     java.lang.IndexOutOfBoundsException: endIndex=2; sequence size=1
        at net.sf.saxon.regex.GeneralUnicodeString.substring(GeneralUnicodeString.java:37)
        at net.sf.saxon.functions.Substring.substring(Substring.java:225)
        at net.sf.saxon.functions.Substring.evaluateItem(Substring.java:85)
        at net.sf.saxon.functions.Substring.evaluateItem(Substring.java:28)
        at net.sf.saxon.expr.Expression.iterate(Expression.java:466)
        at net.sf.saxon.expr.FunctionCall.preEvaluate(FunctionCall.java:206)
        at net.sf.saxon.expr.FunctionCall.typeCheck(FunctionCall.java:144)
        at net.sf.saxon.functions.Substring.typeCheck(Substring.java:38)
        at net.sf.saxon.expr.parser.ExpressionVisitor.typeCheck(ExpressionVisitor.java:217)
        at net.sf.saxon.query.XQueryExpression.<init>(XQueryExpression.java:83)
        at net.sf.saxon.query.QueryParser.makeXQueryExpression(QueryParser.java:162)
        at net.sf.saxon.query.StaticQueryContext.compileQuery(StaticQueryContext.java:526)
        at net.sf.saxon.Query.compileQuery(Query.java:702)
        at net.sf.saxon.Query.doQuery(Query.java:332)
        at net.sf.saxon.Query.main(Query.java:107)
 
Best regards
Gunther
 
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help