Hi Saxon users,
This may look a bit like a cross-post (see "[xsl] Why does '#' start a
comment in regular expressions with 'x' modifier flag? How can I match
'#'?" earlier today on the XSL list). However, while researching it, I
found several issues that I believe are genuine bugs. I use Saxon 8.7.3.
1. The character '#' (number sign, U+0023) acts as a comment to
end-of-line in 'x' mode
<xsl:analyze-string select="'#test'" regex='#test' flags="x">
This raises the error "XTDE1150, The regular expression must not be one
that matches a zero-length string."
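
A possible explanation for the zero-length error, sketched in plain Java:
if everything from '#' onward is dropped as a comment, the pattern that
remains is empty. This assumes Saxon ends up handing the pattern to
java.util.regex with Pattern.COMMENTS set, which I have not verified; the
class and method names below are made up for the illustration.

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {
    // Returns true if the whole input matches the regex when compiled with
    // Pattern.COMMENTS, the JDK flag assumed here to stand in for 'x'.
    public static boolean matchesInCommentsMode(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    public static void main(String[] args) {
        // '#' starts a comment, so the effective pattern is empty.
        System.out.println(matchesInCommentsMode("#test", ""));        // true
        System.out.println(matchesInCommentsMode("#test", "#test"));   // false
        // In the JDK engine itself a backslash makes '#' a literal again;
        // this says nothing about what the XPath regex grammar accepts.
        System.out.println(matchesInCommentsMode("\\#test", "#test")); // true
    }
}
```

The first call is the interesting one: an empty pattern matches the empty
string, which is exactly what XTDE1150 forbids.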
There is currently no workaround for using both the flag 'x' and the
character '#', because an escape and/or character reference are not
2. Backreferences that refer to a capturing group that does not exist
should not raise an error, but should match a zero-length string.
This is according to
http://www.w3.org/TR/xpath-functions/#regex-syntax, which states "If no
string is matched by the nth capturing subexpression, the
back-reference is interpreted as matching a zero-length string.".
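
For comparison, here is a small Java sketch of the case that the quoted
sentence literally covers: a group that exists but matched nothing. It
shows only what plain java.util.regex does and assumes nothing about
Saxon's own translation step; the class and method names are made up for
the illustration.

```java
import java.util.regex.Pattern;

public class UnsetBackrefDemo {
    // Returns true if the whole input matches the regex (no flags set).
    public static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Group 1 is optional and captures nothing for the input "b".
        // Under the rule quoted above, \1 would match a zero-length
        // string here; the JDK engine fails the backreference instead.
        System.out.println(matches("(a)?\\1b", "b"));   // false
        // When group 1 did capture "a", \1 must match another "a".
        System.out.println(matches("(a)?\\1b", "aab")); // true
    }
}
```

So a processor that wants the behaviour from the spec cannot pass such a
backreference to the JDK engine unchanged.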
If N parenthesized expressions exist, then backreference N+1 is
accepted and treated as matching an empty string. N+2 or higher raises
the error "No such group yet exists at this point in the pattern near
index xx".
Treated correctly : "(.)\1" and "\1x" and "(.)\2"
Treated incorrectly: "(.)\3" and "\2x" and "(.)\234"
3. A non-matching backreference, when not raising an error, fails the
Using the expression "(.)\2" always fails. The behavior in this respect
is not clear from the spec, which states that "\2", when not referring
to anything, should match an empty string. IMHO, this means that it
should not interfere with the rest of the expression. I.e., "(.)\2" is
equal to "(.)". Otherwise: it never matches anything (is that more
PS: by now, I believe that my Saxon version might be the problem. I will
retest the above with Saxon 8.8.