Hi Saxon users,

This may look like a cross-post (see "[xsl] Why does '#' start a comment in regular expressions with 'x' modifier flag? How can I match '#'?" earlier today on the XSL list). However, while researching that question I found several issues that I believe are genuine bugs. I am using Saxon 8.7.3.

1. The character '#' (number sign, U+0023) starts a comment that runs to end-of-line in 'x' mode
<xsl:analyze-string select="'#test'" regex='#test' flags="x">
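This raises the error "XTDE1150, The regular expression must not be one that matches a zero-length string.", presumably because everything from '#' onward is discarded as a comment, which leaves an empty regex.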

I have found no workaround for combining the 'x' flag with a literal '#', because neither an escape nor a character reference is accepted.
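
For comparison, here is a minimal sketch of how plain java.util.regex treats '#' in comments mode. It assumes that Saxon translates the 'x' flag to Pattern.COMMENTS and hands the regex to the JDK; that is my guess, not something I have checked in the Saxon source. The class and method names are made up for illustration. Note that the "\#" form used here is Java regex syntax, which XPath regex syntax does not allow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Saxon.
final class CommentsFlagDemo {

    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "#test";

        // Unescaped '#': the rest of the pattern is a comment, so the
        // effective pattern is empty and matches a zero-length string.
        System.out.println("[" + firstMatch("#test", Pattern.COMMENTS, input) + "]");

        // Backslash-escaped '#': treated as a literal character.
        System.out.println("[" + firstMatch("\\#test", Pattern.COMMENTS, input) + "]");

        // Hexadecimal escape for U+0023: also a literal '#'.
        System.out.println("[" + firstMatch("\\x23test", Pattern.COMMENTS, input) + "]");
    }
}
```

If the assumption holds, the zero-length match in the first case would explain XTDE1150, and it suggests that the translation step would have to emit an escaped '#' for the literal to survive.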

2. Backreferences to a parenthesized group that does not exist should not raise an error, but match a zero-length string.
This is according to http://www.w3.org/TR/xpath-functions/#regex-syntax, which states "If no string is matched by the nth capturing subexpression, the back-reference is interpreted as matching a zero-length string.".

If the regex contains N parenthesized subexpressions, backreference N+1 is accepted without an error (but see point 3 for how it then matches). Backreference N+2 or higher raises the error "No such group yet exists at this point in the pattern near index xx".

Treated correctly  : "(.)\1" and "\1x" and "(.)\2"
Treated incorrectly: "(.)\3" and "\2x" and "(.)\234"

3. A backreference to a non-existent group, when it does not raise an error, makes the whole regex fail
The expression "(.)\2" never matches. The spec is not entirely clear on this point, but it states that "\2", when it does not refer to anything, should match an empty string. IMHO this means that it should not interfere with the rest of the expression, i.e. "(.)\2" should be equivalent to "(.)". Otherwise the expression can never match anything (is that more desirable?).
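
A related case can be sketched in plain java.util.regex: a group that exists in the pattern but does not take part in the match. This is not the same as the "(.)\2" case above, where the group does not exist at all, and it again assumes that Saxon delegates matching to the JDK. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Saxon.
final class UnmatchedBackrefDemo {

    // True if regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Group 1 captures "a", so \1 must match a second "a".
        System.out.println(fullMatch("(a)\\1", "aa"));

        // Group 1 sits in the other alternative and captures nothing here.
        // The JDK makes the backreference fail instead of matching "".
        System.out.println(fullMatch("(a)|b\\1", "b"));
    }
}
```

Under the wording quoted from the spec, the second pattern should match "b", because the backreference would be read as a zero-length string. The JDK reports no match, so a processor relying on it would have to rewrite such backreferences to follow the spec.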

-- Abel

PS: I now suspect that my Saxon version might be the problem. I will re-test the above with Saxon 8.8.