Thanks for explaining this to me. Obviously I missed the "first match" rule, and
I must admit that I don't like it. In fact I was relying on the "longest match" rule
in a piece of code that composes more complex regular expressions from
simpler ones.
As an example, consider the Wildcard production from the XQuery spec:
[80] Wildcard ::= "*"
 | (NCName ":" "*")
 | ("*" ":" NCName)
If you want to transform this into a regular expression that matches
the longest possible input, you might end up with something like
"^(\*|N:\*|\*:N)"
where N is the subexpression corresponding to NCName. Now under the "first
match" rule, I can't see a way to express "the longest of either" in a single
regular expression without understanding how the subexpressions might
overlap.
Of course the production can be rewritten to completely avoid the overlap, but
my general approach breaks here, because I cannot properly map the "|" EBNF
operator.
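One way around this, outside a single regular expression, is to try each alternative separately and keep the longest match. The following is only a sketch in Python, whose re module also picks the first matching alternative; the helper name longest_match and the ASCII-only stand-in for NCName are assumptions made for illustration, not part of Saxon or the XQuery spec.

```python
import re

def longest_match(alternatives, text, pos=0):
    """Return the longest match at `pos` among the alternatives, or None.

    Each alternative is matched on its own, so the result does not depend
    on the order of the alternatives or on how they overlap.
    """
    best = None
    for pattern in alternatives:
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or m.end() > best.end()):
            best = m
    return best

# Hypothetical, ASCII-only simplification of NCName (the real production is wider).
N = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# The three branches of the Wildcard production, kept as separate patterns.
wildcard = [r"\*", N + r":\*", r"\*:" + N]

# Longest-of-either: picks the third branch.
print(longest_match(wildcard, "*:foo").group())        # *:foo

# A single alternation stops at the first branch that matches.
print(re.match("|".join(wildcard), "*:foo").group())   # *
```

The cost is one match attempt per alternative, and the composition no longer yields a single regular expression, so it only helps where the caller controls the matching loop.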
Best regards,
Gunther
________________________________
From: saxonhelpbounces@... [mailto:saxonhelpbounces@...] On Behalf Of Michael Kay
Sent: Wednesday, January 16, 2008 9:18 AM
To: 'Mailing list for SAXON XSLT queries'
Subject: Re: [saxon] Size of matches of a regular expression
This is as specified. See http://www.w3.org/TR/xpath-functions/#func-replace :

If two alternatives within the pattern both match at the same position in the $input, then the match that is chosen is the one matched by the first alternative.

This rule also appears in my XPath book, page 448.

The "longest match" rule applies only to the interpretation of quantifiers, not to the treatment of alternatives.
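The difference between the two rules can be seen in a small sketch. It uses Python's re module as a stand-in, on the assumption that for these simple patterns it treats alternatives and quantifiers the same way as the rule quoted above; it is not Saxon's regex engine, and the variable names are made up for illustration.

```python
import re

# Alternation: the first alternative that matches wins, even if it is shorter.
first_alt = re.sub(r"A|AB", "X", "ABC")
print(first_alt)      # XBC

# Listing the longer alternative first changes the result.
longer_first = re.sub(r"AB|A", "X", "ABC")
print(longer_first)   # XC

# Greedy quantifier: "B?" takes the B when it can (longest match).
greedy = re.sub(r"AB?", "X", "ABC")
print(greedy)         # XC

# Reluctant quantifier: "B??" takes as little as possible (shortest match).
reluctant = re.sub(r"AB??", "X", "ABC")
print(reluctant)      # XBC
```

So the length of the match is controlled by the quantifiers, while the order of the alternatives decides which branch is tried first.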
Michael Kay
http://www.saxonica.com/
________________________________
From: saxonhelpbounces@... [mailto:saxonhelpbounces@...] On Behalf Of Rademacher, Gunther
Sent: 16 January 2008 02:03
To: saxonhelp@...
Subject: [saxon] Size of matches of a regular expression
My understanding was that a regular expression will always match the longest
possible substring, unless the reluctant quantifiers are used, in which case it will
match as short as possible.
Now I found that Saxon (tested with both 8.8 and 9.0.01) behaves differently, in
that it chooses the first matching branch of a choice, regardless of the length
consideration, e.g.
replace("ABC", "A|AB", "X")
returns "XBC", but when using the longest match, it should be "XC". Similarly,
when the reluctant quantifier is used,
replace("ABC", "(AB|A){1}?", "X")
returns "XC", but with the shortest possible match, it should be "XBC".
Best regards,
Gunther
