Below is the pcretest output (PCRE 7.0):
* The first pattern works correctly: a|(ab) matches both 'a' and 'ab'.
* The second pattern also works correctly:
(a|(ab)) | ( y(a|(ab)) )
matches 'a', 'ab', 'ya', 'yab'.
* Trouble starts with the third pattern, where a subroutine call (?1) replaces the second a|(ab): now the string 'yab' doesn't match (but should).
* The same failure occurs with a named subroutine.
* Everything works again if a|(ab) is changed to (ab?).
Seems like a lookahead problem in the context of subroutines, maybe with caching in the mix...
PCRE version 7.0 18-Dec-2006
/^(a|(ab))$/
a
0: a
1: a
ab
0: ab
1: ab
2: ab
/^((a|(ab))|(y(a|(ab))))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
3: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: ya
5: a
yab
0: yab
1: yab
2: <unset>
3: <unset>
4: yab
5: ab
6: ab
/^((a|(ab))|(y(?1)))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
3: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: ya
yab
No match
/^((?P<PAT>(a|(ab)))|(y(?P>PAT)))$/
a
0: a
1: a
2: a
3: a
ab
0: ab
1: ab
2: ab
3: ab
4: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: <unset>
5: ya
yab
No match
/^((ab?)|(y(?1)))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
ya
0: ya
1: ya
2: <unset>
3: ya
yab
0: yab
1: yab
2: <unset>
3: yab
Here is a quote from the pcrepattern man page:
'Like recursive subpatterns, a "subroutine" call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure.'
That is why #3 fails: having matched "a", the subroutine call cannot be re-entered to try for "ab". (Comment by PH)
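To make the atomic-group explanation concrete, here is a minimal Python sketch. It does not call PCRE; it only models the tail of the pattern, y(GROUP)$, where GROUP is a list of literal alternatives tried in order. The function names (`group_ends`, `matches`) and the modelling of (ab?) as the ordered alternatives "ab", "a" (greedy b? tries the b first) are assumptions made for illustration.

```python
def group_ends(subject, pos, alternatives, atomic):
    """Return the positions at which the group may end when started at pos.

    Non-atomic: every matching alternative is a candidate, so the matcher
    can backtrack into the group and try the next one.
    Atomic: only the first matching alternative is kept; the group is
    never re-entered after it has matched.
    """
    ends = [pos + len(alt) for alt in alternatives
            if subject.startswith(alt, pos)]
    return ends[:1] if atomic else ends


def matches(subject, alternatives, atomic):
    """Model ^y(GROUP)$ with GROUP given as ordered literal alternatives."""
    if not subject.startswith("y"):
        return False
    for end in group_ends(subject, 1, alternatives, atomic):
        if end == len(subject):  # the trailing $ anchor
            return True
    return False


# Ordinary group a|(ab): backtracking lets "ab" be tried after "a" fails at $.
print(matches("yab", ["a", "ab"], atomic=False))  # True
# Subroutine call treated as atomic: "a" matches, $ fails, no retry.
print(matches("yab", ["a", "ab"], atomic=True))   # False
# (ab?) modelled as "ab" then "a": the longer match is tried first.
print(matches("yab", ["ab", "a"], atomic=True))   # True
print(matches("ya", ["ab", "a"], atomic=True))    # True
```

Under this model the results line up with the pcretest output above: 'yab' fails only when the group is atomic and the shorter alternative comes first.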