#30 Subroutine bug with certain patterns.

open
nobody
None
7
2007-01-29
2007-01-29
Apostolos Lerios
No

Below is the pcretest output (PCRE version 7.0):

* The first pattern works correctly: a|(ab) matches both 'a' and 'ab'.

* The second pattern also works correctly:
(a|(ab)) | ( y(a|(ab)) )
matches 'a', 'ab', 'ya', 'yab'.

* Trouble starts with the third pattern, where the subroutine call (?1) is used in place of the second a|(ab): the string 'yab' no longer matches, although it should.

* The problem persists when a named subroutine is used.

* The problem disappears if a|(ab) is changed to (ab?).

Seems like a lookahead problem in the context of subroutines, with maybe caching in the mix...

PCRE version 7.0 18-Dec-2006

/^(a|(ab))$/
a
0: a
1: a
ab
0: ab
1: ab
2: ab

/^((a|(ab))|(y(a|(ab))))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
3: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: ya
5: a
yab
0: yab
1: yab
2: <unset>
3: <unset>
4: yab
5: ab
6: ab

/^((a|(ab))|(y(?1)))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
3: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: ya
yab
No match

/^((?P<PAT>(a|(ab)))|(y(?P>PAT)))$/
a
0: a
1: a
2: a
3: a
ab
0: ab
1: ab
2: ab
3: ab
4: ab
ya
0: ya
1: ya
2: <unset>
3: <unset>
4: <unset>
5: ya
yab
No match

/^((ab?)|(y(?1)))$/
a
0: a
1: a
2: a
ab
0: ab
1: ab
2: ab
ya
0: ya
1: ya
2: <unset>
3: ya
yab
0: yab
1: yab
2: <unset>
3: yab

Discussion

    • priority: 5 --> 7
     
  • Logged In: NO

    Here is a quote from the pcrepattern man page:

    'Like recursive subpatterns, a "subroutine" call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure.'

    That is why the third pattern fails on 'yab': once the subroutine call has matched "a", it cannot be re-entered to try "ab", so the following $ fails at "b" and there is nothing left to backtrack into. The fifth pattern works because (ab?) takes "ab" on its first attempt. (Comment by PH)
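
The atomic behaviour described in the comment can be reproduced outside PCRE. The following is a minimal sketch in Python, not PCRE's implementation: Python's `re` module has no `(?1)` subroutine syntax, so the call is emulated here with the lookahead-plus-backreference idiom `(?=(...))\N`, which behaves like an atomic group because a lookahead is not re-entered once it has succeeded. The names `INLINE`, `ATOMIC_CALL`, `WORKAROUND`, and `results` are made up for this illustration, and the group numbering differs from the pcretest patterns above.

```python
import re

SUBJECTS = ["a", "ab", "ya", "yab"]

# Second pattern from the report: the alternation is written out inline,
# so after '$' fails the engine may backtrack from 'a' to 'ab'.
INLINE = re.compile(r"^((a|(ab))|(y(a|(ab))))$")

# Emulated atomic call (assumption: stands in for 'y(?1)' under the
# documented atomic semantics). The lookahead picks the first alternative
# that matches, the backreference \2 consumes exactly that text, and the
# choice cannot be revisited afterwards.
ATOMIC_CALL = re.compile(r"^(?:(a|ab)|y(?=(a|ab))\2)$")

# Same emulation with the (ab?) rewrite from the report: the greedy '?'
# takes 'ab' on the first attempt, so no re-entry is ever needed.
WORKAROUND = re.compile(r"^(?:(ab?)|y(?=(ab?))\2)$")


def results(pattern):
    """Map each subject string to True if the pattern matches it."""
    return {s: pattern.match(s) is not None for s in SUBJECTS}


if __name__ == "__main__":
    for name, pattern in [("inline", INLINE),
                          ("atomic call", ATOMIC_CALL),
                          ("workaround", WORKAROUND)]:
        print(name, results(pattern))
```

Under this emulation, `INLINE` and `WORKAROUND` accept all four subjects, while `ATOMIC_CALL` rejects only 'yab', which mirrors the pcretest output in the report. The sketch also suggests why alternative order matters under atomic semantics: the first alternative that succeeds is the only one ever tried.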