#28 Bug in forward references in backreferences

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-11-27
Created: 2006-11-27
Creator: Msmo
Private: No

Forward referencing a backreference (either by name or number) shows inconsistent results on iteration. Here's a PHP example:

-------------------------
1) Forward referencing with + iteration

preg_match( '/(?:\2?(a)(b))+/', 'babbab', $match); print_r($match);

Array
(
[0] => abbab
[1] => a
[2] => b
)

-------------------------
2) Forward referencing with * iteration (Changed + to *)

preg_match( '/(?:\2?(a)(b))*/', 'babbab', $match); print_r($match);

Array
(
[0] =>
)

-------------------------
3) Forward referencing with * iteration acts like it was anchored (Removing the "b" from start of subject)

preg_match( '/(?:\2?(a)(b))*/', 'abbab', $match); print_r($match);

Array
(
[0] => abbab
[1] => a
[2] => b
)

Forward references like these seem to be allowed for (?P=name) and \N (where N <= 9), although example 2 gives what looks like an incorrect result. With N >= 10, however, \N is interpreted as an octal escape unless the Nth group has already been opened before the reference.

-------------------------
4) Using \10 before opening the 10th subgroup.

preg_match( '/(?:\10?(a)()()()()()()()()(b))+/', 'abbab', $match); print_r($match);

Array
(
[0] => ab
[1] => a
[2] =>
.
.
[9] =>
[10] => b
)

-------------------------
5) Using \10 before opening the 10th subgroup (Matches the octal).

preg_match( '/(?:\10?(a)()()()()()()()()(b))+/', "ab\10ab", $match); print_r($match);

Array
(
[0] => ab°ab      (° == chr(8))
[1] => a
.
.
[10] => b
)

-------------------------
6) Using \10 after opening the 10th subgroup (Forward references).

preg_match( '/(?:()()()()()()()()()(\10?a)(b))+/', 'aaabaabaaab', $match); print_r($match);

Array
(
[0] => abaabaaab
.
.
[10] => aaa
[11] => b
)

Discussion

  • Msmo
    2006-11-28

    Logged In: YES
    user_id=1655128
    Originator: YES

    ***I forgot the last example:

    7) \10 with * iteration (the one showing the error):

    preg_match( '/(?:()()()()()()()()()(\10?a)(b))*/', 'aaabaabaaab', $match); print_r($match);

    Array
    (
    [0] =>
    )

     
  • Logged In: NO

    I have only taken a quick look at #1 and #2. There is no error. PCRE, like Perl, returns the leftmost match. In #2 the pattern can match right at the start of the subject by matching zero times, so it does. In #1 it has to match at least one character, so it cannot match at the start and has to move on to the second character. This has nothing to do with references. (Comment by PH)
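
The leftmost-match explanation for #1 and #2 can be checked without any backreference. The following is a minimal sketch, not part of the original report: the pattern (?:ab) and the subject 'bab' are simplified stand-ins chosen for illustration, and PREG_OFFSET_CAPTURE is used only to show where each match starts.

```php
<?php
// Simplified stand-in for examples 1 and 2: the subject starts with a
// character the group cannot match, and the pattern has no backreference.
$subject = 'bab';

// '*' allows zero repetitions, so the leftmost match is the empty
// string at offset 0.
preg_match('/(?:ab)*/', $subject, $star, PREG_OFFSET_CAPTURE);

// '+' needs at least one "ab", which first becomes possible at offset 1.
preg_match('/(?:ab)+/', $subject, $plus, PREG_OFFSET_CAPTURE);

var_dump($star[0]); // matched string '' at offset 0
var_dump($plus[0]); // matched string 'ab' at offset 1
```

If the same split between * and + shows up here, the difference between #1 and #2 is accounted for by where the match attempt starts rather than by the forward reference.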

     
  • Logged In: YES
    user_id=669310
    Originator: NO

    Further follow-ups to this bug should be, and have been, directed to the
    "official" PCRE bugtracker at http://bugs.exim.org/537