A forward reference (a backreference, by name or by number, to a group that is opened later in the pattern) gives inconsistent results when the enclosing group is repeated. Here's a PHP example:
-------------------------
1) Forward referencing with + iteration
preg_match( '/(?:\2?(a)(b))+/', 'babbab', $match); print_r($match);
Array
(
[0] => abbab
[1] => a
[2] => b
)
-------------------------
2) Forward referencing with * iteration (Changed + to *)
preg_match( '/(?:\2?(a)(b))*/', 'babbab', $match); print_r($match);
Array
(
[0] =>
)
-------------------------
3) Forward referencing with * iteration acts as if it were anchored (removed the leading "b" from the subject)
preg_match( '/(?:\2?(a)(b))*/', 'abbab', $match); print_r($match);
Array
(
[0] => abbab
[1] => a
[2] => b
)
Forward references like these seem to be allowed for (?P=name) and \N (where N <= 9), and both give the unexpected result shown in the second example. With N >= 10, however, \N is interpreted as an octal escape unless the Nth group has already been opened before the reference.
-------------------------
4) Using \10 before opening the 10th subgroup.
preg_match( '/(?:\10?(a)()()()()()()()()(b))+/', 'abbab', $match); print_r($match);
Array
(
[0] => ab
[1] => a
[2] =>
.
.
[9] =>
[10] => b
)
-------------------------
5) Using \10 before opening the 10th subgroup (Matches the octal).
preg_match( '/(?:\10?(a)()()()()()()()()(b))+/', "ab\10ab", $match); print_r($match);
Array
(
[0] => ab°ab        (° == chr(8))
[1] => a
.
.
[10] => b
)
-------------------------
6) Using \10 after opening the 10th subgroup (Forward references).
preg_match( '/(?:()()()()()()()()()(\10?a)(b))+/', 'aaabaabaaab', $match); print_r($match);
Array
(
[0] => abaabaaab
.
.
[10] => aaa
[11] => b
)
Logged In: YES
user_id=1655128
Originator: YES
I forgot the last example:
7) \10 with * iteration (the one showing the error):
preg_match( '/(?:()()()()()()()()()(\10?a)(b))*/', 'aaabaabaaab', $match); print_r($match);
Array
(
[0] =>
)
Logged In: NO
I have only taken a quick look at #1 and #2. There is no error. PCRE, like Perl, finds the first leftmost match. In #2 it can match right at the start, by matching zero times, so it does. In #1 it has to match at least one character, so it can't match at the start, but has to move on to the second character. This is nothing to do with references. (Comment by PH)
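The leftmost-match behaviour described above can be reproduced without any backreference at all. The following is a minimal sketch, assuming the same subject as #1 and #2; the variable names $star and $plus are illustrative and not part of the original report.

```php
<?php
// Same subject as examples 1 and 2, but the pattern has no backreference.
$subject = 'babbab';

// '*' may match zero times, so the empty match at offset 0 is the leftmost match.
preg_match('/(?:(a)(b))*/', $subject, $star, PREG_OFFSET_CAPTURE);

// '+' needs at least one "ab", which cannot start at offset 0,
// so the engine moves on and matches at offset 1.
preg_match('/(?:(a)(b))+/', $subject, $plus, PREG_OFFSET_CAPTURE);

printf("star: '%s' at offset %d\n", $star[0][0], $star[0][1]); // star: '' at offset 0
printf("plus: '%s' at offset %d\n", $plus[0][0], $plus[0][1]); // plus: 'ab' at offset 1
```

PREG_OFFSET_CAPTURE is used only to make the match position visible; the empty result in #2 is a successful match at offset 0, not a failure to match.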
Logged In: YES
user_id=669310
Originator: NO
Further follow-ups to this bug should be, and have been, directed to the
"official" PCRE bugtracker at http://bugs.exim.org/537