BUG using the pattern "\s"
PERL 5 regular expression pattern matching
Tested with PCRE versions 6.7 back to 4.5.
When preg_replace is used with the pattern '\s' on a UTF-8 string
containing the character 'à' (hex: c3a0), it changes the second byte
of this character.
Using the pattern ' \t\f\r\n', which is supposed to be the same as \s,
works perfectly.
I have tried other UTF-8 characters and they seem to work.
Reproduce code:
---------------
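<?php
// utf8_encode() converts ISO-8859-1 to UTF-8, so 'à' becomes the bytes c3 a0.
$text = utf8_encode("this is a test àt");
echo bin2hex($text)."\r\n";
// Explicit character class: works as expected.
$text1 = preg_replace("'([\t\f\r\n])+'", " ", $text);
echo bin2hex($text1)."\r\n";
echo $text1."\r\n";
// \s class: changes the second byte of 'à'.
$text2 = preg_replace("'([\s])+'", " ", $text);
echo bin2hex($text2)."\r\n";
echo $text2;
?>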
Logged In: NO
This cannot be a PCRE bug because PCRE does not provide replacement facilities; they are left to the calling application.
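Since the replacement is done by the caller, one thing worth checking on the PHP side is whether the pattern is applied in byte mode or in UTF-8 mode. The sketch below is only an illustration, not a confirmed diagnosis: it assumes that, without the `u` modifier, `\s` is tested against single bytes, so a byte such as 0xA0 (a non-breaking space in ISO-8859-1) may be classified as whitespace depending on the character tables and locale in use. The variable names `$byteMode` and `$utf8Mode` are hypothetical. With the `u` modifier, PHP asks PCRE to treat both pattern and subject as UTF-8, so c3 a0 is seen as the single character 'à'.

```php
<?php
// 'à' written as explicit UTF-8 bytes (c3 a0), so the result does not
// depend on the encoding this source file is saved in.
$text = "this is a test \xC3\xA0t";

// Byte mode (no /u): \s is tested against single bytes. Whether the byte
// 0xA0 counts as whitespace depends on the character tables in use, so
// the output of this call may differ between platforms and locales.
$byteMode = preg_replace('/\s+/', ' ', $text);

// UTF-8 mode (/u): c3 a0 is decoded as the single character U+00E0,
// which is not whitespace, so it is left untouched.
$utf8Mode = preg_replace('/\s+/u', ' ', $text);

echo bin2hex($text), "\n";
echo bin2hex($byteMode), "\n";
echo bin2hex($utf8Mode), "\n";
```

If the second line of output differs from the first while the third does not, that would point to byte-level whitespace classification rather than to the replacement step itself.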