#2651 Abondon use of preg_replace with /e modifier

Produces PHP errors
open-accepted
nobody
5
2010-05-13
2009-07-08
BenBE
No

When running PHP with the Suhosin patch[1] there is a security function that allows you to disable the use of the /e modifier (eval replacement) to protect against Code Injection attacks that may result from improperly escaped replacement sequences if the pattern allows for matching of quotes while not properly escaping them.

An extreme example would be an preg_replace like the following:
$evald = preg_replace('/(.*)/e', '\\1', $code);
While nobody would actually write this this is an oversimplified version of
$evald = preg_replace('/^(.*)\./e', 'echo \'I matched \\1.\';', $code);
which could be exploited by something like $code="'.file_get_contents('/etc/passwd').'

Therefore, for security reasons and to make SquirrelMail work with Suhosin patch the preg_replace_callback function should be used instead which improves security by circumventing such escaping problems and at the same time improving readability of your code.

Please note that policy violations of Suhosin policies produce Fatal errors if they are enforced and thus lead to instantanious script termination.

Simular issues have been fixed in WordPress recently. Have a look there[2] as the bug report there also includes an easy way to look for LOC that use /e in their expressions. Note that the expression mentioned there doesn't find all occurances though.

An example (although not the most optimized one) for e.g. functions/decode/utf_8.php might introduce 3 callback functions (can be done if one function too)

---
//Callback functions necessary for preg_replace_callback
function charset_decode_utf_8_4byte($match) {
// decode four byte unicode characters
return '&#'.((ord($match[1])-240)*262144+(ord($match[2])-128)*4096+(ord($match[3])-128)*64+(ord($match[4])-128)).';';
}

function charset_decode_utf_8_3byte($match) {
// decode three byte unicode characters
return '&#'.((ord($match[1])-224)*4096+(ord($match[2])-128)*64+(ord($match[3])-128)).';';
}

function charset_decode_utf_8_2byte($match) {
// decode two byte unicode characters
return '&#'.((ord($match[1])-192)*64+(ord($match[2])-128)).';';
}
---

and updating the preg_replace lines in the decoding function like this:

---
// decode four byte unicode characters
$string = preg_replace_callback("/([\360-\367])([\200-\277])([\200-\277])([\200-\277])/",
'charset_decode_utf_8_4byte', $string);

// decode three byte unicode characters
$string = preg_replace_callback("/([\340-\357])([\200-\277])([\200-\277])/",
'charset_decode_utf_8_3byte', $string);

// decode two byte unicode characters
$string = preg_replace_callback("/([\300-\337])([\200-\277])/",
'charset_decode_utf_8_2byte', $string);
---

For further information have a look at the documentation of preg_replace_callback[3] in the PHP manual

[1] http://www.hardened-php.net/suhosin/
[2] http://www.german-proxy.de/browse.php/Oi8vY29y/ZS50cmFj/LndvcmRw/cmVzcy5v/cmcvdGlj/a2V0Lzg2/ODk_3D/b29/
[3] http://php.net/preg_replace_callback

Discussion

  • BenBE
    BenBE
    2009-10-03

    A small patch especially needed to get the UTF8 encoding/decoding working with Suhosin and disabled /e modifier:

    (Description of the patch in German, for questions feel free to contact me):
    http://blog.benny-baumann.de/?p=332

    A simular issue (including a description how to detect such uses more or less automated can be found at http://core.trac.wordpress.org/ticket/8689

     
  • BenBE
    BenBE
    2010-05-13

    Please include the following patches (in addition to the one already mentioned in the description) as described in http://blog.benny-baumann.de/?p=634

    Please note that those patches use ranges covering \240 despite the note regarding RH73 - if they don't manage to write proper code, they should fix it.

    In case of questions simply contact me or comment here.

     
  • BenBE
    BenBE
    2010-05-13

    • labels: --> Message Display
    • status: open --> open-accepted