From: Paul L. <pa...@sq...> - 2021-05-27 01:32:08
On Wed, May 26, 2021 11:50 pm, Alexey Shpakovsky via squirrelmail-devel wrote:
> Hi SquirrelMail devs,
>
> Today, while investigating a decoding failure in a spam message, I found
> unexpected behaviour in the htmlspecialchars function which affects
> SquirrelMail.

I didn't read your mail in detail, because I believe this issue is solved by the patch you can find here:

https://sourceforge.net/p/squirrelmail/bugs/2806/?page=3

Patch for 1.4.x: quoted_printable_fix-1.4.x-version_3.diff

If not, please make sure you're using the newest 1.4.23-svn snapshot with this patch.

Cheers,

--
Paul Lesniewski
SquirrelMail Team
Please support Open Source Software by donating to SquirrelMail!
http://squirrelmail.org/donate_paul_lesniewski.php

> Consider this sample code (you can paste it into the w3schools
> PHP tryit editor if you don't want to run it on your machine):
>
> <?php
> $str=base64_decode('1OXk5fDg6/zt4P8g8evz5uHgIO/uIPLw8+TzIOgg5+Dt//Lu8fLo');
> echo 'before: ' . bin2hex($str) . '<br>';
> $esc=htmlspecialchars($str, ENT_COMPAT | ENT_SUBSTITUTE, 'utf-8');
> echo 'after: ' . bin2hex($esc) . '<br>';
> ?>
>
> The "before" and "after" lines show hex representations of the same string
> (each byte of the string is encoded with two characters) before and after
> it is processed by the htmlspecialchars function. And they are vastly
> different:
>
> before: d4e5e4e5f0e0ebfc...
> after: efbfbdefbfbdefbfbd...
>
> The base64-decoded string is a valid cp1251 string; you can see some
> Cyrillic letters by adding this line to the code above:
>
> echo 'actual: ' . iconv('cp1251', 'utf-8', $str) . '<br>';
>
> There are no HTML special characters in that string. However, because of
> the ENT_SUBSTITUTE flag and the last 'utf-8' argument, the htmlspecialchars
> function replaces all byte sequences which are invalid in utf-8 with the
> Unicode Replacement Character (U+FFFD, bytes ef bf bd). And a valid cp1251
> string is full of them!
>
> To fix it, one should specify 'cp1251' instead of 'utf-8' as the last
> argument to the htmlspecialchars function in the code above.
>
>
>
> How does this affect SquirrelMail?
>
>
>
> In version 1.4, the function charset_decode calls sm_encode_html_special_chars
> without passing any character encoding to it:
>
> https://sourceforge.net/p/squirrelmail/code/HEAD/tree/branches/SM-1_4-STABLE/squirrelmail/functions/i18n.php#l187
>
> which makes sm_encode_html_special_chars default to
> $default_encoding, which is, I believe, utf-8 in most cases and which
> later gets passed to htmlspecialchars:
>
> https://sourceforge.net/p/squirrelmail/code/HEAD/tree/branches/SM-1_4-STABLE/squirrelmail/functions/strings.php#l1559
>
> And charset_decode is called by decodeHeader when it encounters a
> base64-encoded header:
>
> https://sourceforge.net/p/squirrelmail/code/HEAD/tree/branches/SM-1_4-STABLE/squirrelmail/functions/mime.php#l727
>
> Example of the affected header (spammer's email redacted):
>
> From:
> "=?windows-1251?B?1OXk5fDg6/zt4P8g8evz5uHgIO/uIPLw8+TzIOgg5+Dt//Lu8fLo?="
> <sp...@ex...>
>
>
>
> I believe this can be fixed by having charset_decode pass an
> encoding argument to sm_encode_html_special_chars: either a hardcoded
> "ISO-8859-1", which is likely to allow all characters, or the actual
> charset passed to charset_decode by its caller. Note, however, that
> htmlspecialchars seems to support fewer encodings than charset_decode does,
> so the first approach is probably better.
>
>
> Thanks for reading so far!
>
> Any thoughts?
>
> Alexey.
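
For illustration only, here is a minimal sketch of the fix proposed in the quoted message: pass the header's own charset to htmlspecialchars when it is one that function knows, and fall back to ISO-8859-1 otherwise. The helper name escape_in_charset and the abbreviated charset list are assumptions made for this example. This is not SquirrelMail code and not the patch from bug 2806.

```php
<?php
// Hypothetical helper (not part of SquirrelMail): escape HTML special
// characters in $str, whose bytes are encoded in $charset, without
// replacing non-UTF-8 bytes with U+FFFD.
function escape_in_charset(string $str, string $charset): string
{
    // Assumption: an abbreviated subset of the charsets that the PHP
    // manual lists as supported by htmlspecialchars().
    $supported = ['utf-8', 'iso-8859-1', 'iso-8859-5', 'iso-8859-15',
                  'cp866', 'cp1251', 'windows-1251', 'cp1252',
                  'windows-1252', 'koi8-r'];

    $name = strtolower($charset);
    if (!in_array($name, $supported, true)) {
        // Single-byte fallback: every byte is accepted, and only the
        // ASCII characters & " < > are rewritten.
        $name = 'iso-8859-1';
    }
    return htmlspecialchars($str, ENT_COMPAT | ENT_SUBSTITUTE, $name);
}

// Usage with the sample string from the quoted message.
$str = base64_decode('1OXk5fDg6/zt4P8g8evz5uHgIO/uIPLw8+TzIOgg5+Dt//Lu8fLo');
echo 'utf-8:        ' . bin2hex(htmlspecialchars($str, ENT_COMPAT | ENT_SUBSTITUTE, 'utf-8')) . "\n";
echo 'windows-1251: ' . bin2hex(escape_in_charset($str, 'windows-1251')) . "\n";
```

With the header's charset (or the fallback) the bytes should pass through unchanged, because the string contains none of the characters that htmlspecialchars rewrites. The ISO-8859-1 fallback is only reasonable for ASCII-compatible charsets. For an encoding such as ISO-2022-JP, where ordinary ASCII byte values can be part of a multi-byte character, escaping byte by byte could damage the text.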