From: <re...@ki...> - 2005-10-17 21:06:12
This is a (text-only) retry. The version on the list got pretty messed
up. Sorry for that!

________________________________

From: René C. Kiesler [mailto:re...@ki...]
Sent: Friday, 14 October 2005 23:27
To: 'php...@li...'
Subject: mostly fixed the code-tag... (long)
Sensitivity: Personal

...but at what price? I'm not sure, and I'd like to call for a
discussion about it.

For example, feed the following as a message body to phpwsbb:

[code]
function parseInput($text, $allowedTags=NULL){
    $text = PHPWS_Text::stripSlashQuotes($text);

    if(preg_match("/src=([\"']{0,1}).*(?<=[=\"']\?|index.php|module=).*([\"']{0,1})/Ui", $text) ||
       preg_match("/onload=/i", $text))
        $text = preg_replace("/<img.+>/Uei", "", $text);

    if ($allowedTags == "none")
        $allowedTagString = NULL;
    elseif (is_array($allowedTags))
        $allowedTagString = implode("", $allowedTags);
    elseif (is_string($allowedTags))
        $allowedTagString = $allowedTags;
    else {
        $allowedTagString = $GLOBALS["core"]->text->allowed_tags;
        /* If the user is allowed to use an extended set of tags, add them in */
        if ($_SESSION['OBJ_user']->allow_access('users', 'extendedTags')) {
            $allowedTagString .= $GLOBALS['core']->text->allowed_extra_tags;
            /* Process all javascripted code to ensure comparison syntax doesn't get stripped */
            $text = preg_replace("/(<script.*>)(.*)(<\/script>)/iseU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);
        }
    }

    $text = preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);
    $text = str_replace("'", "'", $text);

    /* Deities don't get any tags stripped from their text */
    if ($_SESSION['OBJ_user']->isDeity())
        return $text;

    return strip_tags($text, $allowedTagString);
}
[/code]

This should be pretty much the parseInput function after applying the
super hack.

Note that the code-tag should preserve everything literally.
All the code, all the line feeds, all the tags. Everything.

I've created a test thread with about the same content at
http://www.kiesler.at/phpwsbb~PHPWSBB_MAN_OP~view~PHPWS_MAN_ITEMS~508~page~last.html

I've enabled anonymous posts, so feel free to play around with it.

Here's the result:

function parseInput($text, $allowedTags=NULL){ $text =
PHPWS_Text::stripSlashQuotes($text);
if(preg_match("/src=([\"']{0,1}).*(?<=[=\"']\?|index.php|module=).*([\"']{0, 1})/Ui",
$text) || preg_match("/onload=/i", $text)) $text = preg_replace("//Uei",
"", $text); if ($allowedTags == "none") $allowedTagString = NULL; elseif
(is_array($allowedTags)) $allowedTagString = implode("", $allowedTags);
elseif (is_string($allowedTags)) $allowedTagString = $allowedTags; else
{ $allowedTagString = $GLOBALS["core"]->text->allowed_tags; /* If the
user is allowed to use an extended set of tags, add them in */ if
($_SESSION['OBJ_user']->allow_access('users', 'extendedTags')) {
$allowedTagString .= $GLOBALS['core']->text->allowed_extra_tags; /*
Process all javascripted code to ensure comparison syntax doesn't get
stripped */ $text = preg_replace("/(NOSCRIPT.*>)(.*)(<\/script>)/iseU",
"'' . str_replace('\n', '', PHPWS_Text::utfEncode('')) . ''", $text); }
} $text = preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'' .
str_replace('\n', '', PHPWS_Text::utfEncode('')) . ''", $text); $text =
str_replace("'", "'", $text); /* Deities don't get any tags stripped
from their text */ if ($_SESSION['OBJ_user']->isDeity()) return $text;
return strip_tags($text, $allowedTagString); }

A beauty, isn't it?

Note that:

- all the linefeeds are gone
- /(<script.*> got replaced by /(NOSCRIPT.*>
- \\1, \\2, etc. get removed altogether, without any replacement
- and probably a few other things, as I've already fixed the
  output function of the code-tag here.
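
Two of these symptoms can be reproduced outside phpWebSite. What follows
is only a minimal sketch using plain PHP built-ins: the sample post and
the variable names are made up, and it assumes the damage comes from the
str_replace('\n', '', ...) inside the double-quoted replacement string
and from the final strip_tags() call in parseInput(). It also shows that
text whose markup characters were encoded first passes strip_tags()
untouched.

```php
<?php
// Hypothetical sample post: two lines of PHP source with something tag-like in it.
$posted = <<<'CODE'
$text = preg_replace("/<img.+>/Uei", "", $text);
return $text;
CODE;

// 1. Inside a double-quoted PHP string, \n is a real line feed, so a
//    str_replace("\n", '', ...) over the post glues all lines together.
$noLinefeeds = str_replace("\n", '', $posted);

// 2. strip_tags() takes <img.+> for markup and deletes it from the source.
$stripped = strip_tags($posted);

// 3. Encoding the markup characters first leaves strip_tags() nothing to
//    remove, and decoding restores the post byte for byte.
$encoded  = htmlspecialchars($posted, ENT_NOQUOTES);
$survived = strip_tags($encoded);
$restored = htmlspecialchars_decode($survived, ENT_NOQUOTES);

echo $noLinefeeds, "\n---\n", $stripped, "\n---\n", $survived, "\n";
var_dump($restored === $posted);
```

The missing \\1 references are not covered here, since it is not clear
where they get lost.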

All of that happens in parseInput() of Text.php and in cleanArray() of
security.php.

What I did to fix it (somewhat):

in parseInput():

- removed the line

      $text = preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);

  to preserve line breaks
- removed "stripSlashQuotes"

in cleanArray():

threw out all the preg_replaces and replaced them with a
htmlspecialchars_uni() that looks like this:

function htmlspecialchars_uni($text) {
    $text=preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
    return(str_replace(array('<', '>', '"'), array('&lt;', '&gt;', '&quot;'), $text));
}

That way, everything evil should be encoded but preserved. And do no
harm, right?

I've also disabled the BBCode [code][/code] as it does nothing more than
put "<code></code>" around the text. This explains the formatting.

Instead, I've used this in parseOutput(), which I've ported over from
phpBB:

function parseOutput($text, $printTags=FALSE){
    require_once("HTML/BBCodeParser.php");

    /* this algorithm has been brutally ripped out of phpBB */

    $match_count=preg_match_all("#\[code\](.*?)\[/code\]#si", $text, $matches);

    $code_start_html="<div class='code_tag'>code:".
                     "<div>\n";
    $code_end_html="</div></div>\n";

    for($i=0; $i<$match_count; $i++) {
        $before_replace=$after_replace=$matches[1][$i];

        $after_replace=str_replace("  ", "&nbsp; ", $after_replace);
        $after_replace=str_replace("  ", " &nbsp;", $after_replace);

        $after_replace=str_replace("<", "&lt;", $after_replace);
        $after_replace=str_replace(">", "&gt;", $after_replace);

        $after_replace=str_replace("\t", "&nbsp; &nbsp;", $after_replace);

        $after_replace=str_replace("\n", "<br />\n", $after_replace);

        $after_replace=preg_replace("/^ {1}/m", '&nbsp;', $after_replace);
        $str_to_match="[code]".$before_replace."[/code]";

        $replacement=$code_start_html.$after_replace.$code_end_html;

        $text=str_replace($str_to_match, $replacement, $text);
    }

    $text=str_replace("[code]", $code_start_html, $text);
    $text=str_replace("[/code]", $code_end_html, $text);

    // Set up BBCodeParser

    /* WARNING !!

       you need to disable the code-tag in BBCodeParser.ini. Not doing
       so would result in a second iteration of code-tag-parsing.
       code-tags within code-tags would be parsed which could result in
       ugliness. */

    $config = parse_ini_file(PHPWS_SOURCE_DIR . "/conf/BBCodeParser.ini", true);
    $options = &PEAR::getStaticProperty("HTML_BBCodeParser", "_options");

Well, all of that fixes the display of the code-tag to a great extent.
The \\1, etc. are still eaten up though; maybe the template classes are
guilty of that? They can still be seen in the database. Also, the '
isn't preserved literally but converted to an HTML entity, which is
also bad. But it's a lot better now, nevertheless.

Did I open any security holes that haven't been there before? Any ideas
how to fix them? I'd really like to have a working code-tag but don't
want to be hacked again.

vBulletin, InVision PowerBoard and phpBB can do it, so should we.

regards,

René!
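
As a starting point for that discussion, here is a minimal sketch of a
code-tag renderer built on preg_replace_callback(). It is not
phpWebSite, PEAR or phpBB code: renderCodeTags() is a made-up name, and
the sketch assumes a current PHP version, that the post is stored raw
and encoded only on output, and that it is UTF-8. Text outside the
[code] tags is left alone and would still need the usual filtering. The
callback's return value is inserted as-is, so backslash sequences like
\\1 in the posted code are not interpreted at this step. Whether that
is where they get lost today is only a guess.

```php
<?php
// Hypothetical helper for illustration; not part of phpWebSite, PEAR or phpBB.
function renderCodeTags(string $text): string
{
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;

    $html = preg_replace_callback(
        '#\[code\](.*?)\[/code\]#si',
        function (array $m) use ($flags): string {
            // Encode markup characters so the browser displays them as text.
            $body = htmlspecialchars($m[1], $flags, 'UTF-8');
            // Make line feeds visible in HTML.
            $body = nl2br($body);
            // The callback's return value is inserted as-is; it is not
            // scanned for \1-style references.
            return "<div class='code_tag'>code:<div>\n" . $body . "</div></div>\n";
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to
    // escaping the whole post rather than returning it raw.
    return $html ?? htmlspecialchars($text, $flags, 'UTF-8');
}

// Made-up sample post containing a comparison, a tag and a backreference.
$post = <<<'POST'
before [code]if ($a < $b) echo '\\1';
echo "<b>";[/code] after
POST;

echo renderCodeTags($post);
```

Encoding on output instead of on input keeps the database copy literal,
but it would double-encode anything that cleanArray() has already run
through htmlspecialchars_uni(), so only one of the two should be active.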