[Phpwebsite-developers] Fwd: mostly fixed the code-tag... (long)


This is a (text-only) retry. The version on the list got pretty messed up.
Sorry about that!

________________________________

From: René C. Kiesler [mailto:re...@ki...]
Sent: Friday, 14 October 2005 23:27
To: 'php...@li...'
Subject: mostly fixed the code-tag... (long)
Sensitivity: Personal

...but at what price? I'm not sure, and I'd like to open a discussion about
it.

For example, feed the following to phpwsbb as a message body:

[code]  function parseInput($text, $allowedTags=NULL){
    $text = PHPWS_Text::stripSlashQuotes($text);

    if(preg_match("/src=([\"']{0,1}).*(?<=[=\"']\?|index.php|module=).*([\"']{0,1})/Ui", $text) ||
       preg_match("/onload=/i", $text))
      $text = preg_replace("/<img.+>/Uei", "", $text);

    if ($allowedTags == "none")
      $allowedTagString = NULL;
    elseif (is_array($allowedTags))
      $allowedTagString = implode("", $allowedTags);
    elseif (is_string($allowedTags))
      $allowedTagString = $allowedTags;
    else {
      $allowedTagString = $GLOBALS["core"]->text->allowed_tags;
      /* If the user is allowed to use an extended set of tags, add them in */
      if ($_SESSION['OBJ_user']->allow_access('users', 'extendedTags')) {
        $allowedTagString .= $GLOBALS['core']->text->allowed_extra_tags;
        /*  Process all javascripted code to ensure comparison syntax doesn't get stripped */
        $text = preg_replace("/(<script.*>)(.*)(<\/script>)/iseU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);
      }
    }

    $text = preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);
    $text = str_replace("'", "&#39;", $text);

    /* Deities don't get any tags stripped from their text */
    if ($_SESSION['OBJ_user']->isDeity())
      return $text;

    return strip_tags($text, $allowedTagString);
  }
[/code]

This should be pretty much the parseInput() function after applying the
super hack.

Note that the code tag should preserve everything literally: all the code,
all the line feeds, all the tags. Everything.
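
One way to pin that requirement down is a round-trip test: whatever ends up in the page, decoded back to plain text, has to equal what was typed. The sketch below assumes PHP 7 or later and uses PHP's built-in htmlspecialchars() and html_entity_decode() purely as a reference point; escape_for_display() and survives_round_trip() are hypothetical helper names, not phpWebSite functions.

```php
<?php
// Sketch: the "preserve everything literally" rule as a round-trip test.
// htmlspecialchars()/html_entity_decode() are only a reference point here;
// escape_for_display() and survives_round_trip() are hypothetical helpers.

function escape_for_display(string $raw): string {
    // Encode &, <, >, " and ' so a browser shows them instead of parsing them.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

function survives_round_trip(string $raw): bool {
    // What the reader sees (the decoded page text) must equal what was typed.
    return html_entity_decode(escape_for_display($raw), ENT_QUOTES, 'UTF-8') === $raw;
}

// Inputs modelled on the things that got mangled in the test post.
$samples = array(
    "\$text = str_replace(\"'\", \"&#39;\", \$text);",
    "\"'\\\\1' . '\\\\3'\"",
    "<script>alert('x')</script>",
    "line one\n\tline two,   three spaces",
);

foreach ($samples as $s) {
    echo (survives_round_trip($s) ? "ok   " : "FAIL ") . json_encode($s) . "\n";
}
```

Running the same kind of check against a real cleaning pipeline (cleanArray() plus parseInput() plus parseOutput()) would show exactly which characters it fails to keep.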

I've created a test thread with roughly the same content at
http://www.kiesler.at/phpwsbb~PHPWSBB_MAN_OP~view~PHPWS_MAN_ITEMS~508~page~last.html

I've enabled anonymous posts, so feel free to play around with it.

Here's the result:

   function parseInput($text, $allowedTags=NULL){     $text =
PHPWS_Text::stripSlashQuotes($text);
if(preg_match("/src=([\"']{0,1}).*(?<=[=\"']\?|index.php|module=).*([\"']{0,1})/Ui", $text) ||        preg_match("/onload=/i", $text))       $text =
preg_replace("//Uei", "", $text);     if ($allowedTags == "none")
$allowedTagString = NULL;     elseif (is_array($allowedTags))
$allowedTagString = implode("", $allowedTags);     elseif
(is_string($allowedTags))       $allowedTagString = $allowedTags;     else {
$allowedTagString = $GLOBALS["core"]->text->allowed_tags;       /* If the
user is allowed to use an extended set of tags, add them in */       if
($_SESSION['OBJ_user']->allow_access('users', 'extendedTags')) {
$allowedTagString .= $GLOBALS['core']->text->allowed_extra_tags;         /*
Process all javascripted code to ensure comparison syntax doesn't get
stripped */         $text =
preg_replace("/(NOSCRIPT.*>)(.*)(<\/script>)/iseU", "'' . str_replace('\n',
'', PHPWS_Text::utfEncode('')) . ''", $text);       }     }     $text =
preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'' . str_replace('\n', '',
PHPWS_Text::utfEncode('')) . ''", $text);     $text = str_replace("'", "'",
$text);     /* Deities don't get any tags stripped from their text */     if
($_SESSION['OBJ_user']->isDeity())       return $text;     return
strip_tags($text, $allowedTagString);   }

A beauty, isn't it?

Note that:

- all the line feeds are gone
- /(<script.*> got replaced by /(NOSCRIPT.*>
- \\1, \\2, etc. are removed altogether, without any replacement
- and probably a few other things, as I've already fixed the code tag's
  output function here.

All of that happens in parseInput() of Text.php and in cleanArray() of
security.php.

What I did to fix it (somewhat):

In parseInput():

- removed the line "    $text = preg_replace("/(\[code\])(.*)(\[\/code\])/seU", "'\\1' . str_replace('\n', '', PHPWS_Text::utfEncode('\\2')) . '\\3'", $text);" to preserve line feeds
- removed the stripSlashQuotes() call

In cleanArray():

I threw out all the preg_replace() calls and replaced them with an
htmlspecialchars_uni() function that looks like this:

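function htmlspecialchars_uni($text) {
        /* encode every & that does not start a numeric entity such as &#39; */
        $text=preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
        /* then encode <, > and "; single quotes are left as they are */
        return(str_replace(array('<', '>', '"'), array('&lt;', '&gt;', '&quot;'), $text));
}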

That way, everything evil should be encoded but preserved, and do no harm.
Right?
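
To poke at the "do no harm" part, here is a small harness around the same function (comments added, behaviour unchanged; the sample inputs are made up). Two details stand out: a literal &#39; passes through untouched, so the browser will render it as ' (possibly related to the &#39; problem mentioned further down), and single quotes are not encoded at all, which matters if the output ever lands inside a single-quoted HTML attribute.

```php
<?php
// Copy of htmlspecialchars_uni() from above, with comments; same behaviour.
function htmlspecialchars_uni($text) {
    // Encode every & that does not start a numeric entity like &#39;.
    $text = preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
    // Then encode <, > and " (single quotes are left as they are).
    return str_replace(array('<', '>', '"'), array('&lt;', '&gt;', '&quot;'), $text);
}

// Made-up inputs that show where the output stops being literal.
$cases = array(
    '<script>alert("x")</script>',  // tags are neutralised
    '&#39;',                        // numeric entity passes through unchanged
    '&amp;',                        // named entity gets double-encoded
    "it's",                         // single quote is not encoded
);
foreach ($cases as $in) {
    printf("%-30s => %s\n", $in, htmlspecialchars_uni($in));
}
```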

I've also disabled the BBCode parser's [code][/code] tag, as it does nothing
more than put "<code></code>" around the text. That explains the formatting:
a plain <code> element does not preserve line breaks or runs of spaces.

Instead, I now use the following in parseOutput(), ported over from phpBB:

  function parseOutput($text, $printTags=FALSE){
    require_once("HTML/BBCodeParser.php");

        /* this algorithm has been brutally ripped out of phpBB */

        /* collect the contents of every [code]...[/code] pair (non-greedy) */
        $match_count=preg_match_all("#\[code\](.*?)\[/code\]#si", $text, $matches);

        $code_start_html="<div class='code_tag'>code:".
                                "<div>\n";
        $code_end_html="</div></div>\n";

        for($i=0; $i<$match_count; $i++) {
                $before_replace=$after_replace=$matches[1][$i];

                /* keep runs of spaces visible (two passes for odd-length runs) */
                $after_replace=str_replace("  ", "&nbsp; ", $after_replace);
                $after_replace=str_replace("  ", " &nbsp;", $after_replace);

                /* show tags literally instead of rendering them */
                $after_replace=str_replace("<", "&lt;", $after_replace);
                $after_replace=str_replace(">", "&gt;", $after_replace);

                /* a tab becomes three visible spaces */
                $after_replace=str_replace("\t", "&nbsp; &nbsp;", $after_replace);

                /* keep the line feeds */
                $after_replace=str_replace("\n", "<br />\n", $after_replace);

                /* a single leading space on a line would be collapsed, too */
                $after_replace=preg_replace("/^ {1}/m", '&nbsp;', $after_replace);
                $str_to_match="[code]".$before_replace."[/code]";

                $replacement=$code_start_html.$after_replace.$code_end_html;

                $text=str_replace($str_to_match, $replacement, $text);
        }

        /* leftover, unbalanced tags still get the wrapper markup */
        $text=str_replace("[code]", $code_start_html, $text);
        $text=str_replace("[/code]", $code_end_html, $text);

    // Set up BBCodeParser

        /* WARNING !!

           you need to disable the code-tag in BBCodeParser.ini. Not doing so
           would result in a second iteration of code-tag-parsing. code-tags
           within code-tags would be parsed which could result in ugliness.
        */

    $config = parse_ini_file(PHPWS_SOURCE_DIR . "/conf/BBCodeParser.ini", true);
    $options = &PEAR::getStaticProperty("HTML_BBCodeParser", "_options");

Well, all of that fixes the display of the code tag to a great extent. The
\\1 etc. are still eaten up, though; maybe the template classes are to blame?
They are still there in the database. Also, the &#39; isn't preserved
literally but converted to a ', which is also bad. Still, it's a lot better
now.
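
On the \\1 question: one common way for such sequences to vanish is user text being passed as the replacement argument of preg_replace(), where \1 and $1 mean backreferences. Whether the template classes actually do that is an assumption, not something verified here; the sketch only shows the effect (PHP 7 or later, made-up template string). Each such pass also turns a doubled backslash into a single one, so two passes through a pattern without a first capture group are enough to remove a literal \\1 completely.

```php
<?php
// Sketch: user text used as the *replacement* of preg_replace() loses "\1".
// The {BODY} template below is made up; it is not phpWebSite's template code.

$template = '<p>{BODY}</p>';
$body     = 'echo \'\1\';';   // the user typed: echo '\1';

// Risky: "\1" in the replacement is read as a backreference to group 1.
// The pattern has no groups, so it is replaced by an empty string.
$risky = preg_replace('/\{BODY\}/', $body, $template);

// Literal: the value returned by a callback is inserted as-is.
$safe = preg_replace_callback('/\{BODY\}/', function ($m) use ($body) {
    return $body;
}, $template);

// Also literal: str_replace() has no backreference syntax at all.
$plain = str_replace('{BODY}', $body, $template);

echo $risky, "\n";   // <p>echo '';</p>
echo $safe, "\n";    // <p>echo '\1';</p>
echo $plain, "\n";   // <p>echo '\1';</p>
```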

Did I open any security holes that weren't there before? Any ideas how to
fix them? I'd really like to have a working code tag, but I don't want to be
hacked again.

vBulletin, Invision Power Board and phpBB can do it, so we should be able to
as well.

Regards,

René!