
#188 Regression in 2.20, "prune"-tags do not get removed anymore

Group: v2.21
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-05-11
Created: 2017-05-09
Private: No

Having a String like:

"<p>alert using script: <scr<script>ipt>alert(\"Hello\");</scr<script>ipt></p>\n"

cleaned with HtmlCleaner up to version 2.19, you get:

"<p>alert using script:<scr></scr></p>"

The <script> tag is correctly sanitized for XSS prevention.

With version 2.20, I now get:

"<p>alert using script:<scr<script>ipt&#62;alert(&#34;Hello&#34;);</scr<script>ipt&#62;</p>"

I think this leaves potential for XSS attacks when such a "sanitized" string is used in a web page.

We are using the following config:

    public static CleanerProperties createTypicalCleanerProperties() {
        CleanerProperties props = new CleanerProperties();
        props.setOmitHtmlEnvelope(true);
        props.setOmitXmlDeclaration(true);
        props.setUseEmptyElementTags(false);
        props.setAdvancedXmlEscape(true);
        props.setRecognizeUnicodeChars(false);
        props.setTransResCharsToNCR(true);
        props.setTranslateSpecialEntities(true);
        props.setTransSpecialEntitiesToNCR(true);
        props.setOmitDoctypeDeclaration(true);
        props.setUseCdataForScriptAndStyle(false);
        props.setOmitUnknownTags(false);
        props.setTreatUnknownTagsAsContent(true);
        props.setOmitDeprecatedTags(false);
        props.setTreatDeprecatedTagsAsContent(false);
        props.setOmitComments(false);
        props.setAllowMultiWordAttributes(true);
        props.setAllowHtmlInsideAttributes(false);
        props.setIgnoreQuestAndExclam(true);
        props.setNamespacesAware(true);
        props.setHyphenReplacementInComment("=");
        // Comma-separated list of tags that will be completely removed (with all nested elements) from XML tree after parsing.
        props.setPruneTags("script,object,embed,applet,link,style,form,input,iframe,frame");
        props.setBooleanAttributeValues(CleanerProperties.BOOL_ATT_SELF);
        props.setAddNewlineToHeadAndBody(false);
        CleanerTransformations trans = new CleanerTransformations();
        trans.addTransformation(new TagTransformation("i", "em"));
        trans.addTransformation(new TagTransformation("b", "strong"));
        props.setCleanerTransformations(trans);
        return props;
    }

It seems that prune tags behave differently now.
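
A regression like this could be caught in a test suite by asserting that no prune-listed tag name survives in the cleaned output. The sketch below is only an illustration and uses nothing but the standard library: `PruneCheck` and `survivingPruneTags` are hypothetical names, not part of HtmlCleaner, and the check is a plain substring scan rather than a real HTML parse, so it errs on the side of reporting too much (for example, `frame` would also match `<frameset`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PruneCheck {

    // Hypothetical helper: returns every prune-listed tag name that still
    // appears as "<name" in the cleaned output (case-insensitive text scan).
    static List<String> survivingPruneTags(String cleaned, String pruneTags) {
        String lower = cleaned.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String tag : pruneTags.split(",")) {
            String name = tag.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty() && lower.contains("<" + name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String prune = "script,object,embed,applet,link,style,form,input,iframe,frame";

        // Output strings as quoted in the report above.
        String from219 = "<p>alert using script:<scr></scr></p>";
        String from220 = "<p>alert using script:<scr<script>ipt&#62;alert(&#34;Hello&#34;);"
                + "</scr<script>ipt&#62;</p>";

        System.out.println("2.19 leftovers: " + survivingPruneTags(from219, prune));
        System.out.println("2.20 leftovers: " + survivingPruneTags(from220, prune));
    }
}
```

With the two strings from the report, the 2.19 output yields an empty list while the 2.20 output reports `script`.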

Discussion

  • Scott Wilson - 2017-05-09

    Thanks for the report, Markus. The issue was caused by the WHATWG HTML spec allowing angle brackets within attribute names, and HtmlCleaner using the same rules to determine valid element names. I've fixed this now and will make a new release shortly.

     
  • Scott Wilson - 2017-05-11
    • status: open --> closed-fixed
    • Group: v2.20 --> v2.21
     
  • Scott Wilson - 2017-05-11

    Fixed in 2.21
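
The cause described in the first comment can be illustrated with a small sketch. It is not HtmlCleaner's actual code: `NameRules`, `isValidAttributeName`, and `isValidElementName` are hypothetical names, and the element-name rule (ASCII letter first, then letters, digits, `-`, `_`, `:`, `.`) is an assumption chosen for illustration. The attribute-name rule follows the WHATWG definition, except that the noncharacter check is omitted for brevity.

```java
public class NameRules {

    // WHATWG attribute names: one or more characters other than controls,
    // space, ", ', >, / and =. Note that '<' is NOT excluded.
    static boolean isValidAttributeName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isISOControl(c) || c == ' ' || c == '"'
                    || c == '\'' || c == '>' || c == '/' || c == '=') {
                return false;
            }
        }
        return true;
    }

    // Assumed stricter rule for element names (illustration only):
    // an ASCII letter followed by ASCII letters, digits, '-', '_', ':' or '.'.
    static boolean isValidElementName(String name) {
        if (name.isEmpty() || !isAsciiLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == ':' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String candidate = "scr<script";
        // Reusing the attribute rule for element names accepts the candidate,
        // which is the kind of confusion described in the comment above.
        System.out.println("as attribute name: " + isValidAttributeName(candidate));
        System.out.println("as element name:   " + isValidElementName(candidate));
    }
}
```

Under these assumptions, `scr<script` passes the attribute-name check but fails the element-name check, so a parser that applies the attribute rule to element names would treat `<scr<script>` as a single unknown tag rather than spotting the nested `<script>`.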

     
