#154 single quote character must not get serialized as &apos; by html serializers

v2.16
closed-fixed
nobody
None
5
2015-10-23
2015-10-08
No

Hi guys,

We've run into the issue that the single quote character gets serialized as &apos; by the HTML serializers. Attached you'll find a patch and unit test for this. The unit test also documents the behavior of the XML serializer, which is unchanged by this patch.

Cheers, Oscar

1 Attachments

Discussion

  • Seanster

    Seanster - 2015-10-08

    I think the way your posting is rendered reverses your argument, but the title remains correct (at least in my browser).

    This will conflict with my patch on bug #118. My patch will also fix this, but I single out the apostrophe specifically, in case something changes in the special entities list later on. I do like that you made a test case for it.

    Here's the source code comment where the apostrophe was added to the special entities list:

        // this is xml only -- apos appearing in html needs to be converted to ' or maybe ' to be universally safe
        // may need to special case for html attributes that use ' as surrounding delimeter on attribute value [...snip]
    

    haha
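
    For readers following along, "singling out the apostrophe" could look roughly like the sketch below. This is not the actual patch for #118 and not HtmlCleaner code: the class `ApostropheGuard`, the `SPECIAL_ENTITIES` table and its contents, and the `escape` method are all hypothetical stand-ins. The idea shown is only that the HTML path skips the apostrophe explicitly, so it keeps working even if the shared entity table changes later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and table contents are assumptions, not HtmlCleaner's API.
class ApostropheGuard {
    // Stand-in for a shared "special entities" table used by both serializers.
    static final Map<Character, String> SPECIAL_ENTITIES = new LinkedHashMap<>();
    static {
        SPECIAL_ENTITIES.put('&', "&amp;");
        SPECIAL_ENTITIES.put('<', "&lt;");
        SPECIAL_ENTITIES.put('>', "&gt;");
        SPECIAL_ENTITIES.put('"', "&quot;");
        SPECIAL_ENTITIES.put('\'', "&apos;"); // predefined in XML, not defined in HTML 4
    }

    // Escapes text content; for HTML output the apostrophe is always left untouched,
    // regardless of what the shared table says.
    static String escape(String text, boolean htmlOutput) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = SPECIAL_ENTITIES.get(c);
            if (htmlOutput && c == '\'') {
                out.append(c);          // explicit special case for HTML
            } else if (entity != null) {
                out.append(entity);     // table-driven escaping for everything else
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("it's <b>", true));
        System.out.println(escape("it's <b>", false));
    }
}
```

    The explicit check costs one comparison per character and keeps the HTML behavior independent of the table, which is the trade-off described in the post above.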

     
  • Oscar Scholten

    Oscar Scholten - 2015-10-09

    Lol@the rendering of my post.

    It actually is a nice illustration of what I want to avoid. I'm 100% sure I entered & a m p ; in the first sentence but "some" process converted that into a ' (at least that is how it now shows up in the raw html of this page). I'd like to be able to use HtmlCleaner in such a way that it fixes any incorrect HTML, but leaves all characters "as-is".

    I've applied your patch, double checked my test case and all other tests we have that use HtmlCleaner and all is green :-) I'm ok with closing this issue as a duplicate of 118.

    About the JavaScript link mentioned in the code comment you quoted: since all attributes are always serialized/normalized using double quotes, single quotes inside attribute values should be OK in both HTML and XML. The following test case illustrates this:

    // Fragment of a JUnit test class; cleaner and serializer are assumed to be fields of that class.
    @Test
    public void attributeSerialization() throws IOException {
        // Input: one double-quoted and one single-quoted attribute, each containing both quote characters.
        final String original =     "<p data-double-quote-attr=\"foo&quot;bar'baz\" data-single-quote-attr='foo\"bar&apos;baz'>text</p>";
        // HTML: attributes normalized to double quotes, apostrophe left as-is.
        final String expectedHtml = "<p data-double-quote-attr=\"foo&quot;bar'baz\" data-single-quote-attr=\"foo&quot;bar'baz\">text</p>";
        // XML: same normalization, but the apostrophe is escaped as &apos;.
        final String expectedXml =  "<p data-double-quote-attr=\"foo&quot;bar&apos;baz\" data-single-quote-attr=\"foo&quot;bar&apos;baz\">text</p>";
    
        cleaner.getProperties().setOmitHtmlEnvelope(true);
        TagNode node = cleaner.clean(original);
        StringWriter writer = new StringWriter();
        serializer = new SimpleHtmlSerializer(cleaner.getProperties());
        serializer.write(node, writer, "UTF-8");
        assertEquals(expectedHtml, writer.toString());
    
        writer = new StringWriter();
        serializer = new SimpleXmlSerializer(cleaner.getProperties());
        serializer.write(node, writer, "UTF-8");
        assertEquals(expectedXml, writer.toString());
    }
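
    The reasoning behind the test above (attribute values are always written inside double quotes, so only the double quote needs escaping in HTML) can be sketched without HtmlCleaner. The class `AttributeQuoting` and the method `writeAttribute` are made up for illustration and are not the library's serializer code; the sketch only mirrors the expected strings in the test.

```java
// Hypothetical sketch; not HtmlCleaner's serializer implementation.
class AttributeQuoting {
    // Writes name="value" with the value always delimited by double quotes.
    // The apostrophe is escaped only for XML output; in HTML it is safe as-is
    // because it can never terminate a double-quoted attribute value.
    static String writeAttribute(String name, String value, boolean xmlOutput) {
        StringBuilder out = new StringBuilder();
        out.append(name).append("=\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '"':  out.append("&quot;"); break; // would otherwise end the value
                case '\'': out.append(xmlOutput ? "&apos;" : "'"); break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String value = "foo\"bar'baz";
        System.out.println(writeAttribute("data-attr", value, false));
        System.out.println(writeAttribute("data-attr", value, true));
    }
}
```

    The input here is the already-decoded attribute value, so a real serializer would also have to decide how to treat entities that were present in the source; that is left out of the sketch.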
    
     
  • Oscar Scholten

    Oscar Scholten - 2015-10-09

    s / & a m p ; / & a p o s ;
    sigh friday afternoon here ...

     
  • Scott Wilson

    Scott Wilson - 2015-10-23

    Hi Oscar,

    I've applied Seanster's patch for issue 118.

    The test case for XmlSerialiser is a good one - I've added it to the serialisation test cases but commented it out for now, until I've figured out if there are any side effects of applying the rules to Xml serialisers as well as Html.

    S

     
  • Scott Wilson

    Scott Wilson - 2015-10-23
    • Group: v2.14 --> v2.16
     
  • Scott Wilson

    Scott Wilson - 2015-10-23
    • status: open --> closed-fixed
     
