Hi there (Hi Scott! :))
I'm trying to eliminate the need for a custom DomSerializer in the XWiki project, see https://github.com/xwiki/xwiki-commons/blob/master/xwiki-commons-core/xwiki-commons-xml/src/main/java/org/xwiki/xml/internal/html/XWikiDOMSerializer.java
When I try to use the DomSerializer from HTML Cleaner, I get the following failures when running the XWiki test suite for CDATA.
Input:
<script type="text/javascript"><![CDATA[ alert("Hello World") ]]></script>
Expected:
<script type="text/javascript">//<![CDATA[ alert("Hello World") //]]></script>
Actual:
<html><head></head><body><script type="text/javascript"><![CDATA[ alert("Hello World") ]]></script></body></html>
Rationale: Generate a JavaScript comment in front of the CDATA markers so that the script works in IE and when serving XHTML under an HTML MIME type.
Some other tests related to this use case:
Input:
<script type="text/javascript">//<![CDATA[ alert("Hello World") //]]></script>
Expected:
<script type="text/javascript">//<![CDATA[ alert("Hello World") //]]></script>
Rationale: Verify that existing // comments are kept.
Input:
<script type="text/javascript">/*<![CDATA[*/ alert("Hello World") /*]]>*/</script>
Expected:
<script type="text/javascript">//<![CDATA[ alert("Hello World") //]]></script>
Rationale: Normalize the JS comment style (/* */ becomes //).
Input:
<script type="text/javascript"> /*<![CDATA[*/ function escapeForXML(origtext) { return origtext.replace(/\\&/g,'&'+'amp;').replace(/</g,'&'+'lt;') .replace(/>/g,'&'+'gt;').replace(/\'/g,'&'+'apos;') .replace(/\"/g,'&'+'quot;');} /*]]>*/ </script>
Expected:
<script type="text/javascript"> //<![CDATA[ function escapeForXML(origtext) { return origtext.replace(/\\&/g,'&'+'amp;').replace(/</g,'&'+'lt;') .replace(/>/g,'&'+'gt;').replace(/\'/g,'&'+'apos;') .replace(/\"/g,'&'+'quot;');} //]]> </script>
Rationale: Same as above on a more complex use case.
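To make the normalization in the cases above concrete, here is a minimal sketch in plain Java. It is not the XWikiDOMSerializer code and not HTML Cleaner's API; the class name CdataCommentNormalizer and the method normalize are made up for illustration. It strips any existing //, /* */ or bare CDATA markers from the script content and re-emits the // form. It puts a newline after each marker, because a // comment runs to the end of the line and would otherwise swallow the script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class CdataCommentNormalizer {
    // Optional comment opener, CDATA start marker, body (group 1),
    // then the CDATA end marker with its optional comment wrapper.
    private static final Pattern WRAPPED_CDATA = Pattern.compile(
        "^\\s*(?://|/\\*)?\\s*<!\\[CDATA\\[\\s*(?:\\*/)?(.*?)"
            + "(?://|/\\*)?\\s*\\]\\]>\\s*(?:\\*/)?\\s*$",
        Pattern.DOTALL);

    static String normalize(String scriptContent) {
        Matcher m = WRAPPED_CDATA.matcher(scriptContent);
        // Content without any CDATA markers is wrapped as-is.
        String body = m.matches() ? m.group(1) : scriptContent;
        return "//<![CDATA[\n" + body.trim() + "\n//]]>";
    }
}
```

With this sketch, the bare, the // and the /* */ variants of the alert("Hello World") input all come out as the same // wrapped block.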
Input:
"<script><></script>
Expected:
<script>//<![CDATA[<>//]]></script>
Actual:
<script><![CDATA[<>]]></script>
Rationale: Same as above, generate JS comments
Input:
<script><></script>
Expected:
<script>//<![CDATA[<>//]]></script>
Actual:
<script><![CDATA[<>]]></script>
Input:
<style type="text/css"><![CDATA[ a { color: red; } ]]></style>
Expected:
<style type="text/css">//<![CDATA[ a { color: red; } //]]></style>
Actual:
<style type="text/css"><![CDATA[ a { color: red; } ]]></style>
It would be so awesome if HTML Cleaner could support those use cases :)
WDYT? Would you agree to support those use cases?
Note that the implementation is in https://github.com/xwiki/xwiki-commons/blob/master/xwiki-commons-core/xwiki-commons-xml/src/main/java/org/xwiki/xml/internal/html/XWikiDOMSerializer.java and you could copy and paste that code if you agree to support it. However, you'd probably want to make it conditional on some configuration option.
Thanks a lot
No problem, I've added another constructor to DomSerializer with a "deserializeCdataEntities" option. If set to true, it will run the content of the CDATA section through the entity deserialization code. If you want to change how it does that, you can override the deserializeCdataEntities(String) method. I haven't tested it much yet, but give it a go.
For the commenting style, I can probably make it configurable; to be safe I'd need to inject a newline after each CDATA token just to ensure nothing odd happens to the script (which I think your version did too, hence the extra whitespace...)
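For readers wondering what running CDATA content through entity deserialization amounts to, here is a rough standalone sketch. It is not HTML Cleaner's actual deserializeCdataEntities code; the class CdataEntityDecoder and its decode method are hypothetical, and it only handles the five predefined XML entities, whereas the real implementation may cover more (numeric references, for instance).

```java
// Hypothetical helper, for illustration only.
final class CdataEntityDecoder {
    static String decode(String cdataContent) {
        // &amp; is decoded last so that "&amp;lt;" becomes "&lt;"
        // instead of being decoded twice into "<".
        return cdataContent
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&amp;", "&");
    }
}
```

The point is that markup-significant characters can appear literally inside a CDATA section, so escaped content such as &lt;&gt; can be turned back into <> before it is wrapped.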
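A configurable comment style with the newline injection could look roughly like the sketch below. The enum CommentStyle, its values and the wrap method are invented names for the sake of the example, not options that exist in HTML Cleaner.

```java
// Hypothetical helper, for illustration only.
final class CdataWrapper {
    /** Assumed option names; not part of HTML Cleaner's API. */
    enum CommentStyle { NONE, LINE, BLOCK }

    static String wrap(String content, CommentStyle style) {
        switch (style) {
            case LINE:
                // The newline after the opening marker keeps the // comment
                // from extending over the first line of the script.
                return "//<![CDATA[\n" + content + "\n//]]>";
            case BLOCK:
                return "/*<![CDATA[*/\n" + content + "\n/*]]>*/";
            default:
                // Plain CDATA section without any comment wrapper.
                return "<![CDATA[" + content + "]]>";
        }
    }
}
```

For example, CdataWrapper.wrap("alert(1)", CdataWrapper.CommentStyle.LINE) puts alert(1) on its own line between the two commented markers, while NONE keeps the plain CDATA output.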
This is awesome, thanks! I've tried it and it works just fine ;)
So the only difference left is the comment style (and of course the ensuing newlines that need to be put after the CDATA opening and before the CDATA closing).
We're getting there! :)
I really appreciate that you followed up on this and made the effort to accommodate XWiki's needs. We owe you some beers when we meet one day :) If you need help with XWiki, let me know, anytime.
Thanks
-Vincent