Hi,
When the CompactHtmlSerializer does its job, it tries to handle pre tags and spaces in a smart way. However, in real HTML found on the dirty web, how whitespace is interpreted can also be governed by the “white-space” CSS property (in particular “white-space: pre”). In such cases, the CompactHtmlSerializer breaks the HTML by making the hopeful but wrong assumption that the HTML is self-contained, i.e. that the markup alone determines which whitespace is significant.
The issue is present at least from 2.9 to 2.21 (I haven't checked earlier versions) and I'm pretty sure it's the result of the following code:
    if (childrenIt.hasNext()) {
        if ( !Utils.isWhitespaceString(childrenIt.next()) ) {
            writer.write("\n"); // <- oops, significant in a "white-space: pre" context
        }
        childrenIt.previous();
    }
The SimpleHtmlSerializer does not exhibit this issue.
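To illustrate why that extra newline matters, here is a minimal, self-contained sketch. It does not use HtmlCleaner at all: the class WhiteSpaceDemo and its methods renderNormal and renderPre are hypothetical names made up for this example, and the whitespace collapsing is only a rough approximation of what a browser does under “white-space: normal”.

```java
// Hypothetical illustration only; none of this is HtmlCleaner code.
class WhiteSpaceDemo {

    // Rough approximation of "white-space: normal":
    // any run of spaces, tabs and line breaks collapses to a single space.
    static String renderNormal(String text) {
        return text.replaceAll("[ \\t\\n\\r\\f]+", " ");
    }

    // Under "white-space: pre" whitespace is kept exactly as written,
    // so every line break in the source is a line break on screen.
    static String renderPre(String text) {
        return text;
    }

    public static void main(String[] args) {
        // Text content as it appears in the input document.
        String original = "alpha beta";
        // Same content after a serializer has injected "\n" between two nodes.
        String withNewline = "alpha \nbeta";

        // Collapsing hides the injected newline, so both render the same.
        System.out.println(renderNormal(original).equals(renderNormal(withNewline))); // true

        // With preserved whitespace the injected newline becomes a visible line break.
        System.out.println(renderPre(original).equals(renderPre(withNewline))); // false
    }
}
```

So the injected newline is harmless as long as whitespace collapses, but it changes the rendered content as soon as a stylesheet asks for whitespace to be preserved, which the serializer cannot know from the markup alone.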
Thanks,
Thanks Arkanosis,
It's definitely a problem if we're introducing characters into the output that are going to break the content. I'll try a few test cases with that line removed and see what it does to the output.
Commenting out that statement doesn't seem to cause any issues, so I've committed the change to trunk.