
#108 DomSerializer fails to add subnodes when DocType is HTML5

Milestone: v2.30
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-06-19
Created: 2014-01-28
Private: No

When an input document uses the HTML5 DocType ("html"), DomSerializer silently fails to add any subnodes to the result document. This appears to be an issue with Java's DOMImplementation; however, we should have a workaround in place for it.

Discussion

  • Scott Wilson

    Scott Wilson - 2014-03-18

    Punting to 2.9 for now - this needs to go in release notes

     
  • Scott Wilson

    Scott Wilson - 2014-03-18
    • Group: v 2.8 --> v 2.9
     
  • Tahseen Mohammad

    I just started investigating HtmlCleaner to replace JTidy. The HTML documents I am cleaning use the HTML5 DocType '<!DOCTYPE html>', yet I seem to be able to use DomSerializer to convert them from TagNode to Document. Am I missing something here? I don't want to switch right now if DomSerializer fails under some circumstances.

     
  • Scott Wilson

    Scott Wilson - 2014-04-17

    Hi Tahseen,

    The issue is with the underlying Java DOM implementation rather than HtmlCleaner itself; I suspect this was fixed at a certain point in the JDK, JRE or default XML API implementation, but there doesn't seem to be an easy way to find out when apart from compiling and testing under different environments to see where the problem occurs. (I normally build and test under JDK6; next I'll upgrade to JDK7 and see if it still occurs.)

    As you now have it working in your current environment I suspect it won't fail unless you revert to an earlier JDK, JRE or XML implementation in your environment.

    There is a workaround, which is to add public and system identifiers as " " (a single space) rather than "" or null when creating the Document. I'd rather not code that into HC however if this is already solved for most modern environments.

    Hope this helps,

    S
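The workaround described above (a single space for the public and system identifiers) can be sketched with the standard JAXP/DOM API. This is only an illustration under assumptions: the class and method names are hypothetical, and it is not the code HtmlCleaner itself uses. It builds an empty document with the "html" DocType and then appends a child element, which is the step reported to fail silently in affected environments.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper, not part of HtmlCleaner.
final class Html5DoctypeWorkaround {

    // Creates a document whose DocType is "html", passing " " (a single space)
    // for the public and system identifiers instead of "" or null.
    static Document createHtmlDocument() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        DocumentType docType = impl.createDocumentType("html", " ", " ");
        return impl.createDocument(null, "html", docType);
    }

    public static void main(String[] args) throws Exception {
        Document doc = createHtmlDocument();
        // Append a subnode and report how many children the root element has.
        doc.getDocumentElement().appendChild(doc.createElement("body"));
        System.out.println("children of <html>: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Running `main` under the JDK in question shows whether subnodes are actually attached in that environment.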

     
  • Tahseen Mohammad

    Hi Scott,

    Yes, I noticed that and tried with JDK 6, but the bug was probably fixed before 1.6.0_51. I assumed so, but given my lack of knowledge I wanted your input. Thanks for such a quick reply and, of course, for the library itself :).
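Since the behaviour seems to depend on the JDK/JRE and the XML implementation in use, it may help to record both when testing. A minimal sketch, assuming only the standard JAXP API (the class name is hypothetical):

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper for reporting which Java runtime and DOM implementation are active.
final class XmlEnvironmentReport {

    static String describe() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        String domImpl = factory.newDocumentBuilder()
                .getDOMImplementation()
                .getClass()
                .getName();
        return "java.version=" + System.getProperty("java.version")
                + ", DocumentBuilderFactory=" + factory.getClass().getName()
                + ", DOMImplementation=" + domImpl;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

Including this line in a bug report makes it easier to compare environments where the problem does and does not occur.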

     
  • Scott Wilson

    Scott Wilson - 2014-08-13
    • Group: v 2.9 --> v 2.10
     
  • Scott Wilson

    Scott Wilson - 2014-10-31
    • Group: v 2.10 --> v 2.11
     
  • Scott Wilson

    Scott Wilson - 2015-05-12
    • Group: v 2.11 --> v2.12
     
  • Scott Wilson

    Scott Wilson - 2015-05-15
    • Group: v2.12 --> v2.13
     
  • Scott Wilson

    Scott Wilson - 2015-07-01
    • Group: v2.13 --> v2.14
     
  • Scott Wilson

    Scott Wilson - 2015-08-24
    • Group: v2.14 --> v2.15
     
  • Scott Wilson

    Scott Wilson - 2015-10-01
    • Group: v2.15 --> v2.16
     
  • Scott Wilson

    Scott Wilson - 2015-11-20
    • Group: v2.16 --> v2.17
     
  • Scott Wilson

    Scott Wilson - 2016-10-19
    • Group: v2.17 --> v2.18
     
  • Scott Wilson

    Scott Wilson - 2017-02-06
    • Group: v2.18 --> v2.19
     
  • Scott Wilson

    Scott Wilson - 2017-02-07
    • Group: v2.19 --> v2.20
     
  • Scott Wilson

    Scott Wilson - 2017-05-02
    • Group: v2.20 --> v2.21
     
  • Scott Wilson

    Scott Wilson - 2017-05-11
    • Group: v2.21 --> v2.22
     
  • Scott Wilson

    Scott Wilson - 2018-04-24
    • Group: v2.22 --> v2.23
     
  • Scott Wilson

    Scott Wilson - 2019-09-04
    • Group: v2.23 --> v2.24
     
  • Scott Wilson

    Scott Wilson - 2020-04-29
    • Group: v2.24 --> v2.25
     
  • Scott Wilson

    Scott Wilson - 2021-09-24
    • Group: v2.25 --> v2.26
     
  • Scott Wilson

    Scott Wilson - 2023-04-29
    • Group: v2.26 --> v2.29
     
  • Scott Wilson

    Scott Wilson - 2023-06-19
    • Group: v2.29 --> v2.30
     
