
#85 html comments not ignored inside style tags

Milestone: General
Status: open-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-11-28
Created: 2016-11-22
Creator: Code Buddy
Private: No

I'm using jericho-html 3.4 and stumbled across this bit of real-world HTML (cut down by me) that causes the parser to ignore the whole document:

<html>
<head>
<style type="text/css">
<!-- 
body {
background-color: yellow;
}
</style>
</head>
<body>
<h1>foo</h1>
</body>
<script>
-->
</script>
</html>

I've also created an example JUnit test to show this in a standalone project:
https://github.com/liamsharp/jerichohtml-html-comments-in-css

I know the author should have closed the comment before the closing style tag, but...

Faced with the same html, a browser will show the H1 (screenshot attached).

I believe browsers ignore the opening HTML comment because comments in CSS should be C-style comments (https://css-tricks.com/snippets/css/comments-in-css/), so they just treat it as a syntax error and move on.

It looks like the Jericho parser is treating these as proper html comments and then ignoring the entire doc.
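
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours described above. It does not use Jericho and is not how Jericho or any browser is actually implemented; the class and method names (StyleCommentSketch, commentSpans, isInsideComment) are hypothetical, and the "raw text" handling of style content is an assumption about what browsers effectively do. It only shows why scanning for "<!--" everywhere swallows the h1, while skipping over style content does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StyleCommentSketch {

    /** Returns [start, end) offsets of every span this toy scanner treats as an HTML comment. */
    static List<int[]> commentSpans(String html, boolean styleIsRawText) {
        List<int[]> spans = new ArrayList<>();
        // Lower-cased copy for case-insensitive tag matching (assumes ASCII markup).
        String lower = html.toLowerCase(Locale.ROOT);
        int pos = 0;
        while (pos < html.length()) {
            if (styleIsRawText && lower.startsWith("<style", pos)) {
                // Raw text mode: jump to the closing tag without looking for comments.
                int close = lower.indexOf("</style", pos);
                if (close < 0) {
                    break; // unterminated style element: nothing more to scan
                }
                pos = close + "</style".length();
            } else if (html.startsWith("<!--", pos)) {
                // A comment runs to the next "-->", or to end of input if never closed.
                int end = html.indexOf("-->", pos + 4);
                int stop = (end < 0) ? html.length() : end + 3;
                spans.add(new int[] {pos, stop});
                pos = stop;
            } else {
                pos++;
            }
        }
        return spans;
    }

    /** True if the first occurrence of needle falls inside a span treated as a comment. */
    static boolean isInsideComment(String html, String needle, boolean styleIsRawText) {
        int at = html.indexOf(needle);
        for (int[] span : commentSpans(html, styleIsRawText)) {
            if (at >= span[0] && at < span[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<html><head><style type=\"text/css\">\n<!--\nbody { background-color: yellow; }\n"
                + "</style></head><body><h1>foo</h1></body><script>\n-->\n</script></html>";
        // Comments recognised everywhere: the h1 ends up inside the comment.
        System.out.println(isInsideComment(html, "<h1>", false));
        // Style content skipped as raw text: the h1 stays visible.
        System.out.println(isInsideComment(html, "<h1>", true));
    }
}
```

With the cut-down document above, the first call reports the h1 as commented out and the second does not, which matches the parser-versus-browser difference I'm seeing.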

1 Attachments

Discussion

  • Martin Jericho

    Martin Jericho - 2016-11-22

    Hi Code Buddy,

    I think you're probably right, the parser should be ignoring the HTML comment tags inside style elements. It does this for script elements, but only if you've called source.fullSequentialParse(). It doesn't do it for style elements because I thought it was a bit irrelevant since normal CSS never contains anything that looks like HTML. It seems no other users have ever come across this situation.

    I checked the HTML5 spec to make sure HTML comments should be ignored inside style elements, but found something strange. While the HTML 4 spec says to treat the content as CDATA (thereby not parsing comments), the HTML5 spec actually says they should be parsed. Despite that, no modern browsers parse the comment tags. I've asked in a forum for help to clarify what the correct behaviour is. I strongly suspect the modern browsers are behaving correctly, but I'd like to be able to document that fact before changing the current behaviour of this library.

    http://stackoverflow.com/questions/40747239/does-the-html5-spec-say-to-ignore-css-inside-html-comments

    I'll let you know when the issue is resolved.

    Cheers
    Martin

     
  • Martin Jericho

    Martin Jericho - 2016-11-22
    • status: unread --> pending
     
  • Code Buddy

    Code Buddy - 2016-11-23

    Great stuff - thanks for the update Martin, much appreciated!

     
  • Martin Jericho

    Martin Jericho - 2016-11-24
    • status: pending --> open-fixed
     
  • Martin Jericho

    Martin Jericho - 2016-11-24

    This has been fixed in version 3.5.

    Until version 3.5 is officially released, the development version is available here:
    http://jericho.htmlparser.net/temp/jericho-html-3.5-dev.zip

    Please let me know if you notice any regressive behaviour resulting from this change.

    Thanks for the bug report!

     
  • Code Buddy

    Code Buddy - 2016-11-28

    Great stuff, thanks Martin!

     
