#180 duplicate tags added at the end (script,body,html)

v2.19
closed-fixed
nobody
None
5
2017-02-10
2016-11-20
Haadar
No

It seems that duplicate </body> tags are added at the end when parsing this URL.
It also adds a strange /]]/ before the duplicate scripts:
m.news24.com/news24/World/News/passenger-describes-india-train-derailment-over-100-dead-20161120

Discussion

  • Scott Wilson

    Scott Wilson - 2017-02-06
    • Group: v2.18 --> v2.19
     
  • Scott Wilson

    Scott Wilson - 2017-02-06

    Hmm, there are two main issues here.

    One is the conditional processing rules in comments before the head tag. It's not really standard HTML, so HC just moves the comments into the BODY. As there are two HTML tags in the document...

    The next is the handling of one of the scripts, which looks odd. I'll focus on that as there is clearly something wrong here.

     
  • Scott Wilson

    Scott Wilson - 2017-02-06

    OK, I think the problem here is that the CDATA section doesn't have an end token. So the CDATA is assumed to be everything up to the end of the doc. I've added a check for that - if there's a start CDATA with no end, we wind back to the start.

     
  • Scott Wilson

    Scott Wilson - 2017-02-06
    • status: open --> closed-fixed
     
  • Scott Wilson

    Scott Wilson - 2017-02-06

    Fixed in 2.19

     
  • Haadar

    Haadar - 2017-02-10

    Shouldn't you close the CDATA once the end of its wrapping tag is reached?

     
  • Scott Wilson

    Scott Wilson - 2017-02-10

    What happens now is that if the CDATA start token has no corresponding end token, I terminate the section immediately after the start token. Previously it continued to the end of the document before terminating, which is why the output from that page looked so strange.
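The difference between the old and new handling can be sketched roughly as follows. This is only an illustration in Python, not the actual HtmlCleaner code: the function name `split_cdata`, the `legacy` flag, and the return shape are all hypothetical and chosen just to show the two behaviours side by side.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def split_cdata(text, start, legacy=False):
    """Return (cdata_content, resume_index) for a CDATA start token at `start`.

    Hypothetical helper for illustration only; not HtmlCleaner's API.
    """
    if not text.startswith(CDATA_START, start):
        raise ValueError("no CDATA start token at this position")

    content_start = start + len(CDATA_START)
    end = text.find(CDATA_END, content_start)

    if end != -1:
        # Normal case: the section runs up to the matching end token.
        return text[content_start:end], end + len(CDATA_END)

    if legacy:
        # Old behaviour: with no end token, the section swallowed
        # everything up to the end of the document.
        return text[content_start:], len(text)

    # New behaviour: with no end token, terminate the section
    # immediately after the start token and resume parsing there.
    return "", content_start
```

With an unterminated section such as `<script>//<![CDATA[ ... </script></body></html>`, the legacy branch treats the closing `</script>`, `</body>` and `</html>` as CDATA content, whereas the new branch returns an empty section and leaves those tags for the parser to handle normally.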

     