duplicate tags added at the end (script,body,html)

Brought to you by: patmoore, scottwilson, vnikic

#180 duplicate tags added at the end (script,body,html)

Milestone: v2.19

Status: closed-fixed

Owner: nobody

Labels: None

Priority: 5

Updated: 2017-02-10

Created: 2016-11-20

Creator: Haadar

Private: No

It seems that duplicate </body> tags are added at the end when parsing this URL.
It also adds a strange /]]/ before the duplicate scripts.
m.news24.com/news24/World/News/passenger-describes-india-train-derailment-over-100-dead-20161120

Discussion

Scott Wilson - 2017-02-06

Group: v2.18 --> v2.19

Scott Wilson - 2017-02-06

Hmm, there are two main issues here.

One is the conditional processing rules in comments before the head tag - it's not really standard HTML, so HC just moves the comments into the BODY. As there are two HTML tags in the document...

The next is the handling of one of the scripts, which looks odd. I'll focus on that as there is clearly something wrong here.

Scott Wilson - 2017-02-06

OK, I think the problem here is that the CDATA section doesn't have an end token. So the CDATA is assumed to be everything up to the end of the doc. I've added a check for that - if there's a start CDATA with no end, we wind back to the start.

Scott Wilson - 2017-02-06

status: open --> closed-fixed

Scott Wilson - 2017-02-06

Fixed in 2.19

Haadar - 2017-02-10

Shouldn't you close the CDATA once the end of its wrapping tag is reached?

Scott Wilson - 2017-02-10

What happens now is that if the CDATA start token has no corresponding end token, I terminate the section immediately after the start token. Previously it continued to the end of the document before terminating, which is why the output from that page looked so strange.
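
A minimal sketch of that rule, for illustration only and not the actual HtmlCleaner code: the class and method names (CdataSketch, cdataEnd) are hypothetical, and the real tokenizer may structure this differently.

```java
// Hypothetical illustration; not taken from the HtmlCleaner source.
class CdataSketch {
    static final String CDATA_START = "<![CDATA[";
    static final String CDATA_END = "]]>";

    /**
     * Given the index where a CDATA start token begins, returns the index
     * at which normal parsing should resume.
     */
    static int cdataEnd(String doc, int start) {
        int contentStart = start + CDATA_START.length();
        int end = doc.indexOf(CDATA_END, contentStart);
        if (end < 0) {
            // No end token: terminate the section immediately after the
            // start token instead of swallowing the rest of the document.
            return contentStart;
        }
        // Normal case: resume just past the end token.
        return end + CDATA_END.length();
    }
}
```

With this rule, an unterminated section such as "<![CDATA[ var a; </script></body>" resumes parsing right after the start token, so the closing tags that follow are still seen as tags rather than as CDATA content.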
