#68 Unmatched <script> tag eats the rest of the markup in the document

Milestone: General
Status: closed-out-of-date
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-10-24
Created: 2013-09-10
Creator: Trejkaz
Private: No

We're evaluating switching to Jericho from our own parser based on HTMLParser, as ours is sort of limited.

For this sample from our test cases:

Leading Line.
<script>Some broken script.<p>Line 1.</p><p>Line 2.</p>

Jericho currently decides that the last </p> is the end of the <script>, so our text extraction gets no content, as we omit scripts.

Our existing parser hits the <p> and decides that the script is over, so "Line 1." comes out as text.
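To make the two behaviours concrete, here is a rough, self-contained sketch of the recovery heuristic described above. It is not Jericho's code and not our HTMLParser-based parser; the function name extract_text and the recover flag are made up for illustration, and the regex-based tag matching is a deliberate simplification.

```python
import re

# Hypothetical, simplified patterns; a real HTML parser is far more careful.
TAG = re.compile(r"<[^>]*>")
SCRIPT_OPEN = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
SCRIPT_CLOSE = re.compile(r"</script\s*>", re.IGNORECASE)


def extract_text(html, recover=True):
    """Return the visible text of html, omitting script content.

    recover=True:  an unclosed <script> is assumed to end at the next tag.
    recover=False: an unclosed <script> swallows the rest of the document.
    """
    kept = []
    pos = 0
    while True:
        opened = SCRIPT_OPEN.search(html, pos)
        if opened is None:
            kept.append(html[pos:])  # no more scripts: keep the remainder
            break
        kept.append(html[pos:opened.start()])  # keep markup before the script
        closed = SCRIPT_CLOSE.search(html, opened.end())
        if closed is not None:
            pos = closed.end()  # well-formed script: skip to after </script>
        elif recover:
            next_tag = TAG.search(html, opened.end())
            pos = next_tag.start() if next_tag else len(html)
        else:
            pos = len(html)  # unmatched script eats everything that follows
    text = TAG.sub(" ", "".join(kept))  # drop remaining tags
    return " ".join(text.split())  # normalise whitespace


sample = "Leading Line.\n<script>Some broken script.<p>Line 1.</p><p>Line 2.</p>"
print(extract_text(sample, recover=True))
print(extract_text(sample, recover=False))
```

With recover=True the paragraphs after the broken script survive, which is what our existing parser does. With recover=False everything after the unmatched <script> is dropped, which is roughly the effect reported here. One caveat: a script containing a bare "<" (for example a comparison) would trip the "next tag" heuristic in this sketch.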

Discussion

  • Trejkaz

    Trejkaz - 2013-09-10

    Rats. Markdown interpreted that tag as part of the content and now I can't fix it.

     
  • Martin Jericho

    Martin Jericho - 2013-09-10
    • status: unread --> pending
     
  • Martin Jericho

    Martin Jericho - 2015-10-24
    • status: pending --> closed-out-of-date
     
