Incorrect parsing from InputStream at closing script tag

Brought to you by: andyc2, mguillem

#142 Incorrect parsing from InputStream at closing script tag

Status: open

Owner: nobody

Labels: scanner (58)

Priority: 5

Updated: 2012-07-27

Created: 2012-07-27

Creator: Anonymous

Private: No

If the InputStream splits a closing script tag into "</" and "script>", the parser misses the closing tag and parses the rest of the page incorrectly.

This bug is driving me mad because it only fails sometimes on the same exact page!

As pointed out by Janito Vaqueiro Ferreira, the problem may be in:
HTMLScanner.nextContent(int)
but I am having a hard time understanding that code.

I have developed a test case with an InputStream that exposes this problem.
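
For readers without the attachment, a stream of this kind can be sketched as follows. This is a minimal illustration and not the attached test case: the class name `SplitInputStream` and the helper `splitInsideClosingScript` are hypothetical, and the sketch assumes the consumer reads through `read(byte[], int, int)`.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: an InputStream over a byte array that never lets a
 * single bulk read cross a chosen offset, so the data arrives in two chunks.
 */
final class SplitInputStream extends InputStream {
    private final byte[] data;
    private final int splitAt; // byte offset where the first chunk must end
    private int pos = 0;

    SplitInputStream(byte[] data, int splitAt) {
        this.data = data;
        this.splitAt = splitAt;
    }

    /** Builds a stream whose first chunk ends right after the "</" of the first "</script>". */
    static SplitInputStream splitInsideClosingScript(String html) {
        int idx = html.indexOf("</script>");
        if (idx < 0) {
            throw new IllegalArgumentException("no </script> in input");
        }
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        // Convert the character offset (just past "</") into a byte offset.
        int byteOffset = html.substring(0, idx + 2).getBytes(StandardCharsets.UTF_8).length;
        return new SplitInputStream(bytes, byteOffset);
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1;
        }
        // Before the split point, stop at it; afterwards, serve the remainder.
        int limit = pos < splitAt ? splitAt : data.length;
        int n = Math.min(len, limit - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

Handing such a stream to the parser, for instance wrapped in an `org.xml.sax.InputSource`, should make the first chunk end with `</`, which is the condition described above.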

Discussion

Anonymous - 2012-07-27

Test Case for a chunked InputStream that splits at the closing script tag

New Attachment:

HTMLScriptTest.java

Anonymous - 2012-08-09

ContentScanner fix

New Attachment:

nekohtml_ContentScanner_fix.diff

Anonymous - 2012-08-09

I have attached a potential fix for this issue.
Could you try it?

Anonymous - 2012-12-21

The attached patch fixes the bug.
It keeps the input buffer growing instead of clearing it at the wrong time.
Fixing a bug should be higher priority than memory usage.
Please, could you explain why you are not patching it?
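
The general idea of keeping buffered input until the end tag has been matched can be illustrated with a small standalone sketch. It is not the attached diff and does not reflect NekoHTML's internals: the class `ScriptEndFinder`, its methods, and the modelling of each read as one string in `chunks` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: find "</script" even when it straddles two chunks. */
final class ScriptEndFinder {
    private static final String END = "</script"; // lowercase; matched case-insensitively

    /**
     * Each element of chunks stands for the data returned by one read.
     * Returns the text before "</script", or all the text if it never appears.
     */
    static String readScriptBody(Iterable<String> chunks) {
        StringBuilder buffer = new StringBuilder();
        int searchFrom = 0;
        for (String chunk : chunks) {
            // Append to what is already buffered instead of discarding it.
            buffer.append(chunk);
            int idx = indexOfIgnoreCase(buffer, END, searchFrom);
            if (idx >= 0) {
                return buffer.substring(0, idx);
            }
            // Only the tail can still hold the beginning of a partial match.
            searchFrom = Math.max(0, buffer.length() - (END.length() - 1));
        }
        return buffer.toString();
    }

    private static int indexOfIgnoreCase(CharSequence s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            int j = 0;
            while (j < needle.length()
                    && Character.toLowerCase(s.charAt(i + j)) == needle.charAt(j)) {
                j++;
            }
            if (j == needle.length()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("a=1;</", "SCRIPT>rest");
        System.out.println(readScriptBody(chunks)); // prints "a=1;"
    }
}
```

The buffer grows until the tag is found, which is the memory trade-off mentioned above; a scanner that cleared it between the two chunks would lose the leading `</`.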
