If the InputStream splits a closing script tag into "</" and "script>", the parser misses the closing script tag and parses the page incorrectly.
This bug is driving me mad because it only fails sometimes on the same exact page!
As pointed out by Janito Vaqueiro Ferreira, the problem may be in:
HTMLScanner.nextContent(int)
but I am having a hard time understanding that code.
I have developed a test case with an InputStream that exposes this problem.
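For readers without the attachment, here is a minimal sketch of the kind of stream that can trigger the split. It is not the attached test case: the class name SplittingInputStream and the helper splitInsideClosingScript are hypothetical names made up for illustration. The stream simply refuses to return bytes past a chosen offset in a single read call, so the first read ends with "</" and the next one starts with "script>".

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of the parser): an in-memory stream whose
 * bulk read never crosses a chosen byte offset in a single call.
 */
class SplittingInputStream extends InputStream {
    private final byte[] data;
    private final int splitAt;
    private int pos = 0;

    SplittingInputStream(byte[] data, int splitAt) {
        this.data = data;
        // Clamp so the split offset always lies inside the data.
        this.splitAt = Math.max(0, Math.min(splitAt, data.length));
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1;
        }
        // Before the split point, stop at it; afterwards, serve the rest.
        int limit = pos < splitAt ? splitAt : data.length;
        int n = Math.min(len, limit - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Builds a stream that splits right after the "</" of the first "</script>". */
    static SplittingInputStream splitInsideClosingScript(String html) {
        int tag = html.indexOf("</script>");
        if (tag < 0) {
            throw new IllegalArgumentException("no </script> in input");
        }
        // Convert the character index to a byte offset for the chosen encoding.
        int byteOffset = html.substring(0, tag + 2)
                             .getBytes(StandardCharsets.UTF_8).length;
        return new SplittingInputStream(html.getBytes(StandardCharsets.UTF_8), byteOffset);
    }
}
```

Passing such a stream to the parser instead of a plain one should make the failure reproducible rather than intermittent, assuming the layer that decodes bytes to characters hands each short read to the scanner as-is instead of refilling its buffer first.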
Test Case for a chunked InputStream that splits at the closing script tag
ContentScanner fix
I have attached a potential fix for this issue.
Could you try it?
The attached patch fixes the bug.
It keeps the input buffer growing instead of clearing it at the wrong time.
Fixing a bug should be higher priority than memory usage.
Could you please explain why you are not applying it?