#1869 Parsing invalid numeric character references may fail the page load

2.26
closed
RBRi
None
1
2017-04-13
2017-04-13
No

Hi team,

There is a small parser issue when a numeric character reference is invalid. Imagine text like "Nimbus™ 3000" that the page author, however, entered as "Nimbus�" (mind the missing semicolon). As a consequence, the numeric character reference is invalid. When such text is parsed, browsers usually handle it by inserting the replacement character � (U+FFFD).

HtmlUnit may or may not fail in such a scenario. The parser appears to handle this gracefully if the offending text is in the body of an element. If it is in an attribute value, however, the page load fails with an IllegalArgumentException thrown by Neko.
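
For illustration only, here is a minimal Java sketch of how an out-of-range numeric character reference can end up as an IllegalArgumentException, and how a lenient decoder can fall back to U+FFFD the way browsers do. This is not the actual Neko code. The class and method names are hypothetical, and the value 84823000 is just an example of a number beyond the Unicode range. The sketch also ignores other special cases that browsers handle, such as the remapping of some control-range values.

```java
final class NumericCharRefSketch {

    /** Unicode replacement character, used in place of invalid references. */
    public static final String REPLACEMENT = "\uFFFD";

    /**
     * Naive conversion of the decimal digits of a reference such as "8482".
     * Character.toChars rejects values that are not valid code points,
     * so out-of-range input surfaces as an IllegalArgumentException.
     */
    public static String decodeStrict(String digits) {
        int codePoint = Integer.parseInt(digits);
        return new String(Character.toChars(codePoint));
    }

    /**
     * Lenient conversion: anything that is not a usable code point
     * (unparsable, zero, surrogate, or out of range) becomes U+FFFD.
     */
    public static String decodeLenient(String digits) {
        int codePoint;
        try {
            codePoint = Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return REPLACEMENT; // not a number, or too large for an int
        }
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (codePoint == 0 || surrogate || !Character.isValidCodePoint(codePoint)) {
            return REPLACEMENT;
        }
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // A valid reference decodes the same way in both variants.
        System.out.println(decodeLenient("8482").equals(decodeStrict("8482")));

        // An out-of-range value: the lenient variant falls back to U+FFFD.
        System.out.println(decodeLenient("84823000").equals(REPLACEMENT));

        // The strict variant throws for the same input.
        try {
            decodeStrict("84823000");
        } catch (IllegalArgumentException e) {
            System.out.println("strict decoding failed: " + e.getMessage());
        }
    }
}
```

The point of the lenient variant is that the check happens before the conversion, so the same input is handled identically no matter whether it comes from element text or from an attribute value.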

See the attached test case that demonstrates this behavior.

Thanks,
J.

1 Attachments

Discussion

  • RBRi

    RBRi - 2017-04-13
    • assigned_to: RBRi
     
  • RBRi

    RBRi - 2017-04-13
    • status: open --> closed
     
  • RBRi

    RBRi - 2017-04-13

    Fixed in SVN - you need an updated Neko.

    Thanks for reporting...

    PS: Only people with umlauts in their names can find bugs like this ;-)

     
    • Joerg Werner

      Joerg Werner - 2017-04-19

      That's what we're here for ... ;-)

       
