xbx
-
2007-10-03
invalid utf-8 workaround
Brought to you by:
leethomason
If a document contains the following sequence:
3c 62 3e e3 25 3c 2f 62 3e
"<b>xxx</b>"
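where xxx is an invalid UTF-8 character sequence (here the bytes e3 25), the parser will eat the '<' of the closing tag and say the document is missing the closing element. The byte e3 is the lead byte of a three-byte sequence, so the decoder also consumes the 25 and the 3c ('<') that follow it.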
The patch works around the situation by making the UTF-8 decoder not eat unexpected characters.
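The idea can be illustrated with a minimal sketch. This is not the patch itself and not the parser's actual code: the function names ExpectedLength, SafeCharLength and CharStarts are hypothetical, and the sketch only checks that the bytes after a lead byte are continuation bytes (10xxxxxx). It does not reject overlong encodings or surrogates. When a continuation byte is missing, only the lead byte is consumed, so the '<' that follows stays visible to the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sequence length announced by a UTF-8 lead byte.
std::size_t ExpectedLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid lead byte
}

// Hypothetical helper: number of bytes to consume for the character at p.
// If the announced sequence is truncated or contains a byte that is not a
// continuation byte, only the lead byte is consumed.
std::size_t SafeCharLength(const unsigned char* p, const unsigned char* end) {
    assert(p != nullptr && p < end);
    const std::size_t len = ExpectedLength(*p);
    if (len > static_cast<std::size_t>(end - p)) return 1;  // truncated input
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 1;  // unexpected byte: do not eat it
    }
    return len;
}

// Walks the text and records the offset at which each character starts.
std::vector<std::size_t> CharStarts(const std::string& text) {
    std::vector<std::size_t> starts;
    const unsigned char* begin =
        reinterpret_cast<const unsigned char*>(text.data());
    const unsigned char* end = begin + text.size();
    for (const unsigned char* p = begin; p < end; p += SafeCharLength(p, end)) {
        starts.push_back(static_cast<std::size_t>(p - begin));
    }
    return starts;
}
```

Applied to the byte sequence above, CharStarts yields one entry per byte, so the '<' at offset 5 is seen as the start of its own character and the closing tag can still be parsed. A decoder that trusts the lead byte alone would jump from offset 3 straight to offset 6.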