As of 1.5, Simple HTML DOM parses HTML byte by byte and is not multibyte aware. As a quick fix, I added one line to the load function that converts the input from UTF-8 to 'HTML-ENTITIES', which effectively turns the document into a single-byte encoding and lets parsing continue. The catch is that this assumes your input is UTF-8. I've attached a patch file to show what I did. However, I believe the correct solution is to make the parser itself multibyte aware: I recommend converting all input to UTF-8 and modifying the parser to handle UTF-8 according to the design described in the wiki article at http://en.wikipedia.org/wiki/UTF-8#Design.
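To make the two approaches concrete, here is a rough sketch in Python rather than PHP. It is not the attached patch. The first function mirrors the quick fix (in PHP this would presumably be a call along the lines of mb_convert_encoding with 'HTML-ENTITIES' as the target), and the other two show the kind of lead-byte check a UTF-8 aware parser could use to step through input one character at a time. All function names are made up for illustration.

```python
def to_ascii_entities(text: str) -> bytes:
    """Quick-fix idea: replace every non-ASCII character with a numeric
    character reference so a byte-by-byte parser only ever sees ASCII."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, judged by its first byte.

    Raises ValueError for bytes that cannot start a sequence
    (continuation bytes 0x80-0xBF and the invalid leads 0xC0, 0xC1, 0xF5-0xFF).
    """
    if lead_byte < 0x80:             # 0xxxxxxx: plain ASCII
        return 1
    if 0xC2 <= lead_byte <= 0xDF:    # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= lead_byte <= 0xEF:    # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= lead_byte <= 0xF4:    # 11110xxx: four-byte sequence
        return 4
    raise ValueError(f"byte 0x{lead_byte:02X} cannot start a UTF-8 sequence")


def iter_utf8_chars(data: bytes):
    """Multibyte-aware idea: yield whole characters instead of single bytes."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        # decode() also rejects truncated or malformed sequences
        yield data[i:i + n].decode("utf-8")
        i += n
```

With something like iter_utf8_chars in place, the parser loop would advance by whole characters, so a multibyte character could never be split in the middle of a tag or attribute scan.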
Cheers,
Keith
Patch to allow multibyte input
Ticket moved from /p/simplehtmldom/bugs/89/