character mismatch error
I was trying to parse the source of http://www.microchip.com/ and encountered the exception below.
org.htmlparser.util.EncodingChangeException: character mismatch (new: ï [0xef] != old: [0xfeff]) for encoding change from UTF-8 to windows-1252 at character offset 0
I am using v2.0. I have checked the other bug reports and the related FAQ entries and modified my code accordingly, but I still encounter the same issue, and the parser goes into an infinite loop (as described in one of the posts).
Please let me know if there is any workaround for this.
thanks and regards,
Vijaya Bhaskar Peddinti
One way is to save the contents of the URL to disk and then run the parser against that file. Then you can specify the correct encoding.
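A minimal sketch of the save-to-disk step, using only the Java standard library. The class name `SavePage`, the helper `stripUtf8Bom`, and the output file name `microchip.html` are made up for illustration. Dropping the byte order mark is my own addition, on the assumption that the 0xef / 0xfeff pair in the exception is a UTF-8 BOM at offset 0 being read under two different encodings; it is not something the library requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class SavePage {
    /** Returns the data without a leading UTF-8 byte order mark (EF BB BF), if one is present. */
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    /** Downloads the raw bytes of the page and writes them to target, minus any BOM. */
    static void save(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.write(target, stripUtf8Bom(in.readAllBytes()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output path; any writable location works.
        save("http://www.microchip.com/", Path.of("microchip.html"));
    }
}
```

With the file on disk, the parser can be pointed at the local path and given the encoding explicitly. In HTMLParser 2.0 that should be `Parser.setResource` followed by `Parser.setEncoding`, but please check both against the version you have.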