I'm trying to parse a page that declares a bogus charset in its META tag (iso-8859-1, when it's really windows-1252).
Is there a way to tell HTML Parser to be lenient about charset detection and ignore the charset specified in the META tag? It would then simply use whatever encoding I give it via parser.setEncoding().
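A possible workaround, sketched here under assumptions and not something HTML Parser itself is known to offer, is to decode the raw bytes yourself with the charset you trust and rewrite the META declaration so it no longer contradicts that choice. The class name `CharsetOverride`, the method `decodeIgnoringMeta` and the regex below are hypothetical and only handle the common `charset=...` forms.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetOverride {
    // Hypothetical, simplified pattern: finds "charset=NAME" inside a <meta ...> tag,
    // covering both <meta http-equiv=... content="...; charset=X"> and <meta charset="X">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(<meta[^>]*charset\\s*=\\s*[\"']?)([A-Za-z0-9_:.-]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * Decodes raw page bytes with the charset we trust (ignoring whatever the page claims)
     * and rewrites every META charset declaration to that same name, so a downstream
     * parser sees no mismatch between the declared and the actual encoding.
     */
    public static String decodeIgnoringMeta(byte[] raw, String trustedCharset) {
        String html = new String(raw, Charset.forName(trustedCharset));
        Matcher m = META_CHARSET.matcher(html);
        // Keep group 1 (everything up to the charset name) and swap in the trusted name.
        return m.replaceAll("$1" + Matcher.quoteReplacement(trustedCharset));
    }

    public static void main(String[] args) {
        // A page that claims iso-8859-1 but uses windows-1252 curly quotes (bytes 0x93/0x94).
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-1\"></head>"
                + "<body>\u201Chello\u201D</body></html>";
        byte[] raw = page.getBytes(Charset.forName("windows-1252"));

        String fixed = decodeIgnoringMeta(raw, "windows-1252");
        System.out.println(fixed.contains("charset=windows-1252"));
        System.out.println(fixed.contains("\u201Chello\u201D"));
    }
}
```

The resulting string could then be handed to the parser together with the trusted encoding, for example via `Parser.createParser(fixed, "windows-1252")` if your HTML Parser version provides that factory; check the Javadoc for the release you use.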
Thanks in advance.
Lionel