I have major problems handling different character encodings on different websites. Charsets like "windows-1256", "windows-1252" and "iso-8859-1" all get converted to "iso-8859-1", which corrupts the content. Is there any way to convert them all to "utf-8" before parsing them?
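The thread does not say which language or parser is in use, so the following is a minimal Python sketch of the general approach. It does not describe any particular library's behavior. The idea is to decode the raw bytes with the charset the page actually declares, instead of forcing everything through "iso-8859-1", and then re-encode the text as UTF-8 before it reaches the parser. It assumes you have the raw response bytes and, optionally, the HTTP `Content-Type` header. The helper names `detect_charset` and `to_utf8` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper patterns: a charset in the Content-Type header,
# or in a <meta charset=...> / <meta http-equiv=... content="...; charset=..."> tag.
_HEADER_CHARSET = re.compile(r'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def detect_charset(raw: bytes, content_type: Optional[str] = None,
                   default: str = "utf-8") -> str:
    """Return the declared charset: header first, then a <meta> tag, else default."""
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1)
    # Only scan the start of the document, where <meta> charset tags belong.
    m = _META_CHARSET.search(raw[:4096])
    if m:
        return m.group(1).decode("ascii")
    return default


def to_utf8(raw: bytes, content_type: Optional[str] = None) -> bytes:
    """Decode raw page bytes with their declared charset and re-encode as UTF-8."""
    charset = detect_charset(raw, content_type)
    try:
        # errors="replace" keeps going on bytes the charset leaves undefined.
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 with replacement characters.
        text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")
```

For example, `to_utf8(page_bytes, "text/html; charset=windows-1256")` returns UTF-8 bytes that can be handed to a parser expecting UTF-8. Python already recognizes names such as "windows-1256", "windows-1252" and "iso-8859-1" as codec aliases, so the declared name can be passed straight to `bytes.decode`.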