This is a patch for reading non-UTF-8 (Chinese GB2312 and others) CHM files.
1. Work around the long-standing gtkhtml2 bug where the charset is wrong if the <title> tag includes non-ASCII characters. I simply remove non-printable characters inside the title tag.
2. The path names of the index and topics are also in a non-UTF-8 (GB2312) encoding, so we need two to_utf8 calls there.
3. I removed the to_utf8 call in open_chm to avoid double conversion. Converting before that point is better because the entire content of the index and topics is already converted earlier. In general, all links passed into open_chm should be required to be in UTF-8.
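
To illustrate points 1 to 3, here is a minimal sketch in Python 3, not the patch itself. The names strip_title_non_ascii and TITLE_RE are hypothetical, the to_utf8 shown here is a stand-in with an assumed signature, and "non-printable" is read as "any byte outside printable ASCII", which is an assumption about what the patch does.

```python
import re

# Hypothetical pattern: captures the opening tag, the title text, and the closing tag.
TITLE_RE = re.compile(rb"(<title[^>]*>)(.*?)(</title>)", re.IGNORECASE | re.DOTALL)


def strip_title_non_ascii(html: bytes) -> bytes:
    """Drop every byte outside printable ASCII inside <title>...</title>.

    Assumed reading of point 1: the rest of the page is left untouched.
    """
    def _clean(match):
        text = bytes(b for b in match.group(2) if 0x20 <= b < 0x7F)
        return match.group(1) + text + match.group(3)

    return TITLE_RE.sub(_clean, html)


def to_utf8(raw: bytes, encoding: str = "gb2312") -> bytes:
    """Stand-in for to_utf8: re-encode bytes from `encoding` to UTF-8.

    Assumed signature; the real function may detect the encoding differently.
    """
    return raw.decode(encoding, errors="replace").encode("utf-8")


# Point 1: the GB2312 bytes in the title are removed, the ASCII part stays.
page = b"<html><title>" + "中文 Help".encode("gb2312") + b"</title></html>"
cleaned = strip_title_non_ascii(page)

# Points 2 and 3: convert exactly once, when the index/topics are read.
gb_path = "中文".encode("gb2312")
once = to_utf8(gb_path)
# Converting a second time treats UTF-8 bytes as GB2312 and garbles them.
twice = to_utf8(once)
```

Converting once, at the point where the index and topics are read, is what lets open_chm assume its links are already UTF-8.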
I hope this patch helps other people. Please review it and let me know whether it can be merged into CVS.
Thanks a lot