
#2 Patch for reading non-utf8 (chinese gb2312) chm files

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2007-09-23
Created: 2007-06-23
Private: No

This is a patch for reading non-UTF-8 CHM files (Chinese GB2312 and other encodings).

1. Work around the long-standing gtkhtml2 bug where the charset is wrong if the <title> tag includes non-ASCII characters. I simply remove non-printable characters inside the title tag.

2. The path names of the index and topics are also in a non-UTF-8 encoding (GB2312), so two "to_utf8" calls are needed there.

3. I decided to remove the to_utf8 call in open_chm to avoid double conversion. The earlier conversion point is the better place, because the entire content of the index and topics is converted there. In general, all links passed into open_chm should be required to be in UTF-8 already.
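
The attached diff is not reproduced on this page, so the following is only a minimal Python sketch of the two ideas above, not the patch itself. It assumes the raw CHM content arrives as bytes and that the source encoding is GB2312. The names `strip_title_non_ascii` and `TITLE_RE` are hypothetical, and this `to_utf8` is an illustrative stand-in for the project's helper of the same name.

```python
import re

# Hypothetical pattern: captures the opening tag, the text, and the closing
# tag of the <title> element in raw (undecoded) HTML bytes.
TITLE_RE = re.compile(rb"(<title[^>]*>)(.*?)(</title>)", re.IGNORECASE | re.DOTALL)


def strip_title_non_ascii(html: bytes) -> bytes:
    """Keep only printable ASCII bytes inside <title>; leave the rest untouched."""

    def clean(match):
        opening, text, closing = match.groups()
        # Iterating over bytes yields ints; 0x20..0x7E is printable ASCII.
        kept = bytes(b for b in text if 0x20 <= b <= 0x7E)
        return opening + kept + closing

    return TITLE_RE.sub(clean, html)


def to_utf8(raw: bytes, source_encoding: str = "gb2312") -> bytes:
    """Illustrative stand-in: re-encode bytes from source_encoding to UTF-8.

    Undecodable bytes are replaced rather than raising, which is an
    assumption made for this sketch.
    """
    return raw.decode(source_encoding, errors="replace").encode("utf-8")


if __name__ == "__main__":
    page = (
        b"<html><head><title>"
        + "中文 Guide".encode("gb2312")
        + b"</title></head><body></body></html>"
    )
    print(strip_title_non_ascii(page))

    # Convert a GB2312 path exactly once, before it reaches the opener.
    path = "/目录/index.hhc".encode("gb2312")
    print(to_utf8(path).decode("utf-8"))
```

Running `to_utf8` a second time on its own output would reinterpret UTF-8 bytes as GB2312 and garble the path, which is why the conversion should happen in exactly one place.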

Hopefully this patch will help other people. Please review it and let me know whether it can be merged into CVS.

Thanks a lot

Cheuksan Wang

Discussion

  • Cheuksan Edward Wang

    Patch for reading non-utf8 (chinese gb2312) chm files

     
  • Cheuksan Edward Wang

    Logged In: YES
    user_id=827361
    Originator: YES

    File Added: diff4

     
  • Cheuksan Edward Wang

    • status: open --> closed
     
  • Cheuksan Edward Wang

    Logged In: YES
    user_id=827361
    Originator: YES

    merged into latest release

     
