#14 Support for Unicode

James Athey

Since I sometimes read Japanese manga, I find it quite
troublesome to unpack and rename all the files in order
to read them in Comical. When I used CDisplay before,
with AppLocale (from M$), I could read them, but for
some reason this doesn't work with Comical...

So it would be cool if you could add support for
Unicode to Comical - thank you!


  • Logged In: NO

    I can confirm this. It works for some non-ASCII chars, but if
    I put a Japanese character in the filename Comical refuses
    to open it. And if I put a Japanese character in one of the
    image filenames it crashes with this backtrace:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 1083197792 (LWP 2490)]
    0x00002aaaac09913d in raise () from /lib/libc.so.6
    (gdb) bt

    #0  0x00002aaaac09913d in raise () from /lib/libc.so.6
    #1  0x00002aaaac09a86e in abort () from /lib/libc.so.6
    #2  0x00002aaaabb7f7d7 in __gnu_cxx::__verbose_terminate_handler ()
        from /usr/lib/libstdc++.so.6
    #3  0x00002aaaabb7d866 in __gxx_personality_v0 () from
    #4  0x00002aaaabb7d893 in std::terminate () from
    #5  0x00002aaaabb7d97a in __cxa_throw () from
    #6  0x00000000004350ad in ComicBookZIP::ExtractStream ()
    #7  0x000000000042f276 in ComicBook::Entry ()
    #8  0x00002aaaab94a942 in wxThreadInternal::PthreadStart ()
        from /usr/lib/libwx_baseu-2.6.so.0
    #9  0x00002aaaabf5b12a in start_thread () from
    #10 0x00002aaaac1313c3 in clone () from /lib/libc.so.6
    #11 0x0000000000000000 in ?? ()

    I can look into making a patch for it if I have time.

  • Logged In: NO

    This is exactly the kind of bug report I asked for a few
    months back when I switched to Unicode here. Can you point
    me toward some comic books which Comical objects to?

  • Logged In: NO

    Not really. I didn't have any comic books with those types
    of characters, so I just added some of the characters
    myself, both to the images inside and to the filename of
    the comic. For example, I added Unicode char U+BB82 to the
    filenames. On further testing, if the filename of the comic
    itself contains Unicode it works fine for rar archives, but
    zip archives refuse to open, implying a limitation in
    the unzip library you're using. Forget what I said earlier
    about it working for "some" non-ASCII chars; zips don't work
    for any non-ASCII. At least this fails gracefully: it
    outputs an error and continues. I see in ComicBookZip.cpp
    that you assume the filename is ASCII and convert to that
    (this again might be because of the unzip limitation).

    For the filenames of the images inside an archive, if they
    contain unicode then it crashes with the below backtrace. I
    could only test zip files for this since there is no linux
    support for creating rar archives (f**kin rarlabs).

    Also, can you please add a debugging mode to the makefile?
    CXXFLAGS = '-g -ggdb -D_DEBUG' and LDFLAGS = '-g -ggdb'
    would need to be added. This would let me see on which
    line it actually failed in the backtrace.

    -Steven Sheehy

  • Logged In: NO

    I made a patch to fix the Unicode problems with Comical. It
    fixes all problems I mentioned below (at least on Linux
    amd64; needs testing elsewhere). I used mb_str(wxConvLocal)
    to convert from Unicode to the system's ANSI code page. I
    also went ahead and rewrote the Unicode stuff in
    ComicBookRar so that it no longer needs #ifdef wxUSE_UNICODE
    everywhere. I've tested with and without Unicode support
    compiled into Comical and it works properly for both. It
    works for both RARs and ZIPs with non-ASCII chars in their
    filename, and it works for non-ASCII chars in the names of
    the images inside the archive (I can't test the latter for
    RAR archives since I'm on Linux).

    I also rewrote setPassword to convert the entered password
    to the ANSI code page before sending it to the rar and zip
    libraries. This part is untested since I don't have any
    password-protected archives. Patch is here:


    This patch was made with svn rev 157 since current versions
    don't work.

    -Steven Sheehy (steven[d0t]sheehy[@t]gmail[d0t]com)
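
For readers without wxWidgets at hand, the wide-to-narrow step that mb_str(wxConvLocal) performs in the patch can be approximated with the C standard library. This is only a sketch, not the patch itself: to_locale_bytes is a hypothetical helper, and it assumes the program has already selected the user's locale (for example with std::setlocale(LC_ALL, "")). In the default "C" locale only ASCII is guaranteed to convert.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from Comical or the patch): convert a wide string
// to the multibyte encoding of the current C locale, roughly what
// wxString::mb_str(wxConvLocal) is used for in the patch.
// Returns false if some character has no representation in that encoding.
bool to_locale_bytes(const std::wstring& wide, std::string& out) {
    // Worst case: MB_CUR_MAX bytes per wide character, plus a terminator.
    std::string buf(wide.size() * MB_CUR_MAX + 1, '\0');
    std::size_t n = std::wcstombs(&buf[0], wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;  // a character cannot be encoded in this locale
    }
    assert(n < buf.size());
    buf.resize(n);
    out = buf;
    return true;
}
```

The resulting byte string is the kind of value that can be handed to an API taking char*, such as fopen(). Whether a given non-ASCII name survives the conversion depends entirely on the active locale.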
  • Dennis Lim

    Logged In: YES

    The latest version of Comical, 0.8, uses the minizip libraries.
    Unfortunately this requires a filename.ToAscii() call when
    opening the file. This is moving towards less Unicode
    support, not more.
    I'm not sure why wxZipInputStream was abandoned. At
    least that had support for Unicode in the filenames. We can
    work out the other problems later.
    I'm a programmer and I don't mind helping to fix bugs, but so
    far I've not encountered any other Unicode-related problems
    with the comics I read.

  • Logged In: NO

    Um...did you not notice my post right below yours? I fixed
    the problem with unicode, including zips. The minizip does
    not require ascii, it just requires that the filename passed
    to it is encoded in the system's ANSI codepage since it just
    calls the system call fopen(), thus they can't be wide
    character strings. Why not try out the patch I posted and
    then comment?

    -Steven S.

  • Dennis Lim

    Logged In: YES

    Thanks for the patch. I've tested your patch on v0.8 on Windows.
    The problem I originally faced was with a (c) copyright
    symbol in the path. The patch has fixed this.
    Further testing with a Chinese string I copied from somewhere
    didn't work. The test string was "韩剧热线诚聘[韩、英语翻
    .cbz". Testing with comical v0.7 worked fine.

    I guess I was just venting frustration at having something
    working get broken. The move to minizip made it more trouble
    to compile on windows. I had to download some DLL from here
    and header files from there, etc. I was actually hoping to
    move towards wxZipInputStream and perhaps even create a
    wrapper to wxRarInputStream to solve this. Oh well.

    Anyway, I'll be using your patch for now since it solves my
    particular problem. However it probably is not a long term
    solution since it does not handle all cases. If you could
    get it to work with the string I pasted above, please let us
    know. Thanks.

  • Logged In: NO

    Thanks for testing. I knew it worked for me but it's good to
    have other people confirm it. Can you also test unicode in
    the files inside the archive?

    I tested that string and it worked perfectly for me. I'm not
    sure why it failed for you. By "fails", do you mean it crashes,
    outputs an error, or what? Can you add some printfs to check
    whether the filename reaches minizip intact?

    BTW, it may be that your cbz file is actually a cbr file.
    When this happens, comical refuses to open it (another bug
    not related to unicode). Try renaming it to cbr and see if
    it works.

    -Steven S.

  • Dennis Lim

    Logged In: YES

    Hmm... the email sent out by SourceForge seems to have
    mangled the filename. I cannot see the Chinese chars even
    though I have 'asian fonts' installed on XP. (I don't know
    how these things work on Linux.)
    However, on the web page the filename looks fine, i.e. I
    can still see the Chinese characters.
    Perhaps if you used the email to test, it was mangled. Try
    to cut and paste from the web. You should be able to
    actually see some Chinese characters before you test. Also,
    perhaps the behavior is different on Linux and XP. I'm
    running on Windows XP.
    It fails inside minizip. We get a null from the unzOpenFile
    or something.

    The problem may not be a big deal. I tested with WinZip and
    it doesn't seem to want to open it either. However,
    Comical 0.7 does open it fine. I guess it depends on what
    tools were used to create these files.

    At this point I would consider your patch an improvement
    over the existing 0.8 code base. Have you considered
    submitting it in the patch section?

  • Logged In: NO

    As you can see, I'm not registered so I don't get
    sourceforge email. I used the chinese characters you posted
    on the web. On linux, we don't have to install "asian fonts"
    since we have proper unicode support built right in. That's
    the beauty of unicode, all the necessary characters are in
    one encoding...don't know why ms makes you install extra
    stuff to do that. If I can't reproduce your problem, then I
    can't really fix it. So feel free to have a look at it if you
    can. I don't think minizip is the problem since it works for
    me just fine on Linux. You may just want to create a simple
    test file that uses fopen() passed with a UTF-8 filename to
    see if it works on Windows.

    btw, can you verify if the bug I submitted on the bug page
    happens to you?

  • Dennis Lim

    Logged In: YES

    I think that what happens on Windows is that the Unicode
    support is there in the applications. However, if you try
    to view Chinese text, it will just output some funny
    square characters instead. This is due to a missing font. The
    fonts are not installed by default because Unicode fonts
    can be rather big.

    I do like unicode and it would be great if the minizip
    interfaces were all using that. Unfortunately, they use

    At this point, your patch fixes my problem with
    the 'copyright' symbol. I do not read manga yet so I'm not
    personally facing any problems with that. I'm more
    interested to see if the original poster has any sample
    strings or sample files that I can test with.
    The string I posted below is interesting in that although
    comical 0.7 can open it, winzip cannot. Therefore, it may
    not be a fair test.

    So I'm just going to use your patch and leave it at that
    unless somebody is facing a problem and can submit a
    sample file that does not work (i.e. an actual situation, not
    some contrived test). I don't want to be spending time
    fixing something which nobody needs. Who knows, I may
    revisit this later and do something about it.

    BTW, about the bug, I tested and posted in the bug section.

  • Logged In: NO

    I think you're confused. It is completely acceptable that
    minizip uses char for its strings. The other choice would
    be wchar_t (i.e. wxString in Unicode mode). char uses 8-bit
    characters, whereas wchar_t is wider (32 bits on Linux, 16
    bits on Windows). Unicode comes in many shapes and sizes:
    7-, 8-, 16- and 32-bit encodings are all possible. So to
    encode Unicode into 8 bits, one would just use UTF-8.
    However, not all input is necessarily Unicode. It could be
    ISO-8859-1, CP1252, etc., depending on the operating
    system's ANSI code page. So in wxWidgets, the sequence of
    encodings for a system that is CP1252, for example, would be
    CP1252 (from the file being opened) -> wide-character
    Unicode (ISO 10646 stored in wxString) -> CP1252 (using
    mb_str() in my patch). Perhaps your system's code page is
    not an 8-bit encoding and that is why it's not working.
    The copyright symbol still fits in 8 bits (but is above
    ASCII), so that's probably why that worked. Maybe you need
    to force an 8-bit encoding for what is passed to minizip.
    Try using wxConvUTF8 instead of wxConvLocal and see if that
    works. Make sure you replace all of them if you do.

    I tried the above and it worked for me, but my system's code
    page is utf8 so that is to be expected.
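
To make the byte counts in this discussion concrete, here is a small sketch of UTF-8 encoding in standard C++. encode_utf8 is a hypothetical helper written for illustration, not code from Comical, wxWidgets or the patch. It shows that the copyright sign U+00A9, which is the single byte 0xA9 in ISO-8859-1 and CP1252, becomes two bytes in UTF-8, while the test character U+BB82 mentioned earlier in the thread needs three.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(std::uint32_t cp) {
    // Reject values outside Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // ASCII: one byte, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // four bytes: 11110xxx and three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0xA9) yields the bytes C2 A9 and encode_utf8(0xBB82) yields EB AE 82. A char* interface can carry either, but the receiver has to agree on which encoding the bytes are in.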

  • Dennis Lim

    Logged In: YES

    You're right. I got confused. I forgot unicode can be
    encoded in UTF8. It is usually the case that unicode is
    encoded in wchar_t.

    I'm not too familiar with what my OS's native encoding is. I
    believe that Windows has 2 sets of APIs, one for ANSI and
    one for Unicode, indicated by a suffix A or W. This is
    usually handled at compile time depending on the Unicode
    define (and the Unicode version usually uses wchar_t).

    Anyway, I tested with wxConvUTF8 and it didn't work either.
    I believe the OS cannot tell from the byte stream alone
    whether it is UTF-16 or UTF-8, so this is determined by
    convention or API documentation. Also, Windows usually uses
    UTF-16 for its API, so that's why it's not working.
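
The A/W split described here is the usual reason a UTF-8 byte string passed to plain fopen() does not find a Unicode-named file on Windows. One common workaround, sketched below, is to convert the UTF-8 path to UTF-16 and call the wide-character function instead. open_utf8_path is a hypothetical helper, not part of Comical or minizip, and the Windows branch is an untested assumption based on the documented MultiByteToWideChar and _wfopen calls. It would not by itself change what minizip does internally.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: open a file whose path is given as UTF-8 bytes.
std::FILE* open_utf8_path(const std::string& utf8_path, const char* mode) {
    assert(mode != nullptr);
#ifdef _WIN32
    // Windows: convert UTF-8 to UTF-16 and use the wide-character CRT call.
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty()) return std::wstring();
        int n = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                    static_cast<int>(s.size()), nullptr, 0);
        if (n <= 0) return std::wstring();  // invalid UTF-8 input
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(),
                            static_cast<int>(s.size()), &w[0], n);
        return w;
    };
    return _wfopen(widen(utf8_path).c_str(), widen(mode).c_str());
#else
    // Elsewhere: paths are byte strings, so the UTF-8 bytes go straight through.
    return std::fopen(utf8_path.c_str(), mode);
#endif
}
```

On Linux the helper is just fopen(), which matches the report earlier in the thread that UTF-8 names work there without extra steps.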

  • Hito

    Logged In: YES

    I do not think winzip (or winrar) has proper unicode
    support. The only program I know that does it nicely is
    7-zip (and that's why I use it).

    I, too, am looking for an image viewing program with Unicode
    support. If this program becomes one, it would be great :)