#14 Support for Unicode

Milestone: 1.0_Features
Status: open
Owner: James Athey
Labels: None
Priority: 9
Updated: 2012-09-06
Created: 2006-03-26
Creator: Anonymous
Private: No

Since I sometimes read Japanese manga, I find it quite
troublesome to unpack and rename all the files in order to
read them in Comical. When I used CDisplay before, with
AppLocale (from M$), I could read them, but for some reason
this doesn't work with Comical.

So it would be cool if you could add support for
Unicode to Comical - thank you!

Discussion

  • Logged In: NO

    I can confirm this. It works for some non-ASCII characters,
    but if I put a Japanese character in the filename, Comical
    refuses to open it. And if I put a Japanese character in one
    of the image filenames, it crashes with this backtrace:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 1083197792 (LWP 2490)]
    0x00002aaaac09913d in raise () from /lib/libc.so.6
    (gdb) bt
    #0  0x00002aaaac09913d in raise () from /lib/libc.so.6
    #1  0x00002aaaac09a86e in abort () from /lib/libc.so.6
    #2  0x00002aaaabb7f7d7 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
    #3  0x00002aaaabb7d866 in __gxx_personality_v0 () from /usr/lib/libstdc++.so.6
    #4  0x00002aaaabb7d893 in std::terminate () from /usr/lib/libstdc++.so.6
    #5  0x00002aaaabb7d97a in __cxa_throw () from /usr/lib/libstdc++.so.6
    #6  0x00000000004350ad in ComicBookZIP::ExtractStream ()
    #7  0x000000000042f276 in ComicBook::Entry ()
    #8  0x00002aaaab94a942 in wxThreadInternal::PthreadStart () from /usr/lib/libwx_baseu-2.6.so.0
    #9  0x00002aaaabf5b12a in start_thread () from /lib/libpthread.so.0
    #10 0x00002aaaac1313c3 in clone () from /lib/libc.so.6
    #11 0x0000000000000000 in ?? ()

    I can look into making a patch for it if I have time.

     
  • Logged In: NO

    This is exactly the kind of bug report I asked for a few
    months back when I switched to Unicode here. Can you point
    me toward some comic books which Comical objects to?

     
  • Logged In: NO

    Not really. I didn't have any comic books with those types
    of characters, so I just added some of the characters
    myself, both to the names of the images inside and to the
    filename of the comic. For example, I added Unicode
    character U+BB82 to the filenames. On further testing, if
    the filename of the comic itself contains Unicode, it works
    fine for RAR archives but refuses to open ZIP archives,
    implying a limitation in the unzip library you're using.
    Forget what I said earlier about it working for "some"
    non-ASCII characters; ZIPs don't work for any non-ASCII
    characters. At least this fails gracefully: it outputs an
    error and continues. I see in ComicBookZip.cpp that you
    assume the filename is ASCII and convert to that (this again
    might be because of the unzip limitation).
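To make the ASCII assumption concrete, here is a minimal sketch in standard C++ (not code from Comical) that encodes a code point such as U+BB82 as UTF-8 and checks whether the result is plain ASCII. The names encodeUtf8 and isAscii are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
// Illustrative helper only; it does not reject surrogate code points.
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// True if every byte is in the 7-bit ASCII range.
bool isAscii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}
```

U+BB82 comes out as the three bytes EB AE 82, none of which is ASCII, so any step that narrows a filename to ASCII has nothing valid to produce for it.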

    For the filenames of the images inside an archive, if they
    contain Unicode then it crashes with the backtrace above. I
    could only test ZIP files for this since there is no Linux
    support for creating RAR archives (f**kin rarlabs).

    Also, can you please add a debugging mode to the makefile?
    CXXFLAGS = '-g -ggdb -D_DEBUG' and LDFLAGS = '-g -ggdb'
    would need to be added. This would allow me to see in which
    line it actually failed in the backtrace.

    -Steven Sheehy

     
  • Logged In: NO

    I made a patch to fix the Unicode problems with Comical. It
    fixes all the problems I mentioned above (at least on Linux
    amd64; it needs testing elsewhere). I used
    mb_str(wxConvLocal) to convert from Unicode to the system's
    ANSI code page. I also went ahead and rewrote the Unicode
    stuff in ComicBookRar so it doesn't have to use #ifdef
    wxUSE_UNICODE everywhere. I've tested with and without
    Unicode support compiled into Comical and it works properly
    for both. It works for both RARs and ZIPs with non-ASCII
    characters in their filenames, and it works for non-ASCII
    characters in the names of the images inside the archive
    (I'm not able to test the latter for RAR archives since I'm
    on Linux).

    I also rewrote setPassword to convert the entered password
    to the ANSI code page before sending it to the rar and zip
    libraries. This part is untested since I don't have any
    password-protected archives. The patch is here:

    http://www.utdallas.edu/~sas014510/unicode.patch

    This patch was made against svn rev 157 since current
    versions don't work.

    -Steven Sheehy (steven[d0t]sheehy[@t]gmail[d0t]com)
     
  • Dennis Lim
    2006-04-24

    Logged In: YES
    user_id=117202

    The latest version of Comical, 0.8, uses the minizip
    library. Unfortunately this requires a filename.toAscii()
    call while opening the file. This is moving towards less
    Unicode support, not more.
    I'm not sure why wxZipInputStream was abandoned. At least
    that had support for Unicode in the filenames. We can work
    out the other problems later.
    I'm a programmer and I don't mind helping to fix bugs, but
    so far I've not encountered any other Unicode-related
    problems with the comics I read.

     
  • Logged In: NO

    Um...did you not notice my post right above yours? I fixed
    the problem with Unicode, including ZIPs. minizip does not
    require ASCII; it just requires that the filename passed to
    it is encoded in the system's ANSI code page, since it simply
    calls the C library function fopen(), so the name can't be a
    wide-character string. Why not try out the patch I posted
    and then comment?
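As a rough illustration of that conversion step, the sketch below uses only the C++ standard library. The patch itself uses wxString's mb_str(wxConvLocal); std::wcstombs is a stand-in here, and the helper names toLocaleBytes and openForLibrary are hypothetical.

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Convert a wide string to the multibyte encoding of the current C locale.
// Returns false if some character has no representation in that encoding.
bool toLocaleBytes(const std::wstring& wide, std::string& out) {
    // Worst case is MB_CUR_MAX bytes per character, plus the terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return false;
    out.assign(buf.data(), n);
    return true;
}

// Hypothetical helper: open a file the way an fopen()-based library would,
// after narrowing the wide path to the locale's byte encoding.
std::FILE* openForLibrary(const std::wstring& path) {
    std::string bytes;
    if (!toLocaleBytes(path, bytes)) return nullptr;
    return std::fopen(bytes.c_str(), "rb");
}
```

A real program would call std::setlocale(LC_ALL, "") once at startup so the conversion follows the user's environment instead of the default "C" locale. Note that a name containing characters outside the active encoding cannot be converted at all, in which case this sketch returns nullptr instead of opening the wrong file.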

    -Steven S.

     
  • Dennis Lim
    2006-04-25

    Logged In: YES
    user_id=117202

    Thanks for the patch. I've tested your patch on v0.8 on
    Windows. The problem I originally faced was with a (c)
    copyright symbol in the path. The patch has fixed this.
    Further testing with a Chinese string I copied from
    somewhere didn't work. The test string was
    "韩剧热线诚聘[韩、英语翻译].cbz". Testing with Comical v0.7
    worked fine.

    I guess I was just venting frustration at having something
    that worked get broken. The move to minizip made it more
    trouble to compile on Windows. I had to download some DLL
    from here and header files from there, etc. I was actually
    hoping to move towards wxZipInputStream and perhaps even
    create a wxRarInputStream wrapper to solve this. Oh well.

    Anyway, I'll be using your patch for now since it solves my
    particular problem. However, it is probably not a long-term
    solution since it does not handle all cases. If you could
    get it to work with the string I pasted above, please let us
    know. Thanks.

     
  • Logged In: NO

    Thanks for testing. I knew it worked for me but it's good to
    have other people confirm it. Can you also test unicode in
    the files inside the archive?

    I tested that string and it worked perfectly for me. I'm not
    sure why it failed for you. By fails, do you mean it crashes
    or outputs an error or what? Can you put some printfs to see
    if it's passed to minizip fine?
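For that kind of printf debugging it helps to print the raw bytes, since a terminal may render or mangle non-ASCII text on its own. A small sketch, with the hypothetical helpers hexDump and logFilename (neither is part of Comical or minizip):

```cpp
#include <cstdio>
#include <string>

// Render each byte of a string as two lowercase hex digits, space-separated.
std::string hexDump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical call site: log what is about to be handed to the zip library.
void logFilename(const char* label, const std::string& bytes) {
    std::fprintf(stderr, "%s: %zu bytes: %s\n",
                 label, bytes.size(), hexDump(bytes).c_str());
}
```

If the name reaches the library as UTF-8, the first character of the test string (U+97E9) should show up as e9 9f a9; a lone 3f ('?') in its place would suggest the conversion substituted the character.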

    BTW, it may be that your cbz file is actually a cbr file.
    When this happens, comical refuses to open it (another bug
    not related to unicode). Try renaming it to cbr and see if
    it works.

    -Steven S.

     
  • Dennis Lim
    2006-04-27

    Logged In: YES
    user_id=117202

    Hmm.. the email sent out by SourceForge seems to have
    mangled the filename. I cannot see the Chinese characters
    even though I have 'asian fonts' installed on XP. (I don't
    know how these things work on Linux.)
    However, on the web page the filename looks fine, i.e. I
    can still see the Chinese characters.
    Perhaps you used the email to test and the string was
    mangled. Try cutting and pasting from the web. You should be
    able to actually see some Chinese characters before you
    test. Also, perhaps the behavior is different on Linux and
    XP. I'm running Windows XP.
    It fails inside minizip. We get a null from the unzOpenFile
    or something.

    The problem may not be a big deal. I tested with WinZip and
    it doesn't seem to want to open the file either. However,
    Comical 0.7 does open it fine. I guess it depends on what
    tools were used to create these files.

    At this point I would consider your patch an improvement
    over the existing 0.8 code base. Have you considered
    submitting it in the patch section?

     
  • Logged In: NO

    As you can see, I'm not registered, so I don't get
    SourceForge email. I used the Chinese characters you posted
    on the web. On Linux, we don't have to install "asian fonts"
    since we have proper Unicode support built right in. That's
    the beauty of Unicode: all the necessary characters are in
    one encoding...don't know why MS makes you install extra
    stuff to do that. If I can't reproduce your problem, then I
    can't really fix it. So feel free to have a look at it if
    you can. I don't think minizip is the problem since it works
    for me just fine on Linux. You may just want to create a
    simple test file that calls fopen() with a UTF-8 filename to
    see if it works on Windows.
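Such a test could look roughly like the sketch below, which creates, reopens and deletes a file through plain fopen(). The function name fopenRoundTrip and the sample name kUtf8Name are made up for this illustration, and the outcome depends on the platform: Linux treats the name as opaque bytes, while the Windows C runtime typically interprets it in the active ANSI code page, so a UTF-8 name may not map to the intended file there.

```cpp
#include <cstdio>
#include <string>

// Try to create, reopen and delete a file whose name is given as raw bytes.
// Returns true only if both fopen() calls succeed.
bool fopenRoundTrip(const std::string& name) {
    std::FILE* f = std::fopen(name.c_str(), "wb");
    if (!f) return false;
    std::fputs("test", f);
    std::fclose(f);

    f = std::fopen(name.c_str(), "rb");
    bool ok = (f != nullptr);
    if (f) std::fclose(f);
    std::remove(name.c_str());  // clean up the temporary file
    return ok;
}

// Sample name: the UTF-8 bytes for U+97E9 followed by ".cbz".
const char kUtf8Name[] = "\xe9\x9f\xa9.cbz";
```

Calling fopenRoundTrip(kUtf8Name) from a small main() on each platform, and then checking in a file manager what name actually appeared on disk, would show whether the bytes were taken as UTF-8 or reinterpreted.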

    btw, can you verify if the bug I submitted on the bug page
    happens to you?

     