#263 Capitalization fails with iconv on FreeBSD

open
nobody
None
5
2008-02-27
2008-02-27
No

When LifeLines is linked with the iconv library, capitalization of the family name fails and only "(?)" is shown. This was confirmed with the POSIX, C, UTF-8 and ISO 8859-1 locale settings. The tested platforms were FreeBSD 6.2, 6.3, 7.0RC1 and 7.0RC2.

Discussion

  • elsapo

    elsapo - 2008-04-10

    Logged In: YES
    user_id=1195173
    Originator: NO

    Looks to me like the relevant code is indi_to_name (src/gedlib/node.c), which calls manip_name (src/gedlib/names.c), which calls upsurname (src/gedlib/names.c), which calls ll_toupperz (src/stdlib/strcvt.c).

    #1)
    Assuming that file UnicodeDataExcerpt.txt is available in the tt path, this should call through to charprops_toupperz. (Actually I had an old setup, and tt path was lacking that, so I had to go copy that there from the sourceforge source directory at this point.) charprops_toupperz calls convert_utf8 (src/gedlib/charprops.c), which uses the custom translation table "UTF-8 upper", which was created when the file UnicodeDataExcerpt.txt was read.

    #2)
    Otherwise this falls back to the C runtime library. If HAVE_TOWUPPER is defined (do you have that defined to 1 in your config.h?), it calls wz_makeupper (src/stdlib/strcvt.c), which calls towupper, which is a C library function (I think).

    To see whether (#1) or (#2) above applies, go to u(tilities) c(haracter set options), and see whether it says "UTF-8 charprops loaded" (case #1 above) or "UTF-8 charprops not loaded" (case #2 above).

    PS: You should be able to disable the capitalization of family name via the option UppercaseSurnames.
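The two paths described above can be sketched roughly as follows. This is only an illustration and not the LifeLines code: `upper_with_fallback` and `upper_lookup_fn` are hypothetical names, and the real charprops path works on UTF-8 strings through a translation table rather than on wide characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the "UTF-8 upper" table built from
 * UnicodeDataExcerpt.txt; a NULL pointer means "charprops not loaded". */
typedef wint_t (*upper_lookup_fn)(wint_t ch);

/* Uppercase a wide string in place.
 * Path #1: use the table lookup when one is available.
 * Path #2: otherwise fall back to the C library's towupper(). */
static void upper_with_fallback(wchar_t *s, upper_lookup_fn table_lookup)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != L'\0'; i++) {
        wint_t c = (wint_t)s[i];
        s[i] = (wchar_t)(table_lookup ? table_lookup(c) : towupper(c));
    }
}
```

Passing NULL as the lookup corresponds to case #2, where the result depends entirely on the C library and the current LC_CTYPE locale.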

     
  • elsapo

    elsapo - 2008-04-10

    Logged In: YES
    user_id=1195173
    Originator: NO

    So, if your local setup is using UnicodeDataExcerpt.txt (option #1 in my previous post), could you look up whichever character is giving you trouble in that file, and see what its upper case is set to, or whether it is missing from that Unicode table?

    If your local setup is using your C runtime library (option #2 in my previous post), could you run a test program to see whether your local towupper function works on whichever character is giving you trouble?

     
  • Olaf Trygve Berglihn

    Logged In: YES
    user_id=18626
    Originator: YES

    The problem was with any surname, even if all characters were within ASCII. The testing was done with a setup according to option #2. Removing iconv made capitalization work again. When adding TTPATH and downloading UnicodeDataExcerpt.txt, capitalization works (now using option #1). Anyway, here is a test of the C runtime library, showing that there is nothing wrong there.

    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>   /* wprintf, wcslen */
    #include <wctype.h>  /* towupper */

    #define TEST_STRING L"abcæøåÆØÅ"

    int main(void)
    {
        wchar_t s[] = TEST_STRING;
        size_t i, len;

        /* Take the character type locale from the environment. */
        setlocale(LC_CTYPE, "");
        wprintf(L"Before towupper: %ls\n", s);
        len = wcslen(s);
        /* Uppercase each wide character in place. */
        for (i = 0; i < len; i++)
            s[i] = (wchar_t)towupper((wint_t)s[i]);
        wprintf(L"After towupper: %ls\n", s);
        return 0;
    }

    The result of this code is as expected:
    Before towupper: abcæøåÆØÅ
    After towupper: ABCÆØÅÆØÅ

     
  • elsapo

    elsapo - 2008-04-16

    Logged In: YES
    user_id=1195173
    Originator: NO

    You said your setup was according to option #2.

    I missed it if you answered the question under my option #2:

    > on HAVE_TOWUPPER (do you have that defined to 1 in your config.h?)

    Now, to be clear, you said under option #2 (probably with HAVE_TOWUPPER defined), it was failing, but when you removed iconv, it worked? I want to confirm that, because that sounds odd to me -- I'll have to look at the code and think about what that means.

    Ah, I have an idea. In the failing setup, would you post the contents of the screen you get when you use the commands u(tility) c(haracter set options)? Unless you mind posting that info -- and if you want to chop off any parts of the paths, if you'd prefer not to post them, that is fine. Or you could mark this bug as Private if you like, or we could move this entire discussion to email also, if you'd prefer.

    Thanks for the help & debugging.

     
  • elsapo

    elsapo - 2008-04-16

    Logged In: YES
    user_id=1195173
    Originator: NO

    I have a hypothesis -- perhaps the failure is in makewide in strcvt.c, at line 73:

    if (!iconv_trans(int_codeset, dest, str, zstr, '?')) {

    If that call fails, iconv was unable to perform the requested translation. Assuming that get_wchar_codeset_name returns UCS-4-INTERNAL on your platform (meaning sizeof(wchar_t) == 4), you should be able to check this by trying something like the following:

    $ iconv -f ZZZZZ -t UCS-4-INTERNAL testfile

    (instead of ZZZZZ, type the codeset of your database -- which is given in the top line of the uc info screen)

    In the file testfile, put whatever letters fail in lifelines capitalization.

    If that iconv test line gives results like this:

    iconv: testfile:1:0: cannot convert

    then iconv is incapable of performing the requested conversion.
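The same check can be made from C, which is closer to what the iconv_trans call does. The following is a minimal sketch under stated assumptions, not the code from strcvt.c: `to_wide` is a hypothetical helper, and the target name "WCHAR_T" is assumed to be accepted by the local iconv (glibc and GNU libiconv accept it; other implementations may want a name such as UCS-4-INTERNAL instead).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert the NUL-terminated string `src` from codeset `from` to wchar_t.
 * Returns the number of wide characters written to `dst`, or -1 on
 * failure with errno set by iconv_open() or iconv(). */
static long to_wide(const char *from, const char *src,
                    wchar_t *dst, size_t dst_len)
{
    assert(from != NULL && src != NULL && dst != NULL);

    iconv_t cd = iconv_open("WCHAR_T", from);
    if (cd == (iconv_t)-1)
        return -1;                 /* codeset name not recognized */

    /* Some older iconv headers declare the input as const char **. */
    char *in = (char *)src;
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_len * sizeof(wchar_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    int saved_errno = errno;       /* iconv_close() may change errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;       /* EILSEQ, EINVAL or E2BIG */
        return -1;
    }
    return (long)(dst_len - out_left / sizeof(wchar_t));
}
```

A return value of -1 from iconv_open would point at the codeset name, while a -1 from iconv itself with errno set to EILSEQ would point at the input bytes.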

     
  • elsapo

    elsapo - 2008-04-16

    Logged In: YES
    user_id=1195173
    Originator: NO

    (BTW, I'm disappointed that iconv_trans in strcvt.c, which I clearly wrote, doesn't report whether or not it inserted illegal characters -- if it did, I could consider altering the upcase stuff to not use iconv at all if any characters weren't handled fully.)
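A conversion wrapper that does report substitutions might look something like the sketch below. It is not iconv_trans: `utf8_convert_counting` is a hypothetical name, the input is assumed to be UTF-8, and the target is assumed to be an ASCII-compatible byte encoding so that a single substitute byte is meaningful.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert UTF-8 `src` to codeset `to`, writing a NUL-terminated result
 * into `dst`. Each input character that cannot be converted is replaced
 * by `subst`, and the number of replacements is stored in *substituted.
 * Returns the output length in bytes, or -1 on a hard failure. */
static long utf8_convert_counting(const char *to, const char *src,
                                  char *dst, size_t dst_size,
                                  char subst, size_t *substituted)
{
    assert(to && src && dst && substituted && dst_size > 0);

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;
    size_t in_left = strlen(src);
    char *out = dst;
    size_t out_left = dst_size - 1;   /* keep room for the NUL */
    *substituted = 0;

    while (in_left > 0) {
        if (iconv(cd, &in, &in_left, &out, &out_left) != (size_t)-1)
            break;                    /* everything converted */
        if ((errno != EILSEQ && errno != EINVAL) || out_left == 0) {
            iconv_close(cd);
            return -1;                /* e.g. E2BIG: output too small */
        }
        /* Emit the substitute, then skip the offending lead byte and
         * any UTF-8 continuation bytes (10xxxxxx) that follow it. */
        *out++ = subst;
        out_left--;
        in++;
        in_left--;
        while (in_left > 0 && ((unsigned char)*in & 0xC0) == 0x80) {
            in++;
            in_left--;
        }
        (*substituted)++;
    }
    *out = '\0';
    iconv_close(cd);
    return (long)(out - dst);
}
```

A caller could then skip the iconv-based uppercasing whenever the substitution count is nonzero.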

     
  • Olaf Trygve Berglihn

    Logged In: YES
    user_id=18626
    Originator: YES

    Yes, HAVE_TOWUPPER was defined in config.h. I tested two different setups. In the first, iconv was compiled in and capitalization failed for any surname. The second, without iconv, had no problems with capitalization. I can post the utility->character set option stuff if you need it. However, it seems you are on the right track with strcvt.c:73: a failing iconv call results in "(?)" being printed.

     
  • elsapo

    elsapo - 2008-04-18

    Logged In: YES
    user_id=1195173
    Originator: NO

    Yes please, I'm still interested in more information -- to see if your iconv is really failing the conversion, and if so why -- iconv knows a LOT of character encodings, so I wonder if the culprit is going to be some slightly odd representation of the character name somewhere.

     
  • Olaf Trygve Berglihn

    Logged In: YES
    user_id=18626
    Originator: YES

    Iconv is fine. I use it often from console.
    $ iconv -f utf-8 -t UCS-4-INTERNAL testfile
    produces the correct result.

    Here are the settings in utilities->character set options. If I set TT-path in ~/.linesrc, and put the file UnicodeDataExcerpt.txt in the tt-directory, capitalization works. Note that the UnicodeDataExcerpt.txt is not included in the 3.0.62 tarball on sourceforge. Also note that ANY character fails in capitalization, even characters within standard ASCII.

    Codeset information (1/19))
    >Internal codeset: UTF-8
    Internal UTF-8: Yes
    Locales are enabled.
    NLS (National Language Support) is compiled in.
    LocaleDir (default): /usr/local/share/locale
    LocaleDir (override):
    bind_textdomain_codeset: UTF-8
    iconv (codeset conversion) is compiled in.
    Startup collate locale: C
    Startup messages locale: C
    Current collate locale: C
    Current messages locale: C
    Collation routine: wcscoll
    GUI codeset: UTF-8
    editor codeset: UTF-8
    report codeset: UTF-8
    GEDCOM codeset: UTF-8
    TTPATH: .
    UTF-8 charprops not loaded

     