#158 rra-timezone segfault (non-ascii tz)

open
rra (13)
5
2008-10-21
2008-10-21
Szymon Siwek
No

When I run rra-timezone, I get a segmentation fault.

Details: Windows Mobile 6, Polish version; the timezone name contains a non-ASCII character: "Europa Środkowa (czas stand.)".

The first line of the function rra_timezone_create_id is:

char* name = wstr_to_ascii(tzi->StandardName);

wstr_to_ascii fails and returns NULL, and the segfault follows.
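The failure mode described above (a conversion helper returning NULL and the caller using the result unchecked) can be guarded against roughly as follows. This is only a minimal sketch: `narrow_ascii` is a hypothetical stand-in for a helper that can fail, not the libsynce implementation, and both `timezone_name_or_fallback` and the fallback text "unknown" are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a narrowing helper that can fail: returns a
 * malloc'd ASCII copy of src, or NULL if any character is outside ASCII. */
static char *narrow_ascii(const wchar_t *src)
{
    size_t len = wcslen(src);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if ((unsigned long)src[i] > 0x7f) {   /* not representable in ASCII */
            free(out);
            return NULL;
        }
        out[i] = (char)src[i];
    }
    out[len] = '\0';
    return out;
}

/* Hypothetical caller: checks the result before any later string code
 * touches it. The fallback text "unknown" is an arbitrary choice. */
static char *timezone_name_or_fallback(const wchar_t *standard_name)
{
    char *name = narrow_ascii(standard_name);
    if (name == NULL) {
        name = malloc(sizeof "unknown");
        if (name != NULL)
            strcpy(name, "unknown");
    }
    return name;   /* caller frees; still NULL only if malloc failed */
}
```

Whether a fixed fallback name is acceptable depends on how the resulting id is used later, which this ticket does not settle.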

Discussion

  • Szymon Siwek
    2008-10-21

    As I said, the attached synce-rra-nonascii_timezone.patch fixes this segfault; the other wstr_*_ascii calls don't make any difference for me.

    Two remarks:
    - After changing wstr_to_ascii(tzi->StandardName) to wstr_to_current, the conversion of [^[:alnum:]] to '_' no longer works properly (for example, the Polish word 'żółć' is converted to '________').

    - Changing wstr_to_ascii to wstr_to_current is not sufficient: wstr_to_current can still return NULL (for example, with 'LC_CTYPE=C ./rra-timezone').
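The second remark can be illustrated with the standard C conversion routines. The sketch below assumes a locale-dependent conversion built on `wcsrtombs`; that is not necessarily how wstr_to_current is implemented, and `wide_to_locale` is a hypothetical name. In the "C" locale a character such as 'Ś' has no multibyte representation, so the conversion reports failure and the helper returns NULL.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to the multibyte encoding of
 * the current LC_CTYPE locale. Returns a malloc'd string, or NULL if some
 * character cannot be represented in that locale. */
static char *wide_to_locale(const wchar_t *src)
{
    mbstate_t st;
    const wchar_t *p = src;
    size_t need;
    char *out;

    memset(&st, 0, sizeof st);
    need = wcsrtombs(NULL, &p, 0, &st);   /* length query, writes nothing */
    if (need == (size_t)-1)
        return NULL;                      /* unconvertible character */

    out = malloc(need + 1);
    if (out == NULL)
        return NULL;

    memset(&st, 0, sizeof st);
    p = src;
    wcsrtombs(out, &p, need + 1, &st);    /* real conversion, incl. '\0' */
    return out;
}
```

A program that never calls setlocale runs in the "C" locale, which is the same situation as forcing LC_CTYPE=C on the command line.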

     
  • David Eriksson
    2008-10-21

    Sorry, I was too quick to notice the attached file!

     
  • Mark Ellis
    2009-01-02

    Can you please confirm the problem with the conversion of non-alphanumerics to '_'? I would have thought this would be OK, since isalnum should be locale-aware.

     
  • Szymon Siwek
    2009-01-03

    "isalnum should be locale aware" - of course you're right
    ... unless you have to deal with multibyte characters.

    Compare these two example programs and their outputs:

    one-byte.c:

    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>

    int main(void)
    {
        /* "żółć": a four-letter Polish word (meaning "bile"), UTF-8 encoded */
        const char *str = "żółć";
        const unsigned char *p;
        int alnum;

        setlocale(LC_CTYPE, "pl_PL.utf8");
        printf("(%s)\n", str);
        /* Walk the string byte by byte; each letter occupies two bytes. */
        for (p = (const unsigned char *)str; *p != '\0'; p++) {
            alnum = isalnum(*p);
            printf("0x%02x\t(%s)\t%i\n", *p, (const char *)p, alnum);
        }
        printf("\n");
        return 0;
    }

    $ ./one-byte
    (żółć)
    0xc5 (żółć) 0
    0xbc (�ółć) 0
    0xc3 (ółć) 0
    0xb3 (�łć) 0
    0xc5 (łć) 0
    0x82 (�ć) 0
    0xc4 (ć) 0
    0x87 (�) 0

    multi-byte.c:

    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>
    #include <wctype.h>

    int main(void)
    {
        /* "żółć": a four-letter Polish word (meaning "bile"), UTF-8 encoded */
        const char *str_mb = "żółć";
        wchar_t str[666];
        wchar_t *p;
        int alnum;
        size_t ret;

        setlocale(LC_CTYPE, "pl_PL.utf8");
        /* Decode the multibyte string into wide characters first. */
        ret = mbsrtowcs(str, &str_mb, 666, NULL);
        if (ret == (size_t)-1) {
            printf("mb conversion failed\n");
            return 1;
        }
        printf("(%ls)\n", str);
        for (p = str; *p != L'\0'; p++) {
            alnum = iswalnum(*p);
            printf("0x%08x\t(%ls)\t%i\n", (unsigned int)*p, p, alnum);
        }
        printf("\n");
        return 0;
    }

    $ ./multi-byte
    (żółć)
    0x0000017c (żółć) 1
    0x000000f3 (ółć) 1
    0x00000142 (łć) 1
    0x00000107 (ć) 1
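Building on the comparison above, a multibyte-aware version of the "replace non-alphanumerics with '_'" step could operate on wide characters instead of bytes. The following is a minimal sketch, not the code in rra-timezone; `sanitize_wide` is a hypothetical name, and it assumes the name is still available as a wide string (or has been decoded with something like mbsrtowcs) when the replacement runs.

```c
#include <wchar.h>
#include <wctype.h>

/* Replace, in place, every wide character that iswalnum() rejects with
 * L'_'. Classification follows the current LC_CTYPE locale. */
static void sanitize_wide(wchar_t *s)
{
    for (; *s != L'\0'; s++) {
        if (!iswalnum((wint_t)*s))
            *s = L'_';
    }
}
```

Under a locale such as pl_PL.utf8, where iswalnum accepts all four letters of "żółć" as shown in the output above, the word would pass through unchanged. Spaces and punctuation would still be replaced.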

     
  • Szymon Siwek
    2009-01-03

    Example program that is not multibyte-aware.

     
  • Szymon Siwek
    2009-01-03

    File Added: one-byte.c

     
  • Szymon Siwek
    2009-01-03

    Example multibyte-aware program.

     
  • Szymon Siwek
    2009-01-03

    File Added: multi-byte.c

     
  • Szymon Siwek
    2009-01-03

    One more thing about the "conversion to '_'" code: it also converts [:blank:], [:punct:], etc.,
    so my timezone "Europa Środkowa (czas stand.)" is converted to "Europa___rodkowa__czas_stand__".
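The result quoted above is what a byte-wise replacement produces on a UTF-8 string. The sketch below is an assumption about the shape of that step, not a copy of the rra-timezone code, and `sanitize_bytes` is a hypothetical name. Each of the two bytes of 'Ś' (0xc5 0x9a) is tested separately with isalnum, fails, and becomes its own underscore, alongside the spaces, parentheses and the full stop.

```c
#include <ctype.h>

/* Replace, in place, every byte that isalnum() rejects with '_'.
 * Each byte of a multibyte character is classified on its own. */
static void sanitize_bytes(char *s)
{
    for (; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s))
            *s = '_';
    }
}
```

Applied to the UTF-8 bytes of "Europa Środkowa (czas stand.)", this gives "Europa___rodkowa__czas_stand__": three underscores for the space plus the two bytes of 'Ś', two for the space and the opening parenthesis, one for the next space, and two for the full stop and the closing parenthesis.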

     
  • Mark Ellis
    2009-01-03

    Ah yes, I see. The problem is I'm not a sync or rra guru, so I don't know how this string is actually used. This one may be a bit trickier.