#127 Not able to log Unicode characters on Linux platform

Milestone: v1.0.4.1
Status: closed
Labels: Appender (3)
Priority: 7
Updated: 2012-06-01
Created: 2011-11-30
Private: No

We are using log4cplus as our logging framework.

log4cplus (1.0.4rc11) has been compiled with the -DUNICODE flag.

When you try to log a Unicode character using log4cplus (1.0.4rc11) on Linux, the logger breaks (meaning it is not able to log anything after that).

This can be easily reproduced using the tests/fileappender_test test case.
To reproduce:
1. cd tests/fileappender_test
2. open the main.cxx file
3. replace the string "Entering loop #" with "Entering dépendant loop #"
4. make clean
5. make

log4cplus: Just opened file: Test.log
log4cplus:WARN RollingFileAppender: MaxFileSize property value is too small. Resetting to 204800.
log4cplus:ERROR file is not open: Test.log
log4cplus: Destroying appender named [First].
log4cplus: Destroying appender named [First].

Discussion

  • Premanand Patil

    Premanand Patil - 2011-11-30
    • priority: 5 --> 7
     
  • Premanand Patil

    Premanand Patil - 2011-12-08
    • assigned_to: nobody --> tcsmith
     
  • anokl

    anokl - 2011-12-13

    Actually, there are two problems here. The first is that the stream is closed after the "é" character is inserted. The second is that the stream is never opened again. The reopenDelay parameter is set to 1 by default; if I set it explicitly to 0, the stream is reopened.
    So the second problem looks like a bug in log4cplus. The first looks like a general Linux problem with wide-character streams. In the following code, the stream is in the !good() state after the flush:

    #include <fstream>
    #include <iostream>
    #include <locale>

    int main() {
        std::wofstream of("test.txt");
        of.imbue(std::locale());  // copy of the current global locale
        if (!of.good()) {
            std::cout << "Not good 1 !!!!" << std::endl;
            return 1;
        }
        of << L"Entéring ...";
        std::wcout << L"Entéring ..." << std::endl;
        of.flush();  // buffered wchar_t data is converted to char here
        if (!of.good()) {
            std::cout << "Not good 2 !!!!" << std::endl;
            return 1;
        }
        of.close();
        return 0;
    }

     
  • anokl

    anokl - 2011-12-13

    Can anybody clarify why the stream is not in a good state in the previous example?

     
  • Václav Haisman

    Václav Haisman - 2012-01-18
    • assigned_to: tcsmith --> wilx
     
  • Václav Haisman

    Václav Haisman - 2012-01-24
    • milestone: --> 1456447
     
  • Václav Haisman

    Václav Haisman - 2012-01-24

    Basically, when you write to a wchar_t stream there is a conversion step from wchar_t to char. I am not sure what the ISO C++ standard says about it, but how the conversion is done depends on the capabilities of your OS and of your standard library's std::locale implementation.

    Log4cplus does not do anything with the output file stream's locale. By default, the stream's locale is the global locale at the time the stream is created. That locale is likely the POSIX or C locale, which is not required to handle anything beyond the 7-bit ASCII character set.

    When you try to log the L'é' character, the locale does not know how to convert it to char, and the fail bit is set on the stream.

    There are several ways to remedy this, even with log4cplus 1.0.4.

    1) Wrap FileAppender, override append(), and call clear() on the output stream if the failbit is set. (Or just unset that single bit, which might be safer.)

    2) Wrap FileAppender and imbue the output stream with the appropriate locale.

    (I have done both of these in the past with log4cplus 1.0.2.)

    3) Call std::locale::global(std::locale("")). The FileAppender output stream should inherit this global locale. This assumes (e.g., on Linux) that your LANG environment variable is set to something appropriate. Unfortunately, setting the global locale affects things like number formatting as well, which might be a show stopper for you.

    4) Use 2), but with a locale modified only by adding utf8_codecvt_facet, which you would have to get from Boost.

    5) Try the head of the PRODUCTION_1_0_x branch and imbue the output stream of FileAppender with your own std::locale using its new setloc() member function.

     
  • Václav Haisman

    Václav Haisman - 2012-03-19
    • milestone: 1456447 --> v1.0.4.1
     
  • Václav Haisman

    Václav Haisman - 2012-05-31

    Have you used any ./configure script flags?

     
  • Václav Haisman

    Václav Haisman - 2012-05-31

    (In case the text gets mangled by SourceForge or your browsers, I am attaching it in UTF-8 text file as well.)

    OK, I think I have reproduced your problem with the log4cplus 1.0.4.1
    release.

    The problem has two layers:

    1. Whether your source contains a std::locale::global (LOCALE); line,
    and if it does, whether LOCALE is std::locale ("") or std::locale ().

    2. The ./configure script flags: none, or --with-working-locale (and
    --with-working-c-locale or --with-iconv).

    This gives us 6 combinations of these two parameters. To test them I
    used tests/fileappender_test with the string "Entering loop číslo/番:"
    to log. Log4cplus was compiled with CPPFLAGS="-DUNICODE=1".

    (1)
    no parameters to ./configure
    std::locale::global: missing

    The test breaks exactly as you have reported:

    log4cplus: Just opened file: Test.log
    log4cplus:WARN RollingFileAppender: MaxFileSize property value is too small. Resetting to 204800.
    log4cplus:ERROR file is not open: Test.log
    log4cplus: Destroying appender named [First].
    log4cplus: Destroying appender named [First].

    (2)
    no parameters to ./configure
    std::locale::global(std::locale());

    Same result as for (1).

    (3)
    no parameters to ./configure
    std::locale::global(std::locale(""));

    The logger keeps on logging. ASCII text goes through unmodified, and
    numbers get formatted according to my locale (my LANG environment
    variable is set to en_US.UTF-8). The test Unicode characters pass
    through unmodified as well. I see 4 bytes per character
    (sizeof(wchar_t) == 4 on Linux platforms).

    (4)
    std::locale::global: missing
    ./configure with --with-working-locale

    The logging works but the non-ASCII characters are replaced by
    question marks:

    600 [139678217910080] DEBUG test.subtest <loop> - Entering loop ????slo/??????19759

    (5)
    std::locale::global(std::locale());
    ./configure with --with-working-locale

    Same result as for (4).

    (6)
    std::locale::global(std::locale(""));
    ./configure with --with-working-locale

    The logging works, numbers are formatted according to locale
    (en_US.UTF-8) and the non-ASCII characters show up as well:

    165 [139,711,023,560,512] DEBUG test.subtest <loop> - Entering loop číslo/番:18,467

    CONCLUSION

    The whole matter is unfortunately very complicated. The result depends
    on your locale settings, on capabilities of your compiler and standard
    C++ library. It also depends on compile time settings of log4cplus
    itself.

    Unless you really (really) have to handle wchar_t strings, I
    strongly suggest that you do not compile log4cplus with the -DUNICODE
    switch. Most Linux distributions these days use UTF-8 as their
    native encoding, and UTF-8 text passes through to the Test.log file
    unharmed in all cases that I have tested, whether or not the
    --with-working-locale switch is present and whether or not any
    std::locale::global (LOCALE) line is present.

    To conclude the conclusion, I suggest configuring log4cplus on
    _Linux_ platforms with GCC with the --with-working-locale switch and
    _without_ -DUNICODE.

     
  • Václav Haisman

    Václav Haisman - 2012-05-31

    IMPROVING OUTPUT WITH -DUNICODE

    (7)
    You can improve the output of solution (6) by avoiding locale-specific
    formatting through a combined locale. The following snippet sets the
    global locale from the existing global locale plus the codecvt facet
    of your user locale.

    std::locale::global (
        std::locale (
            std::locale (),
            new std::codecvt_byname<wchar_t, char, std::mbstate_t>("")));

     
  • Václav Haisman

    Václav Haisman - 2012-05-31
     
  • Václav Haisman

    Václav Haisman - 2012-06-01

    I am closing this as "works for me" as the text below demonstrates how to log non-ASCII text with log4cplus with or without -DUNICODE. Feel free to reopen this bug or file another if you feel that the solution is not satisfactory.

     
  • Václav Haisman

    Václav Haisman - 2012-06-01
    • labels: --> Appender
    • status: open --> closed-works-for-me
     