Menu

Pantheious MT safety on different level msgs

2009-07-02
2013-05-20
  • Jacobo Cabaleiro

    I've been testing MT safety of Pantheios over Linux (Fedora 11) [gcc 4.4]
    I had no problems building Pantheious, just replacing on the gcc43 makefile gcc43 by gcc44 worked.

    I'm running pantheios-1.0.1-beta189. 32 threads are writing log messages thru Pantheios, but not always simultaneously (due to some random sleeps on the different threads). Of course, multiple simultaneous access happen sometimes. Log messages can be generated of two different levels: pantheios::informational (most of them) or pantheios::critical (just 1 per thread throughout each thread's life). Test is using be.file backend. Tests are being carried on a dual core system.

    What I discovered is that apparently "thread safety" is not maintained between different message levels. This is part of the generated log file:
    ----
    [...]
    [pantheios_test.140008514537744, Jul 02 13:39:30.927; Critical]: Thread 4 has reached the middle of its life
    [pantheios_test.140007580879120, Jul 02 13:39:31.324; Critical]: Thread 15 has reached the middle of its life
    [pantheios_test.140008128239888, Jul 02 13:39:31.560; Critical]: Thread 8 has reached the middle of its life
    [pantheios_test.140007228524816, Jul 02 13:39:31.686; Critical]: Thread 29 has reached the m[pantheios_test.140007259994384, Jul 02 13:39:31.685; Informational]: (thread 26) iter 49974
    [pantheios_test.140007259994384, Jul 02 13:39:31.730; Critical]: Thread 26 has reached the middle of its life
    [pantheios_test.140008556497168, Jul 02 13:39:31.801; Critical]: Thread 0 has reached the middle of its life
    [pantheios_test.140007218034960, Jul 02 13:39:31.937; Critical]: Thread 30 has reached the middle of its life
    [[pantheios_test.140007280974096, Jul 02 13:39:32.178; Critical]: Thread 24 has reached the [[pantheios_test.140008556497168, Jul 02 13:39:32.174; Informational]: (thread 0) iter 50312
    [...]
    ----
    What I was expecting, in case of a MT aware core - FEs and BEs, was a log were entries were interleaved but no entry was cut by another one, as can easily happen if sending messages directly thru std::cout.

    Test code is certainly short. Here's the important part:
    ----
    void thread_func(unsigned int thread_n)
    {
         boost::mt19937 rng;
         rng.seed(thread_n);
         boost::uniform_int<> ms_uniform_distr(10, 999);
         typedef boost::variate_generator<boost::mt19937 &, boost::uniform_int<> > MyGenerator;
         MyGenerator duration_generator(rng, ms_uniform_distr);

         for (unsigned int i = 0; i < MSGS_PER_THREAD; ++i) {
              pantheios::log(pantheios::informational,
                             "(thread ", pantheios::integer(thread_n),
                             ") iter ", pantheios::integer(i));
              if (i == MSGS_PER_THREAD / 2)
                   pantheios::log(pantheios::critical,
                                  "Thread ", pantheios::integer(thread_n),
                                  " has reached the middle of its life");
              unsigned int duration = duration_generator();

              boost::this_thread::sleep(microseconds(duration));
         }
    }
    ----

    Is this a known issue? Is this an issue at all or I'm misreading the log output? This is my first contact with Pantheios, but maybe I can take a look at its code, but I would like to hear from any experienced user before touching the inners :)

    Thanks in advance

     
    • Jacobo Cabaleiro

      Mmmm... I was mistaken. Certainly this also happens with messages from the same level.

      Extract of a different test run generated log file
      ----
      [...]
      [pantheios_test.139672013412624, Jul 02 17:15:40.352; Informational]: (thread 2) iter 255
      [pantheios_test.139670698551568, Jul 02 17:15:40.352; Informational]: (thread 30) iter 46
      [pantheios_test.139671789103376, Jul 02 17:15:40.352; Informational]: (thread 4) iter 257[pantheios_test.139670906198288, Jul 02 17:15:40.350; Informational]: (thread 17) iter 43

      [pantheios_test.139670751000848, Jul 02 17:15:40.352; Informational]: (thread 25) iter 44
      [pantheios_test.139670761490704, Jul 02 17:15:40.352; Informational]: (thread 24) iter 44
      [pantheios_test.139671799593232, Jul 02 17:15:40.352; Informational]: (thread 3) iter 248
      [pantheios_test.139670730021136, Jul 02 17:15:40.352; Informational]: (thread 27) iter 46
      [pantheios_test.139670709041424, Jul 02 17:15:40.353; Informational]: (thread 29) iter 47
      [...]
      ------

       
    • Matt Wilson

      Matt Wilson - 2009-07-03

      This is surprising behaviour.

      The first thing I need to ask is: are you linking to the multithreaded version of the library, i.e. pantheios.1.be.file.gcc44.mt(.debug) ? If not, then the locking within the API will not be happening.

      However, even if that is the case, you should not see mixed output. In a UNIX build, there are only two places where output is written to the underlying files, and in each case an entire statement is presented to the system API function write(). Linux guarantees that write() is an atomic operation. So, even if you're using the single-threaded build, you should see defective behaviour within the Pantheios sub-system, but not mixed output in the file.

      Can you send me a smallest possible working sample that illustrates the problem (preferably without Boost dependency), and we'll take it from there.

      Matt

       
    • Jacobo Cabaleiro

      Hi Matt. First thank you for your fast answer.

      Yes, I'm linking against *.gcc44.mt.debug. I'm sure, as I fought for 3 - 4 hours with the linker until I was able to correctly link Pantheios against my app :)

      OK. I'm going to get rid of Boost. And I'll test this again, just in case the behavior does not get reproduced with the changes.

       
      • Matt Wilson

        Matt Wilson - 2009-07-03

        > Hi Matt. First thank you for your fast answer.

        You're welcome.

        > Yes, I'm linking against *.gcc44.mt.debug. I'm sure, as I fought for 3 - 4 hours with the linker until I was able to correctly link Pantheios against my app :)

        Yes, it's the one aspect of Pantheios that's tricky. Have you read FAQ Q13?

        > OK. I'm going to get rid of Boost. And I'll test this again, just in case the behavior does not get reproduced with the changes.

        Thanks.

         
    • Jacobo Cabaleiro

      Done.

      Just Posix and std calls. The behavior is the same:
      ---- cut ----
      [pantheios_test.140462130194704, Jul 03 11:51:58.260; Informational]: (thread 22) iter 15000
      [pantheios_test.140462130194704, Jul 03 11:51:58.260; Critical]: Thread 22 has reached the middle of its life
      [pantheios_test.140462232942864, Jul 03 11:51:58.260; Informational]: (thr[pantheios_test.140463250065680, Jul 03 11:51:58.260; Informational]: (thread 6) iter 13921
      [pantheios_test.140462274902288, Jul 03 11:51:58.260; Informational]: (thread 15) iter 14745
      -----
      ----- cut ----
      [pantheios_test.140462734174480, Jul 03 11:51:58.283; Informational]: (thread 10) iter 14913
      [p[pantheios_test.140462077745424, Jul 03 11:51:58.283; Informational]: (thread 27) iter 1333[p[pantheios_test.140462098725136, Jul 03 11:51:58.283; Informational]: (thread 25) iter 15141
      ntheios_test.140463260555536, Jul 03 11:51:58.284; Informational]: (thread 5) iter 14718
      [pantheios_test.140462222453008, Jul 03 11:51:58.284; Informational]: (thread 20) iter 14883
      ----

      How do you want me to send you the code?

       
      • Matt Wilson

        Matt Wilson - 2009-07-04

        I received the code. Thanks

         
    • Jacobo Cabaleiro

      Certainly I didn't read Q13. But it was not lost time as I refreshed my knowledge about binutils ld static library linking process. What surprised me was that implicit linking did not work. I tried taking a look at preprocessor output, to see what was happening, but left that path early after considering that directly controlling the linker could give me results faster.

      Mmm.... I don't know why but SF reversed the order of my two previous posts. Code is ready and without boost it shows the same behaviour. How do you want me to send you the test case?

       
      • Matt Wilson

        Matt Wilson - 2009-07-04

        You say you're surprised that implicit linking did not work. Do you mean you're surprised that GCC does not support implicit linking, or were you trying it with another compiler?

         
    • Matt Wilson

      Matt Wilson - 2009-07-04

      I've been able to reproduce identified what I think was causing the issue. Ironically, the code in bec.file.cpp is correct, but the file was missing the crucial inclusion of pantheios/internal/threading.h, which defines the symbols that the source file was needing to define the correct mutex type.

      The fix, along with my test program, are included in 1.0.1 beta 191, just released: see https://sourceforge.net/forum/forum.php?forum_id=974890

      Thanks for bringing it to my attention. The reason that this has not occured in any of the commercial deployments is that none of them are using the be.file back-end. Remember, Pantheios is a logging *API* library, and that the backends provided are deliberately the minimum usefulness to get people up and running to then make decisions as to which logging library they may wish to wrap with it.

       
    • Jacobo Cabaleiro

      Sorry for the delay on reporting back, even if it was not really necessary.

      Tested newest beta (192) and, as you said, it's now solved.

      Thanks

       
  • gavin

    gavin - 2011-08-14

    I am not sure if my problem relative to the MT issue in Pantheios, but it start to core dump when I add OpenMP support in my compiler option, see below:
    1.   I am using gcc 4.4.2. and everything runs ok (even in MT environment).
    2.  I start to add "-fopenmp" compiler option to enable the support of the OpenMP , and in one of my share object add one simple open directive.
    3. then my application start to core while try to load this shared object and get an exception like:
           terminate called after throwing an instance of 'stlsoft::unixstl_project::synchronisation_exception'
                  what():  Mutex lock failed: Invalid argument
        stack trace of that show it's happening during patheoio's initliazer

    Any idea? thank in advance!

     
  • gavin

    gavin - 2011-08-14

    Please ignore my previous post, it turns out that it's a bug in my application that it did not call "pantheios::init()" properly.

     

Log in to post a comment.