
#5 RNG_test with multithreaded options reaches deadlock after processing 512 gigabytes

v1.0_(example)
closed-fixed
nobody
None
5
2014-11-26
2014-11-18
Jirka
No

Hi,

I'm experiencing the following bug in RNG_test. When running

RNG_test sfc64 -tlmax 128T -multithreaded

after 512 gigabytes have been processed, the RNG_test process stalls. It looks like a deadlock has been reached. ps -eLf shows that no threads are running, and strace -p <pid> shows a series of nanosleep calls:

nanosleep({0, 1000}, NULL) = 0
nanosleep({0, 1000}, NULL) = 0

It happens for every tested RNG, in both versions 0.90 and 0.91.

To reproduce it, please run

RNG_test sfc64 -tlmax 128T -multithreaded

After 512 gigabytes have been processed, you should see the system load drop to zero while RNG_test does no work.

Thanks
Jirka

Related

Bugs: #5

Discussion

  • - 2014-11-18

    Trying to reproduce now.

    Looking at the code, I'd say it's probably clean of race conditions. If a test crashed, that might cause it to wait indefinitely for the test to complete - but as far as I know, PractRand tests don't make a habit of crashing. If pthread_create or CreateThread failed, that would likewise leave it waiting indefinitely - I never got around to checking the return value on those. That's actually sounding like the most plausible explanation to me at the moment... if I'd heard about this before releasing 0.92, I would have tried to pass any pthread_create or CreateThread errors along to the user. The man pages / MSDN suggest a couple of possibilities for why those functions might start giving me errors after a while of execution - I think I'm not cleaning up after myself properly in some ways, though it wouldn't matter if I were using a proper pool of worker threads instead of a quick-and-dirty solution.

     
  • - 2014-11-18

    I can't reproduce it. On win32 anyway. Are you using pthreads or win32 threading? Anyway, I have some ideas that might fix it for 0.93.

     
  • Jirka

    Jirka - 2014-11-18

    Hi,

    I run RNG_test on Linux. I will try it with version 0.92 and let you know the outcome.

    Jirka

     
  • - 2014-11-18

    Okay, my best guess (I haven't finished a linux test run yet):

    What's happening is that I'm leaking resources in Threading::create_thread, which eventually causes it to refuse to launch more threads for me. Immediately after the lines:
    pthread_t thread;
    pthread_create(&thread, NULL, func, param);

    there should be a third line:
    pthread_detach(&thread);

    to let it know that it should clean up after the thread automatically when it terminates. The win32 code already does that correctly. It'll take me several hours to confirm that this is what's going on and that it fixes the problem, but there's a good chance that's it. I'll have it fixed in 0.93; you can patch your copy of 0.92 if you want - it's in tools/multithreading.h.

     
  • Jirka

    Jirka - 2014-11-18

    I will try to apply the patch.

    The vanilla 0.92 version exhibits the same bug - it happens again after 512 gigabytes of data have been processed.

    rng=sfc64, seed=0x6c08d0b3
    length= 512 gigabytes (2^39 bytes), time= 2485 seconds
    no anomalies in 276 test result(s)

    strace shows that RNG_test does nothing but call
    nanosleep({0, 1000}, NULL) = 0
    nanosleep({0, 1000}, NULL) = 0

     
  • Jirka

    Jirka - 2014-11-18

    Short update: the proposed patch should be

    pthread_detach(thread);

    and not

    pthread_detach(&thread);

    I have started the test and will report the results back.

    Thanks
    Jirka

     
  • Jirka

    Jirka - 2014-11-18

    I have tested the proposed patch, but unfortunately it did not help. I still experience the same behaviour: after processing 512 gigabytes of data, the program just calls nanosleep in a loop.

     
  • - 2014-11-19

    I can't reproduce your behavior. With the original code, on win32 it seemed to just work for me. On my (single-core) linux VM it did the first 512 GB in a little over 2 hours, and failed to reach 1 TB over the next 5 hours before I aborted it - but CPU usage remained at 100% until it was aborted, unlike your observed behavior. With the patch applied, it worked correctly for me on both win32 and linux.

    I'm not going to be up for any more testing today. If you want my next best idea: My linux VM doesn't seem to mind, but I just noticed that according to the docs pthread_mutex_destroy isn't supposed to be called on locked mutexes. So, on about line 17 of tools/MultithreadedTestManager.h where it says "delete this;", there should be a "lock.leave();" immediately before that.

     
  • Jirka

    Jirka - 2014-11-19

    Hi,

    I will try that.

    Do you have any debug messages that I could enable (for example, at compile time)? The fact that the bug manifests only after processing 0.5 TB of data makes it hard to debug. If you can propose some way to debug it, please let me know.

    Thanks
    Jirka

     
  • - 2014-11-20

    Unfortunately there isn't any debugging framework for that. I could write one, I suppose... I may have to if I can't figure out what's going on on your box.

    I assume that neither fix did anything for you? Here's an idea that might be easier to test, though far less definitive than the previous two: in MultithreadedTestManager.h, in the constructor's parameter list, the default value for max_buffer_amount_ is "1 << (27-10)"; try changing it to "1 << (10-10)". That's around line 95.

    That won't fix anything, but it might make your problem happen a lot faster - like within the first minute or two. If so, well, that would be informative in its own right, and it would make other changes easier to test. The change should force it to create threads far faster (one set for every kilobyte instead of one for every 128 megabytes), which should force any issue tied to the total number of threads (or total number of mutexes) to occur much sooner.

     
  • Jirka

    Jirka - 2014-11-21

    Hi,

    unfortunately, the "lock.leave()" patch has not helped either.

    With "1 << (10-10)", the problem occurs after only 128 MiB, as you predicted. I will try to debug it; let's see if I can find anything.

     
    • Jirka

      Jirka - 2014-11-21

      I see one strange behaviour. With all patches applied

      • pthread_detach
      • lock.leave
      • buffer_amount_ being set to "1 << (10-10)"

      and RNG_test (just RNG_test) compiled with the -O2 flag (instead of -O3)

      g++ -o RNG_test_O2 tools/RNG_test.cpp libPractRand.a -O2 -Iinclude -pthread -std=c++11
      

      the bug seems to be gone. When compiled with -O3 I still see the bug.

      Does it help you any further?

      I have now started new tests with buffer_amount_ set to "1 << (27-10)" and with the pthread_detach and lock.leave patches applied. RNG_test was compiled with -O2. I should have results by tomorrow.

       
      • - 2014-11-21

        Well, isn't that fun. So it's either a compiler bug, or something evil I'm doing that the compiler tolerates at lower optimization levels. If it's dependent upon the version of gcc used, that could explain why I can't reproduce the issue.

        Hm... something evil... the first thing that comes to mind is how my OS-independence layer hides the implementation - the array of Uint8s cast to a pthread_mutex_t is pretty evil. I wouldn't expect optimization to have trouble with it, but if I'm violating some obscure optimization-related constraint of the language, that would be my first guess.

        I don't know how to tell my virtualized CentOS 7 (a rebranded Red Hat) to change gcc versions, but just in case I figure it out, what version of gcc are you using? I should probably test it on MinGW too, but after my recent hard drive failure I no longer have a working MinGW install - another thing to fix sometime.


    • - 2014-11-21

      I'm running awfully low on things that could cause it to sleep indefinitely. I do have another bug, though: if I'm not mistaken, I have the two buffers mixed up in the memcpy call in multithread_prep_blocks - alt_buffer[0] should be buffer[0], and buffer[prefix_blocks + main_blocks - num_prefix_blocks] should be alt_buffer[prefix_blocks + main_blocks - num_prefix_blocks].

      There's a (very) remote chance that could cause a test to crash, which would in turn cause the main thread to wait forever. More likely it just causes a negligible error in the test results.

      That's in MultithreadedTestManager.h, around line 84 or so.


  • Jirka

    Jirka - 2014-11-21

    I have tested it on two different boxes and with the following two gcc versions:

    gcc (GCC) 4.7.2 20121109

    gcc (GCC) 4.8.2 20140120

    The behaviour was the same - with -O3 it exhibits the bug (falling into an infinite loop of nanosleep calls); with -O2 (and the other fixes applied) it runs fine.

    I will test it with gcc 4.9 tomorrow.

    Could you please send me the fixed MultithreadedTestManager.h? The latest memcpy patch is not as easy to apply as the previous ones.... You can attach the file directly below this form.

    Thanks
    Jirka

     
  • - 2014-11-23

    I'm running 4.8.2 20140120 here. So apparently it's not tied to the compiler. I've checked that I'm using -O3. It occurred to me that it could be an issue with the lack of multicore in my linux VM, as windows will switch mutex implementations depending upon the hardware, but switching it to multicore did not help either. And I tried reverting to an older codebase and applying only the first patch, in case something else I did caused it.

    No dice. I can't produce anything like what you're seeing once the first patch is applied. So... it's not tied to gcc version. Maybe pthreads version? I'm using NPTL 2.17 (identified via: getconf GNU_LIBPTHREAD_VERSION). Or glibc version? Hm... version number seems to be identical, 2.17 (identified via: ldd --version). To be clear, I'm building 64 bit binaries on linux, and a mix of 32 and 64 on windows.

    I don't see how this could have gone wrong, but just to make very sure, this is the corrected function from the first patch:

    void create_thread( THREADFUNC_RETURN_TYPE (THREADFUNC_CALLING_CONVENTION func)(void), void *param ) {
        pthread_t thread;
        pthread_create(&thread, NULL, func, param);
        pthread_detach(thread);
    }

    I'm kind of grasping at straws here. My next guess would be that my portability layer's dirty tricks are causing problems, and that the layer needs to be largely removed, and/or extra "volatile" keywords added, and/or the storage space declared at a larger alignment (I think my code should already force 64-bit alignment, but it may require up to 512-bit alignment; I can't tell). I'll add some ifdefed compiler-dependent code to improve alignment, for whatever that's worth - I know malloc/new won't do alignments above 64 bits on MSVC, regardless of type properties.

    I'll try to make the next version report all unexpected pthreads error codes. That may not actually help, considering that this is dependent upon optimization level and isn't reproducible here, but it's worth a try.

     
  • Jirka

    Jirka - 2014-11-24

    Hi,

    I have installed Fedora 21 Beta with gcc 4.9.2 20141101, and the problem is gone after applying the pthread_detach(thread) patch. I can no longer reproduce the problem even with the older compiler versions, so it seems I made some mistake during testing. It could also be that the problem no longer manifests with a newer pthread library version.

    In any case, I think that reporting unexpected pthreads error codes is a great idea and will definitely help in the future.

    During the testing on Fedora 21 I had one crash, but unfortunately core dumps were disabled and I was not able to reproduce it.

    Have you fixed MultithreadedTestManager.h regarding the memcpy? If so, could you please send me the fixed version?

    The original problem reported here seems to be fixed by the pthread_detach(thread) patch.

    Thank you very much for your help on this!
    Jirka

     
    • - 2014-11-24

      Whew, finally. The first patch was really the only one that had a high chance of fixing things; everything else combined was long shots.

      Yeah, I fixed which buffer it was copying where. The corrected code looks like:

      std::memcpy(
          &buffer[0],
          &alt_buffer[prefix_blocks + main_blocks - num_prefix_blocks],
          PractRand::Tests::TestBlock::SIZE * num_prefix_blocks
      );

      (at approximately line 62 of MultithreadedTestManager.h)

      To check whether you have the correct code enabled, do test runs of the following two command lines:

      RNG_test sfc64 -tlmax 1MB -a -seed 0
      and
      RNG_test sfc64 -tlmax 1MB -a -seed 0 -multithreaded

      They should produce identical output. In particular, look at the p-values for FPF-14+6/16:all and FPF-14+6/16:all2; those are probably the easiest place to spot divergence. I was aware they weren't always producing identical output, but the results had gotten close enough that it wasn't a priority to hunt down the last source or two of minor divergence.


  • - 2014-11-26
    • status: open --> closed-fixed
     

