Menu

#597 LLD built with mingw-w64-based GCC deadlocks

v1.0 (example)
open-accepted
niXman
None
5
2019-03-29
2017-03-20
awson
No

parallel_for_each always deadlocks (and parallel_for sporadically deadlocks) in LLD built with Msys2 GCC 6.2 64-bit.

Knowing that threading implementation in this GCC build is pretty buggy (see e.g. #445 which is still unresolved) I tried GCC based on the alternative threading implementation by lhmouse (20170313 build) and found that this doesn't help. Deadlocks remain.

I'm not sure if this a mingw-w64 or gcc/libstdc++ or even LLD problem.

Unfortunately, we have nothing to compare with on Windows because Visual C++ build of LLD uses completely different code. I believe LLD is much better tested on Linux though and I presume they have no problems with this implementation. This is why I've decided to file the bug here.

Discussion

  • awson

    awson - 2017-03-22

    To further clarify the situation I've modified Parallel.h to make Visual C++ use the same codepath as gcc. Thus build LLD works flawlessly with no deadlocks.

    Now we have: exactly the same code works when built with non-windows gcc, also it works when built with Visual C++ (I've used Visual Studio 2017 compiler), but it doesn't work (deadlocks) when built with mingw-w64 gcc (either from Msys2 distro or lhmouse distro).

     
  • awson

    awson - 2017-03-22

    Ah, sorry! I was horribly wrong regarding lhmouse build! It turned out it works just fine!

    I've mistested it since wrong libwinpthread-1.dll was picked up (that from MSys2 build).

    Then both Visual C++ and lhmouse mingw-w64 threading work just fine, and only vanilla mingw-w64 threading doesn't work.

    I've also isolated failing code from llvm codebase. Attached partest.cpp is completly self-contained.

    Btw, thank you, lhmouse, for your great work!

     
  • niXman

    niXman - 2017-03-22

    Attached partest.cpp is completly self-contained.

    ?

     
  • awson

    awson - 2017-03-22

    Sorry, forgot it. Here it is.

     
    • niXman

      niXman - 2017-03-22

      Using gcc-6.3.0-x86_64 on MSYS2 your code always print 50075028, but not deadlocks.

      Ideas?

       
      • awson

        awson - 2017-03-22

        Use -O2. It should deadlock (not each and every time, but pretty often).

         
        • niXman

          niXman - 2017-03-22

          Compiled using: g++ -O2 partest.cpp -opartest -pthread

          Executed as: for i in seq 0 1000; do ./partest.exe; done

          No deadlocks...

           

          Last edit: niXman 2017-03-22
        • LIU Hao

          LIU Hao - 2017-03-26

          It doesn't seem reproducible using the following command:

          $ g++ partest.cpp -O2 && for i in $(seq 0 1000); do ./a.exe; done
          

          x64 Windows 7, Xeon E3-1230v3, 8 logical cores.

           
          • awson

            awson - 2017-03-26

            It doesn't deadlock when built with your gcc, but deadlocks when built with vanilla MSys2 gcc, see below, @niXman confirmed this.

             
            • LIU Hao

              LIU Hao - 2017-03-26

              Gotcha. LLD doesn't build without patches in MSYS2. Let's move the discussion elsewhere.

               
      • awson

        awson - 2017-03-22

        Hm, it deadlocks for me even if built with no optimization. Did you run it sufficiently many times? What CPU are you on? I've run it on old 4-core Yorkfield, will try 4/8 Sandy.

         
        • niXman

          niXman - 2017-03-22

          it deadlocks for me even if built with no optimization.

          The same result.

          Did you run it sufficiently many times?

          Executed as: for i in seq 0 1000; do ./partest.exe; done

          What CPU are you on?

          VirtualBox, Win7-x86_64, i7-6700K

           
          • awson

            awson - 2017-03-22

            I tried Sandy, vanilla variant deadlocks either with or with no optimization enabled. I have Windows 10 1607 64-bit on both machines.

            How many virtual cores is VM configured with? Anyway, I believe the cuplrit is VM usage.

             
            • niXman

              niXman - 2017-03-22

              How many virtual cores is VM configured with?

              1 =)

              Sorry, I'll retest now...

               
              • niXman

                niXman - 2017-03-22

                I can catch a deadlock!

                I'll try to investigate this...

                 
  • niXman

    niXman - 2017-03-22
    • status: open --> open-accepted
    • assigned_to: niXman
     
  • LIU Hao

    LIU Hao - 2017-03-30

    We had a discussion about this problem on IRC and Kai said the second problem might not get fixed.

    [20:51:36] <lh_mouse> jon_y, you might want to have a look at this article about implementing condition variables on Windows: http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
    [20:51:38] <m2bot> Title: Strategies for Implementing POSIX Condition Variables on Win32 (at www.cs.wustl.edu)
    [20:52:22] <jon_y> lh_mouse: I get the impression Kai wanted winpthreads to be used underneath
    [20:52:30] <jon_y> so I haven't worked on it for quite some time
    [20:52:36] <lh_mouse> meh.
    [20:53:25] <lh_mouse> The 3rd solution in that article has the drawback I mentioned a few hours ago: in pthread_cond_broadcast the calling thread wait for other threads to be woken up, which could result in deadlocks.
    [20:55:39] <lh_mouse> 1. A thread sleeps on a CV and bumps the counter to one. 2. Another thread calls ExitProcess(), terminating the first thread. 3. The second thread calls pthread_cond_broadcast, it finds the counter is one, so it tries waking up one thread and waits for its notification. Since the first thread has already been terminated, a deadlock occurs.
    [20:56:28] <jon_y> bah
    [20:57:16] <jon_y> lh_mouse: rely on tls for clean ups :(
    [20:57:40] <lh_mouse> jon_y, yes that was why LLD deadlocks upon exit.
    [20:58:29] <lh_mouse> LLD people call pthread_cond_broadcast in the destructor of a static object which is invoked inside a DLL entry point function (with dwReason set to DLL_PROCESS_DETACH of course).
    [20:58:39] <lh_mouse> and it deadlocks. :<
    [21:03:23] <ktietz> this is broken by design
    [21:04:01] <lh_mouse> I would agree with you anyway but that is just what LLD looks like now.
    [21:05:42] <ktietz> for windows, if detach process is invoked, anything crt related to this dll (see dllonexit & co), is already executed. So invoking such things in Dll's entry point is pretty much broken
    [21:08:08] <jon_y> make a bookkeeping thread :)
    [21:08:12] <jon_y> more threads!
    [21:08:21] <lh_mouse> destructors of static objects are run upon DLL_PROCESS_DETACH notification... that is how mingw-w64 works at the moment, no?
    [21:09:18] <lh_mouse> jon_y, impossible. ExitProcess() calls NtTerminateProcess(NULL, 0) before it does any other thing, which brutely kills any other thread.
    [21:09:47] <jon_y> oh well
    [21:11:48] <lh_mouse> Here is it https://sourceforge.net/p/mingw-w64/bugs/597/ . This ticket contains two bugs, though.
    [21:11:50] <m2bot> Title: MinGW-w64 - for 32 and 64 bit Windows / Bugs / #597 LLD built with mingw-w64-based GCC deadlocks (at sourceforge.net)
    [21:12:48] <lh_mouse> And here is a stack backtrace that shows how the deadlock happens: https://github.com/lhmouse/mcfgthread/issues/21#issuecomment-289288022
    [21:12:51] <m2bot> Title: Deadlock in exit(0) (LLD) · Issue #21 · lhmouse/mcfgthread · GitHub (at github.com)
    [21:13:17] <lh_mouse> The destructor in LLD (#6) is invoked from crtdll.c (#9).
    [21:14:19] <jon_y> have some bookkeeping that tracks every C11 thread created?
    [21:14:26] <ktietz> well, this is a bug we won't be able to fix. sorry to say that
    [21:14:43] <jon_y> yeah it all involves Cygwin-like tracking
    [21:16:11] <lh_mouse> too sad.

     
  • LIU Hao

    LIU Hao - 2017-04-14

    Even Microsoft's own condition variable suffers from this problem.
    The following code effectively results in a crash on my Windows 7:

    #define _WIN32_WINNT 0x0601
    #include <windows.h>
    
    static SRWLOCK g_srwLock = SRWLOCK_INIT;
    static CONDITION_VARIABLE g_condEvent = CONDITION_VARIABLE_INIT;
    
    static DWORD __stdcall ThreadProc(LPVOID pParam){
        (void)pParam;
        SleepConditionVariableSRW(&g_condEvent, &g_srwLock, INFINITE, 0);
        return 0;
    }
    
    int main(){
        HANDLE hThread;
    
        AcquireSRWLockExclusive(&g_srwLock);
        hThread = CreateThread(0, 0, &ThreadProc, 0, 0, 0);
        if(!hThread){
            __builtin_trap();
        }
        Sleep(1000);
        /* Terminate the thread while it is sleeping on the CV.*/
        TerminateThread(hThread, 0);
        CloseHandle(hThread);
        /* Wait for the thread to be terminated...
           But is this really needed? */
        Sleep(1000);
        /* The following line result in a hang or crash. */
        WakeAllConditionVariable(&g_condEvent);
    
        return 0;
    }
    
    LH_Mouse@LH_Mouse-PC  /e/Desktop
    $ gcc test.c -Wall -Wextra -pedantic -std=c99 -O3
    
    LH_Mouse@LH_Mouse-PC  /e/Desktop
    $ ./a.exe
    Segmentation fault
    
     
  • Andrew Ng

    Andrew Ng - 2019-03-22

    I have run into this issue just recently, so it is still a problem!

    I've done some investigation and it appears to be related to the use of std::condition_variable in "lib/Support/Parallel.cpp" of LLVM. In this file, there is a use of notify_one() which is called without the lock for the mutex used in the associated wait(). This is valid code according to the specification of std::condition_variable.

    However, the default MinGW libstdc++ runtine is built using POSIX threading and thus the std::condition_variable implementation makes use of the MinGW implementation of Winpthread. The notify_one() uses pthread_cond_signal() and it seems that this is where the deadlock is occurring. Whilst investigating, I did stumble across quite an old deadlock bug related to this implementation of pthread condition variable and basically the bug was closed as incorrect usage, i.e. it seems the implementation is only guaranteed to work if the "wait" lock is acquired when pthread_cond_signal() is called. I'm not sure that this is even a requirement of pthread_cond_signal(), but it certainly makes it incompatible with the implementation of std::condition_variable.

    I believe that this can be worked around by just removing the "unlock" before the notify_one() in the LLVM source code. Or perhaps MinGW libstdc++ can be built using native threading, if that is supported. The best fix would be to fix the MinGW pthread implementation.

    Cheers,
    Andrew

     
  • LIU Hao

    LIU Hao - 2019-03-22

    Yes it has been reported here but apparently nobody has been working on it. Sorry for that.

     
  • Andrew Ng

    Andrew Ng - 2019-03-28

    Well as no one has looked into this, I've taken a look and I think I have a fix and some minor improvements too. I think I'll need to test it a bit longer, but so far so good.

    It certainly does not deadlock in the various test cases and mostly fixes LLD. However, there are still LLD deadlocks when running the LLD lit tests and I believe that these are actually related to LLD exit causing deadlock (something else that I've had the misfortune of looking at).

     
  • LIU Hao

    LIU Hao - 2019-03-29

    If you have any ideas please send a mail to mingw-w64-public@lists.sourceforge.net. We together make the software better. Thanks in advance.

     

Log in to post a comment.