Menu

#859 thread_local destructors not called at thread exit

v1.0 (example)
open
nobody
None
5
2020-10-27
2020-10-09
No

I originally posted this to https://github.com/msys2/MINGW-packages/issues/7096.

I stepped through the disassembly, and the tls value that mingw was storing its dtor list in was NULL at the time the tls_callback was called. It appears that emutls is cleaned up before mingw's tls_callback is called for DLL_THREAD_DETACH (3).

Verified on both i686 and x86_64.

Thread 5 hit Breakpoint 2, 0x00406d70 in emutls_destroy ()
(gdb) c
Continuing.

Thread 5 hit Breakpoint 1, tls_callback (hDllHandle=0x400000, dwReason=3,
    lpReserved=0x0)
    at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/tls_atexit.c:89
89      in C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/tls_atexit.c

Test program:

#include <thread>
#include <string>

using namespace std;

thread_local string s("Hello");

int main()
{
        thread([] {
                s;
        }).join();
        return 0;
}

If built with g++ -std=c++11 -static-libgcc, breakpoints can be set on tls_callback and emutls_destroy (and std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() to see that it is never called).

Discussion

  • Jeremy Drake

    Jeremy Drake - 2020-10-20

    I neglected to mention, these destructors are registered with __cxa_thread_atexit, and despite commit messages to the contrary, the version of that from this project is used by gcc.

    I was thinking something along these lines (solves the simple test case), but then I worry that any thread_local that the destructors might reference would already be gone, too.

    diff --git a/mingw-w64-crt/crt/tls_atexit.c b/mingw-w64-crt/crt/tls_atexit.c
    index 1062106d..8651098c 100644
    --- a/mingw-w64-crt/crt/tls_atexit.c
    +++ b/mingw-w64-crt/crt/tls_atexit.c
    @@ -35,7 +35,7 @@ extern char __mingw_module_is_dll;
     static CRITICAL_SECTION lock;
     static int inited = 0;
     static dtor_obj *global_dtors = NULL;
    -static __thread dtor_obj *tls_dtors = NULL;
    +static DWORD tls_dtors_slot = TLS_OUT_OF_INDEXES;
    
     int __mingw_cxa_atexit(dtor_fn dtor, void *obj, void *dso) {
       if (!inited)
    @@ -73,24 +73,28 @@ int __mingw_cxa_thread_atexit(dtor_fn dtor, void *obj, void *dso) {
         return 1;
       handler->dtor = dtor;
       handler->obj = obj;
    -  handler->next = tls_dtors;
    -  tls_dtors = handler;
    +  handler->next = (dtor_obj *)TlsGetValue(tls_dtors_slot);
    +  TlsSetValue(tls_dtors_slot, handler);
       return 0;
     }
    
     static void WINAPI tls_atexit_callback(HANDLE __UNUSED_PARAM(hDllHandle), DWORD dwReason, LPVOID __UNUSED_PARAM(lpReserved)) {
       if (dwReason == DLL_PROCESS_DETACH) {
    -    run_dtor_list(&tls_dtors);
    +    dtor_obj * p = (dtor_obj *)TlsGetValue(tls_dtors_slot);
    +    run_dtor_list(&p);
    +    TlsFree(tls_dtors_slot);
         run_dtor_list(&global_dtors);
       }
     }
    
     static void WINAPI tls_callback(HANDLE hDllHandle, DWORD dwReason, LPVOID __UNUSED_PARAM(lpReserved)) {
    +  dtor_obj * p;
       switch (dwReason) {
       case DLL_PROCESS_ATTACH:
         if (inited == 0) {
           InitializeCriticalSection(&lock);
           __dso_handle = hDllHandle;
    +      tls_dtors_slot = TlsAlloc();
           /*
            * We can only call _register_thread_local_exe_atexit_callback once
            * in a process; if we call it a second time the process terminates.
    @@ -128,12 +132,15 @@ static void WINAPI tls_callback(HANDLE hDllHandle, DWORD dwReason, LPVOID __UNUS
          * variables' destructors don't run. (This matches what MSVC does with
          * a dynamically linked CRT.)
          */
    -    run_dtor_list(&tls_dtors);
    +    p = (dtor_obj *)TlsGetValue(tls_dtors_slot);
    +    run_dtor_list(&p);
    +    TlsSetValue(tls_dtors_slot, NULL);
         if (__mingw_module_is_dll) {
           /* For DLLs, run dtors when detached. For EXEs, run dtors via the
            * thread local atexit callback, to make sure they don't run when
            * exiting the process with _exit or ExitProcess. */
           run_dtor_list(&global_dtors);
    +      TlsFree(tls_dtors_slot);
         }
         if (inited == 1) {
           inited = 0;
    @@ -143,7 +150,9 @@ static void WINAPI tls_callback(HANDLE hDllHandle, DWORD dwReason, LPVOID __UNUS
       case DLL_THREAD_ATTACH:
         break;
       case DLL_THREAD_DETACH:
    -    run_dtor_list(&tls_dtors);
    +    p = (dtor_obj *)TlsGetValue(tls_dtors_slot);
    +    run_dtor_list(&p);
    +    TlsSetValue(tls_dtors_slot, NULL);
         break;
       }
     }
    
     
  • Martin Storsjö

    Martin Storsjö - 2020-10-27

    Hi, sorry for the incovenience you've run into, and thanks for your patch!

    FWIW, to simplify testing, I've used a test program that looks like this, to avoid needing to use a debugger to see whether the destructor runs:

    #include <windows.h>
    #include <thread>
    #include <string>
    
    class Hello {
    public:
        Hello(const char* s) {
            str = s;
            thread = GetCurrentThreadId();
            fprintf(stderr, "%s ctor on thread %d\n", str, thread);
        }
        ~Hello() {
            fprintf(stderr, "%s dtor from thread %d, now on %d\n", str, thread, (int) GetCurrentThreadId());
        }
        const char* str;
        int thread;
    };
    
    thread_local Hello tls_hello("tls");
    
    int main(int argc, char* argv[]) {
        Hello local_hello("main local");
        std::thread([] {
            tls_hello;
        }).join();
        return 0;
    }
    

    Also, regarding this:

    despite commit messages to the contrary, the version of that from this project is used by gcc

    I looked closer, and in my test setups (building a cross compiler on linux) GCC/libstdc++ still does provide its own __cxa_thread_atexit, so it seems to still work fine there. The reason seems to be this: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/configure.ac#L261-L277

    When cross compiling, it doesn't try to detect the presence of such functions, and always provides its own, but if (re)building GCC in a mingw environment, it will try to detect whether those functions exist.

    I was aware that the tls_atexit.c code doesn't work in combination with emutls, but didn't see this as much of an issue, as GCC would use the one from libstdc++ in my (limited) tests. Your patch looks good - do you want to send it to the mingw-w64-public list so it can be acked there and merged? I can also send it for you if you say in which form you prefer the git author line to be for it.

    You're right that there might be a risk that the destructors reference thread_local variables that already are gone, when using emutls though - but this patch in itself is at least a good thing to have in any case. But maybe it'd be good with an option to libstdc++ to force it to provide its own __cxa_thread_atexit?

     
  • Jeremy Drake

    Jeremy Drake - 2020-10-27

    I think I've managed to get a (slightly revised) patch to the mailing list.

     

Log in to post a comment.