thread_local objects are supposed to have their destructors called when the thread they're bound to exits (assuming the objects were constructed in the first place, of course).
With the Win32 threading model, this works correctly. With the POSIX threading model, however, the destructor is called, but it appears that the object's memory has already been freed by the time the destructor runs, so all data members are corrupt at that point. (When running under gdb, you can see that all the data members are set to 0xfeeefeee, the pattern the Windows debug heap writes into freed memory, when the destructor is called.)
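One way to check for this without a debugger is to have the destructor verify sentinel members itself. The following is only a minimal sketch under my own assumptions: `Guarded`, the two counters and `sentinels_intact_at_thread_exit` are hypothetical names that are not part of the program in this post, and reading freed memory is undefined behaviour, so a corrupted run may show up in other ways than a clean "corrupt" report.

```cpp
#include <atomic>
#include <cassert>   // for callers that assert on the result
#include <cstdio>
#include <thread>

// Hypothetical counters used only by this sketch.
static std::atomic<int> intact_dtors(0);
static std::atomic<int> corrupt_dtors(0);

struct Guarded {
    Guarded() : magic0(0xdeadbeefu), magic1(0xcafecafeu) {}

    ~Guarded() {
        // If the storage was released or overwritten before the destructor
        // runs, the sentinels no longer hold the constructor's values.
        if (magic0 == 0xdeadbeefu && magic1 == 0xcafecafeu) {
            intact_dtors.fetch_add(1, std::memory_order_relaxed);
        } else {
            corrupt_dtors.fetch_add(1, std::memory_order_relaxed);
            std::printf("corrupt sentinels: %08x %08x\n", magic0, magic1);
        }
    }

    unsigned magic0;
    unsigned magic1;
};

// Constructs a thread_local Guarded on a fresh thread, waits for the thread
// to exit, and reports whether exactly one destructor ran and saw intact
// sentinels.
bool sentinels_intact_at_thread_exit() {
    const int intact_before = intact_dtors.load();
    const int corrupt_before = corrupt_dtors.load();

    std::thread t([] {
        static thread_local Guarded g;  // constructed on first pass
        (void)g;
    });
    t.join();  // thread_local destructors have run by the time join() returns

    return intact_dtors.load() == intact_before + 1 &&
           corrupt_dtors.load() == corrupt_before;
}
```

Calling `assert(sentinels_intact_at_thread_exit());` from `main` should pass on a conforming implementation. With the behaviour described above, I would expect it to fail or to print the corrupt sentinel values instead.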
Additionally, with the POSIX model, calling a function that contains a static thread_local variable from a thread_local object's destructor causes the variable to be re-created (and then destroyed again afterwards). This is particularly nasty when the destructor calls a function that should yield a reference to its own object: the result is a destruct-construct-destruct-... loop (fortunately not infinite; there seems to be a limit of 256 such cycles).
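The re-creation can be detected by counting constructions. This is a reduced sketch, not the reproduction program itself: `Inner`, `Outer`, `inner()` and `inner_constructions_per_thread` are hypothetical names I made up for illustration. `Outer` touches `inner()` in its constructor, so `Inner` finishes construction first and should outlive `Outer`; the call in `~Outer` should therefore return the existing object.

```cpp
#include <atomic>
#include <cassert>   // for callers that assert on the result
#include <thread>

// Hypothetical counters used only by this sketch.
static std::atomic<int> inner_ctors(0);
static std::atomic<int> inner_dtors(0);

struct Inner {
    Inner()  { inner_ctors.fetch_add(1, std::memory_order_relaxed); }
    ~Inner() { inner_dtors.fetch_add(1, std::memory_order_relaxed); }
};

// Function-local thread_local, as in the pattern described above.
static Inner& inner() {
    static thread_local Inner obj;
    return obj;
}

struct Outer {
    // Inner is fully constructed before Outer's constructor completes, so
    // Inner should be destroyed after Outer at thread exit.
    Outer()  { inner(); }
    // Should find the Inner that already exists, not construct a new one.
    ~Outer() { inner(); }
};

// Runs one thread that creates a thread_local Outer and returns how many
// Inner objects were constructed during that thread's lifetime.
int inner_constructions_per_thread() {
    const int before = inner_ctors.load();

    std::thread t([] {
        static thread_local Outer o;
        (void)o;
    });
    t.join();

    return inner_ctors.load() - before;
}
```

On a conforming implementation `inner_constructions_per_thread()` should return 1, with a matching single destruction. A value greater than 1 would indicate the construct-after-destruct behaviour described above.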
I've observed this behaviour with both x64-4.8.1-posix-seh-rev5 and x86_64-4.9.2-posix-seh-rt_v3-rev0 (the latest as of this writing). My OS is Windows 7 64-bit. I've tried both std::thread (which should definitely work) and the plain Win32 threading APIs (which I accept may not work with the POSIX model), just in case.
Below is a sample program that can reproduce the problem:
    #include <cstdio>
    #include <atomic>
    #include <thread>

    static std::atomic<int> tlocalid(0);

    class ThreadLocalTest {
    private:
        struct ThreadLocalTestData {
            ThreadLocalTestData()
                : magic0(0xdeadbeef),
                  id(tlocalid.fetch_add(1, std::memory_order_relaxed)),
                  magic1(0xcafecafe)
            {
                std::printf("d-ctor %i\n", id);
            }

            ~ThreadLocalTestData() { std::printf("d-dtor %i\n", id); }

            const int magic0;  // sentinel before id
            int id;
            const int magic1;  // sentinel after id
        };

        static ThreadLocalTestData& data() {
            static thread_local ThreadLocalTestData data;
            return data;
        }

        ThreadLocalTest(ThreadLocalTestData& d) { std::printf("t-ctor %i\n", d.id); }

        ~ThreadLocalTest() {
            // data() was constructed first, so it should still be alive here.
            auto& d = data();
            std::printf("t-dtor %i\n", d.id);
        }

    public:
        static ThreadLocalTest& instance() {
            auto& d = data();
            static thread_local ThreadLocalTest test(d);
            return test;
        }
    };

    int main() {
        std::printf("before\n");
        std::thread t([]() { ThreadLocalTest::instance(); });
        t.join();
        std::printf("after\n");
        return 0;
    }
The expected output (as produced with the Win32 threading model with a swap of std::thread for the Win32 threading API, or when run under Linux with the same version of g++) is:
    before
    d-ctor 0
    t-ctor 0
    t-dtor 0
    d-dtor 0
    after
The actual output (as received when running under the POSIX threading model) is:
    before
    d-ctor 0
    t-ctor 0
    d-ctor 1
    t-dtor 1
    d-dtor 0
    d-dtor 0
    after
Note that the actual output shown is just a sample; it can vary between runs, since it depends on the contents of memory that has already been freed.
Confirmed with gcc 5.1.0 & mingw/winpthread 4.0.2