the last paragraph says:
If a DLL declares any nonlocal data or object as __declspec( thread ), it can cause a protection fault if dynamically loaded. After the DLL is loaded with LoadLibrary, it causes system failure whenever the code references the nonlocal __declspec( thread ) data. Because the global variable space for a thread is allocated at run time, the size of this space is based on a calculation of the requirements of the application plus the requirements of all the DLLs that are statically linked. When you use LoadLibrary, there is no way to extend this space to allow for the thread local variables declared with __declspec( thread ). Use the TLS APIs, such as TlsAlloc, in your DLL to allocate TLS if the DLL might be loaded with LoadLibrary.
This is exactly what happened to my DLLs. My hook DLL is loaded at startup of every executable (via the LoadPerProcess registry entry), and in some cases loading it triggers a segmentation fault (Windows hides this fault, so the user won't notice it). However, the functionality of my DLL will then, obviously, be missing.
I'm not 100% sure the above is the real cause, but it fails in NtLoadLibrary(), in the function that does something with the .tls section. In some cases everything goes well; I found that this happens when the DLL is loaded at its prelinked address (i.e. the "Image Base" address in the PE header). Sometimes the Image Base address is already occupied by another DLL, in which case the loader tries to load the DLL at a different address, and this triggers the bug every time.
Unfortunately, part of the runtime (I think libgcc.a, I have to do more research here) already contains several (four?) TLS variables, so any DLL built by mingw will contain a .tls section. This is a very serious issue for me, as it happened in a product we're going to release soon. I don't even know what to do now; perhaps I will try an older libgcc.a that does not contain thread-local storage variables.
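To check whether a given DLL really carries TLS data, one option is to look at the TLS entry of the PE data directory (index 9) rather than only at section names. The following is a minimal sketch, assuming the whole file has already been read into memory; `pe_has_tls_directory` and its return convention are hypothetical names chosen for illustration, not part of any toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers without assuming alignment or host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: inspect a PE image held in memory.
 * Returns 1 if the TLS data directory entry is populated,
 *         0 if it is absent or empty,
 *        -1 if the buffer does not look like a PE image.
 */
int pe_has_tls_directory(const unsigned char *img, size_t len)
{
    size_t pe, opt, count_off, dirs;
    uint16_t magic;
    uint32_t ndirs;
    const unsigned char *tls;

    /* DOS header: "MZ" signature, e_lfanew at offset 0x3C. */
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    pe = rd32(img + 0x3C);
    if (pe > len || len - pe < 26)
        return -1;
    if (img[pe] != 'P' || img[pe + 1] != 'E' || img[pe + 2] || img[pe + 3])
        return -1;

    /* Optional header follows the 4-byte signature and 20-byte COFF header. */
    opt = pe + 24;
    magic = rd16(img + opt);
    if (magic == 0x10b)
        count_off = 92;      /* PE32: NumberOfRvaAndSizes */
    else if (magic == 0x20b)
        count_off = 108;     /* PE32+: NumberOfRvaAndSizes */
    else
        return -1;
    if (len - opt < count_off + 4)
        return -1;

    ndirs = rd32(img + opt + count_off);
    if (ndirs <= 9)
        return 0;            /* no TLS slot in the directory table */

    dirs = opt + count_off + 4;
    if (len - dirs < 10 * 8)
        return -1;

    /* Entry 9 is the TLS directory: RVA followed by size. */
    tls = img + dirs + 9 * 8;
    return rd32(tls) != 0 && rd32(tls + 4) != 0;
}
```

Running this over a DLL built with and without the suspect libgcc.a would show whether the runtime alone is enough to produce a TLS directory.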
I see two solutions here:
- Avoid using thread-local storage variables in the gcc runtime
- Use TlsAlloc(), as Microsoft recommends