gcc-4.4.0-mingw does not produce thread-specific storage when the __thread specifier is used.
In the submitted source file, several threads are created, and each one increments a __thread variable. After all threads have terminated, the "master" thread (the original process thread) prints its own copy of the __thread variable to stdout. In the test case that value is non-zero. It should be zero, because the original process thread never runs the increment code.
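The submitted tls.cpp is not reproduced here. The following is a hypothetical sketch of the kind of program described above, not the original test case. The names counter, worker and master_counter_after_workers are illustrative. It also uses C++11 std::thread instead of whatever threading API the original used, so it will not build with gcc 4.4 itself; build it with something like g++ -std=c++11 -pthread.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Each thread should get its own copy of this variable when TLS works.
static __thread int counter = 0;

// Worker: increments the calling thread's copy of `counter`, then records
// the final value it sees. With per-thread storage this equals `iterations`.
static void worker(int iterations, int* result) {
    for (int i = 0; i < iterations; ++i) {
        ++counter;
    }
    *result = counter;
}

// Illustrative reconstruction of the test: starts `nthreads` workers, joins
// them all, and returns the master thread's own copy of `counter`.
// The master never increments it, so correct TLS yields 0.
int master_counter_after_workers(int nthreads, int iterations,
                                 std::vector<int>& per_thread) {
    per_thread.assign(nthreads, 0);  // sized up front so element addresses stay valid
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back(worker, iterations, &per_thread[t]);
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

With working thread-local storage, every worker's recorded count equals iterations and the master's value is 0. If __thread variables share a single address across threads, as reported here, the master's value would instead be expected to come back non-zero.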
-- Details --
Build tls.exe (the test program) with any of the following:
g++ -o tls.exe tls.cpp
g++ -mthreads -o tls.exe tls.cpp
g++ -shared-libgcc -o tls.exe tls.cpp
... etc ...
Compilation generates references to __emutls_get_address, which returns the same memory address for every thread.
Platform: WinXP Pro SP3
FYI: unfortunately, this also means that multi-threaded exception handling is broken (which is how I stumbled onto this).