When using parallel OpenMP code, computations with long double
variables are actually done in double
precision. The following example computes the machine epsilon (eps
) for long double
in several threads:
#include <stdio.h> #include <omp.h> int main(){ volatile long double eps[] = {1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0}, one_plus_eps[8]; int j; #pragma omp parallel for default(shared) private(j) for (j=0; j<8; j++) { //__asm__ ("fninit"); do { eps[j] /= 2.0; one_plus_eps[j] = 1.0 + eps[j]; } while (one_plus_eps[j] != 1.0); eps[j] *= 2.0; int id=omp_get_thread_num(); printf ("thread #%d,\teps=%LE\n", id, eps[j]); } return 0; }
Result: (compiled with Thread model: posix
gcc version 7.2.0 (Rev1, Built by MSYS2 project)
)
$ gcc -D__USE_MINGW_ANSI_STDIO=1 -fopenmp ldbl-omp.c -o a.exe && ./a.exe thread #1, eps=2.220446E-016 thread #0, eps=1.084202E-019 thread #2, eps=2.220446E-016 thread #1, eps=2.220446E-016 thread #0, eps=1.084202E-019 thread #3, eps=2.220446E-016 thread #2, eps=2.220446E-016 thread #3, eps=2.220446E-016
The correct value of epsilon for long double
of 1.08E-19 is obtained only in thread 0, all other threads compute the epsilon of double
precision.
The correct result can be obtained by uncommenting the instruction __asm__ ("fninit");
I'm actually investigating this; I discovered a couple bugs in the floating point environment functions that doesn't set the env properly. However I am still setting my build/test environment. I do have a question though to satisfy my curiosity what happens when you compile with -frounding-math -fsignaling-nans paramaters? (I'm not saying the results will be different, just that there may be a difference).
Also, I can't tell if you're compiling x86 or x64 code; there is a huge difference...if x86 code, can you see if there is a difference if you compile with '-mpfmath=sse -msse2" . Note, x64 is irrelevant.
(this will at least let me try and narrow things down)
I compiled x64 code. The flags
-frounding-math -fsignaling-nans
do not make any difference.When I compile x86 code (in the MSYS2 MINGW32 evironment), I get the same results. Also, neither
-mpfmath=sse -msse2
nor-frounding-math -fsignaling-nans
nor using both changes the result for me.Well, the problem is that for new created threads we don't set the floating point environment. Even worse, we let msvcrt do the wrong thing (which resets it to its own settings).
By using winpthread with recent fix, this issue is resolved for all cases you are using pthread API to create the thread, but not in general ...
It might be a way to use our TLS routines in an executable. The problematic part is here that our code is used in DLL, which is used in a VC executable (or the other way around). As in such scenarios the FPU environment setting is for sure always wrong for one side.