I have encountered an issue with a certain piece of code when using the "Fedora cross-compiler" (EPEL version). The Fedora team say they are unable to follow up on detailed bug reports, so I'm trying here.
The problem is that variables updated in the body of a loop get unexpected values in some special circumstances. I only see this when building a 32-bit version, and only when -O2 is passed. A small test program that reproduces the issue is attached.
Steps to reproduce (with the Linux cross-compiler):
1. On Linux,
i686-w64-mingw32-g++ -mms-bitfields -O2 -I/usr/i686-w64-mingw32/sys-root/mingw/include -L/usr/i686-w64-mingw32/sys-root/mingw/lib testLoop.cpp -o testLoop.exe
2. Copy testLoop.exe and the necessary MinGW libs to a Windows system (I assume you know how.)
3. On Windows,
cd where-you-put-the-files
testLoop.exe
Actual output:
0 0 0 0
0.344262 0 0 0
0.688525 0 0 0
1.03279 1 1 1
1.37705 1 1 1
1.72131 1 1 1
2.06557 2 2 2
2.40984 2 2 2
2.7541 2 2 2
3.09836 3 3 3
3.44262 3 3 3
3.78689 3 3 3
4.13115 4 4 4
4.47541 4 4 4
4.81967 4 4 4
5.16393 5 5 5
5.5082 5 5 5
5.85246 5 5 5
6.19672 6 6 6
6.54098 6 6 6
6.88525 6 6 6
7.22951 7 7 7
7.57377 7 7 7
7.91803 7 7 7
8.2623 8 8 8
8.60656 8 8 8
8.95082 8 8 8
9.29508 9 9 9
9.63934 9 9 9
9.98361 9 9 9
10.3279 10 10 10
10.6721 10 10 10
11.0164 11 11 11
11.3607 11 11 11
11.7049 11 11 11
12.0492 12 12 12
12.3934 12 12 12
12.7377 12 12 12
13.082 13 13 13
13.4262 13 13 13
13.7705 13 13 13
14.1148 14 14 14
14.459 14 14 14
14.8033 14 14 14
15.1475 15 15 15
15.4918 15 15 15
15.8361 15 15 15
16.1803 16 16 16
16.5246 16 16 16
16.8689 16 16 16
17.2131 17 17 17
17.5574 17 17 17
17.9016 17 17 17
18.2459 18 18 18
18.5902 18 18 18
18.9344 18 18 18
19.2787 19 19 19
19.623 19 19 19
19.9672 19 19 19
20.3115 20 20 20
20.6557 20 20 20
SHOULD NOT BE HERE
21 20 21 21
Expected output:
0 0 0 0
0.344262 0 0 0
0.688525 0 0 0
1.03279 1 1 1
1.37705 1 1 1
1.72131 1 1 1
2.06557 2 2 2
2.40984 2 2 2
2.7541 2 2 2
3.09836 3 3 3
3.44262 3 3 3
3.78689 3 3 3
4.13115 4 4 4
4.47541 4 4 4
4.81967 4 4 4
5.16393 5 5 5
5.5082 5 5 5
5.85246 5 5 5
6.19672 6 6 6
6.54098 6 6 6
6.88525 6 6 6
7.22951 7 7 7
7.57377 7 7 7
7.91803 7 7 7
8.2623 8 8 8
8.60656 8 8 8
8.95082 8 8 8
9.29508 9 9 9
9.63934 9 9 9
9.98361 9 9 9
10.3279 10 10 10
10.6721 10 10 10
11.0164 11 11 11
11.3607 11 11 11
11.7049 11 11 11
12.0492 12 12 12
12.3934 12 12 12
12.7377 12 12 12
13.082 13 13 13
13.4262 13 13 13
13.7705 13 13 13
14.1148 14 14 14
14.459 14 14 14
14.8033 14 14 14
15.1475 15 15 15
15.4918 15 15 15
15.8361 15 15 15
16.1803 16 16 16
16.5246 16 16 16
16.8689 16 16 16
17.2131 17 17 17
17.5574 17 17 17
17.9016 17 17 17
18.2459 18 18 18
18.5902 18 18 18
18.9344 18 18 18
19.2787 19 19 19
19.623 19 19 19
19.9672 19 19 19
20.3115 20 20 20
20.6557 20 20 20
As indicated above, I get the expected results when using the 64-bit compiler (x86_64-w64-mingw32-g++).
There is, of course, also a chance that the same issue will occur with a "native" compiler under Windows. I'd appreciate it if someone would test this.
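Before comparing toolchains, it may help to check whether a build evaluates floating-point expressions with x87-style excess precision. A minimal sketch, assuming a C++11 compiler (the function names are hypothetical): FLT_EVAL_METHOD is the standard macro from <cfloat>. A default 32-bit x87 GCC build typically reports 2, while x86-64 builds, or 32-bit builds using -msse2 -mfpmath=sse, typically report 0.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// FLT_EVAL_METHOD (from <cfloat>) states how this build evaluates
// floating-point operations.
int floatEvalMethod() {
    return FLT_EVAL_METHOD;
}

// Human-readable meaning of the standard FLT_EVAL_METHOD values.
std::string describeFloatEvalMethod(int method) {
    switch (method) {
        case 0:  return "each operation in the precision of its type";
        case 1:  return "float and double operations as double";
        case 2:  return "all operations as long double (excess precision)";
        case -1: return "indeterminable";
        default: return "implementation-defined";
    }
}
```

Printing floatEvalMethod() alongside the test output would show which evaluation mode each toolchain used.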
You're running into a floating-point precision issue caused by the fact that on x86, the x87 FPU performs its calculations internally with 80-bit precision, while a double is stored in 64 bits. This really isn't a bug; it stems from a misunderstanding of floating point, and it shows why you should never use doubles as loop counters.
step = 0.3442622950819672067
last pos = 20.65573770491803174
add the two and you get 20.9999999999999989467
This causes index to be 20, because the cast to size_t is applied directly to the longer 80-bit value still held in the FPU register, instead of to the value rounded to a 64-bit double. Optimization during code generation causes this, because some loads and stores (which would round the value to 64 bits) are omitted. You don't see this on x64 because on that processor the 387 FPU instructions are not used; SSE2 is used instead. Pass -msse2 and -mfpmath=sse to the x86 compiler and the issue "goes away".
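The rounding effect described above can be reproduced without x87 code by comparing a sum rounded to double with the same sum held in long double. This is a sketch under stated assumptions: the operands are hand-picked (not the step and pos values quoted above) so that their exact sum, 21 - 2^-50, sits just below 21 but rounds up to exactly 21.0 as a double; the function names are hypothetical; and the long double path only imitates the 80-bit behaviour on platforms where long double is wider than double (for example GCC on x86-64 Linux, but not MSVC, where long double is the same as double).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hand-picked operands (NOT the step/pos values from the thread): their exact
// sum is 21 - 2^-50, just below 21 but closer to 21.0 than to the next lower
// double (21 - 2^-48).
const double kPos  = 20.5;
const double kStep = 0.5 - std::ldexp(1.0, -50);  // exactly representable

// Sum rounded to a 64-bit double before truncation. With SSE math (x86-64, or
// -msse2 -mfpmath=sse) the sum rounds to exactly 21.0, so this returns 21.
// In x87 code the compiler may keep the sum in an 80-bit register instead.
std::size_t indexViaDouble(double pos, double step) {
    double sum = pos + step;
    return static_cast<std::size_t>(sum);
}

// Sum kept in long double before truncation, imitating an x87 register value
// that was never stored to memory. If long double is wider than double, the
// exact sum survives and truncation toward zero gives 20.
std::size_t indexViaLongDouble(double pos, double step) {
    long double sum = static_cast<long double>(pos) + static_cast<long double>(step);
    return static_cast<std::size_t>(sum);
}

// True where long double has more significand bits than double
// (e.g. 64 on GCC x86 targets, 53 on MSVC).
bool longDoubleIsWider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

Where long double is wider, indexViaDouble(kPos, kStep) gives 21 and indexViaLongDouble(kPos, kStep) gives 20: the same 21-versus-20 split that appears in the last line of the actual output.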
Better: do not use floating-point values as loop control parameters.
For more information, see https://gcc.gnu.org/wiki/FloatingPointMath#x86note
or https://gcc.gnu.org/wiki/x87note
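Following the advice to keep floating point out of loop control, a loop of the same shape as the test output (61 positions spaced 21/61 apart, which matches the printed values, each with the integer cell it falls in) can be driven by an integer counter. This is a sketch: Sample and samplePositions are hypothetical names, and the cell index is computed with exact integer arithmetic, floor(i * span / count), instead of by truncating an accumulated double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: a display position and the integer cell it falls in.
struct Sample {
    double pos;
    std::size_t index;
};

// Visit `count` evenly spaced positions in [0, span). The loop is controlled
// by the integer i, so the number of iterations never depends on rounding.
std::vector<Sample> samplePositions(std::size_t count, std::size_t span) {
    assert(count > 0);
    std::vector<Sample> out;
    out.reserve(count);
    const double step = static_cast<double>(span) / static_cast<double>(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Position derived from the counter: no error accumulates across iterations.
        const double pos = static_cast<double>(i) * step;
        // Exact integer floor(i * span / count); no float-to-integer truncation.
        const std::size_t index = (i * span) / count;
        out.push_back({pos, index});
    }
    return out;
}
```

With this shape, samplePositions(61, 21) always produces exactly 61 samples, the last with index 20, whatever precision the FPU uses for pos.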
That's a nice explanation of what's going on, and I must admit I didn't think about coprocessors etc. I don't necessarily agree with your conclusion, though. I do realise that with floating point you often get results that are not quite what you expected when you accumulate values or do limit checks using direct comparisons. (The "real code" I was working on doesn't do this quite as naively as you might think; what's attached is a much simplified example.) That's not the main problem here, though. The real issue is that we do the same operation twice on the same variable, without intervening changes, and get different results. I can't see how that's not a bug.
Yes, you have outlined a plausible cause for what happens, but it seems to me that you have described the cause of a bug, not acceptable behaviour.
Compiler optimizations differ between SSE2 and x87 code, and between the -O0 and -O2 levels. I gave you the cause: it is a rounding error introduced by the floating-point coprocessor in digits beyond the precision of an IEEE double, probably due to floating-point constants and loop optimizations. There is no bug, except in the code, which uses a loop variant that does not increase monotonically because of the floating-point rounding error. On top of that, the optimizer has optimized out certain operations and moved some values to 64-bit memory locations (thus losing the 80-bit precision).
If you want to turn this off, use the -ffloat-store compiler option, which stops GCC from keeping floating-point variables in FPU registers and stores them to memory instead, so their values are rounded to 64 bits.
This is a misunderstanding of optimization and of how floating point works, and a "bug" in your code. GCC is allowed to reduce an 80-bit intermediate down to a 64-bit IEEE double whenever it wants to, but these things only happen when optimizing (through folding, hoisting, and precomputation). Floating point is slow, GCC's optimizations take that into account, and during the optimization phase it chooses different instruction sequences for the math.
You are using a floating-point value as the basis for your loop control parameter (through the truncation when converting it to index), and that value has been optimized to a different precision for the comparison at the beginning of the loop.
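The effect of -ffloat-store can also be approximated for a single value in source code by forcing it through a volatile double. This is a sketch under that assumption (truncateAsDouble is a hypothetical name): it only rounds the one expression it wraps, relies on GCC actually storing volatile objects to memory, and is a workaround for the precision mismatch rather than a replacement for integer loop control.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mimicking, for one value, what -ffloat-store does for
// floating-point variables: the sum is written to a 64-bit memory slot (the
// volatile forbids keeping it only in a register), which rounds it to a
// double before the truncating conversion. In SSE code this is just an
// ordinary double addition.
std::size_t truncateAsDouble(double pos, double step) {
    volatile double rounded = pos + step;
    return static_cast<std::size_t>(rounded);
}
```

Called with operands whose exact sum is just below 21 but rounds to 21.0 as a double, such as 20.5 and 0.5 - 2^-50, it returns 21 whether or not the surrounding code uses x87 instructions.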
Well, I'm sorry you do not agree that this is a bug. I have actually verified that native GCC targeting x86 Linux behaves exactly the same. If you really feel this is a bug, submit it to the GCC folks, but they are going to tell you exactly what I told you: use the -ffloat-store compiler option.