This problem has been seen with at least gcc 4.8.2 20131212 (Red Hat 4.8.2-7) and gcc 4.8.2 (Ubuntu 4.8.2-16ubuntu4), so I believe it to be an upstream problem. It has only been observed with 32-bit compilations; 64-bit appears unaffected.

Consider this code:

==> get-value.h <==
double get_value (void);

==> get-value.c <==
#include "get-value.h"
#include <stdint.h>

int x = 1;

double get_value (void)
{
  return x / 1e6;
}

==> main.c <==
#include "get-value.h"
#include <stdlib.h>

int main (void)
{
  double a, b;

  a = get_value ();
  b = get_value ();

  if (a != b)
    abort ();

  return 0;
}

Build it with -O2 -m32 and you will get an abort.

The reason is that the return value of get_value() comes back in a floating point register, and these registers have a higher precision than IEEE double. The spec permits "extra range and precision":

"""
8  Except for assignment and cast (which remove all extra range and
   precision), the values of operations with floating operands and
   values subject to the usual arithmetic conversions and of floating
   constants are evaluated to a format whose range and precision may
   be greater than required by the type.
"""

It seems that GCC is failing to remove the extra precision on the assignment "b = get_value ();". Indeed, looking at the code that is output:

  call get_value
  movsd %xmm0, 8(%rsp)
  call get_value
  movsd 8(%rsp), %xmm1
  ucomisd %xmm0, %xmm1

we see that the return value of the first call is stored in memory, but the comparison uses the value from the second call directly, without truncating its precision.

Adding 'volatile' to the local variables involved is an effective workaround for the problem.
Use -fexcess-precision=standard or -std=c99 if you want the slower, but standard-conforming, rounding that gets rid of excess precision.
Why is this violation of the standard treated in a special way? Quoting from gcc's manpage:

  -ffast-math
      Sets -fno-math-errno, -funsafe-math-optimizations,
      -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and
      -fcx-limited-range.

      This option causes the preprocessor macro "__FAST_MATH__" to be
      defined.

      This option is not turned on by any -O option besides -Ofast
      since it can result in incorrect output for programs that depend
      on an exact implementation of IEEE or ISO rules/specifications
      for math functions. It may, however, yield faster code for
      programs that do not require the guarantees of these
      specifications.

It seems that the logic about "since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications" should apply equally here: the default should be to follow the standard, with the 'fast' mode enabled only if the user gives -ffast-math.
(In reply to Ryan Lortie from comment #2)
> Why is this violation of standards treated in a special way?

Because it slows things down far too much. It is much better to just use -msse2 -mfpmath=sse if you really need to build 32-bit programs and have at least an SSE2-capable CPU; the i387 floating point stack has tons of issues.
It seems like a good solution to this problem might be to enable -mfpmath=sse by default on arches where SSE is known to be supported, and -fexcess-precision=standard otherwise. If people want their binaries to be backwards compatible with machines older than the Pentium III, then they can pay the price in performance; at least we would not be violating the standard. This would pair nicely with an appeal to distributions to bring their default -march= flag a bit more up to date...
I think this is another duplicate of pr323.