The program below shows that gcc reorders floating-point instructions in a way that makes checking for inexact results fruitless.

Reading the generated assembler I see two problems:
1) the cast to float in the assignment to x is executed *after* fetestexcept and not before, as it is written (and as needed to get the correct result). This infringes the C99 sequence point rules.
2) the second division is not recomputed (because of CSE), so the inexact flag is not changed after feclearexcept.

I guess that the latter is due to the missing #pragma STDC FENV_ACCESS implementation, but the former undermines the whole usability of fetestexcept.

$ cat bug.c
#include <fenv.h>
#include <stdio.h>

double vf = 0x0fffffff;
double vg = 0x10000000;
/* vf/vg is exactly representable as IEC559 64 bit floating point,
   while it's not representable exactly as a 32 bit one */

int main() {
  double a = vf;
  double b = vg;

  feclearexcept(FE_INEXACT);
  float x;
  x = a / b;
  printf("%i %.1000g\n", fetestexcept(FE_INEXACT), x);

  feclearexcept(FE_INEXACT);
  double y;
  y = a / b;
  printf("%i %.1000g\n", fetestexcept(FE_INEXACT), y);
  return 0;
}
$ gcc -O2 bug.c -lm
$ ./a.out
0 1
0 0.9999999962747097015380859375
$
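The arithmetic behind the test case can be checked without touching the floating-point flags. The quotient 0x0fffffff / 0x10000000 equals (2^28 - 1) / 2^28, which needs 28 significand bits: that fits in the 53 bits of a 64-bit IEC 559 double but not in the 24 bits of a 32-bit float. The sketch below is only an illustration; `fits_in_float` is a hypothetical helper name, and the target is assumed to use IEC 559 formats.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the bug report): true when v survives
   a round trip through float, i.e. narrowing it loses no bits.
   Assumes IEC 559 binary32/binary64 formats. */
bool fits_in_float(double v)
{
    /* volatile forces the value to be materialised as a real float
       instead of being kept in a wider register */
    volatile float narrowed = (float)v;
    return (double)narrowed == v;
}
```

For the values in bug.c, `fits_in_float(vf / vg)` should be false, which is why the first printf is expected to report FE_INEXACT while the second is not.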
Created attachment 17176
Assembler generated by gcc -S -O2 bug.c
Both are due to the missing #pragma STDC FENV_ACCESS implementation. GCC does not have a way to represent use/def of floating-point status, so the call to fetestexcept is not a barrier for moving floating-point operations. In fact, it will be hard to represent this.
*** Bug 85633 has been marked as a duplicate of this bug. ***
Note that a not too disruptive "implementation" of the dependences would be to add outgoing abnormal edges to the fenv* calls. Not too disruptive in terms of implementation, that is: the effect on code generation might be very noticeable (note that all calls to functions that might themselves call fenv* functions are subject to the same treatment). Of course there's the (existing) issue of RTL expansion not maintaining abnormal edges.

You can experiment with this by declaring the fenv* functions with __attribute__((returns_twice)). Note that without also having incoming abnormal edges this might not be a full barrier for downward motion.
Since any non-const function can examine floating-point state, I'd expect significant effects on code generation. (Whether this also applies to asms depends on the architecture; some architectures have a register name you can use in asm operands to refer to floating-point state, and in those cases asms reading or writing that state "should" say explicitly that they do so, but I don't think all architectures have such a name supported by GCC in asms.)
On Fri, 4 May 2018, joseph at codesourcery dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38960
>
> --- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> Since any non-const function can examine floating-point state, I'd expect
> significant effects on code generation. (Whether this also applies to
> asms depends on the architecture; some architectures have a register name
> you can use in asm operands to refer to floating-point state, and in those
> cases asms reading or writing that state "should" say explicitly that they
> do so, but I don't think all architectures have such a name supported by
> GCC in asms.)

That's true. GCC's job would then be to prove and IPA-propagate knowledge of which functions actually do access FP state.

If it actually works (this still needs to be proven by experiment), it is still the simplest approach for "fixing" the issue. If it works, a first enhancement would be to not re-use returns_twice but to invent a new attribute so we can do more careful abnormal edge creation.

An alternative fix could involve forcing all FP computation results to (addressable, aka aliasable) memory and making FP state accesses also access all (FP?) memory.

Alternatively, all FP ops could be "lowered" to internal functions and thus basically hidden from the optimizers. Dependences on FP state accessors can then be handled as memory dependences. This lowering would be similar to what is proposed for a -ftrapv replacement. The issue then remains on the RTL side though (but maybe we're lucky and re-ordering doesn't happen there, and/or we could expand suitable barriers before and after possible FP state accesses).

Another alternative would be to try to model the FP state explicitly. With the right infrastructure this would allow modeling other CPU state (CC flags) in a similar way.
I think that the force-to-memory variant isn't really worth exploring since it involves a lot of engineering with questionable benefit over the "simple" solution(s).