Description
merkert
2008-01-04 17:48:20 UTC
Sorry, -frounding-math does not help for this case - it assumes the _same_ rounding mode is in effect everywhere, but doesn't assume it is round-to-nearest.

Ok, so how then would one accomplish this in std c without resorting to asm? I still assume the original code is correct even though the rounding-math doesn't do what I wanted. At any rate, I played a little with it and there was a hint in the asm manual how to do it. This seems to work for me, but I'm not sure I'm using the constraints as efficiently as possible:

#include <fenv.h>
inline void reload(double* x) { asm volatile ("" : "=m"(x) ); }
void xdiv (double x, double y, double* lo, double* hi)
{
#pragma STDC FENV_ACCESS ON
  fesetround(FE_DOWNWARD);
  *lo = x/y;
  reload(&y);
  fesetround(FE_UPWARD);
  *hi = x/y;
}

Subject: Re: Optimization generates incorrect code
with -frounding-math option (#pragma STDC FENV_ACCESS not implemented)
On Sat, 5 Jan 2008, rguenth at gcc dot gnu dot org wrote:
> Sorry, -frounding-math does not help for this case - it assumes the _same_
> rounding mode is in effect everywhere, but doesn't assume it is
> round-to-nearest.
To be clear: this is a bug in the present implementation of
-frounding-math - it *should* disable all the assumptions of same rounding
mode (unless it can prove that no function changing the rounding mode is
called between the two places where it assumes the same mode), but, as
documented in the manual, it's not yet fully implemented.
# This option is experimental and does not currently guarantee to
# disable all GCC optimizations that are affected by rounding mode.
I wouldn't read the language this way. Because that will forcefully disable all redundancy removing optimizations (which is what happens in this testcase). What it currently guards is expression rewriting that changes the outcome if a rounding mode different than round-to-nearest is used. The finer-grained control the documentation mentions should not be globbed to -frounding-math IMHO.

Subject: Re: Optimization generates incorrect code
with -frounding-math option (#pragma STDC FENV_ACCESS not implemented)
On Sat, 5 Jan 2008, rguenth at gcc dot gnu dot org wrote:
> I wouldn't read the language this way. Because that will forcefully disable
> all redundancy removing optimizations (which is what happens in this testcase).
> What it currently guards is expression rewriting that changes the outcome if
> a rounding mode different than round-to-nearest is used.
My understanding has always been that -frounding-math should be usable for
the code that calls rounding-mode-changing functions, rather than having
no way to compile that code safely with GCC, as well as the code that does
not call those functions but may execute with non-default rounding modes.
The FENV_ACCESS pragma does not distinguish between the two.
> The finer-grained control the documentation mentions should not be globbed
> to -frounding-math IMHO.
The pragma would in effect set -frounding-math for particular regions of
code; it isn't more fine-grained regarding whether the code sets the mode
or merely runs under a different mode. It is of course possible that
-frounding-math should be split into multiple options (more fine-grained
than the pragma) as the other related flags have been split over time.

I see. So basically we need to split all floating point operators into two variants, one specifying a default rounding mode is used and one specifying the rounding mode is unknown. I suppose the frontend parts would be actually quite simple then?

Subject: Re: Optimization generates incorrect code
with -frounding-math option (#pragma STDC FENV_ACCESS not implemented)
On Sun, 6 Jan 2008, rguenth at gcc dot gnu dot org wrote:
> I see. So basically we need to split all floating point operators into two
> variants, one specifying a default rounding mode is used and one specifying the
> rounding mode is unknown. I suppose the frontend parts would be actually quite
> simple then?
More than two variants, in the end, depending on how you handle all the
other flags - but eventually, everything about GIMPLE semantics controlled
by the global flags should be directly represented in the GIMPLE and the
pragmas, together with the command-line options determining global
defaults, would map to appropriate choices of operations / flags on those
operations. This is desirable for LTO optimizing between objects compiled
with different options as well.
*** Bug 37838 has been marked as a duplicate of this bug. ***

What was said in bug 37838 but not here is that -frounding-math sometimes fixes the problem. So, I was suggesting that -frounding-math should be enabled by default.

The default of -fno-rounding-math is chosen with the reason that this is what a compiler can assume if #pragma STDC FENV_ACCESS is not turned on. What you may request instead is an implementation of FENV_ACCESS up to that point that we issue a fatal error whenever you try to turn it on. Or at least a warning by default.

(In reply to comment #10)
> The default of -fno-rounding-math is chosen with the reason that this is what
> a compiler can assume if #pragma STDC FENV_ACCESS is not turned on.

The C standard doesn't require a compiler to recognize the FENV_ACCESS pragma, but if the compiler does not recognize it, then it must assume that this pragma is ON (otherwise the generated code can be incorrect). That's why I suggested that -frounding-math should be the default.

Turning -frounding-math on by default would be a disservice to (most of) our users which is why the decision was made (long ago) to not enable this by default.

Would it make sense to have a function attribute to indicate that rounding mode was changed as a side effect? This way, one could keep the default rounding behavior and not incur a performance penalty, but at the same time setroundingmode would work as expected.

(In reply to comment #12)
> Turning -frounding-math on by default would be a disservice to (most of) our
> users which is why the decision was made (long ago) to not enable this by
> default.

The compiler should generate correct code by default, and options like -funsafe-math-optimizations are there to allow the users to run the compiler in a non-conforming mode. So, it would be wise to have -frounding-math by default and add -fno-rounding-math to the options enabled by -funsafe-math-optimizations.
Subject: Re: Optimization generates incorrect code
with -frounding-math option (#pragma STDC FENV_ACCESS not implemented)
On Thu, 16 Oct 2008, vincent at vinc17 dot org wrote:
> The compiler should generate correct code by default, and options like
Sure, it generates correct code for the supported features, and
<http://gcc.gnu.org/c99status.html>, which is linked from the manual,
documents the state of support for the features: standard pragmas are
documented as Missing and IEC 60559 support as Broken, with the
explanation:

* IEC 60559 is IEEE 754 floating point. This works if and only if the
  hardware is perfectly compliant, but GCC does not define
  __STDC_IEC_559__ or implement the associated standard pragmas; nor do
  some options such as -frounding-math to enable the pragmas globally
  work in all cases (for example, required exceptions may not be
  generated) and contracting expressions (e.g., using fused
  multiply-add) is not restricted to source-language expressions as
  required by C99.

So you are using features documented not to be fully implemented and
whether they will work is unpredictable.

I was suggesting to improve the behavior by having -frounding-math by default (at least when the user compiles with -std=c99 -- if he does this, then this means that he shows some interest in a conforming implementation). This is not perfect, but would be better than the current behavior.

*** Bug 46180 has been marked as a duplicate of this bug. ***

*** Bug 47617 has been marked as a duplicate of this bug. ***

*** Bug 48295 has been marked as a duplicate of this bug. ***

Richard Biener's approach to the default is the one that matches the C standard and Vincent Lefèvre is mistaken. C11 7.6.1p2 says:

"... If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined. The default state (‘‘on’’ or ‘‘off’’) for the pragma is implementation-defined."

Defining it to be 'off' and not setting __STDC_IEC_559__ is very reasonable. Because generated code and the library are potentially dependent on the rounding mode (including even floating point to integer conversion!), the default should remain that rounding mode support is off until each target has been thoroughly checked that it does NOT break.

There are also very strong grounds for not wanting IEEE 754 support by default, anyway, because of the performance impact and because a lot of programs won't reset the state before calling external functions (and hence may well give wrong answers). That is especially true if the code is used within a C++ program or uses GPUs or some SIMD units - let alone OpenMP :-(

(In reply to Nick Maclaren from comment #20)
> Richard Biener's approach to the default is the one that matches the C
> standard and Vincent Lefèvre is mistaken.

No, I'm correct.

> Defining it to be 'off' and not setting __STDC_IEC_559__ is very reasonable.

No, this is really stupid. If the user *decides* to set the STDC FENV_ACCESS pragma to "on", then the compiler must not assume that it is "off" (this bug is not about the default state). At least it must behave in this way if -std=c99 (or c11) has been used. Otherwise a compilation failure may be better than getting wrong results.

On Mon, 11 Nov 2013, nmm1 at cam dot ac.uk wrote:
> There are also very strong grounds for not wanting IEEE 754 support by
> default, anyway, because of the performance impact and because a lot of
> programs won't reset the state before calling external functions (and
> hence may well give wrong answers). That is especially true if the code
> is used within a C++ program or uses GPUs or some SIMD units - let alone
> OpenMP :-(
Note also that the documented default is -ftrapping-math
-fno-rounding-math. I suspect that if -ftrapping-math actually
implemented everything required for the floating-point exceptions aspects
of FENV_ACCESS, it would be just as bad for optimization as
-frounding-math - it would disallow constant-folding inexact
floating-point expressions because that would eliminate the side effect of
raising the "inexact" exception, for example (just as -frounding-math does
disable such constant folding, although not in all cases it should,
because the result depends on the rounding mode), and would mean a value
computed before a function call can't be reused for the same computation
after that call because the computation might raise exceptions that the
function call could have cleared (just as -frounding-math should prevent
such reuse because the call might change the rounding mode).
So a key part of actually making rounding modes and exceptions work
reliably would be working out a definition of GCC's default mode that
allows more or less the same optimizations as at present, while allowing
users wanting the full support (and consequent optimization cost) to
specify the appropriate command-line options or FENV_ACCESS pragma to
enable it.
(In reply to Vincent Lefèvre from comment #21)
>
> > Richard Biener's approach to the default is the one that matches the C
> > standard and Vincent Lefèvre is mistaken.
>
> No, I'm correct.
>
> > Defining it to be 'off' and not setting __STDC_IEC_559__ is very reasonable.
>
> No, this is really stupid. If the user *decides* to set the STDC FENV_ACCESS
> pragma to "on", then the compiler must not assume that it is "off" (this bug
> is not about the default state). At least it must behave in this way if
> -std=c99 (or c11) has been used. Otherwise a compilation failure may be
> better than getting wrong results.

If __STDC_IEC_559__ is unset or does not have the value 1, setting STDC FENV_ACCESS to "on" is undefined behaviour (see 6.10.8.3, 7.6 and Annex F), unless the implementation explicitly chooses to extend the language to support it. So the user would get what he so richly deserves.

(In reply to joseph@codesourcery.com from comment #22)
>
> So a key part of actually making rounding modes and exceptions work
> reliably would be working out a definition of GCC's default mode that
> allows more or less the same optimizations as at present, while allowing
> users wanting the full support (and consequent optimization cost) to
> specify the appropriate command-line options or FENV_ACCESS pragma to
> enable it.

Yes. That won't deal with the correctness problems of introducing IEEE 754 support into code not set up to handle it, especially C++, of course. I tried to get WG21 to take a stand on that issue, but failed :-( Working out what on earth to do in such a case is likely to be a far fouler task than merely dealing with the performance problems :-(

(In reply to Nick Maclaren from comment #23)
> If __STDC_IEC_559__ is unset or does not have the value 1, setting
> STDC FENV_ACCESS to "on" is undefined behaviour (see 6.10.8.3, 7.6 and
> Annex F), unless the implementation explicitly chooses to extend the
> language to support it.

You're wrong. The C standard doesn't say that.

6.10.8.3 says: "__STDC_IEC_559__ The integer constant 1, intended to indicate conformance to the specifications in annex F (IEC 60559 floating-point arithmetic)." and nothing about STDC FENV_ACCESS.

In 7.6, only 7.6.1 is specifically about the FENV_ACCESS pragma, and it specifies under which conditions the behavior is undefined, but nothing related to __STDC_IEC_559__ and Annex F.

Annex F doesn't apply in the case __STDC_IEC_559__ is unset.

On Jan 10 2014, vincent-gcc at vinc17 dot net wrote:
>
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>
>--- Comment #24 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> -
>
>(In reply to Nick Maclaren from comment #23)
>
>> If __STDC_IEC_559__ is unset or does not have the value 1, setting
>> STDC FENV_ACCESS to "on" is undefined behaviour (see 6.10.8.3, 7.6 and
>> Annex F), unless the implementation explicitly chooses to extend the
>> language to support it.
>
>You're wrong. The C standard doesn't say that.

I am sorry, but it is you that is wrong.

>6.10.8.3 says: "__STDC_IEC_559__ The integer constant 1, intended to indicate
>conformance to the specifications in annex F (IEC 60559 floating-point
>arithmetic)." and nothing about STDC FENV_ACCESS.

3.4.3 says:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program construct
    or of erroneous data, for which this International Standard imposes
    no requirements

4. Conformance, paragraph 2, says:

    ... Undefined behavior is otherwise indicated in this International
    Standard by the words "undefined behavior" or by the omission of any
    explicit definition of behavior. There is no difference in emphasis
    among these three; they all describe "behavior that is undefined".

What "explicit definition of behavior" is there for the case when STDC FENV_ACCESS is set to "on" but __STDC_IEC_559__ is not set to one? As there is none, it is undefined behaviour. gcc can therefore do whatever it likes.

Regards,
Nick Maclaren.
(In reply to Nick Maclaren from comment #25)
> 3.4.3 says:
>     undefined behavior
>     behavior, upon use of a nonportable or erroneous program construct
>     or of erroneous data, for which this International Standard imposes
>     no requirements
>
> 4. Conformance, paragraph 2, says:
>     ... Undefined behavior is otherwise indicated in this International
>     Standard by the words "undefined behavior" or by the omission of any
>     explicit definition of behavior. There is no difference in emphasis
>     among these three; they all describe "behavior that is undefined".
>
> What "explicit definition of behavior" is there for the case when
> STDC FENV_ACCESS is set to "on" but __STDC_IEC_559__ is not set to one?

The behavior is defined. The standard says, e.g. for C99:

----
7.6.1 The FENV_ACCESS pragma

The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes.184)

[...]

184) The purpose of the FENV_ACCESS pragma is to allow certain optimizations that could subvert flag tests and mode changes (e.g., global common subexpression elimination, code motion, and constant folding). In general, if the state of FENV_ACCESS is ``off'', the translator can assume that default modes are in effect and the flags are not tested.
----

And there is here no relation at all with __STDC_IEC_559__.

> As there is none, it is undefined behaviour. gcc can therefore do
> whatever it likes.

No.

On Jan 14 2014, vincent-gcc at vinc17 dot net wrote:
>
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>
>> What "explicit definition of behavior" is there for the case when
>> STDC FENV_ACCESS is set to "on" but __STDC_IEC_559__ is not set to one?
>
>The behavior is defined. The standard says, e.g. for C99:
>
>7.6.1 The FENV_ACCESS pragma
>
>The FENV_ACCESS pragma provides a means to inform the implementation when a
>program might access the floating-point environment to test floating-point
>status flags or run under non-default floating-point control modes.

I suggest looking up the word "explicit" in a dictionary.

Unless __STDC_IEC_559__ is set to 1, what modes and flags exist (and, even more importantly, what their semantics are) is at best implicit and more realistically unspecified - see footnote 204 for a clear statement of that.

Have you ever implemented a C system for an architecture with non-IEEE arithmetic but with modes and flags? Because I have, and I have used several others.

>> As there is none, it is undefined behaviour. gcc can therefore do
>> whatever it likes.
>
>No.

You are quite simply wrong.

Regards,
Nick Maclaren.

(In reply to Nick Maclaren from comment #27)
> On Jan 14 2014, vincent-gcc at vinc17 dot net wrote:
> >The FENV_ACCESS pragma provides a means to inform the implementation when a
> >program might access the floating-point environment to test floating-point
> >status flags or run under non-default floating-point control modes.
>
> I suggest looking up the word "explicit" in a dictionary.

The above is an explicit definition. Where do you see an undefined behavior here?

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
int main (void)
{
  return 0;
}

The modes and so on are dealt with in other parts of the standard, e.g.

----
Each of the macros

  FE_DOWNWARD
  FE_TONEAREST
  FE_TOWARDZERO
  FE_UPWARD

is defined if and only if the implementation supports getting and setting the represented rounding direction by means of the fegetround and fesetround functions.
----

This doesn't mean that the rounding direction will necessarily be honored even for the basic operations (just like the C standard doesn't require "1.0+2.0" to evaluate as 3.0, and a poorly-designed implementation could decide that 1-bit accuracy is OK), but honoring the rounding direction when the processor does[*] is a reasonable QoI feature. Basically, this means: disabling some optimizations when STDC FENV_ACCESS is set to ON. This is what this bug is about.

[*] a weaker requirement than __STDC_IEC_559__ being set to 1.

Note that the C standard doesn't explicitly say how a source file as a sequence of bytes is to be interpreted as a sequence of character, so that if you just restrict to the C standard, everything is undefined.

The discussion is going nowhere.

On Jan 14 2014, vincent-gcc at vinc17 dot net wrote:
>
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>
>> >The FENV_ACCESS pragma provides a means to inform the implementation when a
>> >program might access the floating-point environment to test floating-point
>> >status flags or run under non-default floating-point control modes.
>
>> I suggest looking up the word "explicit" in a dictionary.
>
>The above is an explicit definition. Where do you see an undefined behavior
>here?

It is not an "explicit definition of BEHAVIOR" (my emphasis), and what it implies for any non-IEEE system is completely unclear. Of the two countries active during the standardisation of C99, one voted "no" on these grounds (among others).

>Note that the C standard doesn't explicitly say how a source file as a sequence
>of bytes is to be interpreted as a sequence of character, so that if you just
> restrict to the C standard, everything is undefined.
Yes, it does - it's implementation-defined in 5.1.1.2 Translation phases, paragraph 1.1:

    Physical source file multibyte characters are mapped, in an
    implementation-defined manner, to the source character set
    (introducing new-line characters for end-of-line indicators) if
    necessary. ...

You imply that you are also relying on some other standards or specifications. ISO/IEC Directives Part II is quite clear (in 6.2.2) that they shall be referenced in the ISO standard. Which ones are you referring to and why?

If you are claiming that C99 and beyond support only systems that conform to IEEE 754, then I can tell you that was not the intention of WG21 at the time and is not a requirement of the standard. To repeat, how many other such systems are you familiar with?

The grounds that the UK voted "no" to this aspect was that the whole 'IEEE 754' morass (including "fenv.h") was neither dependent on __STDC_IEC_559__ nor implementation-dependent nor sufficiently explicit to be interpreted on any non-IEEE system.

> The discussion is going nowhere.

Now, at least that is true.

Regards,
Nick Maclaren.

(In reply to Nick Maclaren from comment #29)
> It is not an "explicit definition of BEHAVIOR" (my emphasis),

The pragma is just a directive. It has no additional behavior, so that there is nothing else to define.

> >Note that the C standard doesn't explicitly say how a source file as a sequence
> >of bytes is to be interpreted as a sequence of character, so that if you just
      ^^^^^
> > restrict to the C standard, everything is undefined.
>
> Yes, it does - it's implementation-defined in 5.1.1.2 Translation phases,
> paragraph 1.1:
>
> Physical source file multibyte characters are mapped, in an
                       ^^^^^^^^^^^^^^^^^^^^

Read again. I'm talking of a sequence of bytes. What you're quoting is about a sequence of multibyte characters. The interpretation of the sequences of bytes as a sequence of multibyte characters is not defined.
> You imply that you are also relying on some other standards or
> specifications.

Not other standards, just the implementation.

*** Bug 260998 has been marked as a duplicate of this bug. ***

Is there any hope this could actually be improved?
Now, 10 years later, the FENV_ACCESS pragma seems to be implemented, but the problem here seems to persist.

I run into this with GCC 8.3.0 and code like this:

#pragma STDC FENV_ACCESS ON

#include <stdio.h>
#include <stdlib.h>
#include <fenv.h>
#include <assert.h>

int main()
{
  double op;
  double down;
  double up;
  double near;

  op = atof("0.2");

  fesetround(FE_DOWNWARD);
  down = 1.0 / op;

  fesetround(FE_UPWARD);
  up = 1.0 / op;

  fesetround(FE_TONEAREST);
  near = 1.0 / op;

  printf("1/%.16g: down = %.16g near = %.16g up = %.16g\n", op, down, near, up);

  assert(down <= 5.0);
  assert(down <= near);
  assert(near <= up);
  assert(5.0 <= up);

  return 0;
}

$ gcc -O3 -lm div.c && ./a.out
1/0.2: down = 4.999999999999999 near = 4.999999999999999 up = 4.999999999999999
a.out: div.c:32: main: Assertion `5.0 <= up' failed.

Looking at the assembler code, I see that only the first divsd remains and the other two were optimized away.

The Intel compiler 19.0.0.177 handles this better:

$ icc -O3 div.c -lm && ./a.out
1/0.2: down = 4.999999999999999 near = 5 up = 5

(In reply to Stefan Vigerske from comment #32)
> Is there any hope this could actually be improved?
> Now, 10 years later, the FENV_ACCESS pragma seems to be implemented, but the
> problem here seems to persist.

The pragma still has no effect (not sure if we would be allowed to diagnose that fact). Nobody stepped up with a poor-mans "solution" to the issue (not sure if there even exists one) and a complete solution is not even on a drawing board.

Other people running into this issue live with inserting compiler-barriers like

__asm__("# %0" : "=r" (x) : "0" (x));

for hiding 'x' from the compiler across this point.
(In reply to Richard Biener from comment #33)
> (In reply to Stefan Vigerske from comment #32)
> > Is there any hope this could actually be improved?
> > Now, 10 years later, the FENV_ACCESS pragma seems to be implemented, but the
> > problem here seems to persist.
>
> The pragma still has no effect (not sure if we would be allowed to diagnose
> that fact).

I don't remember ever seeing anything in the C or C++ standard that would prevent from printing as many diagnostics as we want, even for perfectly valid code. Most warnings would be illegal otherwise.

> Nobody stepped up with a poor-mans "solution" to the issue (not sure if there
> even exists one) and a complete solution is not even on a drawing board.

One approach could be producing IFN_PLUS instead of PLUS_EXPR, for floating-point types inside a "pragma on" region (or everywhere with -frounding-math), and expanding it using the barrier mentioned below by default (maybe with a possibility for targets to override that expansion). But then it is tempting to refine IFN_PLUS and have variants (an argument?) specifying the rounding mode when known, specifying if we only care about rounding, only about exceptions, or both, etc. We could also produce PLUS_EXPR and the barriers directly from the front-end, but with IFN_PLUS we at least have a chance to fold exact constant operations (say 2.+2.), and more with the refined version.

Most lacking, whatever the approach, is a volunteer motivated enough to work on it ;-)

clang doesn't handle the pragma either (AFAIK there was a recent effort, but it stalled). Intel and Microsoft supposedly handle it, but Microsoft only applies it to scalars and not SIMD vectors. That's not a lot of support...

> Other people running into this issue live with inserting compiler-barriers
> like
>
> __asm__("# %0" : "=r" (x) : "0" (x));
>
> for hiding 'x' from the compiler across this point.

You need to make it volatile, or it might be moved across fesetround.
Also, you can refine the constraint, say "=gx" on x86_64 (or just "=x"), and a different one on each platform. An operation is then:

*lo = hide(hide(x)/hide(y));

The IFN_ way may be a possibility indeed. I believe a volunteer should first tackle -ftrapv in this way then to see how painful an exercise this is.

Note that the issue with FENV access is not so much the operations themselves but the dependence on the fenv accesses which is what is missing at the moment. Without IL adjustments for "special" (non-value, non-memory) dependences the IFNs would need to appear to read and write global memory which may be somewhat detrimental to optimization. This issue doesn't exist for -ftrapv.

Created attachment 46396 [details]
poor mans solution^Whack

So this is what a hack looks like, basically sprinkling those asm()s throughout the code automatically.

Note I need to protect inputs, not outputs, otherwise the last testcase isn't fixed.

Improving this poor-mans solution by writing in some flow-sensitivity like tracking which values are already protected and if there's a possibly harmful FENV access inbetween maybe in a similar way tree-complex.c tracks complex components might work.

Note that the FENV pragma does _not_ enable -frounding-math (it really has no effect!) so you need to supply -frounding-math yourself (or fix the frontends to do that).

It's a hack of course.

But it fixes the testcase:

> ./xgcc -B. t.c -O3 -lm
> ./a.out
1/0.2: down = 4.999999999999999 near = 4.999999999999999 up = 4.999999999999999
a.out: t.c:32: main: Assertion `5.0 <= up' failed.
Aborted
> ./xgcc -B. t.c -O3 -lm -frounding-math
> ./a.out
1/0.2: down = 4.999999999999999 near = 5 up = 5

IL after the lowering:

main ()
{
  static const char __PRETTY_FUNCTION__[5] = "main";
  double near;
  double up;
  double down;
  double op;
  int D.3058;

  op = atof ("0.2");
  fesetround (1024);
  __asm__ __volatile__("" : "=g" op : "0" op);
  down = 1.0e+0 / op;
  fesetround (2048);
  __asm__ __volatile__("" : "=g" op : "0" op);
  up = 1.0e+0 / op;
  fesetround (0);
  __asm__ __volatile__("" : "=g" op : "0" op);
  near = 1.0e+0 / op;
  printf ("1/%.16g: down = %.16g near = %.16g up = %.16g\n", op, down, near, up);
...

(In reply to Richard Biener from comment #36)
> Created attachment 46396 [details]
> poor mans solution^Whack
>
> So this is what a hack looks like, basically sprinkling those asm()s
> throughout the code automatically.
>
> Note I need to protect inputs, not outputs, otherwise the last
> testcase isn't fixed.

Actually, you need to protect both inputs *and* outputs...

> Improving this poor-mans solution by writing in some flow-sensitivity
> like tracking which values are already protected

At least if you use "=x" (or whatever the right constraint is on each target) it doesn't really hurt to have a dozen protections on the same variable.

> and if there's a possibly
> harmful FENV access in between maybe in a similar way tree-complex.c tracks
> complex components might work.
>
> Note that the FENV pragma does _not_ enable -frounding-math (it really has
> no effect!) so you need to supply -frounding-math yourself (or fix the
> frontends to do that).

If you protect even constants, the current effects of -frounding-math become redundant.

(In reply to Marc Glisse from comment #37)
> If you protect even constants, the current effects of -frounding-math become
> redundant.

Oops, forget that, the hack is too late for this sentence to be true, some constant propagation has already happened by that time.
(In reply to Richard Biener from comment #36)
> Created attachment 46396 [details]
> poor mans solution^Whack
>
> So this is what a hack looks like, basically sprinkling those asm()s
> throughout the code automatically.
>
> Note I need to protect inputs, not outputs, otherwise the last
> testcase isn't fixed.
>
> Improving this poor-mans solution by writing in some flow-sensitivity
> like tracking which values are already protected and if there's a possibly
> harmful FENV access inbetween maybe in a similar way tree-complex.c tracks
> complex components might work.
>
> Note that the FENV pragma does _not_ enable -frounding-math (it really has
> no effect!) so you need to supply -frounding-math yourself (or fix the
> frontends to do that).
>
> It's a hack of course.
>
> But it fixes the testcase:
>
> > ./xgcc -B. t.c -O3 -lm
> > ./a.out
> 1/0.2: down = 4.999999999999999 near = 4.999999999999999 up =
> 4.999999999999999
> a.out: t.c:32: main: Assertion `5.0 <= up' failed.
> Aborted
> > ./xgcc -B. t.c -O3 -lm -frounding-math
> > ./a.out
> 1/0.2: down = 4.999999999999999 near = 5 up = 5
>
> IL after the lowering:
>
> main ()
> {
>   static const char __PRETTY_FUNCTION__[5] = "main";
>   double near;
>   double up;
>   double down;
>   double op;
>   int D.3058;
>
>   op = atof ("0.2");
>   fesetround (1024);
>   __asm__ __volatile__("" : "=g" op : "0" op);
>   down = 1.0e+0 / op;
>   fesetround (2048);
>   __asm__ __volatile__("" : "=g" op : "0" op);
>   up = 1.0e+0 / op;
>   fesetround (0);
>   __asm__ __volatile__("" : "=g" op : "0" op);
>   near = 1.0e+0 / op;
>   printf ("1/%.16g: down = %.16g near = %.16g up = %.16g\n", op, down, near,
> up);
> ...

How does this work if op is a SSA_NAME?

(In reply to Jakub Jelinek from comment #39)
> (In reply to Richard Biener from comment #36)
> > Created attachment 46396 [details]
> > poor mans solution^Whack
                          ^^^^
> How does this work if op is a SSA_NAME?
it doesn't, the patch has to be fixed to create a new def and adjust all uses which isn't possible here (no immediate uses). It's a proof-of-concept hack - the SSA name issue means we have to find a better place for such hack.

Note I don't think we should go with this kind of hack; if we do, then we should at least not use an ASM but some special __IFN and a more appropriate construct on the RTL side (not sure what that would be).

It's likely that caring about exceptions would actually be worse for optimization than caring about rounding modes (because exceptions mean that floating-point operations can write global state, not just read it). I.e., a proper implementation would also indicate splitting -ftrapping-math into the existing parts relating only to local transformations, and something new for the global effects of operations being considered to write the exception state, and so not be movable past any code that might read it (which includes most function calls, and asms depending on whether they might read the floating-point state register).

(There are plenty of local bugs in this area - both in machine-independent optimizations, and in machine-specific code, or libgcc code, that doesn't do the right thing regarding exceptions; lots of thorough tests would be needed to find such places. But in general the local bugs should be individually straightforward to fix in a way that the global issues aren't.)

(See discussions on the gcc mailing list in Dec 2012 / Jan 2013 / Feb 2013 for more details in this area.)

Created attachment 46502 [details]
other hack
Another approach.
* lowering in an optimization pass is idiotic, it only works at -O2+, but it shows the idea and should be easy to move anywhere.
* manually setting SSA_NAME_DEF_STMT seems strange, it probably should happen automatically as it does for an assignment.
I think this kind of approach makes sense. It can be made to work without too much effort, and then can be incrementally improved:
0) handle vectors and complex
1) let targets replace "=g" with something nicer, say "=x" or "=xm" for SSE (we generate nonsense for "=gx").
2) allow targets to expand the operations as they like (add an opcode?)
3) add parsing of #pragma fenv and change flag_rounding_math according to it
4) enable it as well for flag_trapping_math (and stop making that the default!)
5) add some constant folding (mpfr can tell if operations are exact or raise any flag)
6) add other, more specific versions, for cases where we care about rounding but not flags, or the reverse, or when we know the rounding direction (possible in the newest C standard?), or...
etc
Note that the effect of changing the rounding mode after a computation, whether -frounding-math is used or not, is not just that the change of rounding mode may not be honored. It can yield inconsistencies in a block where the rounding mode is not changed.

#include <stdio.h>
#include <stdlib.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

#define CST 0x1p-200

int main (void)
{
  volatile double a = CST;
  double b = a, c = a, d;

  printf ("%a\n", 1.0 - b);
  fesetround (FE_DOWNWARD);
  printf ("%a\n", 1.0 - c);

  if (b == c && b == CST && c == CST)
    {
      printf ("%d\n", 1.0 - b == 1.0 - c);
      printf ("1: %a\n", 1.0 - b);
      printf ("2: %a\n", 1.0 - c);
      d = b == CST ? b : (abort (), 1.0);
      printf ("3: %a\n", 1.0 - d);
      d = b == CST ? b : 1.0;
      printf ("4: %a\n", 1.0 - d);
    }

  return 0;
}

With -std=c17 -frounding-math -O3 -lm, I get:

0x1p+0
0x1.fffffffffffffp-1
0
1: 0x1p+0
2: 0x1.fffffffffffffp-1
3: 0x1p+0
4: 0x1.fffffffffffffp-1

*** Bug 108318 has been marked as a duplicate of this bug. ***

*** Bug 56020 has been marked as a duplicate of this bug. ***

Fortran gets this right:

$ cat set_rounding_mode.f90
module x
  implicit none
  integer, parameter :: wp = selected_real_kind(15)
contains
  subroutine foo(a,b,c)
    use ieee_arithmetic
    real(kind=wp), dimension(4), intent(out) :: a
    real(kind=wp), intent(in) :: b, c
    type (ieee_round_type), dimension(4), parameter :: mode = &
         [ieee_nearest, ieee_to_zero, ieee_up, ieee_down]
    integer :: i
    do i=1,4
       call ieee_set_rounding_mode (mode(i))
       a(i) = b + c
    end do
  end subroutine foo
end module x

program main
  use x
  real(kind=wp), dimension(4) :: a
  call foo(a, 0.1_wp, 0.2_wp)
  print *,a
end program main
$ gfortran -O3 set_rounding_mode.f90
$ ./a.out
  0.30000000000000004       0.29999999999999999       0.30000000000000004       0.29999999999999999

(In reply to Thomas Koenig from comment #46)
> Fortran gets this right:

... but only by accident.
This test case shows that it doesn't:

$ cat y.f90
module y
  implicit none
  integer, parameter :: wp = selected_real_kind(15)
contains
  subroutine foo(a,b,c)
    use ieee_arithmetic
    real(kind=wp), dimension(4), intent(out) :: a
    real(kind=wp), intent(in) :: b, c
    type (ieee_round_type), dimension(4), parameter :: mode = &
         [ieee_nearest, ieee_to_zero, ieee_up, ieee_down]
    call ieee_set_rounding_mode (mode(1))
    a(1) = b + c
    call ieee_set_rounding_mode (mode(2))
    a(2) = b + c
    call ieee_set_rounding_mode (mode(3))
    a(3) = b + c
    call ieee_set_rounding_mode (mode(4))
    a(4) = b + c
  end subroutine foo
end module y

program main
  use y
  real(kind=wp), dimension(4) :: a
  call foo(a, 0.1_wp, 0.2_wp)
  print *,a
end program main
$ gfortran -O y.f90 && ./a.out
  0.30000000000000004       0.30000000000000004       0.30000000000000004       0.30000000000000004
$ gfortran y.f90 && ./a.out
  0.30000000000000004       0.29999999999999999       0.30000000000000004       0.29999999999999999

Clang gets this right, even without the pragma; the original test case is compiled to

        pushq   %r14
        pushq   %rbx
        subq    $24, %rsp
        movq    %rsi, %r14
        movq    %rdi, %rbx
        movsd   %xmm1, 16(%rsp)         # 8-byte Spill
        movsd   %xmm0, 8(%rsp)          # 8-byte Spill
        movl    $1024, %edi             # imm = 0x400
        callq   fesetround@PLT
        movsd   8(%rsp), %xmm0          # 8-byte Reload
        divsd   16(%rsp), %xmm0         # 8-byte Folded Reload
        movsd   %xmm0, (%rbx)
        movl    $2048, %edi             # imm = 0x800
        callq   fesetround@PLT
        movsd   8(%rsp), %xmm0          # 8-byte Reload
        divsd   16(%rsp), %xmm0         # 8-byte Folded Reload
        movsd   %xmm0, (%r14)
        addq    $24, %rsp
        popq    %rbx
        popq    %r14
        retq

(In reply to Thomas Koenig from comment #48)
> Clang gets this right, even without the pragma;

The "even without the pragma" part is wrong.