38960 – Wrong floating point reorder with fetestexcept

Summary: Wrong floating point reorder with fetestexcept

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end
Version:	4.3.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Duplicates (1):	85633
Depends on:
Blocks:

Reported:	2009-01-24 14:57 UTC by Abramo Bagnara
Modified:	2021-09-15 21:09 UTC
CC List:	3 users

See Also:	6065 34678
Host:
Target:	i486-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2009-01-24 15:43:40

Attachments
Assembler generated by gcc -S -O2 bug.c (476 bytes, text/plain) 2009-01-24 15:14 UTC, Abramo Bagnara

Description Abramo Bagnara 2009-01-24 14:57:40 UTC

The program below shows that gcc reorders floating-point instructions in a way that makes inexact checking fruitless.

Reading the generated assembler, I see two problems:

1) the cast to float in the assignment to x is executed *after* fetestexcept and not before, as it is written (and as needed to get the correct result). This violates the C99 sequence point rules.

2) the second division is not recomputed (because of CSE), so the inexact flag is not raised again after feclearexcept

I guess that the latter is due to the missing #pragma STDC FENV_ACCESS implementation, but the former undermines the usability of fetestexcept as a whole.

$ cat bug.c
#include <fenv.h>
#include <stdio.h>

double vf = 0x0fffffff;
double vg = 0x10000000;

/* vf/vg is exactly representable as IEC559 64 bit floating point,
   while it's not representable exactly as a 32 bit one */

int main() {
  double a = vf;
  double b = vg;

  feclearexcept(FE_INEXACT);
  float x;
  x = a / b;
  printf("%i %.1000g\n", fetestexcept(FE_INEXACT), x);

  feclearexcept(FE_INEXACT);
  double y;
  y = a / b;
  printf("%i %.1000g\n", fetestexcept(FE_INEXACT), y);
  return 0;
}
$ gcc -O2 bug.c -lm
$ ./a.out
0 1
0 0.9999999962747097015380859375
$

Comment 1 Abramo Bagnara 2009-01-24 15:14:18 UTC

Created attachment 17176
Assembler generated by gcc -S -O2 bug.c

Comment 2 Richard Biener 2009-01-24 15:43:40 UTC

Both are due to the missing #pragma STDC FENV_ACCESS implementation.

GCC does not have a way to represent use/def of floating-point status, so
the call to fetestexcept is not a barrier for moving floating-point
operations.  In fact, it will be hard to represent this.

Comment 3 Richard Biener 2018-05-04 07:43:11 UTC

*** Bug 85633 has been marked as a duplicate of this bug. ***

Comment 4 Richard Biener 2018-05-04 07:49:21 UTC

Note that a not too disruptive "implementation" of the dependences would be
to add outgoing abnormal edges to the fenv* calls.  Not too
disruptive in terms of implementation - the effect on code generation might
be very noticeable though (note that all calls to functions that might call
fenv* functions themselves are subject to the same treatment).  Of course
there's the (existing) issue of RTL expansion not maintaining abnormal edges.

You can experiment with this by declaring the fenv* functions with
__attribute__((returns_twice)).

Note that without also having incoming abnormal edges this might not be a
full barrier for downward motion.

Comment 5 jsm-csl@polyomino.org.uk 2018-05-04 16:06:09 UTC

Since any non-const function can examine floating-point state, I'd expect 
significant effects on code generation.  (Whether this also applies to 
asms depends on the architecture; some architectures have a register name 
you can use in asm operands to refer to floating-point state, and in those 
cases asms reading or writing that state "should" say explicitly that they 
do so, but I don't think all architectures have such a name supported by 
GCC in asms.)

Comment 6 rguenther@suse.de 2018-05-08 07:52:38 UTC

On Fri, 4 May 2018, joseph at codesourcery dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38960
> 
> --- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> Since any non-const function can examine floating-point state, I'd expect 
> significant effects on code generation.  (Whether this also applies to 
> asms depends on the architecture; some architectures have a register name 
> you can use in asm operands to refer to floating-point state, and in those 
> cases asms reading or writing that state "should" say explicitly that they 
> do so, but I don't think all architectures have such a name supported by 
> GCC in asms.)

That's true.  GCC's job would then be to prove and IPA-propagate knowledge
of which functions actually do access FP state.

If it actually works (still needs to be proven by experiment) it is
still the simplest approach for "fixing" the issue.  If it works
a first enhancement would be to not re-use returns_twice but invent
a new attribute so we can do more careful abnormal edge creation.

An alternative fix could involve forcing all FP computation results
to (addressable aka aliasable) memory and make FP state accesses
also access all (FP?) memory.

Alternatively all FP ops could be "lowered" to internal functions
and thus basically hidden from the optimizers.  Dependences to
FP state accessors can be handled as memory dependence then.  This
lowering would be similar to what is proposed for a -ftrapv replacement.
The issue then remains on the RTL side though (but maybe we're lucky
and re-ordering doesn't happen there and/or we could expand suitable
barriers before and after possible FP state accesses).

Another alternative would be to try to model the FP state explicitly.
With the right infrastructure this would allow modeling other CPU
state (CC flags) in a similar way.

I think that the force-to-memory variant isn't really worth exploring
since it involves a lot of engineering with questionable benefit
over the "simple" solution(s).