Bug 85957 - i686: Integers appear to be different, but compare as equal
Status: REOPENED
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 7.3.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2018-05-28 19:05 UTC by Luke Shumaker
Modified: 2020-02-18 17:31 UTC

Last reconfirmed: 2018-05-29 00:00:00


Attachments
The preprocessed source (53.91 KB, text/plain)
2018-05-28 19:05 UTC, Luke Shumaker

Description Luke Shumaker 2018-05-28 19:05:54 UTC
Created attachment 44200
The preprocessed source

This is a bug that at first looks a bit like a "problems with floating
point numbers" bug.  However, my problem is that integers (calculated
from float types) behave inconsistently.

    a6 = a.dbl * 1e6;
    b6 = b.dbl * 1e6;
    printf ("a6 = %llu\n", a6); // prints "1"
    printf ("b6 = %llu\n", b6); // prints "0"
    printf ("(a6 == b6) = %s\n", (a6 == b6) ? "true" : "false"); // prints "true"

I understand why floating point math could result in a6 and b6 being
different; my concern is that a6 and b6 (which are integer types)
appear to be different, yet compare as being equal.

This happens on i686 with -O1 and -O2 (but not -O0), and not on
x86-64.

I apologize that my minimal testcase makes use of the glib-2.0
library; I'm having a hard time replicating the problem without it.
It seems that GCC optimizing out a variable is key, and removing the
library use stops GCC from optimizing it out.

Here is the output of gcc, including the appropriate version information:

    $ gcc -v -save-temps -O1 $(pkg-config --libs --cflags glib-2.0) demo.c -o demo

    Using built-in specs.
    COLLECT_GCC=/usr/bin/gcc
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/lto-wrapper
    Target: i686-pc-linux-gnu
    Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp
    Thread model: posix
    gcc version 7.3.1 20180312 (GCC) 
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-I' '/usr/include/glib-2.0' '-I' '/usr/lib/glib-2.0/include' '-o' 'demo' '-mtune=generic' '-march=pentiumpro'
     /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/cc1 -E -quiet -v -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include demo.c -mtune=generic -march=pentiumpro -O1 -fpch-preprocess -o demo.i
    ignoring nonexistent directory "/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../../../i686-pc-linux-gnu/include"
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/include/glib-2.0
     /usr/lib/glib-2.0/include
     /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/include
     /usr/local/include
     /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/include-fixed
     /usr/include
    End of search list.
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-I' '/usr/include/glib-2.0' '-I' '/usr/lib/glib-2.0/include' '-o' 'demo' '-mtune=generic' '-march=pentiumpro'
     /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/cc1 -fpreprocessed demo.i -quiet -dumpbase demo.c -mtune=generic -march=pentiumpro -auxbase demo -O1 -version -o demo.s
    GNU C11 (GCC) version 7.3.1 20180312 (i686-pc-linux-gnu)
    	compiled by GNU C version 7.3.1 20180312, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.18-GMP
    
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    GNU C11 (GCC) version 7.3.1 20180312 (i686-pc-linux-gnu)
    	compiled by GNU C version 7.3.1 20180312, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.18-GMP
    
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    Compiler executable checksum: b94f7ca39249d495c6913c6ded8c0b64
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-I' '/usr/include/glib-2.0' '-I' '/usr/lib/glib-2.0/include' '-o' 'demo' '-mtune=generic' '-march=pentiumpro'
     as -v -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include --32 -o demo.o demo.s
    GNU assembler version 2.30 (i686-pc-linux-gnu) using BFD version (GNU Binutils) 2.30
    COMPILER_PATH=/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/:/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/:/usr/lib/gcc/i686-pc-linux-gnu/:/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/:/usr/lib/gcc/i686-pc-linux-gnu/
    LIBRARY_PATH=/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/:/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../../:/lib/:/usr/lib/
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-I' '/usr/include/glib-2.0' '-I' '/usr/lib/glib-2.0/include' '-o' 'demo' '-mtune=generic' '-march=pentiumpro'
     /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/collect2 -plugin /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/liblto_plugin.so -plugin-opt=/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/lto-wrapper -plugin-opt=-fresolution=demo.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr --hash-style=gnu -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -pie -o demo /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../../Scrt1.o /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../../crti.o /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/crtbeginS.o -L/usr/lib/gcc/i686-pc-linux-gnu/7.3.1 -L/usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../.. -lglib-2.0 demo.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/crtendS.o /usr/lib/gcc/i686-pc-linux-gnu/7.3.1/../../../crtn.o
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-I' '/usr/include/glib-2.0' '-I' '/usr/lib/glib-2.0/include' '-o' 'demo' '-mtune=generic' '-march=pentiumpro'

Attached is the preprocessed demo.i file.
Comment 1 Andrew Pinski 2018-05-28 19:12:42 UTC
>(calculated from float types)

Which is exactly that.
A 64-bit floating-point value does not have 64 bits of precision but rather 53 bits.
x86 uses the 80-bit FPU format internally and does not round between the intermediate steps, which is why you are getting two different answers.
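
The 53-bit figure can be checked directly. The following is a minimal sketch, not taken from the report: the helper names `double_drops_low_bit` and `print_precision` are made up for illustration, and the `volatile` store is used on the assumption that it forces the value into a real 64-bit double in memory, even on x87.

```c
#include <float.h>
#include <stdio.h>

/* Returns 1 if converting 2^53 + 1 to double loses the low bit.
   The volatile object forces an actual 64-bit double store, so
   x87 excess precision cannot hide the rounding. */
static int double_drops_low_bit(void)
{
    unsigned long long n = (1ULL << 53) + 1;   /* 9007199254740993 */
    volatile double d = (double)n;             /* rounds to 2^53 */
    return (unsigned long long)d == (1ULL << 53);
}

/* Prints the significand widths the implementation reports. */
static void print_precision(void)
{
    printf("double: %d bits, long double: %d bits\n",
           DBL_MANT_DIG, LDBL_MANT_DIG);
}
```

On x86 one would expect `print_precision` to report 53 and 64 bits; the 11 extra bits of the x87 format are the "excess precision" discussed in this report.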

*** This bug has been marked as a duplicate of bug 323 ***
Comment 2 Luke Shumaker 2018-05-28 20:39:22 UTC
I do not believe that this is a duplicate of bug 323.  As I wrote:

> I understand why floating point math could result in a6 and b6 being
> different; my concern is that a6 and b6 (which are integer types)
> appear to be different, yet compare as being equal.

"a6" and "b6" are both variables with types that resolve to "long long unsigned integer".

    printf ("a6 = %llu\n", a6); // prints "a6 = 1"
    printf ("b6 = %llu\n", b6); // prints "b6 = 0"

That's fine, I understand that a6 and b6 could be different because of differing round-off between intermediate steps.  That's not my concern.

Note that a6 and b6 should have concrete values at this point, as we have printed them.

My concern is the following:

    printf ("(a6 == b6) = %s\n",
            (a6 == b6) ? "true" : "false"); // prints "(a6 == b6) = true"

That is, the entire output of the POC program is:

    a6 = 1
    b6 = 0
    (a6 == b6) = true

I am not concerned that a6 and b6 disagree, or that they are equal.  I am concerned that *both* are true.
Comment 3 Andrew Pinski 2018-05-28 21:17:38 UTC
There are still rounding errors when it comes to the math you are doing.

*** This bug has been marked as a duplicate of bug 323 ***
Comment 4 Vincent Lefèvre 2018-05-28 22:32:30 UTC
(In reply to Andrew Pinski from comment #3)
> There are still rounding errors when it comes to the math you are doing.

Yes, but the issue here is much more serious, and I don't see this bug as a duplicate (bug 323 is just a cause of this more serious bug).

While it has been accepted that a floating-point variable can be multi-valued (except in C99/C11 modes), this must not be the case on a variable of integer type, even though the value of such a variable has been computed from a floating-point expression: Once a floating-point number has been converted into an integer type, the value of this integer must be fixed.
Comment 5 Andrew Pinski 2018-05-28 22:36:32 UTC
Try -std=c99 or -fexcess-precision=standard which will get you the behavior you want.
Comment 6 Vincent Lefèvre 2018-05-28 23:28:12 UTC
(In reply to Andrew Pinski from comment #5)
> Try -std=c99 or -fexcess-precision=standard which will get you the behavior
> you want.

This is not what is documented: "By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place."

This means that in

  double x = 1.1 * 1.2;

x can be kept with excess precision (typically 64 bits instead of 53) or can be rounded to double depending on its use.

But here, one has:

  unsigned long long a6 = a.dbl * 1e6;

This is no longer just a rounding of a floating-point value, but a conversion to an integer type. From -fexcess-precision=fast, one cannot decide whether a6 will be 0 or 1, but once the value of a6 has been observed, it should no longer be allowed to change.
Comment 7 Alexander Monakov 2018-05-29 17:28:53 UTC
Reopening, the issue here is way more subtle than bug 323 and points to a possible issue in DOM. Hopefully Richi can have a look and comment.

It appears dom2 pass performs something like jump threading based on compile-time-evaluated floating-point expression values without also substituting those expressions in IR. At run time, they are evaluated to different values, leading to an inconsistency. Namely, dom2 creates bb 10:

  <bb 9>:
  # iftmp.1_1 = PHI <"true"(7), "false"(8), "true"(10)>
  printf ("(a6 == b6) = %s\n", iftmp.1_1);
  return 0;

  <bb 10>:
  _24 = __n2_13 * 1.0e+6;
  b6_25 = (guint64) _24;
  printf ("a6 = %llu\n", 1);
  printf ("b6 = %llu\n", b6_25);
  goto <bb 9>;

where jump to bb 9 implies that _24 evaluates to 1.0 and b6_25 to 1, but they are not substituted as such, and at run time evaluate to 0.99... and 0 due to excess precision.

The following reduced testcase demonstrates the same issue, but requires -fdisable-tree-dom3 (on gcc-6 at least, as otherwise dom3 substitutes results of compile-time evaluation).

__attribute__((noinline,noclone))
static double f(void)
{
  return 1e-6;
}

int main(void)
{
  double a = 1e-6, b = f();

  if (a != b) __builtin_printf("uneq");

  unsigned long long ia = a * 1e6, ib = b * 1e6;

  __builtin_printf("%llu %s %llu\n", ia, ia == ib ? "==" : "!=", ib);
}
Comment 8 Alexander Monakov 2018-05-30 04:05:21 UTC
To expand a bit: DOM makes the small testcase behave as if 'b' and 'ib' are evaluated twice:

* one time, 'b' is evaluated in precision matching 'a' (either infinite or double), and 'ib' is evaluated to 1; this instance is used in 'ia == ib' comparison;
* a second time, 'b' is evaluated in extended precision and 'ib' is evaluated to 0; this instance is passed as the last argument to printf.

This is surprising as the original program clearly evaluates 'b' and 'ib' just once.
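
The two evaluations described above can be imitated by hand. This is only a sketch under stated assumptions: the function names are hypothetical, `long double` stands in for an x87 register (it is the 80-bit format on typical x86 toolchains, but not everywhere), and the `volatile` objects are there to force each product into the named format.

```c
#include <float.h>

/* b * 1e6 rounded to double before the conversion to integer. */
static unsigned long long scaled_in_double(double b)
{
    volatile double product = b * 1e6;   /* store rounds to 53 bits */
    return (unsigned long long)product;
}

/* b * 1e6 kept in long double, standing in for an x87 register. */
static unsigned long long scaled_in_extended(double b)
{
    volatile long double product = (long double)b * 1e6L;
    return (unsigned long long)product;
}
```

With `b = 1e-6`, the double product rounds up to exactly 1.0, so `scaled_in_double` should return 1. Where `long double` carries more than 53 significand bits, the product stays slightly below 1 and `scaled_in_extended` should return 0. These are the two values that the transformed program mixes.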

If there's no bug in DOM and the observed transformation is allowed to happen when -fexcess-precision=fast is in effect, I think it would be nice to mention that in the compiler manual.
Comment 9 Alexander Monakov 2018-05-30 04:15:58 UTC
Sorry, the above comment should have said 'b * 1e6' every time it said 'b'.
Comment 10 Alexander Monakov 2018-05-30 06:06:55 UTC
Also note that both the original and the reduced testcase can be tweaked to exhibit the surprising transformation even when -fexcess-precision=standard is enabled. A "lazy" way is via -mpc64, but I think it's possible even without the additional option (by making the code more convoluted to enforce rounding to double). Here's what happens on the reduced testcase:

$ gcc -m32 d.c -O -fdisable-tree-dom3 && ./a.out 
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
1 == 0

$ gcc -m32 d.c -O -fdisable-tree-dom3 -fexcess-precision=standard -mpc64 && ./a.out
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
0 == 1
Comment 11 joseph@codesourcery.com 2018-05-31 22:50:38 UTC
On Mon, 28 May 2018, vincent-gcc at vinc17 dot net wrote:

> floating-point expression: Once a floating-point number has been converted into
> an integer type, the value of this integer must be fixed.

Yes, I agree that any particular conversion to integer executed in the 
abstract machine must produce some definite integer value for each 
execution.

(Conversions of *out-of-range* floating-point values to integer are a 
trickier case; under Annex F they produce unspecified values.  I think 
semantics along the lines of N2221 are fine for unspecified values arising 
from reading an uninitialized object, but more questionable for values 
arising from a floating-point-to-integer conversion.)
Comment 12 Rich Felker 2020-02-07 15:39:03 UTC
Note that -fexcess-precision=standard is not available in C++ mode to fix this.

However, -ffloat-store, which is available in C++ mode, should also ensure consistency to the optimizer (necessary to prevent this bug, and other variants of it, from happening), at the expense of some extreme performance and code size costs and making the floating point results even more semantically incorrect (double-rounding all over the place, mismatching FLT_EVAL_METHOD==2). Despite all these nasty effects, it may be a suitable workaround, and at least it avoids letting the optimizer prove 0==1, thereby effectively treating any affected code as if it contained UB.

Note that in code written to be excess-precision-aware, making use of float_t and double_t for intermediate operands and only using float and double for in-memory storage, -ffloat-store should yield behavior equivalent to -fexcess-precision=standard.
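
A rough sketch of that coding style follows. It is an illustration rather than a recommended fix: `scale_and_store` and `to_micro` are hypothetical names, and the `volatile double` is an assumption used here to make the one store to memory explicit, which is roughly what -ffloat-store does for every variable.

```c
#include <math.h>

/* Intermediate arithmetic is done in double_t (the evaluation
   format); rounding to double happens only at the explicit store. */
static double scale_and_store(double x)
{
    double_t wide = (double_t)x * 1e6;       /* no hidden narrowing */
    volatile double stored = (double)wide;   /* single, visible rounding */
    return stored;
}

/* Converts to an integer only after the value has been rounded
   to double, so the integer has one well-defined source value. */
static unsigned long long to_micro(double x)
{
    return (unsigned long long)scale_and_store(x);
}
```

Because the integer is derived from a value that has already been stored as a double, the conversion no longer depends on whether an extended-precision copy happens to be live in a register.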
Comment 13 Vincent Lefèvre 2020-02-09 02:05:11 UTC
(In reply to Rich Felker from comment #12)
> [...] and making the floating point results even more semantically incorrect
> (double-rounding all over the place, mismatching FLT_EVAL_METHOD==2)

No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants to the range and precision of the long double type", which is what really occurs. The consequence is indeed double rounding when storing in memory, but this can happen at *any* time even without -ffloat-store (due to spilling), because you are never sure that registers are still available; see some reports in bug 323.

Double rounding can be a problem with some codes, but this just means that the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point algorithms, double rounding is not a problem at all, while keeping a result in extended precision will make them fail.
Comment 14 Rich Felker 2020-02-09 04:06:28 UTC
> No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants to the range and precision of the long double type", which is what really occurs. The consequence is indeed double rounding when storing in memory, but this can happen at *any* time even without -ffloat-store (due to spilling), because you are never sure that registers are still available; see some reports in bug 323.

It sounds like you misunderstand the standard's requirements on, and GCC's implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of registers does not in any way affect the result, because when expressions are evaluated with excess precision, any spills must take place in the format of float_t or double_t (long double) and are thereby transparent to the application. The buggy behavior prior to -fexcess-precision=standard (and now produced with -fexcess-precision=fast which is default in "gnu" modes) spills in the nominal type, producing nondeterministic results that depend on the compiler's transformations and that lead to situations like this bug (where the optimizer has been lied to that two expressions are equal, but they're not).

> Double rounding can be a problem with some codes, but this just means that the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point algorithms, double rounding is not a problem at all, while keeping a result in extended precision will make them fail.

With standards-conforming behavior, the rounding of an operation and of storage to an object of float/double type are discrete roundings and you can observe and handle the intermediate value between them. With -ffloat-store, every operation inherently has a double-rounding attached to it. This behavior is non-conforming but at least deterministic, and is what I was referring to in my previous comment. But I think this is largely a distraction from the issue at hand; I was only pointing out that -ffloat-store is a workaround, but one with its own (often severe) problems.
Comment 15 Vincent Lefèvre 2020-02-09 13:25:34 UTC
(In reply to Rich Felker from comment #14)
> It sounds like you misunderstand the standard's requirements on, and GCC's
> implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of
> registers does not in any way affect the result, because when expressions
> are evaluated with excess precision, any spills must take place in the
> format of float_t or double_t (long double) and are thereby transparent to
> the application.

The types float_t or double_t correspond to the evaluation format. Thus they are equivalent to long double if FLT_EVAL_METHOD is 2 (see 7.12p2). And GCC does not do spills in this format, as seen in bug 323.
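
The relationship between FLT_EVAL_METHOD and double_t can be inspected on a given toolchain. The sketch below uses hypothetical helper names and compares sizes as a stand-in for comparing types, which is an approximation (two distinct types may share a size).

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Returns 1 if double_t has the size that C11 7.12p2 implies for
   the reported FLT_EVAL_METHOD, 0 if it does not, and -1 when the
   method is implementation-defined (any value other than 0, 1, 2). */
static int double_t_matches_eval_method(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
    case 1:
        return sizeof(double_t) == sizeof(double);
    case 2:
        return sizeof(double_t) == sizeof(long double);
    default:
        return -1;
    }
}

/* Prints what the implementation claims about its evaluation format. */
static void report_eval_method(void)
{
    printf("FLT_EVAL_METHOD=%d sizeof(double_t)=%zu\n",
           (int)FLT_EVAL_METHOD, sizeof(double_t));
}
```

Building this once with -m32 -mfpmath=387 and once with SSE arithmetic should show how the reported evaluation method changes between the two configurations.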

> With standards-conforming behavior, the rounding of an operation and of
> storage to an object of float/double type are discrete roundings and you can
> observe and handle the intermediate value between them. With -ffloat-store,
> every operation inherently has a double-rounding attached to it. This
> behavior is non-conforming

This is conforming as there is no requirement to keep intermediate results in excess precision and range.
Comment 16 Rich Felker 2020-02-09 15:30:32 UTC
> And GCC does not do spills in this format, as seen in bug 323.

In my experience it seems to (assuming -fexcess-precision=standard), though I have not done extensive testing. I'll check and follow up.

> This is conforming as there is no requirement to keep intermediate results in excess precision and range.

Such behavior absolutely is non-conforming. The standard reads (5.2.4.2.2 ¶9):

"Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type"

Note "are evaluated", not "may be evaluated depending on what spills the compiler chooses to perform".
Comment 17 Rich Felker 2020-02-09 15:35:05 UTC
And indeed you're right that GCC does it wrong. This can be seen from a minimal example:

double g(),h();
double f()
{
    return g()+h();
}

where gcc emits fstpl/fldl around the second call rather than fstpt/fldt.

So this is all even more broken than I thought. It looks like the only way to get deterministic behavior from GCC right now is to get the wrong deterministic behavior via -ffloat-store.

Note that libfirm/cparser gets the right result, emitting fstpt/fldt.
Comment 18 Rich Felker 2020-02-09 15:55:30 UTC
It was just pointed out to me that this might be an invalid test since GCC assumes (correctly or not) that the return value of a function does not have excess precision. I'll see if I can make a better test.
Comment 19 Rich Felker 2020-02-09 16:02:57 UTC
Test case provided by Szabolcs Nagy showing that GCC does seem to spill right if it can't assume there's no excess precision to begin with:

double h();
double ff(double x, double y)
{
    return x+y+h();
}

In theory this doesn't force a spill, but GCC seems to choose to do one, I guess to avoid having to preserve two incoming values (although they're already in stack slots that would be naturally preserved).

Here GCC 9.2 with -fexcess-precision=standard -O3 emits fstpt/fldt.
Comment 20 Alexander Cherepanov 2020-02-11 16:36:25 UTC
Minimized testcase that should still be quite close to the original:

----------------------------------------------------------------------
#include <stdio.h>

__attribute__((noipa)) // imagine it in a separate TU
static double opaque(double d) { return d; }

int main()
{
    double d;
    do {
        d = opaque(1e6);
    } while (opaque(0));

    if (d == 1e6)
        printf("yes\n");

    int i = d * 1e-6;
    printf("i = %d\n", i);

    if (i == 1)
        printf("equal to 1\n");
}
----------------------------------------------------------------------
$ gcc -std=gnu11 -pedantic -Wall -Wextra -m32 -march=i686 -O3 test.c && ./a.out
yes
i = 0
equal to 1
----------------------------------------------------------------------

According to https://godbolt.org/z/AmkmS5 , this happens for gcc versions 8.1--9.2 but not for trunk (I haven't tried earlier versions).

With gcc 8.3.0 from the stable Debian it works like this:
- (as described in comment 7) 120t.dom2 merges two `if`s, in particular deducing that `i == 1` is true if `d == 1e6` is true but not substituting `i` in `printf`;
- 142t.loopinit introduces `# d_4 = PHI <d_8(3)>` between the loop and the first `if`;
- 181t.dom3 would fold computation of `i` in the `d == 1e6` branch but the introduced `PHI` seems to prevent this.

With gcc from the trunk a new pass 180t.fre4 removes that `PHI` and 182t.dom3 then does its work. (The numbering of passes changed slightly since gcc 8.3.0.)
Comment 21 Alexander Cherepanov 2020-02-11 16:37:00 UTC
The following variation works with the trunk:

----------------------------------------------------------------------
#include <stdio.h>

__attribute__((noipa)) // imagine it in a separate TU
static int opaque(int i) { return i; }

int main()
{
    static int a = 0;
    int d = opaque(1);

    if (opaque(0))
        puts("ignore");
    // need the next `if` to be at the start of a BB
    
    if (d == 1)
        a = 1;

    int i = d - 0x1p-60;

    if (i == 1)
        printf("i = %d\n", i);

    printf("i = %d\n", i);

    opaque(a);
}
----------------------------------------------------------------------
$ gcc -std=gnu11 -pedantic -Wall -Wextra -m32 -march=i686 -O3 test.c && ./a.out
i = 1
i = 0
----------------------------------------------------------------------
gcc x86-64 version: gcc (GCC) 10.0.1 20200211 (experimental)
----------------------------------------------------------------------

All the same but the computation of `i` is hoisted from the `if` in the 133t.pre pass so dom3 doesn't have a chance to fold it.

Another interesting aspect: there are no comparisons of floating-point numbers in this example, all FP operations are limited to basic arithmetic and a conversion.
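
Why `0x1p-60` is enough to trigger the difference can be seen by evaluating `d - 0x1p-60` in each format separately. This is a sketch with hypothetical function names; `long double` is assumed to model the x87 register format, and the `volatile` objects force each result into the named format.

```c
#include <float.h>

/* d - 2^-60 evaluated in double: for d == 1 the subtrahend is below
   half an ulp (2^-54), so the result rounds back to exactly 1.0. */
static int minus_tiny_in_double(int d)
{
    volatile double r = d - 0x1p-60;
    return (int)r;
}

/* Same expression in long double: with 60 or more significand bits
   1 - 2^-60 is representable, so the conversion truncates it to 0. */
static int minus_tiny_in_extended(int d)
{
    volatile long double r = (long double)d - 0x1p-60L;
    return (int)r;
}
```

For `d == 1` the first function should return 1 and, where `long double` is wide enough, the second should return 0, matching the two values of `i` printed by the testcase.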
Comment 22 Alexander Cherepanov 2020-02-11 16:40:42 UTC
(In reply to joseph@codesourcery.com from comment #11)
> Yes, I agree that any particular conversion to integer executed in the 
> abstract machine must produce some definite integer value for each 
> execution.
The idea that floating-point variables could be unstable but integer variables have to be stable seems like an arbitrary boundary. But I guess this is deeply ingrained in gcc: the optimizer just assumes that integers are stable (e.g., optimizes `if (x != y && y == z) use(x == z);` for integers to `if (x != y && y == z) use(0);`) but it's ready for instability of FPs (e.g., doesn't do the same optimization for FPs).
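
The integer case mentioned above can be written out as a small function. The name `guarded_compare` is made up for illustration, and the sketch shows only the source-level reasoning, not what any particular GCC pass emits.

```c
/* For integers the guard pins down the answer: if x != y and
   y == z, then x == z must be false, so a compiler may replace
   the comparison with the constant 0. */
static int guarded_compare(int x, int y, int z)
{
    if (x != y && y == z)
        return x == z;   /* always 0 on this path */
    return -1;           /* guard not taken */
}
```

The folding is valid only because each of `x`, `y` and `z` has a single value for the whole function; that is the stability property discussed here.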

When the stability of integers is violated everything blows up. This bug report shows that instability of floating-point values extends to integers via casts. Another way is via comparisons -- I've just filed bug 93681 for it. There is also a testcase there that shows how such an instability can taint surrounding code.

So, yeah, it seems integers have to be stable. OTOH, now that there is sse and there is -fexcess-precision=standard floating-point values are mostly stable too. Perhaps various optimizations done for integers could be enabled for FPs too? Or the situation is more complicated?
Comment 23 Alexander Cherepanov 2020-02-11 16:45:17 UTC
(In reply to Alexander Monakov from comment #10)
> Also note that both the original and the reduced testcase can be tweaked to
> exhibit the surprising transformation even when -fexcess-precision=standard
> is enabled. A "lazy" way is via -mpc64

I think this is another problem. I filed bug 93682 for it.
Comment 24 joseph@codesourcery.com 2020-02-12 00:37:41 UTC
On Tue, 11 Feb 2020, ch3root at openwall dot com wrote:

> So, yeah, it seems integers have to be stable. OTOH, now that there is sse and
> there is -fexcess-precision=standard floating-point values are mostly stable
> too. Perhaps various optimizations done for integers could be enabled for FPs
> too? Or the situation is more complicated?

Well, 0.0 == -0.0 but it's not valid to substitute one for the other (and 
similarly with decimal quantum exponents), for example, so floating-point 
certainly has different rules for what's valid in this area.
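
The signed-zero case can be illustrated in a few lines. This is a sketch assuming IEEE 754 arithmetic with default (non-trapping) exception handling; the function names are hypothetical, and `volatile` is used only to keep the compiler from folding the expressions away.

```c
#include <math.h>

/* Returns 1 when +0.0 and -0.0 compare equal. */
static int zeros_compare_equal(void)
{
    volatile double pz = 0.0, nz = -0.0;
    return pz == nz;
}

/* Returns 1 when the two zeros can still be told apart, here via
   the sign bit and via the sign of the infinity from 1.0 / zero. */
static int zeros_are_distinguishable(void)
{
    volatile double pz = 0.0, nz = -0.0;
    return !signbit(pz) && signbit(nz)
        && 1.0 / pz > 0.0 && 1.0 / nz < 0.0;
}
```

Both functions should return 1 on an IEEE 754 system, which is why replacing one zero by the other on the strength of an `==` test would change observable behavior.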

I think fewer and fewer people care about x87 floating point nowadays; 
32-bit libraries are primarily for running old binaries, not new code.  
So x87 excess precision issues other than maybe the ABI ones for excess 
precision returns from standard library functions will become irrelevant 
in practice as people build 32-bit libraries with SSE (cf. 
<https://fedoraproject.org/wiki/Changes/Update_i686_architectural_baseline_to_include_SSE2>), 
and even the ABI ones will disappear in the context of builds with SSE as 
the remaining float and double not-bound-to-IEEE754-operations glibc libm 
functions with .S implementations move to C implementations once there are 
suitably optimized C implementations that prove faster in benchmarking.  
I'd encourage people who care about reliability with floating point on 
32-bit x86 to do that benchmarking work to justify such removals of 
x86-specific assembly.

However, if you want to fix such issues in GCC, it might be plausible to 
force the standard-conforming excess precision handling to always-on for 
x87 floating point (maybe except for the part relating to constants, since 
that seems to confuse users more).  There would still be the question of 
what to do with -mfpmath=sse+387.
Comment 25 Rich Felker 2020-02-12 02:09:44 UTC
I think standards-conforming excess precision should be forced on, and added to C++; there are just too many dangerous ways things can break as it is now. If you really think this is a platform of dwindling relevance (though I question that; due to the way patent lifetimes work, the first viable open-hardware x86 clones will almost surely lack sse, no?) then we should not have dangerous hacks for the sake of marginal performance gains, with too few people spending the time to deal with their fallout.

I'd be fine with an option to change the behavior of constants, and have it set by default for -std=gnu* as long as the unsafe behavior is removed from -std=gnu*.
Comment 26 joseph@codesourcery.com 2020-02-12 18:03:54 UTC
Adding the support for C++ would also be a matter for people who care 
about this platform that few people do now care about.

I suspect that if you force the back-end insn pattern effects of 
standards-conforming excess precision to on (i.e. stop the back end 
claiming to have SFmode / DFmode operations that use x87 floating point), 
that will still work for languages without special support, because of the 
optabs code that handles expanding using a wider mode as necessary (even 
if not safe) - but while that would get XFmode spills, the GIMPLE code 
would still think some operations were being carried out in SFmode / 
DFmode, so without the front-end support you wouldn't eliminate optimizer 
anomalies.
Comment 27 Vincent Lefèvre 2020-02-12 18:07:45 UTC
(In reply to Rich Felker from comment #25)
> I think standards-conforming excess precision should be forced on, and added
> to C++; there are just too many dangerous ways things can break as it is
> now.

+1

People who currently build x87 software could choose between different solutions, such as switching to SSE if possible, switching back to "fast" excess precision if speed is really important and they know that it will work (this requires testing). They could even try -ffast-math (some of its optimizations are actually less dangerous than "fast" excess precision)...
Comment 28 Alexander Cherepanov 2020-02-18 17:16:39 UTC
The -funsafe-math-optimizations option has a similar problem (on all processors, I assume) -- I've just filed pr93806 for it. I guess unstable FP results are essential for this mode but integers computed from FPs should be somehow guarded from instability.

I don't know if anybody cares about -funsafe-math-optimizations but if it's fixed then x87 will get this improvement for free:-)