$ cat test-float.c
#include <float.h>
#include <assert.h>

union gl_long_double_union
{
  struct { double hi; double lo; } dd;
  long double ld;
};

const union gl_long_double_union gl_LDBL_MAX =
  { { DBL_MAX, DBL_MAX / (double)134217728UL / (double)134217728UL } };
# undef LDBL_MAX
# define LDBL_MAX (gl_LDBL_MAX.ld)

int
main ()
{
  volatile long double m = LDBL_MAX;

  assert (m + m > m);
}
$ gcc -O2 test-float.c
$ ./a.out
a.out: test-float.c:20: main: Assertion `m + m > m' failed.
Aborted

test-float.c.234t.optimized contains:

m ={v} 1.79769313486231580793728971405302307166001572487e+308;

but that evaluates to Inf.  DBL_MAX is 1.79769313486231570814527423731704e+308L.
Does it work with -O0? I guess we fold the read from gl_LDBL_MAX in a "wrong" way, so perhaps native_interpret_expr is wrong?
No, it doesn't.
GCC 10.2 is released, adjusting target milestone.
(In reply to Andreas Schwab from comment #0)
> $ cat test-float.c
> #include <float.h>
> #include <assert.h>
>
> union gl_long_double_union
> {
>   struct { double hi; double lo; } dd;
>   long double ld;
> };
>
> const union gl_long_double_union gl_LDBL_MAX =
>   { { DBL_MAX, DBL_MAX / (double)134217728UL / (double)134217728UL } };
> # undef LDBL_MAX
> # define LDBL_MAX (gl_LDBL_MAX.ld)
>
> int
> main ()
> {
>   volatile long double m = LDBL_MAX;
>
>   assert (m + m > m);
> }
> $ gcc -O2 test-float.c
> $ ./a.out
> a.out: test-float.c:20: main: Assertion `m + m > m' failed.
> Aborted
>
> test-float.c.234t.optimized contains:
>
> m ={v} 1.79769313486231580793728971405302307166001572487e+308;
>
> but that evaluates to Inf.  DBL_MAX is
> 1.79769313486231570814527423731704e+308L.

This comes from gnulib's use of lib/float.h. My question is: why is gnulib using its own float.h on power? What makes the system float.h unusable?

Even if you fix this for your package including gnulib, the next failure you run into is this one:

test-float.c:324: assertion 'x + x == x' failed
Aborted (core dumped)

Extracting from the test case:

#include <stdio.h>
#include <assert.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  int n = 107;
  volatile long double m = LDBL_MAX;
  volatile long double pow2_n = powl (2, n);
  volatile long double x = m + (m / pow2_n);

  printf ("n = %d\n", n);
  printf ("m = %Lf (%La)\n", m, m);
  printf ("pow2_n = %Lf (%La)\n", pow2_n, pow2_n);
  printf ("m / pow2_n = %Lf (%La)\n", (m / pow2_n), (m / pow2_n));
  printf ("x = %Lf (%La)\n", x, x);

  if (x > m)
    assert (x + x == x);

  return 0;
}

gcc -o ~/test-ldbl-max ~/test-ldbl-max.c -lm
~/test-ldbl-max
n = 107
m = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffff8p+1023)
pow2_n = 162259276829213363391578010288128.000000 (0x1p+107)
m / pow2_n = 1107913932560222581216724223049124694376931327937918798971295069363205703164244740389102844506567402654244799528342026118673562844811584683014545030137100678976901567468093855075985516353544747282849589098225960074532039651619564827101237983225846137075291097947344654582153216.000000 (0x1.fffffffffffff7ffffffffffff8p+916)
x = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffffcp+1023)
test-ldbl-max: /root/test-ldbl-max.c:21: main: Assertion `x + x == x' failed.
Aborted (core dumped)

Is this just a function of double double? That there is something representable that is larger than LDBL_MAX, but isn't valid given the double-double rules?
Confirmed.
The problem is that this gl_LDBL_MAX.ld really is the largest normalized double-double number, but it is one ulp larger than GCC's __LDBL_MAX__. The former is:

0x1.fffffffffffff7ffffffffffffc000p+1023

and the latter is:

0x1.fffffffffffff7ffffffffffff8000p+1023

The reason GCC doesn't like the former and treats it as infinity is that GCC internally treats double double as having 106-bit precision, but the former number is too large for 106-bit precision; it requires 107-bit precision. If we wanted to handle double double "properly" in GCC, we'd need to emulate it the way it is actually implemented, as a pair of doubles, and have all the operations defined on those pairs (the question is what to do for transcendentals etc.), by recursing into real_* operations on both doubles.
Bisection points to my change, r280141 aka r10-5900-gea69031c5facc70e4a96df83cd58702900fd54b6. That changed:

-  _1 = gl_LDBL_MAX.ld;
-  m ={v} _1;

to:

+  m ={v} 1.79769313486231580793728971405302307166001572487395108634e+308;

So, either on the gnulib side one can drop the const from gl_LDBL_MAX so that nothing tries to optimize it (or make it const volatile?), or perhaps the compiler could completely punt on all optimizations with double double in the

+  if (len > 0)
+    return native_interpret_expr (type, buf, len);

gimple-fold.c code (i.e. when using native_encode_initializer first) when the target format is double double, or just punt for this specific case?
Or, as an ugly hack, for floating types with MODE_COMPOSITE_P (TYPE_MODE (type)) in that spot, after using native_interpret_expr do native_encode_expr again and compare whether the bits are identical (or perhaps do it for all floating point values, e.g. to deal with Intel magic values, NaN canonicalization etc.?)
Created attachment 49045 [details]
gcc11-pr95450.patch

Untested fix. Or, as I said, it could be limited with an additional && MODE_COMPOSITE_P (element_mode (type)) check too.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:9f2f79df19fbfaa1c4be313c2f2b5ce04646433e

commit r11-2830-g9f2f79df19fbfaa1c4be313c2f2b5ce04646433e
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Aug 25 07:17:10 2020 +0200

    gimple-fold: Don't optimize wierdo floating point value reads [PR95450]
    
    My patch to introduce native_encode_initializer into fold_ctor_reference
    apparently broke gnulib/m4 on powerpc64.  There it uses a const union
    with two doubles and the corresponding IBM double double long double,
    which actually is the largest normalizable long double value (1 ulp
    higher than __LDBL_MAX__).  The reason our __LDBL_MAX__ is smaller is
    that we internally treat the double double type as one having 106-bit
    precision, but it actually has a variable 53-bit to 2000-ish bit
    precision, and for the 0x1.fffffffffffff7ffffffffffffc000p+1023L value
    gnulib uses we need 107-bit precision; therefore for GCC __LDBL_MAX__
    is 0x1.fffffffffffff7ffffffffffff8000p+1023L.
    
    Before my changes, we wouldn't be able to fold_ctor_reference it and it
    worked fine at runtime, but with the change we are able to do that, and
    because it is larger than anything we can handle internally, we treat
    it weirdly.  A similar problem would arise if somebody creates this way
    a valid value with much more than 106-bit precision, e.g.
    1.0 + 1.0e-768.
    
    Now, I think a similar problem could happen e.g. on i?86/x86_64 with
    the long double there; it also has some weird values in the format,
    e.g. the unnormals, pseudo infinities and various other magic values.
    
    This patch, for floating point types (including vector and complex
    types with such elements), will try to encode the returned value again
    and punt if it has a different memory representation from the original.
    Note, this is only done in the path where native_encode_initializer was
    used, in order not to affect e.g. just reading an unpunned long double
    value; the value should be compiler generated in that case and thus
    should be properly representable.  It will punt also if e.g. the
    padding bits are initialized to non-zero values.
    
    I think the verification that what we encode can be interpreted back
    would only be an internal consistency check (so perhaps only with
    ENABLE_CHECKING if flag_checking, but if both directions perform it,
    then we need to avoid mutual recursion).  While for the other direction
    (interpretation), at least for the broken-by-design long doubles we
    just know we can't represent all valid values in GCC.  The other
    floating point formats are just a theoretical case; perhaps we would
    canonicalize something to a value that wouldn't trigger an invalid
    exception when without canonicalization it would trigger it at runtime,
    so let's just ignore those.
    
    Adjusted (so far untested) patch to do it in native_interpret_real
    instead and limit it to the MODE_COMPOSITE_P cases, for which e.g.
    fold-const.c/simplify-rtx.c punts in several other places too, because
    we just know we can't represent everything.  E.g.
    
          /* Don't constant fold this floating point operation if the
             result may dependent upon the run-time rounding mode and
             flag_rounding_math is set, or if GCC's software emulation
             is unable to accurately represent the result.  */
          if ((flag_rounding_math
               || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
              && (inexact || !real_identical (&result, &value)))
            return NULL_TREE;
    
    Or perhaps guard it with MODE_COMPOSITE_P (mode) &&
    !flag_unsafe_math_optimizations too, thus breaking what gnulib/m4 does
    with -ffast-math, but not normally?
    
    2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    
    	PR target/95450
    	* fold-const.c (native_interpret_real): For MODE_COMPOSITE_P modes
    	punt if the to be returned REAL_CST does not encode to the bitwise
    	same representation.
    
    	* gcc.target/powerpc/pr95450.c: New test.
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7e53436da1902061af797d0aaa744c52bd9829ae

commit r10-8669-g7e53436da1902061af797d0aaa744c52bd9829ae
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Aug 25 07:17:10 2020 +0200

    gimple-fold: Don't optimize wierdo floating point value reads [PR95450]
    
    My patch to introduce native_encode_initializer into fold_ctor_reference
    apparently broke gnulib/m4 on powerpc64.  There it uses a const union
    with two doubles and the corresponding IBM double double long double,
    which actually is the largest normalizable long double value (1 ulp
    higher than __LDBL_MAX__).  The reason our __LDBL_MAX__ is smaller is
    that we internally treat the double double type as one having 106-bit
    precision, but it actually has a variable 53-bit to 2000-ish bit
    precision, and for the 0x1.fffffffffffff7ffffffffffffc000p+1023L value
    gnulib uses we need 107-bit precision; therefore for GCC __LDBL_MAX__
    is 0x1.fffffffffffff7ffffffffffff8000p+1023L.
    
    Before my changes, we wouldn't be able to fold_ctor_reference it and it
    worked fine at runtime, but with the change we are able to do that, and
    because it is larger than anything we can handle internally, we treat
    it weirdly.  A similar problem would arise if somebody creates this way
    a valid value with much more than 106-bit precision, e.g.
    1.0 + 1.0e-768.
    
    Now, I think a similar problem could happen e.g. on i?86/x86_64 with
    the long double there; it also has some weird values in the format,
    e.g. the unnormals, pseudo infinities and various other magic values.
    
    This patch, for floating point types (including vector and complex
    types with such elements), will try to encode the returned value again
    and punt if it has a different memory representation from the original.
    Note, this is only done in the path where native_encode_initializer was
    used, in order not to affect e.g. just reading an unpunned long double
    value; the value should be compiler generated in that case and thus
    should be properly representable.  It will punt also if e.g. the
    padding bits are initialized to non-zero values.
    
    I think the verification that what we encode can be interpreted back
    would only be an internal consistency check (so perhaps only with
    ENABLE_CHECKING if flag_checking, but if both directions perform it,
    then we need to avoid mutual recursion).  While for the other direction
    (interpretation), at least for the broken-by-design long doubles we
    just know we can't represent all valid values in GCC.  The other
    floating point formats are just a theoretical case; perhaps we would
    canonicalize something to a value that wouldn't trigger an invalid
    exception when without canonicalization it would trigger it at runtime,
    so let's just ignore those.
    
    Adjusted (so far untested) patch to do it in native_interpret_real
    instead and limit it to the MODE_COMPOSITE_P cases, for which e.g.
    fold-const.c/simplify-rtx.c punts in several other places too, because
    we just know we can't represent everything.  E.g.
    
          /* Don't constant fold this floating point operation if the
             result may dependent upon the run-time rounding mode and
             flag_rounding_math is set, or if GCC's software emulation
             is unable to accurately represent the result.  */
          if ((flag_rounding_math
               || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
              && (inexact || !real_identical (&result, &value)))
            return NULL_TREE;
    
    Or perhaps guard it with MODE_COMPOSITE_P (mode) &&
    !flag_unsafe_math_optimizations too, thus breaking what gnulib/m4 does
    with -ffast-math, but not normally?
    
    2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    
    	PR target/95450
    	* fold-const.c (native_interpret_real): For MODE_COMPOSITE_P modes
    	punt if the to be returned REAL_CST does not encode to the bitwise
    	same representation.
    
    	* gcc.target/powerpc/pr95450.c: New test.
    
    (cherry picked from commit 9f2f79df19fbfaa1c4be313c2f2b5ce04646433e)
Fixed for 10.3+ and 11+.