Bug 107569 - [13 Regression] Failure to optimize std::isfinite since r13-3596
Summary: [13 Regression] Failure to optimize std::isfinite since r13-3596
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 13.0
: P1 normal
Target Milestone: 13.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-11-08 12:09 UTC by Jakub Jelinek
Modified: 2023-03-23 16:22 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.2.0
Known to fail:
Last reconfirmed: 2023-02-27 00:00:00


Attachments
preprocessed testcase for comment #2 (63.16 KB, text/plain)
2022-11-09 14:30 UTC, Aldy Hernandez
Details
gcc13-pr107569.patch (2.07 KB, patch)
2022-11-10 09:36 UTC, Jakub Jelinek
Details | Diff
gcc13-pr107569.patch (2.27 KB, patch)
2022-11-10 10:40 UTC, Jakub Jelinek
Details | Diff
do not set NAN sign in frange::set_nonnegative() (1.09 KB, application/mbox)
2022-11-10 13:35 UTC, Aldy Hernandez
Details
gcc13-pr107569-div.patch (1.67 KB, patch)
2022-11-10 16:12 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2022-11-08 12:09:24 UTC
Since r13-3596-ge7310e24b1c0ca67b1bb507c1330b2bf39e59e32
we don't optimize foo in 1 into return 1:

namespace std
{
  constexpr bool isfinite (float x) { return __builtin_isfinite (x); }
  constexpr bool isfinite (double x) { return __builtin_isfinite (x); }
  constexpr bool isfinite (long double x) { return __builtin_isfinite (x); }
}

bool
foo (double x)
{
  if (!std::isfinite (x))
    __builtin_unreachable ();

  return std::isfinite (x);
}

bool
bar (double x)
{
  [[assume (std::isfinite (x))]];
  return std::isfinite (x);
}

While bar hasn't been optimized before, it would be nice to make it working too.
Comment 1 Pilar Latiesa 2022-11-08 12:59:26 UTC
I also noted that the following testcase:

#include <cmath>

struct TVec { double x, y; };

double dot(TVec const &u, TVec const &v)
  { return u.x * v.x + u.y * v.y; }

double mag(TVec const &u)
{
    if (!(dot(u, u) >= 0.0)) __builtin_unreachable();
    return std::sqrt(dot(u, u));
}

results in:

(with -O3 -march=skylake --param=vrp1-mode=vrp)

mag(TVec const&):
        vmovsd  xmm1, QWORD PTR [rdi+8]
        vmovsd  xmm0, QWORD PTR [rdi]
        vmulsd  xmm1, xmm1, xmm1
        vfmadd132sd     xmm0, xmm1, xmm0
        vsqrtsd xmm0, xmm0, xmm0
        ret

(with -O3 -march=skylake --param=vrp1-mode=ranger)

mag(TVec const&):
        vmovsd  xmm1, QWORD PTR [rdi+8]
        vmovsd  xmm0, QWORD PTR [rdi]
        vmulsd  xmm1, xmm1, xmm1
        vfmadd132sd     xmm0, xmm1, xmm0
        vxorpd  xmm1, xmm1, xmm1
        vucomisd        xmm1, xmm0
        ja      .L11
        vsqrtsd xmm0, xmm0, xmm0
        ret
.L11:
        jmp     sqrt
Comment 2 Jakub Jelinek 2022-11-08 13:34:32 UTC
Perhaps we should in vrp1 defer removal of __builtin_unreachable (), at least
the cases where we don't turn that into some useful range?
That of course doesn't improve the assume attribute case.
Comment 3 Andrew Macleod 2022-11-08 14:46:27 UTC
(In reply to Jakub Jelinek from comment #2)
> Perhaps we should in vrp1 defer removal of __builtin_unreachable (), at least
> the cases where we don't turn that into some useful range?
> That of course doesn't improve the assume attribute case.

well we do improve it tho.  in vrp1:

Global Exported: _5 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
Global Exported (via unreachable): _9 = [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
Removing basic block 3
Merging blocks 2 and 4
double mag (const struct TVec & u)
{
  double _3;
  double _5;
  double _6;
  double _7;
  double _8;
  double _9;

  <bb 2> [local count: 1073741824]:
  _3 = u_2(D)->x;
  _6 = _3 * _3;
  _7 = u_2(D)->y;
  _8 = _7 * _7;
  _9 = _6 + _8;
  _5 = sqrt (_9);
  return _5;

The problem appears to be that the cdce pass runs afterwards, and introduces:

  _3 = u_2(D)->x;
  _6 = _3 * _3;
  _7 = u_2(D)->y;
  _8 = _7 * _7;
  _9 = _6 + _8;
  DCE_COND_LB.7_10 = _9;
  DCE_COND_LB_TEST.8_11 = DCE_COND_LB.7_10 u>= 0.0;
  if (DCE_COND_LB_TEST.8_11 != 0)
    goto <bb 5>; [99.95%]
  else
    goto <bb 4>; [0.05%]

  <bb 5> [local count: 1073204960]:
  _5 = .SQRT (_9);
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 536864]:
  _12 = sqrt (_9);

without checking the there is a global range for _9 which is [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN

now, does that Nan cause u>= 0 not to be true? I notice we don't fold that in vrp2.   I will let aldy pitch in on the rest of this.
Comment 4 Jakub Jelinek 2022-11-08 14:53:24 UTC
The cdce case is something I've mentioned today:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html
The comparisons in there are artificial and so unlike user comparisons we should (if they were marked somehow) be able to optimize them away if frange can prove their result.
But that isn't something happening on the #c0 testcase, is it?
Comment 5 Andrew Macleod 2022-11-08 15:14:30 UTC
(In reply to Jakub Jelinek from comment #4)
> The cdce case is something I've mentioned today:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html
> The comparisons in there are artificial and so unlike user comparisons we
> should (if they were marked somehow) be able to optimize them away if frange
> can prove their result.
> But that isn't something happening on the #c0 testcase, is it?

in vrp2 I see:
384      range_of_stmt () at stmt if (_9 u>= 0.0)
385        range_of_expr(_9) at stmt if (_9 u>= 0.0)
           TRUE : (385) range_of_expr (_9) [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
         TRUE : (384) range_of_stmt () [irange] bool VARYING

so we think that 
[frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN  u>=   0.0  does not fold.  

possibly some signalling NaN thing not allowing us to remove it?
Comment 6 Jakub Jelinek 2022-11-08 15:34:02 UTC
__builtin_isfinite (x) is implemented as
~UNGT_EXPR<ABS_EXPR<x>, DBL_MAX>.
So, if we have:
  _3 = ABS_EXPR <x_2(D)>;
  _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_4;
  return _5;
for assume function, we should start with [1,1] assumption on _5, that implies
_4 [0,0], and from UNGT_EXPR being false, we should derive that
_3 is [-inf, 1.79769313486231570814527423731704356798070567525844996599e+308] and not NaN (if any operand is NaN, UNGT_EXPR is true, it stands for unordered or greater).
And from _3 being [-inf, 1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
we should derive that x_2(D) is [-1.79769313486231570814527423731704356798070567525844996599e+308,
1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
(aka finite number).
Comment 7 Andrew Macleod 2022-11-08 15:53:46 UTC
(In reply to Jakub Jelinek from comment #6)
> __builtin_isfinite (x) is implemented as
> ~UNGT_EXPR<ABS_EXPR<x>, DBL_MAX>.
> So, if we have:
>   _3 = ABS_EXPR <x_2(D)>;
>   _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
>   _5 = ~_4;
>   return _5;
> for assume function, we should start with [1,1] assumption on _5, that
> implies
> _4 [0,0], and from UNGT_EXPR being false, we should derive that
> _3 is [-inf,
> 1.79769313486231570814527423731704356798070567525844996599e+308] and not NaN
> (if any operand is NaN, UNGT_EXPR is true, it stands for unordered or
> greater).
> And from _3 being [-inf,
> 1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
> we should derive that x_2(D) is
> [-1.79769313486231570814527423731704356798070567525844996599e+308,
> 1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
> (aka finite number).

oh, we're talking different cases.  lets se.

x_2(D) -> [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
_3 -> [frange] double [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
_4 -> [irange] bool [0, 0] NONZERO 0x0
_5 -> [irange] bool [1, 1]

so we arent getting that x_2 is not NaN.
THat would be a failing of foperator_abs::op1_range to not add the nans if the LHS has no NaNs then?  it looks like the range is correct, but non the NaN.

So I guess there are 2 different issues in this PR.  I was orignally looking at the second one.
Comment 8 Jakub Jelinek 2022-11-08 15:55:29 UTC
Under debugger I see unordered_gt::op1_range coming up with:
[frange] double [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
which looks right.
But then foperator_abs::op1_range makes:
[frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
out of that.  That is correct except for the +-NAN part.
Comment 9 Jakub Jelinek 2022-11-08 16:00:20 UTC
Before the
1281	  // Then add the negative of each pair:
1282	  // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
1283	  r.union_ (frange (type,
1284			    real_value_negate (&positives.upper_bound ()),
1285			    real_value_negate (&positives.lower_bound ())));
part r is
[frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
so that is still ok.
So perhaps we need to clear_nan on the newly created frange for the negative part
(or, if r is a maybe nan make sure it has varying bit and if it is not nan, not be in there).
Comment 10 Jakub Jelinek 2022-11-08 17:25:06 UTC
So:
--- gcc/range-op-float.cc.jj	2022-11-06 11:56:27.138137781 +0100
+++ gcc/range-op-float.cc	2022-11-08 18:13:18.026974667 +0100
@@ -1280,9 +1280,10 @@ foperator_abs::op1_range (frange &r, tre
     return true;
   // Then add the negative of each pair:
   // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
-  r.union_ (frange (type,
-		    real_value_negate (&positives.upper_bound ()),
-		    real_value_negate (&positives.lower_bound ())));
+  frange negatives (type, real_value_negate (&positives.upper_bound ()),
+		    real_value_negate (&positives.lower_bound ()));
+  negatives.clear_nan ();
+  r.union_ (negatives);
   return true;
 }
 
fixes the abs backwards propagation, but for some reason it doesn't help to help the assume case.
*.assumptions has:
x_2(D) -> [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+3
08 (0x0.fffffffffffff8p+1024)]
but
  # RANGE [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+3
08 (0x0.fffffffffffff8p+1024)] +-NAN
  double x_2(D) = x;
but then say vrp2:
_Z3bard._assume.0 assume inferred range of x_1(D) (param x) = [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.7976931
3486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
Global Exported: _3 = [frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +NAN
so it uses the range with +-NAN rather than without it.
Comment 11 Aldy Hernandez 2022-11-09 14:20:17 UTC
(In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > The cdce case is something I've mentioned today:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html
> > The comparisons in there are artificial and so unlike user comparisons we
> > should (if they were marked somehow) be able to optimize them away if frange
> > can prove their result.
> > But that isn't something happening on the #c0 testcase, is it?
> 
> in vrp2 I see:
> 384      range_of_stmt () at stmt if (_9 u>= 0.0)
> 385        range_of_expr(_9) at stmt if (_9 u>= 0.0)
>            TRUE : (385) range_of_expr (_9) [frange] double [-0.0
> (-0x0.0p+0), +Inf] +-NAN
>          TRUE : (384) range_of_stmt () [irange] bool VARYING
> 
> so we think that 
> [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN  u>=   0.0  does not fold.  
> 
> possibly some signalling NaN thing not allowing us to remove it?

By vrp2 I see:

=========== BB 2 ============
Imports: _6  _8  
Exports: _6  _8  _9  
         _9 : _6(I)  _8(I)  
    <bb 2> [local count: 1073741824]:
    _3 = u_2(D)->x;
    _6 = _3 * _3;
    _7 = u_2(D)->y;
    _8 = _7 * _7;
    _9 = _6 + _8;
    if (_9 u>= 0.0)
      goto <bb 3>; [99.95%]
    else
      goto <bb 4>; [0.05%]

_9 : [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
2->4  (F) _9 :  [frange] double [-0.0 (-0x0.0p+0), 0.0 (0x0.0p+0)]
2->3  (T) _9 :  [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN

We can't fold the u>= 0.0 because that would remove a possible NAN at runtime, and it would be user-visible for signalling NANs.  At least, that's been my understanding all along.  You'll notice most of folding of relops in range-op-float.cc is predicated by !maybe_isnan(op1, op2) to disable them if NAN is a possibility, which in the case of _9 could happen:

  else if (!maybe_isnan (op1, op2))
    {
      if (real_compare (GE_EXPR, &op1.lower_bound (), &op2.upper_bound ()))
	r = range_true (type);
      else if (!real_compare (GE_EXPR, &op1.upper_bound (), &op2.lower_bound ()))
	r = range_false (type);
      else
	r = range_true_and_false (type);
    }

However, we should be able to mark that the 2->4 edge as undefined.  For example, on the 2->4 edge, _9 is technically UNDEFINED.  But we can't do that because frange's are represented as closed intervals, so we can't represent < 0.0 on the false side.  Instead we represent it as +-0.0 which intersected with _9 is still +-0.0.

If we could represent < 0.0 with one less ulp, say -0.000000000000001 (or whatever), intersecting that with the known _9 would yield UNDEFINED.  I don't know if that would help in this PR, but ISTM the threader would be able to do something (or some other pass??).

Unfortunately, we don't represent < and > very well.  Technically all we would need is to tweak range-op-float.cc's build_gt() and build_lt() to do real_nexafter, but as with everything FP, the devil is in the details (what do we do when we go past the representable range for -ffinite-math-only, etc etc).  It does seem a bit late in the cycle to fudge with this, but it is possible.

Again, I don't know if this helps the PR, but these are the 2 issues I see.
Comment 12 Aldy Hernandez 2022-11-09 14:27:56 UTC
It looks like the code reading from the blob in SSA_NAME_RANGE_INFO and populating an frange is always leaving the NAN bit toggled even if it wasn't in the stored range.

Does this help?

diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 462447ba250..7ba7a16edc2 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -283,6 +283,8 @@ frange_storage_slot::get_frange (frange &r, tree type) const
   // make sure to set the NAN sign if known.
   if (HONOR_NANS (type) && (m_pos_nan ^ m_neg_nan) == 1)
     r.update_nan (m_neg_nan);
+  else if (!m_pos_nan && !m_neg_nan)
+    r.clear_nan ();
 }
 
 bool
Comment 13 Aldy Hernandez 2022-11-09 14:30:42 UTC
Created attachment 53861 [details]
preprocessed testcase for comment #2
Comment 14 Pilar Latiesa 2022-11-09 15:01:39 UTC
I have tested the testcase in comment #1 with Clang, and I realized that Clang trunk avoids the tailcall to sqrt even without any hint with __builtin_unreachable: https://godbolt.org/z/5sb8bYcoq 

Clang is somehow smart enough to realize that dot(u, u) is always non-negative.
Comment 15 Jakub Jelinek 2022-11-09 15:10:01 UTC
We don't have multiplication wired in frange, that is something we talked about today on gcc-patches.
Comment 16 Jakub Jelinek 2022-11-09 15:38:23 UTC
I've filed PR107591 for the lack of x * x range optimization.
Comment 17 GCC Commits 2022-11-09 17:21:33 UTC
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:4eadbe80060ab6c45193a1a57fac84b035e1c328

commit r13-3860-g4eadbe80060ab6c45193a1a57fac84b035e1c328
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Nov 9 16:05:08 2022 +0100

    Clear NAN when reading back a global range if necessary.
    
    When reading back from the global store, we must clear the NAN bit if
    necessary.  The reason it's not happening is because the constructor
    sets a NAN by default (when HONOR_NANS).  We must be careful to clear
    the NAN bit if the original range didn't have a NAN.
    
    I have commented the reason we use the constructor instead of filling
    out the fields by hand, because it wasn't clear at re-reading this
    code.
    
            PR 107569/tree-optimization
    
    gcc/ChangeLog:
    
            * value-range-storage.cc (frange_storage_slot::get_frange): Clear
            NAN if appropriate.
            * value-range.cc (range_tests_floats): New test.
Comment 18 Jakub Jelinek 2022-11-09 18:21:20 UTC
Ok, just so that I don't just kibbitz/review frange stuff but also try to do something, here is my so far untested attempt at normal multiplication fold_range  (not the x * x stuff discussed elsewhere):

--- range-op-float.cc.jj1	2022-11-09 18:00:28.612247613 +0100
+++ range-op-float.cc	2022-11-09 19:06:11.075716000 +0100
@@ -1908,6 +1908,88 @@ class foperator_minus : public range_ope
   }
 } fop_minus;
 
+/* Wrapper around frange_arithmetics, that computes the result
+   if inexact rounded to both directions.  Also, if one of the
+   operands is +-0.0 and another +-inf, return +-0.0 rather than
+   NAN.  */
+
+static void
+frange_mult (tree type, REAL_VALUE_TYPE &result_lb, REAL_VALUE_TYPE &result_ub,
+	     const REAL_VALUE_TYPE &op1, const REAL_VALUE_TYPE &op2)
+{
+  if (real_iszero (&op1) && real_isinf (&op2))
+    {
+      result_lb = op1;
+      if (real_isneg (&op2))
+	real_value_negate (&result_lb);
+      result_ub = result_lb;
+    }
+  else if (real_isinf (&op1) && real_iszero (&op2))
+    {
+      result_lb = op2;
+      if (real_isneg (&op1))
+	real_value_negate (&result_lb);
+      result_ub = result_lb;
+    }
+  else
+    {
+      frange_arithmetic (MULT_EXPR, type, result_lb, op1, op2, dconstninf);
+      frange_arithmetic (MULT_EXPR, type, result_ub, op1, op2, dconstinf);
+    }
+}
+
+class foperator_mult : public range_operator_float
+{
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+		tree type,
+		const REAL_VALUE_TYPE &lh_lb,
+		const REAL_VALUE_TYPE &lh_ub,
+		const REAL_VALUE_TYPE &rh_lb,
+		const REAL_VALUE_TYPE &rh_ub) const final override
+  {
+    REAL_VALUE_TYPE cp[8];
+    // Do a cross-product.
+    frange_mult (type, cp[0], cp[4], lh_lb, rh_lb);
+    frange_mult (type, cp[1], cp[5], lh_lb, rh_ub);
+    frange_mult (type, cp[2], cp[6], lh_ub, rh_lb);
+    frange_mult (type, cp[3], cp[7], lh_ub, rh_ub);
+    for (int i = 1; i < 3; ++i)
+      {
+	if (real_less (&cp[i], &cp[0])
+	    || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
+	  std::swap (cp[i], cp[0]);
+	if (real_less (&cp[4], &cp[i + 4])
+	    || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
+	  std::swap (cp[i + 4], cp[4]);
+      }
+    lb = cp[0];
+    ub = cp[4];
+
+    // [+-0, +-0] * [+INF,+INF] (or [-INF,-INF] or swapped is a known NaN.
+    if ((real_iszero (&lh_lb) && real_iszero (&lh_ub)
+	 && real_isinf (&rh_lb) && real_isinf (&rh_ub, real_isneg (&rh_lb)))
+	|| (real_iszero (&rh_lb) && real_iszero (&rh_ub)
+	    && real_isinf (&lh_lb) && real_isinf (&lh_ub, real_isneg (&lh_lb))))
+      {
+	real_nan (&lb, NULL, 0, TYPE_MODE (type));
+	ub = lb;
+	maybe_nan = true;
+      }
+    // Otherwise, if one range includes zero and the other ends with +-INF,
+    // it is a maybe NaN.
+    else if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+	     && real_compare (GE_EXPR, &lh_ub, &dconst0)
+	     && (real_isinf (&rh_lb) || real_isinf (&rh_ub)))
+      maybe_nan = true;
+    else if (real_compare (LE_EXPR, &rh_lb, &dconst0)
+	     && real_compare (GE_EXPR, &rh_ub, &dconst0)
+	     && (real_isinf (&lh_lb) || real_isinf (&lh_ub)))
+      maybe_nan = true;
+    else
+      maybe_nan = false;
+  }
+} fop_mult;
+
 // Instantiate a range_op_table for floating point operations.
 static floating_op_table global_floating_table;
 
@@ -1942,6 +2024,7 @@ floating_op_table::floating_op_table ()
   set (NEGATE_EXPR, fop_negate);
   set (PLUS_EXPR, fop_plus);
   set (MINUS_EXPR, fop_minus);
+  set (MULT_EXPR, fop_mult);
 }
 
 // Return a pointer to the range_operator_float instance, if there is

For
float
foo (float x, float y)
{
  if (x < -17.0f || x > 19.5f || y < -25.0f || y > 15.5f)
    return 0.0;
  return x * y;
}

float
bar (float x, float y)
{
  if (x > -17.0f && x < 19.5f && y > -25.0f && y < 15.5f)
    return x * y;
  return 0.0;
}
the product range is
[frange] float [-4.875e+2 (-0x0.f3cp+9), 4.25e+2 (0x0.d48p+9)] +-NAN
in the first case and
[frange] float [-4.875e+2 (-0x0.f3cp+9), 4.25e+2 (0x0.d48p+9)]
in the latter (where NAN can't appear).  And while +-0.0 is in the range of even both operands, neither range includes any infinities.
Comment 19 Aldy Hernandez 2022-11-10 09:32:06 UTC
(In reply to Jakub Jelinek from comment #18)
> Ok, just so that I don't just kibbitz/review frange stuff but also try to do
> something, here is my so far untested attempt at normal multiplication
> fold_range  (not the x * x stuff discussed elsewhere):

Looking good!  Thanks for tackling the hardest of them all.
Comment 20 Jakub Jelinek 2022-11-10 09:36:24 UTC
Created attachment 53865 [details]
gcc13-pr107569.patch

Here is an updated patch (including the incremental patch) with some fixes.
Previous version caused some ICEs, that is now fixed, but there is
+FAIL: gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040 2
regression (will file a separate PR about it, it isn't just the multiplication
regression but already addition), and
+FAIL: gfortran.dg/fmt_g0_6.f08   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
+FAIL: gfortran.dg/fmt_g0_6.f08   -O3 -g  execution test
+FAIL: math
+FAIL: special_functions/08_cyl_bessel_j/check_value.cc execution test
+FAIL: tr1/5_numerical_facilities/special_functions/09_cyl_bessel_j/check_value.cc execution test
I need to investigate.
Comment 21 Jakub Jelinek 2022-11-10 10:22:42 UTC
Ok, found a brown paper bag issue in the patch:
--- gcc/range-op-float.cc2022-11-09 21:31:09.420369509 +0100
+++ gcc/range-op-float.cc2022-11-09 21:31:09.420369509 +0100
@@ -1981,7 +1981,7 @@
         frange_mult (type, cp[2], cp[6], lh_ub, rh_lb);
       }
     frange_mult (type, cp[3], cp[7], lh_ub, rh_ub);
-    for (int i = 1; i < 3; ++i)
+    for (int i = 1; i < 4; ++i)
       {
         if (real_less (&cp[i], &cp[0])
             || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
With this the gfortran.dg/fmt_g0_6.f08 FAILs are gone.
Comment 22 Jakub Jelinek 2022-11-10 10:29:13 UTC
Note, I was using:
double
foo (int x, int y)
{
  double r;
  if (x >= 0)
    return 0.0;
  switch (y)
    {
    case 1: r = 0.0; break;
    case 2: r = 1.0; break;
    default: r = 0.5; break;
    }
  return r * __builtin_pow (10.0, x);
}
as reduced testcase of what the Fortran testcase does, but I see there some unrelated problem:
Folding statement: _2 = __builtin_pow (1.0e+1, _1);
Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
The +NAN looks suspicious, shouldn't that be +-NAN ?
Of course once we handle POW builtins, if we are smart enough we should see that it is 10.0 ** [INT_MIN, -1] and so [0.0, 1.0e-1] (plus some larger ulp error because library functions aren't exactly 0.5ulp precise all the time).  But when we don't know what __builtin_pow does (from frange perspective), I don't see what
tells us that NAN with negative sign can't appear.
Comment 23 Jakub Jelinek 2022-11-10 10:40:08 UTC
Created attachment 53866 [details]
gcc13-pr107569.patch

Updated version of the patch I'll test now (if you don't have any nits).
Besides the thinko I've changed rv_fold to have relation_kind argument rather
than relation_trio, passing the latter to a function that doesn't take ranges of
the operands/result but just the real values is weird.
Comment 24 Aldy Hernandez 2022-11-10 11:34:07 UTC
(In reply to Jakub Jelinek from comment #22)

> Folding statement: _2 = __builtin_pow (1.0e+1, _1);
> Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
> The +NAN looks suspicious, shouldn't that be +-NAN ?
> Of course once we handle POW builtins, if we are smart enough we should see
> that it is 10.0 ** [INT_MIN, -1] and so [0.0, 1.0e-1] (plus some larger ulp
> error because library functions aren't exactly 0.5ulp precise all the time).
> But when we don't know what __builtin_pow does (from frange perspective), I
> don't see what
> tells us that NAN with negative sign can't appear.

Yeah, that +NAN looks very suspicious.  For that matter, it took me a while to figure out how we know that _2 can't be negative, because we don't have any range-op entries for __builtin_pow.

So...here's a trick to figure this out: --param=ranger-debug=tracegori

You'll see in the *evrp dump:

Folding statement: _2 = __builtin_pow (1.0e+1, _1);
45       range_of_stmt (_2) at stmt _2 = __builtin_pow (1.0e+1, _1);
         TRUE : (45) range_of_stmt (_2) [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
46       range_of_expr(_1) at stmt _2 = __builtin_pow (1.0e+1, _1);
         TRUE : (46) range_of_expr (_1) [frange] double VARYING +-NAN
47       range_of_stmt (_2) at stmt _2 = __builtin_pow (1.0e+1, _1);
         TRUE : (47)  cached (_2) [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN

So ranger was able to figure out immediately that _2 was positive.

Andrew added smarts to break into any given place, so we can break where the counter is 45:

(gdb) break breakpoint if index == 45

Yes, amazingly there's only one function named breakpoint() in the entire compiler ;-).

If you single step from there on, we run into:

  if (gimple_stmt_nonnegative_warnv_p (call, &strict_overflow_p))
    r.set_nonnegative (type);
  else if (gimple_call_nonnull_result_p (call)
	   || gimple_call_nonnull_arg (call))
    r.set_nonzero (type);
  else
    r.set_varying (type);

IIRC, we had some discussion upstream about the meaning of set_nonnegative, and we all agreed that nuking -NAN was the right thing.  Neat, huh? :)
Comment 25 Aldy Hernandez 2022-11-10 11:34:57 UTC
(In reply to Jakub Jelinek from comment #23)
> Created attachment 53866 [details]
> gcc13-pr107569.patch
> 
> Updated version of the patch I'll test now (if you don't have any nits).
> Besides the thinko I've changed rv_fold to have relation_kind argument rather
> than relation_trio, passing the latter to a function that doesn't take
> ranges of
> the operands/result but just the real values is weird.

And yes, OK by me.  You're far more qualified to opine in this area :).  Thanks.
Comment 26 Jakub Jelinek 2022-11-10 12:22:04 UTC
(In reply to Aldy Hernandez from comment #24)
> If you single step from there on, we run into:
> 
>   if (gimple_stmt_nonnegative_warnv_p (call, &strict_overflow_p))
>     r.set_nonnegative (type);
>   else if (gimple_call_nonnull_result_p (call)
> 	   || gimple_call_nonnull_arg (call))
>     r.set_nonzero (type);
>   else
>     r.set_varying (type);
> 
> IIRC, we had some discussion upstream about the meaning of set_nonnegative,
> and we all agreed that nuking -NAN was the right thing.  Neat, huh? :)

Is this done only for statements for which there isn't a ranges handler?
If so, given the IEEE 754 non-guarantee of NAN signs except for copy, abs, copysign and negate I'd say that we should have a ranges handler for all those ops and for anything else assume NAN sign is VARYING, including the above spot.
As for signed zeros in -fsigned-zeros (default) mode, wonder if we e.g. don't
say sqrt is nonnegative (even when sqrt (-0.0) is -0.0).
Comment 27 Aldy Hernandez 2022-11-10 12:47:25 UTC
(In reply to Jakub Jelinek from comment #26)
> (In reply to Aldy Hernandez from comment #24)
> > If you single step from there on, we run into:
> > 
> >   if (gimple_stmt_nonnegative_warnv_p (call, &strict_overflow_p))
> >     r.set_nonnegative (type);
> >   else if (gimple_call_nonnull_result_p (call)
> > 	   || gimple_call_nonnull_arg (call))
> >     r.set_nonzero (type);
> >   else
> >     r.set_varying (type);
> > 
> > IIRC, we had some discussion upstream about the meaning of set_nonnegative,
> > and we all agreed that nuking -NAN was the right thing.  Neat, huh? :)
> 
> Is this done only for statements for which there isn't a ranges handler?
> If so, given the IEEE 754 non-guarantee of NAN signs except for copy, abs,
> copysign and negate I'd say that we should have a ranges handler for all
> those ops and for anything else assume NAN sign is VARYING, including the
> above spot.

We're doing this for all GIMPLE_CALL's, so copy, abs, copysign are handled separately, either as builtins or in the IL directly (i.e. not calls).  Hmmm, maybe it's time to revisit what frange::set_nnonnegative() means (again).

Lemme think about this...there are very few set_nonnegative() calls in ranger.  It should be easy to contain.

> As for signed zeros in -fsigned-zeros (default) mode, wonder if we e.g. don't
> say sqrt is nonnegative (even when sqrt (-0.0) is -0.0).

It seems tree_call_nonnegative_warnv_p is already doing the right thing for sqrt?

    CASE_CFN_SQRT:
    CASE_CFN_SQRT_FN:
      /* sqrt(-0.0) is -0.0.  */
      if (!HONOR_SIGNED_ZEROS (type))
	return true;
      return RECURSE (arg0);
Comment 28 Jakub Jelinek 2022-11-10 12:54:00 UTC
(In reply to Aldy Hernandez from comment #27)
> > As for signed zeros in -fsigned-zeros (default) mode, wonder if we e.g. don't
> > say sqrt is nonnegative (even when sqrt (-0.0) is -0.0).
> 
> It seems tree_call_nonnegative_warnv_p is already doing the right thing for
> sqrt?
> 
>     CASE_CFN_SQRT:
>     CASE_CFN_SQRT_FN:
>       /* sqrt(-0.0) is -0.0.  */
>       if (!HONOR_SIGNED_ZEROS (type))
> 	return true;
>       return RECURSE (arg0);

Ah, ok.  So, if we during stage3 go through all the cases and convince ourselves it is good enough, it might be ok for signed zeros then.  Still the sign of NAN is separate question from that, and there we should set +NAN only when we see
CASE_CFN_FABS{,_FN} or ABS_EXPR, toggle the sign of NEGATE_EXPR, inherit from second argument for CASE_CFN_COPYSIGN{,_FN} and copy over for copies.
Comment 29 Aldy Hernandez 2022-11-10 13:19:08 UTC
(In reply to Jakub Jelinek from comment #28)
> (In reply to Aldy Hernandez from comment #27)
> > > As for signed zeros in -fsigned-zeros (default) mode, wonder if we e.g. don't
> > > say sqrt is nonnegative (even when sqrt (-0.0) is -0.0).
> > 
> > It seems tree_call_nonnegative_warnv_p is already doing the right thing for
> > sqrt?
> > 
> >     CASE_CFN_SQRT:
> >     CASE_CFN_SQRT_FN:
> >       /* sqrt(-0.0) is -0.0.  */
> >       if (!HONOR_SIGNED_ZEROS (type))
> > 	return true;
> >       return RECURSE (arg0);
> 
> Ah, ok.  So, if we during stage3 go through all the cases and convince
> ourselves it is good enough, it might be ok for signed zeros then.  Still
> the sign of NAN is separate question from that, and there we should set +NAN
> only when we see
> CASE_CFN_FABS{,_FN} or ABS_EXPR, toggle the sign of NEGATE_EXPR, inherit
> from second argument for CASE_CFN_COPYSIGN{,_FN} and copy over for copies.

From what I've seen, ABS is represented in the IL directly without the need for a builtin, at least by evrp time.  Is this not always the case?

Negate, copysign, and copy are already handled specially so no need to do anything there.

I have a patch I'm testing for the generic case.
Comment 30 Aldy Hernandez 2022-11-10 13:35:13 UTC
Created attachment 53869 [details]
do not set NAN sign in frange::set_nonnegative()

proposed patch in testing
Comment 31 Jakub Jelinek 2022-11-10 16:12:53 UTC
Created attachment 53873 [details]
gcc13-pr107569-div.patch

This is what I meant by complete nightmare - division.
Comment 32 Aldy Hernandez 2022-11-10 17:50:12 UTC
(In reply to Jakub Jelinek from comment #31)
> Created attachment 53873 [details]
> gcc13-pr107569-div.patch
> 
> This is what I meant by complete nightmare - division.

Ugh, yeah.  That's pretty bad.  (Not your code, the inevitable special casing.)

Could you abstract out functionality into short inline functions to make it more readable?  If it's too much hassle, or doesn't improve readability then don't... after all, it's all compartmentalized in a function in range-ops..which is the whole points of range-ops, being able to sand box all this knowledge.

Out of curiosity, why is it so much more complex than the integer versions which share a lot of code?
Comment 33 Aldy Hernandez 2022-11-10 19:14:31 UTC
(In reply to Jakub Jelinek from comment #31)
> Created attachment 53873 [details]
> gcc13-pr107569-div.patch
> 
> This is what I meant by complete nightmare - division.

We can take this to gcc-patches when you're done, but just a few thoughts...

+    // If +-0.0 is in both ranges, it is a maybe NAN.
+    if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+	&& real_compare (GE_EXPR, &lh_ub, &dconst0)
+	&& real_compare (LE_EXPR, &rh_lb, &dconst0)
+	&& real_compare (GE_EXPR, &rh_ub, &dconst0))

Perhaps we could provide frange::contains_zero_p ()?

+    // +-0.0 / +-0.0 or +-INF / +-INF is a known NAN.
+    if ((real_iszero (&lh_lb)
+	 && real_iszero (&lh_ub)
+	 && real_iszero (&rh_lb)
+	 && real_iszero (&rh_ub))

This looks like frange::contains_zerp_p () as well.

+	|| (real_isinf (&lh_lb)
+	    && real_isinf (&lh_ub, real_isneg (&lh_lb))
+	    && real_isinf (&rh_lb)
+	    && real_isinf (&rh_ub, real_isneg (&rh_lb))))

Note that, real_isinf with only one argument checks for +-INF.  But I think what you're looking for is frange::maybe_isinf.

Could your patch be simplified with some of these?

  // fpclassify like API
  bool known_isfinite () const;
  bool known_isnan () const;
  bool known_isinf () const;
  bool maybe_isnan () const;
  bool maybe_isnan (bool sign) const;
  bool maybe_isinf () const;
  bool signbit_p (bool &signbit) const;
  bool nan_signbit_p (bool &signbit) const;

We should ultimately avoid peeking at the end points unnecessarily in order to prepare ourselves for next release when we (hopefully) have sub-ranges.
Comment 34 Jakub Jelinek 2022-11-10 19:29:53 UTC
(In reply to Aldy Hernandez from comment #33)
> (In reply to Jakub Jelinek from comment #31)
> > Created attachment 53873 [details]
> > gcc13-pr107569-div.patch
> > 
> > This is what I meant by complete nightmare - division.
> 
> We can take this to gcc-patches when you're done, but just a few thoughts...
> 
> +    // If +-0.0 is in both ranges, it is a maybe NAN.
> +    if (real_compare (LE_EXPR, &lh_lb, &dconst0)
> +	&& real_compare (GE_EXPR, &lh_ub, &dconst0)
> +	&& real_compare (LE_EXPR, &rh_lb, &dconst0)
> +	&& real_compare (GE_EXPR, &rh_ub, &dconst0))
> 
> Perhaps we could provide frange::contains_zero_p ()?

Well, contains_p in irange is a method on the value range, while here we don't
have a frange, but just naked REAL_VALUE_TYPEs.
It is twice contains_zero_p...
> 
> +    // +-0.0 / +-0.0 or +-INF / +-INF is a known NAN.
> +    if ((real_iszero (&lh_lb)
> +	 && real_iszero (&lh_ub)
> +	 && real_iszero (&rh_lb)
> +	 && real_iszero (&rh_ub))
> 
> This looks like frange::contains_zerp_p () as well.

No, this is twice zero_p.  Due to signed zeros it isn't a singleton + contains_zero_p, just both boundaries are zero.

> +	|| (real_isinf (&lh_lb)
> +	    && real_isinf (&lh_ub, real_isneg (&lh_lb))
> +	    && real_isinf (&rh_lb)
> +	    && real_isinf (&rh_ub, real_isneg (&rh_lb))))
> 
> Note that, real_isinf with only one argument checks for +-INF.

I know.  I'm intentionally using one and 2 argument ones to verify
that lh is either [INF,INF] or [-INF,-INF], but not [-INF,INF].

>  But I think
> what you're looking for is frange::maybe_isinf.

Again, that works on frange, which I don't have here.
> 
> Could your patch be simplified with some of these?
> 
>   // fpclassify like API
>   bool known_isfinite () const;
>   bool known_isnan () const;
>   bool known_isinf () const;
>   bool maybe_isnan () const;
>   bool maybe_isnan (bool sign) const;
>   bool maybe_isinf () const;
>   bool signbit_p (bool &signbit) const;
>   bool nan_signbit_p (bool &signbit) const;
> 
> We should ultimately avoid peeking at the end points unnecessarily in order
> to prepare ourselves for next release when we (hopefully) have sub-ranges.

No, see above (at least for now).  The peeking at the end points is needed because those end points behave weirdly.
Comment 35 Aldy Hernandez 2022-11-11 07:15:17 UTC
(In reply to Jakub Jelinek from comment #34)
> (In reply to Aldy Hernandez from comment #33)

> > what you're looking for is frange::maybe_isinf.
> 
> Again, that works on frange, which I don't have here.

Ughhh...yeah...brain fart.  Sorry.
Comment 36 GCC Commits 2022-11-12 08:42:52 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2f7f9edd28d75a85a33599978f23811e679e443d

commit r13-3923-g2f7f9edd28d75a85a33599978f23811e679e443d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:33:01 2022 +0100

    range-op: Implement floating point multiplication fold_range [PR107569]
    
    The following patch implements frange multiplication, including the
    special case of x * x.  The callers don't tell us that it is x * x,
    just that it is either z = x * x or if (x == y) z = x * y;
    For irange that makes no difference, but for frange it can mean
    x is -0.0 and y is 0.0 if they have the same range that includes both
    signed and unsigned zeros, so we need to assume result could be -0.0.
    
    The patch causes one regression:
    +FAIL: gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040 2
    but that is already tracked in PR107608 and affects not just the newly
    added multiplication, but addition and other floating point operations
    (and doesn't seem like a ranger bug but dce or whatever else).
    
    2022-11-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/107569
            PR tree-optimization/107591
            * range-op.h (range_operator_float::rv_fold): Add relation_kind
            argument.
            * range-op-float.cc (range_operator_float::fold_range): Name
            last argument trio and pass trio.op1_op2 () as last argument to
            rv_fold.
            (range_operator_float::rv_fold): Add relation_kind argument.
            (foperator_plus::rv_fold, foperator_minus::rv_fold): Likewise.
            (foperator_mult): New class.
            (floating_op_table::floating_op_table): Use foperator_mult for
            MULT_EXPR.
Comment 37 GCC Commits 2022-11-12 08:42:58 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb

commit r13-3924-g2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:35:16 2022 +0100

    range-op: Implement floating point division fold_range [PR107569]
    
    Here is the floating point division fold_range implementation,
    as I wrote in the last mail, we could outline some of the common parts
    into static methods with descriptive names and share them between
    foperator_div and foperator_mult.
    
    Regressions are
    +FAIL: gcc.dg/pr95115.c execution test
    +FAIL: libphobos.phobos/std/math/hardware.d execution test
    +FAIL: libphobos.phobos_shared/std/math/hardware.d execution test
    The first test is we have:
      # RANGE [frange] double [] +-NAN
      _3 =  Inf /  Inf;
      if (_3 ord _3)
        goto <bb 3>; [INV]
      else
        goto <bb 4>; [INV]
    
      <bb 3> :
      abort ();
    
      <bb 4> :
    before evrp, the range is correct, Inf / Inf is known NAN of unknown
    sign.  evrp correctly folds _3 ord _3 into false and the
      _3 =  Inf /  Inf;
    remains in the IL, but then comes dse1 and removes it as dead
    statement.  So, I think yet another example of the PR107608 problems
    where DCE? removes dead statements which raise floating point exceptions.
    And -fno-delete-dead-exceptions doesn't help.
    
    2022-11-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/107569
            * range-op-float.cc (foperator_div): New class.
            (floating_op_table::floating_op_table): Use foperator_div
            for RDIV_EXPR.
Comment 38 GCC Commits 2022-11-12 08:43:03 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:5747470efa8ff0ac82bb5f53d737b29a44f18118

commit r13-3925-g5747470efa8ff0ac82bb5f53d737b29a44f18118
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:36:59 2022 +0100

    range-op: Cleanup floating point multiplication and division fold_range [PR107569]
    
    Admittedly there are many similar spots with the foperator_div case
    (but also with significant differences), so perhaps if foperator_{mult,div}
    inherit from some derived class from range_operator_float and that class
    would define various smaller helper static? methods, like this
    discussed in the PR - contains_zero_p, singleton_nan_p, zero_p,
    that
    +           bool must_have_signbit_zero = false;
    +           bool must_have_signbit_nonzero = false;
    +           if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
    +               && real_isneg (&rh_lb) == real_isneg (&rh_ub))
    +             {
    +               if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
    +                 must_have_signbit_zero = true;
    +               else
    +                 must_have_signbit_nonzero = true;
    +             }
    returned as -1/0/1 int, and those set result (based on the above value) to
    [+INF, +INF], [-INF, -INF] or [-INF, +INF]
    or
    [+0, +0], [-0, -0] or [-0, +0]
    or
    [+0, +INF], [-INF, -0] or [-INF, +INF]
    and the
    +    for (int i = 1; i < 4; ++i)
    +      {
    +       if (real_less (&cp[i], &cp[0])
    +           || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
    +         std::swap (cp[i], cp[0]);
    +       if (real_less (&cp[4], &cp[i + 4])
    +           || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
    +         std::swap (cp[i + 4], cp[4]);
    +      }
    block, it could be smaller and more readable.
    
    2022-11-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/107569
            * range-op-float.cc (zero_p, contains_p, singleton_inf_p,
            signbit_known_p, zero_range, inf_range, zero_to_inf_range): New
            functions.
            (foperator_mult_div_base): New class.
            (foperator_mult, foperator_div): Derive from that and use
            protected static method from it as well as above new functions
            to simplify the code.
Comment 39 Romain Geissler 2022-11-30 01:43:11 UTC
Hi,

FYI, when trying to bootstrap a GNU toolchain with glibc (current master branch), binutils (current 2.39 release branch) and gcc, I noticed that the fixes done for this bug caused the following regressions on glibc side (tested on x86_64 Linux):

FAIL: math/test-double-cosh
FAIL: math/test-double-exp10
FAIL: math/test-double-expm1
FAIL: math/test-double-lgamma
FAIL: math/test-double-log1p
FAIL: math/test-double-tgamma
FAIL: math/test-float-cosh
FAIL: math/test-float-expm1
FAIL: math/test-float-lgamma
FAIL: math/test-float-log1p
FAIL: math/test-float-tgamma
FAIL: math/test-float128-catan
FAIL: math/test-float128-catanh
FAIL: math/test-float128-cosh
FAIL: math/test-float128-exp10
FAIL: math/test-float128-lgamma
FAIL: math/test-float128-log
FAIL: math/test-float128-log1p
FAIL: math/test-float128-pow
FAIL: math/test-float128-tgamma
FAIL: math/test-float128-y0
FAIL: math/test-float128-y1
FAIL: math/test-float32-cosh
FAIL: math/test-float32-expm1
FAIL: math/test-float32-lgamma
FAIL: math/test-float32-log1p
FAIL: math/test-float32-tgamma
FAIL: math/test-float32x-cosh
FAIL: math/test-float32x-exp10
FAIL: math/test-float32x-expm1
FAIL: math/test-float32x-lgamma
FAIL: math/test-float32x-log1p
FAIL: math/test-float32x-tgamma
FAIL: math/test-float64-cosh
FAIL: math/test-float64-exp10
FAIL: math/test-float64-expm1
FAIL: math/test-float64-lgamma
FAIL: math/test-float64-log1p
FAIL: math/test-float64-tgamma
FAIL: math/test-float64x-cosh
FAIL: math/test-float64x-lgamma
FAIL: math/test-float64x-tgamma
FAIL: math/test-ldouble-cosh
FAIL: math/test-ldouble-lgamma
FAIL: math/test-ldouble-tgamma

These tests are working with gcc-13-3916-g5b6ce16adec (daily bump from 12th of November) and failing with gcc-13-3933-g30d77d49628 (daily bump from 13th of November).

I did try to use the current master, it still fails. I also tried to patch from PR107879 and it still fails.

Cheers,
Romain
Comment 40 Jakub Jelinek 2022-12-08 16:24:26 UTC
The current state of #c0 testcase is that bar is actually optimized into return 1;
Folding statement: .ASSUME (_Z3bard._assume.0, x_1(D));
_Z3bard._assume.0 assume inferred range of x_1(D) (param x) = [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.7976931
3486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
during vrp1 and
Folding statement: _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
Queued stmt for removal.  Folds to: 0

It is correct to optimize the comparison even with -ftrapping-math - comparisons only
emit exceptions on NaNs, the quiet ones like u> even just on sNaNs, and the range proves that the non-constant operand is never a NAN and the other isn't either (it is constant).

On the other side, foo isn't optimized.
  # RANGE [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  # RANGE [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
  _9 = ABS_EXPR <x_3(D)>;
  _10 = _9 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _11 = ~_10;
  return _11;
is turned by fre1 into:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();
  
  <bb 4> :
  return _8;
and while e.g. evrp figures correctly out that
2->3  (T) _6 :  [frange] double [+Inf, +Inf] +NAN
2->4  (F) x_3(D) :      [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.797693134862315708145274237317043567980705675
25844996599e+308 (0x0.fffffffffffff8p+1024)]
2->4  (F) _4 :  [irange] bool [0, 0] NONZERO 0x0
2->4  (F) _6 :  [frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
it doesn't do the extra step of figuring out that when _4 on the 2->4 edge must be 0,
then _8 on that edge must be 1.
And, the finite range say for _6 or x_3(D) isn't stored into global state (if there would be say some possibly not returning call between _6 definition and uses or
for x_3(D) between start of function and the uses, we obviously couldn't store it as a global range; in this case we could if we proved that isn't possible, i.e. that if the function is reached then return _8; is reached too).
And then during vrp1 the same problem and __builtin_unreachable () is removed with nothing noted anywhere.
Andrew, any thoughts?
Comment 41 Jakub Jelinek 2022-12-08 16:32:42 UTC
As for #c1, with trunk -O3 -march=skylake it is:
        vmovsd  8(%rdi), %xmm1
        vmovsd  (%rdi), %xmm0
        vmulsd  %xmm1, %xmm1, %xmm1
        vfmadd132sd     %xmm0, %xmm1, %xmm0
        vsqrtsd %xmm0, %xmm0, %xmm0
which is I think what we want.
Comment 42 Jakub Jelinek 2022-12-08 16:37:34 UTC
On #c0 foo, this was previously optimized in dom2 which optimized
  _4 = ABS_EXPR <x_2(D)>;
  _3 = _4 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_3;
  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();

  <bb 4> [local count: 1073741824]:
  return _5;
without any frange related stuff as:
-  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
+  if (_3 != 0)
and
-  return _5;
+  return 1;

But because __builtin_unreachable () is now removed earlier (vrp1 already; without providing useful global range though), we don't do that anymore.
Comment 43 Andrew Macleod 2022-12-09 15:34:15 UTC
(In reply to Jakub Jelinek from comment #42)
> On #c0 foo, this was previously optimized in dom2 which optimized
>   _4 = ABS_EXPR <x_2(D)>;
>   _3 = _4 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
>   _5 = ~_3;
>   if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
>     goto <bb 3>; [0.00%]
>   else
>     goto <bb 4>; [100.00%]
> 
>   <bb 3> [count: 0]:
>   __builtin_unreachable ();
> 
>   <bb 4> [local count: 1073741824]:
>   return _5;
> without any frange related stuff as:
> -  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
> +  if (_3 != 0)
> and
> -  return _5;
> +  return 1;
> 
> But because __builtin_unreachable () is now removed earlier (vrp1 already;
> without providing useful global range though), we don't do that anymore.

Hmm. its because when vrp1 removes the branch to builtin_unreachable, it calculates the global range of _4 as the union of all the ranges at each use.. this is to avoid issues where the unreachable call may not post-dominate an earlier use.

THe initial use of _4 still is resolved to only [0, max] so we end up only using that for the range.

Im experimenting with doing things the same way, except NOT removing the branch in VRP, but continuing to set the global range the way we do.  I might also be able to revisit the branch *after* all the globals have been set, and see if the globals values now cause the condition to fold, and if they do, remove them only  then...  

ThIs would leave the above case for DOM2 (or someone else to resolve).   It seems like it might be a reasonable balance...  ie, we only remove the unreachable in VRP1 if we can determine with 100% positivity that setting the global value will result in the branch being able to remove.

Its also possible it could also just be left... and another pass should remove it shortly as it folds away...  and certainly by VRP2 time...   

Anyway, I'll try it a few ways and see what seems to work best.
Comment 44 Richard Biener 2023-02-27 10:07:38 UTC
I can confirm that foo() is still not optimized but we now optimize bar()
in VRP1.  VRP1 sees

bool foo (double x)
{
  bool _3;
  double _4;
  bool _5;

  <bb 2> [local count: 1073741824]:
  _4 = ABS_EXPR <x_2(D)>;
  _3 = _4 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_3;
  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();

  <bb 4> [local count: 1073741824]:
  return _5;
Comment 45 Jakub Jelinek 2023-03-22 15:26:51 UTC
For the #c0 foo function, one simple fix would be something like
--- gcc/passes.def.jj	2023-01-02 09:32:39.539037434 +0100
+++ gcc/passes.def	2023-03-22 16:12:57.387652639 +0100
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.
 	  NEXT_PASS (pass_forwprop);
           NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
 	  NEXT_PASS (pass_sra_early);
+	  NEXT_PASS (pass_dce);
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
The problem there is that ccp1 and forwprop1 passes result in some dead statements:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  _12 = _8;
  _1 = _12;
  retval.0_5 = ~_1;
  if (retval.0_5 != 0)
by ccp1 into:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_4 != 0)
and forwprop1:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
So, now both _8 and _4 setters are dead.
Then comes fre1 and happily uses them again, which results in undesirable
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  return _8;
With the extra dce, we get
  _6 = ABS_EXPR <x_3(D)>;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  _9 = ABS_EXPR <x_3(D)>;
  _10 = _9 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _11 = ~_10;
  return _11;
before fre1 and optimize that into:
  _6 = ABS_EXPR <x_3(D)>;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  return 1;
Comment 46 Jakub Jelinek 2023-03-22 16:07:52 UTC
And another possibility might be try to keep __builtin_unreachable () in the IL more often; in this testcase nothing from the __builtin_unreachable () is really visible in any global ranges, they all are the same without/with the __builtin_unreachable.  So, can't we remove those just in final_p case and not otherwise?  Dom would then be able to optimize this.
Comment 47 Jakub Jelinek 2023-03-23 08:38:18 UTC
I have tested the
--- gcc/passes.def.jj	2023-03-22 16:59:45.378390155 +0100
+++ gcc/passes.def	2023-03-22 22:07:35.272803901 +0100
@@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  NEXT_PASS (pass_dce);
 	  NEXT_PASS (pass_fre, true /* may_iterate */);
 	  NEXT_PASS (pass_early_vrp);
 	  NEXT_PASS (pass_merge_phi);
patch and it unfortunately has some regressions on x86_64-linux, haven't verified though what is just that testcases would need to be tweaking and where we actually end up with worse code:
UNRESOLVED: gcc.dg/bic-bitmask-13.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-13.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-14.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-14.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-15.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-15.c scan-tree-dump-times dce7 "=\\\\s* 1" 1
UNRESOLVED: gcc.dg/bic-bitmask-16.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-16.c scan-tree-dump-times dce7 ">\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-17.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-17.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-18.c scan-tree-dump-times dce7 " = 0;" 1
UNRESOLVED: gcc.dg/bic-bitmask-19.c scan-tree-dump-not dce7 "&\\\\s* 4294967294"
UNRESOLVED: gcc.dg/bic-bitmask-19.c scan-tree-dump-times dce7 ">\\\\s* 1" 1
UNRESOLVED: gcc.dg/bic-bitmask-20.c scan-tree-dump dce7 "&\\\\s* 4294967290"
UNRESOLVED: gcc.dg/bic-bitmask-20.c scan-tree-dump-not dce7 "<=\\\\s* 4294967289"
UNRESOLVED: gcc.dg/bic-bitmask-21.c scan-tree-dump dce7 "<=\\\\s* 255"
UNRESOLVED: gcc.dg/bic-bitmask-21.c scan-tree-dump-not dce7 "&\\\\s* 4294967290"
UNRESOLVED: gcc.dg/bic-bitmask-22.c scan-tree-dump dce7 ">\\\\s* 255"
UNRESOLVED: gcc.dg/bic-bitmask-22.c scan-tree-dump-not dce7 "&\\\\s* 4294967290"
FAIL: gcc.dg/pr23911.c scan-tree-dump-times dce3 "(?n)IMAGPART_EXPR.*= 0\\\\.0" 2
FAIL: gcc.dg/pr23911.c scan-tree-dump-times dce3 "(?n)REALPART_EXPR.*= 1\\\\.0e\\\\+0" 2
FAIL: gcc.dg/guality/pr45003-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 10 a == 0x8078
FAIL: gcc.dg/guality/pr45003-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 19 a == 0xffff8078
FAIL: gcc.dg/guality/pr45003-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 10 a == -32648
FAIL: gcc.dg/guality/pr45003-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 19 a == 0x8078
UNRESOLVED: gcc.dg/tree-ssa/20030731-2.c scan-tree-dump-times dce2 "if " 1
UNRESOLVED: gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times dce7 "->code" 0
UNRESOLVED: gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times dce7 "if " 0
UNRESOLVED: gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times dce2 "if " 2
FAIL: gcc.dg/tree-ssa/alias-37.c scan-tree-dump dse1 "Deleted dead store"
UNRESOLVED: gcc.dg/tree-ssa/cfgcleanup-1.c scan-tree-dump-times dce2 "if " 0
FAIL: gcc.dg/tree-ssa/evrp10.c scan-tree-dump-times evrp "\\\\[-128, 127\\\\]" 9
FAIL: gcc.dg/tree-ssa/evrp6.c scan-tree-dump evrp "\\\\[12, \\\\+INF"
UNRESOLVED: gcc.dg/tree-ssa/pr69270-2.c scan-tree-dump-times dce2 "usecount_[0-9]+ = usecount_[0-9]+ . 1;" 0
FAIL: gcc.dg/tree-ssa/pr79697.c scan-tree-dump cddce1 "Deleting : __builtin_strdup"
FAIL: gcc.dg/tree-ssa/pr79697.c scan-tree-dump cddce1 "Deleting : __builtin_strndup"
FAIL: gcc.dg/tree-ssa/ssa-dse-32.c scan-tree-dump-times dse1 "Deleted dead store" 1
UNRESOLVED: gcc.dg/vect/pr26359.c -flto -ffat-lto-objects  scan-tree-dump-times dce6 "Deleting : vect_" 0
UNRESOLVED: gcc.dg/vect/pr26359.c scan-tree-dump-times dce6 "Deleting : vect_" 0
UNRESOLVED: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects  scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
UNRESOLVED: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++11  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++14  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++17  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++20  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++2b  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C  -std=gnu++98  scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/pr94314-2.C  -std=gnu++11  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C  -std=gnu++14  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C  -std=gnu++17  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C  -std=gnu++20  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C  -std=gnu++2b  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C  -std=gnu++98  scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++11  (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++14  (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++17  (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++20  (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++2b  (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C  -std=gnu++98  (test for warnings, line 36)

The UNRESOLVED ones likely just need to replace dce7 with dce8.
Comment 48 Richard Biener 2023-03-23 13:47:30 UTC
So instead of an extra DCE pass one could try harder to not leave around dead code after folding, for example with the following (doesn't fix the testcase
yet, all the custom folding code would need adjustments as well).  But it's
not 100% reliable and it comes at a cost even when folding isn't successful
(while we don't update stmt operands immediately, what they point to is
changed by the actual folding already).

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index e34f0888954..133619899cd 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "gimple-range.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -3466,6 +3467,7 @@ pass_forwprop::execute (function *fun)
   auto_vec<gimple *, 32> to_remove;
   to_purge = BITMAP_ALLOC (NULL);
   bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
+  auto_bitmap simple_dce_seed;
   for (int i = 0; i < postorder_num; ++i)
     {
       gimple_stmt_iterator gsi;
@@ -3846,8 +3848,16 @@ pass_forwprop::execute (function *fun)
                                   && gimple_call_noreturn_p (stmt));
              changed = false;
 
+             auto_vec<int, 4> seeds;
+             use_operand_p use_p;
+             ssa_op_iter iter;
+             FOR_EACH_SSA_USE_OPERAND (use_p, orig_stmt, iter, SSA_OP_USE)
+               seeds.safe_push (SSA_NAME_VERSION (USE_FROM_PTR (use_p)));
+
              if (fold_stmt (&gsi, fwprop_ssa_val))
                {
+                 for (int seed : seeds)
+                   bitmap_set_bit (simple_dce_seed, seed);
                  changed = true;
                  stmt = gsi_stmt (gsi);
                  /* Cleanup the CFG if we simplified a condition to
@@ -4037,6 +4047,8 @@ pass_forwprop::execute (function *fun)
       cfg_changed |= fixup_noreturn_call (stmt);
     }
 
+  simple_dce_from_worklist (simple_dce_seed);
+
   cfg_changed |= gimple_purge_all_dead_eh_edges (to_purge);
   cfg_changed |= gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
   BITMAP_FREE (to_purge);


Another possibility is preventing FRE/PRE from picking up dead stmts, as
suggested this does fix the testcase.  The extra stmt removal is not
strictly necessary.  I'm going to test sth like this.

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d5b081a309f..9f77b7a5647 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6683,6 +6683,12 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, gimple_stmt_iterator *gsi)
               && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt))
               && is_global_var (gimple_assign_rhs1 (stmt)))))
     {
+      if (has_zero_uses (lhs))
+       {
+         to_remove.safe_push (stmt);
+         return;
+       }
+
       sprime = eliminate_avail (b, lhs);
       if (!sprime)
        {
@@ -7200,7 +7206,8 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, gimple_stmt_iterator *gsi)
      continue with the next stmt above and skip this.  */
   def_operand_p defp;
   FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
-    eliminate_push_avail (b, DEF_FROM_PTR (defp));
+    if (! has_zero_uses (DEF_FROM_PTR (defp)))
+      eliminate_push_avail (b, DEF_FROM_PTR (defp));
 }
 
 /* Perform elimination for the basic-block B during the domwalk.  */
@@ -8048,7 +8055,8 @@ process_bb (rpo_elim &avail, basic_block bb,
        /* If not eliminating, make all not already available defs
           available.  */
        FOR_EACH_SSA_TREE_OPERAND (op, gsi_stmt (gsi), i, SSA_OP_DEF)
-         if (! avail.eliminate_avail (bb, op))
+         if (! has_zero_uses (op)
+             && ! avail.eliminate_avail (bb, op))
            avail.eliminate_push_avail (bb, op);
     }
Comment 49 Jakub Jelinek 2023-03-23 14:30:31 UTC
Does the has_zero_uses patch work for this?  As there is both:
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
and _8 has_zero_uses while _4 has uses but all are dead.  So, I assume we wouldn't try
to reuse _8 but perhaps we'd reuse _4?
Comment 50 GCC Commits 2023-03-23 16:22:20 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:41ade3399bd1ec9927be1bb818965831232eda4b

commit r13-6834-g41ade3399bd1ec9927be1bb818965831232eda4b
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Mar 23 14:52:01 2023 +0100

    tree-optimization/107569 - avoid wrecking earlier folding in FRE/PRE
    
    The following avoids picking up dead code left over from folding
    during FRE/PRE, effectively undoing propagations.
    
            PR tree-optimization/107569
            * tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
            Do not push SSA names with zero uses as available leader.
            (process_bb): Likewise.
    
            * g++.dg/opt/pr107569.C: New testcase.
Comment 51 Richard Biener 2023-03-23 16:22:38 UTC
Fixed.