Since r13-3596-ge7310e24b1c0ca67b1bb507c1330b2bf39e59e32 we don't optimize foo below into return 1:

namespace std {
  constexpr bool isfinite (float x) { return __builtin_isfinite (x); }
  constexpr bool isfinite (double x) { return __builtin_isfinite (x); }
  constexpr bool isfinite (long double x) { return __builtin_isfinite (x); }
}

bool
foo (double x)
{
  if (!std::isfinite (x))
    __builtin_unreachable ();
  return std::isfinite (x);
}

bool
bar (double x)
{
  [[assume (std::isfinite (x))]];
  return std::isfinite (x);
}

While bar hasn't been optimized before, it would be nice to make it work too.
I also noted that the following testcase:

#include <cmath>

struct TVec { double x, y; };

double dot(TVec const &u, TVec const &v) { return u.x * v.x + u.y * v.y; }

double mag(TVec const &u) {
  if (!(dot(u, u) >= 0.0))
    __builtin_unreachable();
  return std::sqrt(dot(u, u));
}

results in (with -O3 -march=skylake --param=vrp1-mode=vrp):

mag(TVec const&):
        vmovsd  xmm1, QWORD PTR [rdi+8]
        vmovsd  xmm0, QWORD PTR [rdi]
        vmulsd  xmm1, xmm1, xmm1
        vfmadd132sd     xmm0, xmm1, xmm0
        vsqrtsd xmm0, xmm0, xmm0
        ret

and (with -O3 -march=skylake --param=vrp1-mode=ranger):

mag(TVec const&):
        vmovsd  xmm1, QWORD PTR [rdi+8]
        vmovsd  xmm0, QWORD PTR [rdi]
        vmulsd  xmm1, xmm1, xmm1
        vfmadd132sd     xmm0, xmm1, xmm0
        vxorpd  xmm1, xmm1, xmm1
        vucomisd        xmm1, xmm0
        ja      .L11
        vsqrtsd xmm0, xmm0, xmm0
        ret
.L11:
        jmp     sqrt
Perhaps we should in vrp1 defer removal of __builtin_unreachable (), at least in the cases where we don't turn it into some useful range? That of course doesn't improve the assume attribute case.
(In reply to Jakub Jelinek from comment #2)
> Perhaps we should in vrp1 defer removal of __builtin_unreachable (), at least
> the cases where we don't turn that into some useful range?
> That of course doesn't improve the assume attribute case.

Well, we do improve it, though. In vrp1:

Global Exported: _5 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
Global Exported (via unreachable): _9 = [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
Removing basic block 3
Merging blocks 2 and 4

double mag (const struct TVec & u)
{
  double _3;
  double _5;
  double _6;
  double _7;
  double _8;
  double _9;

  <bb 2> [local count: 1073741824]:
  _3 = u_2(D)->x;
  _6 = _3 * _3;
  _7 = u_2(D)->y;
  _8 = _7 * _7;
  _9 = _6 + _8;
  _5 = sqrt (_9);
  return _5;

The problem appears to be that the cdce pass runs afterwards, and introduces:

  _3 = u_2(D)->x;
  _6 = _3 * _3;
  _7 = u_2(D)->y;
  _8 = _7 * _7;
  _9 = _6 + _8;
  DCE_COND_LB.7_10 = _9;
  DCE_COND_LB_TEST.8_11 = DCE_COND_LB.7_10 u>= 0.0;
  if (DCE_COND_LB_TEST.8_11 != 0)
    goto <bb 5>; [99.95%]
  else
    goto <bb 4>; [0.05%]

  <bb 5> [local count: 1073204960]:
  _5 = .SQRT (_9);
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 536864]:
  _12 = sqrt (_9);

without checking whether there is a global range for _9, which is

[frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN

Now, does that NaN cause u>= 0 not to be true? I notice we don't fold that in vrp2. I will let Aldy pitch in on the rest of this.
The cdce case is something I've mentioned today: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html The comparisons in there are artificial and so unlike user comparisons we should (if they were marked somehow) be able to optimize them away if frange can prove their result. But that isn't something happening on the #c0 testcase, is it?
(In reply to Jakub Jelinek from comment #4)
> The cdce case is something I've mentioned today:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html
> The comparisons in there are artificial and so unlike user comparisons we
> should (if they were marked somehow) be able to optimize them away if frange
> can prove their result.
> But that isn't something happening on the #c0 testcase, is it?

In vrp2 I see:

384   range_of_stmt () at stmt if (_9 u>= 0.0)
385     range_of_expr(_9) at stmt if (_9 u>= 0.0)
        TRUE : (385) range_of_expr (_9) [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
      TRUE : (384) range_of_stmt () [irange] bool VARYING

so we think that [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN u>= 0.0 does not fold. Possibly some signalling NaN thing not allowing us to remove it?
__builtin_isfinite (x) is implemented as ~UNGT_EXPR<ABS_EXPR<x>, DBL_MAX>. So, if we have:

  _3 = ABS_EXPR <x_2(D)>;
  _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_4;
  return _5;

for the assume function, we should start with a [1,1] assumption on _5, which implies _4 [0,0]. From UNGT_EXPR being false, we should derive that _3 is [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308] and not NaN (if any operand is NaN, UNGT_EXPR is true; it stands for unordered or greater). And from _3 being [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308] and not NaN, we should derive that x_2(D) is [-1.79769313486231570814527423731704356798070567525844996599e+308, 1.79769313486231570814527423731704356798070567525844996599e+308] and not NaN (i.e. a finite number).
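The lowering above can be checked from C: !UNGT(a, b) is islessequal (a, b), because UNGT is true when either operand is a NaN. A sketch (my_isfinite is a hypothetical name, not a GCC entity):

```c
#include <float.h>
#include <math.h>

/* Sketch of the lowering described above: __builtin_isfinite (x) is
   ~UNGT_EXPR<ABS_EXPR<x>, DBL_MAX>, i.e. islessequal (fabs (x),
   DBL_MAX) in C.  A NaN operand makes UNGT true, hence not finite.  */
static int
my_isfinite (double x)
{
  return islessequal (fabs (x), DBL_MAX);
}
```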
(In reply to Jakub Jelinek from comment #6)
> __builtin_isfinite (x) is implemented as
> ~UNGT_EXPR<ABS_EXPR<x>, DBL_MAX>.
> So, if we have:
> _3 = ABS_EXPR <x_2(D)>;
> _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
> _5 = ~_4;
> return _5;
> for assume function, we should start with [1,1] assumption on _5, that
> implies _4 [0,0], and from UNGT_EXPR being false, we should derive that
> _3 is [-inf, 1.79769313486231570814527423731704356798070567525844996599e+308]
> and not NaN (if any operand is NaN, UNGT_EXPR is true, it stands for
> unordered or greater).
> And from _3 being [-inf,
> 1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
> we should derive that x_2(D) is
> [-1.79769313486231570814527423731704356798070567525844996599e+308,
> 1.79769313486231570814527423731704356798070567525844996599e+308] not NaN
> (aka finite number).

Oh, we're talking about different cases. Let's see:

x_2(D) -> [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
_3 -> [frange] double [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
_4 -> [irange] bool [0, 0] NONZERO 0x0
_5 -> [irange] bool [1, 1]

so we aren't getting that x_2 is not NaN. That would then be a failing of foperator_abs::op1_range: it should not add the NaNs if the LHS has no NaNs. It looks like the range is correct, but not the NaN bit. So I guess there are two different issues in this PR; I was originally looking at the second one.
Under the debugger I see unordered_gt::op1_range coming up with:

[frange] double [-Inf, 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]

which looks right. But then foperator_abs::op1_range makes:

[frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN

out of that. That is correct except for the +-NAN part.
Before the

1281       // Then add the negative of each pair:
1282       // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
1283       r.union_ (frange (type,
1284                         real_value_negate (&positives.upper_bound ()),
1285                         real_value_negate (&positives.lower_bound ())));

part, r is

[frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]

so that is still ok. So perhaps we need to clear_nan on the newly created frange for the negative part (or, if r is a maybe-NaN, make sure it has the varying bit, and if it is not NaN, NaN should not be in there).
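A toy model of the intended behavior (hypothetical struct, not GCC's actual frange API) makes the point concrete: the negated half built for ABS's op1_range must not introduce a NaN that the LHS range didn't have.

```c
#include <stdbool.h>

/* Toy model of foperator_abs::op1_range as discussed above:
   ABS(op1) = [lb, ub] with lb >= 0 yields op1 = [-ub, -lb] U [lb, ub],
   conservatively [-ub, ub] as a single interval.  The key point: op1
   may be NaN only if the LHS range could be.  Illustration only.  */
struct toy_frange { double lb, ub; bool maybe_nan; };

static struct toy_frange
toy_abs_op1_range (struct toy_frange lhs)
{
  struct toy_frange r;
  r.lb = -lhs.ub;
  r.ub = lhs.ub;
  r.maybe_nan = lhs.maybe_nan;  /* do NOT unconditionally set +-NAN */
  return r;
}
```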
So:

--- gcc/range-op-float.cc.jj	2022-11-06 11:56:27.138137781 +0100
+++ gcc/range-op-float.cc	2022-11-08 18:13:18.026974667 +0100
@@ -1280,9 +1280,10 @@ foperator_abs::op1_range (frange &r, tre
     return true;
   // Then add the negative of each pair:
   // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
-  r.union_ (frange (type,
-		    real_value_negate (&positives.upper_bound ()),
-		    real_value_negate (&positives.lower_bound ())));
+  frange negatives (type, real_value_negate (&positives.upper_bound ()),
+		    real_value_negate (&positives.lower_bound ()));
+  negatives.clear_nan ();
+  r.union_ (negatives);
   return true;
 }

fixes the abs backwards propagation, but for some reason it doesn't help the assume case. *.assumptions has:

x_2(D) -> [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]

but

# RANGE [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
double x_2(D) = x;

but then, say, vrp2:

_Z3bard._assume.0 assume inferred range of x_1(D) (param x) = [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +-NAN
Global Exported: _3 = [frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)] +NAN

so it uses the range with +-NAN rather than without it.
(In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > The cdce case is something I've mentioned today:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605338.html
> > The comparisons in there are artificial and so unlike user comparisons we
> > should (if they were marked somehow) be able to optimize them away if frange
> > can prove their result.
> > But that isn't something happening on the #c0 testcase, is it?
>
> in vrp2 I see:
> 384 range_of_stmt () at stmt if (_9 u>= 0.0)
> 385 range_of_expr(_9) at stmt if (_9 u>= 0.0)
> TRUE : (385) range_of_expr (_9) [frange] double [-0.0
> (-0x0.0p+0), +Inf] +-NAN
> TRUE : (384) range_of_stmt () [irange] bool VARYING
>
> so we think that
> [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN u>= 0.0 does not fold.
>
> possibly some signalling NaN thing not allowing us to remove it?

By vrp2 I see:

=========== BB 2 ============
Imports: _6  _8
Exports: _6  _8  _9
         _9 : _6(I)  _8(I)

<bb 2> [local count: 1073741824]:
_3 = u_2(D)->x;
_6 = _3 * _3;
_7 = u_2(D)->y;
_8 = _7 * _7;
_9 = _6 + _8;
if (_9 u>= 0.0)
  goto <bb 3>; [99.95%]
else
  goto <bb 4>; [0.05%]

_9 : [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN
2->4  (F) _9 : [frange] double [-0.0 (-0x0.0p+0), 0.0 (0x0.0p+0)]
2->3  (T) _9 : [frange] double [-0.0 (-0x0.0p+0), +Inf] +-NAN

We can't fold the u>= 0.0 because that would remove a possible NAN at runtime, and it would be user-visible for signalling NANs. At least, that's been my understanding all along.
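The u>= semantics can be seen directly from C: unordered-or-greater-equal is true whenever either operand is a NaN (a sketch; unge is a made-up helper name). So for a range like [-0.0, +Inf] +-NAN the comparison is always true value-wise; the fold is blocked only because removing the comparison would be visible with signalling NaNs, as described above.

```c
#include <math.h>

/* u>= (UNGE_EXPR) is "unordered or greater-equal": true when either
   operand is a NaN.  !isless (a, b) is the C spelling.  unge is a
   hypothetical helper name for illustration.  */
static int
unge (double a, double b)
{
  return !isless (a, b);
}
```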
You'll notice most of the folding of relops in range-op-float.cc is predicated on !maybe_isnan(op1, op2) to disable them if NAN is a possibility, which in the case of _9 could happen:

  else if (!maybe_isnan (op1, op2))
    {
      if (real_compare (GE_EXPR, &op1.lower_bound (), &op2.upper_bound ()))
	r = range_true (type);
      else if (!real_compare (GE_EXPR, &op1.upper_bound (), &op2.lower_bound ()))
	r = range_false (type);
      else
	r = range_true_and_false (type);
    }

However, we should be able to mark the 2->4 edge as undefined. On the 2->4 edge, _9 is technically UNDEFINED. But we can't do that because franges are represented as closed intervals, so we can't represent < 0.0 on the false side. Instead we represent it as +-0.0, which intersected with _9 is still +-0.0. If we could represent < 0.0 with one less ulp, say -0.000000000000001 (or whatever), intersecting that with the known _9 would yield UNDEFINED. I don't know if that would help in this PR, but ISTM the threader would be able to do something (or some other pass??).

Unfortunately, we don't represent < and > very well. Technically all we would need is to tweak range-op-float.cc's build_gt() and build_lt() to do real_nextafter, but as with everything FP, the devil is in the details (what do we do when we go past the representable range for -ffinite-math-only, etc. etc.). It does seem a bit late in the cycle to fudge with this, but it is possible.

Again, I don't know if this helps the PR, but these are the 2 issues I see.
It looks like the code reading from the blob in SSA_NAME_RANGE_INFO and populating an frange always leaves the NAN bit toggled, even if it wasn't in the stored range. Does this help?

diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 462447ba250..7ba7a16edc2 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -283,6 +283,8 @@ frange_storage_slot::get_frange (frange &r, tree type) const
   // make sure to set the NAN sign if known.
   if (HONOR_NANS (type) && (m_pos_nan ^ m_neg_nan) == 1)
     r.update_nan (m_neg_nan);
+  else if (!m_pos_nan && !m_neg_nan)
+    r.clear_nan ();
 }
 
 bool
Created attachment 53861 [details]
preprocessed testcase for comment #2
I have tested the testcase in comment #1 with Clang, and I realized that Clang trunk avoids the tail call to sqrt even without any __builtin_unreachable hint: https://godbolt.org/z/5sb8bYcoq Clang is somehow smart enough to realize that dot(u, u) is always non-negative.
We don't have multiplication wired in frange, that is something we talked about today on gcc-patches.
I've filed PR107591 for the lack of x * x range optimization.
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:4eadbe80060ab6c45193a1a57fac84b035e1c328

commit r13-3860-g4eadbe80060ab6c45193a1a57fac84b035e1c328
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Nov 9 16:05:08 2022 +0100

    Clear NAN when reading back a global range if necessary.
    
    When reading back from the global store, we must clear the NAN bit if
    necessary.  The reason it's not happening is because the constructor
    sets a NAN by default (when HONOR_NANS).  We must be careful to clear
    the NAN bit if the original range didn't have a NAN.
    
    I have commented the reason we use the constructor instead of filling
    out the fields by hand, because it wasn't clear at re-reading this
    code.
    
            PR 107569/tree-optimization
    
    gcc/ChangeLog:
    
            * value-range-storage.cc (frange_storage_slot::get_frange): Clear
            NAN if appropriate.
            * value-range.cc (range_tests_floats): New test.
Ok, just so that I don't just kibitz/review frange stuff but also try to do something, here is my so far untested attempt at a normal multiplication fold_range (not the x * x stuff discussed elsewhere):

--- range-op-float.cc.jj1	2022-11-09 18:00:28.612247613 +0100
+++ range-op-float.cc	2022-11-09 19:06:11.075716000 +0100
@@ -1908,6 +1908,88 @@ class foperator_minus : public range_ope
   }
 } fop_minus;
 
+/* Wrapper around frange_arithmetic, that computes the result
+   if inexact rounded to both directions.  Also, if one of the
+   operands is +-0.0 and another +-inf, return +-0.0 rather than
+   NAN.  */
+
+static void
+frange_mult (tree type, REAL_VALUE_TYPE &result_lb, REAL_VALUE_TYPE &result_ub,
+	     const REAL_VALUE_TYPE &op1, const REAL_VALUE_TYPE &op2)
+{
+  if (real_iszero (&op1) && real_isinf (&op2))
+    {
+      result_lb = op1;
+      if (real_isneg (&op2))
+	real_value_negate (&result_lb);
+      result_ub = result_lb;
+    }
+  else if (real_isinf (&op1) && real_iszero (&op2))
+    {
+      result_lb = op2;
+      if (real_isneg (&op1))
+	real_value_negate (&result_lb);
+      result_ub = result_lb;
+    }
+  else
+    {
+      frange_arithmetic (MULT_EXPR, type, result_lb, op1, op2, dconstninf);
+      frange_arithmetic (MULT_EXPR, type, result_ub, op1, op2, dconstinf);
+    }
+}
+
+class foperator_mult : public range_operator_float
+{
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+		tree type,
+		const REAL_VALUE_TYPE &lh_lb,
+		const REAL_VALUE_TYPE &lh_ub,
+		const REAL_VALUE_TYPE &rh_lb,
+		const REAL_VALUE_TYPE &rh_ub) const final override
+  {
+    REAL_VALUE_TYPE cp[8];
+    // Do a cross-product.
+    frange_mult (type, cp[0], cp[4], lh_lb, rh_lb);
+    frange_mult (type, cp[1], cp[5], lh_lb, rh_ub);
+    frange_mult (type, cp[2], cp[6], lh_ub, rh_lb);
+    frange_mult (type, cp[3], cp[7], lh_ub, rh_ub);
+    for (int i = 1; i < 3; ++i)
+      {
+	if (real_less (&cp[i], &cp[0])
+	    || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
+	  std::swap (cp[i], cp[0]);
+	if (real_less (&cp[4], &cp[i + 4])
+	    || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
+	  std::swap (cp[i + 4], cp[4]);
+      }
+    lb = cp[0];
+    ub = cp[4];
+
+    // [+-0, +-0] * [+INF,+INF] (or [-INF,-INF]) or swapped is a known NaN.
+    if ((real_iszero (&lh_lb) && real_iszero (&lh_ub)
+	 && real_isinf (&rh_lb) && real_isinf (&rh_ub, real_isneg (&rh_lb)))
+	|| (real_iszero (&rh_lb) && real_iszero (&rh_ub)
+	    && real_isinf (&lh_lb) && real_isinf (&lh_ub, real_isneg (&lh_lb))))
+      {
+	real_nan (&lb, NULL, 0, TYPE_MODE (type));
+	ub = lb;
+	maybe_nan = true;
+      }
+    // Otherwise, if one range includes zero and the other ends with +-INF,
+    // it is a maybe NaN.
+    else if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+	     && real_compare (GE_EXPR, &lh_ub, &dconst0)
+	     && (real_isinf (&rh_lb) || real_isinf (&rh_ub)))
+      maybe_nan = true;
+    else if (real_compare (LE_EXPR, &rh_lb, &dconst0)
+	     && real_compare (GE_EXPR, &rh_ub, &dconst0)
+	     && (real_isinf (&lh_lb) || real_isinf (&lh_ub)))
+      maybe_nan = true;
+    else
+      maybe_nan = false;
+  }
+} fop_mult;
+
 // Instantiate a range_op_table for floating point operations.
 static floating_op_table global_floating_table;
@@ -1942,6 +2024,7 @@ floating_op_table::floating_op_table ()
   set (NEGATE_EXPR, fop_negate);
   set (PLUS_EXPR, fop_plus);
   set (MINUS_EXPR, fop_minus);
+  set (MULT_EXPR, fop_mult);
 }
 
 // Return a pointer to the range_operator_float instance, if there is

For

float
foo (float x, float y)
{
  if (x < -17.0f || x > 19.5f || y < -25.0f || y > 15.5f)
    return 0.0;
  return x * y;
}

float
bar (float x, float y)
{
  if (x > -17.0f && x < 19.5f && y > -25.0f && y < 15.5f)
    return x * y;
  return 0.0;
}

the product range is

[frange] float [-4.875e+2 (-0x0.f3cp+9), 4.25e+2 (0x0.d48p+9)] +-NAN

in the first case and

[frange] float [-4.875e+2 (-0x0.f3cp+9), 4.25e+2 (0x0.d48p+9)]

in the latter (where NAN can't appear). And while +-0.0 is in the ranges of both operands, neither range includes any infinities.
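Ignoring directed rounding, signed zeros, and the 0 * Inf special-casing that the patch handles explicitly, the cross-product computation can be modeled in plain C (toy code with made-up names, not the GCC implementation):

```c
/* Toy model of the cross-product fold for [a,b] * [c,d]: the result
   interval is [min, max] over all four endpoint products.  Note the
   loop must consider all four products (i < 4); an off-by-one here
   (i < 3) is exactly the "brown paper bag" bug fixed later in this
   thread.  */
static void
toy_mult_range (double a, double b, double c, double d,
		double *lb, double *ub)
{
  double p[4] = { a * c, a * d, b * c, b * d };
  *lb = *ub = p[0];
  for (int i = 1; i < 4; i++)
    {
      if (p[i] < *lb)
	*lb = p[i];
      if (p[i] > *ub)
	*ub = p[i];
    }
}
```

Running it on the foo/bar testcase's operand ranges reproduces the [-4.875e+2, 4.25e+2] product range quoted above.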
(In reply to Jakub Jelinek from comment #18) > Ok, just so that I don't just kibbitz/review frange stuff but also try to do > something, here is my so far untested attempt at normal multiplication > fold_range (not the x * x stuff discussed elsewhere): Looking good! Thanks for tackling the hardest of them all.
Created attachment 53865 [details]
gcc13-pr107569.patch

Here is an updated patch (including the incremental patch) with some fixes. The previous version caused some ICEs; those are now fixed, but there is a

+FAIL: gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040 2

regression (I will file a separate PR about it; it isn't just a multiplication regression but happens already with addition), and

+FAIL: gfortran.dg/fmt_g0_6.f08   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
+FAIL: gfortran.dg/fmt_g0_6.f08   -O3 -g  execution test
+FAIL: math
+FAIL: special_functions/08_cyl_bessel_j/check_value.cc execution test
+FAIL: tr1/5_numerical_facilities/special_functions/09_cyl_bessel_j/check_value.cc execution test

that I need to investigate.
Ok, found a brown paper bag issue in the patch:

--- gcc/range-op-float.cc	2022-11-09 21:31:09.420369509 +0100
+++ gcc/range-op-float.cc	2022-11-09 21:31:09.420369509 +0100
@@ -1981,7 +1981,7 @@
       frange_mult (type, cp[2], cp[6], lh_ub, rh_lb);
     }
   frange_mult (type, cp[3], cp[7], lh_ub, rh_ub);
-  for (int i = 1; i < 3; ++i)
+  for (int i = 1; i < 4; ++i)
     {
       if (real_less (&cp[i], &cp[0])
	   || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))

With this, the gfortran.dg/fmt_g0_6.f08 FAILs are gone.
Note, I was using:

double
foo (int x, int y)
{
  double r;
  if (x >= 0)
    return 0.0;
  switch (y)
    {
    case 1: r = 0.0; break;
    case 2: r = 1.0; break;
    default: r = 0.5; break;
    }
  return r * __builtin_pow (10.0, x);
}

as a reduced testcase of what the Fortran testcase does, but I see an unrelated problem there:

Folding statement: _2 = __builtin_pow (1.0e+1, _1);
Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN

The +NAN looks suspicious; shouldn't that be +-NAN? Of course, once we handle POW builtins, if we are smart enough we should see that it is 10.0 ** [INT_MIN, -1] and so [0.0, 1.0e-1] (plus some larger ulp error, because library functions aren't exactly 0.5ulp precise all the time). But when we don't know what __builtin_pow does (from the frange perspective), I don't see what tells us that a NAN with negative sign can't appear.
Created attachment 53866 [details] gcc13-pr107569.patch Updated version of the patch I'll test now (if you don't have any nits). Besides the thinko I've changed rv_fold to have relation_kind argument rather than relation_trio, passing the latter to a function that doesn't take ranges of the operands/result but just the real values is weird.
(In reply to Jakub Jelinek from comment #22)
> Folding statement: _2 = __builtin_pow (1.0e+1, _1);
> Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
> The +NAN looks suspicious, shouldn't that be +-NAN ?
> Of course once we handle POW builtins, if we are smart enough we should see
> that it is 10.0 ** [INT_MIN, -1] and so [0.0, 1.0e-1] (plus some larger ulp
> error because library functions aren't exactly 0.5ulp precise all the time).
> But when we don't know what __builtin_pow does (from frange perspective), I
> don't see what tells us that NAN with negative sign can't appear.

Yeah, that +NAN looks very suspicious. For that matter, it took me a while to figure out how we know that _2 can't be negative, because we don't have any range-op entries for __builtin_pow. So... here's a trick to figure this out:

--param=ranger-debug=tracegori

You'll see in the *evrp dump:

Folding statement: _2 = __builtin_pow (1.0e+1, _1);
45   range_of_stmt (_2) at stmt _2 = __builtin_pow (1.0e+1, _1);
       TRUE : (45) range_of_stmt (_2) [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
46     range_of_expr(_1) at stmt _2 = __builtin_pow (1.0e+1, _1);
       TRUE : (46) range_of_expr (_1) [frange] double VARYING +-NAN
47   range_of_stmt (_2) at stmt _2 = __builtin_pow (1.0e+1, _1);
       TRUE : (47) cached (_2) [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
Global Exported: _2 = [frange] double [0.0 (0x0.0p+0), +Inf] +NAN

So ranger was able to figure out immediately that _2 was positive. Andrew added smarts to break into any given place, so we can break where the counter is 45:

(gdb) break breakpoint if index == 45

Yes, amazingly there's only one function named breakpoint() in the entire compiler ;-).
If you single step from there on, we run into:

  if (gimple_stmt_nonnegative_warnv_p (call, &strict_overflow_p))
    r.set_nonnegative (type);
  else if (gimple_call_nonnull_result_p (call)
	   || gimple_call_nonnull_arg (call))
    r.set_nonzero (type);
  else
    r.set_varying (type);

IIRC, we had some discussion upstream about the meaning of set_nonnegative, and we all agreed that nuking -NAN was the right thing. Neat, huh? :)
(In reply to Jakub Jelinek from comment #23) > Created attachment 53866 [details] > gcc13-pr107569.patch > > Updated version of the patch I'll test now (if you don't have any nits). > Besides the thinko I've changed rv_fold to have relation_kind argument rather > than relation_trio, passing the latter to a function that doesn't take > ranges of > the operands/result but just the real values is weird. And yes, OK by me. You're far more qualified to opine in this area :). Thanks.
(In reply to Aldy Hernandez from comment #24)
> If you single step from there on, we run into:
>
>   if (gimple_stmt_nonnegative_warnv_p (call, &strict_overflow_p))
>     r.set_nonnegative (type);
>   else if (gimple_call_nonnull_result_p (call)
>            || gimple_call_nonnull_arg (call))
>     r.set_nonzero (type);
>   else
>     r.set_varying (type);
>
> IIRC, we had some discussion upstream about the meaning of set_nonnegative,
> and we all agreed that nuking -NAN was the right thing. Neat, huh? :)

Is this done only for statements for which there isn't a range-ops handler? If so, given the IEEE 754 non-guarantee of NaN signs except for copy, abs, copysign and negate, I'd say that we should have a range-ops handler for all those ops and for anything else assume the NaN sign is VARYING, including at the above spot. As for signed zeros in -fsigned-zeros (default) mode, I wonder if we e.g. don't say sqrt is nonnegative (even though sqrt (-0.0) is -0.0).
(In reply to Jakub Jelinek from comment #26)
> Is this done only for statements for which there isn't a ranges handler?
> If so, given the IEEE 754 non-guarantee of NAN signs except for copy, abs,
> copysign and negate I'd say that we should have a ranges handler for all
> those ops and for anything else assume NAN sign is VARYING, including the
> above spot.

We're doing this for all GIMPLE_CALLs, so copy, abs, and copysign are handled separately, either as builtins or in the IL directly (i.e. not as calls). Hmmm, maybe it's time to revisit what frange::set_nonnegative() means (again). Lemme think about this... there are very few set_nonnegative() calls in ranger. It should be easy to contain.

> As for signed zeros in -fsigned-zeros (default) mode, wonder if we e.g. don't
> say sqrt is nonnegative (even when sqrt (-0.0) is -0.0).

It seems tree_call_nonnegative_warnv_p is already doing the right thing for sqrt:

    CASE_CFN_SQRT:
    CASE_CFN_SQRT_FN:
      /* sqrt(-0.0) is -0.0.  */
      if (!HONOR_SIGNED_ZEROS (type))
	return true;
      return RECURSE (arg0);
(In reply to Aldy Hernandez from comment #27)
> It seems tree_call_nonnegative_warnv_p is already doing the right thing for
> sqrt?
>
>     CASE_CFN_SQRT:
>     CASE_CFN_SQRT_FN:
>       /* sqrt(-0.0) is -0.0.  */
>       if (!HONOR_SIGNED_ZEROS (type))
>         return true;
>       return RECURSE (arg0);

Ah, ok. So, if during stage3 we go through all the cases and convince ourselves it is good enough, it might be ok for signed zeros then. Still, the sign of the NAN is a separate question from that, and there we should set +NAN only when we see CASE_CFN_FABS{,_FN} or ABS_EXPR, toggle the sign for NEGATE_EXPR, inherit it from the second argument for CASE_CFN_COPYSIGN{,_FN}, and copy it over for copies.
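Those NaN-sign rules are observable from C with signbit() (a quick check; is_negative_nan is a made-up helper, and the negation case assumes `-x` compiles to an IEEE sign-bit flip, which GCC does on common targets):

```c
#include <math.h>

/* Helper for the NaN-sign rules listed above: true iff x is a NaN
   with its sign bit set.  fabs should give +NaN, negation should
   flip the sign bit, and copysign should take the sign from its
   second argument.  */
static int
is_negative_nan (double x)
{
  return isnan (x) && signbit (x) != 0;
}
```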
(In reply to Jakub Jelinek from comment #28)
> Still the sign of NAN is separate question from that, and there we should
> set +NAN only when we see CASE_CFN_FABS{,_FN} or ABS_EXPR, toggle the sign
> of NEGATE_EXPR, inherit from second argument for CASE_CFN_COPYSIGN{,_FN}
> and copy over for copies.

From what I've seen, ABS is represented in the IL directly without the need for a builtin, at least by evrp time. Is this not always the case? Negate, copysign, and copy are already handled specially, so no need to do anything there. I have a patch I'm testing for the generic case.
Created attachment 53869 [details]
do not set NAN sign in frange::set_nonnegative()

Proposed patch, in testing.
Created attachment 53873 [details]
gcc13-pr107569-div.patch

This is what I meant by complete nightmare: division.
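For reference, the division inputs that force all the special-casing can be checked directly from C (div_runtime is just a named wrapper for illustration): 0/0 and Inf/Inf are the known-NaN cases, while nonzero-finite/0 gives an infinity and finite/Inf gives zero.

```c
#include <math.h>

/* The "known NaN" inputs a division range-op has to special-case:
   +-0.0 / +-0.0 and +-Inf / +-Inf yield a NaN; by contrast,
   1.0 / 0.0 yields +Inf and 1.0 / Inf yields 0.0.  */
static double
div_runtime (double a, double b)
{
  return a / b;
}
```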
(In reply to Jakub Jelinek from comment #31)
> Created attachment 53873 [details]
> gcc13-pr107569-div.patch
>
> This is what I meant by complete nightmare - division.

Ugh, yeah. That's pretty bad. (Not your code, the inevitable special casing.) Could you abstract out functionality into short inline functions to make it more readable? If it's too much hassle, or doesn't improve readability, then don't... after all, it's all compartmentalized in a function in range-ops, which is the whole point of range-ops: being able to sandbox all this knowledge.

Out of curiosity, why is it so much more complex than the integer version, which shares a lot of code?
(In reply to Jakub Jelinek from comment #31)
> Created attachment 53873 [details]
> gcc13-pr107569-div.patch
>
> This is what I meant by complete nightmare - division.

We can take this to gcc-patches when you're done, but just a few thoughts...

+  // If +-0.0 is in both ranges, it is a maybe NAN.
+  if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+      && real_compare (GE_EXPR, &lh_ub, &dconst0)
+      && real_compare (LE_EXPR, &rh_lb, &dconst0)
+      && real_compare (GE_EXPR, &rh_ub, &dconst0))

Perhaps we could provide frange::contains_zero_p ()?

+  // +-0.0 / +-0.0 or +-INF / +-INF is a known NAN.
+  if ((real_iszero (&lh_lb)
+       && real_iszero (&lh_ub)
+       && real_iszero (&rh_lb)
+       && real_iszero (&rh_ub))

This looks like frange::contains_zero_p () as well.

+      || (real_isinf (&lh_lb)
+	  && real_isinf (&lh_ub, real_isneg (&lh_lb))
+	  && real_isinf (&rh_lb)
+	  && real_isinf (&rh_ub, real_isneg (&rh_lb))))

Note that real_isinf with only one argument checks for +-INF. But I think what you're looking for is frange::maybe_isinf.

Could your patch be simplified with some of these?

  // fpclassify like API
  bool known_isfinite () const;
  bool known_isnan () const;
  bool known_isinf () const;
  bool maybe_isnan () const;
  bool maybe_isnan (bool sign) const;
  bool maybe_isinf () const;
  bool signbit_p (bool &signbit) const;
  bool nan_signbit_p (bool &signbit) const;

We should ultimately avoid peeking at the end points unnecessarily, in order to prepare ourselves for next release when we (hopefully) have sub-ranges.
(In reply to Aldy Hernandez from comment #33)
> + // If +-0.0 is in both ranges, it is a maybe NAN.
> ...
> Perhaps we could provide frange::contains_zero_p ()?

Well, contains_p in irange is a method on the value range, while here we don't have an frange, just naked REAL_VALUE_TYPEs. And it is contains_zero_p twice...

> + // +-0.0 / +-0.0 or +-INF / +-INF is a known NAN.
> ...
> This looks like frange::contains_zero_p () as well.

No, this is zero_p twice. Due to signed zeros it isn't singleton_p plus contains_zero_p; it is just that both boundaries are zero.

> Note that real_isinf with only one argument checks for +-INF.

I know. I'm intentionally using the one- and two-argument ones to verify that lh is either [INF,INF] or [-INF,-INF], but not [-INF,INF].

> But I think what you're looking for is frange::maybe_isinf.

Again, that works on an frange, which I don't have here.

> Could your patch be simplified with some of these?
>
> // fpclassify like API
> bool known_isfinite () const;
> bool known_isnan () const;
> bool known_isinf () const;
> bool maybe_isnan () const;
> bool maybe_isnan (bool sign) const;
> bool maybe_isinf () const;
> bool signbit_p (bool &signbit) const;
> bool nan_signbit_p (bool &signbit) const;
>
> We should ultimately avoid peeking at the end points unnecessarily in order
> to prepare ourselves for next release when we (hopefully) have sub-ranges.

No, see above (at least for now). The peeking at the end points is needed because those end points behave weirdly.
(In reply to Jakub Jelinek from comment #34)
> (In reply to Aldy Hernandez from comment #33)
> > what you're looking for is frange::maybe_isinf.
>
> Again, that works on frange, which I don't have here.

Ughhh... yeah... brain fart.  Sorry.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2f7f9edd28d75a85a33599978f23811e679e443d

commit r13-3923-g2f7f9edd28d75a85a33599978f23811e679e443d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:33:01 2022 +0100

    range-op: Implement floating point multiplication fold_range [PR107569]

    The following patch implements frange multiplication, including the
    special case of x * x.  The callers don't tell us that it is x * x,
    just that it is either
      z = x * x;
    or
      if (x == y) z = x * y;
    For irange that makes no difference, but for frange it can mean x is
    -0.0 and y is 0.0 if they have the same range that includes both signed
    and unsigned zeros, so we need to assume the result could be -0.0.

    The patch causes one regression:
    +FAIL: gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040 2
    but that is already tracked in PR107608 and affects not just the newly
    added multiplication, but addition and other floating point operations
    (and doesn't seem like a ranger bug but dce or whatever else).

    2022-11-12  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/107569
            PR tree-optimization/107591
            * range-op.h (range_operator_float::rv_fold): Add relation_kind
            argument.
            * range-op-float.cc (range_operator_float::fold_range): Name
            last argument trio and pass trio.op1_op2 () as last argument to
            rv_fold.
            (range_operator_float::rv_fold): Add relation_kind argument.
            (foperator_plus::rv_fold, foperator_minus::rv_fold): Likewise.
            (foperator_mult): New class.
            (floating_op_table::floating_op_table): Use foperator_mult for
            MULT_EXPR.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb

commit r13-3924-g2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:35:16 2022 +0100

    range-op: Implement floating point division fold_range [PR107569]

    Here is the floating point division fold_range implementation; as I
    wrote in the last mail, we could outline some of the common parts into
    static methods with descriptive names and share them between
    foperator_div and foperator_mult.

    Regressions are
    +FAIL: gcc.dg/pr95115.c execution test
    +FAIL: libphobos.phobos/std/math/hardware.d execution test
    +FAIL: libphobos.phobos_shared/std/math/hardware.d execution test

    The first test is we have:
      # RANGE [frange] double [] +-NAN
      _3 = Inf / Inf;
      if (_3 ord _3)
        goto <bb 3>; [INV]
      else
        goto <bb 4>; [INV]

      <bb 3> :
      abort ();

      <bb 4> :
    before evrp, the range is correct, Inf / Inf is known NAN of unknown
    sign.  evrp correctly folds _3 ord _3 into false and the
    _3 = Inf / Inf; remains in the IL, but then comes dse1 and removes it
    as a dead statement.  So, I think, yet another example of the PR107608
    problems where DCE? removes dead statements which raise floating point
    exceptions.  And -fno-delete-dead-exceptions doesn't help.

    2022-11-12  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/107569
            * range-op-float.cc (foperator_div): New class.
            (floating_op_table::floating_op_table): Use foperator_div for
            RDIV_EXPR.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:5747470efa8ff0ac82bb5f53d737b29a44f18118

commit r13-3925-g5747470efa8ff0ac82bb5f53d737b29a44f18118
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Nov 12 09:36:59 2022 +0100

    range-op: Cleanup floating point multiplication and division fold_range [PR107569]

    Admittedly there are many similar spots with the foperator_div case
    (but also with significant differences), so perhaps if
    foperator_{mult,div} inherit from some derived class from
    range_operator_float and that class would define various smaller
    helper static? methods, like this discussed in the PR -
    contains_zero_p, singleton_nan_p, zero_p, that

    +  bool must_have_signbit_zero = false;
    +  bool must_have_signbit_nonzero = false;
    +  if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
    +      && real_isneg (&rh_lb) == real_isneg (&rh_ub))
    +    {
    +      if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
    +        must_have_signbit_zero = true;
    +      else
    +        must_have_signbit_nonzero = true;
    +    }

    returned as -1/0/1 int, and those set result (based on the above value)
    to [+INF, +INF], [-INF, -INF] or [-INF, +INF] or
    [+0, +0], [-0, -0] or [-0, +0] or
    [+0, +INF], [-INF, -0] or [-INF, +INF], and the

    +  for (int i = 1; i < 4; ++i)
    +    {
    +      if (real_less (&cp[i], &cp[0])
    +          || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
    +        std::swap (cp[i], cp[0]);
    +      if (real_less (&cp[4], &cp[i + 4])
    +          || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
    +        std::swap (cp[i + 4], cp[4]);
    +    }

    block, it could be smaller and more readable.

    2022-11-12  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/107569
            * range-op-float.cc (zero_p, contains_p, singleton_inf_p,
            signbit_known_p, zero_range, inf_range, zero_to_inf_range): New
            functions.
            (foperator_mult_div_base): New class.
            (foperator_mult, foperator_div): Derive from that and use
            protected static method from it as well as above new functions
            to simplify the code.
Hi,

FYI, when trying to bootstrap a GNU toolchain with glibc (current master branch), binutils (current 2.39 release branch) and gcc, I noticed that the fixes done for this bug caused the following regressions on the glibc side (tested on x86_64 Linux):

FAIL: math/test-double-cosh
FAIL: math/test-double-exp10
FAIL: math/test-double-expm1
FAIL: math/test-double-lgamma
FAIL: math/test-double-log1p
FAIL: math/test-double-tgamma
FAIL: math/test-float-cosh
FAIL: math/test-float-expm1
FAIL: math/test-float-lgamma
FAIL: math/test-float-log1p
FAIL: math/test-float-tgamma
FAIL: math/test-float128-catan
FAIL: math/test-float128-catanh
FAIL: math/test-float128-cosh
FAIL: math/test-float128-exp10
FAIL: math/test-float128-lgamma
FAIL: math/test-float128-log
FAIL: math/test-float128-log1p
FAIL: math/test-float128-pow
FAIL: math/test-float128-tgamma
FAIL: math/test-float128-y0
FAIL: math/test-float128-y1
FAIL: math/test-float32-cosh
FAIL: math/test-float32-expm1
FAIL: math/test-float32-lgamma
FAIL: math/test-float32-log1p
FAIL: math/test-float32-tgamma
FAIL: math/test-float32x-cosh
FAIL: math/test-float32x-exp10
FAIL: math/test-float32x-expm1
FAIL: math/test-float32x-lgamma
FAIL: math/test-float32x-log1p
FAIL: math/test-float32x-tgamma
FAIL: math/test-float64-cosh
FAIL: math/test-float64-exp10
FAIL: math/test-float64-expm1
FAIL: math/test-float64-lgamma
FAIL: math/test-float64-log1p
FAIL: math/test-float64-tgamma
FAIL: math/test-float64x-cosh
FAIL: math/test-float64x-lgamma
FAIL: math/test-float64x-tgamma
FAIL: math/test-ldouble-cosh
FAIL: math/test-ldouble-lgamma
FAIL: math/test-ldouble-tgamma

These tests are working with gcc-13-3916-g5b6ce16adec (daily bump from 12th of November) and failing with gcc-13-3933-g30d77d49628 (daily bump from 13th of November).  I did try the current master, it still fails.  I also tried the patch from PR107879 and it still fails.

Cheers,
Romain
The current state of the #c0 testcase is that bar is actually optimized into return 1;:

Folding statement: .ASSUME (_Z3bard._assume.0, x_1(D));
_Z3bard._assume.0 assume inferred range of x_1(D) (param x) = [frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]

during vrp1 and:

Folding statement: _4 = _3 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
Queued stmt for removal.  Folds to: 0

It is correct to optimize the comparison even with -ftrapping-math - comparisons only raise exceptions on NaNs, the quiet ones like u> even just on sNaNs, and the range proves that the non-constant operand is never a NAN and the other isn't either (it is constant).

On the other side, foo isn't optimized.

  # RANGE [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  # RANGE [frange] double [0.0 (0x0.0p+0), +Inf] +NAN
  _9 = ABS_EXPR <x_3(D)>;
  _10 = _9 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _11 = ~_10;
  return _11;

is turned by fre1 into:

  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  return _8;

and while e.g. evrp correctly figures out that:

2->3  (T) _6 :	[frange] double [+Inf, +Inf] +NAN
2->4  (F) x_3(D) :	[frange] double [-1.79769313486231570814527423731704356798070567525844996599e+308 (-0x0.fffffffffffff8p+1024), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]
2->4  (F) _4 :	[irange] bool [0, 0] NONZERO 0x0
2->4  (F) _6 :	[frange] double [0.0 (0x0.0p+0), 1.79769313486231570814527423731704356798070567525844996599e+308 (0x0.fffffffffffff8p+1024)]

it doesn't do the extra step of figuring out that when _4 on the 2->4 edge must be 0, then _8 on that edge must be 1.  And the finite range, say for _6 or x_3(D), isn't stored into the global state (if there were say some possibly non-returning call between the _6 definition and its uses, or for x_3(D) between the start of the function and its uses, we obviously couldn't store it as a global range; in this case we could, if we proved that isn't possible, i.e. that if the function is reached then return _8; is reached too).  And then during vrp1 the same problem occurs, and __builtin_unreachable () is removed with nothing noted anywhere.

Andrew, any thoughts?
As for #c1, with trunk -O3 -march=skylake it is:
        vmovsd  8(%rdi), %xmm1
        vmovsd  (%rdi), %xmm0
        vmulsd  %xmm1, %xmm1, %xmm1
        vfmadd132sd     %xmm0, %xmm1, %xmm0
        vsqrtsd %xmm0, %xmm0, %xmm0
which is I think what we want.
On #c0 foo, this was previously optimized in dom2, which optimized
  _4 = ABS_EXPR <x_2(D)>;
  _3 = _4 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_3;
  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();

  <bb 4> [local count: 1073741824]:
  return _5;
without any frange related stuff as:
-  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
+  if (_3 != 0)
and
-  return _5;
+  return 1;
But because __builtin_unreachable () is now removed earlier (in vrp1 already, though without providing a useful global range), we don't do that anymore.
(In reply to Jakub Jelinek from comment #42)
> On #c0 foo, this was previously optimized in dom2 [...]
> But because __builtin_unreachable () is now removed earlier (vrp1 already;
> without providing useful global range though), we don't do that anymore.

Hmm.  It's because when vrp1 removes the branch to __builtin_unreachable, it calculates the global range of _4 as the union of all the ranges at each use; this is to avoid issues where the unreachable call may not post-dominate an earlier use.  The initial use of _4 still resolves to only [0, max], so we end up only using that for the range.

I'm experimenting with doing things the same way, except NOT removing the branch in VRP, but continuing to set the global range the way we do.  I might also be able to revisit the branch *after* all the globals have been set, and see if the global values now cause the condition to fold, and if they do, remove the branches only then...  This would leave the above case for DOM2 (or someone else) to resolve.  It seems like it might be a reasonable balance... i.e., we only remove the unreachable in VRP1 if we can determine with 100% certainty that setting the global value will result in the branch being removable.

It's also possible it could just be left... and another pass should remove it shortly as it folds away... and certainly by VRP2 time...

Anyway, I'll try it a few ways and see what seems to work best.
I can confirm that foo() is still not optimized, but we now optimize bar() in VRP1.  VRP1 sees:

bool foo (double x)
{
  bool _3;
  double _4;
  bool _5;

  <bb 2> [local count: 1073741824]:
  _4 = ABS_EXPR <x_2(D)>;
  _3 = _4 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _5 = ~_3;
  if (_4 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();

  <bb 4> [local count: 1073741824]:
  return _5;
For the #c0 foo function, one simple fix would be something like:

--- gcc/passes.def.jj	2023-01-02 09:32:39.539037434 +0100
+++ gcc/passes.def	2023-03-22 16:12:57.387652639 +0100
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
       NEXT_PASS (pass_sra_early);
+      NEXT_PASS (pass_dce);
       /* pass_build_ealias is a dummy pass that ensures that we execute
 	 TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_ealias);

The problem there is that the ccp1 and forwprop1 passes leave behind some dead statements:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  _12 = _8;
  _1 = _12;
  retval.0_5 = ~_1;
  if (retval.0_5 != 0)
is turned by ccp1 into:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_4 != 0)
and by forwprop1 into:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
So, now both the _8 and _4 setters are dead.
Then comes fre1 and happily uses them again, which results in the undesirable:
  _6 = ABS_EXPR <x_3(D)>;
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _8 = ~_4;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  return _8;

With the extra dce, we get:
  _6 = ABS_EXPR <x_3(D)>;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  _9 = ABS_EXPR <x_3(D)>;
  _10 = _9 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
  _11 = ~_10;
  return _11;

before fre1, and fre1 optimizes that into:
  _6 = ABS_EXPR <x_3(D)>;
  if (_6 u> 1.79769313486231570814527423731704356798070567525844996599e+308)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  __builtin_unreachable ();

  <bb 4> :
  return 1;
And another possibility might be to try to keep __builtin_unreachable () in the IL more often; in this testcase nothing from the __builtin_unreachable () is really visible in any global ranges - they all are the same with and without the __builtin_unreachable.  So, can't we remove those just in the final_p case and not otherwise?  Dom would then be able to optimize this.
I have tested the

--- gcc/passes.def.jj	2023-03-22 16:59:45.378390155 +0100
+++ gcc/passes.def	2023-03-22 22:07:35.272803901 +0100
@@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.
       /* pass_build_ealias is a dummy pass that ensures that we execute
 	 TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_ealias);
+      NEXT_PASS (pass_dce);
       NEXT_PASS (pass_fre, true /* may_iterate */);
       NEXT_PASS (pass_early_vrp);
       NEXT_PASS (pass_merge_phi);

patch and it unfortunately has some regressions on x86_64-linux; I haven't verified yet which ones just need their testcases tweaked and where we actually end up with worse code:

UNRESOLVED: gcc.dg/bic-bitmask-13.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-13.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-14.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-14.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-15.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-15.c scan-tree-dump-times dce7 "=\\\\s* 1" 1
UNRESOLVED: gcc.dg/bic-bitmask-16.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-16.c scan-tree-dump-times dce7 ">\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-17.c scan-tree-dump-not dce7 "&\\\\s* 4294967040"
UNRESOLVED: gcc.dg/bic-bitmask-17.c scan-tree-dump-times dce7 "<=\\\\s* 255" 1
UNRESOLVED: gcc.dg/bic-bitmask-18.c scan-tree-dump-times dce7 " = 0;" 1
UNRESOLVED: gcc.dg/bic-bitmask-19.c scan-tree-dump-not dce7 "&\\\\s* 4294967294"
UNRESOLVED: gcc.dg/bic-bitmask-19.c scan-tree-dump-times dce7 ">\\\\s* 1" 1
UNRESOLVED: gcc.dg/bic-bitmask-20.c scan-tree-dump dce7 "&\\\\s* 4294967290"
UNRESOLVED: gcc.dg/bic-bitmask-20.c scan-tree-dump-not dce7 "<=\\\\s* 4294967289"
UNRESOLVED: gcc.dg/bic-bitmask-21.c scan-tree-dump dce7 "<=\\\\s* 255"
UNRESOLVED: gcc.dg/bic-bitmask-21.c scan-tree-dump-not dce7 "&\\\\s* 4294967290"
UNRESOLVED: gcc.dg/bic-bitmask-22.c scan-tree-dump dce7 ">\\\\s* 255"
UNRESOLVED: gcc.dg/bic-bitmask-22.c scan-tree-dump-not dce7 "&\\\\s* 4294967290"
FAIL: gcc.dg/pr23911.c scan-tree-dump-times dce3 "(?n)IMAGPART_EXPR.*= 0\\\\.0" 2
FAIL: gcc.dg/pr23911.c scan-tree-dump-times dce3 "(?n)REALPART_EXPR.*= 1\\\\.0e\\\\+0" 2
FAIL: gcc.dg/guality/pr45003-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 10 a == 0x8078
FAIL: gcc.dg/guality/pr45003-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 19 a == 0xffff8078
FAIL: gcc.dg/guality/pr45003-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 10 a == -32648
FAIL: gcc.dg/guality/pr45003-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 19 a == 0x8078
UNRESOLVED: gcc.dg/tree-ssa/20030731-2.c scan-tree-dump-times dce2 "if " 1
UNRESOLVED: gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times dce7 "->code" 0
UNRESOLVED: gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times dce7 "if " 0
UNRESOLVED: gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times dce2 "if " 2
FAIL: gcc.dg/tree-ssa/alias-37.c scan-tree-dump dse1 "Deleted dead store"
UNRESOLVED: gcc.dg/tree-ssa/cfgcleanup-1.c scan-tree-dump-times dce2 "if " 0
FAIL: gcc.dg/tree-ssa/evrp10.c scan-tree-dump-times evrp "\\\\[-128, 127\\\\]" 9
FAIL: gcc.dg/tree-ssa/evrp6.c scan-tree-dump evrp "\\\\[12, \\\\+INF"
UNRESOLVED: gcc.dg/tree-ssa/pr69270-2.c scan-tree-dump-times dce2 "usecount_[0-9]+ = usecount_[0-9]+ . 1;" 0
FAIL: gcc.dg/tree-ssa/pr79697.c scan-tree-dump cddce1 "Deleting : __builtin_strdup"
FAIL: gcc.dg/tree-ssa/pr79697.c scan-tree-dump cddce1 "Deleting : __builtin_strndup"
FAIL: gcc.dg/tree-ssa/ssa-dse-32.c scan-tree-dump-times dse1 "Deleted dead store" 1
UNRESOLVED: gcc.dg/vect/pr26359.c -flto -ffat-lto-objects scan-tree-dump-times dce6 "Deleting : vect_" 0
UNRESOLVED: gcc.dg/vect/pr26359.c scan-tree-dump-times dce6 "Deleting : vect_" 0
UNRESOLVED: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
UNRESOLVED: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
FAIL: g++.dg/cpp1y/new1.C -std=gnu++11 scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C -std=gnu++14 scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C -std=gnu++17 scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C -std=gnu++20 scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C -std=gnu++2b scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/cpp1y/new1.C -std=gnu++98 scan-tree-dump-times cddce1 "Deleting : _\\\\d+ = operator new" 8
FAIL: g++.dg/pr94314-2.C -std=gnu++11 scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C -std=gnu++14 scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C -std=gnu++17 scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C -std=gnu++20 scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C -std=gnu++2b scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/pr94314-2.C -std=gnu++98 scan-tree-dump-times cddce1 "Deleting : operator delete" 2
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++11 (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++14 (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++17 (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++20 (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++2b (test for warnings, line 36)
FAIL: g++.dg/warn/Wmismatched-new-delete-5.C -std=gnu++98 (test for warnings, line 36)

The UNRESOLVED ones likely just need to replace dce7 with dce8.
So instead of an extra DCE pass one could try harder to not leave around dead code after folding, for example with the following (doesn't fix the testcase yet, all the custom folding code would need adjustments as well).  But it's not 100% reliable and it comes at a cost even when folding isn't successful (while we don't update stmt operands immediately, what they point to is changed by the actual folding already).

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index e34f0888954..133619899cd 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "gimple-range.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -3466,6 +3467,7 @@ pass_forwprop::execute (function *fun)
   auto_vec<gimple *, 32> to_remove;
   to_purge = BITMAP_ALLOC (NULL);
   bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
+  auto_bitmap simple_dce_seed;
   for (int i = 0; i < postorder_num; ++i)
     {
       gimple_stmt_iterator gsi;
@@ -3846,8 +3848,16 @@ pass_forwprop::execute (function *fun)
 		   && gimple_call_noreturn_p (stmt));
 	  changed = false;
 
+	  auto_vec<int, 4> seeds;
+	  use_operand_p use_p;
+	  ssa_op_iter iter;
+	  FOR_EACH_SSA_USE_OPERAND (use_p, orig_stmt, iter, SSA_OP_USE)
+	    seeds.safe_push (SSA_NAME_VERSION (USE_FROM_PTR (use_p)));
+
 	  if (fold_stmt (&gsi, fwprop_ssa_val))
 	    {
+	      for (int seed : seeds)
+		bitmap_set_bit (simple_dce_seed, seed);
 	      changed = true;
 	      stmt = gsi_stmt (gsi);
 	      /* Cleanup the CFG if we simplified a condition to
@@ -4037,6 +4047,8 @@ pass_forwprop::execute (function *fun)
 	  cfg_changed |= fixup_noreturn_call (stmt);
 	}
 
+  simple_dce_from_worklist (simple_dce_seed);
+
   cfg_changed |= gimple_purge_all_dead_eh_edges (to_purge);
   cfg_changed |= gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
   BITMAP_FREE (to_purge);

Another possibility is preventing FRE/PRE from picking up dead stmts; as suggested, this does fix the testcase.  The extra stmt removal is not strictly necessary.  I'm going to test sth like this.

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d5b081a309f..9f77b7a5647 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6683,6 +6683,12 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, gimple_stmt_iterator *gsi)
 		   && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt))
 		   && is_global_var (gimple_assign_rhs1 (stmt)))))
 	{
+	  if (has_zero_uses (lhs))
+	    {
+	      to_remove.safe_push (stmt);
+	      return;
+	    }
+
 	  sprime = eliminate_avail (b, lhs);
 	  if (!sprime)
 	    {
@@ -7200,7 +7206,8 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, gimple_stmt_iterator *gsi)
 	 continue with the next stmt above and skip this.  */
       def_operand_p defp;
       FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
-	eliminate_push_avail (b, DEF_FROM_PTR (defp));
+	if (! has_zero_uses (DEF_FROM_PTR (defp)))
+	  eliminate_push_avail (b, DEF_FROM_PTR (defp));
     }
 
 /* Perform elimination for the basic-block B during the domwalk.  */
@@ -8048,7 +8055,8 @@ process_bb (rpo_elim &avail, basic_block bb,
       /* If not eliminating, make all not already available defs
 	 available.  */
       FOR_EACH_SSA_TREE_OPERAND (op, gsi_stmt (gsi), i, SSA_OP_DEF)
-	if (! avail.eliminate_avail (bb, op))
+	if (! has_zero_uses (op)
+	    && ! avail.eliminate_avail (bb, op))
 	  avail.eliminate_push_avail (bb, op);
     }
Does the has_zero_uses patch work for this?  As there is both:
  _4 = _6 u> 1.79769313486231570814527423731704356798070567525844996599e+308;
and
  _8 = ~_4;
and _8 has_zero_uses, while _4 has uses but all of them are dead.  So, I assume we wouldn't try to reuse _8, but perhaps we'd reuse _4?
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:41ade3399bd1ec9927be1bb818965831232eda4b

commit r13-6834-g41ade3399bd1ec9927be1bb818965831232eda4b
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Mar 23 14:52:01 2023 +0100

    tree-optimization/107569 - avoid wrecking earlier folding in FRE/PRE

    The following avoids picking up dead code left over from folding
    during FRE/PRE, effectively undoing propagations.

            PR tree-optimization/107569
            * tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
            Do not push SSA names with zero uses as available leader.
            (process_bb): Likewise.

            * g++.dg/opt/pr107569.C: New testcase.
Fixed.