Since g:d846f225c25c5885250c303c8d118caa08c447ab we create an epilog loop on s390 for the following test case: /* { dg-do compile } */ /* { dg-options "-O3 -mzarch -march=z13" } */ /* { dg-require-effective-target s390_vx } */ int foo (int * restrict a, int n) { int i, result = 0; for (i = 0; i < n * 4; i++) result += a[i]; return result; } vec.c:10:17: note: epilog loop required The following check in tree-vect-loop.c:vect_need_peeling_or_partial_vectors_p() is now true: || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) < (unsigned) exact_log2 (const_vf)) We now have LOOP_VINFO_NITERS (loop_vinfo) = _15 > 0 ? (unsigned int) _15 : 1 as compared to (unsigned int) _15 before. tree_ctz() returns 0 for the conditional and 2 before which did not trigger the epilog requirement. may_be_zero is _15 > 0 so it looks to me like we rather want to check the not-may_be_zero part of niter for alignment. Not sure if this is the right/safe thing to do, though.
Created attachment 50870 [details] incomplete patch Yes, the simplest thing might be to push may_be_zero to assumptions, but that way we're going to force versioning for the loop. That said, recording the not maybe_zero_part separately and using it for this check should be easy. Also confirmed on x86_64. Incomplete patch attached - it would need auditing of all _NITER[M1] uses.
I did not get back to this until now. The patch works, of course and a testsuite run looks good so far. I assume we're too late in the cycle to still get this in, right?
(In reply to rdapp from comment #2) > I did not get back to this until now. The patch works, of course and a > testsuite run looks good so far. I assume we're too late in the cycle to > still get this in, right? Yes. That's clearly stage1 material now.
Anything that can/should be done here in case we'd still want to tackle it in this P1 cycle?
I think the issue is that we now have <bb 2> [local count: 118111600]: _15 = n_8(D) * 4; if (n_8(D) > 0) goto <bb 5>; [89.00%] else goto <bb 7>; [11.00%] while we probably had <bb 2> [local count: 118111600]: _15 = n_8(D) * 4; if (_15 > 0) goto <bb 5>; [89.00%] else goto <bb 7>; [11.00%] before the change. Loop header copying applies VN to the copied blocks: Processing block 0: BB6 Value numbering stmt = i_9 = PHI <0(2)> Setting value number of i_9 to 0 (changed) Replaced redundant PHI node defining i_9 with 0 Value numbering stmt = result_14 = PHI <0(2)> Setting value number of result_14 to 0 (changed) Replaced redundant PHI node defining result_14 with 0 Value numbering stmt = _15 = n_8(D) * 4; Setting value number of _15 to _15 (changed) Making available beyond BB6 _15 for value _15 Value numbering stmt = if (_15 > i_9) Recording on edge 6->7 _15 gt_expr 0 == true Recording on edge 6->7 _15 le_expr 0 == false Recording on edge 6->7 _15 ne_expr 0 == true Recording on edge 6->7 _15 ge_expr 0 == true Recording on edge 6->7 _15 lt_expr 0 == false Recording on edge 6->7 _15 eq_expr 0 == false marking outgoing edge 6 -> 7 executable gimple_simplified to if (n_8(D) > 0) with <bb 2> [local count: 118111600]: _15 = n_8(D) * 4; if (n_8(D) > 0) goto <bb 3>; [89.00%] else goto <bb 4>; [11.00%] <bb 3> [local count: 955630225]: # i_16 = PHI <i_13(3), 0(2)> # result_17 = PHI <result_12(3), 0(2)> _1 = (long unsigned int) i_16; _2 = _1 * 4; _3 = a_11(D) + _2; _4 = *_3; result_12 = _4 + result_17; i_13 = i_16 + 1; _5 = n_8(D) * 4; if (_5 > i_13) goto <bb 3>; [89.00%] else goto <bb 4>; [11.00%] so it was a single use in the compare (because CSE only later introduces more uses through DOM). The niter code then ends up with maybe-zero as _15 <= 0 and a condition of n_8(D) > 0 it tries to simplify with tree_simplify_using_condition (called from simplify_using_initial_conditions). That old machinery would be a perfect candidate to be rewritten using path ranger, but in a somewhat extended mode that can "skip" diamonds, aka, the path just contains dominators of the loop entry edge on which we want to evaluate the _15 <= 0 condition. To make the old simplification code work we can do the following: diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index 1e0f609d8b6..4ffcef4f4ff 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -2216,6 +2216,7 @@ expand_simple_operations (tree expr, tree stop, hash_map<tree, tree> &cache) case PLUS_EXPR: case MINUS_EXPR: + case MULT_EXPR: if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (expr)) && TYPE_OVERFLOW_TRAPS (TREE_TYPE (expr))) return expr; but that can of course have unintended side-effects elsewhere (this function is also used by IVOPTs).
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:19295e8607da2f743368fe6f5708146616aafa91 commit r13-3476-g19295e8607da2f743368fe6f5708146616aafa91 Author: Richard Biener <rguenther@suse.de> Date: Mon Oct 24 09:51:32 2022 +0200 tree-optimization/100756 - niter analysis and folding niter analysis, specifically the part trying to simplify the computed maybe_zero condition against the loop header copying condition, is confused by us now simplifying _15 = n_8(D) * 4; if (_15 > 0) to _15 = n_8(D) * 4; if (n_8(D) > 0) which is perfectly sound at the point we do this transform. One solution might be to involve ranger in this simplification, another is to be more aggressive when expanding expressions - the condition we try to simplify is _15 > 0, so all we need is expanding that to n_8(D) * 4 > 0. The following does just that. PR tree-optimization/100756 * tree-ssa-loop-niter.cc (expand_simple_operations): Also expand multiplications by invariants. * gcc.dg/vect/pr100756.c: New testcase.
Fixed on trunk - lets see if there's any fallout.
For completeness: haven't observed any fallout on s390 since and the regression is fixed.
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
GCC 12.4 is being released, retargeting bugs to GCC 12.5.
Just noticed this is still open due to the retargeting message. IMHO this can be closed. I'm pretty sure I erroneously used the GCC 12 target when opening the bug when it should have been trunk/GCC 13. I suppose we're not expecting a backport for GCC 12 anyway and Stefan confirmed that the bug is fixed since. Therefore I'd suggest to close it.
Fixed then.