cat test.c
extern short a[9000];

int foo()
{
  int b;
  int i;

  b = a[0];
  for(i = 1; i < 9000; i ++) {
    if(a[i] < b) {
      b = a[i];
    }
  }
  return b;
}

GCC 8 vectorizes the loop with -Ofast -march=skylake-avx512, but GCC 9, GCC 10, and trunk do not:

test.c:9:16: missed: couldn't vectorize loop
test.c:3:5: missed: not vectorized: relevant phi not supported: b_14 = PHI <_9(5), b_8(2)>
test.c:3:5: note: vectorized 0 loops in function.
test.c:14:10: note: ***** Analysis failed with vector mode V16HI
test.c:14:10: note: ***** Skipping vector mode V32QI, which would repeat the analysis for V16HI

It seems vect_recog_widen_op_pattern failed to handle this?
With a small adjustment to the testcase, the loop is vectorized:

@@ -2,7 +2,7 @@ extern short a[9000];
 
 int foo()
 {
-  int b;
+  short b;
   int i;
 
   b = a[0];
Probably because

t.c:9:16: note: vect_recog_over_widening_pattern: detected: _9 = MIN_EXPR <_3, b_14>;
t.c:9:16: note: demoting int to signed short
t.c:9:16: note: created pattern stmt: patt_11 = MIN_EXPR <_2, patt_12>;
t.c:9:16: note: over_widening pattern recognized: patt_6 = (int) patt_11;
t.c:9:16: note: extra pattern stmt: patt_12 = (signed short) b_14;
t.c:9:16: note: extra pattern stmt: patt_11 = MIN_EXPR <_2, patt_12>;

which makes the reduction unhandled (we only support sign-changing conversions, not truncations). We can either restrict the over-widen pattern so that it does not apply to reductions, or try to use range info (as pattern recog does) in the reduction handling somehow. I don't see an obvious place to add a reduction def check to vect_recog_over_widening_pattern; maybe Richard does.
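To make the demotion in the dump above concrete, here is a minimal sketch comparing the original int accumulator with the short accumulator from the adjusted testcase. The function names and the array parameter (instead of the global a) are assumptions made for illustration; they are not part of the report.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the accumulator is int, so every a[i] is widened
   to int before it is compared and stored.  */
int min_int_acc(const short *a, size_t n)
{
    assert(n > 0);
    int b = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < b)
            b = a[i];
    }
    return b;
}

/* Adjusted shape: the accumulator is short, so the reduction stays
   in the narrow type and is widened only once, on return.  */
int min_short_acc(const short *a, size_t n)
{
    assert(n > 0);
    short b = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < b)
            b = a[i];
    }
    return b;
}
```

Both functions return the same value for any non-empty input, because every value the accumulator can hold comes from a short. That is the property the over-widening pattern relies on when it demotes the MIN_EXPR to signed short.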
Regressed with r9-1590-g370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85
Alternatively, couldn't we support truncation in the reductions if SSA_NAME_RANGE_INFO suggests that the values are always in the narrower range?
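The range condition can be illustrated outside the compiler. In the sketch below, a minimum over short inputs never leaves the short range, so truncating the accumulator would be lossless, while a sum over the same inputs can leave it. The helper names are hypothetical, and the runtime check is only a stand-in for what SSA_NAME_RANGE_INFO would have to establish statically.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if v would survive a round trip through short unchanged.  */
static bool fits_in_short(long v)
{
    return v >= SHRT_MIN && v <= SHRT_MAX;
}

/* Min reduction with a wide accumulator: report whether every
   intermediate value stayed inside the short range.  */
bool min_stays_narrow(const short *a, size_t n)
{
    assert(n > 0);
    int b = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < b)
            b = a[i];
        if (!fits_in_short(b))
            return false;
    }
    return true;
}

/* Sum reduction with a wide accumulator: the running total can
   exceed the short range even though every input is a short.  */
bool sum_stays_narrow(const short *a, size_t n)
{
    assert(n > 0);
    long s = a[0];
    for (size_t i = 1; i < n; i++) {
        s += a[i];
        if (!fits_in_short(s))
            return false;
    }
    return true;
}
```

For the input {30000, 30000, -5}, the minimum stays narrow while the sum does not, which is why a truncating reduction would need a per-operation range argument rather than a blanket rule.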
On Thu, 28 Jan 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98848
>
> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Alternatively, couldn't we support truncation in the reductions if
> SSA_NAME_RANGE_INFO suggests that the values are always in the narrower range?

Yes, we probably could. But note that changes in reduction support are quite fragile, and we're currently just set up for sign changes (via emitting V_C_E) but not for required promotions/demotions, so there would be a lot of changes needed. It's also not clear that promoting/demoting the reduction IV all the time is doing any good (unless you suggest that we'd magically undo the pattern by promoting the non-reduction OP instead, but that would require even more changes).

So I guess the better approach might be to somehow allow late "undoing" of pattern recog (but it's a bit complicated because of how that influences VF compute and also relevant/liveness compute).
So what about punting if the lhs of the possible over_widen pattern is a PHI on loop header?

--- gcc/tree-vect-patterns.c.jj	2021-01-04 10:25:38.650235896 +0100
+++ gcc/tree-vect-patterns.c	2021-02-01 10:13:51.755008757 +0100
@@ -1579,6 +1579,20 @@ vect_recog_over_widening_pattern (vec_in
   tree type = TREE_TYPE (lhs);
   tree_code code = gimple_assign_rhs_code (last_stmt);
 
+  /* Punt if lhs might be used in a reduction.  */
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      use_operand_p use_p;
+      imm_use_iterator iter;
+      FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
+        {
+          gimple *use_stmt = USE_STMT (use_p);
+          if (gimple_code (use_stmt) == GIMPLE_PHI
+              && gimple_bb (use_stmt) == LOOP_VINFO_LOOP (loop_vinfo)->header)
+            return NULL;
+        }
+    }
+
   /* Keep the first operand of a COND_EXPR as-is: only the other two
      operands are interesting.  */
   unsigned int first_op = (code == COND_EXPR ? 2 : 1);

doesn't regress any vect.exp=*over-widen* tests and lets this testcase be vectorized.
On Mon, 1 Feb 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98848
>
> --- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> So what about punting if the lhs of the possible over_widen pattern is a PHI on
> loop header?

That would be STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def. Elsewhere we now use vect_reassociating_reduction_p; not sure if that would apply here, too.

> --- gcc/tree-vect-patterns.c.jj	2021-01-04 10:25:38.650235896 +0100
> +++ gcc/tree-vect-patterns.c	2021-02-01 10:13:51.755008757 +0100
> @@ -1579,6 +1579,20 @@ vect_recog_over_widening_pattern (vec_in
>    tree type = TREE_TYPE (lhs);
>    tree_code code = gimple_assign_rhs_code (last_stmt);
>
> +  /* Punt if lhs might be used in a reduction.  */
> +  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> +    {
> +      use_operand_p use_p;
> +      imm_use_iterator iter;
> +      FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> +        {
> +          gimple *use_stmt = USE_STMT (use_p);
> +          if (gimple_code (use_stmt) == GIMPLE_PHI
> +              && gimple_bb (use_stmt) == LOOP_VINFO_LOOP (loop_vinfo)->header)
> +            return NULL;
> +        }
> +    }
> +
>    /* Keep the first operand of a COND_EXPR as-is: only the other two
>       operands are interesting.  */
>    unsigned int first_op = (code == COND_EXPR ? 2 : 1);
>
> doesn't regress any vect.exp=*over-widen* tests and let's this testcase be
> vectorized.
Created attachment 50102
gcc11-pr98848.patch

This works too. I don't see how we could use vect_reassociating_reduction_p: for one, it seems to be used in the positive checks (only recognize if reduction), and more importantly, it makes heavy assumptions about what the assignment must be (while for over-widen it could be e.g. a COND_EXPR).
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1592b74350a0311e4c95a0192ea9c943847e7bc0

commit r11-7034-g1592b74350a0311e4c95a0192ea9c943847e7bc0
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 2 10:32:23 2021 +0100

    tree-vect-patterns: Don't create over widening patterns for stmts used in reductions [PR98848]

    As discussed in the PR, the reduction code isn't able to cope with type
    promotions/demotions in the reduction computation, so if we recognize
    an over-widening pattern that has vect_reduction_def type, we most likely
    make it non-vectorizable.

    2021-02-02  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/98848
            * tree-vect-patterns.c (vect_recog_over_widening_pattern): Punt if
            STMT_VINFO_DEF_TYPE (last_stmt_info) is vect_reduction_def.

            * gcc.dg/vect/pr98848.c: New test.
            * gcc.dg/vect/pr92205.c: Remove xfail.
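The shape of the committed guard can be sketched as a toy model in plain C. This is not GCC code: the enum, the struct, and the function below are hypothetical stand-ins for STMT_VINFO_DEF_TYPE and the over-widening recognizer, and the precision fields are a simplifying assumption about how "narrower than declared" might be represented.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vectorizer's def-type classification.  */
enum toy_def_type { TOY_INTERNAL_DEF, TOY_REDUCTION_DEF };

/* Hypothetical summary of one statement.  */
struct toy_stmt {
    enum toy_def_type def_type;
    int used_bits;  /* precision the operation actually needs */
    int type_bits;  /* precision of the statement's declared type */
};

/* Decide whether to rewrite the statement in a narrower type.
   Statements that are part of a reduction are never narrowed, since
   the reduction handling cannot cope with the resulting truncation.  */
bool toy_should_narrow(const struct toy_stmt *s)
{
    if (s->def_type == TOY_REDUCTION_DEF)
        return false;
    return s->used_bits < s->type_bits;
}
```

The point of the ordering is that the reduction check comes first and is unconditional, so a statement that would otherwise qualify for narrowing is left alone when it feeds a reduction.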
Fixed on the trunk. Unsure if we want to backport this.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Fixed in GCC 11.