[PATCH 8/21] middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
Richard Biener
rguenther@suse.de
Wed Dec 6 09:33:06 GMT 2023
On Wed, 6 Dec 2023, Tamar Christina wrote:
> > > > is the exit edge you are looking for without iterating over all loop exits.
> > > >
> > > > > + gimple *tmp_vec_stmt = vec_stmt;
> > > > > + tree tmp_vec_lhs = vec_lhs;
> > > > > + tree tmp_bitstart = bitstart;
> > > > > + /* For early exit where the exit is not in the BB that leads
> > > > > + to the latch then we're restarting the iteration in the
> > > > > + scalar loop. So get the first live value. */
> > > > > + restart_loop = restart_loop || exit_e != main_e;
> > > > > + if (restart_loop)
> > > > > + {
> > > > > + tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > + tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > > + tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > > >
> > > > Hmm, that gets you the value after the first iteration, not the one before, which
> > > > would be the last value of the preceding vector iteration?
> > > > (but we don't keep those, we'd need a PHI)
> > >
> > > I don't fully follow. The comment on top of this hunk under if (loop_vinfo) states
> > > that lhs should be pointing to a PHI.
> > >
> > > When I inspect the statement I see
> > >
> > > i_14 = PHI <i_11(6), 0(14)>
> > >
> > > so i_14 is the value at the start of the current iteration. If we're coming from the
> > > header it's 0, otherwise it's i_11, which is the value of the previous iteration.
> > >
> > > The peeling code explicitly leaves i_14 in the merge block and not i_11 for this
> > > exact reason.
> > > So I'm confused, my understanding is that we're already *at* the right PHI.
> > >
> > > Is it perhaps that you thought we put i_11 here for the early exits? In which case,
> > > yes, I'd agree that that would be wrong, and there we would have had to look at
> > > the defs, but i_11 is the def.
> > >
> > > I already kept this in mind and leveraged peeling to make this part easier.
> > > i_11 is used in the main exit and i_14 in the early one.
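> > >
> > > As a minimal sketch (SSA names and edge labels here are illustrative, not from
> > > an actual dump), the PHIs involved look roughly like:
> > >
> > >   loop header:
> > >     i_14 = PHI <i_11(latch), 0(preheader)>    <- value at the start of the iteration
> > >     ...
> > >     i_11 = i_14 + 1                           <- value for the next iteration
> > >
> > >   merge block:
> > >     _m = PHI <i_11(main exit), i_14(early exit)>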
> >
> > I think the important detail is that this code is only executed for
> > vect_induction_defs, which are indeed PHIs, so we're sure the
> > live value is from before any modification and thus fine to feed as the
> > initial value for the PHI in the epilog.
> >
> > Maybe we can assert the def type here?
>
> We can't assert because until cfg cleanup the dead value is still seen and still
> vectorized. That said, I've added a guard here. We vectorize the non-induction
> value as normal now, and if it's ever used it'll fail.
>
> >
> > > >
> > > > Why again do we need (non-induction) live values from the vector loop to the
> > > > epilogue loop?
> > >
> > > They can appear as the result value of the main exit.
> > >
> > > e.g. in testcase (vect-early-break_17.c)
> > >
> > > #define N 1024
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >   unsigned ret = 0;
> > >   for (int i = 0; i < N; i++)
> > >     {
> > >       vect_b[i] = x + i;
> > >       if (vect_a[i] > x)
> > >         return vect_a[i];
> > >       vect_a[i] = x;
> > >       ret = vect_a[i] + vect_b[i];
> > >     }
> > >   return ret;
> > > }
> > >
> > > The only situation they can appear in an early-break exit is when
> > > we have a case where the main exit != the latch-connected exit.
> > >
> > > However in these cases they are unused, and are only there because
> > > normally you would have exited (i.e. there was a return), but the
> > > vector loop needs to start over, so we ignore them.
> > >
> > > These happen in testcases vect-early-break_74.c and
> > > vect-early-break_78.c.
> >
> > Hmm, so in that case their value is incorrect (but doesn't matter,
> > we ignore it)?
> >
>
> Correct, they're placed there due to exit redirection, but in these inverted
> testcases, where we've peeled the vector iteration, you can't ever skip the
> epilogue. So they are guaranteed not to be used.
>
> > > > > + gimple_stmt_iterator exit_gsi;
> > > > > + tree new_tree
> > > > > + = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > > + exit_e, vectype, ncopies,
> > > > > + slp_node, bitsize,
> > > > > + tmp_bitstart, tmp_vec_lhs,
> > > > > + lhs_type, restart_loop,
> > > > > + &exit_gsi);
> > > > > +
> > > > > + /* Use the empty block on the exit to materialize the new stmts
> > > > > + so we can update the PHI here. */
> > > > > + if (gimple_phi_num_args (use_stmt) == 1)
> > > > > + {
> > > > > + auto gsi = gsi_for_stmt (use_stmt);
> > > > > + remove_phi_node (&gsi, false);
> > > > > + tree lhs_phi = gimple_phi_result (use_stmt);
> > > > > + gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > > > + gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > > > + }
> > > > > + else
> > > > > + SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > > >
> > > > if the else case works, why not use it always?
> > >
> > > Because it doesn't work for the main exit. The early exits have an intermediate block
> > > that is used to generate the statements on, so for them we are fine updating the
> > > use in place.
> > >
> > > The main exit doesn't, and so the existing trick the vectorizer uses is to materialize
> > > the statements in the same block and then dissolve the phi node. However you
> > > can't do that for the early exit because the phi node isn't singular.
> >
> > But if the PHI has a single arg you can replace that? By making a
> > copy stmt from it don't you break LC SSA?
> >
>
> Yeah, what the existing code is sneakily doing is this:
>
> It has to vectorize
>
> x = PHI <y>
> y gets vectorized as z, but
>
> x = PHI <z>
> z = ...
>
> would be invalid, so what it does, since it doesn't have a predecessor node to place stuff in,
> it'll do
>
> z = ...
> x = z
>
> and removes the PHI. The PHI was only placed there for vectorization, so it's not needed
> after this point. It's also for this reason that the code passes around a gimple_seq, since
> it needs to make sure it gets the order right when inserting statements.
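>
> As a rough before/after sketch of what this does for the main exit block
> (SSA names here are illustrative):
>
>   /* Before: the PHI only exists to give vectorization something to work on.  */
>   exit_bb:
>     x_1 = PHI <y_2(loop)>
>
>   /* After: the vectorized def is materialized in exit_bb and the PHI is
>      dissolved into a plain copy; the PHI was only there for vectorization.  */
>   exit_bb:
>     z_3 = ...      <- vectorized def of y_2
>     x_1 = z_3      <- copy replacing the PHI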
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
OK.
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vectorizable_live_operation,
> vectorizable_live_operation_1): Support early exits.
> (can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
> inductions or reductions.
> (find_connected_edge, vect_get_vect_def): New.
> (vect_create_epilog_for_reduction): Support reductions in early break.
> * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> (vect_stmt_relevant_p): Mark all inductions when early break as being
> live.
> * tree-vectorizer.h (perm_mask_for_reverse): Expose.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
> bb_before_epilog = loop_preheader_edge (epilog)->src;
> }
> +
> /* If loop is peeled for non-zero constant times, now niters refers to
> orig_niters - prolog_peeling, it won't overflow even the orig_niters
> overflows. */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
> return new_temp;
> }
>
> +/* Retrieves the defining statement to be used for a reduction.
> + For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> + the reduction definitions. */
> +
> +tree
> +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> + slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> + vec <gimple *> &vec_stmts)
> +{
> + tree def;
> +
> + if (slp_node)
> + {
> + if (!main_exit_p)
> + slp_node = slp_node_instance->reduc_phis;
> + def = vect_get_slp_vect_def (slp_node, i);
> + }
> + else
> + {
> + if (!main_exit_p)
> + reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> + vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> + def = gimple_get_lhs (vec_stmts[0]);
> + }
> +
> + return def;
> +}
> +
> /* Function vect_create_epilog_for_reduction
>
> Create code at the loop-epilog to finalize the result of a reduction
> @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
> SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
> REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
> (counting from 0)
> + LOOP_EXIT is the edge to update in the merge block. In the case of a single
> + exit this edge is always the main loop exit.
>
> This function:
> 1. Completes the reduction def-use cycles.
> @@ -5882,7 +5912,8 @@ static void
> vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> stmt_vec_info stmt_info,
> slp_tree slp_node,
> - slp_instance slp_node_instance)
> + slp_instance slp_node_instance,
> + edge loop_exit)
> {
> stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
> gcc_assert (reduc_info->is_reduc_info);
> @@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> loop-closed PHI of the inner loop which we remember as
> def for the reduction PHI generation. */
> bool double_reduc = false;
> + bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
> stmt_vec_info rdef_info = stmt_info;
> if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
> {
> @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> /* Create an induction variable. */
> gimple_stmt_iterator incr_gsi;
> bool insert_after;
> - standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> + vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
> create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
> insert_after, &indx_before_incr, &indx_after_incr);
>
> @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> Store them in NEW_PHIS. */
> if (double_reduc)
> loop = outer_loop;
> - exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> + /* We need to reduce values in all exits. */
> + exit_bb = loop_exit->dest;
> exit_gsi = gsi_after_labels (exit_bb);
> reduc_inputs.create (slp_node ? vec_num : ncopies);
> + vec <gimple *> vec_stmts;
> for (unsigned i = 0; i < vec_num; i++)
> {
> gimple_seq stmts = NULL;
> - if (slp_node)
> - def = vect_get_slp_vect_def (slp_node, i);
> - else
> - def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> + def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> + main_exit_p, i, vec_stmts);
> for (j = 0; j < ncopies; j++)
> {
> tree new_def = copy_ssa_name (def);
> phi = create_phi_node (new_def, exit_bb);
> if (j)
> - def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> - SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> + def = gimple_get_lhs (vec_stmts[j]);
> + SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
> new_def = gimple_convert (&stmts, vectype, new_def);
> reduc_inputs.quick_push (new_def);
> }
> @@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> return true;
> }
>
> -
> /* Function vectorizable_live_operation_1.
> +
> helper function for vectorizable_live_operation. */
> +
> tree
> vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> - stmt_vec_info stmt_info, edge exit_e,
> + stmt_vec_info stmt_info, basic_block exit_bb,
> tree vectype, int ncopies, slp_tree slp_node,
> tree bitsize, tree bitstart, tree vec_lhs,
> - tree lhs_type, gimple_stmt_iterator *exit_gsi)
> + tree lhs_type, bool restart_loop,
> + gimple_stmt_iterator *exit_gsi)
> {
> - basic_block exit_bb = exit_e->dest;
> gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>
> tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> @@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> {
> /* Emit:
> +
> SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
> where VEC_LHS is the vectorized live-out result and MASK is
> the loop mask for the final iteration. */
> gcc_assert (ncopies == 1 && !slp_node);
> @@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> tree len = vect_get_loop_len (loop_vinfo, &gsi,
> &LOOP_VINFO_LENS (loop_vinfo),
> 1, vectype, 0, 0);
> +
> /* BIAS - 1. */
> signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> tree bias_minus_one
> = int_const_binop (MINUS_EXPR,
> build_int_cst (TREE_TYPE (len), biasval),
> build_one_cst (TREE_TYPE (len)));
> +
> /* LAST_INDEX = LEN + (BIAS - 1). */
> tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> len, bias_minus_one);
> +
> /* This needs to implement extraction of the first index, but not sure
> how the LEN stuff works. At the moment we shouldn't get here since
> there's no LEN support for early breaks. But guard this so there's
> @@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> tree scalar_res
> = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> vec_lhs_phi, last_index);
> +
> /* Convert the extracted vector element to the scalar type. */
> new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> }
> else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> {
> /* Emit:
> +
> SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
> where VEC_LHS is the vectorized live-out result and MASK is
> the loop mask for the final iteration. */
> gcc_assert (!slp_node);
> @@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> &LOOP_VINFO_MASKS (loop_vinfo),
> 1, vectype, 0);
> + tree scalar_res;
> +
> + /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> + instead of EXTRACT_LAST. Emulate by reversing the vector and mask. */
> + if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> + /* First create the permuted mask. */
> + tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> + tree perm_dest = copy_ssa_name (mask);
> + gimple *perm_stmt
> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> + mask, perm_mask);
> + vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> + &gsi);
> + mask = perm_dest;
> +
> + /* Then permute the vector contents. */
> + tree perm_elem = perm_mask_for_reverse (vectype);
> + perm_dest = copy_ssa_name (vec_lhs_phi);
> + perm_stmt
> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> + vec_lhs_phi, perm_elem);
> + vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> + &gsi);
> + vec_lhs_phi = perm_dest;
> + }
>
> gimple_seq_add_seq (&stmts, tem);
> - tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> - mask, vec_lhs_phi);
> +
> + scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> + mask, vec_lhs_phi);
> +
> /* Convert the extracted vector element to the scalar type. */
> new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> }
> @@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> &stmts, true, NULL_TREE);
> }
> +
> *exit_gsi = gsi_after_labels (exit_bb);
> if (stmts)
> gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
> return new_tree;
> }
>
> +/* Find the edge that's the final one in the path from SRC to DEST and
> + return it. There can be at most one forwarder block between them. */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> + if (src->dest == dest)
> + return src;
> +
> + return find_edge (src->dest, dest);
> +}
> +
> /* Function vectorizable_live_operation.
>
> STMT_INFO computes a value that is used outside the loop. Check if
> @@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> int ncopies;
> gimple *use_stmt;
> + use_operand_p use_p;
> auto_vec<tree> vec_oprnds;
> int vec_entry = 0;
> poly_uint64 vec_index = 0;
>
> - gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> + gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>
> /* If a stmt of a reduction is live, vectorize it via
> vect_create_epilog_for_reduction. vectorizable_reduction assessed
> @@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
> || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
> return true;
> +
> vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> - slp_node_instance);
> + slp_node_instance,
> + LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
> + /* If early break we only have to materialize the reduction on the merge
> + block, but we have to find an alternate exit first. */
> + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> + for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> + if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> + {
> + vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> + slp_node, slp_node_instance,
> + exit);
> + break;
> + }
> + }
> +
> return true;
> }
>
> @@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> lhs' = new_tree; */
>
> class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> - basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> - gcc_assert (single_pred_p (exit_bb));
> -
> - tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> - gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> - SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> - gimple_stmt_iterator exit_gsi;
> - tree new_tree
> - = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> - LOOP_VINFO_IV_EXIT (loop_vinfo),
> - vectype, ncopies, slp_node, bitsize,
> - bitstart, vec_lhs, lhs_type,
> - &exit_gsi);
> -
> - /* Remove existing phis that copy from lhs and create copies
> - from new_tree. */
> - gimple_stmt_iterator gsi;
> - for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> - {
> - gimple *phi = gsi_stmt (gsi);
> - if ((gimple_phi_arg_def (phi, 0) == lhs))
> + /* Check if we have a loop where the chosen exit is not the main exit,
> + in these cases for an early break we restart the iteration the vector code
> + did. For the live values we want the value at the start of the iteration
> + rather than at the end. */
> + edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> + bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> + if (!is_gimple_debug (use_stmt)
> + && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> {
> - remove_phi_node (&gsi, false);
> - tree lhs_phi = gimple_phi_result (phi);
> - gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> - gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> - }
> - else
> - gsi_next (&gsi);
> - }
> + edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
> + phi_arg_index_from_use (use_p));
> + bool main_exit_edge = e == main_e
> + || find_connected_edge (main_e, e->src);
> +
> + /* Early exits have a merge block; we want the merge block itself
> + so use ->src. For the main exit the merge block is the
> + destination. */
> + basic_block dest = main_exit_edge ? main_e->dest : e->src;
> + gimple *tmp_vec_stmt = vec_stmt;
> + tree tmp_vec_lhs = vec_lhs;
> + tree tmp_bitstart = bitstart;
> +
> + /* For early exit where the exit is not in the BB that leads
> + to the latch then we're restarting the iteration in the
> + scalar loop. So get the first live value. */
> + restart_loop = restart_loop || !main_exit_edge;
> + if (restart_loop
> + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> + {
> + tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> + tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> + tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> + }
> +
> + gimple_stmt_iterator exit_gsi;
> + tree new_tree
> + = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> + dest, vectype, ncopies,
> + slp_node, bitsize,
> + tmp_bitstart, tmp_vec_lhs,
> + lhs_type, restart_loop,
> + &exit_gsi);
> +
> + if (gimple_phi_num_args (use_stmt) == 1)
> + {
> + auto gsi = gsi_for_stmt (use_stmt);
> + remove_phi_node (&gsi, false);
> + tree lhs_phi = gimple_phi_result (use_stmt);
> + gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> + gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> + }
> + else
> + SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
> + }
>
> /* There a no further out-of-loop uses of lhs by LC-SSA construction. */
> FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
> - it has uses outside the loop.
> - it has vdefs (it alters memory).
> - control stmts in the loop (except for the exit condition).
> + - it is an induction and we have multiple exits.
>
> CHECKME: what other side effects would the vectorizer allow? */
>
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> }
> }
>
> + /* Check if it's an induction and we have multiple exits. In this case there will be
> + a usage later on after peeling which is needed for the alternate exit. */
> + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "vec_stmt_relevant_p: induction forced for "
> + "early break.\n");
> + *live_p = true;
> +
> + }
> +
> if (*live_p && *relevant == vect_unused_in_scope
> && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
> {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
> /* If the target supports a permute mask that reverses the elements in
> a vector of type VECTYPE, return that mask, otherwise return null. */
>
> -static tree
> +tree
> perm_mask_for_reverse (tree vectype)
> {
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
> bool vec_stmt_p,
> stmt_vector_for_cost *cost_vec)
> {
> + loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> if (slp_node)
> {
> stmt_vec_info slp_stmt_info;
> unsigned int i;
> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
> {
> - if (STMT_VINFO_LIVE_P (slp_stmt_info)
> + if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> + || (loop_vinfo
> + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> + == vect_induction_def))
> && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
> slp_node_instance, i,
> vec_stmt_p, cost_vec))
> return false;
> }
> }
> - else if (STMT_VINFO_LIVE_P (stmt_info)
> + else if ((STMT_VINFO_LIVE_P (stmt_info)
> + || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
> && !vectorizable_live_operation (vinfo, stmt_info,
> slp_node, slp_node_instance, -1,
> vec_stmt_p, cost_vec))
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
> enum vect_def_type *,
> tree *, stmt_vec_info * = NULL);
> extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
> extern bool supportable_widening_operation (vec_info*, code_helper,
> stmt_vec_info, tree, tree,
> code_helper*, code_helper*,
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)