
Re: [PATCH, vec-tails 07/10] Support loop epilogue combining


2016-06-15 14:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Hi,
>>
>> This patch introduces support for loop epilogue combining.  This includes
>> support in cost estimation and all changes required to mask the
>> vectorized loop.
>
> I wonder why you compute a minimum number of iterations to make masking
> of the vectorized body profitable rather than a maximum number of iterations.
>
> I'd say masking the vectorized loop is profitable if niter/vf *
> masking-overhead < epilogue-cost.
> Masking the epilogue is profitable if vectorizing the epilogue with
> masking is profitable.
>
> Am I missing something?

We don't have two versions of the vectorized loop.  The choice is between the
vector and the scalar loop, and in this case a minimum number of iterations is
what we need.  Generating two vectorized loop versions would be something new
for the vectorizer.
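
To illustrate, here is a minimal C model of what the combined (masked) loop
computes (an illustrative sketch, not the generated GIMPLE).  Without
combining, the vector loop runs floor(niters/vf) iterations and a scalar
epilogue handles the remainder; with a combined epilogue all ceil(niters/vf)
iterations run in the vector loop and the last one is masked:

  /* Masked vector iteration, emulated per lane in plain C.  */
  void
  add_combined (int *a, int *b, int n, int vf)
  {
    for (int i = 0; i < n; i += vf)     /* ceil (n / vf) iterations.  */
      for (int k = 0; k < vf; k++)
        if (i + k < n)                  /* The mask: IV < niters.  */
          a[i + k] += b[i + k];
  }

The runtime guard then only chooses between this masked vector loop and the
scalar loop, so a minimum profitable iteration count is the natural threshold.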

Thanks,
Ilya

>
> Thanks,
> Richard.
>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>
>>
>>         * dbgcnt.def (vect_tail_combine): New.
>>         * params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
>>         * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
>>         * tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
>>         epilogue combined with loop body.
>>         (vect_do_peeling_for_loop_bound): Likewise.
>>         * tree-vect-loop.c Include alias.h and dbgcnt.h.
>>         (vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
>>         arg, compute number of iterations for which loop epilogue combining is
>>         profitable.
>>         (vect_generate_tmps_on_preheader): Support combined epilogue.
>>         (vect_gen_ivs_for_masking): New.
>>         (vect_get_mask_index_for_elems): New.
>>         (vect_get_mask_index_for_type): New.
>>         (vect_gen_loop_masks): New.
>>         (vect_mask_reduction_stmt): New.
>>         (vect_mask_mask_load_store_stmt): New.
>>         (vect_mask_load_store_stmt): New.
>>         (vect_combine_loop_epilogue): New.
>>         (vect_transform_loop): Support combined epilogue.
>>
>>
>> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
>> index 78ddcc2..73c2966 100644
>> --- a/gcc/dbgcnt.def
>> +++ b/gcc/dbgcnt.def
>> @@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
>>  DEBUG_COUNTER (tree_sra)
>>  DEBUG_COUNTER (vect_loop)
>>  DEBUG_COUNTER (vect_slp)
>> +DEBUG_COUNTER (vect_tail_combine)
>>  DEBUG_COUNTER (dom_unreachable_edges)
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 62a1e40..98d6c5a 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -1220,6 +1220,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
>>           "Maximum number of may-defs visited when devirtualizing "
>>           "speculatively", 50, 0, 0)
>>
>> +DEFPARAM (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD,
>> +         "vect-cost-increase-combine-threshold",
>> +         "Cost increase threshold to mask main loop for epilogue.",
>> +         10, 0, 300)
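>> +
>> +/* Usage sketch (illustrative): --param vect-cost-increase-combine-threshold=30
>> +   allows masking to increase the vector loop body cost by up to 30%.  */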
>> +
>>  /*
>>
>>  Local variables:
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index f275933..c5bdeb9 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -4000,6 +4000,9 @@ vect_get_new_ssa_name (tree type, enum vect_var_kind var_kind, const char *name)
>>    case vect_scalar_var:
>>      prefix = "stmp";
>>      break;
>> +  case vect_mask_var:
>> +    prefix = "mask";
>> +    break;
>>    case vect_pointer_var:
>>      prefix = "vectp";
>>      break;
>> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>> index fab5879..b3c0668 100644
>> --- a/gcc/tree-vect-loop-manip.c
>> +++ b/gcc/tree-vect-loop-manip.c
>> @@ -1195,6 +1195,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
>>    int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>>    int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>>    int probability_of_second_loop;
>> +  bool skip_second_after_first = false;
>>
>>    if (!slpeel_can_duplicate_loop_p (loop, e))
>>      return NULL;
>> @@ -1393,7 +1394,11 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
>>      {
>>        loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
>>        tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>> -      unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
>> +      unsigned limit = 0;
>> +      if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +       skip_second_after_first = true;
>> +      else
>> +       limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
>>        if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>>         limit = limit + 1;
>>        if (check_profitability
>> @@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
>>    bb_between_loops = new_exit_bb;
>>    bb_after_second_loop = split_edge (single_exit (second_loop));
>>
>> -  pre_condition =
>> -       fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
>> -  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
>> -                                  bb_after_second_loop, bb_before_first_loop,
>> -                                 inverse_probability (second_guard_probability));
>> +  if (skip_second_after_first)
>> +    /* We could just redirect the edge from bb_between_loops to
>> +       bb_after_second_loop, but a lot of code assumes there is a
>> +       guard after the first loop.  So just make an always-taken
>> +       condition.  */
>> +    pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
>> +                                integer_zero_node);
>> +  else
>> +    pre_condition =
>> +      fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
>> +  skip_e
>> +    = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
>> +                            bb_after_second_loop, bb_before_first_loop,
>> +                            inverse_probability (second_guard_probability));
>>    scale_loop_profile (second_loop, probability_of_second_loop, bound2);
>>    slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
>>                                       second_loop == new_loop, &new_exit_bb);
>> @@ -1758,8 +1772,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
>>    basic_block preheader;
>>    int loop_num;
>>    int max_iter;
>> +  int bound2;
>>    tree cond_expr = NULL_TREE;
>>    gimple_seq cond_expr_stmt_list = NULL;
>> +  bool combine = LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo);
>>
>>    if (dump_enabled_p ())
>>      dump_printf_loc (MSG_NOTE, vect_location,
>> @@ -1769,12 +1785,13 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
>>
>>    loop_num  = loop->num;
>>
>> +  bound2 = combine ? th : LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>    new_loop
>>      = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
>>                                      &ratio_mult_vf_name, ni_name, false,
>>                                      th, check_profitability,
>>                                      cond_expr, cond_expr_stmt_list,
>> -                                    0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>> +                                    0, bound2);
>>    gcc_assert (new_loop);
>>    gcc_assert (loop_num == loop->num);
>>    slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
>> @@ -1803,7 +1820,11 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
>>    max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>               ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
>>               : LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
>> -  if (check_profitability)
>> +  /* When the epilogue is combined, only the profitability
>> +     threshold matters.  */
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +    max_iter = (int) th - 1;
>> +  else if (check_profitability)
>>      max_iter = MAX (max_iter, (int) th - 1);
>>    record_niter_bound (new_loop, max_iter, false, true);
>>    dump_printf (MSG_NOTE,
>> @@ -2036,7 +2057,8 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
>>                                    bound, 0);
>>
>>    gcc_assert (new_loop);
>> -  slpeel_checking_verify_cfg_after_peeling (new_loop, loop);
>> +  if (!LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +    slpeel_checking_verify_cfg_after_peeling (new_loop, loop);
>>    /* For vectorization factor N, we need to copy at most N-1 values
>>       for alignment and this means N-2 loopback edge executions.  */
>>    max_iter = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 2;
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 31360d3..1a80c42 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "gimple-fold.h"
>>  #include "cgraph.h"
>>  #include "tree-if-conv.h"
>> +#include "alias.h"
>> +#include "dbgcnt.h"
>>
>>  /* Loop Vectorization Pass.
>>
>> @@ -149,7 +151,8 @@ along with GCC; see the file COPYING3.  If not see
>>     http://gcc.gnu.org/projects/tree-ssa/vectorization.html
>>  */
>>
>> -static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *);
>> +static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *,
>> +                                               int *);
>>
>>  /* Function vect_determine_vectorization_factor
>>
>> @@ -2304,8 +2307,10 @@ start_over:
>>
>>    /* Analyze cost.  Decide if worth while to vectorize.  */
>>    int min_profitable_estimate, min_profitable_iters;
>> +  int min_profitable_combine_iters;
>>    vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
>> -                                     &min_profitable_estimate);
>> +                                     &min_profitable_estimate,
>> +                                     &min_profitable_combine_iters);
>>
>>    if (min_profitable_iters < 0)
>>      {
>> @@ -2412,6 +2417,52 @@ start_over:
>>    gcc_assert (vectorization_factor
>>               == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>>
>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>> +    {
>> +      LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> +      LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>> +    }
>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>> +          && min_profitable_combine_iters >= 0)
>> +    {
>> +      if (((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>> +           && (LOOP_VINFO_INT_NITERS (loop_vinfo)
>> +               >= (unsigned) min_profitable_combine_iters))
>> +          || estimated_niter == -1
>> +          || estimated_niter >= min_profitable_combine_iters)
>> +         && dbg_cnt (vect_tail_combine))
>> +       {
>> +         LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> +         LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = true;
>> +
>> +         dump_printf_loc (MSG_NOTE, vect_location,
>> +                          "Decided to combine loop with its epilogue.\n");
>> +
>> +         /* If we combine the epilogue, we need to adjust the
>> +            profitability check, considering the additional vector
>> +            iteration and the profitable combine iterations.  */
>> +         if ((int)(min_profitable_combine_iters + vectorization_factor)
>> +             > min_scalar_loop_bound)
>> +           {
>> +             LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
>> +               = (unsigned) min_profitable_combine_iters;
>> +             if (dump_enabled_p ())
>> +               dump_printf_loc (MSG_NOTE, vect_location,
>> +                                "Updated runtime profitability treshold: %d\n",
>> +                                min_profitable_combine_iters);
>> +
>> +           }
>> +       }
>> +      else
>> +       {
>> +         if (!LOOP_VINFO_NEED_MASKING (loop_vinfo) && dump_enabled_p ())
>> +           dump_printf_loc (MSG_NOTE, vect_location,
>> +                            "Not combined loop with epilogue: iterations "
>> +                            "count is too low (threshold is %d).\n",
>> +                            min_profitable_combine_iters);
>> +       }
>> +    }
>> +
>>    /* Ok to vectorize!  */
>>    return true;
>>
>> @@ -3381,7 +3432,8 @@ vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
>>  static void
>>  vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>>                                     int *ret_min_profitable_niters,
>> -                                   int *ret_min_profitable_estimate)
>> +                                   int *ret_min_profitable_estimate,
>> +                                   int *ret_min_profitable_combine_niters)
>>  {
>>    int min_profitable_iters;
>>    int min_profitable_estimate;
>> @@ -3625,6 +3677,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>>                     vec_prologue_cost);
>>        dump_printf (MSG_NOTE, "  Vector epilogue cost: %d\n",
>>                     vec_epilogue_cost);
>> +      dump_printf (MSG_NOTE, "  Masking prologue cost: %d\n",
>> +                   masking_prologue_cost);
>> +      dump_printf (MSG_NOTE, "  Masking inside cost: %d\n",
>> +                   masking_inside_cost);
>>        dump_printf (MSG_NOTE, "  Scalar iteration cost: %d\n",
>>                     scalar_single_iter_cost);
>>        dump_printf (MSG_NOTE, "  Scalar outside cost: %d\n",
>> @@ -3728,6 +3784,77 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>>                      min_profitable_estimate);
>>
>>    *ret_min_profitable_estimate = min_profitable_estimate;
>> +
>> +  *ret_min_profitable_combine_niters = -1;
>> +
>> +  /* Don't try to vectorize an epilogue of an epilogue.  */
>> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> +    return;
>> +
>> +  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
>> +    {
>> +      if (flag_vect_epilogue_cost_model == VECT_COST_MODEL_UNLIMITED)
>> +       {
>> +         if (flag_tree_vectorize_epilogues & VECT_EPILOGUE_COMBINE)
>> +           *ret_min_profitable_combine_niters = 0;
>> +         return;
>> +       }
>> +
>> +      unsigned combine_threshold
>> +       = PARAM_VALUE (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD);
>> +      /* Calculate the profitability of combining the epilogue with
>> +        the main loop.  We have a threshold for the inside cost
>> +        overhead (not applied in the low trip count loop case):
>> +          MIC * 100 < VIC * CT
>> +        and a masked iteration should be better than a scalar
>> +        epilogue:
>> +          MIC + VIC < SIC * epilogue_niters
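>> +        E.g. with illustrative numbers VIC = 100, MIC = 8, CT = 10,
>> +        SIC = 30 and epilogue_niters = 4: MIC * 100 = 800 < 1000
>> +        = VIC * CT, and MIC + VIC = 108 < 120 = SIC * epilogue_niters,
>> +        so combining would be considered profitable.  */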
>> +      if (masking_inside_cost * 100 >= vec_inside_cost * combine_threshold)
>> +       {
>> +         if (dump_enabled_p ())
>> +           {
>> +             dump_printf_loc (MSG_NOTE, vect_location,
>> +                              "Combining loop with epilogue is not "
>> +                              "profitable.\n");
>> +             dump_printf_loc (MSG_NOTE, vect_location,
>> +                              "  Combining overhead %d%% exceeds "
>> +                              "treshold %d%%.\n",
>> +                              masking_inside_cost * 100 / vec_inside_cost,
>> +                              combine_treshold);
>> +           }
>> +         *ret_min_profitable_combine_niters = -1;
>> +       }
>> +      else if ((int)(masking_inside_cost + vec_inside_cost)
>> +              >= scalar_single_iter_cost * peel_iters_epilogue)
>> +       {
>> +         if (dump_enabled_p ())
>> +           {
>> +             dump_printf_loc (MSG_NOTE, vect_location,
>> +                              "Combining loop with epilogue is not "
>> +                              "profitable.\n");
>> +             dump_printf_loc (MSG_NOTE, vect_location,
>> +                              "  Scalar epilogue is faster than a "
>> +                              "single masked iteration.\n");
>> +           }
>> +         *ret_min_profitable_combine_niters = -1;
>> +       }
>> +      else if (flag_tree_vectorize_epilogues & VECT_EPILOGUE_COMBINE)
>> +       {
>> +         int inside_cost = vec_inside_cost + masking_inside_cost;
>> +         int outside_cost = vec_outside_cost + masking_prologue_cost;
>> +          int profitable_iters = ((outside_cost - scalar_outside_cost) * vf
>> +                                 - inside_cost * peel_iters_prologue
>> +                                  - inside_cost * peel_iters_epilogue)
>> +                                 / ((scalar_single_iter_cost * vf)
>> +                                    - inside_cost);
>> +
>> +         if (dump_enabled_p ())
>> +           dump_printf_loc (MSG_NOTE, vect_location,
>> +                            "Combinig loop with epilogue "
>> +                            "pofitability treshold = %d\n",
>> +                            profitable_iters);
>> +         *ret_min_profitable_combine_niters = profitable_iters;
>> +       }
>> +    }
>>  }
>>
>>  /* Writes into SEL a mask for a vec_perm, equivalent to a vec_shr by OFFSET
>> @@ -6843,20 +6970,37 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
>>    else
>>      ni_minus_gap_name = ni_name;
>>
>> -  /* Create: ratio = ni >> log2(vf) */
>> -  /* ???  As we have ni == number of latch executions + 1, ni could
>> -     have overflown to zero.  So avoid computing ratio based on ni
>> -     but compute it using the fact that we know ratio will be at least
>> -     one, thus via (ni - vf) >> log2(vf) + 1.  */
>> -  ratio_name
>> -    = fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
>> -                  fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
>> -                               fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name),
>> -                                            ni_minus_gap_name,
>> -                                            build_int_cst
>> -                                              (TREE_TYPE (ni_name), vf)),
>> -                               log_vf),
>> -                  build_int_cst (TREE_TYPE (ni_name), 1));
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +    {
>> +      /* Create (ni + (vf-1)) >> log2(vf) if the epilogue is combined
>> +        with the loop, i.e. round the iteration count up to a multiple
>> +        of vf.
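>> +        E.g. (illustrative) for ni = 10 and vf = 4 this gives
>> +        (10 + 3) >> 2 = 3 vector iterations, the last one masked.  */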
>> +      gcc_assert (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
>> +      ratio_name
>> +       = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
>> +                      fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
>> +                                   ni_name,
>> +                                   build_int_cst (TREE_TYPE (ni_name),
>> +                                                  vf - 1)),
>> +                      log_vf);
>> +    }
>> +  else
>> +    {
>> +      /* Create: ratio = ni >> log2(vf) */
>> +      /* ???  As we have ni == number of latch executions + 1, ni could
>> +        have overflown to zero.  So avoid computing ratio based on ni
>> +        but compute it using the fact that we know ratio will be at least
>> +        one, thus via (ni - vf) >> log2(vf) + 1.  */
>> +      ratio_name
>> +        = fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
>> +                      fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
>> +                                   fold_build2 (MINUS_EXPR,
>> +                                                TREE_TYPE (ni_name),
>> +                                                ni_minus_gap_name,
>> +                                                build_int_cst
>> +                                                  (TREE_TYPE (ni_name), vf)),
>> +                                   log_vf),
>> +                      build_int_cst (TREE_TYPE (ni_name), 1));
>> +    }
>> +
>>    if (!is_gimple_val (ratio_name))
>>      {
>>        var = create_tmp_var (TREE_TYPE (ni_name), "bnd");
>> @@ -6886,6 +7030,485 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
>>    return;
>>  }
>>
>> +/* Function vect_gen_ivs_for_masking.
>> +
>> +   Create IVs to be used for masks computation to mask the loop
>> +   described by LOOP_VINFO.  Created IVs are stored in the IVS vector.
>> +
>> +   The initial IV value is {0, 1, ..., VF - 1} (possibly split into
>> +   several vectors; in that case IVS elements with lower indexes hold
>> +   IVs with smaller numbers).  The IV step is {VF, VF, ..., VF}.
>> +   VF is the vectorization factor used.
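>> +
>> +   E.g. for VF = 8 and 4-element vectors two IVs are created, with
>> +   initial values {0, 1, 2, 3} and {4, 5, 6, 7}, both with step
>> +   {8, 8, 8, 8}.  */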
>> +
>> +static void
>> +vect_gen_ivs_for_masking (loop_vec_info loop_vinfo, vec<tree> *ivs)
>> +{
>> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> +  tree vectype = vect_get_masking_iv_type (loop_vinfo);
>> +  tree type = TREE_TYPE (vectype);
>> +  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +  unsigned elems = TYPE_VECTOR_SUBPARTS (vectype);
>> +  int ncopies  = vf / elems;
>> +  int i, k;
>> +  tree iv, init_val, step_val;
>> +  bool insert_after;
>> +  gimple_stmt_iterator gsi;
>> +  tree *vtemp;
>> +
>> +  /* Create {VF, ..., VF} vector constant.  */
>> +  step_val = build_vector_from_val (vectype, build_int_cst (type, vf));
>> +
>> +  vtemp = XALLOCAVEC (tree, vf);
>> +  for (i = 0; i < ncopies; i++)
>> +    {
>> +      /* Create initial IV value.  */
>> +      for (k = 0; k < vf; k++)
>> +       vtemp[k] = build_int_cst (type, k + i * elems);
>> +      init_val = build_vector (vectype, vtemp);
>> +
>> +      /* Create an induction variable, including its phi node.  */
>> +      standard_iv_increment_position (loop, &gsi, &insert_after);
>> +      create_iv (init_val, step_val, NULL, loop, &gsi, insert_after,
>> +                &iv, NULL);
>> +      ivs->safe_push (iv);
>> +    }
>> +}
>> +
>> +/* Function vect_get_mask_index_for_elems.
>> +
>> +   A helper function to access the masks vector.  See vect_gen_loop_masks
>> +   for a description of the masks vector layout.  Return the index of
>> +   the first mask having MASK_ELEMS elements.  */
>> +
>> +static inline unsigned
>> +vect_get_mask_index_for_elems (unsigned mask_elems)
>> +{
>> +  return current_vector_size / mask_elems - 1;
>> +}
>> +
>> +/* Function vect_get_mask_index_for_type.
>> +
>> +   A helper function to access the masks vector.  See vect_gen_loop_masks
>> +   for a description of the masks vector layout.  Return the index of
>> +   the first mask appropriate for VECTYPE.  */
>> +
>> +static inline unsigned
>> +vect_get_mask_index_for_type (tree vectype)
>> +{
>> +  unsigned elems = TYPE_VECTOR_SUBPARTS (vectype);
>> +  return vect_get_mask_index_for_elems (elems);
>> +}
>> +
>> +/* Function vect_gen_loop_masks.
>> +
>> +   Create masks to mask a loop described by LOOP_VINFO.  Masks
>> +   are created according to LOOP_VINFO_REQUIRED_MASKS and are stored
>> +   into the MASKS vector.
>> +
>> +   The index of a mask in the vector is determined by the number of
>> +   the mask's elements.  Masks are sorted by the number of their
>> +   elements in descending order.  Index 0 is used to access the mask
>> +   with current_vector_size elements.  Among masks with the same
>> +   number of elements, the one with the lower index is used to mask
>> +   iterations with a smaller iteration counter.  Note that you may
>> +   get NULL elements for masks which are not required.  Use
>> +   vect_get_mask_index_for_elems or vect_get_mask_index_for_type to
>> +   access the resulting vector.
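>> +
>> +   E.g. with current_vector_size = 16 (illustrative) the layout is:
>> +   index 0 holds the 16-element mask, indexes 1-2 the two 8-element
>> +   masks, and indexes 3-6 the four 4-element masks.  */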
>> +
>> +static void
>> +vect_gen_loop_masks (loop_vec_info loop_vinfo, vec<tree> *masks)
>> +{
>> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> +  edge pe = loop_preheader_edge (loop);
>> +  tree niters = LOOP_VINFO_NITERS (loop_vinfo);
>> +  unsigned min_mask_elems, max_mask_elems, nmasks;
>> +  unsigned iv_elems, cur_mask, prev_mask, cur_mask_elems;
>> +  auto_vec<tree> ivs;
>> +  tree vectype, mask_type;
>> +  tree vec_niters, vec_niters_val, mask;
>> +  gimple *stmt;
>> +  basic_block bb;
>> +  gimple_stmt_iterator gsi = gsi_after_labels (loop->header);
>> +  unsigned vec_size;
>> +
>> +  /* Create required IVs.  */
>> +  vect_gen_ivs_for_masking (loop_vinfo, &ivs);
>> +  vectype = TREE_TYPE (ivs[0]);
>> +
>> +  vec_size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
>> +  iv_elems = TYPE_VECTOR_SUBPARTS (vectype);
>> +
>> +  /* Get a proper niter to build a vector.  */
>> +  if (!is_gimple_val (niters))
>> +    {
>> +      gimple_seq seq = NULL;
>> +      niters = force_gimple_operand (niters, &seq, true, NULL);
>> +      gsi_insert_seq_on_edge_immediate (pe, seq);
>> +    }
>> +  /* We may need a type cast in case niters has a type too small
>> +     for the generated IVs.  */
>> +  if (!types_compatible_p (TREE_TYPE (vectype), TREE_TYPE (niters)))
>> +    {
>> +      tree new_niters = make_temp_ssa_name (TREE_TYPE (vectype),
>> +                                           NULL, "niters");
>> +      stmt = gimple_build_assign (new_niters, CONVERT_EXPR, niters);
>> +      bb = gsi_insert_on_edge_immediate (pe, stmt);
>> +      gcc_assert (!bb);
>> +      niters = new_niters;
>> +    }
>> +  /* Create a {NITERS, ..., NITERS} vector and put it into an SSA_NAME.  */
>> +  vec_niters_val = build_vector_from_val (vectype, niters);
>> +  vec_niters = vect_get_new_ssa_name (vectype, vect_simple_var, "niters");
>> +  stmt = gimple_build_assign (vec_niters, vec_niters_val);
>> +  bb = gsi_insert_on_edge_immediate (pe, stmt);
>> +  gcc_assert (!bb);
>> +
>> +  /* Determine which masks we need to compute and how many.  */
>> +  vect_get_extreme_masks (loop_vinfo, &min_mask_elems, &max_mask_elems);
>> +  nmasks = vect_get_mask_index_for_elems (MIN (min_mask_elems, iv_elems) / 2);
>> +  masks->safe_grow_cleared (nmasks);
>> +
>> +  /* Now create base masks through comparison IV < VEC_NITERS.  */
>> +  mask_type = build_same_sized_truth_vector_type (vectype);
>> +  cur_mask = vect_get_mask_index_for_elems (iv_elems);
>> +  for (unsigned i = 0; i < ivs.length (); i++)
>> +    {
>> +      tree iv = ivs[i];
>> +      mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
>> +      stmt = gimple_build_assign (mask, LT_EXPR, iv, vec_niters);
>> +      gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>> +      (*masks)[cur_mask++] = mask;
>> +    }
>> +
>> +  /* Create narrowed masks.  */
>> +  cur_mask_elems = iv_elems;
>> +  nmasks = ivs.length ();
>> +  while (cur_mask_elems < max_mask_elems)
>> +    {
>> +      prev_mask = vect_get_mask_index_for_elems (cur_mask_elems);
>> +
>> +      cur_mask_elems <<= 1;
>> +      nmasks >>= 1;
>> +
>> +      cur_mask = vect_get_mask_index_for_elems (cur_mask_elems);
>> +
>> +      mask_type = build_truth_vector_type (cur_mask_elems, vec_size);
>> +
>> +      for (unsigned i = 0; i < nmasks; i++)
>> +       {
>> +         tree mask_low = (*masks)[prev_mask++];
>> +         tree mask_hi = (*masks)[prev_mask++];
>> +         mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
>> +         stmt = gimple_build_assign (mask, VEC_PACK_TRUNC_EXPR,
>> +                                     mask_low, mask_hi);
>> +         gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>> +         (*masks)[cur_mask++] = mask;
>> +       }
>> +    }
>> +
>> +  /* Create widened masks.  */
>> +  cur_mask_elems = iv_elems;
>> +  nmasks = ivs.length ();
>> +  while (cur_mask_elems > min_mask_elems)
>> +    {
>> +      prev_mask = vect_get_mask_index_for_elems (cur_mask_elems);
>> +
>> +      cur_mask_elems >>= 1;
>> +      nmasks <<= 1;
>> +
>> +      cur_mask = vect_get_mask_index_for_elems (cur_mask_elems);
>> +
>> +      mask_type = build_truth_vector_type (cur_mask_elems, vec_size);
>> +
>> +      for (unsigned i = 0; i < nmasks; i += 2)
>> +       {
>> +         tree orig_mask = (*masks)[prev_mask++];
>> +
>> +         mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
>> +         stmt = gimple_build_assign (mask, VEC_UNPACK_LO_EXPR, orig_mask);
>> +         gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>> +         (*masks)[cur_mask++] = mask;
>> +
>> +         mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
>> +         stmt = gimple_build_assign (mask, VEC_UNPACK_HI_EXPR, orig_mask);
>> +         gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>> +         (*masks)[cur_mask++] = mask;
>> +       }
>> +    }
>> +}
>> +
>> +/* Function vect_mask_reduction_stmt.
>> +
>> +   Mask the given vectorized reduction statement STMT using
>> +   MASK.  In case the scalar reduction statement is vectorized
>> +   into several vector statements, PREV holds the preceding
>> +   vector statement copy for STMT.
>> +
>> +   Masking is performed using VEC_COND_EXPR. E.g.
>> +
>> +   S1: r_1 = r_2 + d_3
>> +
>> +   is transformed into:
>> +
>> +   S1': r_4 = r_2 + d_3
>> +   S2': r_1 = VEC_COND_EXPR<MASK, r_4, r_2>
>> +
>> +   Return generated condition statement.  */
>> +
>> +static gimple *
>> +vect_mask_reduction_stmt (gimple *stmt, tree mask, gimple *prev)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +  tree vectype;
>> +  tree lhs, rhs, tmp;
>> +  gimple *new_stmt, *phi;
>> +
>> +  lhs = gimple_assign_lhs (stmt);
>> +  vectype = TREE_TYPE (lhs);
>> +
>> +  gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
>> +             == TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)));
>> +
>> +  /* Find operand RHS defined by PHI node.  */
>> +  rhs = gimple_assign_rhs1 (stmt);
>> +  gcc_assert (TREE_CODE (rhs) == SSA_NAME);
>> +  phi = SSA_NAME_DEF_STMT (rhs);
>> +
>> +  if (phi != prev && gimple_code (phi) != GIMPLE_PHI)
>> +    {
>> +      rhs = gimple_assign_rhs2 (stmt);
>> +      gcc_assert (TREE_CODE (rhs) == SSA_NAME);
>> +      phi = SSA_NAME_DEF_STMT (rhs);
>> +      gcc_assert (phi == prev || gimple_code (phi) == GIMPLE_PHI);
>> +    }
>> +
>> +  /* Convert reduction stmt to ordinary assignment to TMP.  */
>> +  tmp = vect_get_new_ssa_name (vectype, vect_simple_var, NULL);
>> +  gimple_assign_set_lhs (stmt, tmp);
>> +
>> +  /* Create VEC_COND_EXPR and insert it after STMT.  */
>> +  new_stmt = gimple_build_assign (lhs, VEC_COND_EXPR, mask, tmp, rhs);
>> +  gsi = gsi_for_stmt (stmt);
>> +  gsi_insert_after (&gsi, new_stmt, GSI_SAME_STMT);
>> +
>> +  return new_stmt;
>> +}
>> +
>> +/* Function vect_mask_mask_load_store_stmt.
>> +
>> +   Mask the given vectorized MASK_LOAD or MASK_STORE statement
>> +   STMT using MASK.  The function replaces the mask used by STMT
>> +   with its conjunction with MASK.  */
>> +
>> +static void
>> +vect_mask_mask_load_store_stmt (gimple *stmt, tree mask)
>> +{
>> +  gimple *new_stmt;
>> +  tree old_mask, new_mask;
>> +  gimple_stmt_iterator gsi;
>> +
>> +  gsi = gsi_for_stmt (stmt);
>> +  old_mask = gimple_call_arg (stmt, 2);
>> +
>> +  gcc_assert (types_compatible_p (TREE_TYPE (old_mask), TREE_TYPE (mask)));
>> +
>> +  new_mask = vect_get_new_ssa_name (TREE_TYPE (mask), vect_simple_var, NULL);
>> +  new_stmt = gimple_build_assign (new_mask, BIT_AND_EXPR, old_mask, mask);
>> +  gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>> +
>> +  gimple_call_set_arg (stmt, 2, new_mask);
>> +  update_stmt (stmt);
>> +}
>> +
>> +
>> +/* Function vect_mask_load_store_stmt.
>> +
>> +   Mask the given vectorized load or store statement STMT using
>> +   MASK.  DR is the data reference for the scalar memory access.
>> +   The assignment is transformed into a MASK_LOAD or MASK_STORE
>> +   statement.  SI is either an iterator pointing to STMT, which is
>> +   to be updated, or NULL.
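>> +
>> +   E.g. a vectorized load (illustrative GIMPLE):
>> +
>> +     vect__1 = MEM[addr];
>> +
>> +   becomes
>> +
>> +     vect__1 = MASK_LOAD (addr, align, MASK);
>> +
>> +   where align is the alias pointer argument encoding the access
>> +   alignment.  */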
>> +
>> +static void
>> +vect_mask_load_store_stmt (gimple *stmt, tree vectype, tree mask,
>> +                          data_reference *dr, gimple_stmt_iterator *si)
>> +{
>> +  tree mem, val, addr, ptr;
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> +  unsigned align, misalign;
>> +  tree elem_type = TREE_TYPE (vectype);
>> +  gimple *new_stmt;
>> +
>> +  gcc_assert (!si || gsi_stmt (*si) == stmt);
>> +
>> +  if (gimple_store_p (stmt))
>> +    {
>> +      val = gimple_assign_rhs1 (stmt);
>> +      mem = gimple_assign_lhs (stmt);
>> +    }
>> +  else
>> +    {
>> +      val = gimple_assign_lhs (stmt);
>> +      mem = gimple_assign_rhs1 (stmt);
>> +    }
>> +
>> +  gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
>> +             == TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)));
>> +
>> +  addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (mem),
>> +                                  true, NULL_TREE, true,
>> +                                  GSI_SAME_STMT);
>> +
>> +  align = TYPE_ALIGN_UNIT (vectype);
>> +  if (aligned_access_p (dr))
>> +    misalign = 0;
>> +  else if (DR_MISALIGNMENT (dr) == -1)
>> +    {
>> +      align = TYPE_ALIGN_UNIT (elem_type);
>> +      misalign = 0;
>> +    }
>> +  else
>> +    misalign = DR_MISALIGNMENT (dr);
>> +  set_ptr_info_alignment (get_ptr_info (addr), align, misalign);
>> +  ptr = build_int_cst (reference_alias_ptr_type (mem),
>> +                      misalign ? misalign & -misalign : align);
>> +
>> +  if (gimple_store_p (stmt))
>> +    new_stmt = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
>> +                                          mask, val);
>> +  else
>> +    {
>> +      new_stmt = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, ptr,
>> +                                            mask);
>> +      gimple_call_set_lhs (new_stmt, val);
>> +    }
>> +  gsi_replace (si ? si : &gsi, new_stmt, false);
>> +}
>> +
>> +/* Function vect_combine_loop_epilogue.
>> +
>> +   Combine the loop epilogue with the main vectorized body.  This
>> +   requires masking memory accesses and reductions.  */
>> +
>> +static void
>> +vect_combine_loop_epilogue (loop_vec_info loop_vinfo)
>> +{
>> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> +  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
>> +  unsigned mask_no;
>> +  auto_vec<tree> masks;
>> +
>> +  vect_gen_loop_masks (loop_vinfo, &masks);
>> +
>> +  /* Convert reduction statements if any.  */
>> +  for (unsigned i = 0; i < LOOP_VINFO_REDUCTIONS (loop_vinfo).length (); i++)
>> +    {
>> +      gimple *stmt = LOOP_VINFO_REDUCTIONS (loop_vinfo)[i];
>> +      gimple *prev_stmt = NULL;
>> +      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +
>> +      mask_no = vect_get_mask_index_for_type (STMT_VINFO_VECTYPE (stmt_info));
>> +
>> +      stmt = STMT_VINFO_VEC_STMT (stmt_info);
>> +      while (stmt)
>> +       {
>> +         prev_stmt = vect_mask_reduction_stmt (stmt, masks[mask_no++],
>> +                                               prev_stmt);
>> +         stmt_info = vinfo_for_stmt (stmt);
>> +         stmt = stmt_info ? STMT_VINFO_RELATED_STMT (stmt_info) : NULL;
>> +       }
>> +    }
>> +
>> +  /* Scan all loop statements to convert vector loads/stores,
>> +     including their masked forms.  */
>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> +    {
>> +      basic_block bb = bbs[i];
>> +      for (gimple_stmt_iterator si = gsi_start_bb (bb);
>> +          !gsi_end_p (si); gsi_next (&si))
>> +       {
>> +         gimple *stmt = gsi_stmt (si);
>> +         stmt_vec_info stmt_info = NULL;
>> +         tree vectype = NULL;
>> +         data_reference *dr;
>> +
>> +         /* Mask load case.  */
>> +         if (is_gimple_call (stmt)
>> +             && gimple_call_internal_p (stmt)
>> +             && gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>> +             && !VECTOR_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, 2))))
>> +           {
>> +             stmt_info = vinfo_for_stmt (stmt);
>> +             if (!STMT_VINFO_VEC_STMT (stmt_info))
>> +               continue;
>> +             stmt = STMT_VINFO_VEC_STMT (stmt_info);
>> +             vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +           }
>> +         /* Mask store case.  */
>> +         else if (is_gimple_call (stmt)
>> +                  && gimple_call_internal_p (stmt)
>> +                  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE
>> +                  && vinfo_for_stmt (stmt)
>> +                  && STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (stmt)))
>> +           {
>> +             stmt_info = vinfo_for_stmt (stmt);
>> +             vectype = TREE_TYPE (gimple_call_arg (stmt, 2));
>> +           }
>> +         /* Load case.  */
>> +         else if (gimple_assign_load_p (stmt)
>> +                  && !VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
>> +           {
>> +             stmt_info = vinfo_for_stmt (stmt);
>> +             /* Skip loads which were not vectorized.  */
>> +             if (!STMT_VINFO_VEC_STMT (stmt_info))
>> +               continue;
>> +             /* Skip invariant loads.  */
>> +             if (integer_zerop (nested_in_vect_loop_p (loop, stmt)
>> +                                ? STMT_VINFO_DR_STEP (stmt_info)
>> +                                : DR_STEP (STMT_VINFO_DATA_REF (stmt_info))))
>> +               continue;
>> +             stmt = STMT_VINFO_VEC_STMT (stmt_info);
>> +             vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +           }
>> +         /* Store case.  */
>> +         else if (gimple_code (stmt) == GIMPLE_ASSIGN
>> +                  && gimple_store_p (stmt)
>> +                  && vinfo_for_stmt (stmt)
>> +                  && STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (stmt)))
>> +           {
>> +             stmt_info = vinfo_for_stmt (stmt);
>> +             vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +           }
>> +         else
>> +           continue;
>> +
>> +         /* Skip hoisted out statements.  */
>> +         if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> +           continue;
>> +
>> +         mask_no = vect_get_mask_index_for_type (vectype);
>> +
>> +         dr = STMT_VINFO_DATA_REF (stmt_info);
>> +         while (stmt)
>> +           {
>> +             if (is_gimple_call (stmt))
>> +               vect_mask_mask_load_store_stmt (stmt, masks[mask_no++]);
>> +             else
>> +               vect_mask_load_store_stmt (stmt, vectype, masks[mask_no++], dr,
>> +                                          /* Have to update iterator only if
>> +                                             it points to stmt we mask.  */
>> +                                          stmt == gsi_stmt (si) ? &si : NULL);
>> +
>> +             stmt_info = vinfo_for_stmt (stmt);
>> +             stmt = stmt_info ? STMT_VINFO_RELATED_STMT (stmt_info) : NULL;
>> +           }
>> +       }
>> +    }
>> +
>> +  if (dump_enabled_p ())
>> +    dump_printf_loc (MSG_NOTE, vect_location,
>> +                    "=== Loop epilogue was combined ===\n");
>> +}
>>
>>  /* Function vect_transform_loop.
>>
>> @@ -6927,7 +7550,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>       run at least the vectorization factor number of times checking
>>       is pointless, too.  */
>>    th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>> -  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>> +  if ((th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>> +       || (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)
>> +          && th > 1))
>>        && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>      {
>>        if (dump_enabled_p ())
>> @@ -6976,12 +7601,18 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>      {
>>        tree ratio_mult_vf;
>>        if (!ni_name)
>> -       ni_name = vect_build_loop_niters (loop_vinfo);
>> +       {
>> +         ni_name = vect_build_loop_niters (loop_vinfo);
>> +         LOOP_VINFO_NITERS (loop_vinfo) = ni_name;
>> +       }
>>        vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf,
>>                                        &ratio);
>> -      epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name,
>> -                                                ratio_mult_vf, th,
>> -                                                check_profitability);
>> +      /* If the epilogue is combined with the main loop, peeling is
>> +        not needed.  */
>> +      if (!LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)
>> +         || check_profitability)
>> +       epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name,
>> +                                                  ratio_mult_vf, th,
>> +                                                  check_profitability);
>>      }
>>    else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>      ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>> @@ -6989,7 +7620,10 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>    else
>>      {
>>        if (!ni_name)
>> -       ni_name = vect_build_loop_niters (loop_vinfo);
>> +       {
>> +         ni_name = vect_build_loop_niters (loop_vinfo);
>> +         LOOP_VINFO_NITERS (loop_vinfo) = ni_name;
>> +       }
>>        vect_generate_tmps_on_preheader (loop_vinfo, ni_name, NULL, &ratio);
>>      }
>>
>> @@ -7243,23 +7877,35 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>
>>    slpeel_make_loop_iterate_ntimes (loop, ratio);
>>
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +    vect_combine_loop_epilogue (loop_vinfo);
>> +
>>    /* Reduce loop iterations by the vectorization factor.  */
>>    scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
>>                       expected_iterations / vectorization_factor);
>>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>        && loop->nb_iterations_upper_bound != 0)
>> -    loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
>> -  loop->nb_iterations_upper_bound
>> -    = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
>> -                     vectorization_factor) - 1;
>> -
>> +    loop->nb_iterations_upper_bound -= 1;
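>> +  /* With a combined epilogue the vector loop executes
>> +     ceil (niters / vf) iterations, so round the bound up: e.g.
>> +     (illustrative) for an upper bound of 17 latch executions
>> +     (18 iterations) and vf = 4 the new bound is
>> +     ceil (18 / 4) - 1 = 4.  */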
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +    loop->nb_iterations_upper_bound
>> +      = wi::div_ceil (loop->nb_iterations_upper_bound + 1,
>> +                     vectorization_factor, UNSIGNED) - 1;
>> +  else
>> +    loop->nb_iterations_upper_bound
>> +      = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
>> +                       vectorization_factor) - 1;
>>    if (loop->any_estimate)
>>      {
>> -      loop->nb_iterations_estimate
>> -        = wi::udiv_floor (loop->nb_iterations_estimate, vectorization_factor);
>> -       if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>> -          && loop->nb_iterations_estimate != 0)
>> -        loop->nb_iterations_estimate = loop->nb_iterations_estimate - 1;
>> +      if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +       loop->nb_iterations_estimate
>> +         = wi::div_ceil (loop->nb_iterations_estimate, vectorization_factor,
>> +                         UNSIGNED);
>> +      else
>> +       loop->nb_iterations_estimate
>> +         = wi::udiv_floor (loop->nb_iterations_estimate, vectorization_factor);
>> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>> +         && loop->nb_iterations_estimate != 0)
>> +       loop->nb_iterations_estimate -= 1;
>>      }
>>
>>    if (dump_enabled_p ())

