[16/n] Apply maximum nunits for BB SLP

Richard Biener richard.guenther@gmail.com
Tue Nov 5 13:22:00 GMT 2019


On Tue, Oct 29, 2019 at 6:05 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> The BB vectoriser picked vector types in the same way as the loop
> vectoriser: it picked a vector mode/size for the region and then
> based all the vector types off that choice.  This meant we could
> end up trying to use vector types that had too many elements for
> the group size.
>
> The main part of this patch is therefore about passing the SLP
> group size down to routines like get_vectype_for_scalar_type and
> ensuring that each vector type in the SLP tree is chosen wrt the
> group size.  That part in itself is pretty easy and mechanical.
>
> The main warts are:
>
> (1) We normally pick a STMT_VINFO_VECTYPE for data references at an
>     early stage (vect_analyze_data_refs).  However, nothing in the
>     BB vectoriser relied on this, or on the min_vf calculated from it.
>     I couldn't see anything other than vect_recog_bool_pattern that
>     tried to access the vector type before the SLP tree is built.

So could you simply not set STMT_VINFO_VECTYPE for data refs at all
when doing BB vectorization?

> (2) It's possible for the same statement to be used in groups of
>     different sizes.  Taking the group size into account meant that
>     we could try to pick different vector types for the same statement.

That only happens when we have multiple SLP instances though
(entries into the shared SLP graph).  It probably makes sense to
keep handling SLP instances that share stmts together for costing
reasons, but one issue is that for disjoint pieces (in the same BB)
disqualifying one cost-wise disqualifies all.  So at some point
during analysis (which should eventually cover more than a single
BB) we want to split the graph.  That probably doesn't help the
above case though.

>     This problem should go away with the move to doing everything on
>     SLP trees, where presumably we would attach the vector type to the
>     SLP node rather than the stmt_vec_info.  Until then, the patch just
>     uses a first-come, first-served approach.

Yeah, I ran into not having vectype on SLP trees with invariants/externals
as well.  I suppose you didn't try simply adding that to the SLP tree
and pushing/popping it like we push/pop the def type?
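
To illustrate what I mean by pushing/popping, here is a rough standalone
sketch.  It is not GCC code: stmt_model, node_model and scoped_vectype_push
are hypothetical stand-ins, and vector types are modelled as plain integers
(0 meaning "none").  The idea is only that the type lives on the node and is
made visible on the stmts for the duration of the analysis of that node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for stmt_vec_info: 0 means "no vector type".
struct stmt_model
{
  int vectype = 0;
};

// Hypothetical stand-in for an SLP node that owns its vector type.
struct node_model
{
  int vectype;
  std::vector<stmt_model *> stmts;
};

// Push the node's vector type onto its stmts on construction and
// restore the previous per-stmt values on destruction.
class scoped_vectype_push
{
public:
  explicit scoped_vectype_push (node_model &node) : m_node (node)
  {
    for (stmt_model *stmt : m_node.stmts)
      {
        m_saved.push_back (stmt->vectype);
        stmt->vectype = m_node.vectype;
      }
  }

  ~scoped_vectype_push ()
  {
    // Restore in reverse order so that a stmt listed twice in the
    // same node ends up with its original value.
    for (std::size_t i = m_saved.size (); i-- > 0;)
      m_node.stmts[i]->vectype = m_saved[i];
  }

  scoped_vectype_push (const scoped_vectype_push &) = delete;
  scoped_vectype_push &operator= (const scoped_vectype_push &) = delete;

private:
  node_model &m_node;
  std::vector<int> m_saved;
};
```

With something along these lines, a stmt shared by two nodes of different
group sizes would see each node's type only while that node is being
analyzed, so there would be no need to pick one type first-come,
first-served.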

Assigning the vector types should really happen in vectorizable_*
and not during SLP build itself btw.

Your update-all-shared-vectypes thing looks quadratic to me :/

> (3) A similar problem exists for grouped data references, where
>     different statements in the same dataref group could be used
>     in SLP nodes that have different group sizes.  The patch copes
>     with that by making sure that all vector types in a dataref
>     group remain consistent.
>
> The patch means that:
>
>     void
>     f (int *x, short *y)
>     {
>       x[0] += y[0];
>       x[1] += y[1];
>       x[2] += y[2];
>       x[3] += y[3];
>     }
>
> now produces:
>
>         ldr     q0, [x0]
>         ldr     d1, [x1]
>         saddw   v0.4s, v0.4s, v1.4h
>         str     q0, [x0]
>         ret
>
> instead of:
>
>         ldrsh   w2, [x1]
>         ldrsh   w3, [x1, 2]
>         fmov    s0, w2
>         ldrsh   w2, [x1, 4]
>         ldrsh   w1, [x1, 6]
>         ins     v0.s[1], w3
>         ldr     q1, [x0]
>         ins     v0.s[2], w2
>         ins     v0.s[3], w1
>         add     v0.4s, v0.4s, v1.4s
>         str     q0, [x0]
>         ret

Nice.
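
For anyone following along, the choice of element count for this example
can be modelled with a small standalone sketch.  It is not the patch itself:
pick_nunits, floor_pow2 and simd_64_or_128 are hypothetical names, and the
"target" is assumed to support only 64-bit and 128-bit vectors.  The loop
mirrors the halving logic the patch adds to get_vectype_for_scalar_type.

```cpp
#include <cassert>

// Largest power of two that is <= N.  N must be nonzero.
unsigned
floor_pow2 (unsigned n)
{
  unsigned p = 1;
  while (p <= n / 2)
    p *= 2;
  return p;
}

// Assumed target: only 64-bit and 128-bit vectors exist.
bool
simd_64_or_128 (unsigned nunits, unsigned elt_bits)
{
  unsigned bits = nunits * elt_bits;
  return bits == 64 || bits == 128;
}

typedef bool (*supported_fn) (unsigned nunits, unsigned elt_bits);

// Return the number of elements to use for ELT_BITS-wide elements, or 0
// if no suitable vector exists.  NATURAL_NUNITS is what the region-wide
// vector mode would give; GROUP_SIZE == 0 means "no limit".
unsigned
pick_nunits (unsigned natural_nunits, unsigned group_size,
             unsigned elt_bits, supported_fn supported)
{
  // The natural choice is kept unless it has at least GROUP_SIZE elements.
  if (group_size == 0 || natural_nunits < group_size)
    return natural_nunits;

  // Start with the biggest power of two that fits in the group and
  // halve until the target accepts it; a single element is only ever
  // tried as the first attempt.
  unsigned nunits = floor_pow2 (group_size);
  unsigned result = 0;
  do
    {
      result = supported (nunits, elt_bits) ? nunits : 0;
      nunits /= 2;
    }
  while (nunits > 1 && result == 0);
  return result;
}
```

Under those assumptions, pick_nunits (8, 4, 16, simd_64_or_128) gives 4 for
the short loads and pick_nunits (4, 4, 32, simd_64_or_128) gives 4 for the
int accesses, which corresponds to the v1.4h and v0.4s operands in the new
code.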

> Unfortunately it also means we start to vectorise
> gcc.target/i386/pr84101.c for -m32.  That seems like a target
> cost issue though; see PR92265 for details.
>
>
> 2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an
>         optional maximum nunits.
>         (get_vectype_for_scalar_type): Likewise.  Also declare a form that
>         takes an slp_tree.
>         (get_mask_type_for_scalar_type): Take an optional slp_tree.
>         (vect_get_mask_type_for_stmt): Likewise.
>         * tree-vect-data-refs.c (vect_analyze_data_refs): Don't store
>         the vector type in STMT_VINFO_VECTYPE for BB vectorization.
>         * tree-vect-patterns.c (vect_recog_bool_pattern): Use
>         vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE
>         to get an assumed vector type for data references.
>         * tree-vect-slp.c (vect_update_shared_vectype): New function.
>         (vect_update_all_shared_vectypes): Likewise.
>         (vect_build_slp_tree_1): Pass the group size to
>         vect_get_vector_types_for_stmt.  Use vect_update_shared_vectype
>         for BB vectorization.
>         (vect_build_slp_tree_2): Call vect_update_all_shared_vectypes
>         before building the vector from scalars.
>         (vect_analyze_slp_instance): Pass the group size to
>         get_vectype_for_scalar_type.
>         (vect_slp_analyze_node_operations_1): Don't recompute the vector
>         types for BB vectorization here; just handle the case in which
>         we deferred the choice for booleans.
>         (vect_get_constant_vectors): Pass the slp_tree to
>         get_vectype_for_scalar_type.
>         * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
>         (vectorizable_call): Likewise.
>         (vectorizable_simd_clone_call): Likewise.
>         (vectorizable_conversion): Likewise.
>         (vectorizable_shift): Likewise.
>         (vectorizable_operation): Likewise.
>         (vectorizable_comparison): Likewise.
>         (vect_is_simple_cond): Take the slp_tree as argument and
>         pass it to get_vectype_for_scalar_type.
>         (vectorizable_condition): Update call accordingly.
>         (get_vectype_for_scalar_type): Take a group_size argument.
>         For BB vectorization, limit the vector to that number
>         of elements.  Also define an overload that takes an slp_tree.
>         (get_mask_type_for_scalar_type): Add an slp_tree argument and
>         pass it to get_vectype_for_scalar_type.
>         (vect_get_vector_types_for_stmt): Add a group_size argument
>         and pass it to get_vectype_for_scalar_type.  Don't use the
>         cached vector type for BB vectorization if a group size is given.
>         Handle data references in that case.
>         (vect_get_mask_type_for_stmt): Take an slp_tree argument and
>         pass it to get_mask_type_for_scalar_type.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
>         with -fno-vect-cost-model.
>         * gcc.dg/vect/bb-slp-bool-1.c: New test.
>         * gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise.
>         * gcc.target/i386/pr84101.c: XFAIL for -m32.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-10-29 17:01:42.835677274 +0000
> +++ gcc/tree-vectorizer.h       2019-10-29 17:02:09.883487330 +0000
> @@ -1598,8 +1598,9 @@ extern bool vect_can_advance_ivs_p (loop
>  /* In tree-vect-stmts.c.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
>                                                  poly_uint64 = 0);
> -extern tree get_vectype_for_scalar_type (vec_info *, tree);
> -extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> +extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
> +extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
> +extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> @@ -1649,8 +1650,8 @@ extern void optimize_mask_stores (class
>  extern gcall *vect_gen_while (tree, tree, tree);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
> -                                                 tree *);
> -extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info);
> +                                                 tree *, unsigned int = 0);
> +extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
>
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2019-10-25 09:21:28.606327675 +0100
> +++ gcc/tree-vect-data-refs.c   2019-10-29 17:02:09.875487386 +0000
> @@ -4343,9 +4343,8 @@ vect_analyze_data_refs (vec_info *vinfo,
>
>        /* Set vectype for STMT.  */
>        scalar_type = TREE_TYPE (DR_REF (dr));
> -      STMT_VINFO_VECTYPE (stmt_info)
> -       = get_vectype_for_scalar_type (vinfo, scalar_type);
> -      if (!STMT_VINFO_VECTYPE (stmt_info))
> +      tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +      if (!vectype)
>          {
>            if (dump_enabled_p ())
>              {
> @@ -4378,14 +4377,19 @@ vect_analyze_data_refs (vec_info *vinfo,
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "got vectype for stmt: %G%T\n",
> -                            stmt_info->stmt, STMT_VINFO_VECTYPE (stmt_info));
> +                            stmt_info->stmt, vectype);
>         }
>
>        /* Adjust the minimal vectorization factor according to the
>          vector type.  */
> -      vf = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
> +      vf = TYPE_VECTOR_SUBPARTS (vectype);
>        *min_vf = upper_bound (*min_vf, vf);
>
> +      /* Leave the BB vectorizer to pick the vector type later, based on
> +        the final dataref group size and SLP node size.  */
> +      if (is_a <loop_vec_info> (vinfo))
> +       STMT_VINFO_VECTYPE (stmt_info) = vectype;
> +
>        if (gatherscatter != SG_NONE)
>         {
>           gather_scatter_info gs_info;
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2019-10-29 17:01:42.543679326 +0000
> +++ gcc/tree-vect-patterns.c    2019-10-29 17:02:09.879487358 +0000
> @@ -4153,9 +4153,10 @@ vect_recog_bool_pattern (stmt_vec_info s
>            && STMT_VINFO_DATA_REF (stmt_vinfo))
>      {
>        stmt_vec_info pattern_stmt_info;
> -      vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
> -      gcc_assert (vectype != NULL_TREE);
> -      if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
> +      tree nunits_vectype;
> +      if (!vect_get_vector_types_for_stmt (stmt_vinfo, &vectype,
> +                                          &nunits_vectype)
> +         || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>         return NULL;
>
>        if (check_bool_pattern (var, vinfo, bool_stmts))
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-29 17:02:06.355512105 +0000
> +++ gcc/tree-vect-slp.c 2019-10-29 17:02:09.879487358 +0000
> @@ -601,6 +601,77 @@ vect_get_and_check_slp_defs (vec_info *v
>    return 0;
>  }
>
> +/* Try to assign vector type VECTYPE to STMT_INFO for BB vectorization.
> +   Return true if we can, meaning that this choice doesn't conflict with
> +   existing SLP nodes that use STMT_INFO.  */
> +
> +static bool
> +vect_update_shared_vectype (stmt_vec_info stmt_info, tree vectype)
> +{
> +  tree old_vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (old_vectype && useless_type_conversion_p (vectype, old_vectype))
> +    return true;
> +
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +      && DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
> +    {
> +      /* We maintain the invariant that if any statement in the group is
> +        used, all other members of the group have the same vector type.  */
> +      stmt_vec_info first_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
> +      stmt_vec_info member_info = first_info;
> +      for (; member_info; member_info = DR_GROUP_NEXT_ELEMENT (member_info))
> +       if (STMT_VINFO_NUM_SLP_USES (member_info) > 0
> +           || is_pattern_stmt_p (member_info))
> +         break;
> +
> +      if (!member_info)
> +       {
> +         for (member_info = first_info; member_info;
> +              member_info = DR_GROUP_NEXT_ELEMENT (member_info))
> +           STMT_VINFO_VECTYPE (member_info) = vectype;
> +         return true;
> +       }
> +    }
> +  else if (STMT_VINFO_NUM_SLP_USES (stmt_info) == 0
> +          && !is_pattern_stmt_p (stmt_info))
> +    {
> +      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> +      return true;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "Build SLP failed: incompatible vector"
> +                      " types for: %G", stmt_info->stmt);
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +                      "    old vector type: %T\n", old_vectype);
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +                      "    new vector type: %T\n", vectype);
> +    }
> +  return false;
> +}
> +
> +/* Try to infer and assign a vector type to all the statements in STMTS.
> +   Used only for BB vectorization.  */
> +
> +static bool
> +vect_update_all_shared_vectypes (vec<stmt_vec_info> stmts)
> +{
> +  tree vectype, nunits_vectype;
> +  if (!vect_get_vector_types_for_stmt (stmts[0], &vectype,
> +                                      &nunits_vectype, stmts.length ()))
> +    return false;
> +
> +  stmt_vec_info stmt_info;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (stmts, i, stmt_info)
> +    if (!vect_update_shared_vectype (stmt_info, vectype))
> +      return false;
> +
> +  return true;
> +}
> +
>  /* Return true if call statements CALL1 and CALL2 are similar enough
>     to be combined into the same SLP group.  */
>
> @@ -747,6 +818,7 @@ vect_build_slp_tree_1 (unsigned char *sw
>    stmt_vec_info stmt_info;
>    FOR_EACH_VEC_ELT (stmts, i, stmt_info)
>      {
> +      vec_info *vinfo = stmt_info->vinfo;
>        gimple *stmt = stmt_info->stmt;
>        swap[i] = 0;
>        matches[i] = false;
> @@ -780,7 +852,7 @@ vect_build_slp_tree_1 (unsigned char *sw
>
>        tree nunits_vectype;
>        if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> -                                          &nunits_vectype)
> +                                          &nunits_vectype, group_size)
>           || (nunits_vectype
>               && !vect_record_max_nunits (stmt_info, group_size,
>                                           nunits_vectype, max_nunits)))
> @@ -792,6 +864,10 @@ vect_build_slp_tree_1 (unsigned char *sw
>
>        gcc_assert (vectype);
>
> +      if (is_a <bb_vec_info> (vinfo)
> +         && !vect_update_shared_vectype (stmt_info, vectype))
> +       continue;
> +
>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>         {
>           rhs_code = CALL_EXPR;
> @@ -1330,7 +1406,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>               FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
>                 if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
>                   break;
> -             if (!grandchild)
> +             if (!grandchild
> +                 && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
>                 {
>                   /* Roll back.  */
>                   this_tree_size = old_tree_size;
> @@ -1371,7 +1448,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>              do extra work to cancel the pattern so the uses see the
>              scalar version.  */
>           && !is_pattern_stmt_p (stmt_info)
> -         && !oprnd_info->any_pattern)
> +         && !oprnd_info->any_pattern
> +         && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1468,7 +1546,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>                   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
>                     if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
>                       break;
> -                 if (!grandchild)
> +                 if (!grandchild
> +                     && (vect_update_all_shared_vectypes
> +                         (oprnd_info->def_stmts)))
>                     {
>                       /* Roll back.  */
>                       this_tree_size = old_tree_size;
> @@ -2003,8 +2083,8 @@ vect_analyze_slp_instance (vec_info *vin
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      {
>        scalar_type = TREE_TYPE (DR_REF (dr));
> -      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
>        group_size = DR_GROUP_SIZE (stmt_info);
> +      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>      }
>    else if (!dr && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>      {
> @@ -2586,22 +2666,13 @@ vect_slp_analyze_node_operations_1 (vec_
>       Memory accesses already got their vector type assigned
>       in vect_analyze_data_refs.  */
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> -  if (bb_vinfo
> -      && ! STMT_VINFO_DATA_REF (stmt_info))
> +  if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
>      {
> -      tree vectype, nunits_vectype;
> -      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> -                                          &nunits_vectype))
> -       /* We checked this when building the node.  */
> -       gcc_unreachable ();
> -      if (vectype == boolean_type_node)
> -       {
> -         vectype = vect_get_mask_type_for_stmt (stmt_info);
> -         if (!vectype)
> -           /* vect_get_mask_type_for_stmt has already explained the
> -              failure.  */
> -           return false;
> -       }
> +      tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
> +      if (!vectype)
> +       /* vect_get_mask_type_for_stmt has already explained the
> +          failure.  */
> +       return false;
>
>        stmt_vec_info sstmt_info;
>        unsigned int i;
> @@ -3475,7 +3546,7 @@ vect_get_constant_vectors (slp_tree op_n
>        && vect_mask_constant_operand_p (stmt_vinfo))
>      vector_type = truth_type_for (stmt_vectype);
>    else
> -    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
> +    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);
>
>    unsigned int number_of_vectors
>      = vect_get_num_vectors (SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-29 17:01:42.951676460 +0000
> +++ gcc/tree-vect-stmts.c       2019-10-29 17:02:09.883487330 +0000
> @@ -783,7 +783,7 @@ vect_prologue_cost_for_slp_op (slp_tree
>    /* Without looking at the actual initializer a vector of
>       constants can be implemented as load from the constant pool.
>       When all elements are the same we can use a splat.  */
> -  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
> +  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), node);
>    unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
>    unsigned num_vects_to_check;
>    unsigned HOST_WIDE_INT const_nunits;
> @@ -3290,7 +3290,7 @@ vectorizable_call (stmt_vec_info stmt_in
>    /* If all arguments are external or constant defs, infer the vector type
>       from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -4066,7 +4066,8 @@ vectorizable_simd_clone_call (stmt_vec_i
>         && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
>        {
>         tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
> -       arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type);
> +       arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> +                                                         slp_node);
>         if (arginfo[i].vectype == NULL
>             || (simd_clone_subparts (arginfo[i].vectype)
>                 > bestn->simdclone->simdlen))
> @@ -4782,7 +4783,7 @@ vectorizable_conversion (stmt_vec_info s
>    /* If op0 is an external or constant def, infer the vector type
>       from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -5548,7 +5549,7 @@ vectorizable_shift (stmt_vec_info stmt_i
>    /* If op0 is an external or constant def, infer the vector type
>       from the scalar type.  */
>    if (!vectype)
> -    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
> +    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0), slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype);
>    if (!vectype)
> @@ -5647,7 +5648,8 @@ vectorizable_shift (stmt_vec_info stmt_i
>                           "vector/vector shift/rotate found.\n");
>
>        if (!op1_vectype)
> -       op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
> +       op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1),
> +                                                  slp_node);
>        incompatible_op1_vectype_p
>         = (op1_vectype == NULL_TREE
>            || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
> @@ -5999,7 +6001,8 @@ vectorizable_operation (stmt_vec_info st
>           vectype = vectype_out;
>         }
>        else
> -       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
> +       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0),
> +                                              slp_node);
>      }
>    if (vec_stmt)
>      gcc_assert (vectype);
> @@ -9741,7 +9744,7 @@ vectorizable_load (stmt_vec_info stmt_in
>     condition operands are supportable using vec_is_simple_use.  */
>
>  static bool
> -vect_is_simple_cond (tree cond, vec_info *vinfo,
> +vect_is_simple_cond (tree cond, vec_info *vinfo, slp_tree slp_node,
>                      tree *comp_vectype, enum vect_def_type *dts,
>                      tree vectype)
>  {
> @@ -9805,7 +9808,8 @@ vect_is_simple_cond (tree cond, vec_info
>         scalar_type = build_nonstandard_integer_type
>           (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))),
>            TYPE_UNSIGNED (scalar_type));
> -      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> +                                                  slp_node);
>      }
>
>    return true;
> @@ -9912,7 +9916,7 @@ vectorizable_condition (stmt_vec_info st
>    then_clause = gimple_assign_rhs2 (stmt);
>    else_clause = gimple_assign_rhs3 (stmt);
>
> -  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo,
> +  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo, slp_node,
>                             &comp_vectype, &dts[0], slp_node ? NULL : vectype)
>        || !comp_vectype)
>      return false;
> @@ -10391,7 +10395,8 @@ vectorizable_comparison (stmt_vec_info s
>    /* Invariant comparison.  */
>    if (!vectype)
>      {
> -      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> +      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
> +                                            slp_node);
>        if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
>         return false;
>      }
> @@ -11199,27 +11204,87 @@ get_related_vectype_for_scalar_type (mac
>  /* Function get_vectype_for_scalar_type.
>
>     Returns the vector type corresponding to SCALAR_TYPE as supported
> -   by the target.  */
> +   by the target.  If GROUP_SIZE is nonzero and we're performing BB
> +   vectorization, make sure that the number of elements in the vector
> +   is no bigger than GROUP_SIZE.  */
>
>  tree
> -get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
> +get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
> +                            unsigned int group_size)
>  {
> +  /* For BB vectorization, we should always have a group size once we've
> +     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
> +     are tentative requests during things like early data reference
> +     analysis and pattern recognition.  */
> +  if (is_a <bb_vec_info> (vinfo))
> +    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +  else
> +    group_size = 0;
> +
>    tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
>                                                       scalar_type);
>    if (vectype && vinfo->vector_mode == VOIDmode)
>      vinfo->vector_mode = TYPE_MODE (vectype);
> +
> +  /* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
> +     try again with an explicit number of elements.  */
> +  if (vectype
> +      && group_size
> +      && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
> +    {
> +      /* Start with the biggest number of units that fits within
> +        GROUP_SIZE and halve it until we find a valid vector type.
> +        Usually either the first attempt will succeed or all will
> +        fail (in the latter case because GROUP_SIZE is too small
> +        for the target), but it's possible that a target could have
> +        a hole between supported vector types.
> +
> +        If GROUP_SIZE is not a power of 2, this has the effect of
> +        trying the largest power of 2 that fits within the group,
> +        even though the group is not a multiple of that vector size.
> +        The BB vectorizer will then try to carve up the group into
> +        smaller pieces.  */
> +      unsigned int nunits = 1 << floor_log2 (group_size);
> +      do
> +       {
> +         vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
> +                                                        scalar_type, nunits);
> +         nunits /= 2;
> +       }
> +      while (nunits > 1 && !vectype);
> +    }
>    return vectype;
>  }
>
> +/* Return the vector type corresponding to SCALAR_TYPE as supported
> +   by the target.  NODE, if nonnull, is the SLP tree node that will
> +   use the returned vector type.  */
> +
> +tree
> +get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, slp_tree node)
> +{
> +  unsigned int group_size = 0;
> +  if (node)
> +    {
> +      group_size = SLP_TREE_SCALAR_OPS (node).length ();
> +      if (group_size == 0)
> +       group_size = SLP_TREE_SCALAR_STMTS (node).length ();
> +    }
> +  return get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +}
> +
>  /* Function get_mask_type_for_scalar_type.
>
>     Returns the mask type corresponding to a result of comparison
> -   of vectors of specified SCALAR_TYPE as supported by target.  */
> +   of vectors of specified SCALAR_TYPE as supported by target.
> +   NODE, if nonnull, is the SLP tree node that will use the returned
> +   vector type.  */
>
>  tree
> -get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type)
> +get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
> +                              slp_tree node)
>  {
> -  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
>
>    if (!vectype)
>      return NULL;
> @@ -11892,6 +11957,9 @@ vect_gen_while_not (gimple_seq *seq, tre
>
>  /* Try to compute the vector types required to vectorize STMT_INFO,
>     returning true on success and false if vectorization isn't possible.
> +   If GROUP_SIZE is nonzero and we're performing BB vectorization,
> +   make sure that the number of elements in the vectors is no bigger
> +   than GROUP_SIZE.
>
>     On success:
>
> @@ -11909,11 +11977,21 @@ vect_gen_while_not (gimple_seq *seq, tre
>  opt_result
>  vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
>                                 tree *stmt_vectype_out,
> -                               tree *nunits_vectype_out)
> +                               tree *nunits_vectype_out,
> +                               unsigned int group_size)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    gimple *stmt = stmt_info->stmt;
>
> +  /* For BB vectorization, we should always have a group size once we've
> +     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
> +     are tentative requests during things like early data reference
> +     analysis and pattern recognition.  */
> +  if (is_a <bb_vec_info> (vinfo))
> +    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +  else
> +    group_size = 0;
> +
>    *stmt_vectype_out = NULL_TREE;
>    *nunits_vectype_out = NULL_TREE;
>
> @@ -11944,7 +12022,7 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>    tree vectype;
>    tree scalar_type = NULL_TREE;
> -  if (STMT_VINFO_VECTYPE (stmt_info))
> +  if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
>      {
>        *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
>        if (dump_enabled_p ())
> @@ -11953,15 +12031,17 @@ vect_get_vector_types_for_stmt (stmt_vec
>      }
>    else
>      {
> -      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> -      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +      if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> +       scalar_type = TREE_TYPE (DR_REF (dr));
> +      else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>         scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
>        else
>         scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>
>        /* Pure bool ops don't participate in number-of-units computation.
>          For comparisons use the types being compared.  */
> -      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> +      if (!STMT_VINFO_DATA_REF (stmt_info)
> +         && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
>           && is_gimple_assign (stmt)
>           && gimple_assign_rhs_code (stmt) != COND_EXPR)
>         {
> @@ -11981,9 +12061,16 @@ vect_get_vector_types_for_stmt (stmt_vec
>         }
>
>        if (dump_enabled_p ())
> -       dump_printf_loc (MSG_NOTE, vect_location,
> -                        "get vectype for scalar type: %T\n", scalar_type);
> -      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +       {
> +         if (group_size)
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "get vectype for scalar type (group size %d):"
> +                            " %T\n", group_size, scalar_type);
> +         else
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "get vectype for scalar type: %T\n", scalar_type);
> +       }
> +      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>        if (!vectype)
>         return opt_result::failure_at (stmt,
>                                        "not vectorized:"
> @@ -12014,7 +12101,8 @@ vect_get_vector_types_for_stmt (stmt_vec
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "get vectype for smallest scalar type: %T\n",
>                              scalar_type);
> -         nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +         nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> +                                                       group_size);
>           if (!nunits_vectype)
>             return opt_result::failure_at
>               (stmt, "not vectorized: unsupported data-type %T\n",
> @@ -12042,10 +12130,11 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>  /* Try to determine the correct vector type for STMT_INFO, which is a
>     statement that produces a scalar boolean result.  Return the vector
> -   type on success, otherwise return NULL_TREE.  */
> +   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
> +   is the SLP tree node that will use the returned vector type.  */
>
>  opt_tree
> -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    gimple *stmt = stmt_info->stmt;
> @@ -12057,7 +12146,7 @@ vect_get_mask_type_for_stmt (stmt_vec_in
>        && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
>      {
>        scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type);
> +      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
>
>        if (!mask_type)
>         return opt_tree::failure_at (stmt,
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/bb-slp-4.c        2019-03-08 18:15:02.268871230 +0000
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-4.c        2019-10-29 17:02:09.875487386 +0000
> @@ -38,5 +38,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
> -
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c   2019-10-29 17:02:09.875487386 +0000
> @@ -0,0 +1,44 @@
> +#include "tree-vect.h"
> +
> +void __attribute__ ((noipa))
> +f1 (_Bool *x, unsigned short *y)
> +{
> +  x[0] = (y[0] == 1);
> +  x[1] = (y[1] == 1);
> +}
> +
> +void __attribute__ ((noipa))
> +f2 (_Bool *x, unsigned short *y)
> +{
> +  x[0] = (y[0] == 1);
> +  x[1] = (y[1] == 1);
> +  x[2] = (y[2] == 1);
> +  x[3] = (y[3] == 1);
> +  x[4] = (y[4] == 1);
> +  x[5] = (y[5] == 1);
> +  x[6] = (y[6] == 1);
> +  x[7] = (y[7] == 1);
> +}
> +
> +_Bool x[8];
> +unsigned short y[8] = { 11, 1, 9, 5, 1, 44, 1, 1 };
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  f1 (x, y);
> +
> +  if (x[0] || !x[1])
> +    __builtin_abort ();
> +
> +  x[1] = 0;
> +
> +  f2 (x, y);
> +
> +  if (x[0] || !x[1] || x[2] | x[3] || !x[4] || x[5] || !x[6] || !x[7])
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c      2019-10-29 17:02:09.875487386 +0000
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** foo:
> +** (
> +**     ldr     d([0-9]+), \[x1\]
> +**     ldr     q([0-9]+), \[x0\]
> +**     saddw   v([0-9]+)\.4s, v\2\.4s, v\1\.4h
> +**     str     q\3, \[x0\]
> +** |
> +**     ldr     q([0-9]+), \[x0\]
> +**     ldr     d([0-9]+), \[x1\]
> +**     saddw   v([0-9]+)\.4s, v\4\.4s, v\5\.4h
> +**     str     q\6, \[x0\]
> +** )
> +**     ret
> +*/
> +void
> +foo (int *x, short *y)
> +{
> +  x[0] += y[0];
> +  x[1] += y[1];
> +  x[2] += y[2];
> +  x[3] += y[3];
> +}
> Index: gcc/testsuite/gcc.target/i386/pr84101.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr84101.c     2019-04-04 08:34:50.849942379 +0100
> +++ gcc/testsuite/gcc.target/i386/pr84101.c     2019-10-29 17:02:09.875487386 +0000
> @@ -18,4 +18,5 @@ uint64_pair_t pair(int num)
>    return p ;
>  }
>
> -/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */
> +/* See PR92266 for the XFAIL.  */
> +/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" { xfail ilp32 } } } */
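
To make the effect of the group-size limit concrete, here is a minimal sketch, not taken from the patch. capped_nunits is a hypothetical helper, not a GCC function. It assumes a single preferred vector width in bits and assumes the limit is the group size rounded up to a power of two; the real get_vectype_for_scalar_type may choose differently.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): return the number of elements
   to use for a scalar of ELT_BITS bits when the preferred vector width
   is VEC_BITS bits and the SLP group contains GROUP_SIZE statements.
   GROUP_SIZE == 0 means "no limit", as for loop vectorisation.  */
unsigned
capped_nunits (unsigned vec_bits, unsigned elt_bits, unsigned group_size)
{
  /* Element count a full-width vector would have.  */
  unsigned nunits = vec_bits / elt_bits;

  if (group_size == 0)
    return nunits;

  /* Assumption: the cap is the group size rounded up to a power of two.  */
  unsigned max_nunits = 1;
  while (max_nunits < group_size)
    max_nunits *= 2;

  return nunits < max_nunits ? nunits : max_nunits;
}
```

With a 128-bit preferred width and a group of four statements, as in foo from vect_mixed_sizes_14.c, capped_nunits (128, 32, 4) and capped_nunits (128, 16, 4) both give 4, i.e. a 128-bit vector of int and a 64-bit vector of short. That matches the q-register and d-register loads the test expects. Without the limit (group size 0), the short type would get 8 elements.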


