This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate


On Fri, Jul 15, 2016 at 3:55 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch is a follow-up to Richard's patch of
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> vec_construct (initialization of an N-way vector by N scalars) is too low,
> which can cause too-aggressive vectorization in particular for N=8 or
> higher.  Richard changed the default cost to N-1, which is generally
> sensible.  For powerpc I am going with a slightly higher cost of N, which
> will keep us from being less conservative than the previous values when N=2.
>
> The whole cost model for powerpc needs more work (in particular we need
> to distinguish among processor models), but that's beyond the scope of
> this patch.  One thing that I've called out in the comments is that a
> vec_construct can have wildly different costs depending on the scalar
> elements.  If they are all the same small constant, then we only need
> a single splat-immediate instruction; but for V4SF the cost is potentially
> higher because of the need to do converts.  For the splat case, we might
> want to teach the vectorizer in general to estimate the cost as just
> a vector_stmt rather than a vec_construct, but that requires some target
> knowledge of which constants can be duplicated with a splat-immediate.
>
> In any case, the purpose of this patch is simply to avoid vectorizing
> things we shouldn't when we've undercounted the cost of a vec_construct.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions (hence the vectorization decisions in the test suite have
> not changed).  Is this ok for trunk?

Note that most of the vectorizer testsuite is running with -fno-vect-cost-model,
only the costmodel tests are running with the cost model enabled.

Richard.

> Thanks,
> Bill
>
>
> 2016-07-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
>         Improve vec_construct estimate.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 238312)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>                                     tree vectype, int misalign)
>  {
>    unsigned elements;
> -  tree elem_type;
>
>    switch (type_of_cost)
>      {
> @@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>          return 2;
>
>        case vec_construct:
> -       elements = TYPE_VECTOR_SUBPARTS (vectype);
> -       elem_type = TREE_TYPE (vectype);
> -       /* 32-bit vectors loaded into registers are stored as double
> -          precision, so we need n/2 converts in addition to the usual
> -          n/2 merges to construct a vector of short floats from them.  */
> -       if (SCALAR_FLOAT_TYPE_P (elem_type)
> -           && TYPE_PRECISION (elem_type) == 32)
> -         return elements + 1;
> -       else
> -         return elements / 2 + 1;
> +       /* This is a rough approximation assuming non-constant elements
> +          constructed into a vector via element insertion.  FIXME:
> +          vec_construct is not granular enough for uniformly good
> +          decisions.  If the initialization is a splat, this is
> +          cheaper than we estimate.  If we want to form four SF
> +          values into a vector, it's more expensive (we need to
> +          copy the four elements into two vector registers,
> +          perform two conversions to single precision, and merge
> +          the two result vectors).  Improve this someday.  */
> +       return TYPE_VECTOR_SUBPARTS (vectype);
>
>        default:
>          gcc_unreachable ();
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]