This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Segher Boessenkool <segher at kernel dot crashing dot org>, David Edelsohn <dje dot gcc at gmail dot com>
- Date: Mon, 18 Jul 2016 13:12:47 +0200
- Subject: Re: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate
- Authentication-results: sourceware.org; auth=none
- References: <1b21afb4-a971-a95d-1084-53948c9c7f4c@linux.vnet.ibm.com>
On Fri, Jul 15, 2016 at 3:55 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch is a follow-up to Richard's patch of
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html. The cost of a
> vec_construct (initialization of an N-way vector by N scalars) is too low,
> which can cause too-aggressive vectorization in particular for N=8 or
> higher. Richard changed the default cost to N-1, which is generally
> sensible. For powerpc I am going with a slightly higher cost of N, which
> will keep us from being less conservative than the previous values when N=2.
>
> The whole cost model for powerpc needs more work (in particular we need
> to distinguish among processor models), but that's beyond the scope of
> this patch. One thing that I've called out in the comments is that a
> vec_construct can have wildly different costs depending on the scalar
> elements. If they are all the same small constant, then we only need
> a single splat-immediate instruction; but for V4SF the cost is potentially
> higher because of the need to do converts. For the splat case, we might
> want to teach the vectorizer in general to estimate the cost as just
> a vector_stmt rather than a vec_construct, but that requires some target
> knowledge of which constants can be duplicated with a splat-immediate.
>
> In any case, the purpose of this patch is simply to avoid vectorizing
> things we shouldn't when we've undercounted the cost of a vec_construct.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions (hence the vectorization decisions in the test suite have
> not changed). Is this ok for trunk?
Note that most of the vectorizer testsuite is running with -fno-vect-cost-model,
only the costmodel tests are running with the cost model enabled.
Richard.
> Thanks,
> Bill
>
>
> 2016-07-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> Improve vec_construct estimate.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c (revision 238312)
> +++ gcc/config/rs6000/rs6000.c (working copy)
> @@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> tree vectype, int misalign)
> {
> unsigned elements;
> - tree elem_type;
>
> switch (type_of_cost)
> {
> @@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> return 2;
>
> case vec_construct:
> - elements = TYPE_VECTOR_SUBPARTS (vectype);
> - elem_type = TREE_TYPE (vectype);
> - /* 32-bit vectors loaded into registers are stored as double
> - precision, so we need n/2 converts in addition to the usual
> - n/2 merges to construct a vector of short floats from them. */
> - if (SCALAR_FLOAT_TYPE_P (elem_type)
> - && TYPE_PRECISION (elem_type) == 32)
> - return elements + 1;
> - else
> - return elements / 2 + 1;
> + /* This is a rough approximation assuming non-constant elements
> + constructed into a vector via element insertion. FIXME:
> + vec_construct is not granular enough for uniformly good
> + decisions. If the initialization is a splat, this is
> + cheaper than we estimate. If we want to form four SF
> + values into a vector, it's more expensive (we need to
> + copy the four elements into two vector registers,
> + perform two conversions to single precision, and merge
> + the two result vectors). Improve this someday. */
> + return TYPE_VECTOR_SUBPARTS (vectype);
>
> default:
> gcc_unreachable ();
>