This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue)


I meant to mention that I regstrapped this on both POWER8 and POWER9.

Bill

> On May 11, 2017, at 10:59 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> 
> Hi,
> 
> PR80695 identifies a case (similar to several others we've seen) where SLP
> vectorization is too aggressive about vectorizing stores.  The problem is
> that we undervalue the cost of a vec_construct operation.  vec_construct
> is the vectorizer's representation for building a vector from scalar
> elements.  When we construct an integer vector type from its constituent
> parts, it requires a direct move from two GPRs (one instruction on P9,
> two direct moves and a merge on P8).  The high cost of this is not
> reflected in the current cost calculation, which only counts the cost
> of combining the elements using N-1 inserts.  This patch provides a higher
> estimate that is closer to reality.  Note that all cost estimation for
> vectorization is a bit rough, so this should be viewed as a heuristic.
> 
> The patch treats all integer vectors separately from the default case.
> There is already special handling for V4SFmode, so this leaves only
> V2DFmode in the default case.  It was previously established heuristically
> that a cost factor of 2 was appropriate for V2DFmode, so that is left
> unchanged here; but since V2DFmode is the only default, we can simplify
> the calculation to just return 2.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> 	Account for direct move costs for vec_construct of integer
> 	vectors.
> 
> [gcc/testsuite]
> 
> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* gcc.target/powerpc/pr80695-p8.c: New file.
> 	* gcc.target/powerpc/pr80695-p9.c: New file.
> 
> 
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 247809)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> 	if (SCALAR_FLOAT_TYPE_P (elem_type)
> 	    && TYPE_PRECISION (elem_type) == 32)
> 	  return 5;
> +	/* On POWER9, integer vector types are built up in GPRs and then
> +           use a direct move (2 cycles).  For POWER8 this is even worse,
> +           as we need two direct moves and a merge, and the direct moves
> +	   are five cycles.  */
> +	else if (INTEGRAL_TYPE_P (elem_type))
> +	  {
> +	    if (TARGET_P9_VECTOR)
> +	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
> +	    else
> +	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
> +	  }
> 	else
> -	  return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
> +	  /* V2DFmode doesn't need a direct move.  */
> +	  return 2;
> 
>       default:
>         gcc_unreachable ();
> Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */
> +
> +/* PR80695: Verify cost model for vec_construct on POWER8.  */
> +
> +long a[10] __attribute__((aligned(16)));
> +
> +void foo (long i, long j, long k, long l)
> +{
> +  a[6] = i;
> +  a[7] = j;
> +  a[8] = k;
> +  a[9] = l;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */
> +
> +/* PR80695: Verify cost model for vec_construct on POWER9.  */
> +
> +long a[10] __attribute__((aligned(16)));
> +
> +void foo (long i, long j, long k, long l)
> +{
> +  a[6] = i;
> +  a[7] = j;
> +  a[8] = k;
> +  a[9] = l;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]