This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Avoid vectorizing versioned copy loops with vectorization factor 2


Thanks!  This was committed to trunk last week as r247671.  As we discussed
offline, I've also backported to GCC 7 (r248010) and GCC 6 (r248011).

Bill

> On May 5, 2017, at 10:30 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi Bill,
> 
> On Wed, May 03, 2017 at 02:43:09PM -0500, Bill Schmidt wrote:
>> We recently became aware of some poor code generation as a result of
>> unprofitable (for POWER) loop vectorization.  When a loop is simply copying
>> data with 64-bit loads and stores, vectorizing with 128-bit loads and stores
>> generally does not provide any benefit on modern POWER processors.
>> Furthermore, if there is a requirement to version the loop for aliasing,
>> alignment, etc., the cost of the versioning test is almost certainly a
>> performance loss for such loops.  The user code example included such a copy
>> loop, executed only a few times on average, within an outer loop that was
>> executed many times on average, causing a tremendous slowdown.
>> 
>> This patch very specifically targets these kinds of loops and no others,
>> and artificially inflates the vectorization cost to ensure vectorization
>> does not appear profitable.  This is done within the target model cost
>> hooks to avoid affecting other targets.  A new test case is included that
>> demonstrates the refusal to vectorize.
>> 
>> We've done SPEC performance testing to verify that the patch does not
>> degrade such workloads.  Results were all in the noise range.  The
>> customer code performance loss was verified to have been reversed.
>> 
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
>> Is this ok for trunk?
> 
>> 2017-05-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> 
>> 	* config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var.
>> 	(rs6000_init_cost): Initialize rs6000_vect_nonmem.
>> 	(rs6000_add_stmt_cost): Update rs6000_vect_nonmem.
>> 	(rs6000_finish_cost): Avoid vectorizing simple copy loops with
>> 	VF=2 that require versioning.
>> 
>> [gcc/testsuite]
>> 
>> 2017-05-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> 
>> 	* gcc.target/powerpc/veresioned-copy-loop.c: New file.
> 
>> --- gcc/config/rs6000/rs6000.c	(revision 247560)
>> +++ gcc/config/rs6000/rs6000.c	(working copy)
>> @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data)
>> 
>> /* Implement targetm.vectorize.init_cost.  */
>> 
>> +static bool rs6000_vect_nonmem;
> 
> Please put a comment on this, saying what it is for.
> 
>> +      /* Check whether we're doing something other than just a copy loop.
>> +	 Not all such loops may be profitably vectorized; see
>> +	 rs6000_finish_cost.  */
>> +      if ((where == vect_body
>> +	   && (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm
>> +	       || kind == vec_promote_demote || kind == vec_construct
>> +	       || kind == scalar_to_vec))
>> +	  || (where != vect_body
>> +	      && (kind == vec_to_scalar || kind == vec_perm
>> +		  || kind == vec_promote_demote || kind == vec_construct
>> +		  || kind == scalar_to_vec)))
>> +	rs6000_vect_nonmem = true;
> 
> Perhaps
> 
> +      if ((kind == vec_to_scalar || kind == vec_perm
> +	       || kind == vec_promote_demote || kind == vec_construct
> +	       || kind == scalar_to_vec)
> +	   || (where == vect_body && kind == vector_stmt))
>> +	rs6000_vect_nonmem = true;
> 
> if you agree that is clearer.
> 
> Okay for trunk with the comment added, and the condition either or not
> simplified.  Thanks,
> 
> 
> Segher
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]