This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
On 12/11/2017 08:44 AM, James Greenhalgh wrote:
> Hi,
>
> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.
>
> This duplication can be clunky and wasteful.
>
> Better would be to use the fact that we will always be overwriting
> the remaining bits, and simply move the first element to the correct
> place (implicitly zeroing all other bits).
>
> This reduces the code generated for this case, and can allow more
> efficient addressing modes and other second-order benefits for AArch64
> code which has been vectorized to V2DI mode.
>
> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
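
A hypothetical sketch of the kind of code in question (the patch's real
testcase is gcc.target/aarch64/vect-slp-dup.c; the function name and shape
here are only illustrative): SLP-vectorizing this loop builds a two-element
vector from two different scalars, the case whose initialisation the patch
improves.

```c
/* Illustrative only: each iteration stores a two-element group built
   from two distinct scalars, so the vectorizer initialises a V2DI-style
   vector with different values in each lane.  */
void
init_two_lanes (long *restrict out, long a, long b, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = a;      /* lane 0 of the two-element group */
      out[2 * i + 1] = b;  /* lane 1 of the two-element group */
    }
}
```

Before the patch, such an initialisation would splat `a` to both lanes and
then overwrite lane 1 with `b`; afterwards, `a` can be moved directly into
lane 0.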
>
> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see
> vec_merge and vec_duplicate operations for their simplifications to
> apply. To fix this,
> add a special case to the AArch64 code if we are loading from two memory
> addresses, and use the load_pair_lanes patterns directly.
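
For contrast, a sketch of the two-loads situation those tests cover (again
illustrative, not the tests' actual contents): when both lanes come from
adjacent memory, the special case lets the back end emit a paired load via
the load_pair_lanes patterns rather than a splat-and-insert sequence.

```c
/* Illustrative only: both lanes of each two-element group are loaded
   from memory, the situation the load_pair_lanes patterns handle.  */
void
init_from_memory (long *restrict out, const long *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = in[0];     /* lane 0 loaded from memory */
      out[2 * i + 1] = in[1]; /* lane 1 loaded from adjacent memory */
    }
}
```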
>
> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation, to
> catch:
>
> (vec_merge:OUTER
>   (vec_duplicate:OUTER x:INNER)
>   (subreg:OUTER y:INNER 0)
>   (const_int N))
>
> And simplify it to:
>
> (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>
> This is similar to the existing patterns which are tested in this function,
> without requiring the second operand to also be a vec_duplicate.
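
The lane selection behind this simplification can be modelled in plain C
(a toy model, not GCC internals; all names here are illustrative). The
vec_merge rule is that bit i of the mask N selects lane i from the first
operand when set, and from the second when clear; lane 1 of the paradoxical
subreg is undefined in RTL, marked with a sentinel below.

```c
#include <assert.h>

/* Sentinel standing in for an RTL-undefined lane.  */
#define UNDEF 0x5eadbeefL

typedef struct { long lane[2]; } v2;

static v2 vec_duplicate (long x)      { return (v2) {{ x, x }}; }
static v2 paradoxical_subreg (long y) { return (v2) {{ y, UNDEF }}; }
static v2 vec_concat (long x, long y) { return (v2) {{ x, y }}; }

/* Bit i of n set: lane i from a; clear: lane i from b.  */
static v2 vec_merge (v2 a, v2 b, int n)
{
  v2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (n & (1 << i)) ? a.lane[i] : b.lane[i];
  return r;
}
```

With N == 2 the result is exactly vec_concat (y, x). With N == 1 the result
is { x, <undefined> }, so rewriting it as vec_concat (x, y) merely gives the
undefined lane a definite value, a legal refinement. N == 0 and N == 3 take
every lane from a single operand, so they never produce a concat.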
>
> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.
>
> Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.
>
> Are the non-AArch64 parts OK?
>
> Thanks,
> James
>
> ---
> 2017-12-11 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
> generation for cases where splatting a value is not useful.
> * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
> across a vec_duplicate and a paradoxical subreg forming a vector
> mode to a vec_concat.
>
> 2017-12-11 James Greenhalgh <james.greenhalgh@arm.com>
>
> * gcc.target/aarch64/vect-slp-dup.c: New.
>
>
> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
>
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 806c309..ed16f70 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
> return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
> }
>
> + /* Replace:
> +
> +     (vec_merge:outer (vec_duplicate:outer x:inner)
> +                      (subreg:outer y:inner 0)
> +                      (const_int N))
> +
> + with (vec_concat:outer x:inner y:inner) if N == 1,
> + or (vec_concat:outer y:inner x:inner) if N == 2.
> +
> + Implicitly, this means we have a paradoxical subreg, but such
> + a check is cheap, so make it anyway.
I'm going to assume that N == 0 and N == 3 are handled elsewhere and do
not show up here in practice.
So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
show up in the opposite order? Or is there some canonicalization that
prevents that?
simplify-rtx bits are OK as-is if we're certain we're not going to get
the alternate ordering of the VEC_MERGE operands. Also OK if you either
generalize this chunk of code or duplicate & twiddle it to handle the
alternate order.
I didn't look at the aarch64 specific bits.
jeff