This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
On 12/11/2017 08:44 AM, James Greenhalgh wrote:
> Hi,
>
> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.
>
> This duplication can be clunky and wasteful.
>
> Better would be to use the fact that we will always be overwriting
> the remaining bits, and simply move the first element to the correct
> place (implicitly zeroing all other bits).
>
> This reduces the code generated for this case, and can allow more
> efficient addressing modes and other second-order benefits for AArch64
> code which has been vectorized to V2DI mode.
>
> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
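
A hypothetical sketch of the kind of code in question (the patch's real
testcase is gcc.target/aarch64/vect-slp-dup.c; the function name and shape
here are only illustrative): SLP-vectorizing this loop builds a two-element
vector from two different scalars, the case whose initialisation the patch
improves.

```c
/* Illustrative only: each iteration stores a two-element group built
   from two distinct scalars, so the vectorizer initialises a V2DI-style
   vector with different values in each lane.  */
void
init_two_lanes (long *restrict out, long a, long b, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = a;      /* lane 0 of the two-element group */
      out[2 * i + 1] = b;  /* lane 1 of the two-element group */
    }
}
```

Before the patch, such an initialisation would splat `a` to both lanes and
then overwrite lane 1 with `b`; afterwards, `a` can be moved directly into
lane 0.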
>
> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see
> vec_merge and vec_duplicate operations for their simplifications to
> apply. To fix this,
> add a special case to the AArch64 code if we are loading from two memory
> addresses, and use the load_pair_lanes patterns directly.
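
For contrast, a sketch of the two-loads situation those tests cover (again
illustrative, not the tests' actual contents): when both lanes come from
adjacent memory, the special case lets the back end emit a paired load via
the load_pair_lanes patterns rather than a splat-and-insert sequence.

```c
/* Illustrative only: both lanes of each two-element group are loaded
   from memory, the situation the load_pair_lanes patterns handle.  */
void
init_from_memory (long *restrict out, const long *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = in[0];     /* lane 0 loaded from memory */
      out[2 * i + 1] = in[1]; /* lane 1 loaded from adjacent memory */
    }
}
```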
>
> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation, to
> catch:
>
> (vec_merge:OUTER
>   (vec_duplicate:OUTER x:INNER)
>   (subreg:OUTER y:INNER 0)
>   (const_int N))
>
> And simplify it to:
>
> (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>
> This is similar to the existing patterns which are tested in this function,
> without requiring the second operand to also be a vec_duplicate.
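
The lane selection behind this simplification can be modelled in plain C
(a toy model, not GCC internals; all names here are illustrative). The
vec_merge rule is that bit i of the mask N selects lane i from the first
operand when set, and from the second when clear; lane 1 of the paradoxical
subreg is undefined in RTL, marked with a sentinel below.

```c
#include <assert.h>

/* Sentinel standing in for an RTL-undefined lane.  */
#define UNDEF 0x5eadbeefL

typedef struct { long lane[2]; } v2;

static v2 vec_duplicate (long x)      { return (v2) {{ x, x }}; }
static v2 paradoxical_subreg (long y) { return (v2) {{ y, UNDEF }}; }
static v2 vec_concat (long x, long y) { return (v2) {{ x, y }}; }

/* Bit i of n set: lane i from a; clear: lane i from b.  */
static v2 vec_merge (v2 a, v2 b, int n)
{
  v2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (n & (1 << i)) ? a.lane[i] : b.lane[i];
  return r;
}
```

With N == 2 the result is exactly vec_concat (y, x). With N == 1 the result
is { x, <undefined> }, so rewriting it as vec_concat (x, y) merely gives the
undefined lane a definite value, a legal refinement. N == 0 and N == 3 take
every lane from a single operand, so they never produce a concat.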
>
> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.
>
> Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.
>
> Are the non-AArch64 parts OK?
>
> Thanks,
> James
>
> ---
> 2017-12-11 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
> generation for cases where splatting a value is not useful.
> * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
> across a vec_duplicate and a paradoxical subreg forming a vector
> mode to a vec_concat.
>
> 2017-12-11 James Greenhalgh <james.greenhalgh@arm.com>
>
> * gcc.target/aarch64/vect-slp-dup.c: New.
>
>
> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
>
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 806c309..ed16f70 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
> return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
> }
>
> + /* Replace:
> +
> +     (vec_merge:outer (vec_duplicate:outer x:inner)
> +                      (subreg:outer y:inner 0)
> +                      (const_int N))
> +
> + with (vec_concat:outer x:inner y:inner) if N == 1,
> + or (vec_concat:outer y:inner x:inner) if N == 2.
> +
> + Implicitly, this means we have a paradoxical subreg, but such
> + a check is cheap, so make it anyway.
I'm going to assume that N == 0 and N == 3 are handled elsewhere and do
not show up here in practice.
So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
show up in the opposite order? Or is there some canonicalization that
prevents that?
simplify-rtx bits are OK as-is if we're certain we're not going to get
the alternate ordering of the VEC_MERGE operands. Also OK if you either
generalize this chunk of code or duplicate & twiddle it to handle the
alternate order.
I didn't look at the aarch64 specific bits.
jeff