[PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
Richard Sandiford
richard.sandiford@arm.com
Mon Jul 13 16:12:37 GMT 2020
Hi,
Sorry for the slow review.
Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
> Hi,
>
> Introduce simple peephole2 optimization which substitutes a sequence of
> four consecutive load or store (LDR, STR) instructions with two load or
> store pair (LDP, STP) instructions for 2 element supported vector modes
> (V2SI, V2SF, V2DI, and V2DF).
> Generated load / store pair instruction offset is adjusted accordingly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Example:
> $ cat stp_vec_v2sf.c
> typedef float __attribute__((vector_size(8))) vec;
>
> void
> store_adjusted(vec *out, vec x, vec y)
> {
> out[400] = x;
> out[401] = y;
> out[402] = y;
> out[403] = x;
> }
>
> Example compiled with:
> $ ./aarch64-none-linux-gnu-gcc -S -O2 stp_vec_v2sf.c -dp
>
> Before the patch:
>
> store_adjusted:
> str d0, [x0, 3200] // 9 [c=4 l=4] *aarch64_simd_movv2si/2
> str d1, [x0, 3208] // 11 [c=4 l=4] *aarch64_simd_movv2si/2
> str d1, [x0, 3216] // 13 [c=4 l=4] *aarch64_simd_movv2si/2
> str d0, [x0, 3224] // 15 [c=4 l=4] *aarch64_simd_movv2si/2
> ret // 26 [c=0 l=4] *do_return
>
> After the patch:
>
> store_adjusted:
> add x1, x0, 3200 // 27 [c=4 l=4] *adddi3_aarch64/0
> stp d0, d1, [x1] // 28 [c=0 l=4] vec_store_pairv2siv2si
> stp d1, d0, [x1, 16] // 29 [c=0 l=4] vec_store_pairv2siv2si
> ret // 22 [c=0 l=4] *do_return
>
>
> OK for master ?
>
> kind regards,
> Przemyslaw
Thanks for doing this, looks good.
My only real comment is that I wonder if we really need the nunits
parameter, or if we should instead change the scalar_mode to a general
machine_mode and just pass the vector mode to that.
Passing nunits means that we don't need a to_constant here:
> @@ -21970,7 +21973,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
> for (int i = 0; i < num_insns; i++)
> offvals[i] = INTVAL (offset[i]);
>
> - msize = GET_MODE_SIZE (mode);
> + msize = GET_MODE_SIZE (mode) * nunits;
>
> /* Check if the offsets can be put in the right order to do a ldp/stp. */
> qsort (offvals, num_insns, sizeof (HOST_WIDE_INT),
but I think adding to_constant is fine in this context, since it's
inherently non-SVE code.
That would also avoid having to recalculate the mode:
> @@ -22114,8 +22118,11 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
> replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8],
> new_off_3 + msize), true);
>
> - if (!aarch64_mem_pair_operand (mem_1, mode)
> - || !aarch64_mem_pair_operand (mem_3, mode))
> + /* If nunits > 1 we are adjusting for vector mode. In this case we should
> + generate mode for vector built from nunits and scalar_mode provided. */
> + mem_mode = (nunits == 1) ? mode : mode_for_vector(mode, nunits).else_void();
> + if (!aarch64_mem_pair_operand (mem_1, mem_mode)
> + || !aarch64_mem_pair_operand (mem_3, mem_mode))
> return false;
>
> if (code == ZERO_EXTEND)
…here.
One other (very) minor thing is that some of the lines were over the
80 character limit, but removing the nunits parameter might fix that :-)
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f46dea1f748a094509ecfa0292a7c54e94164c9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef float __attribute__((vector_size(8))) vec;
> +
> +vec
> +load_long(vec *v) {
> + return v[110] + v[111] + v[112] + v[113];
> +}
> +
> +/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 880" } } */
> +/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
FWIW, it's possible to avoid many of these backslashes by quoting the
regexp with {…} rather than "…". E.g.:
/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
The above is fine too though.
Thanks,
Richard
More information about the Gcc-patches
mailing list