This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] [SVE] PR88837 - Poor vector construction code in VL-specific mode
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Prathamesh Kulkarni <prathamesh dot kulkarni at linaro dot org>
- Cc: gcc Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 29 May 2019 13:40:15 +0100
- Subject: Re: [AArch64] [SVE] PR88837 - Poor vector construction code in VL-specific mode
- References: <CAAgBjM=-CJO=OkkRArPUrDy5JpVAQyALLC0Dq909QJVL6GGDGQ@mail.gmail.com>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi,
> The attached patch tries to improve initialization for fixed-length
> SVE vector and it's algorithm is described in comments for
> aarch64_sve_expand_vector_init() in the patch, with help from Richard
> Sandiford. I verified tests added in the patch pass with qemu and am
> trying to run bootstrap+test on patch in qemu.
> Does the patch look OK ?
>
> Thanks,
> Prathamesh
>
> 2019-05-27 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Richard Sandiford <richard.sandiford@arm.com>
Although we iterated on ideas for the patch a bit, I didn't write
any of it, so the changelog should just have your name.
> [...]
> @@ -3207,3 +3207,15 @@
> DONE;
> }
> )
> +
> +;; Standard pattern name vec_init<mode><Vel>.
> +
> +(define_expand "vec_init<mode><Vel>"
The rest of the file doesn't have blank lines after the comment.
> +/* Subroutine of aarch64_sve_expand_vector_init for handling
> + trailing constants.
> + This function works as follows:
> + (a) Create a new vector consisting of trailing constants.
> + (b) Initialize TARGET with the constant vector using emit_move_insn.
> + (c) Insert remaining elements in TARGET using insr.
> + NELTS is the total number of elements in original vector while
> +
truncated sentence, guess the rest would have been:
NELTS_REQD is the number of elements that are actually significant.
or something.
> + ??? The heuristic used is to do above only if number of constants
> + is at least half the total number of elements. May need fine tuning. */
> +
> +static bool
> +aarch64_sve_expand_vector_init_handle_trailing_constants
> + (rtx target, const rtx_vector_builder &builder, int nelts, int nelts_reqd)
> +{
> + machine_mode mode = GET_MODE (target);
> + scalar_mode elem_mode = GET_MODE_INNER (mode);
> + int n_trailing_constants = 0;
> +
> + for (int i = nelts_reqd - 1;
> + i >= 0 && aarch64_legitimate_constant_p (elem_mode, builder.elt (i));
> + i--)
> + n_trailing_constants++;
> +
> + if (n_trailing_constants >= nelts_reqd / 2)
> + {
> + rtx_vector_builder v (mode, 1, nelts);
> + for (int i = 0; i < nelts; i++)
> + v.quick_push (builder.elt (i + nelts_reqd - n_trailing_constants));
> + rtx const_vec = v.build ();
> + emit_move_insn (target, const_vec);
> +
> + for (int i = nelts_reqd - n_trailing_constants - 1; i >= 0; i--)
> + emit_insr (target, builder.elt (i));
> +
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* Subroutine of aarch64_sve_expand_vector_init.
> + Works as follows:
> + (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
> + (b) Skip trailing elements from BUILDER, which are same as
s/are same/are the same/
> + element NELTS_REQD - 1.
> + (c) Insert earlier elements in reverse order in TARGET using insr. */
> +
> +static void
> +aarch64_sve_expand_vector_init_insert_elems (rtx target,
> + const rtx_vector_builder &builder,
> + int nelts_reqd)
> +{
> + machine_mode mode = GET_MODE (target);
> + scalar_mode elem_mode = GET_MODE_INNER (mode);
> +
> + struct expand_operand ops[2];
> + enum insn_code icode = optab_handler (vec_duplicate_optab, mode);
> + gcc_assert (icode != CODE_FOR_nothing);
> +
> + create_output_operand (&ops[0], target, mode);
> + create_input_operand (&ops[1], builder.elt (nelts_reqd - 1), elem_mode);
> + expand_insn (icode, 2, ops);
> +
> + int ndups = builder.count_dups (nelts_reqd - 1, -1, -1);
> + for (int i = nelts_reqd - ndups - 1; i >= 0; i--)
> + emit_insr (target, builder.elt (i));
> +}
> +
> +/* Subroutine of aarch64_sve_expand_vector_init to handle case
> + when all trailing elements of builder are same.
> + This works as follows:
> + (a) Using expand_insn interface to broadcast last vector element in TARGET.
s/Using/Use/
> + (b) Insert remaining elements in TARGET using insr.
> +
> + ??? The heuristic used is to do above if number of same trailing elements
> + is at least 3/4 of total number of elements, loosely based on
> + heuristic from mostly_zeros_p. May need fine-tuning. */
Should be two spaces before "May".
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_1.c b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
> new file mode 100644
> index 00000000000..c51876947fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
These tests shouldn't require -ftree-vectorize.
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
> new file mode 100644
> index 00000000000..d9640e42ddd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
> @@ -0,0 +1,21 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
No need for --save-temps in the run tests.
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_11.c b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
> new file mode 100644
> index 00000000000..b90895df436
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
> +
> +/* Case 5.5: Interleaved repeating elements and trailing same elements. */
> +
> +#include <stdint.h>
> +
> +typedef int32_t vnx4si __attribute__((vector_size (32)));
> +
> +vnx4si foo(int a, int b, int f)
> +{
> + return (vnx4si) { a, f, b, f, b, f, b, f };
> +}
> +
This is missing __attribute__ ((noipa)). Same for some other tests.
Thanks,
Richard