This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AArch64] [SVE] PR88837 - Poor vector construction code in VL-specific mode


Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi,
> The attached patch tries to improve initialization for fixed-length
> SVE vector and it's algorithm is described in comments for
> aarch64_sve_expand_vector_init() in the patch, with help from Richard
> Sandiford. I verified tests added in the patch pass with qemu and am
> trying to run bootstrap+test on patch in qemu.
> Does the patch look OK ?
>
> Thanks,
> Prathamesh
>
> 2019-05-27  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> 	    Richard Sandiford  <richard.sandiford@arm.com>

Although we iterated on ideas for the patch a bit, I didn't write
any of it, so the changelog should just have your name.

> [...]
> @@ -3207,3 +3207,15 @@
>      DONE;
>    }
>  )
> +
> +;; Standard pattern name vec_init<mode><Vel>.
> +
> +(define_expand "vec_init<mode><Vel>"

The rest of the file doesn't have blank lines after the comment.

> +/* Subroutine of aarch64_sve_expand_vector_init for handling
> +   trailing constants.
> +   This function works as follows:
> +   (a) Create a new vector consisting of trailing constants.
> +   (b) Initialize TARGET with the constant vector using emit_move_insn.
> +   (c) Insert remaining elements in TARGET using insr.
> +   NELTS is the total number of elements in original vector while
> +

truncated sentence, guess the rest would have been:

   NELTS_REQD is the number of elements that are actually significant.

or something.

> +   ??? The heuristic used is to do above only if number of constants
> +   is at least half the total number of elements.  May need fine tuning.  */
> +
> +static bool
> +aarch64_sve_expand_vector_init_handle_trailing_constants
> + (rtx target, const rtx_vector_builder &builder, int nelts, int nelts_reqd)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  scalar_mode elem_mode = GET_MODE_INNER (mode);
> +  int n_trailing_constants = 0;
> +
> +  for (int i = nelts_reqd - 1;
> +       i >= 0 && aarch64_legitimate_constant_p (elem_mode, builder.elt (i));
> +       i--)
> +    n_trailing_constants++;
> +
> +  if (n_trailing_constants >= nelts_reqd / 2)
> +    {
> +      rtx_vector_builder v (mode, 1, nelts);
> +      for (int i = 0; i < nelts; i++)
> +	v.quick_push (builder.elt (i + nelts_reqd - n_trailing_constants));
> +      rtx const_vec = v.build ();
> +      emit_move_insn (target, const_vec);
> +
> +      for (int i = nelts_reqd - n_trailing_constants - 1; i >= 0; i--)
> +	emit_insr (target, builder.elt (i));
> +
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Subroutine of aarch64_sve_expand_vector_init.
> +   Works as follows:
> +   (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
> +   (b) Skip trailing elements from BUILDER, which are same as

s/are same/are the same/

> +       element NELTS_REQD - 1.
> +   (c) Insert earlier elements in reverse order in TARGET using insr.  */
> +
> +static void
> +aarch64_sve_expand_vector_init_insert_elems (rtx target,
> +					     const rtx_vector_builder &builder,
> +					     int nelts_reqd)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  scalar_mode elem_mode = GET_MODE_INNER (mode);
> +
> +  struct expand_operand ops[2];
> +  enum insn_code icode = optab_handler (vec_duplicate_optab, mode);
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], builder.elt (nelts_reqd - 1), elem_mode);
> +  expand_insn (icode, 2, ops);
> +
> +  int ndups = builder.count_dups (nelts_reqd - 1, -1, -1);
> +  for (int i = nelts_reqd - ndups - 1; i >= 0; i--)
> +    emit_insr (target, builder.elt (i));
> +}
> +
> +/* Subroutine of aarch64_sve_expand_vector_init to handle case
> +   when all trailing elements of builder are same.
> +   This works as follows:
> +   (a) Using expand_insn interface to broadcast last vector element in TARGET.

s/Using/Use/

> +   (b) Insert remaining elements in TARGET using insr.
> +
> +   ??? The heuristic used is to do above if number of same trailing elements
> +   is at least 3/4 of total number of elements, loosely based on
> +   heuristic from mostly_zeros_p. May need fine-tuning.  */

Should be two spaces before "May".

> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_1.c b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
> new file mode 100644
> index 00000000000..c51876947fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */

These tests shouldn't require -ftree-vectorize.

> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
> new file mode 100644
> index 00000000000..d9640e42ddd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
> @@ -0,0 +1,21 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */

No need for --save-temps in the run tests.

> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_11.c b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
> new file mode 100644
> index 00000000000..b90895df436
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
> +
> +/* Case 5.5: Interleaved repeating elements and trailing same elements.  */
> +
> +#include <stdint.h>
> +
> +typedef int32_t vnx4si __attribute__((vector_size (32)));
> +
> +vnx4si foo(int a, int b, int f) 
> +{
> +  return (vnx4si) { a, f, b, f, b, f, b, f };
> +}
> +

This is missing __attribute__ ((noipa)).  Same for some other tests.

Thanks,
Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]