[Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

Richard Sandiford richard.sandiford@arm.com
Fri Dec 7 17:34:00 GMT 2018


Sorry for the slow review.

Steve Ellcey <sellcey@cavium.com> writes:
> @@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
>    return false;
>  }
>  
> +/* Return true if this is a definition of a vectorized simd function.  */
> +
> +static bool
> +aarch64_simd_decl_p (tree fndecl)
> +{
> +  tree fntype;
> +
> +  if (fndecl == NULL)
> +    return false;
> +  fntype = TREE_TYPE (fndecl);
> +  if (fntype == NULL)
> +    return false;
> +
> +  /* All functions with the aarch64_vector_pcs attribute use the simd ABI.  */
> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
> +    return true;
> +  /* Functions without the aarch64_vector_pcs or simd attribute never use the
> +     simd ABI.  */
> +  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
> +    return false;
> +  /* Functions with the simd attribute can generate three versions of a
> +     function, a masked vector function, an unmasked vector function,
> +     and a scalar version.  Only the vector versions use the simd ABI.  */
> +  return (VECTOR_TYPE_P (TREE_TYPE (fntype)));

Is this enough?  E.g.:

    void __attribute__ ((simd)) f (int *x) { *x = 1; }

generates SIMD clones but doesn't have a vector return type.

I'm not an expert on this stuff, but it looks like:

  struct cgraph_node *node = cgraph_node::get (fndecl);
  return node && node->simdclone;

might work.  But in some ways it would be cleaner to add the
aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
so that the function type is "correct".

It might be more efficient to save aarch64_simd_decl_p in cfun->machine.

> @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
>  	 mergeable with the current one into a pair.  */
>        if (!satisfies_constraint_Ump (mem)
>  	  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> +	  || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P (regno)))

Formatting nit: redundant brackets around FP_REGNUM_P (regno).

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 82af4d4..44261ee 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -724,7 +724,13 @@
>    ""
>  )
>  
> -(define_insn "simple_return"
> +(define_expand "simple_return"
> +  [(simple_return)]
> +  "aarch64_use_simple_return_insn_p ()"
> +  ""
> +)
> +
> +(define_insn "*simple_return"
>    [(simple_return)]
>    ""
>    "ret"

Can't you just change the condition on the existing define_insn,
without turning it in a define_expand?  Worth a comment explaining
why if not.

> @@ -1487,6 +1538,23 @@
>    [(set_attr "type" "neon_store1_2reg<q>")]
>  )
>  
> +(define_insn "storewb_pair<TX:mode>_<P:mode>"
> +  [(parallel
> +    [(set (match_operand:P 0 "register_operand" "=&k")
> +          (plus:P (match_operand:P 1 "register_operand" "0")
> +                  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> +     (set (mem:TX (plus:P (match_dup 0)
> +                  (match_dup 4)))

Should be indented under the (match_dup 0).

> +          (match_operand:TX 2 "register_operand" "w"))
> +     (set (mem:TX (plus:P (match_dup 0)
> +                  (match_operand:P 5 "const_int_operand" "n")))
> +          (match_operand:TX 3 "register_operand" "w"))])]

Think this last part should be:

     (set (mem:TX (plus:P (plus:P (match_dup 0)
				  (match_dup 4))
			  (match_operand:P 5 "const_int_operand" "n")))
	  (match_operand:TX 3 "register_operand" "w"))])]

> +  "TARGET_SIMD &&
> +   INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"

&& should be on the second line (which makes the line long enough to
need breaking).

> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> index e69de29..bf6e64a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +
> +void
> +f (void)
> +{
> +  /* Clobber all fp/simd regs and verify that the correct ones are saved
> +     and restored in the prologue and epilogue of a SIMD function. */
> +  __asm__ __volatile__ ("" :::  "q0",  "q1",  "q2",  "q3");
> +  __asm__ __volatile__ ("" :::  "q4",  "q5",  "q6",  "q7");
> +  __asm__ __volatile__ ("" :::  "q8",  "q9", "q10", "q11");
> +  __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
> +  __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
> +  __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
> +  __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
> +  __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
> +}
> +
> +/* { dg-final { scan-assembler {\sstp\td8, d9} } } */
> +/* { dg-final { scan-assembler {\sstp\td10, d11} } } */
> +/* { dg-final { scan-assembler {\sstp\td12, d13} } } */
> +/* { dg-final { scan-assembler {\sstp\td14, d15} } } */
> +/* { dg-final { scan-assembler {\sldp\td8, d9} } } */
> +/* { dg-final { scan-assembler {\sldp\td10, d11} } } */
> +/* { dg-final { scan-assembler {\sldp\td12, d13} } } */
> +/* { dg-final { scan-assembler {\sldp\td14, d15} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq[01234567]} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq[01234567]} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq1[6789]} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq1[6789]} } } */
> +/* { dg-final { scan-assembler-not {\sstr\t} } } */
> +/* { dg-final { scan-assembler-not {\sldr\t} } } */

Comment doesn't match code: this is testing a normal PCS function.

> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
> index e69de29..7d639a5e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +
> +void __attribute__ ((aarch64_vector_pcs))
> +f (void)
> +{
> +  /* Clobber some fp/simd regs and verify that only those are saved
> +     and restored in the prologue and epilogue of a SIMD function. */
> +  __asm__ __volatile__ ("" :::  "q8",  "q9", "q10", "q11");
> +}
> +
> +/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */
> +/* { dg-final { scan-assembler {\sstp\tq10, q11} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq1[23456789]} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq1[23456789]} } } */
> +/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
> +/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
> +/* { dg-final { scan-assembler-not {\sstp\td} } } */
> +/* { dg-final { scan-assembler-not {\sldp\td} } } */
> +/* { dg-final { scan-assembler-not {\sstr\t} } } */
> +/* { dg-final { scan-assembler-not {\sldr\t} } } */

This is a nice test, but I think it would also be good to have versions
that don't clobber full register pairs.  E.g. one without q9 and another
without q10 would test individual STR Qs.

LGTM otherwise.

Thanks,
Richard



More information about the Gcc-patches mailing list