This is the mail archive of the
mailing list for the GCC project.
Re: RFC: Patch to implement Aarch64 SIMD ABI
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Steve Ellcey <sellcey at cavium dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, "richard.earnshaw" <richard dot earnshaw at arm dot com>, "james.greenhalgh" <james dot greenhalgh at arm dot com>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>
- Date: Thu, 19 Jul 2018 08:31:36 +0100
- Subject: Re: RFC: Patch to implement Aarch64 SIMD ABI
- References: <firstname.lastname@example.org>
Thanks for doing this.
Steve Ellcey <email@example.com> writes:
> This is a patch to support the Aarch64 SIMD ABI  in GCC. I intend
> to eventually follow this up with two more patches; one to define the
> TARGET_SIMD_CLONE* macros and one to improve the GCC register
> allocation/usage when calling SIMD functions.
> The significant difference between the standard ARM ABI and the SIMD ABI
> is that in the normal ABI a callee saves only the lower 64 bits of registers
> V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
> This patch checks for SIMD functions and saves the extra registers when
> needed. It does not change the caller behavour, so with just this patch
> there may be values saved by both the caller and callee. This is not
> efficient, but it is correct code.
> This patch bootstraps and passes the GCC testsuite but that only verifies
> I haven't broken anything, it doesn't validate the handling of SIMD functions.
> I tried to write some tests, but I could never get GCC to generate code
> that would save the FP callee-save registers in the prologue. Complex code
> might generate spills and fills but it never triggered the prologue/epilogue
> code to save V8-V23. If anyone has ideas on how to write a test that would
> cause GCC to generate this code I would appreciate some ideas. Just doing
> lots of calculations with lots of intermediate values doesn't seem to be enough.
Probably easiest to use asm clobbers, e.g.:
void __attribute__ ((aarch64_vector_pcs))
asm volatile ("" ::: "s8", "s13");
This also lets you control exactly which registers are saved.
> @@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
> /* If there is an alignment gap between integer and fp callee-saves,
> allocate the last fp register to it if possible. */
> - if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
> + if (regno == last_fp_reg && has_align_gap
> + && !simd_function && (offset & 8) == 0)
> cfun->machine->frame.reg_offset[regno] = max_int_offset;
> @@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
> else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
> && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
> cfun->machine->frame.wb_candidate2 = regno;
> - offset += UNITS_PER_WORD;
> + offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
> offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
> @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap components, bool prologue_p)
> while (regno != last_regno)
> /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
> - so DFmode for the vector registers is enough. */
> - machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
> + so DFmode for the vector registers is enough. For simd functions
> + we want to save the entire register. */
> + machine_mode mode = GP_REGNUM_P (regno) ? E_DImode
> + : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode);
This condition also occurs in aarch64_push_regs and aarch64_pop_regs.
It'd probably be worth splitting it out into a subfunction.
I think you also need to handle the writeback cases, which should work
for Q registers too. This will mean extra loadwb_pair and storewb_pair
LGTM otherwise FWIW.
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index f284e74..d11474e 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -500,6 +500,8 @@ extern unsigned aarch64_architecture_version;
> #define PR_LO_REGNUM_P(REGNO)\
> (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
> +#define FP_SIMD_SAVED_REGNUM_P(REGNO) \
> + (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM))
(We should probably rewrite these to use IN_RANGE at some point,
but I agree it's better to be consistent until then.)