This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[committed][AArch64] Add support for the SVE PCS


The AAPCS64 specifies that if a function takes arguments in SVE
registers or returns them in SVE registers, it must preserve all
of Z8-Z23 and all of P4-P15.  (Normal functions only preserve the
low 64 bits of Z8-Z15 and clobber all of the predicate registers.)
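
As a rough illustration of the vector-register half of that rule, the
sketch below models how many low bits of each Zn/Vn register a callee
must preserve under the three conventions mentioned in this message.
The enum and function names are made up for the example and are not
GCC identifiers; the 128-bit figure for aarch64_vector_pcs reflects my
understanding that it preserves the full V8-V23.

```c
#include <assert.h>

/* Illustrative model only: these names are not GCC identifiers.  */
enum pcs { PCS_BASE, PCS_VECTOR, PCS_SVE };

/* Return the number of low bits of vector register REGNO (0-31) that
   a callee must preserve.  VL_BITS is the SVE vector length in bits.  */
static unsigned
preserved_vector_bits (enum pcs pcs, unsigned regno, unsigned vl_bits)
{
  assert (regno < 32);
  switch (pcs)
    {
    case PCS_BASE:
      /* Base AAPCS64: only the low 64 bits of V8-V15.  */
      return regno >= 8 && regno <= 15 ? 64 : 0;
    case PCS_VECTOR:
      /* aarch64_vector_pcs: all 128 bits of V8-V23 (assumption
	 stated in the text above).  */
      return regno >= 8 && regno <= 23 ? 128 : 0;
    case PCS_SVE:
      /* SVE PCS: the whole of Z8-Z23, whatever the vector length.  */
      return regno >= 8 && regno <= 23 ? vl_bits : 0;
    }
  return 0;
}
```

For example, with 256-bit vectors Z8 has 64 preserved bits in a base
AAPCS64 callee and 256 in an SVE callee, while Z16 goes from 0 to 256.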

This variation is known informally as the "SVE PCS" and functions
that use it are known informally as "SVE functions".  SVE functions
are mutually interoperable with functions that follow the standard
AAPCS64 rules and those that use the aarch64_vector_pcs attribute.
(Note that it's an error to use the attribute for SVE functions.)

One complication -- although it's not really that complicated --
is that SVE registers need to be saved at a VL-dependent offset while
other registers need to be saved at a constant offset.  The easiest way
of handling this seemed to be to group the SVE registers together below
the hard frame pointer.  In common cases, the frame pointer is then
an easy-to-compute VL multiple above the stack pointer and a
constant amount below the incoming stack pointer.
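
To make the "VL multiple below, constant above" point concrete, here
is a small model of such a frame.  It assumes the simplest case, in
which the only thing below the hard frame pointer is a block of saved
Z registers (no predicates, locals or outgoing arguments).  The types
and names are hypothetical and much simpler than GCC's poly_int64.

```c
#include <assert.h>
#include <stdint.h>

/* A byte offset of the form CONSTANT + VL_MULTIPLE * VL, where VL is
   the SVE vector length in bytes.  Hypothetical type for this sketch.  */
struct vl_offset
{
  int64_t constant;
  int64_t vl_multiple;
};

/* Evaluate OFF for a concrete vector length of VL_BYTES bytes.  */
static int64_t
vl_offset_value (struct vl_offset off, int64_t vl_bytes)
{
  /* SVE vector lengths are multiples of 128 bits, up to 2048 bits.  */
  assert (vl_bytes >= 16 && vl_bytes <= 256 && vl_bytes % 16 == 0);
  return off.constant + off.vl_multiple * vl_bytes;
}

struct frame_model
{
  struct vl_offset sp_to_hard_fp;          /* FP - SP.  */
  struct vl_offset hard_fp_to_incoming_sp; /* Incoming SP - FP.  */
};

/* Model a frame with NUM_Z_SAVES full-vector saves below the hard
   frame pointer and NON_SVE_SAVE_BYTES of ordinary saves above it.  */
static struct frame_model
model_frame (unsigned num_z_saves, int64_t non_sve_save_bytes)
{
  struct frame_model f;
  f.sp_to_hard_fp.constant = 0;
  f.sp_to_hard_fp.vl_multiple = num_z_saves;
  f.hard_fp_to_incoming_sp.constant = non_sve_save_bytes;
  f.hard_fp_to_incoming_sp.vl_multiple = 0;
  return f;
}
```

With four Z saves and a 16-byte FP/LR pair, the frame pointer sits
4 * VL bytes above SP (128 bytes for 256-bit vectors) and always
16 bytes below the incoming SP.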

A bigger complication is that, because the base AAPCS64 specifies that
only the low 64 bits of V8-V15 are preserved by calls, the associated
DWARF frame registers are also treated as 64 bits by the unwinder.
The 64 bits must also have the same layout as they would for a base
AAPCS64 function, otherwise unwinding won't work correctly.  (This is
actually a problem for the existing aarch64_vector_pcs support too,
but I'll fix that separately.)

This falls out naturally for little-endian targets but not for
big-endian targets.  The easiest way of meeting the requirement for them
was to use ST1D and LD1D to save and restore Z8-Z15, which also has the
nice property of storing the 64 bits at the start of the slot.  However,
using ST1D and LD1D requires a spare predicate register, and since all
of P0-P7 are either argument registers or call-preserved, we may need
to spill P4 in order to save the vector registers, even if P4 wouldn't
need to be saved otherwise.
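
One possible selection policy for that spare predicate is sketched
below.  It is only an illustration of the trade-off described above,
not the search that the patch performs: the function name, the bitmask
parameters and the order of preference are all assumptions made for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of the hypothetical search.  */
struct spare_pred
{
  unsigned regno;        /* Chosen predicate register, 0-7.  */
  bool needs_extra_save; /* True if it must be spilled just for this.  */
};

/* BUSY_ARG_PREDS has bit N set if PN (N in 0-3) carries an argument
   or the return value.  SAVED_PREDS has bit N set if PN is already
   being saved by the function for other reasons.  */
static struct spare_pred
choose_spare_predicate (uint16_t busy_arg_preds, uint16_t saved_preds)
{
  /* An argument register that carries nothing is free to clobber.  */
  for (unsigned regno = 0; regno <= 3; regno++)
    if (!(busy_arg_preds & (1u << regno)))
      return (struct spare_pred) { regno, false };

  /* A call-preserved P4-P7 that is saved anyway can be reused once
     its own save has been done.  */
  for (unsigned regno = 4; regno <= 7; regno++)
    if (saved_preds & (1u << regno))
      return (struct spare_pred) { regno, false };

  /* Otherwise fall back to P4 and spill it specially.  */
  return (struct spare_pred) { 4, true };
}
```

The last case is the one the paragraph above warns about: all of P0-P3
are occupied and no P4-P7 register is saved for other reasons, so P4
is saved purely so that ST1D and LD1D have a governing predicate.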

Since Z16-Z23 are fully clobbered by base AAPCS64 functions, we don't
need to emit frame information for them at all.  This avoids having
to decide whether the registers should be treated as having 64 bits
(as for Z8-Z15), 128 bits (for Advanced SIMD) or the full SVE width.
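
Put as code, the rule amounts to "only describe a vector save to the
unwinder if a base AAPCS64 function would have preserved part of that
register".  The helper names below are invented for this sketch; the
changelog's aarch64_emit_cfi_for_reg_p plays this role in the patch,
but this is not its implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if a base AAPCS64 function clobbers every bit of
   vector register REGNO (0-31).  Only V8-V15 keep their low 64 bits.  */
static bool
base_abi_fully_clobbers_z (unsigned regno)
{
  assert (regno < 32);
  return regno < 8 || regno > 15;
}

/* Return true if the save of vector register REGNO in an SVE function
   should get frame (CFI) information.  Hypothetical helper.  */
static bool
emit_cfi_for_z_save (unsigned regno)
{
  return !base_abi_fully_clobbers_z (regno);
}
```

So an SVE function that saves Z8-Z23 would emit frame information for
the Z8-Z15 saves only, each treated as a 64-bit slot.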

There are two ways of dealing with stack-clash protection when
saving SVE registers:

(1) If the area between the hard frame pointer and the incoming stack
    pointer is allocated via a store with writeback (callee_adjust != 0),
    the SVE save area is allocated separately and becomes the "initial"
    allocation as far as stack-clash protection goes.  In this case
    the store with writeback acts as a probe at the hard frame pointer
    position.

(2) If the area between the hard frame pointer and the incoming stack
    pointer is allocated via aarch64_allocate_and_probe_stack_space,
    the SVE save area is added to this initial allocation, so that the
    SP ends up pointing at the SVE register saves.  It's then necessary
    to use a temporary base register to save the non-SVE registers.
    Setting up this temporary register requires a single instruction
    only and so should be more efficient than doing two allocations
    and probes.

When SVE registers need to be saved, saving them below the frame pointer
makes it harder to rely on the LR save as a stack probe, since the LR
register's offset won't usually be a compile-time constant.  The patch
copes with that by using the lowest SVE register save as a stack probe
too, and thus prevents the save from being shrink-wrapped if stack clash
protection is enabled.

The changelog describes the low-level details.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r277564.

Richard


2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* calls.c (pass_by_reference): Leave the target to decide whether
	POLY_INT_CST-sized arguments should be passed by value or reference,
	rather than forcing them to be passed by reference.
	(must_pass_in_stack_var_size): Likewise.
	* config/aarch64/aarch64.md (LAST_SAVED_REGNUM): Redefine from
	V31_REGNUM to P15_REGNUM.
	* config/aarch64/aarch64-protos.h (aarch64_init_cumulative_args):
	Take an extra "silent_p" parameter, defaulting to false.
	(aarch64_sve::svbool_type_p): Declare.
	(aarch64_sve::nvectors_if_data_type): Likewise.
	* config/aarch64/aarch64.h (NUM_PR_ARG_REGS): New macro.
	(aarch64_frame::reg_offset): Turn into poly_int64s.
	(aarch64_frame::save_regs_size): Likewise.
	(aarch64_frame::below_hard_fp_saved_regs_size): New field.
	(aarch64_frame::sve_callee_adjust): Likewise.
	(aarch64_frame::spare_pred_reg): Likewise.
	(ARM_PCS_SVE): New arm_pcs value.
	(CUMULATIVE_ARGS::aapcs_nprn): New field.
	(CUMULATIVE_ARGS::aapcs_nextnprn): Likewise.
	(CUMULATIVE_ARGS::silent_p): Likewise.
	(BITS_PER_SVE_PRED): New macro.
	* config/aarch64/aarch64.c (handle_aarch64_vector_pcs_attribute): New
	function.  Reject aarch64_vector_pcs attributes on SVE functions.
	(aarch64_attribute_table): Use the above handler.
	(aarch64_sve_abi): New function.
	(aarch64_sve_argument_p): Likewise.
	(aarch64_returns_value_in_sve_regs_p): Likewise.
	(aarch64_takes_arguments_in_sve_regs_p): Likewise.
	(aarch64_fntype_abi): Check for SVE functions and return the SVE PCS
	descriptor for them.
	(aarch64_simd_decl_p): Delete.
	(aarch64_emit_cfi_for_reg_p): New function.
	(aarch64_reg_save_mode): Remove the fndecl argument and instead use
	crtl->abi to choose the mode for FP registers.  Handle the SVE PCS.
	(aarch64_hard_regno_call_part_clobbered): Do not treat FP registers
	as partly clobbered for the SVE PCS.
	(aarch64_function_ok_for_sibcall): Check whether the two functions
	use the same ABI, rather than checking specifically for whether
	they're aarch64_vector_pcs functions.
	(aarch64_pass_by_reference): Raise an error for attempts to pass
	SVE arguments when SVE is disabled.  Pass SVE arguments by reference
	if there are not enough free registers left, or if the argument is
	variadic.
	(aarch64_function_value): Handle SVE predicates, vectors and tuples.
	(aarch64_return_in_memory): Do not return SVE predicates, vectors and
	tuples in memory.
	(aarch64_layout_arg): Take a function_arg_info rather than
	individual properties.  Handle SVE predicates, vectors and tuples.
	Raise an error if they are passed to unprototyped functions.
	(aarch64_function_arg): If the silent_p flag is set, suppress the
	usual error about using float registers without TARGET_FLOAT.
	(aarch64_init_cumulative_args): Take a silent_p parameter and store
	it in the cumulative_args structure.  Initialize aapcs_nprn and
	aapcs_nextnprn.  If the silent_p flag is set, suppress the usual
	error about using float registers without TARGET_FLOAT.
	If the silent_p flag is not set, also raise an error about
	using SVE functions when SVE is disabled.
	(aarch64_function_arg_advance): Update the call to aarch64_layout_arg,
	and call it for SVE functions too.  Update aapcs_nprn similarly
	to the other register counts.
	(aarch64_layout_frame): If a big-endian function needs to save
	and restore Z8-Z15, search for a spare predicate that it can use.
	Store SVE predicates at the bottom of the register save area,
	followed by SVE vectors, then followed by the normal slots.
	Keep pointing the hard frame pointer at the base of the normal slots,
	above the SVE vectors.  Update the various frame creation and
	tear-down strategies for the new layout, initializing the new
	sve_callee_adjust field.  Add an additional layout for frames
	whose saved registers are all SVE registers.
	(aarch64_register_saved_on_entry): Cope with poly_int64 reg_offsets.
	(aarch64_return_address_signing_enabled): Likewise.
	(aarch64_push_regs, aarch64_pop_regs): Update calls to
	aarch64_reg_save_mode.
	(aarch64_adjust_sve_callee_save_base): New function.
	(aarch64_add_cfa_expression): Move earlier in file.  Take the
	saved register as an rtx rather than a register number and use
	its mode for the MEM slot.
	(aarch64_save_callee_saves): Remove the mode argument and instead
	use aarch64_reg_save_mode to get the mode of each save slot.
	Add a hard_fp_valid_p parameter.  Cope with poly_int64 register
	offsets.  Allow GP registers to be saved at a VL-based offset from
	the stack, handling this case using the frame pointer if available
	or a temporary register otherwise.  Use ST1D to save Z8-Z15 for
	big-endian SVE functions; use normal moves for other SVE saves.
	Only mark the save as frame-related if aarch64_emit_cfi_for_reg_p
	returns true.  Add explicit CFA notes when not storing via the
	stack pointer.  Do not try to pair SVE saves.
	(aarch64_restore_callee_saves): Cope with poly_int64 register
	offsets.  Use LD1D to restore Z8-Z15 for big-endian SVE functions;
	use normal moves for other SVE restores.  Only add CFA restore notes
	if aarch64_emit_cfi_for_reg_p returns true.  Do not try to pair
	SVE restores.
	(aarch64_get_separate_components): Always keep the first SVE save
	in the prologue if we need to use it as a stack probe.  Don't allow
	Z8-Z15 saves and loads to be shrink-wrapped for big-endian targets.
	Likewise the spare predicate register that they need.  Update the
	offset calculation to account for the SVE save area.  Use the
	appropriate range check for SVE LDR and STR instructions.
	(aarch64_components_for_bb): Cope with poly_int64 reg_offsets.
	(aarch64_process_components): Likewise.  Update the offset
	calculation to account for the SVE save area.  Only mark the
	save as frame-related if aarch64_emit_cfi_for_reg_p returns true.
	Do not try to pair SVE saves.
	(aarch64_allocate_and_probe_stack_space): Cope with poly_int64
	reg_offsets.  When handling the final allocation, expect the
	first SVE register save to be part of the initial allocation
	and for it to act as a probe at SP.  Account for the SVE callee
	save area in the dump information.
	(aarch64_expand_prologue): Update the frame diagram.  Fold the
	SVE callee allocation into the initial allocation if stack clash
	protection is enabled.  Use new variables to track the offset
	of the frame chain (and hard frame pointer) from the current
	stack pointer, and likewise the offset of the bottom of the
	register save area.  Update calls to aarch64_save_callee_saves
	and aarch64_add_cfa_expression.  Apply sve_callee_adjust before
	saving the FP&SIMD registers.  Save the predicate registers.
	(aarch64_expand_epilogue): Take below_hard_fp_saved_regs_size
	into account when setting the stack pointer from the frame pointer,
	and when deciding whether we can inherit the initial adjustment
	amount from the prologue.  Restore the predicate registers after
	the vector registers, then apply sve_callee_adjust, then restore
	the general registers.
	(aarch64_secondary_reload): Don't use secondary SVE reloads
	for VNx16BImode.
	(aapcs_vfp_sub_candidate): Assert that the type is not an SVE type.
	(aarch64_short_vector_p): Return false for SVE types.
	(aarch64_vfp_is_call_or_return_candidate): Initialize *is_ha
	at the start of the function.  Return false for SVE types.
	(aarch64_asm_output_variant_pcs): Output .variant_pcs for SVE
	functions too.
	(TARGET_STRICT_ARGUMENT_NAMING): Redefine to request strict naming.
	* config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_le): Extend
	to big-endian targets for bytewise moves.
	(*aarch64_sve_mov<mode>_be): Exclude the bytewise case.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp: New file.
	* gcc.target/aarch64/sve/pcs/annotate_1.c: New test.
	* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_10.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_11_nosc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_11_sc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_9.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_be_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_be_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_le_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_le_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_5_be.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_5_le.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/unprototyped_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_3_nosc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_3_sc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/vpcs_1.c: Likewise.
	* g++.target/aarch64/sve/catch_7.C: Likewise.

Attachment: sve-pcs.patch.bz2
Description: Binary data

