This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI
- From: Steve Ellcey <sellcey at marvell dot com>
- To: Richard Sandiford <richard dot sandiford at arm dot com>, Steve Ellcey <sellcey at marvell dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 11 Dec 2018 23:01:25 +0000
- Subject: Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI
- References: <1541699539.12016.6.camel@cavium.com> <87in05utpk.fsf@arm.com>
On Fri, 2018-12-07 at 17:34 +0000, Richard Sandiford wrote:
>
> I'm not an expert on this stuff, but it looks like:
>
> struct cgraph_node *node = cgraph_node::get (fndecl);
> return node && node->simdclone;
>
> might work. But in some ways it would be cleaner to add the
> aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
> so that the function type is "correct".
I have changed this to add the aarch64_vector_pcs attribute to clones.
To do this I had to tweak the targetm.simd_clone.adjust target
function, but I did not have to add a new target function. That change
is part of Patch 2/4 and I will resubmit that after this email. The
change in this patch is to just check for the aarch64_vector_pcs
attribute and not look at the simd attribute to determine the ABI.
>
> > @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap
> > components, bool prologue_p)
> > mergeable with the current one into a pair. */
> > if (!satisfies_constraint_Ump (mem)
> > || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> > + || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P
> > (regno)))
>
> Formatting nit: redundant brackets around FP_REGNUM_P (regno).
Fixed.
> >
> > -(define_insn "simple_return"
> > +(define_expand "simple_return"
> > + [(simple_return)]
> > + "aarch64_use_simple_return_insn_p ()"
> > + ""
> > +)
> > +
> > +(define_insn "*simple_return"
> > [(simple_return)]
> > ""
> > "ret"
>
> Can't you just change the condition on the existing define_insn,
> without turning it in a define_expand? Worth a comment explaining
> why if not.
Yes, I am not sure why I did it this way (ciopying some other target
probably) but I got rid of the define_expand and changed the
define_insn and things seem to work fine.
>
> > @@ -1487,6 +1538,23 @@
> > [(set_attr "type" "neon_store1_2reg<q>")]
> > )
> >
> > +(define_insn "storewb_pair<TX:mode>_<P:mode>"
> > + [(parallel
> > + [(set (match_operand:P 0 "register_operand" "=&k")
> > + (plus:P (match_operand:P 1 "register_operand" "0")
> > + (match_operand:P 4 "aarch64_mem_pair_offset"
> > "n")))
> > + (set (mem:TX (plus:P (match_dup 0)
> > + (match_dup 4)))
>
> Should be indented under the (match_dup 0).
Fixed.
>
> > + (match_operand:TX 2 "register_operand" "w"))
> > + (set (mem:TX (plus:P (match_dup 0)
> > + (match_operand:P 5 "const_int_operand" "n")))
> > + (match_operand:TX 3 "register_operand" "w"))])]
>
> Think this last part should be:
>
> (set (mem:TX (plus:P (plus:P (match_dup 0)
> (match_dup 4))
> (match_operand:P 5 "const_int_operand"
> "n")))
> (match_operand:TX 3 "register_operand" "w"))])]
I think you are right about this. What I have for
loadwb_pair<TX:mode>_<P:mode> matches what is there for
loadwb_pair<GPF:mode>_<P:mode>. If this one is wrong, then I assume
the others are wrong too? This won't make a practical difference since
we call these with gen_loadwb_pair*_* calls and not via pattern
recognition, but still they should be right. Should I change them
all? I did not change this as part of this patch.
>
> > + "TARGET_SIMD &&
> > + INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE
> > (<TX:MODE>mode)"
>
> && should be on the second line (which makes the line long enough to
> need breaking).
Fixed.
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> >
>
> Comment doesn't match code: this is testing a normal PCS function.
Fixed.
>
> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
>
> This is a nice test, but I think it would also be good to have versions
> that don't clobber full register pairs. E.g. one without q9 and another
> without q10 would test individual STR Qs.
I added two new tests for this, simd-abi-6.c and simd-abi-7.c.
Steve Ellcey
sellcey@marvell.com
ChangeLog:
2018-12-11 Steve Ellcey <sellcey@cavium.com>
* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_use_simple_return_insn_p): New function.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair<TX:mode>_<P:mode>): Ditto.
(storewb_pair<TX:mode>_<P:mode>): Ditto.
Testsuite ChangeLog:
2018-12-11 Steve Ellcey <sellcey@cavium.com>
* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-5.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-6.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-7.c: Ditto.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4ed886b..e64768a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -471,6 +471,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
bool aarch64_symbolic_address_p (rtx);
bool aarch64_uimm12_shift (HOST_WIDE_INT);
bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
const char *aarch64_mangle_builtin_type (const_tree);
const char *aarch64_output_casesi (rtx *);
@@ -556,6 +557,8 @@ void aarch64_split_simd_move (rtx, rtx);
/* Check for a legitimate floating point constant for FMOV. */
bool aarch64_float_const_representable_p (rtx);
+extern int aarch64_epilogue_uses (int);
+
#if defined (RTX_CODE)
void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea7e79f..41d2e81 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1141,6 +1141,15 @@ static const struct processor *selected_tune;
/* The current tuning set. */
struct tune_params aarch64_tune_params = generic_tunings;
+/* Table of machine attributes. */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+ /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+ affects_type_identity, handler, exclude } */
+ { "aarch64_vector_pcs", 0, 0, false, true, true, false, NULL, NULL },
+ { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
#define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
/* An ISA extension in the co-processor and main instruction set space. */
@@ -1523,6 +1532,39 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
return false;
}
+/* Return true if this is a definition of a vectorized simd function. */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+ tree fntype;
+
+ if (fndecl == NULL)
+ return false;
+ fntype = TREE_TYPE (fndecl);
+ if (fntype == NULL)
+ return false;
+
+ /* Functions with the aarch64_vector_pcs attribute use the simd ABI. */
+ if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
+ return true;
+
+ return false;
+}
+
+/* Return the mode a register save/restore should use. DImode for integer
+ registers, DFmode for FP registers in non-SIMD functions (they only save
+ the bottom half of a 128 bit register), or TFmode for FP registers in
+ SIMD functions. */
+
+static machine_mode
+aarch64_reg_save_mode (tree fndecl, unsigned regno)
+{
+ return GP_REGNUM_P (regno)
+ ? E_DImode
+ : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+}
+
/* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves
the lower 64 bits of a 128-bit register. Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits. */
@@ -3349,7 +3391,9 @@ static bool
aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
tree exp ATTRIBUTE_UNUSED)
{
- /* Currently, always true. */
+ if (aarch64_simd_decl_p (cfun->decl) != aarch64_simd_decl_p (decl))
+ return false;
+
return true;
}
@@ -4210,6 +4254,7 @@ aarch64_layout_frame (void)
{
HOST_WIDE_INT offset = 0;
int regno, last_fp_reg = INVALID_REGNUM;
+ bool simd_function = aarch64_simd_decl_p (cfun->decl);
cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();
@@ -4223,6 +4268,17 @@ aarch64_layout_frame (void)
cfun->machine->frame.wb_candidate1 = INVALID_REGNUM;
cfun->machine->frame.wb_candidate2 = INVALID_REGNUM;
+ /* If this is a non-leaf simd function with calls we assume that
+ at least one of those calls is to a non-simd function and thus
+ we must save V8 to V23 in the prologue. */
+
+ if (simd_function && !crtl->is_leaf)
+ {
+ for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ if (FP_SIMD_SAVED_REGNUM_P (regno))
+ df_set_regs_ever_live (regno, true);
+ }
+
/* First mark all the registers that really need to be saved... */
for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
cfun->machine->frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
@@ -4245,7 +4301,8 @@ aarch64_layout_frame (void)
for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
if (df_regs_ever_live_p (regno)
- && !call_used_regs[regno])
+ && (!call_used_regs[regno]
+ || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno))))
{
cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
last_fp_reg = regno;
@@ -4287,7 +4344,10 @@ aarch64_layout_frame (void)
{
/* If there is an alignment gap between integer and fp callee-saves,
allocate the last fp register to it if possible. */
- if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+ if (regno == last_fp_reg
+ && has_align_gap
+ && !simd_function
+ && (offset & 8) == 0)
{
cfun->machine->frame.reg_offset[regno] = max_int_offset;
break;
@@ -4299,7 +4359,7 @@ aarch64_layout_frame (void)
else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
&& cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
cfun->machine->frame.wb_candidate2 = regno;
- offset += UNITS_PER_WORD;
+ offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
}
offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
@@ -4442,6 +4502,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
return gen_storewb_pairdf_di (base, base, reg, reg2,
GEN_INT (-adjustment),
GEN_INT (UNITS_PER_WORD - adjustment));
+ case E_TFmode:
+ return gen_storewb_pairtf_di (base, base, reg, reg2,
+ GEN_INT (-adjustment),
+ GEN_INT (UNITS_PER_VREG - adjustment));
default:
gcc_unreachable ();
}
@@ -4454,7 +4518,7 @@ static void
aarch64_push_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment)
{
rtx_insn *insn;
- machine_mode mode = (regno1 <= R30_REGNUM) ? E_DImode : E_DFmode;
+ machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
if (regno2 == INVALID_REGNUM)
return aarch64_pushwb_single_reg (mode, regno1, adjustment);
@@ -4484,6 +4548,9 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
case E_DFmode:
return gen_loadwb_pairdf_di (base, base, reg, reg2, GEN_INT (adjustment),
GEN_INT (UNITS_PER_WORD));
+ case E_TFmode:
+ return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
+ GEN_INT (UNITS_PER_VREG));
default:
gcc_unreachable ();
}
@@ -4497,7 +4564,7 @@ static void
aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment,
rtx *cfi_ops)
{
- machine_mode mode = (regno1 <= R30_REGNUM) ? E_DImode : E_DFmode;
+ machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
rtx reg1 = gen_rtx_REG (mode, regno1);
*cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg1, *cfi_ops);
@@ -4532,6 +4599,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
case E_DFmode:
return gen_store_pair_dw_dfdf (mem1, reg1, mem2, reg2);
+ case E_TFmode:
+ return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2);
+
default:
gcc_unreachable ();
}
@@ -4552,6 +4622,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
case E_DFmode:
return gen_load_pair_dw_dfdf (reg1, mem1, reg2, mem2);
+ case E_TFmode:
+ return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2);
+
default:
gcc_unreachable ();
}
@@ -4591,6 +4664,7 @@ aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
{
rtx reg, mem;
poly_int64 offset;
+ int offset_diff;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
@@ -4606,12 +4680,12 @@ aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
offset));
regno2 = aarch64_next_callee_save (regno + 1, limit);
+ offset_diff = cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno];
if (regno2 <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
- == cfun->machine->frame.reg_offset[regno2]))
-
+ && known_eq (GET_MODE_SIZE (mode), offset_diff))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
@@ -4659,6 +4733,7 @@ aarch64_restore_callee_saves (machine_mode mode,
continue;
rtx reg, mem;
+ int offset_diff;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
@@ -4670,11 +4745,12 @@ aarch64_restore_callee_saves (machine_mode mode,
mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
regno2 = aarch64_next_callee_save (regno + 1, limit);
+ offset_diff = cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno];
if (regno2 <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
- == cfun->machine->frame.reg_offset[regno2]))
+ && known_eq (GET_MODE_SIZE (mode), offset_diff))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
@@ -4808,13 +4884,15 @@ aarch64_components_for_bb (basic_block bb)
bitmap in = DF_LIVE_IN (bb);
bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
+ bool simd_function = aarch64_simd_decl_p (cfun->decl);
sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
bitmap_clear (components);
/* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */
for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
- if ((!call_used_regs[regno])
+ if ((!call_used_regs[regno]
+ || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno)))
&& (bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
|| bitmap_bit_p (kill, regno)))
@@ -4885,9 +4963,11 @@ aarch64_process_components (sbitmap components, bool prologue_p)
while (regno != last_regno)
{
- /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
- so DFmode for the vector registers is enough. */
- machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
+ /* AAPCS64 section 5.1.2 requires only the low 64 bits to be saved
+ so DFmode for the vector registers is enough. For simd functions
+ we want to save the low 128 bits. */
+ machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno);
+
rtx reg = gen_rtx_REG (mode, regno);
poly_int64 offset = cfun->machine->frame.reg_offset[regno];
if (!frame_pointer_needed)
@@ -4916,6 +4996,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
mergeable with the current one into a pair. */
if (!satisfies_constraint_Ump (mem)
|| GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+ || (aarch64_simd_decl_p (cfun->decl) && FP_REGNUM_P (regno))
|| maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
GET_MODE_SIZE (mode)))
{
@@ -5231,6 +5312,28 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
}
}
+/* Return 1 if the register is used by the epilogue. We need to say the
+ return register is used, but only after epilogue generation is complete.
+ Note that in the case of sibcalls, the values "used by the epilogue" are
+ considered live at the start of the called function.
+
+ For SIMD functions we need to return 1 for FP registers that are saved and
+ restored by a function but are not zero in call_used_regs. If we do not do
+ this optimizations may remove the restore of the register. */
+
+int
+aarch64_epilogue_uses (int regno)
+{
+ if (epilogue_completed)
+ {
+ if (regno == LR_REGNUM)
+ return 1;
+ if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P (regno))
+ return 1;
+ }
+ return 0;
+}
+
/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
is saved at BASE + OFFSET. */
@@ -5405,8 +5508,12 @@ aarch64_expand_prologue (void)
aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
callee_adjust != 0 || emit_frame_chain);
- aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
+ if (aarch64_simd_decl_p (cfun->decl))
+ aarch64_save_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0 || emit_frame_chain);
+ else
+ aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0 || emit_frame_chain);
/* We may need to probe the final adjustment if it is larger than the guard
that is assumed by the called. */
@@ -5432,6 +5539,19 @@ aarch64_use_return_insn_p (void)
return known_eq (cfun->machine->frame.frame_size, 0);
}
+/* Return false for non-leaf SIMD functions in order to avoid
+ shrink-wrapping them. Doing this will lose the necessary
+ save/restore of FP registers. */
+
+bool
+aarch64_use_simple_return_insn_p (void)
+{
+ if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
+ return false;
+
+ return true;
+}
+
/* Generate the epilogue instructions for returning from a function.
This is almost exactly the reverse of the prolog sequence, except
that we need to insert barriers to avoid scheduling loads that read
@@ -5500,8 +5620,12 @@ aarch64_expand_epilogue (bool for_sibcall)
aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);
- aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0, &cfi_ops);
+ if (aarch64_simd_decl_p (cfun->decl))
+ aarch64_restore_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+ else
+ aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0, &cfi_ops);
if (need_barrier_p)
emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -18418,6 +18542,9 @@ aarch64_libgcc_floating_mode_supported_p
#undef TARGET_ESTIMATED_POLY_VALUE
#define TARGET_ESTIMATED_POLY_VALUE aarch64_estimated_poly_value
+#undef TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE aarch64_attribute_table
+
#if CHECKING_P
#undef TARGET_RUN_TARGET_SELFTESTS
#define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 0c833a8..8b5b2ab 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -409,13 +409,7 @@ extern unsigned aarch64_architecture_version;
V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31) \
}
-/* Say that the return address register is used by the epilogue, but only after
- epilogue generation is complete. Note that in the case of sibcalls, the
- values "used by the epilogue" are considered live at the start of the called
- function. */
-
-#define EPILOGUE_USES(REGNO) \
- (epilogue_completed && (REGNO) == LR_REGNUM)
+#define EPILOGUE_USES(REGNO) (aarch64_epilogue_uses (REGNO))
/* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
the stack pointer does not matter. This is only true if the function
@@ -523,6 +517,8 @@ extern unsigned aarch64_architecture_version;
#define PR_LO_REGNUM_P(REGNO)\
(((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
+#define FP_SIMD_SAVED_REGNUM_P(REGNO) \
+ (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM))
/* Register and constant classes. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6657316..cf2732e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -727,7 +727,7 @@
(define_insn "simple_return"
[(simple_return)]
- ""
+ "aarch64_use_simple_return_insn_p ()"
"ret"
[(set_attr "type" "branch")]
)
@@ -1387,6 +1387,21 @@
(set_attr "arch" "*,fp")]
)
+(define_insn "load_pair_dw_tftf"
+ [(set (match_operand:TF 0 "register_operand" "=w")
+ (match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
+ (set (match_operand:TF 2 "register_operand" "=w")
+ (match_operand:TF 3 "memory_operand" "m"))]
+ "TARGET_SIMD
+ && rtx_equal_p (XEXP (operands[3], 0),
+ plus_constant (Pmode,
+ XEXP (operands[1], 0),
+ GET_MODE_SIZE (TFmode)))"
+ "ldp\\t%q0, %q2, %1"
+ [(set_attr "type" "neon_ldp_q")
+ (set_attr "fp" "yes")]
+)
+
;; Operands 0 and 2 are tied together by the final condition; so we allow
;; fairly lax checking on the second memory operation.
(define_insn "store_pair_sw_<SX:mode><SX2:mode>"
@@ -1422,6 +1437,21 @@
(set_attr "arch" "*,fp")]
)
+(define_insn "store_pair_dw_tftf"
+ [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
+ (match_operand:TF 1 "register_operand" "w"))
+ (set (match_operand:TF 2 "memory_operand" "=m")
+ (match_operand:TF 3 "register_operand" "w"))]
+ "TARGET_SIMD &&
+ rtx_equal_p (XEXP (operands[2], 0),
+ plus_constant (Pmode,
+ XEXP (operands[0], 0),
+ GET_MODE_SIZE (TFmode)))"
+ "stp\\t%q1, %q3, %0"
+ [(set_attr "type" "neon_stp_q")
+ (set_attr "fp" "yes")]
+)
+
;; Load pair with post-index writeback. This is primarily used in function
;; epilogues.
(define_insn "loadwb_pair<GPI:mode>_<P:mode>"
@@ -1454,6 +1484,21 @@
[(set_attr "type" "neon_load1_2reg")]
)
+(define_insn "loadwb_pair<TX:mode>_<P:mode>"
+ [(parallel
+ [(set (match_operand:P 0 "register_operand" "=k")
+ (plus:P (match_operand:P 1 "register_operand" "0")
+ (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
+ (set (match_operand:TX 2 "register_operand" "=w")
+ (mem:TX (match_dup 1)))
+ (set (match_operand:TX 3 "register_operand" "=w")
+ (mem:TX (plus:P (match_dup 1)
+ (match_operand:P 5 "const_int_operand" "n"))))])]
+ "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
+ "ldp\\t%q2, %q3, [%1], %4"
+ [(set_attr "type" "neon_ldp_q")]
+)
+
;; Store pair with pre-index writeback. This is primarily used in function
;; prologues.
(define_insn "storewb_pair<GPI:mode>_<P:mode>"
@@ -1488,6 +1533,24 @@
[(set_attr "type" "neon_store1_2reg<q>")]
)
+(define_insn "storewb_pair<TX:mode>_<P:mode>"
+ [(parallel
+ [(set (match_operand:P 0 "register_operand" "=&k")
+ (plus:P (match_operand:P 1 "register_operand" "0")
+ (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
+ (set (mem:TX (plus:P (match_dup 0)
+ (match_dup 4)))
+ (match_operand:TX 2 "register_operand" "w"))
+ (set (mem:TX (plus:P (match_dup 0)
+ (match_operand:P 5 "const_int_operand" "n")))
+ (match_operand:TX 3 "register_operand" "w"))])]
+ "TARGET_SIMD
+ && INTVAL (operands[5])
+ == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
+ "stp\\t%q2, %q3, [%0, %4]!"
+ [(set_attr "type" "neon_stp_q")]
+)
+
;; -------------------------------------------------------------------
;; Sign/Zero extension
;; -------------------------------------------------------------------
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp b/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
index e69de29..22f08ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
+++ b/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2018 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `gcc-dg.exp' driver, looping over
+# optimization options.
+
+# Exit immediately if this isn't a Aarch64 target.
+if { ![istarget aarch64*-*-*] } then {
+ return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+ set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
index e69de29..249554e 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+ /* Clobber all fp/simd regs and verify that the correct ones are saved
+ and restored in the prologue and epilogue of a SIMD function. */
+ __asm__ __volatile__ ("" ::: "q0", "q1", "q2", "q3");
+ __asm__ __volatile__ ("" ::: "q4", "q5", "q6", "q7");
+ __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11");
+ __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
+ __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
+ __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
+ __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
+ __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
+}
+
+/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sstp\tq10, q11} } } */
+/* { dg-final { scan-assembler {\sstp\tq12, q13} } } */
+/* { dg-final { scan-assembler {\sstp\tq14, q15} } } */
+/* { dg-final { scan-assembler {\sstp\tq16, q17} } } */
+/* { dg-final { scan-assembler {\sstp\tq18, q19} } } */
+/* { dg-final { scan-assembler {\sstp\tq20, q21} } } */
+/* { dg-final { scan-assembler {\sstp\tq22, q23} } } */
+/* { dg-final { scan-assembler {\sldp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sldp\tq10, q11} } } */
+/* { dg-final { scan-assembler {\sldp\tq12, q13} } } */
+/* { dg-final { scan-assembler {\sldp\tq14, q15} } } */
+/* { dg-final { scan-assembler {\sldp\tq16, q17} } } */
+/* { dg-final { scan-assembler {\sldp\tq18, q19} } } */
+/* { dg-final { scan-assembler {\sldp\tq20, q21} } } */
+/* { dg-final { scan-assembler {\sldp\tq22, q23} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\td} } } */
+/* { dg-final { scan-assembler-not {\sldp\td} } } */
+/* { dg-final { scan-assembler-not {\sstr\t} } } */
+/* { dg-final { scan-assembler-not {\sldr\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
index e69de29..e28e82a 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+
+void
+f (void)
+{
+ /* Clobber all fp/simd regs and verify that the correct ones are saved
+ and restored in the prologue and epilogue of a normal non-SIMD function. */
+ __asm__ __volatile__ ("" ::: "q0", "q1", "q2", "q3");
+ __asm__ __volatile__ ("" ::: "q4", "q5", "q6", "q7");
+ __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11");
+ __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
+ __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
+ __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
+ __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
+ __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
+}
+
+/* { dg-final { scan-assembler {\sstp\td8, d9} } } */
+/* { dg-final { scan-assembler {\sstp\td10, d11} } } */
+/* { dg-final { scan-assembler {\sstp\td12, d13} } } */
+/* { dg-final { scan-assembler {\sstp\td14, d15} } } */
+/* { dg-final { scan-assembler {\sldp\td8, d9} } } */
+/* { dg-final { scan-assembler {\sldp\td10, d11} } } */
+/* { dg-final { scan-assembler {\sldp\td12, d13} } } */
+/* { dg-final { scan-assembler {\sldp\td14, d15} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[01234567]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[01234567]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq1[6789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq1[6789]} } } */
+/* { dg-final { scan-assembler-not {\sstr\t} } } */
+/* { dg-final { scan-assembler-not {\sldr\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
index e69de29..7d4f54f 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+
+extern void g (void);
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+ g();
+}
+
+/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sstp\tq10, q11} } } */
+/* { dg-final { scan-assembler {\sstp\tq12, q13} } } */
+/* { dg-final { scan-assembler {\sstp\tq14, q15} } } */
+/* { dg-final { scan-assembler {\sstp\tq16, q17} } } */
+/* { dg-final { scan-assembler {\sstp\tq18, q19} } } */
+/* { dg-final { scan-assembler {\sstp\tq20, q21} } } */
+/* { dg-final { scan-assembler {\sstp\tq22, q23} } } */
+/* { dg-final { scan-assembler {\sldp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sldp\tq10, q11} } } */
+/* { dg-final { scan-assembler {\sldp\tq12, q13} } } */
+/* { dg-final { scan-assembler {\sldp\tq14, q15} } } */
+/* { dg-final { scan-assembler {\sldp\tq16, q17} } } */
+/* { dg-final { scan-assembler {\sldp\tq18, q19} } } */
+/* { dg-final { scan-assembler {\sldp\tq20, q21} } } */
+/* { dg-final { scan-assembler {\sldp\tq22, q23} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\td} } } */
+/* { dg-final { scan-assembler-not {\sldp\td} } } */
+/* { dg-final { scan-assembler-not {\sstr\t} } } */
+/* { dg-final { scan-assembler-not {\sldr\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
index e69de29..e399690 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
@@ -0,0 +1,34 @@
+/* dg-do run */
+/* { dg-additional-options "-std=c99" } */
+
+
+
+/* There is nothing special about the calculations here, this is just
+ a test that can be compiled and run. */
+
+extern void abort (void);
+
+__Float64x2_t __attribute__ ((noinline, aarch64_vector_pcs))
+foo(__Float64x2_t a, __Float64x2_t b, __Float64x2_t c,
+ __Float64x2_t d, __Float64x2_t e, __Float64x2_t f,
+ __Float64x2_t g, __Float64x2_t h, __Float64x2_t i)
+{
+ __Float64x2_t w, x, y, z;
+ w = a + b * c;
+ x = d + e * f;
+ y = g + h * i;
+ return w + x * y;
+}
+
+
+int main()
+{
+ __Float64x2_t a, b, c, d;
+ a = (__Float64x2_t) { 1.0, 2.0 };
+ b = (__Float64x2_t) { 3.0, 4.0 };
+ c = (__Float64x2_t) { 5.0, 6.0 };
+ d = foo (a, b, c, (a+b), (b+c), (a+c), (a-b), (b-c), (a-c)) + a + b + c;
+ if (d[0] != 337.0 || d[1] != 554.0)
+ abort ();
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
index e69de29..7d639a5e 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+ /* Clobber some fp/simd regs and verify that only those are saved
+ and restored in the prologue and epilogue of a SIMD function. */
+ __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11");
+}
+
+/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sstp\tq10, q11} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq1[23456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq1[23456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\td} } } */
+/* { dg-final { scan-assembler-not {\sldp\td} } } */
+/* { dg-final { scan-assembler-not {\sstr\t} } } */
+/* { dg-final { scan-assembler-not {\sldr\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-6.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-6.c
index e69de29..dc13e16 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-6.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+ /* Clobber some fp/simd regs and verify that only those are saved
+ and restored in the prologue and epilogue of a SIMD function. */
+ __asm__ __volatile__ ("" ::: "q8", "q10", "q11");
+}
+
+/* { dg-final { scan-assembler {\sstp\tq8, q10} } } */
+/* { dg-final { scan-assembler {\sstr\tq11} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[0345679]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[0345679]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq1[123456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq1[123456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\td} } } */
+/* { dg-final { scan-assembler-not {\sldp\td} } } */
+/* { dg-final { scan-assembler-not {\sstr\tq[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sldr\tq[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sstr\tq1[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sldr\tq1[023456789]} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-7.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-7.c
index e69de29..aafd973 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-7.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-7.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+ /* Clobber some fp/simd regs and verify that only those are saved
+ and restored in the prologue and epilogue of a SIMD function. */
+ __asm__ __volatile__ ("" ::: "q8", "q9", "q11");
+}
+
+/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */
+/* { dg-final { scan-assembler {\sstr\tq11} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq1[0123456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq1[0123456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */
+/* { dg-final { scan-assembler-not {\sstp\td} } } */
+/* { dg-final { scan-assembler-not {\sldp\td} } } */
+/* { dg-final { scan-assembler-not {\sstr\tq[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sldr\tq[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sstr\tq1[023456789]} } } */
+/* { dg-final { scan-assembler-not {\sldr\tq1[023456789]} } } */