[PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
Richard Earnshaw
rearnsha@arm.com
Fri Oct 10 15:40:00 GMT 2014
On 10/10/14 15:22, Kyrill Tkachov wrote:
>
> On 10/10/14 15:18, Richard Earnshaw wrote:
>> On 10/10/14 11:53, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>>
>>> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
>>> it is possible for a 64-bit multiply-accumulate instruction in
>>> AArch64 state to generate an incorrect result. The details are quite
>>> complex and hard to determine statically, since branches in the code
>>> may exist in some circumstances, but all cases end with a memory
>>> (load, store, or prefetch) instruction followed immediately by the
>>> multiply-accumulate operation.
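For illustration only (the registers here are arbitrary, not taken from any real code), the hazardous pattern is a memory instruction immediately followed by a 64-bit multiply-accumulate, and the workaround separates the two:

```
        ldr     x1, [x2]            // memory instruction (load/store/prefetch)
        madd    x0, x3, x4, x5      // 64-bit multiply-accumulate issued next

        // with -mfix-cortex-a53-835769 the compiler instead emits:
        ldr     x1, [x2]
        nop                         // inserted between mem op and madd
        madd    x0, x3, x4, x5
```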
>>>
>>> The safest work-around for this issue is to make the compiler avoid
>>> emitting multiply-accumulate instructions immediately after memory
>>> instructions; the simplest way to do this is to insert a NOP. A
>>> linker patching technique can also be used, by moving potentially
>>> affected multiply-accumulate instructions into a patch region and
>>> replacing each original instruction with a branch to the patch.
>>>
>>> This patch achieves the compiler part by using the final prescan pass.
>>> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
>>> so that conditional branches work properly.
>>>
>>> The fix is disabled by default and can be turned on by the
>>> -mfix-cortex-a53-835769 compiler option.
>>>
>>> I'm attaching a trunk and a 4.9 version of the patch.
>>> The 4.9 version is different only in that rtx_insn* is replaced by rtx.
>>>
>>> Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
>>> built various large benchmarks with it.
>>>
>>> Ok?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2014-10-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>> Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
>>>
>>> * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>>> (ADJUST_INSN_LENGTH): Define.
>>> * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>>> * config/aarch64/aarch64.c (is_mem_p): New function.
>>> (is_memory_op): Likewise.
>>> (aarch64_prev_real_insn): Likewise.
>>> (is_madd_op): Likewise.
>>> (dep_between_memop_and_curr): Likewise.
>>> (aarch64_madd_needs_nop): Likewise.
>>> (aarch64_final_prescan_insn): Likewise.
>>> * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>>> and -mno-fix-cortex-a53-835769 options.
>>>
>>>
>>> a53-workaround-new.patch
>>>
>>>
>>> commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
>>> Author: Kyrill Tkachov <kyrylo.tkachov@arm.com>
>>> Date: Wed Oct 8 12:48:34 2014 +0000
>>>
>>> [AArch64] Add final prescan workaround for Cortex-A53 erratum.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>>> index b5f53d2..c57a467 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
>>>
>>> extern void aarch64_split_combinev16qi (rtx operands[3]);
>>> extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
>>> +extern bool aarch64_madd_needs_nop (rtx_insn *);
>>> +extern void aarch64_final_prescan_insn (rtx_insn *);
>>> extern bool
>>> aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>>> void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5144c35..76a2480 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
>>> return NULL;
>>> }
>>>
>>> +static int
>>> +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
>>> +{
>>> + return MEM_P (*x);
>>> +}
>>> +
>>> +static bool
>>> +is_memory_op (rtx_insn *mem_insn)
>>> +{
>>> + rtx pattern = PATTERN (mem_insn);
>>> + return for_each_rtx (&pattern, is_mem_p, NULL);
>>> +}
>>> +
>>> +/* Find the first rtx_insn before insn that will generate an assembly
>>> + instruction. */
>>> +
>>> +static rtx_insn *
>>> +aarch64_prev_real_insn (rtx_insn *insn)
>>> +{
>>> + if (!insn)
>>> + return NULL;
>>> +
>>> + do
>>> + {
>>> + insn = prev_real_insn (insn);
>>> + }
>>> + while (insn && recog_memoized (insn) < 0);
>>> +
>>> + return insn;
>>> +}
>>> +
>>> +static bool
>>> +is_madd_op (enum attr_type t1)
>>> +{
>>> + unsigned int i;
>>> + /* A number of these may be AArch32 only. */
>>> + enum attr_type mlatypes[] = {
>>> + TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
>>> + TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
>>> + TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
>>> + };
>>> +
>>> + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
>>> + {
>>> + if (t1 == mlatypes[i])
>>> + return true;
>>> + }
>>> +
>>> + return false;
>>> +}
>>> +
>>> +/* Check if there is a register dependency between a load and the insn
>>> + for which we hold recog_data. */
>>> +
>>> +static bool
>>> +dep_between_memop_and_curr (rtx memop)
>>> +{
>>> + rtx load_reg;
>>> + int opno;
>>> +
>>> + if (!memop)
>>> + return false;
>>> +
>>> + if (!REG_P (SET_DEST (memop)))
>>> + return false;
>>> +
>>> + load_reg = SET_DEST (memop);
>>> + for (opno = 0; opno < recog_data.n_operands; opno++)
>>> + {
>>> + rtx operand = recog_data.operand[opno];
>>> + if (REG_P (operand)
>>> + && reg_overlap_mentioned_p (load_reg, operand))
>>> + return true;
>>> +
>>> + }
>>> + return false;
>>> +}
>>> +
>>> +bool
>>> +aarch64_madd_needs_nop (rtx_insn* insn)
>>> +{
>>> + enum attr_type attr_type;
>>> + rtx_insn *prev;
>>> + rtx body;
>>> +
>>> + if (!aarch64_fix_a53_err835769)
>>> + return false;
>>> +
>>> + if (recog_memoized (insn) < 0)
>>> + return false;
>>> +
>>> + attr_type = get_attr_type (insn);
>>> + if (!is_madd_op (attr_type))
>>> + return false;
>>> +
>>> + prev = aarch64_prev_real_insn (insn);
>>> + if (!prev)
>>> + return false;
>>> +
>>> + body = single_set (prev);
>>> +
>>> + /* If the previous insn is a memory op and there is no dependency between
>>> + it and the madd, emit a nop between them. If we know the previous insn is
>>> + a memory op but body is NULL, emit the nop to be safe, it's probably a
>>> + load/store pair insn. */
>>> + if (is_memory_op (prev)
>>> + && GET_MODE (recog_data.operand[0]) == DImode
>>> + && (!dep_between_memop_and_curr (body)))
>>> + return true;
>>> +
>>> + return false;
>>> +
>>> +}
>>> +
>>> +void
>>> +aarch64_final_prescan_insn (rtx_insn *insn)
>>> +{
>>> + if (aarch64_madd_needs_nop (insn))
>>> + fprintf (asm_out_file, "\tnop // between mem op and mult-accumulate\n");
>>> +}
>>> +
>>> +
>>> /* Return the equivalent letter for size. */
>>> static char
>>> sizetochar (int size)
>>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>>> index db950da..e9e5fd8 100644
>>> --- a/gcc/config/aarch64/aarch64.h
>>> +++ b/gcc/config/aarch64/aarch64.h
>>> @@ -486,6 +486,15 @@ enum target_cpus
>>> (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
>>> #endif
>>>
>>> +/* If inserting NOP before a mult-accumulate insn remember to adjust the
>>> + length so that conditional branching code is updated appropriately. */
>>> +#define ADJUST_INSN_LENGTH(insn, length) \
>>> + if (aarch64_madd_needs_nop (insn)) \
>>> + length += 4;
>>> +
>>> +#define FINAL_PRESCAN_INSN(INSN, OPVEC, NOPERANDS) \
>>> + aarch64_final_prescan_insn (INSN); \
>>> +
>>> /* The processor for which instructions should be scheduled. */
>>> extern enum aarch64_processor aarch64_tune;
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>>> index f5a15b7..77deb2e 100644
>>> --- a/gcc/config/aarch64/aarch64.opt
>>> +++ b/gcc/config/aarch64/aarch64.opt
>>> @@ -67,6 +67,10 @@ mgeneral-regs-only
>>> Target Report RejectNegative Mask(GENERAL_REGS_ONLY)
>>> Generate code which uses only the general registers
>>>
>>> +mfix-cortex-a53-835769
>>> +Target Report Var(aarch64_fix_a53_err835769) Init(0)
>>> +Workaround for ARM Cortex-A53 Erratum number 835769
>>> +
>>> mlittle-endian
>>> Target Report RejectNegative InverseMask(BIG_END)
>>> Assume target CPU is configured as little endian
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 5fe7e15..acc9dd9 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -489,6 +489,7 @@ Objective-C and Objective-C++ Dialects}.
>>> -mstrict-align @gol
>>> -momit-leaf-frame-pointer -mno-omit-leaf-frame-pointer @gol
>>> -mtls-dialect=desc -mtls-dialect=traditional @gol
>>> +-mfix-cortex-a53-835769 -mno-fix-cortex-a53-835769 @gol
>>> -march=@var{name} -mcpu=@var{name} -mtune=@var{name}}
>>>
>>> @emph{Adapteva Epiphany Options}
>>> @@ -11744,6 +11745,14 @@ of TLS variables. This is the default.
>>> Use traditional TLS as the thread-local storage mechanism for dynamic accesses
>>> of TLS variables.
>>>
>>> +@item -mfix-cortex-a53-835769
>>> +@itemx -mno-fix-cortex-a53-835769
>>> +@opindex -mfix-cortex-a53-835769
>>> +@opindex -mno-fix-cortex-a53-835769
>>> +Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769.
>>> +This will involve inserting a NOP instruction between memory instructions and
>>> +64-bit integer multiply-accumulate instructions.
>>> +
>>> @item -march=@var{name}
>>> @opindex march
>>> Specify the name of the target architecture, optionally suffixed by one or
>>>
>>>
>>> a53-49-workaround.patch
>>>
>>>
>>> commit 48cc738e2c38dd372ae054d30964dd780478c4d1
>>> Author: Kyrill Tkachov <kyrylo.tkachov@arm.com>
>>> Date: Tue Oct 7 12:42:01 2014 +0000
>>>
>>> [AArch64] Add workaround for Cortex-A53 erratum 835769
>>>
>>> 2014-10-07 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>> Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
>>>
>>> * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>>> (ADJUST_INSN_LENGTH): Define.
* config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New
option.
>>> * config/aarch64/aarch64.c (is_mem_p): New function.
>>> (is_memory_op): Likewise.
>>> (aarch64_prev_real_insn): Likewise.
>>> (is_madd_op): Likewise.
>>> (dep_between_memop_and_curr): Likewise.
>>> (aarch64_madd_needs_nop): Likewise.
>>> (aarch64_final_prescan_insn): Likewise.
* doc/invoke.texi (AArch64 Options): Document
-mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options.
>>>
>>
>>> +
>>> +/* Check if there is a register dependency between a load and the insn
>>> + for which we hold recog_data. */
>>> +
>>> +static bool
>>> +dep_between_memop_and_curr (rtx memop)
>>> +{
>>> + rtx load_reg;
>>> + int opno;
>>> +
>>> + if (!memop)
>>> + return false;
>>> +
>>> + if (!REG_P (SET_DEST (memop)))
>>> + return false;
>>> +
>>> + load_reg = SET_DEST (memop);
>>> + for (opno = 0; opno < recog_data.n_operands; opno++)
>>> + {
>>> + rtx operand = recog_data.operand[opno];
>>> + if (REG_P (operand)
>>> + && reg_overlap_mentioned_p (load_reg, operand))
>>> + return true;
>>> +
>>> + }
>>> + return false;
>>> +}
>>> +
>>> +bool
>>> +aarch64_madd_needs_nop (rtx insn)
>>> +{
>>> + enum attr_type attr_type;
>>> + rtx prev;
>>> + rtx body;
>>> +
>>> + if (!aarch64_fix_a53_err835769)
>>> + return false;
>>> +
>>> + if (recog_memoized (insn) < 0)
>>> + return false;
>>> +
>>> + attr_type = get_attr_type (insn);
>>> + if (!is_madd_op (attr_type))
>>> + return false;
>>> +
>>> + prev = aarch64_prev_real_insn (insn);
>>> + if (!prev)
>>> + return false;
>>> +
>>> + body = single_set (prev);
>>> +
>>> + /* If the previous insn is a memory op and there is no dependency between
>>> + it and the madd, emit a nop between them. If we know the previous insn is
>>> + a memory op but body is NULL, emit the nop to be safe, it's probably a
>>> + load/store pair insn. */
>>> + if (is_memory_op (prev)
>>> + && GET_MODE (recog_data.operand[0]) == DImode
>>> + && (!dep_between_memop_and_curr (body)))
>>> + return true;
>>> +
>> It seems a bit strange to me that we expect dep_between_memop_and_curr
>> to detect the non-single_set() case and act conservatively. Better
>> would be to pull that test out to this level, giving us:
>>
>>
>> body = single_set (prev);
>> if (!body)
>> /* Complex memory operation, possibly a load/store pair insn. Just
>> be conservative for now. */
>> return true;
>
> Can do, but I think it should be:
> body = single_set (prev);
> if (!body && is_memory_op (prev))
> return true;
>
> since we want to do this only when we really do know it's a memory op.
> And restructure this a bit so that is_memory_op is checked only once...
>
Then it might be better to pull the is_memory_op() above the
single_set() entirely and bail out early if it's not a memory operation.
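A sketch of that restructuring (untested, reusing the helper names from the patch; not necessarily what was committed) might be:

```c
  /* Bail out early unless the previous insn is a memory operation.  */
  if (!is_memory_op (prev))
    return false;

  body = single_set (prev);
  if (!body)
    /* Complex memory operation, possibly a load/store pair insn.
       Just be conservative for now.  */
    return true;

  return GET_MODE (recog_data.operand[0]) == DImode
         && !dep_between_memop_and_curr (body);
```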
R.
> Thanks,
> Kyrill
>
>
>>
>> ...
>>
>> Now dep_between... can assert that it is passed a valid SET() rtx.
>>
>> R.
>>
>