This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
- From: Richard Earnshaw <rearnsha@arm.com>
- To: Kyrill Tkachov <kyrylo.tkachov@arm.com>, GCC Patches <gcc-patches@gcc.gnu.org>
- Cc: Marcus Shawcroft <Marcus.Shawcroft@arm.com>
- Date: Fri, 10 Oct 2014 15:18:03 +0100
- Subject: Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
- Authentication-results: sourceware.org; auth=none
- References: <5437BAA5.1000003@arm.com>
On 10/10/14 11:53, Kyrill Tkachov wrote:
> Hi all,
>
>
> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
> it is possible for a 64-bit multiply-accumulate instruction in
> AArch64 state to generate an incorrect result. The triggering
> conditions are quite complex and hard to determine statically, since
> in some circumstances the affected sequences may span branches, but
> all cases end with a memory (load, store, or prefetch) instruction
> followed immediately by the multiply-accumulate operation.
>
> The safest workaround for this issue is to make the compiler avoid
> emitting multiply-accumulate instructions immediately after memory
> instructions, and the simplest way to do this is to insert a NOP. A
> linker patching technique can also be used, by moving potentially
> affected multiply-accumulate instructions into a patch region and
> replacing each original instruction with a branch to the patch.
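As a purely hypothetical illustration (registers and operands invented here, not taken from the erratum notice), an affected sequence and the NOP-mitigated output might look like:

```
        ldr   x0, [x1]           // memory operation
        madd  x2, x3, x4, x5     // 64-bit multiply-accumulate: erratum window

        // with -mfix-cortex-a53-835769:
        ldr   x0, [x1]
        nop                      // between mem op and mult-accumulate
        madd  x2, x3, x4, x5
```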
>
> This patch achieves the compiler part by using the final prescan pass.
> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
> so that conditional branches work properly.
>
> The fix is disabled by default and can be turned on by the
> -mfix-cortex-a53-835769 compiler option.
>
> I'm attaching a trunk and a 4.9 version of the patch.
> The 4.9 version differs only in that rtx_insn * is replaced by rtx.
>
> Tested on aarch64-none-linux-gnu (and bootstrapped with the option
> enabled), and built various large benchmarks with it.
>
> Ok?
>
> Thanks,
> Kyrill
>
> 2014-10-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>             Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
>
> * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
> (ADJUST_INSN_LENGTH): Define.
> * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
> * config/aarch64/aarch64.c (is_mem_p): New function.
> (is_memory_op): Likewise.
> (aarch64_prev_real_insn): Likewise.
> (is_madd_op): Likewise.
> (dep_between_memop_and_curr): Likewise.
> (aarch64_madd_needs_nop): Likewise.
> (aarch64_final_prescan_insn): Likewise.
> * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
> and -mno-fix-cortex-a53-835769 options.
>
>
> a53-workaround-new.patch
>
>
> commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
> Author: Kyrill Tkachov <kyrylo.tkachov@arm.com>
> Date: Wed Oct 8 12:48:34 2014 +0000
>
> [AArch64] Add final prescan workaround for Cortex-A53 erratum.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b5f53d2..c57a467 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
>
> extern void aarch64_split_combinev16qi (rtx operands[3]);
> extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
> +extern bool aarch64_madd_needs_nop (rtx_insn *);
> +extern void aarch64_final_prescan_insn (rtx_insn *);
> extern bool
> aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
> void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5144c35..76a2480 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
> return NULL;
> }
>
> +static int
> +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
> +{
> + return MEM_P (*x);
> +}
> +
> +static bool
> +is_memory_op (rtx_insn *mem_insn)
> +{
> + rtx pattern = PATTERN (mem_insn);
> + return for_each_rtx (&pattern, is_mem_p, NULL);
> +}
> +
> +/* Find the first rtx_insn before insn that will generate an assembly
> + instruction. */
> +
> +static rtx_insn *
> +aarch64_prev_real_insn (rtx_insn *insn)
> +{
> + if (!insn)
> + return NULL;
> +
> + do
> + {
> + insn = prev_real_insn (insn);
> + }
> + while (insn && recog_memoized (insn) < 0);
> +
> + return insn;
> +}
> +
> +static bool
> +is_madd_op (enum attr_type t1)
> +{
> + unsigned int i;
> + /* A number of these may be AArch32 only. */
> + enum attr_type mlatypes[] = {
> + TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
> + TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
> + TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
> + };
> +
> + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
> + {
> + if (t1 == mlatypes[i])
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* Check if there is a register dependency between a load and the insn
> + for which we hold recog_data. */
> +
> +static bool
> +dep_between_memop_and_curr (rtx memop)
> +{
> + rtx load_reg;
> + int opno;
> +
> + if (!memop)
> + return false;
> +
> + if (!REG_P (SET_DEST (memop)))
> + return false;
> +
> + load_reg = SET_DEST (memop);
> + for (opno = 0; opno < recog_data.n_operands; opno++)
> + {
> + rtx operand = recog_data.operand[opno];
> + if (REG_P (operand)
> + && reg_overlap_mentioned_p (load_reg, operand))
> + return true;
> +
> + }
> + return false;
> +}
> +
> +bool
> +aarch64_madd_needs_nop (rtx_insn* insn)
> +{
> + enum attr_type attr_type;
> + rtx_insn *prev;
> + rtx body;
> +
> + if (!aarch64_fix_a53_err835769)
> + return false;
> +
> + if (recog_memoized (insn) < 0)
> + return false;
> +
> + attr_type = get_attr_type (insn);
> + if (!is_madd_op (attr_type))
> + return false;
> +
> + prev = aarch64_prev_real_insn (insn);
> + if (!prev)
> + return false;
> +
> + body = single_set (prev);
> +
> + /* If the previous insn is a memory op and there is no dependency between
> + it and the madd, emit a nop between them. If we know the previous insn is
> + a memory op but body is NULL, emit the nop to be safe, it's probably a
> + load/store pair insn. */
> + if (is_memory_op (prev)
> + && GET_MODE (recog_data.operand[0]) == DImode
> + && (!dep_between_memop_and_curr (body)))
> + return true;
> +
> + return false;
> +
> +}
> +
> +void
> +aarch64_final_prescan_insn (rtx_insn *insn)
> +{
> + if (aarch64_madd_needs_nop (insn))
> + fprintf (asm_out_file, "\tnop // between mem op and mult-accumulate\n");
> +}
> +
> +
> /* Return the equivalent letter for size. */
> static char
> sizetochar (int size)
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index db950da..e9e5fd8 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -486,6 +486,15 @@ enum target_cpus
> (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
> #endif
>
> +/* If inserting NOP before a mult-accumulate insn remember to adjust the
> + length so that conditional branching code is updated appropriately. */
> +#define ADJUST_INSN_LENGTH(insn, length) \
> + if (aarch64_madd_needs_nop (insn)) \
> + length += 4;
> +
> +#define FINAL_PRESCAN_INSN(INSN, OPVEC, NOPERANDS) \
> + aarch64_final_prescan_insn (INSN); \
> +
> /* The processor for which instructions should be scheduled. */
> extern enum aarch64_processor aarch64_tune;
>
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index f5a15b7..77deb2e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -67,6 +67,10 @@ mgeneral-regs-only
> Target Report RejectNegative Mask(GENERAL_REGS_ONLY)
> Generate code which uses only the general registers
>
> +mfix-cortex-a53-835769
> +Target Report Var(aarch64_fix_a53_err835769) Init(0)
> +Workaround for ARM Cortex-A53 Erratum number 835769
> +
> mlittle-endian
> Target Report RejectNegative InverseMask(BIG_END)
> Assume target CPU is configured as little endian
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5fe7e15..acc9dd9 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -489,6 +489,7 @@ Objective-C and Objective-C++ Dialects}.
> -mstrict-align @gol
> -momit-leaf-frame-pointer -mno-omit-leaf-frame-pointer @gol
> -mtls-dialect=desc -mtls-dialect=traditional @gol
> +-mfix-cortex-a53-835769 -mno-fix-cortex-a53-835769 @gol
> -march=@var{name} -mcpu=@var{name} -mtune=@var{name}}
>
> @emph{Adapteva Epiphany Options}
> @@ -11744,6 +11745,14 @@ of TLS variables. This is the default.
> Use traditional TLS as the thread-local storage mechanism for dynamic accesses
> of TLS variables.
>
> +@item -mfix-cortex-a53-835769
> +@itemx -mno-fix-cortex-a53-835769
> +@opindex -mfix-cortex-a53-835769
> +@opindex -mno-fix-cortex-a53-835769
> +Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769.
> +This will involve inserting a NOP instruction between memory instructions and
> +64-bit integer multiply-accumulate instructions.
> +
> @item -march=@var{name}
> @opindex march
> Specify the name of the target architecture, optionally suffixed by one or
>
>
> a53-49-workaround.patch
>
>
> commit 48cc738e2c38dd372ae054d30964dd780478c4d1
> Author: Kyrill Tkachov <kyrylo.tkachov@arm.com>
> Date: Tue Oct 7 12:42:01 2014 +0000
>
> [AArch64] Add workaround for Cortex-A53 erratum 835769
>
> 2014-10-07 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
>
> * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
> (ADJUST_INSN_LENGTH): Define.
> * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
> * config/aarch64/aarch64.c (is_mem_p): New function.
> (is_memory_op): Likewise.
> (aarch64_prev_real_insn): Likewise.
> (is_madd_op): Likewise.
> (dep_between_memop_and_curr): Likewise.
> (aarch64_madd_needs_nop): Likewise.
> (aarch64_final_prescan_insn): Likewise.
> * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
> and -mno-fix-cortex-a53-835769 options.
>
> +
> +/* Check if there is a register dependency between a load and the insn
> + for which we hold recog_data. */
> +
> +static bool
> +dep_between_memop_and_curr (rtx memop)
> +{
> + rtx load_reg;
> + int opno;
> +
> + if (!memop)
> + return false;
> +
> + if (!REG_P (SET_DEST (memop)))
> + return false;
> +
> + load_reg = SET_DEST (memop);
> + for (opno = 0; opno < recog_data.n_operands; opno++)
> + {
> + rtx operand = recog_data.operand[opno];
> + if (REG_P (operand)
> + && reg_overlap_mentioned_p (load_reg, operand))
> + return true;
> +
> + }
> + return false;
> +}
> +
> +bool
> +aarch64_madd_needs_nop (rtx insn)
> +{
> + enum attr_type attr_type;
> + rtx prev;
> + rtx body;
> +
> + if (!aarch64_fix_a53_err835769)
> + return false;
> +
> + if (recog_memoized (insn) < 0)
> + return false;
> +
> + attr_type = get_attr_type (insn);
> + if (!is_madd_op (attr_type))
> + return false;
> +
> + prev = aarch64_prev_real_insn (insn);
> + if (!prev)
> + return false;
> +
> + body = single_set (prev);
> +
> + /* If the previous insn is a memory op and there is no dependency between
> + it and the madd, emit a nop between them. If we know the previous insn is
> + a memory op but body is NULL, emit the nop to be safe, it's probably a
> + load/store pair insn. */
> + if (is_memory_op (prev)
> + && GET_MODE (recog_data.operand[0]) == DImode
> + && (!dep_between_memop_and_curr (body)))
> + return true;
> +
It seems a bit strange to me that we expect dep_between_memop_and_curr
to detect the non-single_set() case and act conservatively. Better
would be to pull that test out to this level, giving us:
body = single_set (prev);
if (!body)
/* Complex memory operation, possibly a load/store pair insn. Just
be conservative for now. */
return true;
...
Now dep_between... can assert that it is passed a valid SET() rtx.
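Spelled out, the restructured function might take this shape (a sketch only, untested, preserving the conservative behaviour for the NULL-body case):

```
bool
aarch64_madd_needs_nop (rtx_insn *insn)
{
  ...
  prev = aarch64_prev_real_insn (insn);
  if (!prev)
    return false;

  body = single_set (prev);
  if (!body)
    /* Complex memory operation, possibly a load/store pair insn.
       Just be conservative for now.  */
    return is_memory_op (prev)
           && GET_MODE (recog_data.operand[0]) == DImode;

  return is_memory_op (prev)
         && GET_MODE (recog_data.operand[0]) == DImode
         && !dep_between_memop_and_curr (body);
}
```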
R.