This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][AArch64] Emit tighter strong atomic compare-exchange loop when comparing against zero


On Wed, Mar 08, 2017 at 04:35:38PM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> For the testcase in this patch, where the value of x is zero, we currently
> generate:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cmp     w2, 0
>         bne     .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cset    w0, eq
>         ret
> 
> We currently cannot merge the cmp and b.ne inside the loop into a cbnz
> because we need the condition flags set for the return value of the function
> (i.e. the cset at the end).  But if we re-jig the sequence for this case, we
> can generate a tighter loop:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cbnz    w2, .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cmp     w2, 0
>         cset    w0, eq
>         ret
> 
> So we add an explicit compare after the loop, and inside the loop we use the
> fact that we're comparing against zero to emit a CBNZ.  This means we may
> end up doing the comparison twice (once inside the CBNZ, once at the CMP at
> the end), but there is now less code inside the loop.
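> 
> For reference, the C source of foo above is the new testcase added at the
> end of this patch:
> 
> int
> foo (int *a)
> {
>   int x = 0;
>   return __atomic_compare_exchange_n (a, &x, 4, 0,
>                                       __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
> }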
> 
> I've seen this sequence appear in glibc locking code so maybe it's worth
> adding the extra bit of complexity to the compare-exchange splitter to catch
> this case.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.  In previous iterations of
> the patch, where I had gotten some logic wrong, it caused miscompiles of
> libgomp leading to timeouts in its testsuite, but this version passes
> everything cleanly.
> 
> Ok for GCC 8? (I know it's early, but might as well get it out in case
> someone wants to try it out)

OK.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2017-03-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
>     Emit CBNZ inside loop when doing a strong exchange and comparing
>     against zero.  Generate the CC flags after the loop.
> 
> 2017-03-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 76a2de20dfcd4ea38fb7c58a9e8612509c5987bd..5fa8e197328ce4cb1718ff7d99b1ea95e02129a4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12095,6 +12095,17 @@ aarch64_split_compare_and_swap (rtx operands[])
>    mode = GET_MODE (mem);
>    model = memmodel_from_int (INTVAL (model_rtx));
>  
> +  /* When OLDVAL is zero and we want the strong version we can emit a tighter
> +    loop:
> +    .label1:
> +	LD[A]XR	rval, [mem]
> +	CBNZ	rval, .label2
> +	ST[L]XR	scratch, newval, [mem]
> +	CBNZ	scratch, .label1
> +    .label2:
> +	CMP	rval, 0.  */
> +  bool strong_zero_p = !is_weak && oldval == const0_rtx;
> +
>    label1 = NULL;
>    if (!is_weak)
>      {
> @@ -12111,11 +12122,21 @@ aarch64_split_compare_and_swap (rtx operands[])
>    else
>      aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
>  
> -  cond = aarch64_gen_compare_reg (NE, rval, oldval);
> -  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> -  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> -			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> -  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +  if (strong_zero_p)
> +    {
> +      x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
> +  else
> +    {
> +      cond = aarch64_gen_compare_reg (NE, rval, oldval);
> +      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +				 gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
>  
>    aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
>  
> @@ -12134,7 +12155,15 @@ aarch64_split_compare_and_swap (rtx operands[])
>      }
>  
>    emit_label (label2);
> -
> +  /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
> +     to set the condition flags.  If this is not used it will be removed by
> +     later passes.  */
> +  if (strong_zero_p)
> +    {
> +      cond = gen_rtx_REG (CCmode, CC_REGNUM);
> +      x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
> +      emit_insn (gen_rtx_SET (cond, x));
> +    }
>    /* Emit any final barrier needed for a __sync operation.  */
>    if (is_mm_sync (model))
>      aarch64_emit_post_barrier (model);
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b14a7c294376f03cd13077d18d865f83a04bd04e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int *a)
> +{
> +  int x = 0;
> +  return __atomic_compare_exchange_n (a, &x, 4, 0,
> +				      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
> +}
> +
> +/* { dg-final { scan-assembler-times "cbnz\\tw\[0-9\]+" 2 } } */
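
For context, the glibc-style locking code mentioned above boils down to a
strong compare-exchange against zero: a lock is "free" when its word is zero,
and taking it means atomically swapping in a non-zero value only if the word
is still zero.  A minimal sketch of that pattern, using the same builtin as
the testcase (this is a hypothetical illustration, not actual glibc code; the
try_lock name and the zero-means-unlocked convention are assumptions):

/* Attempt to acquire the lock; returns nonzero on success.  */
static int
try_lock (int *lock_word, int locked_val)
{
  int expected = 0;  /* The lock is free only when the word is zero.  */
  /* Strong CAS against zero: with this patch, the retry loop should
     compile to the tighter LDAXR/CBNZ/STXR/CBNZ sequence shown above.  */
  return __atomic_compare_exchange_n (lock_word, &expected, locked_val,
                                      0 /* strong */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}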

