This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][AArch64] Emit tighter strong atomic compare-exchange loop when comparing against zero
- From: James Greenhalgh <james.greenhalgh@arm.com>
- To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
- Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Marcus Shawcroft <marcus.shawcroft@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>, <nd@arm.com>
- Date: Fri, 2 Jun 2017 15:07:00 +0100
- Subject: Re: [PATCH][AArch64] Emit tighter strong atomic compare-exchange loop when comparing against zero
- References: <58C032DA.2070004@foss.arm.com>
On Wed, Mar 08, 2017 at 04:35:38PM +0000, Kyrill Tkachov wrote:
> Hi all,
>
> For the testcase in this patch, where the value of x is zero, we currently
> generate:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cmp     w2, 0
>         bne     .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cset    w0, eq
>         ret
>
> We currently cannot merge the cmp and b.ne inside the loop into a cbnz
> because we need the condition flags set for the return value of the function
> (i.e. the cset at the end). But if we re-jig the sequence for that case we
> can generate a tighter loop:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cbnz    w2, .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cmp     w2, 0
>         cset    w0, eq
>         ret
>
> So we add an explicit compare after the loop, and inside the loop we use the
> fact that we're comparing against zero to emit a CBNZ. This means we may end
> up performing the comparison twice (once implicitly in the CBNZ, once in the
> CMP at the end), but there is now less code inside the loop.
>
> I've seen this sequence appear in glibc locking code, so maybe it's worth
> adding the extra bit of complexity to the compare-exchange splitter to catch
> this case (an illustrative sketch of that locking pattern follows after the
> patch below).
>
> Bootstrapped and tested on aarch64-none-linux-gnu. In previous iterations of
> the patch, where I had gotten some logic wrong, it caused miscompiles of
> libgomp leading to timeouts in its testsuite, but this version passes
> everything cleanly.
>
> Ok for GCC 8? (I know it's early, but might as well get it out in case
> someone wants to try it out)
OK.
Thanks,
James
>
> Thanks,
> Kyrill
>
> 2017-03-08 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
> Emit CBNZ inside loop when doing a strong exchange and comparing
> against zero. Generate the CC flags after the loop.
>
> 2017-03-08 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test.
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 76a2de20dfcd4ea38fb7c58a9e8612509c5987bd..5fa8e197328ce4cb1718ff7d99b1ea95e02129a4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12095,6 +12095,17 @@ aarch64_split_compare_and_swap (rtx operands[])
>    mode = GET_MODE (mem);
>    model = memmodel_from_int (INTVAL (model_rtx));
>
> +  /* When OLDVAL is zero and we want the strong version we can emit a tighter
> +     loop:
> +     .label1:
> +        LD[A]XR rval, [mem]
> +        CBNZ rval, .label2
> +        ST[L]XR scratch, newval, [mem]
> +        CBNZ scratch, .label1
> +     .label2:
> +        CMP rval, 0.  */
> +  bool strong_zero_p = !is_weak && oldval == const0_rtx;
> +
>    label1 = NULL;
>    if (!is_weak)
>      {
> @@ -12111,11 +12122,21 @@ aarch64_split_compare_and_swap (rtx operands[])
>    else
>      aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
>
> -  cond = aarch64_gen_compare_reg (NE, rval, oldval);
> -  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> -  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> -                            gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> -  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +  if (strong_zero_p)
> +    {
> +      x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +                                gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
> +  else
> +    {
> +      cond = aarch64_gen_compare_reg (NE, rval, oldval);
> +      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +                                gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
>
>    aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
>
> @@ -12134,7 +12155,15 @@ aarch64_split_compare_and_swap (rtx operands[])
>      }
>
>    emit_label (label2);
> -
> +  /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
> +     to set the condition flags.  If this is not used it will be removed by
> +     later passes.  */
> +  if (strong_zero_p)
> +    {
> +      cond = gen_rtx_REG (CCmode, CC_REGNUM);
> +      x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
> +      emit_insn (gen_rtx_SET (cond, x));
> +    }
>    /* Emit any final barrier needed for a __sync operation.  */
>    if (is_mm_sync (model))
>      aarch64_emit_post_barrier (model);
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b14a7c294376f03cd13077d18d865f83a04bd04e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int *a)
> +{
> +  int x = 0;
> +  return __atomic_compare_exchange_n (a, &x, 4, 0,
> +                                      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
> +}
> +
> +/* { dg-final { scan-assembler-times "cbnz\\tw\[0-9\]+" 2 } } */
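
For context on the glibc locking pattern mentioned above, here is a minimal
sketch of the kind of source that takes the new splitter path: a strong
compare-exchange whose expected value is zero, as in a simple spinlock
acquire. The function name and the 0 = unlocked / 1 = held encoding are
assumptions made for the illustration; this code is not taken from glibc or
from the patch.

    /* Illustrative only: a try-acquire built on a strong compare-exchange
       against zero.  The lock word encoding here is an assumption for the
       example (0 = unlocked, 1 = held).  */
    static inline int
    try_lock (int *lock)
    {
      int expected = 0;  /* Expected value of zero enables the CBNZ loop.  */
      return __atomic_compare_exchange_n (lock, &expected, 1, /* weak */ 0,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    }

Compiled at -O2 for aarch64 with this patch applied, such a function should
produce the tighter ldaxr/cbnz/stxr/cbnz loop shown earlier, with the cmp
emitted after the loop only where the boolean result is still needed; the
scan-assembler directive in the new test checks for exactly the two resulting
cbnz instructions.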