[PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.
- From: Matthew Wahab <matthew dot wahab at arm dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 21 May 2015 16:57:00 +0100
- Subject: [PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.
On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers. This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those used for the
__atomic builtins.
The difference between the __sync and __atomic builtins is that the
restrictions imposed by a __sync operation's barrier apply to all memory
references, while the restrictions of an __atomic operation's barrier only need
to apply to a subset. This affects AArch64 in particular because, although its
implementation of the __atomic builtins is correct, the barriers it generates
are too weak for the __sync builtins.
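To make the distinction concrete, here is a minimal sketch (the variable and
result names are made up): both calls request sequentially consistent
behaviour, but the documented __sync semantics constrain all surrounding
memory references, not just accesses to the atomic object itself.

  int counter;

  /* Documented as a full barrier: no memory reference may move across
     the operation in either direction.  */
  int old1 = __sync_fetch_and_add (&counter, 1);

  /* Sequentially consistent, but the ordering requirement only needs
     to cover the atomic accesses themselves.  */
  int old2 = __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);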
The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This patch and a following patch modify the code
generated for these functions to weaken the initial load-acquire to a simple
load and to add a final fence to prevent code-hoisting. The last patch will add
tests for the code generated by the AArch64 backend for the __sync builtins.
- Full barriers: __sync_fetch_and_op, __sync_op_and_fetch and
  __sync_*_compare_and_swap:
    [load-acquire; code; store-release]
  becomes
    [load; code; store-release; fence].
- Acquire barriers: __sync_lock_test_and_set:
    [load-acquire; code; store]
  becomes
    [load; code; store; fence].
The code generated for release barriers and for the __atomic builtins is
unchanged.
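As a concrete sketch of the full-barrier case (register choices are arbitrary
and this is illustrative rather than a verbatim compiler dump), the loop for a
__sync_fetch_and_add on a 32-bit object changes from

  1:  ldaxr  w1, [x0]        // load-acquire exclusive
      add    w2, w1, w3
      stlxr  w4, w2, [x0]    // store-release exclusive
      cbnz   w4, 1b

to

  1:  ldxr   w1, [x0]        // plain exclusive load
      add    w2, w1, w3
      stlxr  w4, w2, [x0]    // store-release exclusive
      cbnz   w4, 1b
      dmb    ish             // final full barrier stops code-hoisting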
This patch changes the code generated for __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.
Tested with check-gcc for aarch64-none-linux-gnu.
Ok for trunk?
Matthew
gcc/
2015-05-21 Matthew Wahab <matthew.wahab@arm.com>
* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial and final barriers.
From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.
---
gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7f0cc0d..778571f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
emit_insn (gen_rtx_SET (bval, x));
}
+/* Emit a post-operation barrier. */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+ const enum memmodel base_model = memmodel_base (model);
+
+ if (is_mm_sync (model)
+ && (base_model == MEMMODEL_ACQUIRE
+ || base_model == MEMMODEL_ACQ_REL
+ || base_model == MEMMODEL_SEQ_CST))
+ {
+ emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+ }
+}
+
/* Split a compare and swap pattern. */
void
@@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
{
machine_mode mode = GET_MODE (mem);
machine_mode wmode = (mode == DImode ? DImode : SImode);
+ const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+ const bool is_sync = is_mm_sync (model);
+ rtx load_model_rtx = model_rtx;
rtx_code_label *label;
rtx x;
label = gen_label_rtx ();
emit_label (label);
+ /* A __sync operation will emit a final fence to stop code hoisting, so the
+ load can be relaxed. */
+ if (is_sync)
+ load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
if (new_out)
new_out = gen_lowpart (wmode, new_out);
if (old_out)
@@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
old_out = new_out;
value = simplify_gen_subreg (wmode, value, mode, 0);
- aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+ aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
switch (code)
{
@@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+ /* Emit any fence needed for a __sync operation. */
+ if (is_sync)
+ aarch64_emit_post_barrier (model);
}
static void
--
1.9.1