Consider the following litmus test that captures bad behaviour: ``` C test { *x = 0; *y = 0; } // fixed initial state where x and y are 0 // Concurrent threads P0 (atomic_int* y,int* x) { int r0 = *x; if (r0 == 1) { atomic_store_explicit(y,1,memory_order_relaxed); } } P1 (atomic_int* y,int* x) { int r0 = atomic_load_explicit(y,memory_order_acquire); *x = 1; } // predicate over final state - does this outcome occur? exists (P0:r0=1 /\ P1:r0=1) ``` where 'P0:r0 = 0' means thread P0, local variable r0 has value 0. When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are: ``` {P0:r0=0; P1:r0=0;} {P0:r0=1; P1:r0=0;} ``` When compiling to target armv7-a using either GCC or LLVM trunk (https://godbolt.org/z/nxWTbEfK1), the compiled program has the following outcomes when simulated under the armv7 model: ``` {P0:r0=0; P1:r0=0;} {P0:r0=1; P1:r0=0;} {P0:r0=1; P1:r0=1;} <--- forbidden by source model, bug! ``` Since the `CMP;MOVEQ r1, #1;STREQ` sequence is predicated on the result of CMP, but the STREQ can be reordered before the LDR of x on thread P0, allowing the outcome {P0:r0=1; P1:r0=1;}. The issue is that the control dependency between the load and store on P0 has been removed by the compiler. Both GCC and LLVM produce the following (replace symbolic registers e.g. %P0_x with concrete registers containing x). ``` ARM test { [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0; %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y; %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y } (*****************************************************************) (* the Telechat toolsuite *) (* *) (* Luke Geeson, University College London, UK. *) (* gcc -O1 -march=armv7-a -pthread --std=c11 -marm -fno-section-anchors*) (* *) (*****************************************************************) P0 | P1 ; LDR R2,[%P0_x] | LDR R2,[%P1_y] ; CMP R2,#1 | DMB ISH ; MOVEQ R1,#1 | MOV R1,#1 ; STREQ R1,[%P0_y] | STR R1,[%P1_x] ; STR R2,[%P0_P0_r0] | STR R2,[%P1_P1_r0] ; BX LR | BX LR ; exists (P0_r0=1 /\ P1_r0=1) <--- yes, allowed by arm.cat, bug! ``` The forbidden {P0:r0=1; P1:r0=1;} behaviour disappears in GCC when optimising using -O2 or above, since R2 is reused in the STREQ when the EQ condition holds - a data dependency masks the loss of the control dependency. This buggy behaviour remains in clang -O2 and above however (https://godbolt.org/z/hGv7P3vGW). To fix this - mark relaxed atomic load/stores as not predicated, and ideally use an explicit branch to enforce the control dependency so that it is not lost: ``` ARM test-fixed { [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0; %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y; %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y } P0 | P1 ; LDR R0,[%P0_x] | MOV R2,#1 ; CMP R0,#1 | LDR R0,[%P1_y] ; BNE L0x2c | DMB ISH ; MOV R2,#1 | STR R2,[%P1_x] ; STR R2,[%P0_y] | STR R0,[%P1_P1_r0] ; L0x2c: | ; STR R0,[%P0_P0_r0] | ; exists (P0_r0=1 /\ P1_r0=1) ``` This prevents the buggy behaviour. We will follow up by reporting the bug for GCC.
Correction We will follow up by reporting the bug for GCC. -> We have reported this in LLVM as well
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: https://gcc.gnu.org/g:0731889c026bfe8d55c4851422ca5ec9d037f7a0 commit r14-4365-g0731889c026bfe8d55c4851422ca5ec9d037f7a0 Author: Wilco Dijkstra <wilco.dijkstra@arm.com> Date: Fri Sep 29 13:21:10 2023 +0100 Arm: Block predication on atomics [PR111235] The v7 memory ordering model allows reordering of conditional atomic instructions. To avoid this, make all atomic patterns unconditional. Expand atomic loads and stores for all architectures so the memory access can be wrapped into an UNSPEC. Reviewed-by: Ramana Radhakrishnan <ramana.gcc@googlemail.com> gcc/ChangeLog: PR target/111235 * config/arm/constraints.md: Remove Pf constraint. * config/arm/sync.md (arm_atomic_load<mode>): Add new pattern. (arm_atomic_load_acquire<mode>): Likewise. (arm_atomic_store<mode>): Likewise. (arm_atomic_store_release<mode>): Likewise. (atomic_load<mode>): Switch patterns to define_expand. (atomic_store<mode>): Likewise. (arm_atomic_loaddi2_ldrd): Remove predication. (arm_load_exclusive<mode>): Likewise. (arm_load_acquire_exclusive<mode>): Likewise. (arm_load_exclusivesi): Likewise. (arm_load_acquire_exclusivesi): Likewise. (arm_load_exclusivedi): Likewise. (arm_load_acquire_exclusivedi): Likewise. (arm_store_exclusive<mode>): Likewise. (arm_store_release_exclusivedi): Likewise. (arm_store_release_exclusive<mode>): Likewise. * config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR. gcc/testsuite/ChangeLog: PR target/111235 * gcc.dg/rtl/arm/stl-cond.c: Remove test. * gcc.target/arm/atomic_loaddi_7.c: Fix dmb count. * gcc.target/arm/atomic_loaddi_8.c: Likewise. * gcc.target/arm/pr111235.c: Add new test.
(In reply to Luke Geeson from comment #1) > Correction > We will follow up by reporting the bug for GCC. > -> > We have reported this in LLVM as well Could you include the link? Thanks.
Hi there, Apologies here you go: https://github.com/llvm/llvm-project/issues/65106
Thank you for fixing this, Wilco! I will now test this bug using the code emitted by Godbolt. Consider again the same source program. When run through gcc with the same flags we get the ARM assembly test: ``` ARM test { [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0; %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y; %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y } (*****************************************************************) (* the Telechat toolsuite *) (* *) (* Luke Geeson, University College London, UK. *) (* gcc -O1 -march=armv7-a -pthread --std=c11 -marm -fno-section-anchors*) (* *) (*****************************************************************) P0 | P1 ; LDR R2,[%P0_x] | LDR R2,[%P1_y] ; CMP R2,#1 | DMB ISH ; BEQ L3 | MOV R1,#1 ; L2: STR R2, [%P0_P0_r0]| STR R1, [%P1_x] ; BX LR | STR R2, [%P1_P1_r0] ; L3: MOV R1, #1 | ; STR R1, [%P0_y] | ; B L2 | ; exists (P0_r0=1 /\ P1_r0=1) ``` Which under the arm model has the outcomes: ``` [P0_r0]=0; [P1_r0]=0; [P0_r0]=1; [P1_r0]=0; ``` As you can see the buggy behaviour has gone away, the bug is fixed.
Fixed