Consider the following litmus test that has buggy behaviour: ``` C test { int64_t x = 0; int64_t y = 0 } P0 (_Atomic int64_t *x, _Atomic int64_t *y) { atomic_fetch_add_explicit(x,1,memory_order_seq_cst); int32_t r0 = atomic_load_explicit(y,memory_order_seq_cst); } void P1 (_Atomic int64_t *x, _Atomic int64_t *y) { atomic_store_explicit(y,1,memory_order_seq_cst); int32_t r0 = atomic_load_explicit(x,memory_order_seq_cst); } exists P0:r0 = 0 /\ P1:r0 = 0 ``` where 'P0:r0 = 0' means thread P0, local variable r0 has value 0 When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are: ``` { P0:r0=0; P1:r0=1; } { P0:r0=1; P1:r0=0; } { P0:r0=1; P1:r0=1; } ``` When compiling P1, to target armv7-a cortex-a53 (https://godbolt.org/z/efGnsa19G) using clang trunk, compiling the fetch_add on P0 to target a cortex-a53 using clang trunk (`ldaexd;add;stlexd` loop), and the load on P0 to target a cortex-a15 (`ldrd;dmb`) using GCC trunk for cortex-a15. The compiled program has the following outcomes when simulated under the aarch32 model: ``` { P0:r0=0; P1:r0=0; } <--- Forbidden by source model, bug! { P0:r0=0; P1:r0=1; } { P0:r0=1; P1:r0=0; } { P0:r0=1; P1:r0=1; } ``` which is due to the fact the LDRD on P0 can be reordered befofre the stlexd on P0 since there is no dmb barrier to prevent the reordering. Since there is no acquire load on armv7, we propose to fix the bug by adding a fence before the ldrd: ``` dmb ish; ldrd; dmb ish ``` Which prevents the buggy outcome under the aarch32 memory model. I have validated this bug whilst discussing with Wilco from Arm's compiler teams. This bug would not have been caught in normal execution, but only when multiple implementations are mixed together.
This will be fixed by https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629607.html
Fixed by commit r14-4365-g0731889c026bfe8d55c4851422ca5ec9d037f7a0 #include <stdatomic.h> #include <stdint.h> int64_t f (_Atomic int64_t *p) { return atomic_load (p); } now generates with -O2 -mcpu=cortex-a15: dmb ish ldrd r0, r1, [r0] dmb ish bx lr