Bug 111416 - [Armv7/v8 Mixing Bug]: 64-bit Sequentially Consistent Load can be Reordered before Store of RMW when v7 and v8 Implementations are Mixed
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 14.0
Importance: P3 normal
Target Milestone: 14.0
Assignee: Wilco
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-14 13:17 UTC by Luke Geeson
Modified: 2023-10-31 16:52 UTC
CC List: 3 users

See Also:
Host:
Target: arm-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-09-14 00:00:00


Description Luke Geeson 2023-09-14 13:17:55 UTC
Consider the following litmus test that has buggy behaviour:
```
C test

{ int64_t x = 0; int64_t y = 0 }

P0 (_Atomic int64_t *x, _Atomic int64_t *y) {
  atomic_fetch_add_explicit(x,1,memory_order_seq_cst);
  int64_t r0 = atomic_load_explicit(y,memory_order_seq_cst);
}

P1 (_Atomic int64_t *x, _Atomic int64_t *y) {
  atomic_store_explicit(y,1,memory_order_seq_cst);
  int64_t r0 = atomic_load_explicit(x,memory_order_seq_cst);
}

exists P0:r0 = 0 /\ P1:r0 = 0
```
where 'P0:r0 = 0' means that thread P0's local variable r0 has the value 0.
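
For readers who want to experiment outside a litmus-test simulator, the same store-buffering pattern can be written as a plain C11 program. This is a minimal sketch (the pthread harness and variable names are mine, not part of the original test); note that running it on a single, correct implementation will not reproduce the bug, which requires the mixed-compiler setup described below:
```
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

static _Atomic int64_t x, y;   /* both start at 0 */
static int64_t p0_r0, p1_r0;   /* observed values */

static void *P0(void *arg)
{
  (void)arg;
  atomic_fetch_add_explicit(&x, 1, memory_order_seq_cst);
  p0_r0 = atomic_load_explicit(&y, memory_order_seq_cst);
  return NULL;
}

static void *P1(void *arg)
{
  (void)arg;
  atomic_store_explicit(&y, 1, memory_order_seq_cst);
  p1_r0 = atomic_load_explicit(&x, memory_order_seq_cst);
  return NULL;
}

int main(void)
{
  pthread_t t0, t1;
  pthread_create(&t0, NULL, P0, NULL);
  pthread_create(&t1, NULL, P1, NULL);
  pthread_join(t0, NULL);
  pthread_join(t1, NULL);
  /* The C/C++ model forbids p0_r0 == 0 && p1_r0 == 0. */
  printf("P0:r0=%lld P1:r0=%lld\n",
         (long long)p0_r0, (long long)p1_r0);
  return 0;
}
```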

When this test is simulated under the C/C++ memory model from its initial state, the outcome named in the exists clause is forbidden. The allowed outcomes are:
```
{ P0:r0=0; P1:r0=1; }
{ P0:r0=1; P1:r0=0; }
{ P0:r0=1; P1:r0=1; }
```
Now compile the test with trunk compilers as follows: P1 for armv7-a cortex-a53 using clang (https://godbolt.org/z/efGnsa19G), the fetch_add on P0 for cortex-a53 using clang (an `ldaexd;add;stlexd` loop), and the load on P0 for cortex-a15 using GCC (`ldrd;dmb`). The compiled program has the following outcomes when simulated under the aarch32 model:
```
{ P0:r0=0; P1:r0=0; } <--- Forbidden by source model, bug!
{ P0:r0=0; P1:r0=1; }
{ P0:r0=1; P1:r0=0; }
{ P0:r0=1; P1:r0=1; }
```
The forbidden outcome arises because the LDRD on P0 can be reordered before the STLEXD on P0: there is no dmb barrier between them to prevent the reordering.

Since Armv7 has no acquire-load instruction, we propose to fix the bug by emitting a fence before the ldrd:
```
dmb ish; ldrd; dmb ish
```
This prevents the buggy outcome under the aarch32 memory model.
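
At the source level, the proposed mapping corresponds to bracketing the load with full fences. The helper below is hypothetical (it is not GCC's implementation, which lives in the Arm backend's codegen), and it assumes the relaxed 64-bit load lowers to a single ldrd, which is single-copy atomic on LPAE cores such as cortex-a15:
```
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical source-level rendering of the proposed sequence
   for a 64-bit seq_cst load on Armv7: dmb ish; ldrd; dmb ish. */
static inline int64_t seq_cst_load64(_Atomic int64_t *p)
{
  atomic_thread_fence(memory_order_seq_cst);                  /* dmb ish */
  int64_t v = atomic_load_explicit(p, memory_order_relaxed);  /* ldrd    */
  atomic_thread_fence(memory_order_seq_cst);                  /* dmb ish */
  return v;
}
```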

I validated this bug in discussion with Wilco from Arm's compiler team.

This bug would not have been caught in normal execution; it only shows up when multiple implementations (here, Armv7 and Armv8 mappings) are mixed together.
Comment 1 Wilco 2023-09-14 16:58:58 UTC
This will be fixed by https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629607.html
Comment 2 Wilco 2023-10-31 16:50:39 UTC
Fixed by commit r14-4365-g0731889c026bfe8d55c4851422ca5ec9d037f7a0 

#include <stdatomic.h>
#include <stdint.h>

int64_t f (_Atomic int64_t *p)
{
  return atomic_load (p);
}

now generates with -O2 -mcpu=cortex-a15:

        dmb     ish
        ldrd    r0, r1, [r0]
        dmb     ish
        bx      lr
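
The leading dmb ish is the barrier that was previously missing: it orders the ldrd after earlier stores such as the stlexd of a fetch_add, ruling out the forbidden outcome above. To inspect the output yourself, any Arm cross compiler should do, e.g. `arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a15 -S` (the exact toolchain name will vary with your setup).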