Bug 111235 - [Armv7-a]: Control-dependency between atomic accesses removed by -O1.
Summary: [Armv7-a]: Control-dependency between atomic accesses removed by -O1.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 14.0
: P3 normal
Target Milestone: 14.0
Assignee: Wilco
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2023-08-30 09:42 UTC by Luke Geeson
Modified: 2024-05-22 18:39 UTC (History)
4 users (show)

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Geeson 2023-08-30 09:42:04 UTC
Consider the following litmus test that captures bad behaviour:
```
C test

{ *x = 0; *y = 0; } // fixed initial state where x and y are 0

// Concurrent threads
P0 (atomic_int* y,int* x) {
  int r0 = *x;
  if (r0 == 1) {
    atomic_store_explicit(y,1,memory_order_relaxed);
  }
}

P1 (atomic_int* y,int* x) {
  int r0 = atomic_load_explicit(y,memory_order_acquire);
  *x = 1;
}

// predicate over final state - does this outcome occur?
exists (P0:r0=1 /\ P1:r0=1)
```
where 'P0:r0 = 0' means thread P0, local variable r0 has value 0.

When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are:
```
{P0:r0=0; P1:r0=0;}
{P0:r0=1; P1:r0=0;}   
```

When compiling to target armv7-a using either GCC or LLVM trunk (https://godbolt.org/z/nxWTbEfK1), the compiled program has the following outcomes when simulated under the armv7 model:
```
{P0:r0=0; P1:r0=0;}                                           
{P0:r0=1; P1:r0=0;}
{P0:r0=1; P1:r0=1;} <--- forbidden by source model, bug!
```
Since the `CMP;MOVEQ   r1, #1;STREQ` sequence is predicated on the result of CMP, but the STREQ can be reordered before the LDR of x on thread P0, allowing the outcome {P0:r0=1; P1:r0=1;}. 

The issue is that the control dependency between the load and store on P0 has been removed by the compiler. Both GCC and LLVM produce the following (replace symbolic registers e.g. %P0_x with concrete registers containing x).

```
ARM test

{ [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0;
  %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y;
  %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(* gcc -O1 -march=armv7-a -pthread --std=c11 -marm -fno-section-anchors*)
(*                                                               *)
(*****************************************************************)
  P0                   |  P1                   ;
   LDR R2,[%P0_x]      |   LDR R2,[%P1_y]      ;
   CMP R2,#1           |   DMB ISH             ;
   MOVEQ R1,#1         |   MOV R1,#1           ;
   STREQ R1,[%P0_y]    |   STR R1,[%P1_x]      ;
   STR R2,[%P0_P0_r0]  |   STR R2,[%P1_P1_r0]  ;
   BX LR               |   BX LR               ;


exists (P0_r0=1 /\ P1_r0=1) <--- yes, allowed by arm.cat, bug!
```
The forbidden {P0:r0=1; P1:r0=1;} behaviour disappears in GCC when optimising using -O2 or above, since R2 is reused in the STREQ when the EQ condition holds - a data dependency masks the loss of the control dependency. This buggy behaviour remains in clang -O2 and above however (https://godbolt.org/z/hGv7P3vGW).

To fix this - mark relaxed atomic load/stores as not predicated, and ideally use an explicit branch to enforce the control dependency so that it is not lost:
```
ARM test-fixed

{ [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0;
  %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y;
  %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y }

  P0                   |  P1                   ;
   LDR R0,[%P0_x]      |   MOV R2,#1           ;
   CMP R0,#1           |   LDR R0,[%P1_y]      ;
   BNE L0x2c           |   DMB ISH             ;
   MOV R2,#1           |   STR R2,[%P1_x]      ;
   STR R2,[%P0_y]      |   STR R0,[%P1_P1_r0]  ;
  L0x2c:               |                       ;
   STR R0,[%P0_P0_r0]  |                       ;


exists (P0_r0=1 /\ P1_r0=1)
```
This prevents the buggy behaviour.

We will follow up by reporting the bug for GCC.
Comment 1 Luke Geeson 2023-08-30 09:42:59 UTC
Correction 
We will follow up by reporting the bug for GCC.
->
We have reported this in LLVM as well
Comment 2 GCC Commits 2023-10-02 15:07:33 UTC
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:0731889c026bfe8d55c4851422ca5ec9d037f7a0

commit r14-4365-g0731889c026bfe8d55c4851422ca5ec9d037f7a0
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Fri Sep 29 13:21:10 2023 +0100

    Arm: Block predication on atomics [PR111235]
    
    The v7 memory ordering model allows reordering of conditional atomic
    instructions.  To avoid this, make all atomic patterns unconditional.
    Expand atomic loads and stores for all architectures so the memory access
    can be wrapped into an UNSPEC.
    
    Reviewed-by: Ramana Radhakrishnan <ramana.gcc@googlemail.com>
    
    gcc/ChangeLog:
            PR target/111235
            * config/arm/constraints.md: Remove Pf constraint.
            * config/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
            (arm_atomic_load_acquire<mode>): Likewise.
            (arm_atomic_store<mode>): Likewise.
            (arm_atomic_store_release<mode>): Likewise.
            (atomic_load<mode>): Switch patterns to define_expand.
            (atomic_store<mode>): Likewise.
            (arm_atomic_loaddi2_ldrd): Remove predication.
            (arm_load_exclusive<mode>): Likewise.
            (arm_load_acquire_exclusive<mode>): Likewise.
            (arm_load_exclusivesi): Likewise.
            (arm_load_acquire_exclusivesi): Likewise.
            (arm_load_exclusivedi): Likewise.
            (arm_load_acquire_exclusivedi): Likewise.
            (arm_store_exclusive<mode>): Likewise.
            (arm_store_release_exclusivedi): Likewise.
            (arm_store_release_exclusive<mode>): Likewise.
            * config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.
    
    gcc/testsuite/ChangeLog:
            PR target/111235
            * gcc.dg/rtl/arm/stl-cond.c: Remove test.
            * gcc.target/arm/atomic_loaddi_7.c: Fix dmb count.
            * gcc.target/arm/atomic_loaddi_8.c: Likewise.
            * gcc.target/arm/pr111235.c: Add new test.
Comment 3 Sam James 2023-10-02 15:39:40 UTC
(In reply to Luke Geeson from comment #1)
> Correction 
> We will follow up by reporting the bug for GCC.
> ->
> We have reported this in LLVM as well

Could you include the link? Thanks.
Comment 4 Luke Geeson 2023-10-03 13:19:11 UTC
Hi there,
Apologies here you go:
https://github.com/llvm/llvm-project/issues/65106
Comment 5 Luke Geeson 2023-10-25 16:10:20 UTC
Thank you for fixing this, Wilco! I will now test this bug using the code emitted by Godbolt. 

Consider again the same source program. When run through gcc with the same flags we get the ARM assembly test:
```
ARM test

{ [P0_r0]=0;[P1_r0]=0;[x]=0;[y]=0;
  %P0_P0_r0=P0_r0;%P0_x=x;%P0_y=y;
  %P1_P1_r0=P1_r0;%P1_x=x;%P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(* gcc -O1 -march=armv7-a -pthread --std=c11 -marm -fno-section-anchors*)
(*                                                               *)
(*****************************************************************)
  P0                   |  P1                   ;
   LDR R2,[%P0_x]      |   LDR R2,[%P1_y]      ;
   CMP R2,#1           |   DMB ISH             ;
   BEQ L3              |   MOV R1,#1           ;
L2: STR R2, [%P0_P0_r0]|   STR R1, [%P1_x]     ;
   BX LR               |   STR R2, [%P1_P1_r0] ;
L3: MOV R1, #1         |                       ;
   STR R1, [%P0_y]     |                       ;
   B L2                |                       ;


exists (P0_r0=1 /\ P1_r0=1)
```
Which under the arm model has the outcomes:
```
[P0_r0]=0; [P1_r0]=0;
[P0_r0]=1; [P1_r0]=0;
```

As you can see the buggy behaviour has gone away, the bug is fixed.
Comment 6 Wilco 2023-10-31 16:42:24 UTC
Fixed