[Bug target/111246] New: PPC64 Sequentially Consistent Load allows Reordering of Stores
luke.geeson at cs dot ucl.ac.uk
gcc-bugzilla@gcc.gnu.org
Wed Aug 30 20:23:04 GMT 2023
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111246
Bug ID: 111246
Summary: PPC64 Sequentially Consistent Load allows Reordering
of Stores
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luke.geeson at cs dot ucl.ac.uk
Target Milestone: ---
Consider the following litmus test that captures bad behaviour:
```
C test
{ *x = 0; *y = 0; } // fixed initial state where x and y are 0
// Concurrent threads
P0 (atomic_int* y,atomic_int* x) {
atomic_store_explicit(x,2,memory_order_relaxed);
atomic_thread_fence(memory_order_consume);
atomic_store_explicit(y,1,memory_order_seq_cst);
}
P1 (atomic_int* y,atomic_int* x) {
int r0 = atomic_load_explicit(y,memory_order_seq_cst);
atomic_thread_fence(memory_order_relaxed);
atomic_store_explicit(x,1,memory_order_relaxed);
}
exists ([x]=2 /\ 1:r0=1)
```
where 'P0:r0 = 0' means thread P0, local variable r0 has value 0. [x] means x
is a global.
When simulating this test under the C/C++ model from its initial state, the
outcome of execution in the exists clause is forbidden by the source model. The
allowed outcomes are:
```
{ P1:r0=0; [x]=1; }
{ P1:r0=0; [x]=2; }
{ P1:r0=1; [x]=1; }
```
When compiling to target PPC64le with GCC trunk `-c -g -O1 -pthread --std=c11
-ffreestanding -mabi=elfv1` (https://godbolt.org/z/xj9W7nr9G), the compiled
program has the following outcomes when simulated under the PPC model:
```
{ P1:r0=0; [x]=1; }
{ P1:r0=0; [x]=2; }
{ P1:r0=1; [x]=1; }
{ P1:r0=1; [x]=2; } <--- forbidden by source model, bug!
```
which occurs since the `b L0x50; isync` pattern allows the memory effect of the
load of y on thread P1 to be reordered after the store to x on P1. We observe
the execution:
{ P1:r0=0;[x]=0} -> P1:W[x]=1 -> P0:W[x]=2 -> P0:W[y]=1 -> P1:R[y]=0 -> {
P1:r0=1; [x]=2; }
where the local variable `r0` is stored in register r9 in the compiled code.
The problem is isync forces the instructions to finish, but not their memory
effects. Hence the forbidden outcome { P1:r0=1; [x]=2; } is allowed under the
ppc model.
```
PPC test
{ [P1_r0]=0;[x]=0;[y]=0;
uint64_t %P0_x=x; uint64_t %P0_y=y;
uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
uint64_t %P1_y=y }
(*****************************************************************)
(* the Telechat toolsuite *)
(* *)
(* Luke Geeson, University College London, UK. *)
(* *)
(*****************************************************************)
P0 | P1 ;
li r10,2 | sync ;
stw r10,0(%P0_x) | lwz r9,0(%P1_y) ;
lwsync | cmpw r9,r9 ;
sync | b L0x50 ;
li r10,1 | L0x50: isync ;
stw r10,0(%P0_y) | li r8,1 ;
| stw r8,0(%P1_x) ;
| stw r9,0(%P1_P1_r0) ;
exists ([x]=2 /\ P1_r0=1) // YES under PPC model
```
We have observed this behaviour with several hundred Message Passing and 'S'
pattern tests (the above is an S pattern test).
To fix this, we advise replacing the isync with lwfence to forbid the buggy
behaviour:
```
PPC test-fixed
{ [P1_r0]=0;[x]=0;[y]=0;
uint64_t %P0_x=x; uint64_t %P0_y=y;
uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
uint64_t %P1_y=y }
(*****************************************************************)
(* the Telechat toolsuite *)
(* *)
(* Luke Geeson, University College London, UK. *)
(* *)
(*****************************************************************)
P0 | P1 ;
li r10,2 | sync ;
stw r10,0(%P0_x) | lwz r9,0(%P1_y) ;
lwsync | cmpw r9,r9 ;
sync | b L0x50 ;
li r10,1 | L0x50: lwfence ;
stw r10,0(%P0_y) | li r8,1 ;
| stw r8,0(%P1_x) ;
| stw r9,0(%P1_P1_r0) ;
exists ([x]=2 /\ P1_r0=1) // NO under PPC model
```
which no longer allows the buggy outcome under the ppc model:
```
{ P1:r0=0; [x]=1; }
{ P1:r0=0; [x]=2; }
{ P1:r0=1; [x]=1; }
```
I'm happy to discuss this as needed.
More information about the Gcc-bugs
mailing list