[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
jgreenhalgh at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Apr 10 15:09:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
--- Comment #12 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
There are two problems here, one of which concerns me more in the real world,
and both of which rely on races if you are in the C/C++11 model. There isn't
much I can do about that, as the __sync primitives are legacy and give stronger
guarantees about "visible memory references" (which include ALL global data).
Consider:
int x = 0;
int y = 0;
int z = 0;

void foo (void)
{
  x = 1;
  __sync_fetch_and_add (&y, 1);
  z = 1;
}
My reading of what a full barrier implies would suggest that we should never
have a modification order which is any of:
z = 1, x = 1, y = 0+1
z = 1, y = 0+1, x = 1
x = 1, z = 1, y = 0+1
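To make those forbidden orders concrete, here is a self-contained sketch (the
observer thread and its acquire loads are my illustration, not part of the
report; the plain stores in foo are racy under C11, which is exactly the point
being made here):

```c
#include <assert.h>
#include <pthread.h>

static int x = 0;
static int y = 0;
static int z = 0;

static void foo (void)
{
  x = 1;
  __sync_fetch_and_add (&y, 1);   /* full barrier in the __sync model */
  z = 1;
}

/* Observer: the acquire load of z orders the subsequent load of x after
   it, so if the full barrier holds, seeing z == 1 implies x == 1 is
   already visible.  If the barrier is too weak, the assert can fire.  */
static void *observer (void *arg)
{
  (void) arg;
  if (__atomic_load_n (&z, __ATOMIC_ACQUIRE) == 1)
    assert (__atomic_load_n (&x, __ATOMIC_ACQUIRE) == 1);
  return 0;
}
```

Under the __sync "visible memory references" guarantee the observer should
never trip the assert; under the weaker half-barrier implementation below, it
can.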
GCC 5.0-ish (recent trunk) will emit:
foo:
        adrp    x3, .LANCHOR0
        mov     w1, 1
        add     x0, x3, :lo12:.LANCHOR0
        add     x2, x0, 4
        str     w1, [x3, #:lo12:.LANCHOR0]
.L2:
        ldaxr   w3, [x2]
        add     w3, w3, w1
        stlxr   w4, w3, [x2]
        cbnz    w4, .L2
        str     w1, [x0, 8]
        ret
Dropping some of the context and switching to pseudo-asm we have:
        str     1, [x]
.L2:
        ldaxr   tmp, [y]
        add     tmp, tmp, 1
        stlxr   flag, tmp, [y]
        cbnz    flag, .L2
        str     1, [z]
As far as I understand it, the memory ordering guarantees of the half-barrier
LDAXR/STLXR instructions do not prevent this being reordered to an execution
which looks like:
        ldaxr   tmp, [y]
        str     1, [z]
        str     1, [x]
        add     tmp, tmp, 1
        stlxr   flag, tmp, [y]
        cbnz    flag, .L2
which gives one of the modification orders we wanted to forbid above. Similar
reordering can give the other undesirable modification orders.
As mentioned, this reordering is permitted under C11, as the stores to x and z
are racy - it is allowed by the CPPMEM/Cambridge formalisation of what a
SEQ_CST operation must do. However, my feeling is that it is at the very least
*surprising* for the __sync primitives to allow this ordering, and more likely
it points to a bug in the AArch64 implementation.
Fixing this requires a hard memory barrier - but I really don't want to see us
penalise C11 SEQ_CST to get the right behaviour out of __sync_fetch_and_add...
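To put that trade-off in source terms, here is a sketch (the function names
are mine, and the trailing __sync_synchronize is one possible repair, not a
committed fix):

```c
#include <assert.h>

static int x, y, z;

/* C11-style version: on AArch64 a SEQ_CST fetch-add maps to the same
   LDAXR/STLXR loop, so the racy neighbouring stores can still migrate
   into the loop as described above - and that is fine for C11.  */
static void foo_atomic (void)
{
  x = 1;
  __atomic_fetch_add (&y, 1, __ATOMIC_SEQ_CST);
  z = 1;
}

/* One possible repair for the __sync semantics: a trailing full barrier
   (DMB ISH on AArch64) keeps the store to z after the atomic update,
   while the release half of the STLXR already keeps the store to x
   before it.  */
static void foo_sync_fixed (void)
{
  x = 1;
  __sync_fetch_and_add (&y, 1);
  __sync_synchronize ();
  z = 1;
}
```

The extra barrier is exactly the penalty the comment above does not want to
impose on code that only ever needed C11 SEQ_CST semantics.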
The second problem relies on a similar argument to show that, in a
__sync_lock_test_and_set operation, we can reorder a later access before the
"set" half of the operation. i.e. that

  __sync_lock_test_and_set (&y, 1);
  x = 1;

can be seen as

  tmp = load_acquire (y)
  store (x, 1)
  store (y, 1)

giving an incorrect modification order.
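As a self-contained illustration of why that order matters (the checker
thread and all names here are my own, not from the report):
__sync_lock_test_and_set is documented as an acquire barrier, so on the
author's reading the later store to x should never become visible before the
set of y.

```c
#include <assert.h>
#include <pthread.h>

static int x = 0;
static int y = 0;

static void *setter (void *arg)
{
  (void) arg;
  __sync_lock_test_and_set (&y, 1);   /* acquire barrier; sets y to 1 */
  x = 1;                              /* should not be seen before y = 1 */
  return 0;
}

/* If the store to x can be reordered before the "set" half of the
   test_and_set, this checker can observe x == 1 while y is still 0,
   tripping the assert.  */
static void *checker (void *arg)
{
  (void) arg;
  if (__atomic_load_n (&x, __ATOMIC_ACQUIRE) == 1)
    assert (__atomic_load_n (&y, __ATOMIC_ACQUIRE) == 1);
  return 0;
}
```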
The AArch64 parts are something I can argue through with smarter people than
me, but I think my comments on the __sync primitives are correct?