[Bug middle-end/51766] [4.7 regression] sync_fetch_and_xxx atomicity

Mon Jan 9 16:51:00 GMT 2012

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51766

--- Comment #3 from David Edelsohn <dje at gcc dot gnu.org> 2012-01-09 16:49:10 UTC ---
> It says above them "In most cases, these
> builtins are considered a full barrier." and only __sync_lock_test_and_set and
> __sync_lock_release specify different barrier semantics.

The next sentence is:

"That is, no memory operand will be moved across the operation, either forward
or backward."

Note that this refers to memory operands, not memory operations (the memory
stores and memory loads referenced in the documentation of the other sync
builtins). In other words, one could interpret "full memory barrier" as:

asm volatile ("" : : : "memory");

which is only a GCC scheduling barrier.
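
To make the two readings of "full barrier" concrete, here is a minimal sketch
in C. The names payload, ready, publish_compiler_only, publish_full and
run_barrier_demo are hypothetical and chosen only for illustration. The first
helper is the empty asm statement quoted above, which only constrains GCC's own
scheduling. The second uses __sync_synchronize(), which is documented to issue
a full memory barrier. The demo is single-threaded, so it shows the two forms
side by side and does not demonstrate an actual reordering.

```c
#include <assert.h>

/* Hypothetical shared variables, for illustration only. */
static int payload;
static int ready;

/* Compiler-only barrier: GCC will not move memory accesses across it,
   but no barrier instruction is emitted for the hardware. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Full barrier via the legacy __sync builtin: also constrains the hardware. */
static inline void hardware_barrier(void)
{
    __sync_synchronize();
}

void publish_compiler_only(int v)
{
    payload = v;
    compiler_barrier();   /* store order fixed in the generated code only */
    ready = 1;
}

void publish_full(int v)
{
    payload = v;
    hardware_barrier();   /* store order also enforced at run time */
    ready = 1;
}

/* Single-threaded functional check; returns the last published value. */
int run_barrier_demo(void)
{
    ready = 0;
    publish_compiler_only(41);
    assert(ready == 1 && payload == 41);

    ready = 0;
    publish_full(42);
    assert(ready == 1 && payload == 42);
    return payload;
}
```

Comparing the assembly produced with -S for the two publish functions on a
weakly ordered target should show a barrier instruction only in publish_full.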

The GCC documentation references Intel processors, which do not have a
distinction between instructions for RELEASE, ACQ_REL and SEQ_CST semantics.

The basic problem is that the GCC builtins and atomic instruction semantics
were designed for Intel processors that do not provide the level of granularity
implemented in POWER processors.  The POWER port implemented lighter weight
ACQ_REL semantics.  Retrofitting the original builtins onto the new C++11 memory
model semantics and imposing SEQ_CST interpretation has changed the behavior
and performance on POWER, but not on other targets.
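
As a rough illustration of the granularity at issue, the sketch below puts the
legacy builtin next to the GCC 4.7 __atomic builtin with two different memory
orders. The names counter, add_legacy, add_acq_rel, add_seq_cst and
run_fetch_add_demo are hypothetical. All three functions give the same result;
what can differ is the generated code. On x86 one would expect the same locked
instruction for all three, while on POWER the ACQ_REL and SEQ_CST variants may
be bracketed by barriers of different weight (lwsync versus sync, as far as I
understand the port).

```c
#include <assert.h>

/* Hypothetical shared counter, for illustration only. */
static int counter;

/* Legacy builtin: documented as a full barrier, no memory-order argument. */
int add_legacy(int n)
{
    return __sync_fetch_and_add(&counter, n);
}

/* GCC 4.7 builtin with the lighter-weight acquire/release ordering. */
int add_acq_rel(int n)
{
    return __atomic_fetch_add(&counter, n, __ATOMIC_ACQ_REL);
}

/* Same operation with sequentially consistent ordering. */
int add_seq_cst(int n)
{
    return __atomic_fetch_add(&counter, n, __ATOMIC_SEQ_CST);
}

/* Single-threaded check: each call returns the value before the add. */
int run_fetch_add_demo(void)
{
    counter = 0;
    int a = add_legacy(1);    /* counter: 0 -> 1 */
    int b = add_acq_rel(2);   /* counter: 1 -> 3 */
    int c = add_seq_cst(3);   /* counter: 3 -> 6 */
    assert(a == 0 && b == 1 && c == 3);
    return counter;
}
```

The question in this bug is which of the last two forms add_legacy should be
treated as equivalent to.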