GCC emits unnecessary fences for expressions involving objects of atomic types that are not (yet) shared across threads. For example, in the two functions below, the objects are not shared with other threads, so the assignments to the atomic variables do not require any fences. Such assignments are commonplace when objects containing atomic variables are being initialized (as in the second function). In comparison, Clang emits no fences for the functions below.

$ cat t.c && /build/gcc-trunk/gcc/xgcc -B /build/gcc-trunk/gcc -O2 -S -Wall -Wextra -o/dev/tty t.c
int foo (void)
{
  _Atomic int i;
  i = 0;
  return i;
}

struct S {
  _Atomic int i;
  int n;
  char data[];
};

extern void* malloc (__SIZE_TYPE__);

struct S* bar (int n)
{
  struct S *s = malloc (sizeof *s + n);
  s->i = 0;
  return s;
}
        .file   "t.c"
        .machine power8
        .abiversion 2
        .section        ".toc","aw"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl foo
        .type   foo, @function
foo:
        sync
        li 9,0
        stw 9,-16(1)
        sync
        lwz 3,-16(1)
        cmpw 7,3,3
        bne- 7,$+4
        isync
        extsw 3,3
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   foo,.-foo
        .align 2
        .p2align 4,,15
        .globl bar
        .type   bar, @function
bar:
0:      addis 2,12,.TOC.-0b@ha
        addi 2,2,.TOC.-0b@l
        .localentry     bar,.-bar
        mflr 0
        addi 3,3,8
        std 0,16(1)
        stdu 1,-32(1)
        bl malloc
        nop
        sync
        li 10,0
        addi 1,1,32
        stw 10,0(3)
        ld 0,16(1)
        mtlr 0
        blr
        .long 0
        .byte 0,0,0,1,128,0,0,0
        .size   bar,.-bar
        .ident  "GCC: (GNU) 6.0.0 20151125 (experimental)"
        .section        .note.GNU-stack,"",@progbits

To do this, GCC needs a pass which changes __atomic_stores to be relaxed stores.

Confirmed. I wonder how Clang does it.
http://llvm.org/devmtg/2014-10/Slides/Morisset-AtomicsPresentation.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html

This means someone should implement this for GCC.
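
Until such a pass exists, the relaxed-store idea from the comment above can be approximated by hand in the source. The following is only a sketch of a possible workaround, not the proposed compiler change: the names foo_relaxed and bar_relaxed are made up for illustration, and the null check and the assignment to n are additions that the original test case does not have. It relies on the standard C11 <stdatomic.h> interfaces atomic_init, atomic_store_explicit and atomic_load_explicit, and is only valid because neither object is visible to another thread at the point of the store.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Same layout as struct S in the test case. */
struct S {
  _Atomic int i;
  int n;
  char data[];
};

/* Hypothetical variant of foo: the object is local and never escapes,
   so relaxed ordering is sufficient for both the store and the load. */
int foo_relaxed (void)
{
  _Atomic int i;
  atomic_store_explicit (&i, 0, memory_order_relaxed);
  return atomic_load_explicit (&i, memory_order_relaxed);
}

/* Hypothetical variant of bar: the object has just been allocated and
   has not been published yet, so atomic_init (which imposes no ordering)
   can replace the plain assignment s->i = 0. */
struct S *bar_relaxed (int n)
{
  struct S *s = malloc (sizeof *s + n);
  if (!s)
    return NULL;          /* added check, not in the original test case */
  atomic_init (&s->i, 0);
  s->n = n;               /* added so the test below has something to read */
  return s;
}
```

Whether this removes every sync instruction from the generated code depends on the target and compiler version, so the assembly output should be checked rather than assumed.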