I looked up similar bugs, but I could not quite understand why GCC redirects 128-bit cmpxchg on x86-64 to libatomic even when the '-mcx16' flag is specified, especially since the similar cmpxchg8b for x86 (32-bit) is still generated inline without redirecting to libatomic. Bug 80878 mentioned something about read-only memory, but that should only apply to atomic_load, not atomic_compare_and_exchange. Right? It is especially annoying because libatomic does not guarantee lock-freedom, so these functions become useless in many cases. This compiler behavior is also inconsistent with clang.

For instance, for the following code:

#include <stdatomic.h>

__uint128_t cmpxhg_weak(_Atomic(__uint128_t) * obj, __uint128_t * expected, __uint128_t desired)
{
        return atomic_compare_exchange_weak(obj, expected, desired);
}

GCC generates (gcc -std=c11 -mcx16 -Wall -O2 -S test.c):

cmpxhg_weak:
        subq    $8, %rsp
        movl    $5, %r9d
        movl    $5, %r8d
        call    __atomic_compare_exchange_16@PLT
        xorl    %edx, %edx
        movzbl  %al, %eax
        addq    $8, %rsp
        ret

while clang/llvm generates code which is obviously lock-free:

cmpxhg_weak:                            # @cmpxhg_weak
        pushq   %rbx
        movq    %rdx, %r8
        movq    (%rsi), %rax
        movq    8(%rsi), %rdx
        xorl    %r9d, %r9d
        movq    %r8, %rbx
        lock    cmpxchg16b (%rdi)
        sete    %cl
        je      .LBB0_2
        movq    %rax, (%rsi)
        movq    %rdx, 8(%rsi)
.LBB0_2:
        movb    %cl, %r9b
        xorl    %edx, %edx
        movq    %r9, %rax
        popq    %rbx
        retq

However, for 32-bit targets GCC still generates cmpxchg8b inline:

#include <stdatomic.h>
#include <inttypes.h>

uint64_t cmpxhg_weak(_Atomic(uint64_t) * obj, uint64_t * expected, uint64_t desired)
{
        return atomic_compare_exchange_weak(obj, expected, desired);
}

gcc -std=c11 -m32 -Wall -O2 -S test.c

cmpxhg_weak:
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %esi
        movl    24(%esp), %ebx
        movl    28(%esp), %ecx
        movl    16(%esp), %edi
        movl    (%esi), %eax
        movl    4(%esi), %edx
        lock cmpxchg8b  (%edi)
        movl    %edx, %ecx
        movl    %eax, %edx
        sete    %al
        je      .L2
        movl    %edx, (%esi)
        movl    %ecx, 4(%esi)
.L2:
        popl    %ebx
        movzbl  %al, %eax
        xorl    %edx, %edx
        popl    %esi
        popl    %edi
        ret
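For completeness: the operation GCC refuses to inline here can of course be written by hand with inline asm when cmpxchg16b is known to be available. A minimal sketch (the cas16b helper and its exact constraints are my own illustration, not anything GCC emits):

/* Sketch only: a hand-written 16-byte compare-and-exchange for x86-64.
   Requires a CPU with cmpxchg16b and a 16-byte-aligned object
   (__uint128_t is naturally 16-byte aligned).  The "=@ccz" flag-output
   constraint needs GCC 6 or newer. */
#include <stdbool.h>
#include <stdint.h>

static inline bool
cas16b(volatile __uint128_t *obj, __uint128_t *expected, __uint128_t desired)
{
        bool ok;
        uint64_t exp_lo = (uint64_t)*expected;
        uint64_t exp_hi = (uint64_t)(*expected >> 64);

        /* cmpxchg16b compares rdx:rax with the memory operand and, on
           success, stores rcx:rbx there and sets ZF; on failure it loads
           the current value into rdx:rax and clears ZF. */
        __asm__ __volatile__ ("lock cmpxchg16b %[mem]"
                              : [mem] "+m" (*obj),
                                "+a" (exp_lo), "+d" (exp_hi),
                                "=@ccz" (ok)
                              : "b" ((uint64_t)desired),
                                "c" ((uint64_t)(desired >> 64))
                              : "memory");
        if (!ok)
                *expected = ((__uint128_t)exp_hi << 64) | exp_lo;
        return ok;
}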
IIRC this was done because there is no 128-bit atomic load/store, and no way to do it in a backwards-compatible way.
Yes, but not having atomic_load is far less of an issue. Oftentimes, algorithms that use 128-bit types can simply use compare_and_exchange alone (at least on x86-64).
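For example, a 128-bit load can always be emulated with compare_and_exchange alone; a rough sketch (the load128 name is just for illustration):

/* Sketch only: emulate a 128-bit atomic load using compare_exchange.
   Note that this always performs a write cycle on the object, so it
   cannot be used on read-only memory -- which is exactly the
   limitation discussed in this bug. */
#include <stdatomic.h>

static inline __uint128_t
load128(_Atomic(__uint128_t) *obj)
{
        __uint128_t expected = 0;

        /* If *obj == 0, this "replaces" 0 with 0; otherwise the CAS
           fails and stores the current value into 'expected'.  Either
           way 'expected' ends up holding the loaded value. */
        atomic_compare_exchange_strong(obj, &expected, 0);
        return expected;
}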
(In reply to Ruslan Nikolaev from comment #2)
> Yes, but not having atomic_load is far less of an issue. Oftentimes,
> algorithms that use 128-bit types can simply use compare_and_exchange alone
> (at least on x86-64).

In other words, can atomic_load be redirected to libatomic while compare_exchange is still generated directly (if -mcx16 is specified)?
I guess in this case you would have to fall back to a lock-based implementation for everything. But does C11 even require that atomic_load work on read-only memory?
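To make the read-only case concrete, here is the kind of code I assume bug 80878 is worried about (my own example, not taken from that report; using the __atomic builtin directly to keep it short):

/* Sketch only: a const object with static storage duration is normally
   placed in .rodata.  A 128-bit atomic load implemented via
   lock cmpxchg16b would fault here, because cmpxchg16b always issues a
   write to the object, even when the value does not change.  libatomic
   avoids that by not writing to the object. */
static const __uint128_t config = 42;

__uint128_t read_config(void)
{
        return __atomic_load_n(&config, __ATOMIC_SEQ_CST);
}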
(In reply to Andrew Pinski from comment #1)
> IIRC this was done because there is no 128-bit atomic load/store, and no
> way to do it in a backwards-compatible way.

After more thinking about it... Should it not be controlled by some flag (similar to -mcx16, which enables cmpxchg16b)? This flag would basically say that atomic_load on 128-bit types will not work on read-only memory. I think that is better than unconditionally disabling the lock-free implementation for 128-bit types in C11 (which is useful in a number of cases) just to accommodate the rare cases when memory accesses must be read-only. It would also be more portable and compatible with other compilers such as clang.
Dup of bug 80878.

*** This bug has been marked as a duplicate of bug 80878 ***