Bug 84522 - GCC does not generate cmpxchg16b when mcx16 is used
Summary: GCC does not generate cmpxchg16b when mcx16 is used
Status: RESOLVED DUPLICATE of bug 80878
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: unknown
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-22 20:43 UTC by Ruslan Nikolaev
Modified: 2018-03-29 09:31 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Description Ruslan Nikolaev 2018-02-22 20:43:03 UTC
I looked up similar bugs, but I could not quite understand why GCC redirects 128-bit compare-and-exchange to libatomic on x86-64 even when the '-mcx16' flag is specified, especially since the similar cmpxchg8b on x86 (32-bit) is still used without redirecting to libatomic.

Bug 80878 mentioned something about read-only memory, but that should only apply to atomic_load, not atomic_compare_and_exchange. Right?
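
For background, a minimal sketch of why read-only memory matters even for loads (the helper name load_u128 is made up for illustration): without a 16-byte atomic load instruction, a 128-bit atomic load has to be emulated with a compare-and-exchange, and cmpxchg16b writes to its memory operand regardless of whether the comparison succeeds, so such a load faults on read-only memory.

#include <stdint.h>

/* Illustrative only: emulating a 16-byte atomic load with a CAS.
   Even when the stored value does not change, cmpxchg16b issues a
   write to the destination, so this cannot be used on memory that is
   mapped read-only -- the concern raised in bug 80878. */
static __uint128_t load_u128(__uint128_t *src)
{
        __uint128_t expected = 0;
        /* If *src != 0 the CAS fails and 'expected' is updated to the
           current value; if *src == 0 it "succeeds" by storing 0 back.
           Either way the returned value is the current contents. */
        __atomic_compare_exchange_n(src, &expected, 0, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;
}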

It is especially annoying because libatomic does not guarantee lock-freedom; as a result, these functions become useless in many cases.
This compiler behavior is also inconsistent with clang.

For instance, for the following code:

#include <stdatomic.h>

__uint128_t cmpxhg_weak(_Atomic(__uint128_t) * obj, __uint128_t * expected, __uint128_t desired)
{
        return atomic_compare_exchange_weak(obj, expected, desired);
}

GCC generates:

(gcc -std=c11 -mcx16 -Wall -O2 -S test.c)

cmpxhg_weak:
        subq    $8, %rsp
        movl    $5, %r9d
        movl    $5, %r8d
        call    __atomic_compare_exchange_16@PLT
        xorl    %edx, %edx
        movzbl  %al, %eax
        addq    $8, %rsp
        ret

While clang/llvm generates code which is obviously lock-free:
cmpxhg_weak:                            # @cmpxhg_weak
        pushq   %rbx
        movq    %rdx, %r8
        movq    (%rsi), %rax
        movq    8(%rsi), %rdx
        xorl    %r9d, %r9d
        movq    %r8, %rbx
        lock            cmpxchg16b      (%rdi)
        sete    %cl
        je      .LBB0_2
        movq    %rax, (%rsi)
        movq    %rdx, 8(%rsi)
.LBB0_2:
        movb    %cl, %r9b
        xorl    %edx, %edx
        movq    %r9, %rax
        popq    %rbx
        retq

However, for 32-bit, GCC still generates cmpxchg8b:

#include <stdatomic.h>
#include <inttypes.h>

uint64_t cmpxhg_weak(_Atomic(uint64_t) * obj, uint64_t * expected, uint64_t desired)
{
        return atomic_compare_exchange_weak(obj, expected, desired);
}

(gcc -std=c11 -m32 -Wall -O2 -S test.c)


cmpxhg_weak:
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %esi
        movl    24(%esp), %ebx
        movl    28(%esp), %ecx
        movl    16(%esp), %edi
        movl    (%esi), %eax
        movl    4(%esi), %edx
        lock cmpxchg8b  (%edi)
        movl    %edx, %ecx
        movl    %eax, %edx
        sete    %al
        je      .L2
        movl    %edx, (%esi)
        movl    %ecx, 4(%esi)
.L2:
        popl    %ebx
        movzbl  %al, %eax
        xorl    %edx, %edx
        popl    %esi
        popl    %edi
        ret
Comment 1 Andrew Pinski 2018-02-22 20:47:51 UTC
IIRC this was done because there are no atomic loads/stores, nor a way to do this in a backwards-compatible way.
Comment 2 Ruslan Nikolaev 2018-02-22 20:49:37 UTC
Yes, but not having atomic_load is far less of an issue. Oftentimes, algorithms that use 128-bit types can simply use compare_and_exchange only (at least on x86-64).
Comment 3 Ruslan Nikolaev 2018-02-22 20:51:11 UTC
(In reply to Ruslan Nikolaev from comment #2)
> Yes, but not having atomic_load is far less of an issue. Oftentimes,
> algorithms that use 128-bit types can simply use compare_and_exchange only
> (at least on x86-64).

In other words, can atomic_load be redirected to libatomic while compare_exchange is still generated directly (if -mcx16 is specified)?
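
To illustrate the point from comments 2 and 3, here is a hedged sketch (the type and function names are made up) of a 128-bit update that only ever uses compare_exchange: a failed CAS writes the current value back into 'expected', so the algorithm never needs a separate 16-byte atomic_load.

#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* A pointer plus a version tag packed into 128 bits; the tag guards
   against ABA problems. */
typedef struct { uintptr_t ptr; uintptr_t tag; } tagged_t;

static void set_tagged(_Atomic __uint128_t *loc, uintptr_t new_ptr)
{
        __uint128_t expected = 0;  /* any guess; a failed CAS refreshes it */
        __uint128_t desired;
        tagged_t cur;

        do {
                memcpy(&cur, &expected, sizeof cur);
                cur.ptr = new_ptr;
                cur.tag += 1;      /* bump the version counter */
                memcpy(&desired, &cur, sizeof desired);
        } while (!atomic_compare_exchange_weak(loc, &expected, desired));
}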
Comment 4 Ruslan Nikolaev 2018-02-22 21:12:14 UTC
I guess in this case you would have to fall back to a lock-based implementation for everything. But does C11 even require that atomic_load work on read-only memory?
Comment 5 Ruslan Nikolaev 2018-02-23 00:47:07 UTC
(In reply to Andrew Pinski from comment #1)
> IIRC this was done because there are no atomic loads/stores, nor a way to do
> this in a backwards-compatible way.

After more thinking about it... Shouldn't this be controlled by some flag (similar to -mcx16, which enables cmpxchg16b)? The flag would basically say that atomic_load on 128-bit types will not work on read-only memory. I think that is better than unconditionally disabling the lock-free implementation for 128-bit types in C11 (which is useful in a number of cases) just to accommodate the rare cases where memory accesses must be read-only. That would also be more portable and compatible with other compilers such as clang.
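
For completeness, a user-side workaround sketch (not something GCC emits and not part of the report; the function name is made up, and it assumes an x86-64 CPU with cmpxchg16b and a 16-byte-aligned object): lock cmpxchg16b can be emitted directly with inline assembly so the operation stays lock-free regardless of how the __atomic_* builtins are expanded.

#include <stdbool.h>
#include <stdint.h>

/* Hand-rolled 128-bit CAS: cmpxchg16b takes the expected value in
   RDX:RAX and the desired value in RCX:RBX, and sets ZF on success. */
static bool cas_u128(volatile __uint128_t *obj,
                     __uint128_t *expected, __uint128_t desired)
{
        bool ok;
        uint64_t exp_lo = (uint64_t)*expected;
        uint64_t exp_hi = (uint64_t)(*expected >> 64);

        __asm__ __volatile__("lock cmpxchg16b %1"
                             : "=@ccz"(ok), "+m"(*obj),
                               "+a"(exp_lo), "+d"(exp_hi)
                             : "b"((uint64_t)desired),
                               "c"((uint64_t)(desired >> 64))
                             : "memory");
        if (!ok)
                *expected = ((__uint128_t)exp_hi << 64) | exp_lo;
        return ok;
}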
Comment 6 Andrew Pinski 2018-02-25 18:48:03 UTC
Dup of bug 80878.

*** This bug has been marked as a duplicate of bug 80878 ***