[Bug target/103069] cmpxchg isn't optimized
thiago at kde dot org
gcc-bugzilla@gcc.gnu.org
Wed Nov 3 20:53:13 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #1 from Thiago Macieira <thiago at kde dot org> ---
(the assembly doesn't match the source code, but we got your point)
Another possible improvement for the __atomic_fetch_{and,nand,or} functions is
to check whether the loaded value already equals the result of the operation
and, if so, branch out. In your example, the __atomic_fetch_or with 0x40000000
can check if that bit is already set and, if so, not execute the CMPXCHG at all.
This is a valid solution for x86 on memory orderings up to acq_rel. Other
architectures may still need barriers. For seq_cst, we either need a
barrier or we need to execute the CMPXCHG at least once.
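As a rough source-level illustration of the early-out idea (not what GCC emits), the check could look like the sketch below. The helper name fetch_or_skip_if_set is hypothetical, and the acq_rel/acquire orderings are an assumption chosen to stay within the "up to acq_rel" case described above.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: behaves like
 * __atomic_fetch_or(p, mask, __ATOMIC_ACQ_REL) but skips the
 * compare-exchange entirely when every bit in mask is already set. */
static uint32_t fetch_or_skip_if_set(uint32_t *p, uint32_t mask)
{
    uint32_t old = __atomic_load_n(p, __ATOMIC_ACQUIRE);

    /* Only attempt the CMPXCHG while the store would change the value. */
    while ((old & mask) != mask) {
        /* On failure, 'old' is refreshed with the current contents of *p,
         * so the loop condition re-checks the bits before retrying. */
        if (__atomic_compare_exchange_n(p, &old, old | mask, 1,
                                        __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
            break;
    }
    return old; /* value observed before the OR took effect (or was skipped) */
}
```

Calling it twice with the same mask shows the effect: the first call performs the exchange, the second returns after the plain load.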
Therefore, the emitted code might want to optimistically execute the operation
once and, if it fails, enter the load loop. That's a slightly longer codegen.
Whether we want that under -Os or not, you'll have to be the judge.
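A minimal sketch of that shape for the seq_cst case follows. The name fetch_or_optimistic is hypothetical, and this is only one way to express "try once, then fall back to a checking loop" in C; the actual codegen would be up to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: a seq_cst fetch_or that
 * always executes one compare-exchange (so the locked instruction runs
 * at least once) and only then falls back to the load-and-check loop. */
static uint32_t fetch_or_optimistic(uint32_t *p, uint32_t mask)
{
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);

    /* Optimistic first attempt, executed unconditionally. */
    if (__atomic_compare_exchange_n(p, &old, old | mask, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return old;

    /* The first attempt failed and refreshed 'old'. From here on, retry
     * only while the OR would actually change the stored value. */
    while ((old & mask) != mask) {
        if (__atomic_compare_exchange_n(p, &old, old | mask, 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            break;
    }
    return old;
}
```

The extra first attempt is what makes the code slightly longer than the plain loop, which is the size trade-off mentioned for -Os.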
Prior art: glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S:
ENTRY(__pthread_spin_lock)
1:      LOCK
        decl    0(%rdi)
        jne     2f
        xor     %eax, %eax
        ret
        .align  16
2:      rep
        nop
        cmpl    $0, 0(%rdi)
        jg      1b
        jmp     2b
END(__pthread_spin_lock)
This does the atomic operation once, hoping it'll succeed. If it fails, it
enters the PAUSE+CMP+JG loop until the value is suitable.
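For readers who prefer C, the same pattern can be approximated with the GCC builtins. This is a sketch, not glibc's implementation: the names spin_lock_sketch and spin_unlock_sketch are hypothetical, the lock word is assumed to hold 1 when free (as the decrement-to-zero test implies), and the memory orderings are my assumption. The "rep; nop" pair in the assembly is the encoding of PAUSE.

```c
/* Hypothetical C rendition of the "try once, then spin on plain loads"
 * pattern. Assumes *lock == 1 means unlocked. */
static int spin_lock_sketch(int *lock)
{
    for (;;) {
        /* Optimistic attempt: one locked decrement; 0 means we own the lock. */
        if (__atomic_sub_fetch(lock, 1, __ATOMIC_ACQUIRE) == 0)
            return 0;

        /* Contended: wait with plain loads (no locked instruction) until
         * the value looks free again, then retry the decrement. */
        do {
#if defined(__x86_64__) || defined(__i386__)
            __builtin_ia32_pause(); /* PAUSE, i.e. "rep; nop" */
#endif
        } while (__atomic_load_n(lock, __ATOMIC_RELAXED) <= 0);
    }
}

/* Release the lock by restoring the "free" value. */
static void spin_unlock_sketch(int *lock)
{
    __atomic_store_n(lock, 1, __ATOMIC_RELEASE);
}
```

The point of the structure is that waiters spin on ordinary loads and only execute the locked instruction when it has a chance of succeeding.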