[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes
peter at cordes dot ca
gcc-bugzilla@gcc.gnu.org
Sun Aug 20 20:46:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837
--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Seems to be fixed in gcc7.2.0: https://godbolt.org/g/jRwtZN
gcc7.2 generates good code with -m32, -mx32, and -m64, but the x32 output is
the most compact. -m64 just calls __atomic_load_16.
gcc7.2 -O3 -mx32 output:
follow_nounion(std::atomic<counted_ptr>*):
        movq    (%edi), %rax
        movl    %eax, %eax
        ret
vs.
gcc7.1 -O3 -mx32 output:
follow_nounion(std::atomic<counted_ptr>*):
        movq    (%edi), %rcx
        xorl    %edx, %edx
        movzbl  %ch, %eax
        movb    %cl, %dl
        movq    %rcx, %rsi
        movb    %al, %dh
        andl    $16711680, %esi
        andl    $4278190080, %ecx
        movzwl  %dx, %eax
        orq     %rsi, %rax
        orq     %rcx, %rax
        ret
-------
gcc7.2 -O3 -m64 just forwards its arg to __atomic_load_16 and then returns:
follow_nounion(std::atomic<counted_ptr>*):
        subq    $8, %rsp
        movl    $2, %esi
        call    __atomic_load_16
        addq    $8, %rsp
        ret
It unfortunately doesn't optimize the tail-call to
        movl    $2, %esi
        jmp     __atomic_load_16
presumably because it hasn't realized early enough that it takes zero
instructions to extract the 8-byte low half of the 16-byte __atomic_load_16
return value.
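
For reference, a rough sketch of the kind of source that produces this pattern.
This is not the test case from the bug report: counted_ptr32 and
follow_nounion32 are hypothetical names, and the struct uses two 32-bit fields
as a stand-in for the 8-byte x32 layout, so that it builds on ordinary targets
without linking libatomic. The acquire ordering is a guess based on the
"movl $2, %esi" argument above (2 is __ATOMIC_ACQUIRE in gcc).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 8-byte stand-in for the x32 layout of the object: a 32-bit
// "pointer" in the low half and a 32-bit counter in the high half
// (little-endian, as on x86).
struct alignas(8) counted_ptr32 {
    std::uint32_t ptr;    // low 4 bytes
    std::uint32_t count;  // high 4 bytes
};

static_assert(sizeof(counted_ptr32) == 8, "expected two packed 32-bit fields");

// Load the whole object with one atomic load, then keep only the low member.
// Ideally this is one 8-byte load plus a zero-extension of the low 32 bits.
std::uint32_t follow_nounion32(const std::atomic<counted_ptr32>* p) {
    return p->load(std::memory_order_acquire).ptr;
}
```

The point is that the member access after the load should cost nothing beyond
taking the low half of the loaded value, with no byte-by-byte split and merge.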