[Bug target/96062] New: Partial register stall caused by avoidable use of SETcc, and useless MOVZBL
josephcsible at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sat Jul 4 17:29:10 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96062
Bug ID: 96062
Summary: Partial register stall caused by avoidable use of
SETcc, and useless MOVZBL
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: josephcsible at gmail dot com
Target Milestone: ---
Target: x86_64
Consider this C code:
long ps4_syscall0(long n) {
    long ret;
    int carry;
    __asm__ __volatile__(
        "syscall"
        : "=a"(ret), "=@ccc"(carry)
        : "a"(n)
        : "rcx", "r8", "r9", "r10", "r11", "memory"
    );
    return carry ? -ret : ret;
}
With "-O3", it results in this assembly:
ps4_syscall0:
        movq    %rdi, %rax
        syscall
        setc    %dl
        movq    %rax, %rdi
        movzbl  %dl, %edx
        negq    %rdi
        testl   %edx, %edx
        cmovne  %rdi, %rax
        ret
On modern Intel CPUs, "setc %dl" creates a false dependency on rdx, and the
following "movzbl %dl, %edx" does nothing to break it. Here are some ways we
could improve this code without falling back to a conditional branch:
1. Get rid of "movzbl %dl, %edx" (since it doesn't help), and then do "testb
%dl, %dl" instead of "testl %edx, %edx".
2. Possibly in addition to #1, use dh instead of dl, since high-byte registers
are still renamed.
3. Instead of #1 and #2, replace the whole sequence between "syscall" and "ret"
with this:
        sbbq    %rcx, %rcx
        xorq    %rcx, %rax
        subq    %rcx, %rax
On Intel (but not AMD), the sbb has a false dependency too, but it's still a
lot less shuffling values around.
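The sbb/xor/sub sequence works because "sbbq %rcx, %rcx" materializes the
carry flag as a mask of all zeros or all ones, and xor/sub with that mask is
a branchless conditional negation. A minimal C sketch of the same identity
(the helper name is hypothetical, not anything GCC emits):

    long negate_if_carry(long ret, int carry) {
        long mask = -(long)(carry != 0); /* 0 or all-ones, like sbbq %rcx, %rcx */
        return (ret ^ mask) - mask;      /* xorq %rcx, %rax ; subq %rcx, %rax */
    }

With mask == -1, ret ^ mask is ~ret and subtracting -1 adds 1, giving
~ret + 1 == -ret; with mask == 0, both operations are no-ops.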
4. Instead of #1, #2, and #3, replace the whole sequence between "syscall" and
"ret" with this:
        leaq    -1(%rax), %rcx
        notq    %rcx
        cmovc   %rcx, %rax
I like this one the best. No false dependencies at all, and still way less
shuffling values around.
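Suggestion #4 relies on the two's-complement identity ~(x - 1) == -x: leaq
computes ret - 1 into a scratch register, notq turns that into -ret, and
cmovc selects it when the syscall set the carry flag. A small C sketch of the
same computation (the function name is hypothetical):

    long negate_via_lea_not(long ret, int carry) {
        long neg = ~(ret - 1);   /* leaq -1(%rax), %rcx ; notq %rcx == -ret */
        return carry ? neg : ret; /* cmovc %rcx, %rax */
    }

Since ~y == -y - 1, we get ~(ret - 1) == -(ret - 1) - 1 == -ret.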