This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/69933] New: non-ideal branch layout for an early-out return
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 24 Feb 2016 03:50:41 +0000
- Subject: [Bug rtl-optimization/69933] New: non-ideal branch layout for an early-out return
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69933
Bug ID: 69933
Summary: non-ideal branch layout for an early-out return
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
(Just guessing that this is an RTL bug; please reassign if it's
target-specific or something else.)
This simple linked-list traversal compiles to slightly bulkier code than it
needs to:
int traverse(struct foo_head *ph)
{
    int a = -1;
    struct foo *p, *pprev;

    pprev = p = ph->h;
    while (p != NULL) {
        pprev = p;
        p = p->n;
    }
    if (pprev)
        a = pprev->a;
    return a;
}
(gcc 5.3.0 -O3 on godbolt: http://goo.gl/r8vb5L)
        movq    (%rdi), %rdx
        movl    $-1, %eax       ; only needs to happen in the early-out case
        testq   %rdx, %rdx
        jne     .L3             ; jne/ret or je / fall through would be better
        jmp     .L9
.L5:
        movq    %rax, %rdx
.L3:
        movq    (%rdx), %rax
        testq   %rax, %rax
        jne     .L5
        movl    8(%rdx), %eax
        ret
.L9:
        ; ARM / PPC gcc 4.8.2 put the a=-1 down here
        ret                     ; this is a rep ret without -mtune=intel
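The report does not show the struct definitions, so the function above will not compile on its own. For anyone reproducing this locally, here is a minimal self-contained sketch. The layouts of struct foo and struct foo_head are assumptions, inferred from the offsets in the asm (next pointer loaded from offset 0, the int from offset 8). traverse_demo is a hypothetical helper added only to exercise the function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts (not given in the report): chosen to match the
   offsets seen in the x86-64 asm, i.e. n at 0 and a at 8. */
struct foo {
    struct foo *n;
    int a;
};

struct foo_head {
    struct foo *h;
};

/* The function from the report, unchanged apart from indentation. */
int traverse(struct foo_head *ph)
{
    int a = -1;
    struct foo *p, *pprev;

    pprev = p = ph->h;
    while (p != NULL) {
        pprev = p;
        p = p->n;
    }
    if (pprev)
        a = pprev->a;
    return a;
}

/* Hypothetical helper: checks the last-node and empty-list cases. */
int traverse_demo(void)
{
    struct foo tail = { NULL, 42 };
    struct foo first = { &tail, 7 };
    struct foo_head two = { &first };
    struct foo_head empty = { NULL };

    assert(traverse(&two) == 42);    /* value of the last node */
    assert(traverse(&empty) == -1);  /* early-out path */
    return 0;
}
```

Compiling this file with -O3 -S should be enough to inspect the branch layout of traverse; the exact output will depend on the compiler version and tuning flags.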
Clang 3.7 chooses a better layout, with a je .early_out instead of the jne / jmp.
It arranges the loop so it can be entered at the top. It actually looks pretty
optimal:
        movq    (%rdi), %rcx
        movl    $-1, %eax
        testq   %rcx, %rcx
        je      .LBB0_3
.LBB0_1:                        # %.lr.ph
        movq    %rcx, %rax
        movq    (%rax), %rcx
        testq   %rcx, %rcx
        jne     .LBB0_1
        movl    8(%rax), %eax
.LBB0_3:                        # %._crit_edge.thread
        retq
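At the source level, the control flow in clang's output corresponds roughly to an explicit early return followed by a loop that is entered at the top. The sketch below only illustrates that shape; it makes no claim about what gcc emits for it. The struct layouts are the same assumptions as before (inferred from the asm offsets), and traverse_early_out and early_out_demo are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts (not given in the report). */
struct foo {
    struct foo *n;
    int a;
};

struct foo_head {
    struct foo *h;
};

/* Same result as traverse() in the report, written so that the empty
   list is a single early-out branch and the loop is entered at the top. */
int traverse_early_out(struct foo_head *ph)
{
    struct foo *p = ph->h;
    struct foo *pprev;

    if (p == NULL)
        return -1;              /* early-out: empty list */

    do {
        pprev = p;              /* p is non-NULL here */
        p = p->n;
    } while (p != NULL);

    return pprev->a;            /* a of the last node */
}

/* Hypothetical helper: checks the empty, one-node and two-node cases. */
int early_out_demo(void)
{
    struct foo tail = { NULL, 42 };
    struct foo first = { &tail, 7 };
    struct foo_head two = { &first };
    struct foo_head one = { &tail };
    struct foo_head empty = { NULL };

    assert(traverse_early_out(&empty) == -1);
    assert(traverse_early_out(&one) == 42);
    assert(traverse_early_out(&two) == 42);
    return 0;
}
```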
Getting the mov $-1 out of the common case would require a separate mov/ret
block after the normal ret, so it's a code-size tradeoff which isn't worth it,
because a mov-immediate is dirt cheap.
Anyway, there are a couple of different ways to lay out the branches and the mov
$-1, %eax, but gcc's choice is in no way optimal. :(