This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/67606] New: Missing optimization: load possible return value before early-out test
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 17 Sep 2015 06:08:33 +0000
- Subject: [Bug c/67606] New: Missing optimization: load possible return value before early-out test
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606
Bug ID: 67606
Summary: Missing optimization: load possible return value
before early-out test
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Sry if this is a duplicate; I couldn't think of any good search terms to
describe this.
In a function that might return a constant if a loop condition is false for the
first iteration, gcc generates a separate basic block for the early-out return
when it doesn't need to.
int f(int a[], int length) {
int count=0;
for (int i = 0 ; i < length ; i++)
if (a[i] > 5)
count++;
return count;
}
x86 gcc 5.2 -O3 -fno-tree-vectorize compiles it to:
f(int*, int):
testl %esi, %esi # length
jle .L4 #,
leal -1(%rsi), %eax #, D.2373
leaq 4(%rdi,%rax,4), %rcx #, D.2371
xorl %eax, %eax # count
.L3:
xorl %edx, %edx # D.2370
cmpl $5, (%rdi) #, MEM[base: _29, offset: 0B]
setg %dl #, D.2370
addq $4, %rdi #, ivtmp.7
addl %edx, %eax # D.2370, count
cmpq %rdi, %rcx # ivtmp.7, D.2371
jne .L3 #,
rep ret
.L4:
xorl %eax, %eax # count
ret
(actually g++, since I used godbolt.org, but gcc 4.9.2 looked similar)
A better sequence would be:
f(int*, int):
xorl %eax, %eax # count
testl %esi, %esi # length
jle .L4 #,
... unchanged ...
cmpq %rdi, %rcx # ivtmp.7, D.2371
jne .L3 #,
.L4:
rep ret
In this case, the ret was already a rep ret, so this change saves ~3
instruction bytes, (or more usually, none, because of padding) at zero speed
cost in either the taken or non-taken early-out case.
Also, looks like a separate bug, but what the crap is gcc doing with the two
lea instructions? Defending against integer overflow when calculating a
one-past-the end loop boundary pointer? It doesn't do that when f() takes a
size_t arg. (In this case, nothing depends on the lea results soon enough
for it to matter. They're dependent, anyway, and the first one is one of the
first 4 insns.
leal -1(%rsi), %eax #, D.2373
leaq 4(%rdi,%rax,4), %rcx #, D.2371
xorl %eax, %eax # count
Unless I'm missing something, leaq (%rdi, %rsi, 4), %rcx should be just as
safe. Is something adding the -1-before-scaling, correct after scaling thing
for a reason? Is it something that doesn't realize it will be implemented with
lea, which avoids any problems? It's an extra instruction, and 4(...) stores
the offset as a 32bit constant in the insn encoding, not a sign-extended 8bit
value.