This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/86270] New: Simple loop needs an extra register and an extra instruction
- From: "jamborm at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 21 Jun 2018 16:18:30 +0000
- Subject: [Bug tree-optimization/86270] New: Simple loop needs an extra register and an extra instruction
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86270
Bug ID: 86270
Summary: Simple loop needs an extra register and an extra
instruction
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
Target Milestone: ---
Compiling the following simple example with GCC 8 on an x86_64 with
just -O2 -S:
----------------------------------------
int *a;
long len;
int
test ()
{
  for (int i = 0; i < len + 1; i++)
    a[i]=i;
}
----------------------------------------
Results in a loop that compares the pre-increment value of the counter
with the upper bound only after the increment has already been
computed. Both the old and the new counter value are therefore live
across the comparison, so the loop needs an extra register, an extra
instruction, and has a rather convoluted structure (alignment
directives and some labels omitted):
----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rcx
        testq   %rcx, %rcx
        js      .L2
        movq    a(%rip), %rsi
        xorl    %eax, %eax
        jmp     .L3
.L4:
        movq    %rdx, %rax
.L3:
        movl    %eax, (%rsi,%rax,4)
        leaq    1(%rax), %rdx
        cmpq    %rax, %rcx
        jne     .L4
.L2:
        ret
----------------------------------------
By contrast, GCC 7, or GCC 8 with -fno-tree-fwprop, produces:
----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rdx
        testq   %rdx, %rdx
        js      .L2
        movq    a(%rip), %rcx
        addq    $1, %rdx
        xorl    %eax, %eax
.L3:
        movl    %eax, (%rcx,%rax,4)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
.L2:
        ret
----------------------------------------
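For readers who prefer C to assembly, the two loop shapes can be sketched
roughly as follows. This is a hand-written reading of the two listings, not
compiler output. The function names, the CAP constant and the shapes_agree
helper are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

#define CAP 64  /* hypothetical buffer size used by the comparison helper */

/* Shape of the first listing: the exit test uses the OLD counter value,
   so the old and the new value are both live across the compare. */
static void fill_gcc8_shape(int *a, long len)
{
    if (len < 0)                /* testq/js: nothing to do when len + 1 <= 0 */
        return;
    long cur = 0;               /* xorl %eax, %eax */
    for (;;) {
        a[cur] = (int) cur;     /* movl %eax, (%rsi,%rax,4) */
        long next = cur + 1;    /* leaq 1(%rax), %rdx */
        if (cur == len)         /* cmpq %rax, %rcx */
            break;
        cur = next;             /* movq %rdx, %rax (the extra instruction) */
    }
}

/* Shape of the second listing: the bound len + 1 is computed once before
   the loop and the counter is incremented in place. */
static void fill_gcc7_shape(int *a, long len)
{
    if (len < 0)
        return;
    long bound = len + 1;       /* addq $1, %rdx */
    long cur = 0;
    do {
        a[cur] = (int) cur;     /* movl %eax, (%rcx,%rax,4) */
        cur++;                  /* addq $1, %rax */
    } while (cur != bound);     /* cmpq %rdx, %rax */
}

/* Returns 1 when both shapes write identical contents for this len. */
static int shapes_agree(long len)
{
    int x[CAP], y[CAP];
    assert(len + 1 <= CAP);
    memset(x, -1, sizeof x);
    memset(y, -1, sizeof y);
    fill_gcc8_shape(x, len);
    fill_gcc7_shape(y, len);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both functions store the same values; the difference is only that the first
keeps cur and next alive at the same time, which is where the extra register
comes from.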
Because of this problem (specifically the need for an extra register),
465.tonto runs almost 5% faster on an AMD Ryzen machine when compiled
with -fno-tree-fwprop.