[Bug tree-optimization/101481] New: -ftree-loop-distribute-patterns can slow down and increases size of code
andres at anarazel dot de
gcc-bugzilla@gcc.gnu.org
Sat Jul 17 02:59:16 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481
Bug ID: 101481
Summary: -ftree-loop-distribute-patterns can slow down and
increases size of code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andres at anarazel dot de
Target Milestone: ---
Created attachment 51168
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51168&action=edit
simplified example reproducing problem
Hi,
I found -ftree-loop-distribute-patterns to be far too aggressive in replacing
loops with library calls, leading to increased code size and substantial
slowdowns (12% in the program where I just hit this).
The code size increase and the slowdown are caused partly by the function call
itself and partly by the register spilling needed to make that call. The PLT
call to memmove() makes this worse.
A very simplified example (also attached) is this:
typedef struct node
{
    unsigned char chunks[4];
    unsigned char count;
} node;

void
foo(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > 3)
        __builtin_unreachable();

    for (int i = a->count - 1; i >= off; i--)
        a->chunks[i + 1] = a->chunks[i];

    a->chunks[off] = newchunk;
}
which with `-O2 -fPIC` boils down to:
foo(node*, unsigned char, unsigned char):
        pushq   %r12
        movl    %edx, %r8d
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        movzbl  4(%rdi), %ecx
        movzbl  %r8b, %ebx
        leal    -1(%rcx), %edx
        cmpl    %ebx, %edx
        jl      .L2
        movl    %ecx, %eax
        movslq  %edx, %rsi
        subl    %ebx, %ecx
        subl    $1, %ecx
        movq    %rsi, %rdx
        subq    %rcx, %rdx
        leaq    1(%rcx), %r8
        leaq    (%rdi,%rdx), %rsi
        movzbl  %al, %edi
        movq    %r8, %rdx
        movq    %rdi, %rax
        subq    %rcx, %rax
        leaq    0(%rbp,%rax), %rdi
        call    memmove@PLT
.L2:
        movb    %r12b, 0(%rbp,%rbx)
        popq    %rbx
        popq    %rbp
        popq    %r12
        ret
Compare that to the output with `-O2 -fPIC -fno-tree-loop-distribute-patterns`:
foo(node*, unsigned char, unsigned char):
        movzbl  4(%rdi), %eax
        movzbl  %dl, %edx
        subl    $1, %eax
        cmpl    %edx, %eax
        jl      .L2
        cltq
.L3:
        movzbl  (%rdi,%rax), %ecx
        movb    %cl, 1(%rdi,%rax)
        subq    $1, %rax
        cmpl    %eax, %edx
        jle     .L3
.L2:
        movb    %sil, (%rdi,%rdx)
        ret
I think this makes the problem apparent.
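For reference, here is a minimal sketch of what the transformation amounts to at
the source level, plus one possible per-function way to opt out of it. The names
foo_loop, foo_memmove, check_variants_agree and the KEEP_LOOP macro are
hypothetical and only serve the illustration. The sketch assumes that GCC's
`optimize` function attribute accepts the `no-tree-loop-distribute-patterns`
string, which should be equivalent to compiling that one function with
-fno-tree-loop-distribute-patterns. Note that the GCC documentation describes
this attribute as intended for debugging rather than production code, so this is
not a recommended fix, just a way to experiment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macro: ask GCC not to turn loops in one function into
   memset/memcpy/memmove calls. Expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
# define KEEP_LOOP __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
# define KEEP_LOOP
#endif

typedef struct node
{
    unsigned char chunks[4];
    unsigned char count;
} node;

/* The loop from the report, with the per-function opt-out applied. */
KEEP_LOOP void
foo_loop(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > 3)
        __builtin_unreachable();

    for (int i = a->count - 1; i >= off; i--)
        a->chunks[i + 1] = a->chunks[i];

    a->chunks[off] = newchunk;
}

/* Hand-written equivalent of the transformed code: shift the tail of the
   array up by one element with a single memmove() call. */
void
foo_memmove(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > off)
        memmove(&a->chunks[off + 1], &a->chunks[off], a->count - off);

    a->chunks[off] = newchunk;
}

/* Checks that both variants agree for every valid count (0..3) and every
   insertion offset (0..count). */
void
check_variants_agree(void)
{
    for (unsigned count = 0; count <= 3; count++)
        for (unsigned off = 0; off <= count; off++)
        {
            node x = { { 10, 20, 30, 40 }, (unsigned char) count };
            node y = x;

            foo_loop(&x, 99, (unsigned char) off);
            foo_memmove(&y, 99, (unsigned char) off);

            assert(memcmp(&x, &y, sizeof(node)) == 0);
        }
}
```

Both variants move at most three bytes, which is why the call overhead and the
extra register saves dominate in the memmove() version.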
Regards,
Andres Freund