[Bug tree-optimization/101481] New: -ftree-loop-distribute-patterns can slow down and increases size of code

Sat Jul 17 02:59:16 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481

            Bug ID: 101481
           Summary: -ftree-loop-distribute-patterns can slow down and
                    increases size of code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andres at anarazel dot de
  Target Milestone: ---

Created attachment 51168
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51168&action=edit
simplified example reproducing problem

Hi,

I found -ftree-loop-distribute-patterns to be far too aggressive in replacing
loops with library calls, leading to increased code size and substantial
slowdowns (12% in the program where I just hit this).

The code size increase and slowdown are caused partly by the function call
itself and partly by the spilling necessary to make that call. With -fPIC, the
call to memmove() going through the PLT makes this worse.

A very simplified example (also attached) is this:

typedef struct node
{
    unsigned char chunks[4];
    unsigned char count;
} node;

void
foo(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > 3)
        __builtin_unreachable();

    for (int i = a->count - 1; i >= off; i--)
        a->chunks[i + 1] = a->chunks[i];
    a->chunks[off] = newchunk;
}

which with `-O2 -fPIC` boils down to:
foo(node*, unsigned char, unsigned char):
        pushq   %r12
        movl    %edx, %r8d
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        movzbl  4(%rdi), %ecx
        movzbl  %r8b, %ebx
        leal    -1(%rcx), %edx
        cmpl    %ebx, %edx
        jl      .L2
        movl    %ecx, %eax
        movslq  %edx, %rsi
        subl    %ebx, %ecx
        subl    $1, %ecx
        movq    %rsi, %rdx
        subq    %rcx, %rdx
        leaq    1(%rcx), %r8
        leaq    (%rdi,%rdx), %rsi
        movzbl  %al, %edi
        movq    %r8, %rdx
        movq    %rdi, %rax
        subq    %rcx, %rax
        leaq    0(%rbp,%rax), %rdi
        call    memmove@PLT
.L2:
        movb    %r12b, 0(%rbp,%rbx)
        popq    %rbx
        popq    %rbp
        popq    %r12
        ret

Compare that to `-O2 -fPIC -fno-tree-loop-distribute-patterns`:

foo(node*, unsigned char, unsigned char):
        movzbl  4(%rdi), %eax
        movzbl  %dl, %edx
        subl    $1, %eax
        cmpl    %edx, %eax
        jl      .L2
        cltq
.L3:
        movzbl  (%rdi,%rax), %ecx
        movb    %cl, 1(%rdi,%rax)
        subq    $1, %rax
        cmpl    %eax, %edx
        jle     .L3
.L2:
        movb    %sil, (%rdi,%rdx)
        ret

Which I think makes the problem apparent.
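
To make the transformation concrete, here is a minimal sketch of what the pass
effectively does to the loop at the source level, next to a per-function
opt-out. It is not part of the attached reproducer: the names foo_loop,
foo_memmove, KEEP_LOOP and count_mismatches are made up for illustration, and
foo_memmove is a hand-written approximation of the generated code, not compiler
output. The opt-out relies on GCC's `optimize` function attribute, which the
GCC manual describes as intended for debugging rather than production use, so
treat it as an experiment and not a recommended fix.

```c
#include <string.h>

typedef struct node
{
    unsigned char chunks[4];
    unsigned char count;
} node;

/* Hypothetical helper macro: on GCC, ask for this one function to be
 * compiled as if -fno-tree-loop-distribute-patterns had been given.
 * Other compilers get an empty macro and may still emit a library call. */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_LOOP __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define KEEP_LOOP
#endif

/* The loop from the report, unchanged apart from the attribute. */
KEEP_LOOP void
foo_loop(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > 3)
        __builtin_unreachable();

    for (int i = a->count - 1; i >= off; i--)
        a->chunks[i + 1] = a->chunks[i];
    a->chunks[off] = newchunk;
}

/* Hand-written equivalent of the distributed form: the backward copy of
 * chunks[off .. count-1] to chunks[off+1 .. count] is an overlapping move
 * of (count - off) bytes, at most 3 here. */
void
foo_memmove(node *a, unsigned char newchunk, unsigned char off)
{
    if (a->count > off)
        memmove(&a->chunks[off + 1], &a->chunks[off], a->count - off);
    a->chunks[off] = newchunk;
}

/* Compare both variants for every in-bounds (count, off) pair and
 * return the number of pairs where the results differ. */
int
count_mismatches(void)
{
    int bad = 0;

    for (unsigned count = 0; count <= 3; count++)
        for (unsigned off = 0; off <= 3; off++)
        {
            node x = { {10, 20, 30, 40}, (unsigned char) count };
            node y = x;

            foo_loop(&x, 99, (unsigned char) off);
            foo_memmove(&y, 99, (unsigned char) off);
            if (memcmp(x.chunks, y.chunks, sizeof x.chunks) != 0
                || x.count != y.count)
                bad++;
        }
    return bad;
}
```

Calling count_mismatches() from a small main() should return 0 if the two
variants agree. The point of the sketch is that the move is never longer than
3 bytes, which is why a call through the PLT looks like a poor trade here.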

Regards,

Andres Freund