This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/86270] New: Simple loop needs an extra register and an extra instruction


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86270

            Bug ID: 86270
           Summary: Simple loop needs an extra register and an extra
                    instruction
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

Compiling the following simple example with GCC 8 on x86_64 with
just -O2 -S:

----------------------------------------
int *a;
long len;

int
test ()
{
  for (int i = 0; i < len + 1; i++)
    a[i]=i;
}
----------------------------------------

results in a loop that compares the pre-increment value against the
upper bound after the increment has already been computed. The loop
therefore needs an extra register and an extra instruction, and has a
rather convoluted structure (alignment directives and some labels
omitted):

----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rcx
        testq   %rcx, %rcx
        js      .L2
        movq    a(%rip), %rsi
        xorl    %eax, %eax
        jmp     .L3
.L4:
        movq    %rdx, %rax
.L3:
        movl    %eax, (%rsi,%rax,4)
        leaq    1(%rax), %rdx
        cmpq    %rax, %rcx
        jne     .L4
.L2:
        ret
----------------------------------------

as opposed to GCC 7 or when compiling with -fno-tree-forwprop:

----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rdx
        testq   %rdx, %rdx
        js      .L2
        movq    a(%rip), %rcx
        addq    $1, %rdx
        xorl    %eax, %eax
.L3:
        movl    %eax, (%rcx,%rax,4)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
.L2:
        ret
----------------------------------------

Because of this problem (specifically the need for an extra register),
465.tonto runs almost 5% faster on an AMD Ryzen machine when compiled
with -fno-tree-forwprop.
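
For readers who prefer C to assembly, the two loop shapes above can be
sketched roughly as follows. This is only an illustration written back
from the listings, not compiler output: the function names, MAX_LEN,
and the use of a long induction variable (the test case declares
int i) are assumptions made for the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 64  /* hypothetical bound for the demo buffers */

/* Shape of the GCC 8 loop: the exit test uses the pre-increment value,
   so both i and next are live across the compare.  */
static void fill_gcc8_shape(int *a, long len)
{
  if (len < 0)             /* testq / js .L2 */
    return;
  long i = 0;
  for (;;)
    {
      a[i] = (int) i;      /* movl %eax, (%rsi,%rax,4) */
      long next = i + 1;   /* leaq 1(%rax), %rdx */
      if (i == len)        /* cmpq %rax, %rcx */
        break;
      i = next;            /* movq %rdx, %rax */
    }
}

/* Shape of the GCC 7 loop: the bound len + 1 is computed once before
   the loop and the incremented value is compared against it.  */
static void fill_gcc7_shape(int *a, long len)
{
  if (len < 0)             /* testq / js .L2 */
    return;
  long bound = len + 1;    /* addq $1, %rdx */
  long i = 0;
  do
    {
      a[i] = (int) i;      /* movl %eax, (%rcx,%rax,4) */
      i++;                 /* addq $1, %rax */
    }
  while (i != bound);      /* cmpq %rdx, %rax */
}

/* Returns 1 if both shapes write the same elements for this len.  */
static int shapes_agree(long len)
{
  int x[MAX_LEN + 1], y[MAX_LEN + 1];
  assert(len < MAX_LEN);
  memset(x, -1, sizeof x);
  memset(y, -1, sizeof y);
  fill_gcc8_shape(x, len);
  fill_gcc7_shape(y, len);
  return memcmp(x, y, sizeof x) == 0;
}
```

Both functions store the same len + 1 elements. The difference is only
that the first keeps i and next live at the same time and needs a copy
on the back edge, which corresponds to the extra register and the extra
movq in the GCC 8 listing.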
