This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug c/67606] New: Missing optimization: load possible return value before early-out test


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606

            Bug ID: 67606
           Summary: Missing optimization: load possible return value
                    before early-out test
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

Sry if this is a duplicate; I couldn't think of any good search terms to
describe this.


In a function that might return a constant if a loop condition is false for the
first iteration, gcc generates a separate basic block for the early-out return
when it doesn't need to.

int f(int a[], int length) {
  int count=0;
  for (int i = 0 ; i < length ; i++)
    if (a[i] > 5)
      count++;

  return count;
}

x86 gcc 5.2 -O3 -fno-tree-vectorize compiles it to:

f(int*, int):
        testl   %esi, %esi      # length
        jle     .L4     #,
        leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count
.L3:
        xorl    %edx, %edx      # D.2370
        cmpl    $5, (%rdi)      #, MEM[base: _29, offset: 0B]
        setg    %dl     #, D.2370
        addq    $4, %rdi        #, ivtmp.7
        addl    %edx, %eax      # D.2370, count
        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
        rep ret
.L4:
        xorl    %eax, %eax      # count
        ret

(actually g++, since I used godbolt.org, but gcc 4.9.2 looked similar)


A better sequence would be:

f(int*, int):
        xorl    %eax, %eax      # count
        testl   %esi, %esi      # length
        jle     .L4     #,

... unchanged ...

        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
.L4:
        rep ret



In this case, the ret was already a rep ret, so this change saves ~3
instruction bytes, (or more usually, none, because of padding) at zero speed
cost in either the taken or non-taken early-out case.


Also, looks like a separate bug, but what the crap is gcc doing with the two
lea instructions?  Defending against integer overflow when calculating a
one-past-the end loop boundary pointer?  It doesn't do that when f() takes a
size_t arg.    (In this case, nothing depends on the lea results soon enough
for it to matter.  They're dependent, anyway, and the first one is one of the
first 4 insns.

leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count

Unless I'm missing something, leaq (%rdi, %rsi, 4), %rcx  should be just as
safe.  Is something adding the -1-before-scaling, correct after scaling thing
for a reason?  Is it something that doesn't realize it will be implemented with
lea, which avoids any problems?  It's an extra instruction, and 4(...) stores
the offset as a 32bit constant in the insn encoding, not a sign-extended 8bit
value.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]