[Bug target/87601] New: Missed opportunity for flag reuse and macro-op fusion on x86

Fri Oct 12 18:39:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87601

            Bug ID: 87601
           Summary: Missed opportunity for flag reuse and macro-op fusion
                    on x86
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vgatherps at gmail dot com
  Target Milestone: ---

When I compile the following code with gcc 8.2 and options -O2 (or -Os) and
-mtune=intel (or -mtune=broadwell):

int sum(int *vals, int l) {
    int a = 0;
    if (l <= 0) {
        return 0;
    }
    for (int i = l; i != 0; i--) {
        a += vals[i-1];
    }
    return a;
}

The following code is generated:

sum(int*, int):
  xor eax, eax
  test esi, esi
  jle .L1
  movsx rsi, esi
.L3:
  add eax, DWORD PTR [rdi-4+rsi*4]
  sub rsi, 1
  test esi, esi
  jne .L3
.L1:
  ret

When passing -march=broadwell or -Os, sub is replaced by dec, but otherwise
the generated code is the same.

Inside the loop, the sequence:
  sub rsi, 1
  test esi, esi
  jne .L3

can be replaced with:
  sub rsi, 1
  jne .L3

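since sub rsi, 1 already sets the same zero flag that test would: in this loop
rsi holds the sign-extended, positive value of esi, so the 64-bit result is
zero exactly when its low 32 bits are. Dropping the test would improve
macro-op fusion on relatively recent architectures as well.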
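
As a rough sanity check of the equivalence argued above, the two flag
computations can be modelled in C. This is only a sketch under the assumption
that the loop is entered with l > 0 (which the jle guard ensures); the helper
names zf_test32, zf_sub64 and flags_agree are made up for illustration and do
not correspond to anything in gcc.

```c
#include <stdint.h>

/* Zero flag as `test esi, esi` would compute it: low 32 bits only. */
static int zf_test32(int64_t rsi) { return (uint32_t)rsi == 0; }

/* Zero flag as `sub rsi, 1` leaves it: the full 64-bit result. */
static int zf_sub64(int64_t rsi) { return rsi == 0; }

/* Walk the counter the way the loop does, assuming l > 0.
   Returns 1 if both zero flags agree after every decrement, else 0. */
static int flags_agree(int l) {
    int64_t rsi = l;                 /* movsx rsi, esi */
    while (rsi != 0) {
        rsi -= 1;                    /* sub rsi, 1 */
        if (zf_test32(rsi) != zf_sub64(rsi))
            return 0;                /* the extra test would have mattered */
    }
    return 1;
}
```

The model only covers the counter values this loop can reach; it says nothing
about cases where the upper 32 bits of the register may be nonzero.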
Anecdotally, I've seen similar decisions being made along the lines of:

sub index, 1

// some more asm here not using index

test index, index
jne loop_start

but I don't have a nice clean test case for it. This suggests to me that the
optimization around flag reuse and macro-op fusion could be improved in
general, and I'll work on getting clean test cases for the other patterns.