[Bug target/87601] New: Missed opportunity for flag reuse and macro-op fusion on x86
vgatherps at gmail dot com
gcc-bugzilla@gcc.gnu.org
Fri Oct 12 18:39:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87601
Bug ID: 87601
Summary: Missed opportunity for flag reuse and macro-op fusion on x86
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vgatherps at gmail dot com
Target Milestone: ---
When I compile the following code with gcc 8.2 and options -O2 (or -Os) and
-mtune=intel (or -mtune=broadwell):

    int sum(int *vals, int l) {
        int a = 0;
        if (l <= 0) {
            return 0;
        }
        for (int i = l; i != 0; i--) {
            a += vals[i-1];
        }
        return a;
    }

The following code is generated:

    sum(int*, int):
            xor     eax, eax
            test    esi, esi
            jle     .L1
            movsx   rsi, esi
    .L3:
            add     eax, DWORD PTR [rdi-4+rsi*4]
            sub     rsi, 1
            test    esi, esi
            jne     .L3
    .L1:
            ret

With -march=broadwell or -Os, sub is replaced by dec, but the output is
otherwise the same.
Inside the loop, the sequence:

            sub     rsi, 1
            test    esi, esi
            jne     .L3

can be replaced with:

            sub     rsi, 1
            jne     .L3

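since sub rsi, 1 already sets the zero flag that test esi, esi would compute:
rsi was sign-extended from an l known to be positive, so its upper 32 bits are
zero on every iteration, and the 64-bit result is zero exactly when its low 32
bits are. This would also help macro-op fusion on relatively recent
architectures: since Sandy Bridge, Intel cores can fuse sub (or dec) with jne,
so the loop tail would go from two uops (sub, plus fused test/jne) to one.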
Anecdotally, I've seen GCC make similar decisions elsewhere, along the lines of:

            sub     index, 1
            // some more asm here not using index
            test    index, index
            jne     loop_start

but I don't have a clean test case for it yet. This suggests to me that flag
reuse and macro-op fusion could be improved more generally, and I'll work on
producing clean test cases for those other situations.