[Bug optimization/12771] New: Weak loop optimizer, significant performance regression
tm at kloo dot net
gcc-bugzilla@gcc.gnu.org
Sat Oct 25 00:54:00 GMT 2003
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12771
Summary: Weak loop optimizer, significant performance regression
Product: gcc
Version: 3.4
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tm at kloo dot net
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i386-linux
GCC host triplet: i386-linux
GCC target triplet: i386-linux
This is based on Scott Robert Ladd's lpbench benchmark, which is derived from
Linpack. He found a significant performance improvement on Linpack when
-freduce-all-givs was used. What follows is an analysis of his results using
gcc-3.4-20031024.
Most of Linpack's run time is spent in the second loop in daxpy(). With
"-O2 -S" this loop compiles to the following code:
.L98:
movl 20(%ebp), %edx <- memory ref #1
flds (%edx,%eax,4) <- memory ref #2
movl 12(%ebp), %edx <- memory ref #3
fmuls (%edx,%eax,4) <- memory ref #4
incl %eax
faddp %st, %st(1) <- memory ref #5
Here is the code as compiled with -freduce-all-givs:
.L85:
flds (%ecx) <- memory ref #1
addl $4, %ecx
fmuls (%edx) <- memory ref #2
addl $4, %edx
decl %eax
faddp %st, %st(1) <- memory ref #3
jne .L85
Basically, by default the loop optimizer chooses to optimize:
    for (i = 0; i < n; i++) {
        dy[i] = dy[i] + da*dx[i];
    }
using the scaled-index addressing mode (%edx,%eax,4), i.e. a base register plus
an index register times 4. This is bad because it ties up an extra register, so
the register allocator has to reload the dx and dy pointers from the stack frame
(the 20(%ebp) and 12(%ebp) loads) on every iteration, which adds two extra
memory loads to the inner loop.
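For reference, a minimal C sketch of the unit-stride loop in the form the default -O2 code keeps it: the biv i indexes both arrays, so each access becomes base + i*4, which maps onto the (%edx,%eax,4) operands above. The float element type is an assumption inferred from the single-precision flds/fmuls instructions; the name daxpy_indexed and its signature are hypothetical (Linpack's own daxpy also takes stride arguments).

```c
#include <assert.h>

/* Hypothetical unit-stride daxpy kernel, indexed form: dy[i] += da * dx[i].
   float is assumed because the listings use single-precision flds/fmuls. */
static void daxpy_indexed(int n, float da, const float *dx, float *dy)
{
    assert(n >= 0);
    /* i is the basic induction variable (biv). The addresses &dx[i] and
       &dy[i] are general induction variables (givs) computed as
       base + i*4, matching base+index*scale addressing such as
       (%edx,%eax,4). Both base pointers have to stay available across
       the loop alongside i. */
    for (int i = 0; i < n; i++) {
        dy[i] = dy[i] + da * dx[i];
    }
}
```

Calling daxpy_indexed(4, 2.0f, dx, dy) with dx = {1, 2, 3, 4} and dy = {10, 20, 30, 40} leaves dy = {12, 24, 36, 48}.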
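The -freduce-all-givs version strength-reduces the address givs (general
induction variables) into pointers that are bumped by 4 each iteration and
eliminates the biv (the basic induction variable i). This frees up a register,
which removes the two pointer reloads from the inner loop.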
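A source-level sketch of the same loop after giv reduction, shaped like the -freduce-all-givs listing: each address becomes its own pointer advanced by one float (4 bytes) per iteration, and a count running down to zero stands in for the decl/jne exit test. This is only an analogue for illustration, assuming float data as above; the compiler performs the transformation on its internal representation, not on C, and daxpy_reduced is a hypothetical name.

```c
#include <assert.h>

/* Sketch of the loop after giv reduction: no index variable is used
   for addressing; two pointers walk the arrays instead. */
static void daxpy_reduced(int n, float da, const float *dx, float *dy)
{
    assert(n >= 0);
    const float *px = dx;   /* stands in for one pointer register */
    float *py = dy;         /* stands in for the other pointer register */
    for (int count = n; count != 0; count--) {   /* like decl / jne */
        *py = *py + da * *px;
        px++;                                     /* like addl $4, %reg */
        py++;
    }
}
```

Both pointers and the counter are live across the loop, but the separate index register and the reloaded base pointers are gone.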
The loop optimizer should be able to estimate register pressure and should
eliminate the biv (perform giv reduction) automagically if it will reduce
register pressure.
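A minimal sketch of the kind of decision being requested, assuming the loop optimizer can estimate how many values each form keeps live across the loop body and how many registers the allocator has available. The struct, its fields and the spill-count rule are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-loop summary; field names are illustrative only. */
struct loop_pressure {
    int live_indexed;   /* values live across the loop if the biv is kept */
    int live_reduced;   /* values live if the givs are reduced to pointers */
    int avail_regs;     /* registers usable inside the loop */
};

/* Estimated values that must be spilled and reloaded each iteration. */
static int spills(int live, int avail)
{
    return live > avail ? live - avail : 0;
}

/* Reduce the givs (eliminating the biv) only when that lowers the
   estimated number of per-iteration reloads. */
static bool should_reduce_givs(struct loop_pressure p)
{
    assert(p.avail_regs >= 0);
    return spills(p.live_reduced, p.avail_regs) <
           spills(p.live_indexed, p.avail_regs);
}
```

When neither form spills, this rule keeps the indexed form; a fuller cost model would also weigh the extra pointer increments that the reduced form adds to each iteration.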