This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107
- From: "dominiq at lps dot ens.fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 10 Dec 2011 18:39:15 +0000
- Subject: [Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107
- Auto-submitted: auto-generated
- References: <bug-51497-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497
--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-10 18:39:15 UTC ---
The profiles without -flto are:
+ 34.6%, nf3dprecon.2105.constprop.1, a.out
| 34.6%, nf2dprecon.2116, a.out
33.5%, spmmult.2139, a.out
+ 29.8%, nfcg_, a.out
| + 7.6%, nf3dprecon.2105.constprop.1, a.out
| | 0.4%, nf2dprecon.2116, a.out
| 0.4%, nf2dprecon.2116, a.out
0.9%, mattest_, a.out
and with -flto:
+ 37.7%, nf3dprecon.2105.2457.constprop.1.2435, a.out
| 37.7%, nf2dprecon.2116.2442.2436, a.out
32.7%, spmmult.2139.2426.2446, a.out
+ 27.6%, nfcg_, a.out
| + 7.0%, nf3dprecon.2105.2457.constprop.1.2435, a.out
| | 0.4%, nf2dprecon.2116.2442.2436, a.out
| 0.4%, nf2dprecon.2116.2442.2436, a.out
| 0.0%, free, libSystem.B.dylib
0.8%, mattest_, a.out
So the slow routines are nf2dprecon, accounting for ~1.2s, and spmmult,
accounting for ~0.5s. If I am reading the assembly correctly, in nf2dprecon,
the implicit loop
x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
is unrolled eight times without -flto and four times with -flto. In spmmult,
the implicit loop
b = ad*x
is unrolled four times and vectorized without -flto, but unrolled eight times
and not vectorized with -flto.
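To make "unrolled four times" concrete for the nf2dprecon statement, here is a hand-written Fortran sketch of the same update written as an explicit DO loop unrolled by four, with a scalar remainder loop. It is only an illustration, not the code GCC generates: the unroller and vectorizer work on the compiler's internal representation, and a vectorized version would additionally process several elements per SIMD instruction. The module name, routine names and the double-precision kind are assumptions, not taken from nf.f90. The rewrite is valid because the section read, x(i-nx:i-1), ends just before the section written, x(i:i+nx-1), so no element is read after it has been updated.

```fortran
! Hypothetical module: a hand-written illustration only, not GCC output.
module precon_sketch
  implicit none
  ! Assumption: nf.f90 works in double precision.
  integer, parameter :: dp = kind(1.0d0)
contains
  ! The update as written in nf2dprecon (array syntax, "implicit loop").
  subroutine update_array(x, au2, i, nx)
    real(dp), intent(inout) :: x(:)
    real(dp), intent(in)    :: au2(:)
    integer, intent(in)     :: i, nx
    x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
  end subroutine update_array

  ! The same update as an explicit loop unrolled four times.
  subroutine update_unroll4(x, au2, i, nx)
    real(dp), intent(inout) :: x(:)
    real(dp), intent(in)    :: au2(:)
    integer, intent(in)     :: i, nx
    integer :: j, nmain
    ! Largest multiple of 4 not exceeding nx: handled by the unrolled body.
    nmain = nx - mod(nx, 4)
    do j = 0, nmain - 1, 4
      x(i+j)   = x(i+j)   - au2(i-nx+j)  *x(i-nx+j)
      x(i+j+1) = x(i+j+1) - au2(i-nx+j+1)*x(i-nx+j+1)
      x(i+j+2) = x(i+j+2) - au2(i-nx+j+2)*x(i-nx+j+2)
      x(i+j+3) = x(i+j+3) - au2(i-nx+j+3)*x(i-nx+j+3)
    end do
    ! Remainder loop for the last mod(nx, 4) elements.
    do j = nmain, nx - 1
      x(i+j) = x(i+j) - au2(i-nx+j)*x(i-nx+j)
    end do
  end subroutine update_unroll4
end module precon_sketch
```

An eight-way unroll would follow the same pattern with eight statements in the body and mod(nx, 8) iterations left for the remainder loop; when nx is small, that remainder loop can carry a larger share of the work.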
Note that --param max-unroll-times=4 does not change the run times.