This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug lto/45810] 40% slowdown when using LTO for a single-file program
- From: "dominiq at lps dot ens.fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 22 Sep 2011 15:25:48 +0000
- Subject: [Bug lto/45810] 40% slowdown when using LTO for a single-file program
- Auto-submitted: auto-generated
- References: <bug-45810-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #26 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-09-22 15:25:48 UTC ---
AFAICT this PR has been fixed for some time. Here are the results I get on
x86_64-apple-darwin10 (Core2Duo 2.53 GHz, 3 MB cache, 4 GB RAM) at revision
179079:
Compile options : -fprotect-parens -Ofast -funroll-loops -fwhole-program
                  without -flto                     with -flto
Benchmark  Compile  Executable  Ave Run    Compile  Executable  Ave Run
Name        (secs)     (bytes)   (secs)     (secs)     (bytes)   (secs)
---------  -------  ----------  -------    -------  ----------  -------
ac            3.28       54936     8.81       6.64       54968     8.81
aermod       75.46     1184280    18.65     131.50     1212648    18.20
air          11.24      106336     7.26      22.38      106904     7.39
capacita      3.87       77152    41.29       7.36       77200    41.31
channel       1.25       34744     3.03       2.39       34864     3.03
doduc        12.40      200016    28.02      22.47      200496    27.69
fatigue       4.06       77400     4.83       8.17       77488     4.84
gas_dyn       9.32      119256     4.92      16.64      119816     4.92
induct        7.37      148840    13.83      14.76      153224    13.84
linpk         0.70       26024    21.64       1.93       26064    21.64
mdbx          3.77       80864    12.46       7.21       81040    12.46
nf            4.08       71848    19.34       8.07       71896    19.35
protein      15.17      131304    35.30      26.05      127224    35.48
rnflow       12.58      130888    28.25      23.76      131000    26.92
test_fpu      4.78       92968    10.63      13.35       93024    10.64
tfft          0.74       22352     3.28       1.98       22432     3.28
Geometric Mean Execution Time = 12.23 secs (without -flto), 12.18 secs (with -flto)
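
The geometric-mean figures for this first table can be cross-checked with a few lines of Python. This is only a sketch: the run times are copied by hand from the "Ave Run" columns, and `geometric_mean` is a helper written for this illustration, not part of the benchmark harness. It assumes the harness uses the plain geometric mean (the exponential of the mean of the logarithms).

```python
import math

# "Ave Run" columns (secs) of the first table, in row order ac ... tfft.
NO_LTO = [8.81, 18.65, 7.26, 41.29, 3.03, 28.02, 4.83, 4.92,
          13.83, 21.64, 12.46, 19.34, 35.30, 28.25, 10.63, 3.28]
WITH_LTO = [8.81, 18.20, 7.39, 41.31, 3.03, 27.69, 4.84, 4.92,
            13.84, 21.64, 12.46, 19.35, 35.48, 26.92, 10.64, 3.28]


def geometric_mean(times):
    """Illustrative helper: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


print(f"without -flto: {geometric_mean(NO_LTO):.2f} secs")
print(f"with -flto:    {geometric_mean(WITH_LTO):.2f} secs")
```

Both values should land very close to the 12.23 and 12.18 secs reported above; small differences in the last digit are possible because the inputs are already rounded to two decimals.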
Compile options : -fprotect-parens -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer --param max-inline-insns-auto=200 -fwhole-program
                  without -flto                     with -flto
Benchmark  Compile  Executable  Ave Run    Compile  Executable  Ave Run
Name        (secs)     (bytes)   (secs)     (secs)     (bytes)   (secs)
---------  -------  ----------  -------    -------  ----------  -------
ac            4.05       54904     8.11       8.18       54920     8.11
aermod      101.55     1494688    18.17     169.63     1527120    18.12
air          14.46      114328     7.05      30.35      114912     7.04
capacita      5.39       97552    40.24      10.80       97584    40.21
channel       1.68       38792     2.91       3.17       38888     2.91
doduc        12.98      208112    27.47      25.77      208584    27.52
fatigue       4.84       81440     2.95      10.27       81504     2.93
gas_dyn      13.55      143776     4.86      25.03      144392     4.86
induct       12.95      189872    13.78      24.32      190176    13.96
linpk         0.73       21856    21.69       2.44       21888    21.69
mdbx          4.32       84928    12.45       9.39       85104    12.54
nf            7.41       92248    18.93      17.82       92272    18.91
protein      17.26      160040    35.51      31.08      155984    35.47
rnflow       15.16      138880    28.27      27.28      139040    26.85
test_fpu      5.05       92872    10.65      14.65       92928    10.65
tfft          0.75       22352     3.28       1.72       22432     3.28
Geometric Mean Execution Time = 11.67 secs (without -flto), 11.64 secs (with -flto)
The option -flto improves the run time for rnflow.f90 by ~5% without any
significant slowdown for the other tests (the largest regressions are below
2%). Could these results be checked on other platforms, and this PR closed if
they agree with mine?
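
For anyone repeating the comparison on another platform, the per-benchmark effect of -flto can be expressed as a relative change in run time. The sketch below is illustrative only: `percent_change` and the `RUN_TIMES` dictionary are hypothetical names, and the three entries are taken from the "Ave Run" columns of the second table.

```python
def percent_change(before, after):
    """Relative change in percent; negative means the run got faster."""
    return (after - before) / before * 100.0


# (without -flto, with -flto) average run times in secs, second table.
RUN_TIMES = {
    "rnflow": (28.27, 26.85),
    "induct": (13.78, 13.96),
    "mdbx": (12.45, 12.54),
}

for name, (no_lto, lto) in RUN_TIMES.items():
    print(f"{name:8s} {percent_change(no_lto, lto):+.1f}%")
# rnflow   -5.0%
# induct   +1.3%
# mdbx     +0.7%
```

A negative value is a speedup with -flto; values within a percent or two either way are probably within run-to-run noise for timings of this kind.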