[Bug gcov-profile/94369] New: 505.mcf_r is 6-7% slower at -Ofast -march=native with PGO+LTO than with just LTO
jamborm at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Mar 27 19:39:30 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94369
Bug ID: 94369
Summary: 505.mcf_r is 6-7% slower at -Ofast -march=native with
PGO+LTO than with just LTO
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
The SPEC 2017 INTrate benchmark 505.mcf_r, built with options
-Ofast -march=native -mtune=native, runs 6-7% slower when compiled with
both PGO and LTO than when built with just LTO. I have observed this
on both AMD Zen2 (7%) and Intel Cascade Lake (6%) server CPUs. The
train run is unlikely to be the problem, because without LTO, PGO
improves run-time by 15% on both systems. This is with master revision
26b3e568a60.
Profiling results (from an AMD CPU):
LTO:
  Overhead    Samples  Shared Object    Symbol
  ........  .........  ...............  ........................
    39.53%     518450  mcf_r_peak.mine  spec_qsort.constprop.0
    22.13%     289745  mcf_r_peak.mine  master.constprop.0
    19.00%     248641  mcf_r_peak.mine  replace_weaker_arc
     9.37%     122669  mcf_r_peak.mine  main
     8.60%     112601  mcf_r_peak.mine  spec_qsort.constprop.1
PGO+LTO:
  Overhead    Samples  Shared Object    Symbol
  ........  .........  ...............  ........................
    40.13%     562770  mcf_r_peak.mine  spec_qsort.constprop.0
    21.68%     303543  mcf_r_peak.mine  master.constprop.0
    18.24%     255236  mcf_r_peak.mine  replace_weaker_arc
    10.32%     144433  mcf_r_peak.mine  main
     8.07%     112775  mcf_r_peak.mine  arc_compare
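As a rough cross-check of the two profiles, the sample counts can be compared
directly. The sketch below is not part of the original measurements. It
assumes that both runs were sampled at the same rate, so that sample counts
are proportional to run time, and it only covers the five symbols listed in
each report. The names lto, pgo_lto and pct_change are made up for this
illustration.

```python
# Sample counts copied from the two profiles in this report (AMD run).
lto = {
    "spec_qsort.constprop.0": 518450,
    "master.constprop.0": 289745,
    "replace_weaker_arc": 248641,
    "main": 122669,
    "spec_qsort.constprop.1": 112601,
}
pgo_lto = {
    "spec_qsort.constprop.0": 562770,
    "master.constprop.0": 303543,
    "replace_weaker_arc": 255236,
    "main": 144433,
    "arc_compare": 112775,
}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# Totals over the listed symbols only (assumption: the unlisted remainder
# is small enough not to change the picture).
total_lto = sum(lto.values())
total_pgo_lto = sum(pgo_lto.values())
print(f"listed samples: {total_lto} -> {total_pgo_lto} "
      f"({pct_change(total_lto, total_pgo_lto):+.1f}%)")

# Per-symbol change, restricted to symbols that appear in both profiles.
for sym, before in lto.items():
    if sym in pgo_lto:
        print(f"{sym:24s} {pct_change(before, pgo_lto[sym]):+.1f}%")
```

The listed samples grow from 1292106 to 1378757, about 6.7%, which matches the
6-7% slowdown. Of the four symbols present in both profiles, main (+17.7%) and
spec_qsort.constprop.0 (+8.5%) grow the most in relative terms. The fifth row
names a different symbol in each profile (spec_qsort.constprop.1 versus
arc_compare), so it is left out of the per-symbol comparison.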
I should note that we have patched qsort in the benchmark to work with
strict aliasing even with LTO, but the performance gap is also there
with -fno-strict-aliasing.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)