[Bug middle-end/90283] New: 519.lbm_r is 7%-10% slower with -Ofast -march=native and both LTO and PGO than with GCC 8
jamborm at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Apr 29 17:45:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90283
Bug ID: 90283
Summary: 519.lbm_r is 7%-10% slower with -Ofast -march=native and both LTO and PGO than with GCC 8
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: rsandifo at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
When I build 519.lbm_r with GCC 9 (specifically, r270364) using -Ofast
-march=native -mtune=native and both LTO and PGO, the binary is then
about 7%-10% slower than when built with GCC 8 and the same options.
I can see this on both an AMD Zen machine (10%) and an Intel Skylake
server (7%).
I have bisected the regression on the Zen machine, where it appeared
in two steps. The first one is r260348, which causes a 7% regression
on both the Zen and Intel server CPUs. Because it affects both in a
similar way, I hope it is not another manifestation of PR 84200.
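Bisecting a regression that happened in two steps amounts to running a binary search over revision numbers twice. The following is a minimal sketch, not the reporter's actual procedure. It assumes a hypothetical `is_slow(rev)` predicate (not part of the report) that builds and runs 519.lbm_r at a given revision and compares the time against a threshold. The second search starts at the first bad revision, with the threshold raised above the level of the first step.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int],
                       is_slow: Callable[[int], bool]) -> int:
    """Binary search for the first revision where is_slow() becomes true.

    Assumes revisions are sorted, revisions[0] is fast, revisions[-1] is
    slow, and there is a single fast-to-slow transition in between.
    """
    good, bad = 0, len(revisions) - 1  # invariant: good is fast, bad is slow
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(revisions[mid]):
            bad = mid
        else:
            good = mid
    return revisions[bad]


# Toy predicates over an arbitrary toy revision range, standing in for the
# hypothetical "build, run 519.lbm_r, compare against a threshold" step;
# a real is_slow() would cost a full build and benchmark run per call.
revs = list(range(260000, 270365))
first = first_bad_revision(revs, lambda r: r >= 260348)
# Search again from the first bad revision with a stricter threshold.
later = [r for r in revs if r >= first]
second = first_bad_revision(later, lambda r: r >= 265795)
print(first, second)
```

Each search needs only about log2(N) benchmark runs, which matters when every probe is a full LTO+PGO build followed by a SPEC run.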
As far as profile data are concerned, in all cases 99% of run-time is
spent in function main. Perf stat output is the following:
Fast (r260347) on Zen:
Performance counter stats for 'numactl -C 0 -l specinvoke':

  157862.072201  task-clock:u (msec)       # 0.999 CPUs utilized
              0  context-switches:u        # 0.000 K/sec
              0  cpu-migrations:u          # 0.000 K/sec
           4354  page-faults:u             # 0.028 K/sec
   490921430199  cycles:u                  #
     5942617830  stalled-cycles-frontend:u # 1.21% frontend cycles idle (83.36%)
    11565687163  stalled-cycles-backend:u  # 2.36% backend cycles idle (83.32%)
  1121945505076  instructions:u            # 2.29 insn per cycle
                                           # 0.01 stalled cycles per insn (83.32%)
    11591019938  branches:u                # 73.425 M/sec (83.36%)
       50878910  branch-misses:u           # 0.44% of all branches (83.33%)

  158.013578100 seconds time elapsed
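The values after `#` in perf stat output are ratios of the raw counters. Below is a quick Python check against the r260347 numbers above. The formulas used (for example, insn per cycle = instructions / cycles, and CPUs utilized = task-clock / elapsed time) are the usual definitions and are assumed here, not stated in the report.

```python
# Raw counters from the r260347 (fast) perf stat run on Zen quoted above.
task_clock_ms = 157862.072201
elapsed_s = 158.013578100
cycles = 490921430199
stalled_frontend = 5942617830
stalled_backend = 11565687163
instructions = 1121945505076
branches = 11591019938
branch_misses = 50878910

# Each derived value is a plain ratio of two counters.
cpus_utilized = (task_clock_ms / 1000) / elapsed_s
frontend_idle_pct = 100 * stalled_frontend / cycles
backend_idle_pct = 100 * stalled_backend / cycles
ipc = instructions / cycles
branches_m_per_s = branches / (task_clock_ms / 1000) / 1e6
branch_miss_pct = 100 * branch_misses / branches

print(f"{cpus_utilized:.3f} CPUs utilized")
print(f"{frontend_idle_pct:.2f}% frontend cycles idle")
print(f"{backend_idle_pct:.2f}% backend cycles idle")
print(f"{ipc:.2f} insn per cycle")
print(f"{branches_m_per_s:.3f} M branches/sec")
print(f"{branch_miss_pct:.2f}% of all branches missed")
```

Each printed value matches the corresponding figure in the r260347 output (0.999, 1.21%, 2.36%, 2.29, 73.425, 0.44%).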
Slower (r260348) on Zen:
Performance counter stats for 'numactl -C 0 -l specinvoke':

  166747.570030  task-clock:u (msec)       # 0.999 CPUs utilized
              0  context-switches:u        # 0.000 K/sec
              0  cpu-migrations:u          # 0.000 K/sec
           4354  page-faults:u             # 0.026 K/sec
   520147919104  cycles:u                  #
     4619521659  stalled-cycles-frontend:u # 0.89% frontend cycles idle (83.32%)
    11565577319  stalled-cycles-backend:u  # 2.22% backend cycles idle (83.32%)
  1133497632829  instructions:u            # 2.18 insn per cycle
                                           # 0.01 stalled cycles per insn (83.36%)
    11583199072  branches:u                # 69.465 M/sec (83.33%)
       50821264  branch-misses:u           # 0.44% of all branches (83.32%)

  166.898923990 seconds time elapsed
The second performance drop on Zen happened at r265795, albeit only by
3%. This revision does not seem to have any effect on the Intel CPU, so,
given how weirdly the benchmark can sometimes behave, it may not be
that interesting.
Just before the second drop (r265794):
Performance counter stats for 'numactl -C 0 -l specinvoke':

  165315.997872  task-clock:u (msec)       # 0.999 CPUs utilized
              0  context-switches:u        # 0.000 K/sec
              0  cpu-migrations:u          # 0.000 K/sec
           4354  page-faults:u             # 0.026 K/sec
   520201473687  cycles:u                  #
     4890796962  stalled-cycles-frontend:u # 0.94% frontend cycles idle (83.37%)
    11565134531  stalled-cycles-backend:u  # 2.22% backend cycles idle (83.32%)
  1132849187518  instructions:u            # 2.18 insn per cycle
                                           # 0.01 stalled cycles per insn (83.31%)
    11591493304  branches:u                # 70.117 M/sec (83.37%)
       50879513  branch-misses:u           # 0.44% of all branches (83.32%)

  165.498590592 seconds time elapsed
Second drop (r265795):
Performance counter stats for 'numactl -C 0 -l specinvoke':

  170908.963939  task-clock:u (msec)       # 0.999 CPUs utilized
              0  context-switches:u        # 0.000 K/sec
              0  cpu-migrations:u          # 0.000 K/sec
           4430  page-faults:u             # 0.026 K/sec
   539336426342  cycles:u                  #
     3889378937  stalled-cycles-frontend:u # 0.72% frontend cycles idle (83.36%)
    11564727183  stalled-cycles-backend:u  # 2.14% backend cycles idle (83.32%)
  1146203876321  instructions:u            # 2.13 insn per cycle
                                           # 0.01 stalled cycles per insn (83.31%)
    11589809180  branches:u                # 67.813 M/sec (83.37%)
       50679537  branch-misses:u           # 0.44% of all branches (83.32%)

  171.089470855 seconds time elapsed
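Because cycles = instructions / IPC, each step of the regression can be split into a change in executed instructions and a change in IPC. The following small Python sketch uses only the elapsed time, cycle count and instruction count from the four perf stat runs on Zen quoted above. These are single runs, so the percentages are indicative only.

```python
# (elapsed seconds, cycles, instructions) from the perf stat runs on Zen
# quoted above; single runs, so the derived percentages are indicative only.
runs = {
    "r260347": (158.013578100, 490921430199, 1121945505076),
    "r260348": (166.898923990, 520147919104, 1133497632829),
    "r265794": (165.498590592, 520201473687, 1132849187518),
    "r265795": (171.089470855, 539336426342, 1146203876321),
}


def compare(before: str, after: str) -> dict[str, float]:
    """Percentage change from `before` to `after` in time, cycles, insns, IPC."""
    t0, c0, i0 = runs[before]
    t1, c1, i1 = runs[after]
    insn_ratio = i1 / i0
    ipc_ratio = (i1 / c1) / (i0 / c0)
    # cycles = instructions / IPC, so c1 / c0 == insn_ratio / ipc_ratio.
    return {
        "elapsed": 100 * (t1 / t0 - 1),
        "cycles": 100 * (c1 / c0 - 1),
        "instructions": 100 * (insn_ratio - 1),
        "IPC": 100 * (ipc_ratio - 1),
    }


for before, after in [("r260347", "r260348"), ("r265794", "r265795")]:
    d = compare(before, after)
    summary = ", ".join(f"{k} {v:+.1f}%" for k, v in d.items())
    print(f"{before} -> {after}: {summary}")
```

The script prints:

  r260347 -> r260348: elapsed +5.6%, cycles +6.0%, instructions +1.0%, IPC -4.6%
  r265794 -> r265795: elapsed +3.4%, cycles +3.7%, instructions +1.2%, IPC -2.4%

Taken at face value, both steps add only about 1% more executed instructions. Most of the extra cycles come from the lower IPC.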
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)