This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/72861] New: [7 Regression] 25% tramp3d-v4 performance regression on ppc64le
- From: "trippels at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 10 Aug 2016 08:18:46 +0000
- Subject: [Bug target/72861] New: [7 Regression] 25% tramp3d-v4 performance regression on ppc64le
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72861
Bug ID: 72861
Summary: [7 Regression] 25% tramp3d-v4 performance regression on ppc64le
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: trippels at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu
Performance of tramp3d-v4 regressed by more than 25% on ppc64le (gcc112)
compared to gcc-6:
gcc-6:
trippels@gcc2-power8 ~ % ~/gcc_6/usr/local/bin/g++ -w -Ofast -mlra -mcpu=power8 tramp3d-v4.cpp
Performance counter stats for './a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20' (5 runs):
     1972.550946  task-clock (msec)        # 0.999 CPUs utilized           ( +- 0.22% )
             159  context-switches         # 0.081 K/sec                   ( +- 0.90% )
               0  cpu-migrations           # 0.000 K/sec
           1,224  page-faults              # 0.621 K/sec                   ( +- 0.02% )
   6,748,308,064  cycles                   # 3.421 GHz                     ( +- 0.22% ) [66.46%]
     102,294,018  stalled-cycles-frontend  # 1.52% frontend cycles idle    ( +- 3.23% ) [49.91%]
   4,241,962,795  stalled-cycles-backend   # 62.86% backend cycles idle    ( +- 0.42% ) [50.41%]
   7,902,269,951  instructions             # 1.17 insns per cycle
                                           # 0.54 stalled cycles per insn  ( +- 0.17% ) [67.10%]
     740,198,353  branches                 # 375.249 M/sec                 ( +- 0.12% ) [50.14%]
      12,209,406  branch-misses            # 1.65% of all branches         ( +- 0.25% ) [49.82%]
     1.973964281 seconds time elapsed                                      ( +- 0.22% )
gcc-7:
trippels@gcc2-power8 ~ % ~/gcc_7/usr/local/bin/g++ -w -Ofast -mlra -mcpu=power8 tramp3d-v4.cpp
Performance counter stats for './a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20' (5 runs):
     2677.865248  task-clock (msec)        # 0.999 CPUs utilized           ( +- 0.84% )
             163  context-switches         # 0.061 K/sec                   ( +- 1.77% )
               0  cpu-migrations           # 0.000 K/sec                   ( +-100.00% )
           2,092  page-faults              # 0.781 K/sec                   ( +- 0.03% )
   9,149,015,944  cycles                   # 3.417 GHz                     ( +- 0.92% ) [66.65%]
     105,804,553  stalled-cycles-frontend  # 1.16% frontend cycles idle    ( +- 5.21% ) [50.12%]
   6,383,265,282  stalled-cycles-backend   # 69.77% backend cycles idle    ( +- 1.30% ) [50.31%]
   8,980,496,614  instructions             # 0.98 insns per cycle
                                           # 0.71 stalled cycles per insn  ( +- 0.32% ) [66.96%]
     682,369,238  branches                 # 254.818 M/sec                 ( +- 0.25% ) [49.93%]
      10,159,864  branch-misses            # 1.49% of all branches         ( +- 0.61% ) [49.82%]
     2.679415575 seconds time elapsed                                      ( +- 0.84% )
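The size of the regression can be derived from the two perf runs above. The following is a minimal sketch, not part of the original report: the dictionary keys and the helper `relative_change` are illustrative names, and the numbers are copied from the gcc-6 and gcc-7 output. It reports the change in elapsed time, cycles, instructions, backend stalls and instructions per cycle, plus the equivalent throughput loss, which is one possible reading of the "more than 25%" figure.

```python
# Counter values copied from the gcc-6 and gcc-7 perf stat runs in this report.
# The dictionary layout and names are illustrative assumptions.
gcc6 = {
    "seconds": 1.973964281,
    "cycles": 6_748_308_064,
    "instructions": 7_902_269_951,
    "stalled_backend": 4_241_962_795,
}
gcc7 = {
    "seconds": 2.679415575,
    "cycles": 9_149_015_944,
    "instructions": 8_980_496_614,
    "stalled_backend": 6_383_265_282,
}


def relative_change(old, new):
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


# Per-counter change going from gcc-6 to gcc-7.
changes = {name: relative_change(gcc6[name], gcc7[name]) for name in gcc6}

# Instructions per cycle, computed from the raw counters.
ipc6 = gcc6["instructions"] / gcc6["cycles"]
ipc7 = gcc7["instructions"] / gcc7["cycles"]
changes["ipc"] = relative_change(ipc6, ipc7)

# The same slowdown expressed as lost throughput (runs per unit of time).
throughput_loss = 1 - gcc6["seconds"] / gcc7["seconds"]

for name, change in changes.items():
    print(f"{name:16s} {change:+.1%}")
print(f"{'throughput loss':16s} {throughput_loss:.1%}")
```

With these inputs the elapsed time and cycle count both grow by roughly a third, while the instruction count grows far less, so most of the extra time appears to come from additional backend stall cycles rather than from extra instructions alone.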