This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/86020] New: Performance regression in Eigen geometry.cpp test starting with r248333
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 31 May 2018 19:58:36 +0000
- Subject: [Bug tree-optimization/86020] New: Performance regression in Eigen geometry.cpp test starting with r248333
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86020
Bug ID: 86020
Summary: Performance regression in Eigen geometry.cpp test
starting with r248333
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
Target Milestone: ---
GCC 8.1 has regressed by about 30% compared with GCC 7.3 on one of the Eigen
test cases when measured on Power9 hardware (powerpc64le-linux-gnu). Similar
performance loss is seen on Power8 hardware as well. Pat Haugen did some
bisecting and found that this began with r248334.
2017-05-22 Jan Hubicka <hubicka@ucw.cz>
* ipa-inline.c (edge_badness): Use inlined_time instead of
inline_summaries->get.
The performance difference seems to be due to a change in inlining as a result
of the above patch. The symptom is a marked increase in stack load/store
activity.
An example occurs in one of the hottest routines:
_ZN5Eigen8internal20generic_product_implINS_5BlockIKNS_6MatrixIfLi4ELi4ELi0ELi4ELi4EEELi3ELi3ELb0EEENS3_IfLi3ELi8ELi0ELi3ELi8EEENS_10DenseShapeES8_Li3EE5addToINS2_IS7_Li3ELi8ELb1EEEEEvRT_RKS6_RKS7_
Below is the code with GCC 8.1.0 :
0.15 : 1000b274: lfs f7,8(r9)
0.00 : 1000b278: li r5,2
0.00 : 1000b27c: addi r4,r1,464
0.06 : 1000b280: lfs f8,4(r9)
0.09 : 1000b284: lfs f9,0(r9)
0.00 : 1000b288: lfs f0,4(r31)
0.00 : 1000b28c: addi r3,r1,648
2.61 : 1000b290: lfs f10,32(r10)
0.71 : 1000b294: lfs f11,16(r10)
0.46 : 1000b298: lfs f12,0(r10)
1.99 : 1000b29c: fmuls f10,f10,f7
4.36 : 1000b2a0: fmadds f11,f11,f8,f10
4.20 : 1000b2a4: fmadds f12,f12,f9,f11
4.08 : 1000b2a8: fadds f0,f0,f12
7.67 : 1000b2ac: stfs f0,4(r31)
0.00 : 1000b2b0: bl 10004838
0.00 : 1000b2b4: nop
0.06 : 1000b2b8: ld r5,664(r1)
0.00 : 1000b2bc: ld r10,680(r1)
0.00 : 1000b2c0: ld r9,512(r1)
0.00 : 1000b2c4: addi r4,r1,32
0.00 : 1000b2c8: ld r6,688(r1)
0.06 : 1000b2cc: ld r7,696(r1)
0.00 : 1000b2d0: ld r8,712(r1)
0.03 : 1000b2d4: ld r0,648(r1)
0.00 : 1000b2d8: addi r3,r1,1104
0.00 : 1000b2dc: ld r11,704(r1)
0.09 : 1000b2e0: std r28,144(r1)
0.00 : 1000b2e4: std r30,152(r1)
0.00 : 1000b2e8: std r26,160(r1)
0.18 : 1000b2ec: std r5,48(r1)
0.09 : 1000b2f0: std r10,64(r1)
0.03 : 1000b2f4: ld r5,728(r1)
0.00 : 1000b2f8: ld r10,720(r1)
0.06 : 1000b2fc: std r9,136(r1)
0.00 : 1000b300: add r9,r9,r29
0.09 : 1000b304: std r0,32(r1)
0.46 : 1000b308: std r6,600(r1)
0.06 : 1000b30c: std r6,72(r1)
0.18 : 1000b310: std r7,608(r1)
0.09 : 1000b314: std r7,80(r1)
0.12 : 1000b318: std r11,88(r1)
0.40 : 1000b31c: std r9,120(r1)
0.09 : 1000b320: std r8,624(r1)
0.06 : 1000b324: std r8,96(r1)
0.03 : 1000b328: std r10,632(r1)
0.06 : 1000b32c: std r10,104(r1)
0.06 : 1000b330: std r5,112(r1)
0.15 : 1000b334: bl 1000a248 <calls the map base evaluator
function>
The additional instructions can be seen between addresses 1000b2b8 and
1000b330.
Below is the code for GCC 7.3.0 :
0.16 : 1000ac88: lfs f7,8(r9)
0.04 : 1000ac8c: ld r10,488(r1)
0.08 : 1000ac90: addi r4,r1,176
0.35 : 1000ac94: lfs f8,4(r9)
0.28 : 1000ac98: lfs f9,0(r9)
0.12 : 1000ac9c: lfs f0,0(r31)
0.00 : 1000aca0: ld r9,496(r1)
0.00 : 1000aca4: addi r3,r1,1160
3.39 : 1000aca8: lfs f10,32(r8)
0.87 : 1000acac: lfs f11,16(r8)
0.83 : 1000acb0: lfs f12,0(r8)
0.32 : 1000acb4: std r10,864(r1)
0.12 : 1000acb8: std r10,776(r1)
0.04 : 1000acbc: std r9,872(r1)
0.24 : 1000acc0: std r9,784(r1)
1.93 : 1000acc4: fmuls f10,f10,f7
4.38 : 1000acc8: fmadds f11,f11,f8,f10
5.48 : 1000accc: fmadds f12,f12,f9,f11
5.99 : 1000acd0: fadds f0,f0,f12
10.25 : 1000acd4: stfs f0,0(r31)
0.00 : 1000acd8: std r7,304(r1)
0.20 : 1000acdc: std r9,224(r1)
0.00 : 1000ace0: std r21,176(r1)
0.24 : 1000ace4: std r27,192(r1)
0.00 : 1000ace8: std r23,208(r1)
0.00 : 1000acec: std r24,232(r1)
0.04 : 1000acf0: std r26,240(r1)
0.00 : 1000acf4: std r30,248(r1)
0.12 : 1000acf8: std r26,256(r1)
0.79 : 1000acfc: std r28,264(r1)
0.16 : 1000ad00: std r25,280(r1)
0.08 : 1000ad04: std r30,288(r1)
0.28 : 1000ad08: std r29,296(r1)
0.00 : 1000ad0c: std r10,216(r1)
0.00 : 1000ad10: bl 1000aae8 <calls the map base evaluator
function>
The GCC 8.1 code has an additional call, no longer inlined, to the constructor
Eigen::Matrix(). We suspect this and similar changes in other variants of this
function are responsible.
It is hard to understand the patch that made this change out of context. I
went back to the gcc-patches mailing list to look for discussion, but I wasn't
able to find a justification for this change. Jan, could you explain its
purpose?