Bug 89154 - 5% degradation of CPU2006 473.astar starting with r266305
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 83215
Blocks: spec
Reported: 2019-02-01 20:45 UTC by Pat Haugen
Modified: 2023-06-02 03:44 UTC
CC List: 6 users

See Also:
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Description Pat Haugen 2019-02-01 20:45:13 UTC
Not sure if this is really a tree-optimization issue; I just picked that as the initial component since the fix dealt with that area. It could also be an rtl-optimization/shrink-wrapping issue, brought about by additional register pressure from CSE'ing/hoisting some additional code.

Function way2obj::releasepoint() degrades by 20% starting with r266305. Looking at the perf output, the main difference seems to be that we're no longer shrink-wrapping the early-exit test at the start of the function.
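For readers without the 473.astar source at hand, the early-exit test has roughly the following shape. This is a hypothetical C sketch: the struct layout, the field names and the `slow_path_work` helper are invented, loosely inferred from the loads in the annotated assembly of this report (a row width, a pointer to 4-byte elements, and a 16-bit value compared against a 16-bit field of the selected element). The `int` result exists only so the sketch can be tested. When shrink-wrapping succeeds, the `return 0` path runs with no prologue or epilogue at all; only the path that calls out needs a stack frame.

```c
/* Hypothetical layout: names and field order are invented for illustration. */
struct cell {
    unsigned short tag;      /* 16-bit value read by the early-exit test */
    unsigned short payload;  /* pads each element to 4 bytes */
};

struct grid {
    struct cell *map;            /* row-major array of cells */
    unsigned short current_tag;  /* value the cell's tag is compared with */
    int width;                   /* cells per row */
    int released;                /* counts slow-path runs, for testing only */
};

/* Hypothetical stand-in for the rest of the function.  noinline (a GCC/Clang
   extension) keeps a real call on the slow path, so only that path needs a
   stack frame. */
static __attribute__((noinline)) void slow_path_work(struct grid *g,
                                                     struct cell *c,
                                                     int x, int y)
{
    c->tag = g->current_tag;
    c->payload = (unsigned short)(x + y);
    g->released++;
}

/* Early-exit shape: index computation, 16-bit load, compare, return.
   With successful shrink-wrapping the `return 0` path executes no
   prologue or epilogue. */
int release_point(struct grid *g, int x, int y)
{
    int idx = y * g->width + x;
    if (g->map[idx].tag == g->current_tag)
        return 0;
    slow_path_work(g, &g->map[idx], x, y);
    return 1;
}
```

Calling `release_point` twice on the same cell takes the slow path once (it updates the tag) and the early exit the second time.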

Following is the annotated assembly of the start of the function in each revision (columns: sample count, percentage of all samples, address, instruction).

r266304:
--------
0000000010006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int, int) total: 2032811 22.9279 */
               :    10006a40:   lis     r2,4098
               :    10006a44:   addi    r2,r2,32512
 95384  1.0758 :    10006a48:   lwz     r9,4424(r3)
               :    10006a4c:   ld      r8,8(r3)
119001  1.3422 :    10006a50:   lhz     r7,16(r3)
     1 1.1e-05 :    10006a54:   mullw   r9,r9,r5
               :    10006a58:   add     r9,r9,r4
               :    10006a5c:   extsw   r9,r9
169526  1.9121 :    10006a60:   rldicr  r9,r9,2,61
               :    10006a64:   lhzx    r10,r8,r9
 21865  0.2466 :    10006a68:   cmpw    r10,r7
               :    10006a6c:   beqlr



r266305:
--------
0000000010006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int, int) total: 2440798 26.2354 */
               :    10006a40:   lis     r2,4098
               :    10006a44:   addi    r2,r2,32512
 35498  0.3816 :    10006a48:   lwa     r6,4424(r3)
               :    10006a4c:   ld      r7,8(r3)
 26361  0.2833 :    10006a50:   std     r30,-16(r1)
               :    10006a54:   mr      r30,r3
157660  1.6946 :    10006a58:   mfcr    r12
162000  1.7413 :    10006a5c:   lhz     r3,16(r3)
    17 1.8e-04 :    10006a60:   std     r23,-72(r1)
   139  0.0015 :    10006a64:   mr      r23,r4
     2 2.1e-05 :    10006a68:   mullw   r9,r6,r5
    59 6.3e-04 :    10006a6c:   stw     r12,8(r1)
244832  2.6316 :    10006a70:   stdu    r1,-112(r1)
     4 4.3e-05 :    10006a74:   add     r9,r9,r4
     5 5.4e-05 :    10006a78:   extsw   r9,r9
   201  0.0022 :    10006a7c:   rldicr  r8,r9,2,61
   343  0.0037 :    10006a80:   add     r4,r7,r8
     9 9.7e-05 :    10006a84:   lhzx    r10,r7,r8
151595  1.6294 :    10006a88:   cmpw    r10,r3
               :    10006a8c:   beq     10006c64 <_ZN7way2obj12releasepointEii+0x224>

The target of the conditional branch in the slow version is just the epilogue code that restores r1, r23, r30 and CR3/CR4, and then returns.
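The 20% figure can be checked against the sample totals in the two listing headers. Assuming the second number in each header (and each per-instruction percentage) is relative to all samples in the run, which the numbers are consistent with, the whole-run totals can be derived as well. If both runs used the same workload and sampling period (an assumption), their ratio lands close to the 5% overall degradation in the summary. A small C sketch of that arithmetic:

```c
/* Sample totals for way2obj::releasepoint(int, int), copied from the
   headers of the two annotated listings, together with the function's
   share of all samples (the second number in each header). */
static const double samples_r266304 = 2032811.0;
static const double share_r266304 = 22.9279;   /* percent */
static const double samples_r266305 = 2440798.0;
static const double share_r266305 = 26.2354;   /* percent */

/* Relative change from `before` to `after`, in percent. */
static double percent_increase(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Whole-run sample count implied by a function's samples and its share. */
static double total_samples(double function_samples, double share_percent)
{
    return function_samples * 100.0 / share_percent;
}
```

`percent_increase(samples_r266304, samples_r266305)` comes out at about 20.07, and comparing `total_samples(samples_r266304, share_r266304)` (about 8.87 million) with `total_samples(samples_r266305, share_r266305)` (about 9.30 million) gives an increase of about 4.9%.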
Comment 1 Segher Boessenkool 2019-02-02 22:59:08 UTC
The new version needs to save r4 because it reuses the reg for storing r7+r8.
And we still don't wrap CR separately, sigh.
Comment 2 Richard Biener 2019-02-04 09:09:08 UTC
r266305 made type-based alias analysis stronger (both on GIMPLE and RTL); this
really looks like an unfortunate side effect or a missed shrink-wrapping opportunity.
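Stronger alias analysis lets the optimizer keep more loaded values in registers instead of reloading them, which is one way extra CSE can raise register pressure. The sketch below is a generic illustration of what type-based alias analysis permits, not the change made in r266305; `bump_and_reread` is a hypothetical name. Under GCC's `-fstrict-aliasing`, enabled by default at `-O2` and above, a store through an `int` lvalue is assumed not to modify an object read as `unsigned short`.

```c
/* Without type-based aliasing the compiler must reload *tag after the
   store, because `counter` might point into it, so `first` is dead right
   after the addition.  With -fstrict-aliasing it may assume the int store
   cannot change an unsigned short object and reuse `first` for the return
   value, which keeps that value live across the store. */
int bump_and_reread(const unsigned short *tag, int *counter)
{
    unsigned short first = *tag;
    *counter += first;   /* int store: assumed not to alias *tag */
    return *tag;         /* may be replaced by `first` instead of a reload */
}
```

When the two pointers refer to distinct objects, as the C aliasing rules require here, both forms return the same value; the difference is only in how long `first` occupies a register.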
Comment 3 Pat Haugen 2019-02-05 21:52:21 UTC
(In reply to Segher Boessenkool from comment #1)
> The new version needs to save r4 because it reuses the reg for storing r7+r8.
> And we still don't wrap CR separately, sigh.

Yes, and similarly for r3, since it's reused in the block. Another thing that could be moved is the r1 adjustment; is that also a component that isn't handled separately?
Comment 4 Segher Boessenkool 2019-02-06 11:23:10 UTC
The r1 adjustment is establishing the stack frame.  It needs to precede all
stack accesses (not just those by the prologue saves!).  We could wrap it
separately, if that would help?  You can then get multiple copies of it; that
will be the only real benefit.
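The question in comment #3 and the answer in comment #4 can be made concrete with a toy cost model. It is not GCC's shrink-wrapping algorithm; it only encodes the outcome visible in the two listings, under stated assumptions: the r23 and r30 saves are treated as separately wrapped components (an assumption here), CR and the frame (`stdu r1`) are not, as comments #1 and #4 indicate, and if the block before the early exit needs any component, the whole non-separate part of the prologue is emitted there as well. The model also ignores the ordering constraint from comment #4 that the frame must be established before any stack access. All names are hypothetical.

```c
/* Toy model: each prologue component is one bit. */
enum component {
    COMP_R23   = 1u << 0,   /* save/restore of r23 */
    COMP_R30   = 1u << 1,   /* save/restore of r30 */
    COMP_CR    = 1u << 2,   /* mfcr and CR field save/restore */
    COMP_FRAME = 1u << 3,   /* stdu r1 establishing the frame */
};

/* Components executed on the early-exit path.
   entry_needs: components the block before the early-exit branch uses.
   all:         every component the function's prologue contains.
   separate:    components that can be placed individually.
   Simplifying assumption: if the entry block needs any component, the
   non-separate remainder of the prologue is emitted there too. */
static unsigned fast_path_components(unsigned entry_needs, unsigned all,
                                     unsigned separate)
{
    unsigned cost = entry_needs & all;
    if (cost != 0)
        cost |= all & ~separate;
    return cost;
}

/* Number of set bits, i.e. how many components the fast path pays for. */
static int count_components(unsigned set)
{
    int n = 0;
    for (; set != 0; set &= set - 1)  /* clear the lowest set bit */
        n++;
    return n;
}
```

With all four components in the prologue, r266304's entry block needs none of them, so the early exit pays for nothing. r266305's entry block uses r23 and r30, so the model charges all four. Making CR and the frame separate components as well would still leave the two GPR saves on the fast path, because the entry block itself uses those registers.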