Bug 37273 - [4.4/4.5/4.6/4.7 Regression] IRA does not re-materializes addresses (loads from the TOC)
Summary: [4.4/4.5/4.6/4.7 Regression] IRA does not re-materializes addresses (loads fr...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.4.0
: P2 normal
Target Milestone: 4.7.0
Assignee: Jeffrey A. Law
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks: 4.7pending
  Show dependency treegraph
 
Reported: 2008-08-29 05:00 UTC by Andrew Pinski
Modified: 2011-03-17 20:17 UTC (History)
5 users (show)

See Also:
Host:
Target: powerpc64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-01-10 18:30:58


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2008-08-29 05:00:03 UTC
As reported here http://gcc.gnu.org/ml/gcc-patches/2008-08/msg02071.html, we now produce worse code for PPC64 Linux (and PPC AIX).
We produce now produce worse code for gcc.dg/20020103-1.c on powerpc64
with TOCs (and the stack space goes up, 256 vs 272).
Before IRA we emitted:
        ld 3,.LC0@toc(2)

but after we had spill the address to the stack which is wrose:
--- before.s    2008-08-28 02:14:27.000000000 +0900
+++ after.s     2008-08-28 02:14:13.000000000 +0900
@@ -40,8 +40,9 @@ bar:
        std 27,-40(1)
        std 28,-32(1)
        std 29,-24(1)
-       stdu 1,-256(1)
+       stdu 1,-272(1)
        lwzu 5,4(31)
+       std 9,112(1)
        lwa 4,0(30)
        lwa 3,0(9)
        extsw 5,5
@@ -52,12 +53,13 @@ bar:
        #asm
  # 0 "" 2
 #NO_APP
+       ld 9,112(1)
        mr 4,30
-       ld 3,.LC0@toc(2)
        mr 5,31
+       mr 3,9
        bl f2
        nop
-       addi 1,1,256
+       addi 1,1,272
        ld 0,16(1)
        ld 14,-144(1)
        mtlr 0

Also if we are going to spill it to memory, then why use another
register and not just r3?

Note the comment on the testcase is incorrect:
/* Verify that constant equivalences get reloaded properly, either by being
   spilled to the stack, or regenerated, but not dropped to memory.  */

We do regnerate it so the check should not be done on lP64 Powerpc (or
any PowerOpen [TOC] based ABIs); I should mention that powerpc64 elf
has @got but xcoff does not have a @got which is why GCC uses the old
way of producing the TOC section.

Also note this testcase now passes on 32bit PPC Linux but that is only because we spill the address instead of recreating it.

I bet this is a generic issue too and causes performance issues and stack usage on more than just PowerPC 64.
Comment 1 Andrew Pinski 2008-08-29 05:27:02 UTC
This spilling also happens on x86-darwin.  Note x86-darwin uses -fPIC by default.
For PPC it looks like IRA is not looking into REG_EQUAL:
We have in the dump right before IRA:
(insn:HI 29 26 30 2 gcc/gcc/testsuite/gcc.dg/20020103-1.c:38 (set (reg:DI 3 3)
        (reg/f:DI 119)) 352 {*movdi_internal64} (expr_list:REG_DEAD (reg/f:DI 119)
        (expr_list:REG_EQUAL (symbol_ref:DI ("ext") [flags 0xc0] <var_decl 0xf7cca6c0 ext>)
            (nil))))

So we had spilled reg 119 instead of the re-materialize the address from the REG_EQUAL.

x86-darwin looks like a different issue though, but it looks like IRA is not calling targetm.delegitimize_address to get the original address to re-materialize it after the asm.
Comment 2 Andrew Pinski 2008-12-31 21:30:03 UTC
Note the testcase is from gcc.dg/20020103-1.c.
Comment 3 Andrew Pinski 2010-03-02 18:18:18 UTC
Still happens on the trunk.
Comment 4 Jeffrey A. Law 2011-01-10 18:30:58 UTC
The issue I see here is IRA decides it is cheaper to allocate "ext" into a call-clobbered register (an asm makes all call-saved registers conflict with the pseudo holding "ext").  

So with "ext" in a call-clobbered hard register, caller-saves then realizes the value is live across calls and saves/restores it appropriately.

Regardless of whether or not the pseudo is allocated or not to a call clobbered register we're going to be loading it from memory (either the caller-save slot or the constant pool).

However, when allocated to a call-clobbered register, we have an extra store (to initialize the caller-save slot) and additional stack space.  So in theory we ought to be able to easily determine that allocating a call-clobbered hard reg for this value isn't a win.

I'm still poking.
Comment 5 Jeffrey A. Law 2011-01-12 13:38:20 UTC
First I think we need to note that if we allocate the pseudo holding the value for "ext" in memory that we get some savings (because "ext" is already sitting in memory waiting to be used).

This is very similar to how we recognize some savings from allocating a pseudo to memory when the pseudo represents an argument loaded from its stack slot in ira-costs.

Second, in that same code, we're double-counting the cost of allocating the pseudo to memory.

If those two problems are fixed, we get the desired PPC code; however, significant benchmarking would be required to determine how this change affects a broader codebase.

I'll note there's also a deficiency in caller-saves in that it insists on reloading the value into its original hard reg, even if the hard reg is going to be immediately copied to a different hard reg.  I don't want to spend too much time on the caller-save aspects since ultimately I think caller-save needs to go away or at least have another rewrite.  It's approaching 2 decades since it's last major revision.
Comment 6 Jeffrey A. Law 2011-01-17 16:05:40 UTC
Some sniff testing on x86 shows that fixing this also helps x86, so I'll be doing more rigorous benchmarking.
Comment 7 Jeffrey A. Law 2011-01-21 17:13:35 UTC
SPEC 2k6 testing on x86-64 is a wash for integer codes.  Small improvement for 403.gcc 462.libquantum and small regressions for 456.hmmr and 471.omnetpp are evident.  There's a couple other tests which seem to have a tiny bias for improvement, but without many more runs it'd be impossible to know if the improvements are in the noise or not.  The net specint change is a 14.5 -> 14.6, but I wouldn't put a lot of stock into that.

FP shows a big improvement for 454.calculix and a small improvement for 459.gemsfdtd.  Small regressions are seen for 410.bwaves, 433.milc and 434.zeusmp, and 437.leslie3d.  The improvement for calculix dwarfs the minor regressions and is probably responsible for the 13.6->13.7 improvement in the FP base.

I've also verified that the patch consistently improves code density on x86 and gives us the desired code on PPC for the testcase in the PR.

And, at least in my mind, the patch makes sense and I'll be posting it shortly.
Comment 8 Jeffrey A. Law 2011-01-25 14:10:51 UTC
Author: law
Date: Tue Jan 25 14:10:46 2011
New Revision: 169231

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169231
Log:
	PR rtl-optimization/37273
	* ira-costs.c (scan_one_insn): Detect constants living in memory and
	handle them like argument loads from stack slots.  Do not double
	count memory for memory constants and argument loads from stack slots.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-costs.c
Comment 9 Jeffrey A. Law 2011-01-25 14:11:33 UTC
Fixed
Comment 10 Jeffrey A. Law 2011-01-26 22:47:02 UTC
Followup patch:

http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01938.html
Comment 11 Diego Novillo 2011-02-02 17:46:20 UTC
Author: dnovillo
Date: Wed Feb  2 17:46:18 2011
New Revision: 169579

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169579
Log:
	PR rtl-optimization/37273
	* ira-costs.c (scan_one_insn): Detect constants living in memory and
	handle them like argument loads from stack slots.  Do not double
	count memory for memory constants and argument loads from stack slots.

Modified:
    branches/google/integration/gcc/ChangeLog
    branches/google/integration/gcc/ira-costs.c
Comment 12 Jeffrey A. Law 2011-02-15 21:22:03 UTC
Author: law
Date: Tue Feb 15 21:21:59 2011
New Revision: 170199

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170199
Log:

	Revert
	2011-01-25  Jeff Law  <law@redhat.com>

	PR rtl-optimization/37273
	* ira-costs.c (scan_one_insn): Detect constants living in memory and
	handle them like argument loads from stack slots.  Do not double
	count memory for memory constants and argument loads from stack slots.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-costs.c
Comment 13 Jeffrey A. Law 2011-02-15 21:24:13 UTC
Patch reverted due to exposing several latent bugs.  Bugfixes for the latent issues have been kept.  Patch queued for gcc-4.7.  Prior to installing for gcc-4.7 sparc-rtems needs testing.
Comment 14 Jeffrey A. Law 2011-03-17 20:08:06 UTC
Author: law
Date: Thu Mar 17 20:08:01 2011
New Revision: 171111

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171111
Log:

	PR rtl-optimization/37273
	* ira-costs.c (scan_one_insn): Detect constants living in memory and
	handle them like argument loads from stack slots.  Do not double
	count memory for memory constants and argument loads from stack slots.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-costs.c
Comment 15 Jeffrey A. Law 2011-03-17 20:17:23 UTC
reinstalling fix now that we're open for stage1