As reported here http://gcc.gnu.org/ml/gcc-patches/2008-08/msg02071.html, we now produce worse code for PPC64 Linux (and PPC AIX). We produce now produce worse code for gcc.dg/20020103-1.c on powerpc64 with TOCs (and the stack space goes up, 256 vs 272). Before IRA we emitted: ld 3,.LC0@toc(2) but after we had spill the address to the stack which is wrose: --- before.s 2008-08-28 02:14:27.000000000 +0900 +++ after.s 2008-08-28 02:14:13.000000000 +0900 @@ -40,8 +40,9 @@ bar: std 27,-40(1) std 28,-32(1) std 29,-24(1) - stdu 1,-256(1) + stdu 1,-272(1) lwzu 5,4(31) + std 9,112(1) lwa 4,0(30) lwa 3,0(9) extsw 5,5 @@ -52,12 +53,13 @@ bar: #asm # 0 "" 2 #NO_APP + ld 9,112(1) mr 4,30 - ld 3,.LC0@toc(2) mr 5,31 + mr 3,9 bl f2 nop - addi 1,1,256 + addi 1,1,272 ld 0,16(1) ld 14,-144(1) mtlr 0 Also if we are going to spill it to memory, then why use another register and not just r3? Note the comment on the testcase is incorrect: /* Verify that constant equivalences get reloaded properly, either by being spilled to the stack, or regenerated, but not dropped to memory. */ We do regnerate it so the check should not be done on lP64 Powerpc (or any PowerOpen [TOC] based ABIs); I should mention that powerpc64 elf has @got but xcoff does not have a @got which is why GCC uses the old way of producing the TOC section. Also note this testcase now passes on 32bit PPC Linux but that is only because we spill the address instead of recreating it. I bet this is a generic issue too and causes performance issues and stack usage on more than just PowerPC 64.
This spilling also happens on x86-darwin. Note x86-darwin uses -fPIC by default. For PPC it looks like IRA is not looking into REG_EQUAL: We have in the dump right before IRA: (insn:HI 29 26 30 2 gcc/gcc/testsuite/gcc.dg/20020103-1.c:38 (set (reg:DI 3 3) (reg/f:DI 119)) 352 {*movdi_internal64} (expr_list:REG_DEAD (reg/f:DI 119) (expr_list:REG_EQUAL (symbol_ref:DI ("ext") [flags 0xc0] <var_decl 0xf7cca6c0 ext>) (nil)))) So we had spilled reg 119 instead of the re-materialize the address from the REG_EQUAL. x86-darwin looks like a different issue though, but it looks like IRA is not calling targetm.delegitimize_address to get the original address to re-materialize it after the asm.
Note the testcase is from gcc.dg/20020103-1.c.
Still happens on the trunk.
The issue I see here is IRA decides it is cheaper to allocate "ext" into a call-clobbered register (an asm makes all call-saved registers conflict with the pseudo holding "ext"). So with "ext" in a call-clobbered hard register, caller-saves then realizes the value is live across calls and saves/restores it appropriately. Regardless of whether or not the pseudo is allocated or not to a call clobbered register we're going to be loading it from memory (either the caller-save slot or the constant pool). However, when allocated to a call-clobbered register, we have an extra store (to initialize the caller-save slot) and additional stack space. So in theory we ought to be able to easily determine that allocating a call-clobbered hard reg for this value isn't a win. I'm still poking.
First I think we need to note that if we allocate the pseudo holding the value for "ext" in memory that we get some savings (because "ext" is already sitting in memory waiting to be used). This is very similar to how we recognize some savings from allocating a pseudo to memory when the pseudo represents an argument loaded from its stack slot in ira-costs. Second, in that same code, we're double-counting the cost of allocating the pseudo to memory. If those two problems are fixed, we get the desired PPC code; however, significant benchmarking would be required to determine how this change affects a broader codebase. I'll note there's also a deficiency in caller-saves in that it insists on reloading the value into its original hard reg, even if the hard reg is going to be immediately copied to a different hard reg. I don't want to spend too much time on the caller-save aspects since ultimately I think caller-save needs to go away or at least have another rewrite. It's approaching 2 decades since it's last major revision.
Some sniff testing on x86 shows that fixing this also helps x86, so I'll be doing more rigorous benchmarking.
SPEC 2k6 testing on x86-64 is a wash for integer codes. Small improvement for 403.gcc 462.libquantum and small regressions for 456.hmmr and 471.omnetpp are evident. There's a couple other tests which seem to have a tiny bias for improvement, but without many more runs it'd be impossible to know if the improvements are in the noise or not. The net specint change is a 14.5 -> 14.6, but I wouldn't put a lot of stock into that. FP shows a big improvement for 454.calculix and a small improvement for 459.gemsfdtd. Small regressions are seen for 410.bwaves, 433.milc and 434.zeusmp, and 437.leslie3d. The improvement for calculix dwarfs the minor regressions and is probably responsible for the 13.6->13.7 improvement in the FP base. I've also verified that the patch consistently improves code density on x86 and gives us the desired code on PPC for the testcase in the PR. And, at least in my mind, the patch makes sense and I'll be posting it shortly.
Author: law Date: Tue Jan 25 14:10:46 2011 New Revision: 169231 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169231 Log: PR rtl-optimization/37273 * ira-costs.c (scan_one_insn): Detect constants living in memory and handle them like argument loads from stack slots. Do not double count memory for memory constants and argument loads from stack slots. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-costs.c
Fixed
Followup patch: http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01938.html
Author: dnovillo Date: Wed Feb 2 17:46:18 2011 New Revision: 169579 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169579 Log: PR rtl-optimization/37273 * ira-costs.c (scan_one_insn): Detect constants living in memory and handle them like argument loads from stack slots. Do not double count memory for memory constants and argument loads from stack slots. Modified: branches/google/integration/gcc/ChangeLog branches/google/integration/gcc/ira-costs.c
Author: law Date: Tue Feb 15 21:21:59 2011 New Revision: 170199 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170199 Log: Revert 2011-01-25 Jeff Law <law@redhat.com> PR rtl-optimization/37273 * ira-costs.c (scan_one_insn): Detect constants living in memory and handle them like argument loads from stack slots. Do not double count memory for memory constants and argument loads from stack slots. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-costs.c
Patch reverted due to exposing several latent bugs. Bugfixes for the latent issues have been kept. Patch queued for gcc-4.7. Prior to installing for gcc-4.7 sparc-rtems needs testing.
Author: law Date: Thu Mar 17 20:08:01 2011 New Revision: 171111 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171111 Log: PR rtl-optimization/37273 * ira-costs.c (scan_one_insn): Detect constants living in memory and handle them like argument loads from stack slots. Do not double count memory for memory constants and argument loads from stack slots. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-costs.c
reinstalling fix now that we're open for stage1