This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register

From: "law at redhat dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 19 Feb 2015 13:18:23 +0000
Subject: [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
Auto-submitted: auto-generated
References: <bug-64317-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #12 from Jeffrey A. Law <law at redhat dot com> ---
I'm very aware that the x86 backend doesn't support a fixed PIC register
anymore.  

RA was going to have to spill something.  The PIC register is needed in three
different loops, L0, L1 and L2.  L0 needs 9 general purpose registers, L1 needs
8 general purpose registers and L2 needs 9 general purpose registers.  I.e.,
there are going to be spills; there's simply no way around it.
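
As a rough back-of-the-envelope check of that claim, the sketch below compares each loop's stated demand with the registers a 32-bit x86 target can hand out.  It assumes %esp is the only reserved register (so %ebp is allocatable) and reads "needs N registers" as peak pressure; min_spills is a hypothetical helper, not anything in IRA.

```python
# The eight 32-bit x86 general purpose registers.
X86_32_GPRS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp")


def min_spills(needed, reserved=("esp",)):
    """Lower bound on values that cannot stay in registers at peak pressure.

    Assumption: only the registers in `reserved` are unavailable to the
    allocator; everything else is a simplification for illustration.
    """
    allocatable = len(X86_32_GPRS) - len(reserved)
    return max(0, needed - allocatable)


# Register demand per loop, as stated in the comment.
demand = {"L0": 9, "L1": 8, "L2": 9}
shortfall = {loop: min_spills(n) for loop, n in demand.items()}

for loop, n in shortfall.items():
    print(f"{loop}: at least {n} value(s) must live in memory")
```

Under those assumptions every loop comes up short by at least one register, which is consistent with the point that something has to be spilled.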

r107 (PIC pseudo) gets split into 3 allocnos: A1, A11 and A237, covering Loops
0, 2 and 1 respectively.  It's live throughout most of the resultant function by
way of explicit references and the need to have %ebx set up prior to external
calls.

For Loop 1 & Loop 2, the respective allocnos (A237, A11) are not used/set
within the loop at all, i.e., they are transparent within their respective
loops.  IRA does exactly what we want here by keeping the PIC register in
memory, which frees up a register within those loops for other objects that are
used within the loop.  Of course, to do that we have to reload the value for
the uses outside the boundary of those loops.
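
A toy sketch of why a transparent allocno is the cheapest thing to keep in memory across a loop: spilling it costs only a store/load at the loop border, while spilling an allocno that is referenced inside the loop costs a memory access per reference on every iteration.  The cost numbers, the function names and the "other_a"/"other_b" allocnos are hypothetical, and this is not IRA's actual cost model.

```python
def spill_cost(refs_in_loop, loop_iterations, border_cost=2):
    """Hypothetical cost of keeping a value in memory across a loop.

    border_cost models the store before and reload after the loop;
    each reference inside the loop adds one memory access per iteration.
    """
    return border_cost + refs_in_loop * loop_iterations


def cheapest_to_spill(refs_by_allocno, loop_iterations):
    """Return the allocno whose spill is cheapest under the toy model."""
    return min(
        refs_by_allocno,
        key=lambda a: spill_cost(refs_by_allocno[a], loop_iterations),
    )


# A237 is transparent in its loop (zero references); the other two
# entries are made-up allocnos that are referenced inside the loop.
refs = {"A237": 0, "other_a": 4, "other_b": 2}
choice = cheapest_to_spill(refs, loop_iterations=100)
print(choice)
```

With any positive iteration count the transparent allocno wins, since its cost does not grow with the number of iterations.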

Loop 0 (bbs 0 through 19) is the most interesting and is the one where we have
those annoying reloads.

One of the things I notice is that LRA is generating sequences like:
(insn 581 89 90 6 (set (reg:SI 3 bx [107])
        (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 28 [0x1c])) [4 %sfp+-4 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))
(insn 90 581 91 6 (set (reg/f:SI 3 bx [orig:142 D.2145 ] [142])
        (mem/f/c:SI (plus:SI (reg:SI 3 bx [107])
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("out") [flags 0x2] <var_decl 0x7ffff670bc60 out>)
                        ] UNSPEC_GOTOFF))) [1 out+0 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))

Note how we load %ebx from memory, then use/clobber it in the next insn.  That
makes it impossible for the post-reload optimizers to help clean this up.  How
hard would it be to generate code in LRA where those two insns set different
registers in those local snippets of code?

In this particular case, %ebp is locally available and there's no strong reason
why we have to use %ebx.  By using different destinations for insns 581 and 90,
the value loaded from the source MEM of insn 581 would still be available in a
register after insn 90.  And with that value available, postreload-gcse would
be able to commonize those loads.
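
The effect can be illustrated with a toy model of redundant-load detection over a straight-line snippet.  This is only a sketch: the Insn type and count_redundant_slot_loads are hypothetical, the later reload in each sequence is added for illustration, the model assumes nothing stores to the stack slot in between, and it is not how postreload-gcse is implemented.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    dest: str             # register written by the insn
    slot: Optional[str]   # stack slot loaded from, or None for any other insn


def count_redundant_slot_loads(insns):
    """Count slot loads whose value is already held in some register."""
    available = {}  # slot -> register currently holding that slot's value
    redundant = 0
    for insn in insns:
        if insn.slot is not None and insn.slot in available:
            redundant += 1
        # Writing insn.dest destroys whatever slot value it was holding.
        available = {s: r for s, r in available.items() if r != insn.dest}
        if insn.slot is not None:
            available[insn.slot] = insn.dest
    return redundant


# Shape of the LRA output: the load and the next insn both write %ebx.
same_reg = [
    Insn("bx", "sp+28"),  # like insn 581: load PIC value from the stack
    Insn("bx", None),     # like insn 90: overwrites %ebx
    Insn("bx", "sp+28"),  # illustrative later reload of the PIC value
    Insn("ax", None),
]

# Suggested shape: the PIC value goes to %ebp, insn 90 still writes %ebx.
split_reg = [
    Insn("bp", "sp+28"),
    Insn("bx", None),
    Insn("bp", "sp+28"),  # illustrative later reload, now removable
    Insn("ax", None),
]

print(count_redundant_slot_loads(same_reg))
print(count_redundant_slot_loads(split_reg))
```

In the first sequence the slot value is destroyed immediately, so no later load can be recognized as redundant; in the second it survives insn 90 and the later reload becomes a candidate for removal.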
