This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: mips16 LRA vs reload - Excess reload registers

From: Vladimir Makarov <vmakarov at redhat dot com>
To: Matthew Fortune <Matthew dot Fortune at imgtec dot com>
Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, "bernds at codesourcery dot com" <bernds at codesourcery dot com>
Date: Tue, 10 Sep 2013 17:11:54 -0400
Subject: Re: mips16 LRA vs reload - Excess reload registers
Authentication-results: sourceware.org; auth=none
References: <6D39441BF12EF246A7ABCE6654B023533DC32E at LEMAIL01 dot le dot imgtec dot org> <522CAAE0 dot 5010006 at redhat dot com> <6D39441BF12EF246A7ABCE6654B023533E40C5 at LEMAIL01 dot le dot imgtec dot org>

On 09/09/2013 03:49 PM, Matthew Fortune wrote:
>
>> -----Original Message-----
>> From: Vladimir Makarov [mailto:vmakarov@redhat.com]
>> Sent: 08 September 2013 17:51
>> To: Matthew Fortune
>> Cc: gcc@gcc.gnu.org; bernds@codesourcery.com
>> Subject: Re: mips16 LRA vs reload - Excess reload registers
>>
>> On 13-08-23 5:26 AM, Matthew Fortune wrote:
>>> Hi Vladimir,
>>>
>>> I've been working on code size improvements for mips16 and have been
>>> pleased to see some improvement when switching to use LRA instead of
>>> classic reload. At the same time though I have also seen some differences
>>> between reload and LRA in terms of how efficiently reload registers are
>>> reused.
>>>
>>> The trigger for LRA to underperform compared with classic reload is when
>>> IRA allocates inappropriate registers and thus puts a lot of stress on
>>> reloading. Mips16 showed this because it can only access a small subset of
>>> the MIPS registers for general instructions. The remaining MIPS registers are
>>> still available as they can be accessed by some special instructions and used
>>> via move instructions as temporaries. In the current mips16 backend,
>>> register move costings lead IRA to determine that although the preferred
>>> class for most pseudos is M16_REGS, the allocno class ends up as GR_REGS.
>>> IRA then resorts to allocating registers outside of M16_REGS more and more
>>> as register pressure increases, even though this is fairly stupid.
>>>
>>> When using classic reload, the inappropriate register allocations are
>>> effectively reverted, as the reload pseudos that get invented tend to all
>>> converge on the same hard register, completely removing the original
>>> pseudo. For LRA, the reloads tend to diverge and different hard registers are
>>> assigned to the reload pseudos, leaving us with two new pseudos and the
>>> original: two extra move instructions and two extra hard registers used.
>>> While I'm not saying it is LRA's fault for not fixing this situation perfectly,
>>> it does seem that classic reload is better at it.
>>>
>>> I have found a potential solution to the original IRA register allocation
>>> problem but I think there may still be something to address in LRA to
>>> improve this scenario anyway. My proposed solution to the IRA problem for
>>> mips16 is to adjust register move costings such that the total of moving
>>> between M16_REGS and GR_REGS and back is more expensive than memory,
>>> but moving from GR_REGS to GR_REGS is cheaper than memory (even
>>> though this is a bit weird as you have to go through an M16_REG to move
>>> from one GR_REG to another GR_REG).
>>>
>>> GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a
>>> candidate pressure class, but the additional cost for M16->GR->M16 means
>>> that IRA does not use GR_REGS as an alternative class and the allocno class is
>>> just M16_REGS as desired. This feels a bit like a hack but may be the best
>>> solution. The hard register costings used when allocating registers from an
>>> allocno class just don't seem to be strong enough to prevent poor register
>>> allocation in this case. I don't know whether the hard register costs are
>>> supposed to resolve this issue or are just for fine tuning.
>>>
>>> With the fix in place, LRA outperforms classic reload which is fantastic!
>>>
>>> I have a small(ish) test case for this and dumps for IRA, LRA and classic
>>> reload along with the patch to enable LRA for mips16. I can also provide the
>>> fix to register costing that effectively avoids/hides this problem for mips16.
>>> Should I post them here or put them in a bugzilla ticket?
>>>
>>> Any advice on which area needs fixing would be welcome and I am quite
>>> happy to work on this given some direction. I suspect these issues are
>>> relevant for any architecture that is not 100% orthogonal, which is pretty
>>> much all of them, and particularly important for compressed instruction sets.
>>
>> Sorry again that I did not find time to answer you earlier, Matt.
>>
>> Your hack could work.  And I guess it is always worth posting the patch
>> publicly with examples of the generated code before and after the patch.
>> Maybe some collective mind will help to figure out what more to do with
>> the patch.
> I'll post that shortly.
>  
>> But I guess there is still something to do. After constraining allocation
>> only to MIPS16 regs, we could still use non-MIPS16 GR_REGS for storing the
>> values of less frequently used pseudos (as storing them in non-MIPS16
>> GR_REGS is better than in memory).  E.g. x86-64 LRA can use SSE regs for
>> storing the values of less frequently used pseudos requiring GENERAL_REGS.
>> Please look at the spill_class target hook and its implementation for x86-64.
> I have indeed implemented that for mips16 and found that not only does it help to use the non-mips16 registers as spill_class registers, but including the mips16 call-clobbered registers is also worthwhile. It seems that the spill_class logic is able to find some instances where spilled pseudos could actually have been colored, and it effectively eliminates the reload.
Good.

> My original post was trying to point out an instance where LRA is not performing as well as reload. Although I can avoid this for mips16, it may well occur in other circumstances where it is less noticeable. Is this something worth pursuing?
>
Yes, it is worth pursuing.  Whatever reload does to improve IRA's code
can be done better by a global register allocator, as it sees the whole
picture, not just a local context.

Besides the problem of choosing the right hard register move cost
values, finding the register class for pseudos (in IRA or in the old RA)
has some pitfalls which can generally be fixed only by choosing insn
alternatives early, before RA.  For example, I know that a problem with
making better use of ARM NEON registers could be fixed this way.  But
early code selection is a somewhat different story.

Follow-Ups:
- RE: mips16 LRA vs reload - Excess reload registers
  - From: Matthew Fortune

References:
- Re: mips16 LRA vs reload - Excess reload registers
  - From: Vladimir Makarov
- RE: mips16 LRA vs reload - Excess reload registers
  - From: Matthew Fortune
