mips16 LRA vs reload - Excess reload registers
Sun Sep 8 16:50:00 GMT 2013
On 13-08-23 5:26 AM, Matthew Fortune wrote:
> Hi Vladimir,
> I've been working on code size improvements for mips16 and have been pleased to see some improvement when switching to use LRA instead of classic reload. At the same time though I have also seen some differences between reload and LRA in terms of how efficiently reload registers are reused.
> The trigger for LRA to underperform compared with classic reload is when IRA allocates inappropriate registers and thus puts a lot of stress on reloading. Mips16 showed this because it can only access a small subset of the MIPS registers for general instructions. The remaining MIPS registers are still available as they can be accessed by some special instructions and used via move instructions as temporaries. In the current mips16 backend, register move costings lead IRA to determine that although the preferred class for most pseudos is M16_REGS, the allocno class ends up as GR_REGS. IRA then resorts to allocating registers outside of M16_REGS more and more as register pressure increases, even though this is fairly stupid.
> When using classic reload the inappropriate register allocations are effectively reverted, as the reload pseudos that get invented tend to all converge on the same hard register, completely removing the original pseudo. For LRA the reloads tend to diverge and different hard registers are assigned to the reload pseudos, leaving us with two new pseudos and the original: two extra move instructions and two extra hard registers used. While I'm not saying it is LRA's fault for not fixing this situation perfectly, it does seem that classic reload is better at it.
> I have found a potential solution to the original IRA register allocation problem but I think there may still be something to address in LRA to improve this scenario anyway. My proposed solution to the IRA problem for mips16 is to adjust register move costings such that the total of moving between M16_REGS and GR_REGS and back is more expensive than memory, but moving from GR_REGS to GR_REGS is cheaper than memory (even though this is a bit weird as you have to go through an M16_REG to move from one GR_REG to another GR_REG).
> GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a candidate pressure class, but the additional cost for M16->GR->M16 means that IRA does not use GR_REGS as an alternative class and the allocno class is just M16_REGS as desired. This feels a bit like a hack but may be the best solution. The hard register costings used when allocating registers from an allocno class just don't seem to be strong enough to prevent poor register allocation in this case. I don't know if the hard register costs are supposed to resolve this issue or if they are just about fine tuning.
> With the fix in place, LRA outperforms classic reload which is fantastic!
> I have a small(ish) test case for this and dumps for IRA, LRA and classic reload along with the patch to enable LRA for mips16. I can also provide the fix to register costing that effectively avoids/hides this problem for mips16. Should I post them here or put them in a bugzilla ticket?
> Any advice on which area needs fixing would be welcome and I am quite happy to work on this given some direction. I suspect these issues are relevant for any architecture that is not 100% orthogonal, which is pretty much all of them, and particularly important for compressed instruction sets.
Sorry again that I did not find time to answer you earlier, Matt.
Your hack could work. And I guess it is always worth posting the patch
publicly with examples of the generated code before and after the
patch. Maybe the collective mind will help to figure out what more to do
with the patch.
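
To make the costing idea from the quoted message concrete, here is a
minimal sketch of the two inequalities it describes. It is a toy model,
not the mips backend code: the enum, the function names and every cost
value are hypothetical and chosen only so that the M16->GR->M16 round
trip costs more than memory while a GR->GR move costs less. In GCC
itself such numbers would come from the target's register and memory
move cost hooks.

```c
#include <assert.h>

/* Toy stand-ins for the two register classes discussed in the thread.
   These are NOT GCC's real reg_class values. */
enum toy_class { TOY_M16_REGS, TOY_GR_REGS };

/* Hypothetical memory cost; the value is an assumption for the sketch. */
enum { TOY_MEMORY_COST = 4 };

/* Hypothetical register move costs satisfying the two inequalities:
     cost(M16->GR) + cost(GR->M16) > memory cost
     cost(GR->GR)                  < memory cost  */
int toy_move_cost(enum toy_class from, enum toy_class to)
{
    if (from == TOY_M16_REGS && to == TOY_M16_REGS)
        return 2;   /* ordinary MIPS16 move */
    if (from == TOY_GR_REGS && to == TOY_GR_REGS)
        return 3;   /* kept below the memory cost */
    return 3;       /* one direction of an M16<->GR move */
}

/* Cost of moving a value out of M16_REGS into GR_REGS and back again. */
int toy_round_trip_cost(void)
{
    return toy_move_cost(TOY_M16_REGS, TOY_GR_REGS)
         + toy_move_cost(TOY_GR_REGS, TOY_M16_REGS);
}

/* Abort if the chosen numbers stop satisfying the two inequalities. */
void toy_check_costs(void)
{
    assert(toy_round_trip_cost() > TOY_MEMORY_COST);
    assert(toy_move_cost(TOY_GR_REGS, TOY_GR_REGS) < TOY_MEMORY_COST);
}
```

Calling toy_check_costs() is a quick way to see whether a candidate set
of numbers still has the shape described above.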
But I guess there is still a thing to do. After constraining allocation
only to MIPS16 regs we still could use non-MIPS16 GR_REGS for storing
values of less frequently used pseudos (as storing them in non-MIPS16
GR_REGS is better than in memory). E.g. x86-64 LRA can use SSE regs for
storing values of less frequently used pseudos requiring GENERAL_REGS.
Please look at the spill_class target hook and its implementation for x86-64.
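
As a rough illustration of the idea only, a spill-class decision for
this case might have the following shape. This is a standalone toy, not
GCC code: the enum, the function name, the separate "non-MIPS16 general
registers" class and the 4-byte word size are all assumptions made for
the sketch. The real hook takes GCC's own register class and machine
mode types, and the x86-64 implementation should be consulted for the
actual conditions it applies.

```c
#include <assert.h>

/* Toy register classes; SK_NON_M16_GR_REGS is a hypothetical class
   covering the general registers that MIPS16 cannot use directly. */
enum sketch_class { SK_NO_REGS, SK_M16_REGS, SK_NON_M16_GR_REGS };

/* Assumed word size in bytes for the sketch. */
enum { SK_WORD_BYTES = 4 };

/* Return the class whose registers may hold a spilled value that needs
   RCLASS, or SK_NO_REGS to fall back to a stack slot.  Only word-sized
   (or smaller) values that need MIPS16 registers are redirected. */
enum sketch_class sketch_spill_class(enum sketch_class rclass, int size_bytes)
{
    if (rclass == SK_M16_REGS && size_bytes > 0 && size_bytes <= SK_WORD_BYTES)
        return SK_NON_M16_GR_REGS;
    return SK_NO_REGS;
}
```

With such a function, a 4-byte pseudo needing SK_M16_REGS would be
parked in a non-MIPS16 general register, while anything wider would
still go to memory.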