RFA: enable LRA for rs6000 [patch for WRF]
Vladimir Makarov
vmakarov@redhat.com
Wed Apr 17 16:13:00 GMT 2013
On 13-04-16 6:56 PM, Michael Meissner wrote:
> I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
> allocator.
>
> At one point LRA has decided to use the CTR to hold a CCmode value:
>
> (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
>         (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
>      (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
>         (nil)))
>
> Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
> but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
> IMHO, LRA should not be generating NOP moves that are later deleted.
>
> There are two ways to solve the problem. One is not to let anything but int
> modes into CTR/LR, which will also prevent the register allocator from
> spilling floating point values there (which we've seen in the past, but the
> last time I tried to eliminate it I couldn't). The following patch does this,
> and also changes the assertion to call fatal_insn_not_found to make it clearer
> what the error is.
>
> I imagine I could add a NOP move insn to movcc_internal1, but that just
> strikes me as wrong.
>
> Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
> can't bootstrap the compiler using --with-cpu=power7, which I will get to
> tomorrow.
>
> 2013-04-16 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
> control whether we only allow int modes to go in the CTR, LR,
> VRSAVE, VSCR registers.
> * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
> (rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
> constrained.
> (rs6000_option_override_internal): Set -mconstrain-regs if we are
> using the LRA register allocator.
>
> * lra.c (check_rtl): Use fatal_insn_not_found to report constraint
> does not match.
>
Mike, thanks for the patch and for all the SPEC2006 data (which is very
useful, as I have no access to a power machine that can be used for
benchmarking). I suspect some benchmark scores are lower because LRA
lacks some micro-optimizations that reload implements through many
power hooks (e.g. LRA does not use push reload). That is not always
a bad thing, though (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX,
which lets the stack slots be reused for other useful things).

In general, I get the impression that power7 is the most difficult port
for LRA. If we manage to port it, LRA ports for other targets will be
easier.

I have also reproduced the bootstrap failure with --with-cpu=power7. I am
going to work on that first, and after that on the SPEC2006 problems you
wrote about.