RFA: enable LRA for rs6000 [patch for WRF]

Wed Apr 17 16:13:00 GMT 2013

On 13-04-16 6:56 PM, Michael Meissner wrote:
> I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
> allocator.
>
> At one point LRA has decided to use the CTR to hold a CCmode value:
>
> (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
>          (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
>       (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
>          (nil)))
>
> Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
> but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
> IMHO, LRA should not be generating NOP moves that are later deleted.
>
> There are two ways to solve the problem.  One is not to let anything but int
> modes into CTR/LR, which will also eliminate the register allocator from
> spilling floating point values there (which we've seen in the past, but the
> last time I tried to eliminate it I couldn't).  The following patch does this,
> and also changes the assertion to call fatal_insn_not_found to make it clearer
> what the error is.
>
> I imagine, I could add a NOP move insn to movcc_internal1, but that just
> strikes me as wrong.
>
> Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
> can't bootstrap the compiler using --with-cpu=power7, which I will get to
> tomorrow.
>
> 2013-04-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
> 	* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
> 	control whether we only allow int modes to go in the CTR, LR,
> 	VRSAVE, VSCR registers.
> 	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
> 	(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
> 	constrained.
> 	(rs6000_option_override_internal): Set -mconstrain-regs if we are
> 	using the LRA register allocator.
>
> 	* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
> 	does not match.
>
Mike, thanks for the patch and all the SPEC2006 data (which is very
useful, as I have no access to a Power machine that can be used for
benchmarking).  I guess some benchmark scores may be lower because LRA
lacks some micro-optimizations that reload implements through many
Power hooks (e.g. LRA does not use push reload).  Although sometimes
that is not a bad thing (e.g. LRA does not use
SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for
other useful things).

In general I got the impression that power7 is the most difficult port
for LRA.  If we manage to port it, LRA ports for other targets will be
easier.

I also reproduced the bootstrap failure with --with-cpu=power7, and I am
going to work on this and, after that, on the SPEC2006 issues you wrote
about.
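
For readers following along, the restriction described in the quoted patch
(only integer modes allowed in CTR/LR when the switch is on) can be sketched
roughly as below.  This is only an illustration, not the actual rs6000.c
code: the enum, the helper names and the LR register number are assumptions
made for the example (only CTR = 66 appears in the insn dump above), and
VRSAVE/VSCR are left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's mode classes; not the real
   definitions from machmode.h.  */
enum sketch_mode_class
{
  SKETCH_MODE_INT,
  SKETCH_MODE_FLOAT,
  SKETCH_MODE_CC
};

/* CTR is hard register 66 in the insn dump quoted above.  The LR
   number is an assumption made for this sketch.  */
enum
{
  SKETCH_LR_REGNO = 65,
  SKETCH_CTR_REGNO = 66
};

/* Return true if REGNO is one of the special registers the sketch
   constrains.  */
static bool
sketch_is_constrained_spr (int regno)
{
  return regno == SKETCH_LR_REGNO || regno == SKETCH_CTR_REGNO;
}

/* Return true if a value of class MCLASS may be held in hard register
   REGNO.  CONSTRAIN_REGS models the proposed -mconstrain-regs switch:
   when it is set, CTR/LR accept integer modes only.  All other
   registers are accepted unconditionally, which the real hook does
   not do.  */
static bool
sketch_hard_regno_mode_ok (int regno, enum sketch_mode_class mclass,
                           bool constrain_regs)
{
  if (constrain_regs && sketch_is_constrained_spr (regno))
    return mclass == SKETCH_MODE_INT;

  return true;
}
```

With the switch on, a CCmode pseudo such as 4411 in the dump above could not
be assigned to CTR at all, so the ctr->ctr move would never be created.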