[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

Fri Jan 28 16:02:13 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

--- Comment #27 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> So in .reload we have (with unpatched trunk)
> 
>   401: NOTE_INSN_BASIC_BLOCK 6
>   462: ax:DF=[`*.LC0']
>       REG_EQUAL 9.850689999999999724167309977929107844829559326171875e-1
>   407: xmm2:DF=ax:DF
>   463: ax:DF=[`*.LC0']
>       REG_EQUAL 9.850689999999999724167309977929107844829559326171875e-1
>   408: xmm4:DF=ax:DF
> 
> why??!  We can load .LC0 into xmm4 directly.  IRA sees
> 
>   401: NOTE_INSN_BASIC_BLOCK 6
>   407: r118:DF=r482:DF
>   408: r119:DF=r482:DF
> 
> now I cannot really decipher IRA or LRA dumps but my guess would be that
> inheritance (causing us to load from LC0) interferes badly with register
> class assignment?
> 
> Changing pseudo 482 in operand 1 of insn 407 on equiv
> 9.850689999999999724167309977929107844829559326171875e-1
> ...
>           alt=21,overall=9,losers=1,rld_nregs=1
>          Choosing alt 21 in insn 407:  (0) v  (1) r {*movdf_internal}
>       Creating newreg=525, assigning class GENERAL_REGS to r525
>   407: r118:DF=r525:DF
>     Inserting insn reload before:
>   462: r525:DF=[`*.LC0']
>       REG_EQUAL 9.850689999999999724167309977929107844829559326171875e-1
> 
> we should have preferred alt 14 I think (0) v (1) m, but that has
> 
>           alt=14,overall=13,losers=1,rld_nregs=0
>             0 Spill pseudo into memory: reject+=3
>             Using memory insn operand 0: reject+=3
>             0 Non input pseudo reload: reject++
>             1 Non-pseudo reload: reject+=2
>             1 Non input pseudo reload: reject++
>             alt=15,overall=28,losers=3 -- refuse
>             0 Costly set: reject++
>             alt=16: Bad operand -- refuse
>             0 Costly set: reject++
>             1 Costly loser: reject++
>             1 Non-pseudo reload: reject+=2
>             1 Non input pseudo reload: reject++
>             alt=17,overall=17,losers=2 -- refuse
>             0 Costly set: reject++
>             1 Spill Non-pseudo into memory: reject+=3
>             Using memory insn operand 1: reject+=3
>             1 Non input pseudo reload: reject++
>             alt=18,overall=14,losers=1 -- refuse
>             0 Spill pseudo into memory: reject+=3
>             Using memory insn operand 0: reject+=3
>             0 Non input pseudo reload: reject++
>             1 Costly loser: reject++
>             1 Non-pseudo reload: reject+=2
>             1 Non input pseudo reload: reject++
>             alt=19,overall=29,losers=3 -- refuse
>             0 Non-prefered reload: reject+=600
>             0 Non input pseudo reload: reject++
>             alt=20,overall=607,losers=1 -- refuse
>             1 Non-pseudo reload: reject+=2
>             1 Non input pseudo reload: reject++
> 
> I'm not sure I can decipher the reasoning but I don't understand how it
> doesn't seem to anticipate the cost of reloading the GPR in the alternative
> it chooses?
> 
> Vlad?

All these diagnostics are just a description of the voodoo inherited from the
old reload pass.  LRA chooses alternatives the same way as the old reload pass
did (I doubt that any other approach could avoid breaking existing targets).
The old reload pass simply does not report its decisions in the dump.

The LRA code that chooses the insn alternatives
(lra-constraints.cc::process_alt_operands) does not use any memory or register
move costs, just like the old reload pass.  Instead, the alternative is chosen
by heuristics and insn constraint hints (like ? and !).  The only case where
these costs are used is when we have reg:=reg and the register move cost for
it is 2.  In this case LRA (like reload) does not bother to check the insn
constraints.
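
To make the heuristic concrete, here is a rough sketch (not the actual LRA
code) of how the "overall" values in the quoted dump appear to be formed: the
number of losers times a fixed factor, plus the accumulated reject points.  The
factor of 6 is an assumption; it is the value that reproduces every row of the
dump above.  Attributing each group of "reject" lines to the "alt=" summary
line that follows it is also my reading of the dump.  The names below are
hypothetical and exist only for illustration.

```python
# Assumed weight of one "loser" operand; 6 is consistent with the quoted dump.
LOSER_COST_FACTOR = 6


def overall(losers, rejects):
    """Heuristic score of an alternative: no memory or register move costs."""
    return losers * LOSER_COST_FACTOR + sum(rejects)


# alt -> (losers, reject increments listed before the alt line, printed overall)
dump = {
    15: (3, [3, 3, 1, 2, 1], 28),
    17: (2, [1, 1, 2, 1], 17),
    18: (1, [1, 3, 3, 1], 14),
    19: (3, [3, 3, 1, 1, 2, 1], 29),
    20: (1, [600, 1], 607),
    21: (1, [2, 1], 9),
}

# Rows whose printed "overall" disagrees with the assumed formula.
mismatches = [
    alt
    for alt, (losers, rejects, printed) in dump.items()
    if overall(losers, rejects) != printed
]
print("rows not matching the formula:", mismatches)

# The two alternatives discussed in comment #17, with their printed scores.
candidates = {14: 13, 21: 9}
chosen = min(candidates, key=candidates.get)
print("alternative with the smallest overall:", chosen)
```

With these numbers every row matches, and alt 21 (v <- r) wins over alt 14
(v <- m) by 9 to 13, although the chosen alternative needs an extra GPR load
whose cost never enters the score.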