[Bug target/55212] [SH] Switch to LRA

kkojima at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sun Nov 16 13:16:00 GMT 2014


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

--- Comment #82 from Kazumoto Kojima <kkojima at gcc dot gnu.org> ---
(In reply to Kazumoto Kojima from comment #77)
> Created attachment 33788 [details]
> another reduced test case of compiler/vam

It seems that unsigned char memory accesses trigger this bad code
with LRA.
We have no (set (reg:SI) (zero_extend:SI (mem))) instruction for
QImode/HImode.
When displacement addressing is used, (set rA (mem (plus rX disp)))
and a register-to-register zero extension are generated for it.
The combine pass may merge that extension with another move insn
and delete the original extension:

      (set rA (mem (plus rX disp)))          (set rA (mem (plus rX disp)))
      (set rB (zero_extend:SI rA))           <deleted>
      ...                              =>    ...
      (set rC rB)                            (set rC (zero_extend:SI rA))

Unfortunately, RA will assign r0 to rA for QImode/HImode and create a
long live range for r0.  This can cause anomalous register spills in
some cases with LRA.  I guess the old reload pass has something to
handle such exceptional cases.  I think the long-lived r0 is
problematic in the first place, though.
The above combine doesn't happen if rA is a hard register whose
register class is likely spilled, which avoids the problem.  We can
split (set rA (mem (plus rX disp))) into two move insns through r0,
which is a likely-spilled register.  That makes some code worse, but
it wins on average.
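In RTL terms, the split would emit something like the following
(a sketch only, using the same notation as above):

      (set rA (mem (plus rX disp)))
  =>
      (set r0 (mem (plus rX disp)))   ;; r0 is in a likely-spilled class
      (set rA r0)                     ;; plain reg-reg move

Because r0 is likely spilled, combine leaves the subsequent
zero_extend alone instead of sinking it into a later move.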
CSiBE shows a 0.036% total code-size improvement when the above split
is done in prepare_move_operands for loads only, and 0.049% when it is
done for both loads and stores.  I'll attach the patch for it.  With
it, the generated code for the 2nd call of the test case looks like

    mov.b    @(7,r8),r0
    mov    r0,r7
    mov.b    @(6,r8),r0
    extu.b    r7,r7
    mov    r0,r6
    mov.b    @(5,r8),r0
    extu.b    r6,r6
    mov    r0,r5
    mov.b    @(4,r8),r0
    extu.b    r5,r5
    jsr    @r9
    extu.b    r0,r4

The spills to memory went away, though the result is still worse than
in the non-LRA case.
Before that patch, I had tried several splitters for zero_extends
whose operands are a register and a memory with displacement
addressing.  They produced similar code for the test case but couldn't
win on CSiBE on average.  Even with some peepholes to fix regressions,
the total code size of CSiBE increases by 0.05-0.1%.


