[Bug target/39423] [4.6/4.7/4.8 Regression] [SH] performance regression: lost mov @(disp,Rn)

Thu Jul 12 15:37:00 GMT 2012

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423

--- Comment #20 from chrbr at gcc dot gnu.org 2012-07-12 15:36:59 UTC ---
> I'm having a look at your implementation to see how they compare and
> > possibly combined together. Both approaches look interesting.
> 
> I guess folding the mul-add sequences like you did should be more useful than
> just
> handling one mem:SI pattern.  In any case, if you find my impl useful please
> let me know,
> because then I'd also pop in patterns for mem:QI and mem:HI patterns.

Finally, I think both combine pattern are not exclusive and rather independent,
even if interacting each other due to combine orders.

I like your mem,plus,plus combine that look better than my folding. just there
are 2 little details that I tried while playing with it: there was a
lost of generality with the hardcoding of the shift constant, and I'm not sure
if there was a risk of clobbering a live operands[1]. For those reasons I
modified it a bit as followed. 

(define_insn_and_split "*movsi_disp"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (mem:SI (plus:SI
          (plus:SI (mult:SI (match_operand:SI 1 "arith_reg_operand" "r")
                    (match_operand:SI 2 "const_int_operand" "i"))
               (match_operand:SI 3 "arith_reg_operand" "K06"))
        (match_operand:SI 4 "const_int_operand" "i"))))]
  "TARGET_SH1 && satisfies_constraint_K06 (operands[4]) && exact_log2 (INTVAL
(operands[2])) > 0"
{
  gcc_unreachable ();
  return "#";
}
  "&& 1"
  [(set (match_dup 1) (ashift:SI (match_dup 1) (match_dup 2)))
   (set (match_dup 1) (plus:SI (match_dup 1) (match_dup 3)))
   (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 4))))]
"{
    int n = exact_log2 (INTVAL (operands[2]));
    rtx res = gen_reg_rtx (SImode);
    emit_move_insn (res, operands[1]);

    operands[1] = res;
    operands[2] = GEN_INT (n);
}"
)

(define_insn "ashlsi3_k"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r,r")
    (ashift:SI (match_operand:SI 1 "arith_reg_operand" "0,0,0")
           (match_operand:SI 2 "const_int_operand" "M,P27,r")))]
  "TARGET_SH1"
  "@
    add    %0,%0
    shll%O2    %0
    shld    %2,%0"
  [(set_attr "type" "arith")])

Using those changes, a snipet like 

int
foo4 (long long tab[], int index)
{
  return (int)tab [index+4];
}

not compiles as:

       mov     #3,r1
        shld    r1,r5
        add     r4,r5
        rts
        mov.l   @(32,r5),r0

2)plus_mulsi

I see some interactions with the movsi_disp pattern, due to different ordering
on the matching in the combiner. I'll need to play more this them activated
together.

So I think you can go ahead with your combiner movsi_disp pattern and propose
to Kaz when ready to close this defect. Feel free to take or not my
suggestions.

I'll then go with my plus_mulsi combiner in a second time, making clear that
it's provide additional optimization opportunities without mixing the impacts.

On a related thread, for further work, I'm thinking on adding support for the
MAC instruction, now that was have the multiply and add. But this requires
exposing the MACLH registers to reload. Had anyone had a thought on this ? I'd
like to give this a try pretty soon.

Cheers