This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend


First of all, thanks a lot for doing this.

Robert Suchanek <Robert.Suchanek@imgtec.com> writes:
> diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
> index 143169b..f27a801 100644
> --- gcc/config/mips/mips.c
> +++ gcc/config/mips/mips.c
> @@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
>       All in all, it seems more consistent to only enforce this restriction
>       during and after reload.  */
>    if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
> -    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
> +    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>  
>    return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
>  }

Not sure about this one.  We would need to update the comment that
explains why "!strict_p" is there, but AFAIK reason (1) would still apply.

Was this needed for correctness or because it gave better code?

> +/* Return a register priority for hard reg REGNO.  */
> +
> +static int
> +mips_register_priority (int hard_regno)
> +{
> +  if (TARGET_MIPS16)
> +   {
> +     /* Treat MIPS16 registers with higher priority than other regs.  */
> +     switch (hard_regno)
> +       {
> +       case 2:
> +       case 3:
> +       case 4:
> +       case 5:
> +       case 6:
> +       case 7:
> +       case 16:
> +       case 17:
> +         return 1;
> +       default:
> +         return 0;
> +       }
> +   }
> +  return 0;
> +}
> +

Easier as:

  if (TARGET_MIPS16
      && TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
    return 1;
  return 0;

> diff --git gcc/config/mips/mips.h gcc/config/mips/mips.h
> index a786d4c..712008f 100644
> --- gcc/config/mips/mips.h
> +++ gcc/config/mips/mips.h
> @@ -1871,10 +1871,12 @@ enum reg_class
>  {
>    NO_REGS,			/* no registers in set */
>    M16_REGS,			/* mips16 directly accessible registers */
> +  M16F_REGS,			/* mips16 + frame */

Constraints are supposed to operate on real registers, after
elimination, so it seems odd to include a fake register.  What went
wrong with just M16_REGS?

> +  SPILL_REGS,			/* All but $sp and call preserved regs are in here */
...
> +  { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* SPILL_REGS */      	\

These two don't seem to match.  I think literally it would be 0x0300fffc,
but maybe you had to make SPILL_REGS a superset of M16_REGS?

> +/* Add costs to hard registers based on frequency. This helps to negate
> +   some of the reduced cost associated with argument registers which 
> +   unfairly promotes their use and increases register pressure */
> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
> +  (TARGET_MIPS16 && optimize_size                                       \
> +   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)

So we would be trying to use, say, $4 for the first incoming argument
even after it had been spilled?  Hmm.

Since this change is aimed specifically at one heuristic, I wonder
whether we should parameterise that heuristic somehow rather than
try to use a general hook to undo it.  But I don't think there's
anything particularly special about MIPS16 here, so maybe it's too
eager for all targets.

> diff --git gcc/config/mips/mips.md gcc/config/mips/mips.md
> index 1e3e9e6..ababd5e 100644
> --- gcc/config/mips/mips.md
> +++ gcc/config/mips/mips.md
> @@ -1622,40 +1622,66 @@
>  ;; copy instructions.  Reload therefore thinks that the second alternative
>  ;; is two reloads more costly than the first.  We add "*?*?" to the first
>  ;; alternative as a counterweight.
> +;;
> +;; LRA simulates reload but the cost of reloading scratches is lower
> +;; than of the classic reload. For the time being, removing the counterweight
> +;; for LRA is more profitable.
>  (define_insn "*mul_acc_si"
> -  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
> -	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
> -			  (match_operand:SI 2 "register_operand" "d,d"))
> -		 (match_operand:SI 3 "register_operand" "0,d")))
> -   (clobber (match_scratch:SI 4 "=X,l"))
> -   (clobber (match_scratch:SI 5 "=X,&d"))]
> +  [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
> +	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
> +			  (match_operand:SI 2 "register_operand" "d,d,d"))
> +		 (match_operand:SI 3 "register_operand" "0,0,d")))
> +   (clobber (match_scratch:SI 4 "=X,X,l"))
> +   (clobber (match_scratch:SI 5 "=X,X,&d"))]
>    "GENERATE_MADD_MSUB && !TARGET_MIPS16"
>    "@
>      madd\t%1,%2
> +    madd\t%1,%2
>      #"
>    [(set_attr "type"	"imadd")
>     (set_attr "accum_in"	"3")
>     (set_attr "mode"	"SI")
> -   (set_attr "insn_count" "1,2")])
> +   (set_attr "insn_count" "1,1,2")
> +   (set (attr "enabled")
> +        (cond [(and (eq_attr "alternative" "0")
> +                    (match_test "TARGET_RELOAD"))
> +                  (const_string "yes")
> +               (and (eq_attr "alternative" "1")
> +                    (match_test "!TARGET_RELOAD"))
> +                  (const_string "yes")
> +               (eq_attr "alternative" "2")
> +                  (const_string "yes")]
> +              (const_string "no")))])

Looks good.  Glad to see the operand 0 hack going away. :-)

> @@ -2986,31 +3023,14 @@
>     (set_attr "mode" "<MODE>")])
>  
>  (define_insn "*and<mode>3_mips16"
> -  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d")
> -	(and:GPR (match_operand:GPR 1 "nonimmediate_operand" "%W,W,W,d,0")
> -		 (match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d")))]
> +  [(set (match_operand:GPR 0 "register_operand" "=d,d")
> +	(and:GPR (match_operand:GPR 1 "register_operand" "%d,0")
> +		 (match_operand:GPR 2 "and_operand" "Yw,d")))]
>    "TARGET_MIPS16 && and_operands_ok (<MODE>mode, operands[1], operands[2])"
> -{
> -  switch (which_alternative)
> -    {
> -    case 0:
> -      operands[1] = gen_lowpart (QImode, operands[1]);
> -      return "lbu\t%0,%1";
> -    case 1:
> -      operands[1] = gen_lowpart (HImode, operands[1]);
> -      return "lhu\t%0,%1";
> -    case 2:
> -      operands[1] = gen_lowpart (SImode, operands[1]);
> -      return "lwu\t%0,%1";
> -    case 3:
> -      return "#";
> -    case 4:
> -      return "and\t%0,%2";
> -    default:
> -      gcc_unreachable ();
> -    }
> -}
> -  [(set_attr "move_type" "load,load,load,shift_shift,logical")
> +  "@
> +   #
> +   and\t%0,%2"
> +  [(set_attr "move_type" "shift_shift,logical")
>     (set_attr "mode" "<MODE>")])

I think we want to keep the LWU case at the very least, since I assume
otherwise we'll load the full 64-bit spill slot and use a pair of shifts
to zero-extend it.  Reloading the stack address into a base register
should always be better than that.

I agree it's less clear for the byte and halfword cases.  All other
things -- including size -- being equal, reloading a stack address into
a base register and using an extending load should be better than
reloading the full spill slot and extending it, since the reloaded stack
address is more likely to be reused in other instructions.

The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
I think using LBU and LHU there was probably a win.  I can imagine
it making sense to disable LBU and LHU for MIPS16e though.

It might be better to do any changes to this pattern as a follow-on
patch, since I think LRA should cope with it as-is.

> @@ -4139,7 +4159,7 @@
>    [(set (match_operand:DI 0 "register_operand" "=d")
>  	(match_operand:DI 1 "absolute_symbolic_operand" ""))
>     (clobber (match_scratch:DI 2 "=&d"))]
> -  "TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
> +  "!TARGET_MIPS16 && TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (high:DI (match_dup 3)))

Minor nit, but please keep within 80 chars:

  "!TARGET_MIPS16
   && TARGET_EXPLICIT_RELOCS
   && ABI_HAS_64BIT_SYMBOLS
   && cse_not_expected"

> diff --git gcc/lra-constraints.c gcc/lra-constraints.c
> index ba4d489..ab85495 100644
> --- gcc/lra-constraints.c
> +++ gcc/lra-constraints.c
> @@ -2615,6 +2615,39 @@ valid_address_p (struct address_info *ad)
>    return ok_p;
>  }
>  
> +/* Make reload base reg from address AD.  */
> +static rtx
> +base_to_reg (struct address_info *ad)
> +{
> +  enum reg_class cl;
> +  int code = -1;
> +  rtx new_inner = NULL_RTX;
> +  rtx new_reg = NULL_RTX;
> +  rtx insn;
> +  rtx last_insn = get_last_insn();
> +
> +  lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
> +  cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
> +                       get_index_code (ad));
> +  new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
> +                                cl, "base");
> +  new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
> +                                   ad->disp_term == NULL
> +                                   ? gen_int_mode (0, ad->mode) 
> +                                   : *ad->disp_term);
> +  if (!valid_address_p (ad->mode, new_inner, ad->as))
> +    return NULL_RTX;
> +  insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
> +  code = recog_memoized (insn);
> +  if (code < 0)
> +    {
> +      delete_insns_since (last_insn);
> +      return NULL_RTX;
> +    }
> +  
> +  return new_inner;
> +}
> +
>  /* Make reload base reg + disp from address AD.  Return the new pseudo.  */
>  static rtx
>  base_plus_disp_to_reg (struct address_info *ad)
> @@ -2818,6 +2851,8 @@ process_address (int nop, rtx *before, rtx *after)
>  
>       3) the address is a frame address with an invalid offset.
>  
> +     4) the address is a frame address with an invalid base.
> +
>       All these cases involve a non-autoinc address, so there is no
>       point revalidating other types.  */
>    if (ad.autoinc_p || valid_address_p (&ad))
> @@ -2899,14 +2934,19 @@ process_address (int nop, rtx *before, rtx *after)
>        int regno;
>        enum reg_class cl;
>        rtx set, insns, last_insn;
> +      /* Try to reload base into register only if the base is invalid 
> +         for the address but with valid offset, case (4) above.  */
> +      start_sequence ();
> +      new_reg = base_to_reg (&ad);
> +
>        /* base + disp => new base, cases (1) and (3) above.  */
>        /* Another option would be to reload the displacement into an
>  	 index register.  However, postreload has code to optimize
>  	 address reloads that have the same base and different
>  	 displacements, so reloading into an index register would
>  	 not necessarily be a win.  */
> -      start_sequence ();
> -      new_reg = base_plus_disp_to_reg (&ad);
> +      if (new_reg == NULL_RTX)
> +        new_reg = base_plus_disp_to_reg (&ad);
>        insns = get_insns ();
>        last_insn = get_last_insn ();
>        /* If we generated at least two insns, try last insn source as

Obviously this is Vlad's call, but the idea looks good to me.
Speculatively calling lra_create_new_reg for cases where (4) doesn't
end up applying is probably too wasteful though.

> diff --git gcc/rtlanal.c gcc/rtlanal.c
> index 7a9efec..f699d17 100644
> --- gcc/rtlanal.c
> +++ gcc/rtlanal.c
> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
>      inner = strip_address_mutations (&XEXP (*inner, 0));
>    if (REG_P (*inner)
>        || MEM_P (*inner)
> -      || GET_CODE (*inner) == SUBREG)
> +      || GET_CODE (*inner) == SUBREG
> +      || CONSTANT_P (*inner))
>      return inner;
>    return 0;
>  }

This doesn't look right; the general form is base+index+displacement+segment,
with the constant term being treated as the displacement.

Thanks,
Richard
