This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][MIPS] Enable load-load/store-store bonding


Hi Sameera,

Thanks for the patch.

Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index b5b5ba7..9804ef2 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -18813,6 +18813,9 @@ mips_option_override (void)
>    if (TARGET_MICROMIPS && TARGET_MIPS16)
>      error ("unsupported combination: %s", "-mips16 -mmicromips");
>  
> +  if (TARGET_FIX_24K && TUNE_P5600)
> +    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
> +
>    /* Save the base compression state and process flags as though we
>       were generating uncompressed code.  */
>    mips_base_compression_flags = TARGET_COMPRESSION;

Although it's a bit of an odd combination, we need to accept
-mfix-24k -mtune=p5600 and continue to implement the 24k workarounds.
The idea is that a distributor can build for a common base architecture,
add -mfix- options for processors that might run the code, and add -mtune=
for the processor that's most of interest optimisation-wise.

We should just make the pairing of stores conditional on !TARGET_FIX_24K.

> diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
> index b9cfd62..d4135cf 100644
> --- a/gcc/config/mips/mips.opt
> +++ b/gcc/config/mips/mips.opt
> @@ -445,3 +445,7 @@ Enum(mips_lib_setting) String(tiny) Value(MIPS_LIB_TINY)
>  
>  msched-weight
>  Target Report Var(TARGET_SCHED_WEIGHT) Undocumented
> +
> +mld-st-pairing
> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING)
> +Enable load/store pairing

Other options are just "TARGET_" + the captialised form of the option name,
so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be misleading
since it's an abbreviation for "load" rather than the LD instruction.
Maybe -mload-store-pairs, since plurals are more common than "-ing"?
Not sure that's a great suggestion though.

If we want a user-level option then it needs to be documented in
invoke.texi.

> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 86ca419..4478f81e 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -3184,3 +3184,6 @@ extern GTY(()) struct target_globals *mips16_globals;
>     with arguments ARGS.  */
>  #define PMODE_INSN(NAME, ARGS) \
>    (Pmode == SImode ? NAME ## _si ARGS : NAME ## _di ARGS)
> +
> +#define ENABLE_LD_ST_PAIRING \
> +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)

The patch requires -mld-st-pairing to be passed explicitly even for
-mtune=p5600.  Is that because it's not a consistent enough win for us
to enable it by default?  It sounded from the description like it should
be an improvement more often that not.

We should allow pairing even without -mtune=p5600.

> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index 7229e8f..05605c5 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -780,6 +780,7 @@
>  
>  (define_mode_iterator MOVEP1 [SI SF])
>  (define_mode_iterator MOVEP2 [SI SF])
> +(define_mode_iterator JOINLDST1 [SI SF DF])

Maybe:

(define_mode_iterator JOIN_MODE [
  SI
  (DI "TARGET_64BIT")
  (SF "TARGET_HARD_FLOAT")
  (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])

and then extend:

> @@ -883,6 +884,8 @@
>  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>  (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>  
> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
> +
>  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>  ;; are different.  Some forms of unextended addiu have an 8-bit immediate
>  ;; field but the equivalent daddiu has only a 5-bit field.

this accordingly.

> @@ -7442,6 +7445,153 @@
>    { return MIPS_CALL ("jal", operands, 0, -1); }
>    [(set_attr "type" "call")
>     (set_attr "insn_count" "3")])
> +
> +(define_insn "*join2_load_store<JOINLDST1:mode>"
> +  [(parallel
> +    [(set (match_operand:JOINLDST1 0 "nonimmediate_operand" "=<reg>,m")
> +	  (match_operand:JOINLDST1 1 "nonimmediate_operand" "m,<reg>"))
> +     (set (match_operand:JOINLDST1 2 "nonimmediate_operand" "=<reg>,m")
> +	  (match_operand:JOINLDST1 3 "nonimmediate_operand" "m,<reg>"))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
> +    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
> +    return "";
> +  }
> +  [(set_attr "move_type" "<insn_type>load,<insn_type>store")
> +   (set_attr_alternative "insn_count"
> +	[(mult (symbol_ref "mips_load_store_insns (operands[1], insn)")
> +	       (const_int 2))
> +	 (mult (symbol_ref "mips_load_store_insns (operands[0], insn)")
> +	       (const_int 2))])])

Outer (parallel ...)s are redundant in a define_insn.

It would be better to add the mips_load_store_insns for each operand
rather than multiplying one of them by 2.  Or see the next bit
for an alternative.

> +;;2 SI/SF/DF loads are joined.
> +(define_peephole2
> +  [(set (match_operand:JOINLDST1 0 "register_operand")
> +	(mem:JOINLDST1 (plus:SI (match_operand:SI 1 "register_operand")
> +				(match_operand:SI 2 "const_int_operand"))))
> +   (set (match_operand:JOINLDST1 3 "register_operand")
> +	(mem:JOINLDST1 (plus:SI (match_dup 1)
> +				(match_operand:SI 4 "const_int_operand"))))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[0]))
> +	== REGNO_REG_CLASS (REGNO (operands[3])) &&
> +   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
> +	== GET_MODE_SIZE (<JOINLDST1:MODE>mode))"
> +
> +  [(parallel [(set (match_dup 0)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
> +    operands[2] = SET_SRC (PATTERN (curr_insn));
> +  })

The operands in the new pattern should come directly from the operands
in the original, rather than going via curr_insn.  That would mean
matching them as memory_operands rather than matching their addresses.

Also, matching (plus ... (const_int)) will miss the important case of
a register with no offset.  It would be better to add a function such
as mips_join_load_store_pair_p that analyses the addresses (via
mips_classify_address) and says whether the pair is OK.

mips_join_load_store_pair_p should also make sure that the destination
of the first load does not overlap the base register, as it does in
something like:

        lw      $2,0($2)
        lw      $3,4($2)

This would be a false optimisation anyway, but more importantly,
the rtl semantics of the single parallel define_insn would be
different from the original, since parallel sets read all their
inputs before modifying any outputs.  You also need to make
sure that the destinations of the two loads don't overlap.

For this reason, the insn patterns should also check
mips_join_load_store_pair_p in their condition.

mips_join_load_store_pair_p could also check whether:

    mips_address_insns (..., mode, false) == 1

We're only really interested in cases where the load or store will be a
single instruction.  If you check that up-front you could get rid of the
mips_load_store_insns stuff and just set the insn_count to 2.

It's possible for SI to be stored in FPRs and SF to be stored in GPRs.
You should either make the define_insn handle that or make the peephole
check for the right class of register.  (Either's OK with me FWIW.)
The REGNO_REG_CLASS check above only checks whether the two registers
are the same class, not whether they're "d" for integer modes and "f"
for floating-point.

> +(define_insn "*join2_storehi"
> +  [(parallel
> +    [(set (match_operand:HI 0 "memory_operand" "=m")
> +	  (match_operand:HI 1 "register_operand" "r"))
> +     (set (match_operand:HI 2 "memory_operand" "=m")
> +	  (match_operand:HI 3 "register_operand" "r"))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
> +    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
> +    return "";
> +  }
> +  [(set_attr "move_type" "store")
> +   (set_attr_alternative "insn_count"
> +	[(mult	(symbol_ref "mips_load_store_insns (operands[0], insn)")
> +		(const_int 2))])])
> +
> +(define_insn "*join2_loadhi"
> +  [(parallel
> +    [(set (match_operand:SI 0 "register_operand" "=r")
> +	  (any_extend:SI (match_operand:HI 1 "memory_operand" "m")))
> +     (set (match_operand:SI 2 "register_operand" "=r")
> +	  (any_extend:SI (match_operand:HI 3 "memory_operand" "m")))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn ("lh<u>\t%0,%1", operands);
> +    output_asm_insn ("lh<u>\t%2,%3", operands);
> +    return "";
> +  }
> +  [(set_attr "move_type" "load")
> +   (set_attr_alternative "insn_count"
> +	[(mult	(symbol_ref "mips_load_store_insns (operands[1], insn)")
> +		(const_int 2))])])
> +
> +
> +;; 2 16 bit integer loads are joined.
> +(define_peephole2
> +  [(set (match_operand:SI 0 "register_operand")
> +	(any_extend:SI (mem:HI (plus:SI
> +				  (match_operand:SI 1 "register_operand")
> +				  (match_operand:SI 2 "const_int_operand")))))
> +   (set (match_operand:SI 3 "register_operand")
> +	(any_extend:SI (mem:HI (plus:SI
> +				  (match_dup 1)
> +				  (match_operand:SI 4 "const_int_operand")))))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[0]))
> +	== REGNO_REG_CLASS (REGNO (operands[3])) &&
> +   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
> +	== GET_MODE_SIZE (HImode))"
> +
> +  [(parallel [(set (match_dup 0)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
> +    operands[2] = SET_SRC (PATTERN (curr_insn));
> +  })
> +
> +;; 2 16 bit integer stores are joined.
> +(define_peephole2
> +  [(set (mem:HI (plus:SI (match_operand:SI 0 "register_operand")
> +			 (match_operand:SI 1 "const_int_operand")))
> +	(match_operand:HI 2 "register_operand"))
> +   (set (mem:HI (plus:SI (match_dup 0)
> +			 (match_operand:SI 3 "const_int_operand")))
> +	(match_operand:HI 4 "register_operand"))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[2]))
> +	== REGNO_REG_CLASS (REGNO (operands[4])) &&
> +   (abs (INTVAL (operands[1]) - INTVAL (operands[3]))
> +	== GET_MODE_SIZE (HImode))"
> +  [(parallel [(set (match_dup 1)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[3] = SET_DEST (PATTERN (next_active_insn (curr_insn)));
> +    operands[1] = SET_DEST (PATTERN (curr_insn));
> +  })
> +

Please instead add HI to the define_mode_iterator so that we can use
the same peephole and define_insn.

Are QImodes not paired in the same way?  If so, it'd be worth adding
a comment above the define_mode_iterator saying that QI is deliberately
excluded.

Thanks,
Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]