This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][MIPS] Enable load-load/store-store bonding


Hi!

Sorry for delay in sending this patch for review.
Please find attached updated patch.

In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby guaranteeing h/w level load/store bonding.

The patch is tested with dejagnu for correctness, and tested on hardware for performance.
Ok for trunk?

Changelog:
gcc/
        * config/mips/mips.md (JOIN_MODE): New mode iterator.
	(join2_load_Store<JOIN_MODE:mode>): New pattern.
	(join2_loadhi): Likewise.
	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
	load-load and store-stores.
	* config/mips/mips.opt (mload-store-pairs): New option.
	(TARGET_LOAD_STORE_PAIRS): New macro.
	*config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
	*config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
	*config/mips/mips.c(mips_load_store_bonding_p): New function.

- Thanks and regards,
  Sameera D.

On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
Hi Richard,

Thanks for the review.
Please find attached updated patch after your review comments.

Changelog:
gcc/
	* config/mips/mips.md (JOIN_MODE): New mode iterator.
	(join2_load_Store<JOIN_MODE:mode>): New pattern.
	(join2_loadhi): Likewise.
	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
	load-load and store-stores.
	* config/mips/mips.opt (mload-store-pairs): New option.
	(TARGET_LOAD_STORE_PAIRS): New macro.
	*config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
	*config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
	*config/mips/mips.c(mips_load_store_bonding_p): New function.

The change is tested with dejagnu with additional options -mload-store-pairs and -mtune=p5600.
The perf measurement is yet to finish.

We had offline discussion based on your comment. There is additional
view on the same.
Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
ISAs do not support P5600.
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
implemented separately, and hence, as you suggested, P5600 Ld-ST
bonding optimization should not be enabled for them.
So, is it fine if I emit error for any ISAs other than mips32r2,
mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
continue by emitting warning and disabling P5600?

No, the point is that we have two separate concepts: ISA and optimisation
target.  -mipsN and -march=N control the ISA (which instructions are
available) and -mtune=M controls optimisation decisions within the
constraints of that N, such as scheduling and the cost of things like
multiplication and division.

E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
compatible code, optimise it for p5600, but make sure that 24k workarounds
are used.  The code would run correctly on any MIPS II-compatible processor
without known errata and also on the 24k.
Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific patterns to be matched.

+
+mld-st-pairing
+Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
+pairing

Other options are just "TARGET_" + the captialised form of the option name,
so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
misleading since it's an abbreviation for "load" rather than the LD instruction.
Maybe -mload-store-pairs, since plurals are more common than "-ing"?
Not sure that's a great suggestion though.
Renamed the option and corresponding macro as suggested.

Performance testing for this patch is not yet done.
If the patch proves beneficial in most of the testcases (which we
believe will do on P5600) we will enable this optimization by default
for P5600 - in which case this option can be removed.

OK.  Sending the patch for comments before performance testing is fine, but
I think it'd be better to commit the patch only after the testing is done, since
otherwise the patch might need to be tweaked.

I don't see any problem with keeping the option in case people want to
experiment with it.  I just think the patch should only go in once it can be
enabled by default for p5600.  I.e. the option would exist to turn off the
pairing.

Not having the option is fine too of course.
Yes, after perf analysis, I will share the results across, and then depending upon the impact, the decision can be made - whether to make the option as default or not, and then the patch will be submitted.

We should allow pairing even without -mtune=p5600.
The load-store pairing is currently attribute of P5600, so I have not enabled the pairing without mtune=5600. If need be, can enable that without mtune=p5600.


(define_mode_iterator JOIN_MODE [
   SI
   (DI "TARGET_64BIT")
   (SF "TARGET_HARD_FLOAT")
   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])

Done this change.

and then extend:

@@ -883,6 +884,8 @@
  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
(define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])

+(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
+
  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
;; are different.  Some forms of unextended addiu have an 8-bit
immediate  ;; field but the equivalent daddiu has only a 5-bit field.

this accordingly.
In order to allow d/f for both register classes, the pattern join2_load_store<mode> was altered a bit which eliminated this mode iterator.


Outer (parallel ...)s are redundant in a define_insn.
Removed.


It would be better to add the mips_load_store_insns for each operand
rather than multiplying one of them by 2.  Or see the next bit for an
alternative.
Using the alternative method as you suggested, so this change is not needed.

Please instead add HI to the define_mode_iterator so that we can use the
same peephole and define_insn.
Added HI in the mode iterator to eliminate join2_storehi pattern and corresponding peephole2.
As arithmetic operations on HImode is not supported, we generate zero or sign extended loads in such cases.
To handle that case, join2_loadhi pattern is kept.

- Thanks and regards,
    Sameera D.

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1733457..85f0591 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+    {
+      reg1 = operands[0];
+      reg2 = operands[2];
+      mem1 = operands[1];
+      mem2 = operands[3];
+    }
+  else
+    {
+      reg1 = operands[1];
+      reg2 = operands[3];
+      mem1 = operands[0];
+      mem2 = operands[2];
+    }
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+    return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+    return false;
+
+  /* Either of the loads is clobbering base register.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers.  */
+  if (load_p
+      && REGNO (reg1) == REGNO (reg2))
+    return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+      && !reg_class_subset_p (rc1, rc2)
+      && !reg_class_subset_p (rc2, rc1))
+    return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}
+
 /* OPERANDS describes the operands to a pair of SETs, in the order
    dest1, src1, dest2, src2.  Return true if the operands can be used
    in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index ec69ed5..1bd0dae 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
 #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
 #endif
+
+#define ENABLE_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 3672c0b..cfdb750 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -754,6 +754,11 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOIN_MODE [HI
+				 SI
+				 (SF "TARGET_HARD_FLOAT")
+				 (DF "TARGET_HARD_FLOAT
+				      && TARGET_DOUBLE_FLOAT")])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -7419,6 +7424,108 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+(define_insn "*join2_load_store<JOIN_MODE:mode>"
+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    bool load_p = (which_alternative == 0 || which_alternative == 1);
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+      }
+    else
+      {
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+      }
+    return "";
+  }
+  [(set_attr "move_type" "load,fpload,store,fpstore")
+   (set_attr "insn_count" "2,2,2,2")])
+
+;; 2 HI/SI/SF/DF loads are joined.
+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "register_operand")
+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
+   (set (match_operand:JOIN_MODE 2 "register_operand")
+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; 2 HI/SI/SF/DF stores are joined.
+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
+	(match_operand:JOIN_MODE 1 "register_operand"))
+   (set (match_operand:JOIN_MODE 2 "memory_operand")
+	(match_operand:JOIN_MODE 3 "register_operand"))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+      }
+    else
+      {
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+      }
+
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
+
+
+;; 2 16 bit integer loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, HImode, true)"
+  [(parallel [(set (match_dup 0)
+		   (any_extend:SI (match_dup 1)))
+	      (set (match_dup 2)
+		   (any_extend:SI (match_dup 3)))])]
+  "")
+
 
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 9e89aa9..a9baebe 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
 
 noasmopt
 Driver
+
+mload-store-pairs
+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
+Enable load/store bonding.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]