[PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added

Thu Nov 13 23:11:00 GMT 2008

"Ruan Beihong" <ruanbeihong@gmail.com> writes:
> I understand that.
> Formatting is very important.
> I think  this update attached will be nicer.

Applied, thanks.  And thanks for your patience.

For the record...

>>>>> +  if(TARGET_LOONGSON_2EF)
>>>>> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
>>>>> +    return "ld\t$0,%a0";
>>>>
>>>> Formatting; should be:
>>>>
>>>>  if (TARGET_LOONGSON_2EF)
>>>>    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
>>>>    return "ld\t$0,%a0";
>>
>> Looks like you forgot this bit.  (Comments should start with a
>> captial letter and end with ".  ".)

...this still wasn't right; the new patch had:

  /*Loongson 2[ef] use load to $0 to perform prefetching*/

but it should have been:

  /* Loongson 2[ef] use load to $0 to perform prefetching.  */

See the "Coding Standards" part of:

  http://gcc.gnu.org/contribute.html

for details about stuff like this.

Also, the "imul3nc" change to the "length" attribute wasn't needed;
that bit of code is there to work around VR4120 DMULT errata.
(The existing "imul3" case is redundant too.  It was added as part of a
mechanical split of "imul" into "imul" and "imul3".  I'll try to fix it
sometime.)

I fixed the two problems above, tweaked a few other bits of formatting
and wrote a changelog.  I've attached the final (applied) version.

I also added a couple of testcases, also attached.

> As to the peephole2, did you mean that every mult would be
>  replace with multu.g unless it's the LO is the demanding target,
> so that no peephole2 is need for it? If so, I deleted this part.

No, what I meant was: GCC is not allowed to (or should not be allowed to)
use MULT in _any_ circumstances on a Loongson target, even without the
define_peephole2.

In other words, the peephole2 looked for the following sequence of
instructions:

  I1: a multiplication that sets LO
  I2: a move from LO to a GPR

But how would we ever get I1 for Loongson targets?  When compiling for a
Loongson target, GCC is supposed to use your new mul<mode>3_mul3_ls2ef
define_insn for _every_ multiplication.  And your define_insn requires
the target to be a GPR; it does not allow multiplications to set LO.

I think you based the define_peephole2 on this one:

  ;; If a register gets allocated to LO, and we spill to memory, the reload
  ;; will include a move from LO to a GPR.  Merge it into the multiplication
  ;; if it can set the GPR directly.
  ;;
  ;; Operand 0: LO
  ;; Operand 1: GPR (1st multiplication operand)
  ;; Operand 2: GPR (2nd multiplication operand)
  ;; Operand 3: GPR (destination)
  (define_peephole2
    [(parallel
         [(set (match_operand:SI 0 "lo_operand")
               (mult:SI (match_operand:SI 1 "d_operand")
                        (match_operand:SI 2 "d_operand")))
          (clobber (scratch:SI))])
     (set (match_operand:SI 3 "d_operand")
          (match_dup 0))]
    "ISA_HAS_MUL3 && peep2_reg_dead_p (2, operands[0])"
    [(parallel
         [(set (match_dup 3)
               (mult:SI (match_dup 1)
                        (match_dup 2)))
          (clobber (match_dup 0))])])

But this peephole2 is for the following define_insn:

  (define_insn "mul<mode>3_mul3"
    [(set (match_operand:GPR 0 "register_operand" "=d,l")
          (mult:GPR (match_operand:GPR 1 "register_operand" "d,d")
                    (match_operand:GPR 2 "register_operand" "d,d")))
     (clobber (match_scratch:GPR 3 "=l,X"))]
    "ISA_HAS_<D>MUL3"
  {
    if (which_alternative == 1)
      return "<d>mult\t%1,%2";
    if (<MODE>mode == SImode && TARGET_MIPS3900)
      return "mult\t%0,%1,%2";
    return "<d>mul\t%0,%1,%2";
  }
    [(set_attr "type" "imul3,imul")
     (set_attr "mode" "<MODE>")])

Notice how this pattern allows the destination be either a GPR (the first
alternative) _or_ LO (the second alternative).  So the define_insn allows
GCC to generate either MUL _or_ MULT.

Sometimes GCC won't make a good decision: sometimes it will choose
the MULT alternative and then move the result from LO to a GPR.
(Hopefully this is rare.)  The define_peephole2 then optimises
"MULT; MFLO" into "MUL".

On the other hand, your new Loongson pattern is:

  (define_insn "mul<mode>3_mul3_ls2ef"
    [(set (match_operand:GPR 0 "register_operand" "=d")
          (mult:GPR (match_operand:GPR 1 "register_operand" "d")
                    (match_operand:GPR 2 "register_operand" "d")))]
    "TARGET_LOONGSON_2EF"
    "<d>multu.g\t%0,%1,%2"
    [(set_attr "type" "imul3nc")
     (set_attr "mode" "<MODE>")])

This pattern has only one alternative.  The instruction must be <D>MULTU.G,
and the destination must be a GPR.  It is not possible (or should not be
possible) for GCC to decide to use <D>MULT instead.  And if it isn't possible
for GCC to use <D>MULT instead, there should be no "<D>MULT; MFLO" sequences
for the define_peephole2 to optimise.

Richard


gcc/
2008-11-13  Ruan Beihong  <ruanbeihong@gmail.com>

	* config/mips/loongson.md (<u>div<mode>3, <u>mod<mode>3): New patterns.
	* config/mips/loongson2ef.md (ls2_imult): Handle imul3nc.
	(ls2_idiv): Likewise idiv3.
	(ls2_prefetch): New reservation.
	* config/mips/mips.h (ISA_HAS_PREFETCH): Add TARGET_LOONGSON_2EF.
	* config/mips/mips.md (type): Add imul3nc and idiv3.
	(length): Handle idiv3.
	(any_mod): New code_iterator.
	(u): Handle MOD and UMOD.
	(mul<mode>3): Generate mul<mode>3_mul3_ls2ef on Loongson targets.
	(prefetch): Handle TARGET_LOONGSON_2EF.

Index: gcc/config/mips/loongson.md
===================================================================

--- gcc/config/mips/loongson.md	(révision 141833)
+++ gcc/config/mips/loongson.md	(copie de travail)
@@ -473,3 +473,23 @@ (define_insn "vec_interleave_low<mode>"
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
   "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
   [(set_attr "type" "fdiv")])
+
+;; Integer division and modulus.
+
+(define_insn "<u>div<mode>3"
+  [(set (match_operand:GPR 0 "register_operand" "=&d")
+	(any_div:GPR (match_operand:GPR 1 "register_operand" "d")
+		     (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  { return mips_output_division ("<d>div<u>.g\t%0,%1,%2", operands); }
+  [(set_attr "type" "idiv3")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<u>mod<mode>3"
+  [(set (match_operand:GPR 0 "register_operand" "=&d")
+	(any_mod:GPR (match_operand:GPR 1 "register_operand" "d")
+		     (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  { return mips_output_division ("<d>mod<u>.g\t%0,%1,%2", operands); }
+  [(set_attr "type" "idiv3")
+   (set_attr "mode" "<MODE>")])
Index: gcc/config/mips/loongson2ef.md
===================================================================
--- gcc/config/mips/loongson2ef.md	(révision 141833)
+++ gcc/config/mips/loongson2ef.md	(copie de travail)
@@ -160,14 +160,14 @@ (define_insn_reservation "ls2_branch" 2
 ;; Reservation for integer multiplication instructions.
 (define_insn_reservation "ls2_imult" 5
   (and (eq_attr "cpu" "loongson_2e,loongson_2f")
-       (eq_attr "type" "imul,imul3"))
+       (eq_attr "type" "imul,imul3nc"))
   "ls2_alu2,ls2_alu2_core")
 
 ;; Reservation for integer division / remainder instructions.
 ;; These instructions use the SRT algorithm and hence take 2-38 cycles.
 (define_insn_reservation "ls2_idiv" 20
   (and (eq_attr "cpu" "loongson_2e,loongson_2f")
-       (eq_attr "type" "idiv"))
+       (eq_attr "type" "idiv,idiv3"))
   "ls2_alu2,ls2_alu2_core*18")
 
 ;; Reservation for memory load instructions.
@@ -176,6 +176,11 @@ (define_insn_reservation "ls2_load" 5
        (eq_attr "type" "load,fpload,mfc,mtc"))
   "ls2_mem")
 
+(define_insn_reservation "ls2_prefetch" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "prefetch,prefetchx"))
+  "ls2_mem")
+
 ;; Reservation for memory store instructions.
 ;; With stores we assume they don't alias with dependent loads.
 ;; Therefore we set the latency to zero.
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(révision 141833)
+++ gcc/config/mips/mips.h	(copie de travail)
@@ -919,6 +919,7 @@ #define ISA_HAS_ROR		((ISA_MIPS32R2				\
 
 /* ISA has data prefetch instructions.  This controls use of 'pref'.  */
 #define ISA_HAS_PREFETCH	((ISA_MIPS4				\
+				  || TARGET_LOONGSON_2EF		\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64				\
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(révision 141833)
+++ gcc/config/mips/mips.md	(copie de travail)
@@ -351,8 +351,10 @@ (define_attr "dword_mode" "no,yes"
 ;; trap		trap if instructions
 ;; imul		integer multiply 2 operands
 ;; imul3	integer multiply 3 operands
+;; imul3nc	integer multiply 3 operands without clobbering HI/LO
 ;; imadd	integer multiply-add
-;; idiv		integer divide
+;; idiv		integer divide 2 operands
+;; idiv3	integer divide 3 operands
 ;; move		integer register move ({,D}ADD{,U} with rt = 0)
 ;; fmove	floating point register move
 ;; fadd		floating point add/subtract
@@ -376,9 +378,9 @@ (define_attr "dword_mode" "no,yes"
 (define_attr "type"
   "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
    prefetch,prefetchx,condmove,mtc,mfc,mthilo,mfhilo,const,arith,logical,
-   shift,slt,signext,clz,pop,trap,imul,imul3,imadd,idiv,move,fmove,fadd,fmul,
-   fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,frsqrt,frsqrt1,
-   frsqrt2,multi,nop,ghost"
+   shift,slt,signext,clz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
+   fmove,fadd,fmul,fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,
+   frsqrt,frsqrt1,frsqrt2,multi,nop,ghost"
   (cond [(eq_attr "jal" "!unset") (const_string "call")
 	 (eq_attr "got" "load") (const_string "load")
 
@@ -553,7 +555,7 @@ (define_attr "length" ""
 		    (ne (symbol_ref "TARGET_FIX_VR4120") (const_int 0))))
 	  (const_int 8)
 
-	  (eq_attr "type" "idiv")
+	  (eq_attr "type" "idiv,idiv3")
 	  (symbol_ref "mips_idiv_insns () * 4")
 	  ] (const_int 4)))
 
@@ -787,6 +789,10 @@ (define_code_iterator any_shift [ashift 
 ;; from the same template.
 (define_code_iterator any_div [div udiv])
 
+;; This code iterator allows unsigned and signed modulus to be generated
+;; from the same template.
+(define_code_iterator any_mod [mod umod])
+
 ;; This code iterator allows all native floating-point comparisons to be
 ;; generated from the same template.
 (define_code_iterator fcond [unordered uneq unlt unle eq lt le])
@@ -809,6 +815,7 @@ (define_code_iterator any_le [le leu])
 ;; "u" when doing an unsigned operation.
 (define_code_attr u [(sign_extend "") (zero_extend "u")
 		     (div "") (udiv "u")
+		     (mod "") (umod "u")
 		     (gt "") (gtu "u")
 		     (ge "") (geu "u")
 		     (lt "") (ltu "u")
@@ -1357,7 +1364,10 @@ (define_expand "mul<mode>3"
 		  (match_operand:GPR 2 "register_operand")))]
   ""
 {
-  if (ISA_HAS_<D>MUL3)
+  if (TARGET_LOONGSON_2EF)
+    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
+                                          operands[2]));
+  else if (ISA_HAS_<D>MUL3)
     emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
   else if (TARGET_FIX_R4000)
     emit_insn (gen_mul<mode>3_r4000 (operands[0], operands[1], operands[2]));
@@ -1367,6 +1377,15 @@ (define_expand "mul<mode>3"
   DONE;
 })
 
+(define_insn "mul<mode>3_mul3_ls2ef"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+        (mult:GPR (match_operand:GPR 1 "register_operand" "d")
+                  (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  "<d>multu.g\t%0,%1,%2"
+  [(set_attr "type" "imul3nc")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "mul<mode>3_mul3"
   [(set (match_operand:GPR 0 "register_operand" "=d,l")
 	(mult:GPR (match_operand:GPR 1 "register_operand" "d,d")
@@ -6128,6 +6147,9 @@ (define_insn "prefetch"
 	     (match_operand 2 "const_int_operand" "n"))]
   "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
 {
+  if (TARGET_LOONGSON_2EF)
+    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
+    return "ld\t$0,%a0";
   operands[1] = mips_prefetch_cookie (operands[1], operands[2]);
   return "pref\t%1,%a0";
 }



gcc/testsuite/
	* gcc.target/mips/loongson-muldiv-1.c: New test.
	* gcc.target/mips/loongson-muldiv-2.c: Likewise.

Index: gcc/testsuite/gcc.target/mips/loongson-muldiv-1.c
===================================================================
--- /dev/null	2008-11-13 19:44:41.580098750 +0000
+++ gcc/testsuite/gcc.target/mips/loongson-muldiv-1.c	2008-11-13 22:35:42.000000000 +0000
@@ -0,0 +1,16 @@
+/* { dg-mips-options "-O2 -march=loongson2e" } */
+
+typedef int st;
+typedef unsigned int ut;
+
+NOMIPS16 st smul (st x, st y) { return x * y; }
+NOMIPS16 st sdiv (st x, st y) { return x / y + x % y; }
+
+NOMIPS16 ut umul (ut x, ut y) { return x * y; }
+NOMIPS16 ut udiv (ut x, ut y) { return x / y + x % y; }
+
+/* { dg-final { scan-assembler-times "\tmultu.g\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tdivu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tmodu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdiv.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tmod.g\t" 1 } } */
Index: gcc/testsuite/gcc.target/mips/loongson-muldiv-2.c
===================================================================
--- /dev/null	2008-11-13 19:44:41.580098750 +0000
+++ gcc/testsuite/gcc.target/mips/loongson-muldiv-2.c	2008-11-13 22:35:24.000000000 +0000
@@ -0,0 +1,16 @@
+/* { dg-mips-options "-O2 -march=loongson2e -mgp64" } */
+
+typedef long long st;
+typedef unsigned long long ut;
+
+NOMIPS16 st smul (st x, st y) { return x * y; }
+NOMIPS16 st sdiv (st x, st y) { return x / y + x % y; }
+
+NOMIPS16 ut umul (ut x, ut y) { return x * y; }
+NOMIPS16 ut udiv (ut x, ut y) { return x / y + x % y; }
+
+/* { dg-final { scan-assembler-times "\tdmultu.g\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tddivu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdmodu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tddiv.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdmod.g\t" 1 } } */