[committed] Tweak MIPS MADD/MSUB patterns

Mon Nov 17 22:52:00 GMT 2008

This patch tweaks the MIPS MADD and MSUB patterns.  It's needed to avoid
code quality regressions in the patch I'm about to post, which in turn
is needed to fix dspr2-MULT{,U}.c regressions.

The old MADD pattern was:

;; For processors that can copy the output to a general register:
;;
;; The all-d alternative is needed because the combiner will find this
;; pattern and then register alloc/reload will move registers around to
;; make them fit, and we don't want to trigger unnecessary loads to LO.
;;
;; The last alternative should be made slightly less desirable, but adding
;; "?" to the constraint is too strong, and causes values to be loaded into
;; LO even when that's more costly.  For now, using "*d" mostly does the
;; trick.
(define_insn "*mul_acc_si"
  [(set (match_operand:SI 0 "register_operand" "=l,*d,*d")
	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
			  (match_operand:SI 2 "register_operand" "d,d,d"))
		 (match_operand:SI 3 "register_operand" "0,l,*d")))
   (clobber (match_scratch:SI 4 "=X,3,l"))
   (clobber (match_scratch:SI 5 "=X,X,&d"))]
  "(TARGET_MIPS3900
   || GENERATE_MADD_MSUB)
   && !TARGET_MIPS16"
{
  static const char *const madd[] = { "madd\t%1,%2", "madd\t%0,%1,%2" };
  if (which_alternative == 2)
    return "#";
  if (GENERATE_MADD_MSUB && which_alternative != 0)
    return "#";
  return madd[which_alternative];
}
  [(set_attr "type"	"imadd")
   (set_attr "mode"	"SI")
   (set_attr "length"	"4,4,8")])

which I've never been too keen on.  We ought to expose the all-d
alternative to the register allocators.

I think the problem that the author came across was that reload
and the register allocators take a different attitude to SCRATCH
reloads.  The register allocator ignores scratches (which aren't
allocatable) but reload treats them like any other operand.

So if we removed the "*"s from the pattern above, the register
allocator would treat all alternatives as having equal cost.
OTOH, reload would think that the second alternative was one
reload more expensive than the first (because of the "3" clobber)
and that the third alternative was one reload more expensive still
(because of the "l" and "&d" clobbers).

This difference between reload and the register allocators might not
be ideal, but I think it's too dangerous to change at this stage.
And for the record, I think the old RA behaved like IRA here.

The approach I've taken in the patch is to use "*?" to counteract
the extra costs that reload adds.  "*?" is not seen by the RA,
so it uses the "natural" weighting.

I don't have ready access to a good benchmark system for this,
but FWIW, I ran CSiBE before and after the patch.[*]  I then
went through the differences by hand.  All of them seem to be
cases in which we replaced sequences like:

   mtlo
   madd
   mflo

with:
 
   mul
   addiu

which is exactly what the old code was trying (but failing) to do.
I've added an artifical example as madd-8.c.  We still used MADD in
"MULT; MADD" sequences.

  [*] CSiBE contains JPEG code, so we get some coverage of DCTs and IDCTs.

In other words, I'm simply trying to preserve the intent of the
current code and "fix" cases where it wasn't being applied properly.
I'm not trying to change the underlying principle.

m{add,sub}-{5,6,7}.c pass before and after the patch; only m{add,sub}-8.c
are fixed by the patch.

Tested on mipsisa64-elfoabi and applied.

Richard


gcc/
	* config/mips/mips.md (*mul_acc_si): Remove middle alternative
	and its associated define_split.  Expose the all-d alternative
	to the register allocator, but mark it with "?".  Mark the first
	alternative with "*?*?".
	(*mul_sub_si): Likewise.
	(*mul_acc_si_r3900): New pattern.

gcc/testsuite/
	* gcc.target/mips/madd-5.c: New test.
	* gcc.target/mips/madd-6.c: Likewise.
	* gcc.target/mips/madd-7.c: Likewise.
	* gcc.target/mips/madd-8.c: Likewise.
	* gcc.target/mips/msub-5.c: Likewise.
	* gcc.target/mips/msub-6.c: Likewise.
	* gcc.target/mips/msub-7.c: Likewise.
	* gcc.target/mips/msub-8.c: Likewise.

Index: gcc/config/mips/mips.md
===================================================================

--- gcc/config/mips/mips.md	2008-11-17 20:00:53.000000000 +0000
+++ gcc/config/mips/mips.md	2008-11-17 22:00:35.000000000 +0000
@@ -1474,34 +1474,50 @@ (define_peephole2
 
 ;; Multiply-accumulate patterns
 
-;; For processors that can copy the output to a general register:
-;;
-;; The all-d alternative is needed because the combiner will find this
-;; pattern and then register alloc/reload will move registers around to
-;; make them fit, and we don't want to trigger unnecessary loads to LO.
-;;
-;; The last alternative should be made slightly less desirable, but adding
 -;; "?" to the constraint is too strong, and causes values to be loaded into
-;; LO even when that's more costly.  For now, using "*d" mostly does the
-;; trick.
+;; This pattern is first matched by combine, which tries to use the
+;; pattern wherever it can.  We don't know until later whether it
+;; is actually profitable to use MADD over a "MUL; ADDIU" sequence,
+;; so we need to keep both options open.
+;;
+;; The second alternative has a "?" marker because it is generally
+;; one instruction more costly than the first alternative.  This "?"
+;; marker is enough to convey the relative costs to the register
+;; allocator.
+;;
+;; However, reload counts reloads of operands 4 and 5 in the same way as
+;; reloads of the other operands, even though operands 4 and 5 need no
+;; copy instructions.  Reload therefore thinks that the second alternative
+;; is two reloads more costly than the first.  We add "*?*?" to the first
+;; alternative as a counterweight.
 (define_insn "*mul_acc_si"
-  [(set (match_operand:SI 0 "register_operand" "=l,*d,*d")
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
+	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
+			  (match_operand:SI 2 "register_operand" "d,d"))
+		 (match_operand:SI 3 "register_operand" "0,d")))
+   (clobber (match_scratch:SI 4 "=X,l"))
+   (clobber (match_scratch:SI 5 "=X,&d"))]
+  "GENERATE_MADD_MSUB && !TARGET_MIPS16"
+  "@
+    madd\t%1,%2
+    #"
+  [(set_attr "type"	"imadd")
+   (set_attr "mode"	"SI")
+   (set_attr "length"	"4,8")])
+
+;; The same idea applies here.  The middle alternative needs one less
+;; clobber than the final alternative, so we add "*?" as a counterweight.
+(define_insn "*mul_acc_si_r3900"
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d*?,d?")
 	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
 			  (match_operand:SI 2 "register_operand" "d,d,d"))
-		 (match_operand:SI 3 "register_operand" "0,l,*d")))
+		 (match_operand:SI 3 "register_operand" "0,l,d")))
    (clobber (match_scratch:SI 4 "=X,3,l"))
    (clobber (match_scratch:SI 5 "=X,X,&d"))]
-  "(TARGET_MIPS3900
-   || GENERATE_MADD_MSUB)
-   && !TARGET_MIPS16"
-{
-  static const char *const madd[] = { "madd\t%1,%2", "madd\t%0,%1,%2" };
-  if (which_alternative == 2)
-    return "#";
-  if (GENERATE_MADD_MSUB && which_alternative != 0)
-    return "#";
-  return madd[which_alternative];
-}
+  "TARGET_MIPS3900 && !TARGET_MIPS16"
+  "@
+    madd\t%1,%2
+    madd\t%0,%1,%2
+    #"
   [(set_attr "type"	"imadd")
    (set_attr "mode"	"SI")
    (set_attr "length"	"4,4,8")])
@@ -1522,23 +1538,6 @@ (define_split
    (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
   "")
 
-;; Split *mul_acc_si if the destination accumulator value is in a GPR
-;; and the source accumulator value is in LO.
-(define_split
-  [(set (match_operand:SI 0 "d_operand")
-        (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
-                          (match_operand:SI 2 "d_operand"))
-                 (match_operand:SI 3 "lo_operand")))
-   (clobber (match_dup 3))
-   (clobber (scratch:SI))]
-  "reload_completed"
-  [(parallel [(set (match_dup 3)
-                   (plus:SI (mult:SI (match_dup 1) (match_dup 2))
-                            (match_dup 3)))
-              (clobber (scratch:SI))
-              (clobber (scratch:SI))])
-   (set (match_dup 0) (match_dup 3))])
-
 (define_insn "*macc"
   [(set (match_operand:SI 0 "register_operand" "=l,d")
 	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
@@ -1718,21 +1717,21 @@ (define_peephole2
 				operands[2], operands[0]);
 })
 
+;; See the comment above *mul_add_si for details.
 (define_insn "*mul_sub_si"
-  [(set (match_operand:SI 0 "register_operand" "=l,*d,*d")
-        (minus:SI (match_operand:SI 1 "register_operand" "0,l,*d")
-                  (mult:SI (match_operand:SI 2 "register_operand" "d,d,d")
-                           (match_operand:SI 3 "register_operand" "d,d,d"))))
-   (clobber (match_scratch:SI 4 "=X,1,l"))
-   (clobber (match_scratch:SI 5 "=X,X,&d"))]
+  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
+        (minus:SI (match_operand:SI 1 "register_operand" "0,d")
+                  (mult:SI (match_operand:SI 2 "register_operand" "d,d")
+                           (match_operand:SI 3 "register_operand" "d,d"))))
+   (clobber (match_scratch:SI 4 "=X,l"))
+   (clobber (match_scratch:SI 5 "=X,&d"))]
   "GENERATE_MADD_MSUB"
   "@
    msub\t%2,%3
-   #
    #"
   [(set_attr "type"     "imadd")
    (set_attr "mode"     "SI")
-   (set_attr "length"   "4,8,8")])
+   (set_attr "length"   "4,8")])
 
 ;; Split *mul_sub_si if both the source and destination accumulator
 ;; values are GPRs.
@@ -1750,24 +1749,6 @@ (define_split
    (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 5)))]
   "")
 
-;; Split *mul_acc_si if the destination accumulator value is in a GPR
-;; and the source accumulator value is in LO.
-(define_split
-  [(set (match_operand:SI 0 "d_operand")
-        (minus:SI (match_operand:SI 1 "lo_operand")
-                  (mult:SI (match_operand:SI 2 "d_operand")
-                           (match_operand:SI 3 "d_operand"))))
-   (clobber (match_dup 1))
-   (clobber (scratch:SI))]
-  "reload_completed"
-  [(parallel [(set (match_dup 1)
-                   (minus:SI (match_dup 1)
-                             (mult:SI (match_dup 2) (match_dup 3))))
-              (clobber (scratch:SI))
-              (clobber (scratch:SI))])
-   (set (match_dup 0) (match_dup 1))]
-  "")
-
 (define_insn "*muls"
   [(set (match_operand:SI 0 "register_operand" "=l,d")
         (neg:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
Index: gcc/testsuite/gcc.target/mips/madd-5.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/madd-5.c	2008-11-17 20:56:27.000000000 +0000
@@ -0,0 +1,8 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-times "\tmadd\t" 4 } } */
+/* { dg-final { scan-assembler-not "\tmtlo\t" } } */
+/* { dg-final { scan-assembler-times "\tmflo\t" 3 } } */
+
+NOMIPS16 void f1 (int *a) { a[0] = a[0] * a[1] + a[2] * a[3]; }
+NOMIPS16 void f2 (int *a) { a[0] = a[0] * a[1] + a[2] * a[3] + a[4]; }
+NOMIPS16 void f3 (int *a) { a[0] = a[0] * a[1] + a[2] * a[3] + a[4] * a[5]; }
Index: gcc/testsuite/gcc.target/mips/madd-6.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/madd-6.c	2008-11-17 20:56:27.000000000 +0000
@@ -0,0 +1,6 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-not "\tmadd\t" } } */
+/* { dg-final { scan-assembler "\tmul\t" } } */
+/* { dg-final { scan-assembler "\taddu\t" } } */
+
+NOMIPS16 void f1 (int *a) { a[0] = a[0] * a[1] + a[2]; }
Index: gcc/testsuite/gcc.target/mips/madd-7.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/madd-7.c	2008-11-17 21:00:06.000000000 +0000
@@ -0,0 +1,14 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-not "\tmul\t" } } */
+/* { dg-final { scan-assembler "\tmadd\t" } } */
+
+NOMIPS16 int
+f1 (int *a, int *b, int n)
+{
+  int x, i;
+
+  x = 0;
+  for (i = 0; i < n; i++)
+    x += a[i] * b[i];
+  return x;
+}
Index: gcc/testsuite/gcc.target/mips/madd-8.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/madd-8.c	2008-11-17 20:58:24.000000000 +0000
@@ -0,0 +1,15 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler "\tmul\t" } } */
+/* { dg-final { scan-assembler-not "\tmadd\t" } } */
+/* { dg-final { scan-assembler-not "\tmtlo\t" } } */
+/* { dg-final { scan-assembler-not "\tmflo\t" } } */
+
+NOMIPS16 int
+f2 (int x, int y, int z)
+{
+  asm volatile ("" ::: "$1", "$2", "$3", "$4", "$5", "$6", "$7", "$8", "$9",
+		"$10", "$11", "$12", "$13", "$14", "$15", "$16", "$17",
+		"$18", "$19", "$20", "$21", "$22", "$23", "$24", "$25",
+		"$31");
+  return x * y + z;
+}
Index: gcc/testsuite/gcc.target/mips/msub-5.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/msub-5.c	2008-11-17 22:06:54.000000000 +0000
@@ -0,0 +1,8 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-times "\tmsub\t" 4 } } */
+/* { dg-final { scan-assembler-not "\tmtlo\t" } } */
+/* { dg-final { scan-assembler-times "\tmflo\t" 3 } } */
+
+NOMIPS16 void f1 (int *a) { a[0] = a[0] * a[1] - a[2] * a[3]; }
+NOMIPS16 void f2 (int *a) { a[0] = a[0] * a[1] - a[2] * a[3] - a[4]; }
+NOMIPS16 void f3 (int *a) { a[0] = a[0] * a[1] - a[2] * a[3] - a[4] * a[5]; }
Index: gcc/testsuite/gcc.target/mips/msub-6.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/msub-6.c	2008-11-17 22:03:20.000000000 +0000
@@ -0,0 +1,6 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-not "\tmsub\t" } } */
+/* { dg-final { scan-assembler "\tmul\t" } } */
+/* { dg-final { scan-assembler "\tsubu\t" } } */
+
+NOMIPS16 void f1 (int *a) { a[0] = a[0] - a[1] * a[2]; }
Index: gcc/testsuite/gcc.target/mips/msub-7.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/msub-7.c	2008-11-17 22:03:32.000000000 +0000
@@ -0,0 +1,14 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler-not "\tmul\t" } } */
+/* { dg-final { scan-assembler "\tmsub\t" } } */
+
+NOMIPS16 int
+f1 (int *a, int *b, int n)
+{
+  int x, i;
+
+  x = 100;
+  for (i = 0; i < n; i++)
+    x -= a[i] * b[i];
+  return x;
+}
Index: gcc/testsuite/gcc.target/mips/msub-8.c
===================================================================
--- /dev/null	2008-11-17 19:21:05.568098000 +0000
+++ gcc/testsuite/gcc.target/mips/msub-8.c	2008-11-17 22:03:41.000000000 +0000
@@ -0,0 +1,15 @@
+/* { dg-mips-options "-O2 -march=5kc" } */
+/* { dg-final { scan-assembler "\tmul\t" } } */
+/* { dg-final { scan-assembler-not "\tmsub\t" } } */
+/* { dg-final { scan-assembler-not "\tmtlo\t" } } */
+/* { dg-final { scan-assembler-not "\tmflo\t" } } */
+
+NOMIPS16 int
+f2 (int x, int y, int z)
+{
+  asm volatile ("" ::: "$1", "$2", "$3", "$4", "$5", "$6", "$7", "$8", "$9",
+		"$10", "$11", "$12", "$13", "$14", "$15", "$16", "$17",
+		"$18", "$19", "$20", "$21", "$22", "$23", "$24", "$25",
+		"$31");
+  return x - y * z;
+}