[RS6000] bswapdi2 pattern, reload and lra

Alan Modra amodra@gmail.com
Tue Dec 17 11:50:00 GMT 2013


This patch is aimed at fixing test failures introduced by my
2013-12-07 change to bswapdi2_32bit:
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6

The 2013-12-07 change was necessary to make -m32 -mlra generate good
code for pr53199.c:reg_reverse.  Too many '?'s on the r->r alternative
result in lra never choosing that option.  Instead we get Z->r,
ie. storing the input reg to a stack temp then using lwbrx from there.
That means we have a load-hit-store flush with a severe slowdown.
(See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00002.html for the
corresponding -m64 result, a 4x slowdown.)

A similar problem occurs with -m64 -mcpu=power7 -mlra due to
bswapdi2_ldbrx having two '?'s on r->r.

To fix this I ran into a whole lot of pain.  reload and lra are quite
different in their selection of insn alternatives.  I could not find
any combination of '?'s that generated the best code for both reload
an lra on pr53199.c.  To see why, it's necessary to look (from a great
height) at how reload chooses amongst alternatives.  A particular
alternative gets a "loser" score, with the lowest "loser" being
chosen.  "loser" is incremented
a) when an operand does not match its constraint.
b) when an alternative has a '?' in the constraint.
c) when a scratch register is required.
d) when an early clobber output clashes with one of the inputs.

a) is fairly obvious.  For example, if we have a MEM when the operand
   alternative needs a reg, then we'll require a reload.
b) is also quite obvious.  Multiple '?'s accumulate.
c) is a little more subtle.  It bites you when alternatives require
   differing numbers of scratch registers.  Take for example
   bswapdi2_64bit, which before this patch has three alternatives
   Z->r with 3 scratch regs, (Z is a subset of m)
   r->Z with 2 scratch regs,
   r->r with 3 scratch regs.
   All other things being equal, with reload you could correct this
   disparity by adding a '?' to the r->Z alternative.  We might want
   to do that so that Z->r and r->r are the same "distance" apart
   as r->Z is from r->r.  With lra it seems that scratch regs are
   weighted differently..
d) is also tricky, and a trap for anyone optimizing insn selection for
   functions like some in pr53199.c that have just one rtl insn with
   early clobbers.  PowerPC generally returns function results in the
   same register as the first argument, so these hit the early
   clobber.  Code elsewhere in larger functions probably won't.
   lra penalizes early clobbers differently to reload (a lot less).

So, putting this all together..  Point (d) test implication is covered
by the additional functions in pr53199.c.  Avoiding early clobbers
where possible is good since it reduces differences between reload and
lra insn alternative costing.  We also generate better code.  I
managed to do this for all the Z->r bswapdi patterns, but stopped
short of changing r->r as well since at that point everything looked
OK!  Avoiding extra scratch registers for one alternative is also good.
The op4 r->r scratch wasn't even used, and op4 for Z->r fairly easy to
do without.  Renaming word_high/low to word1/2 was to make it a little
easier to track lifetime of addr1/2.

Bootstrapped and regression tested powerpc64-linux.  Output of
pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
{-mlra,}.  OK to apply?

gcc/
	* config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
	Modify Z->r bswapdi splitter to use dest in place of scratch.
	In r->Z and Z->r bswapdi splitter rename word_high, word_low
	to word1, word2 and rearrange logic to suit.
	(bswapdi2_64bit): Remove early clobber on Z->r alternative.
	(bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
	(bswapdi2_32bit): Remove early clobber on Z->r alternative.
	Add one '?' on r->r.  Modify Z->r splitter to avoid need for
	early clobber.
gcc/testsuite/
	* gcc.target/powerpc/pr53199.c: Add extra functions.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 206009)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2344,8 +2344,7 @@
 		   (bswap:DI
 		    (match_operand:DI 1 "reg_or_mem_operand" "")))
 	      (clobber (match_scratch:DI 2 ""))
-	      (clobber (match_scratch:DI 3 ""))
-	      (clobber (match_scratch:DI 4 ""))])]
+	      (clobber (match_scratch:DI 3 ""))])]
   ""
 {
   if (!REG_P (operands[0]) && !REG_P (operands[1]))
@@ -2363,11 +2362,10 @@
 
 ;; Power7/cell has ldbrx/stdbrx, so use it directly
 (define_insn "*bswapdi2_ldbrx"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=X,X,&r"))
-   (clobber (match_scratch:DI 3 "=X,X,&r"))
-   (clobber (match_scratch:DI 4 "=X,X,&r"))]
+   (clobber (match_scratch:DI 3 "=X,X,&r"))]
   "TARGET_POWERPC64 && TARGET_LDBRX
    && (REG_P (operands[0]) || REG_P (operands[1]))"
   "@
@@ -2379,11 +2377,10 @@
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=&b,&b,&r"))
-   (clobber (match_scratch:DI 3 "=&r,&r,&r"))
-   (clobber (match_scratch:DI 4 "=&r,X,&r"))]
+   (clobber (match_scratch:DI 3 "=&r,&r,&r"))]
   "TARGET_POWERPC64 && !TARGET_LDBRX
    && (REG_P (operands[0]) || REG_P (operands[1]))
    && !(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))
@@ -2395,8 +2392,7 @@
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "gpc_reg_operand" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
   [(const_int 0)]
   "
@@ -2405,15 +2401,14 @@
   rtx src    = operands[1];
   rtx op2    = operands[2];
   rtx op3    = operands[3];
-  rtx op4    = operands[4];
   rtx op3_32 = simplify_gen_subreg (SImode, op3, DImode,
 				    BYTES_BIG_ENDIAN ? 4 : 0);
-  rtx op4_32 = simplify_gen_subreg (SImode, op4, DImode,
-				    BYTES_BIG_ENDIAN ? 4 : 0);
+  rtx dest_32 = simplify_gen_subreg (SImode, dest, DImode,
+				     BYTES_BIG_ENDIAN ? 4 : 0);
   rtx addr1;
   rtx addr2;
-  rtx word_high;
-  rtx word_low;
+  rtx word1;
+  rtx word2;
 
   addr1 = XEXP (src, 0);
   if (GET_CODE (addr1) == PLUS)
@@ -2438,29 +2433,29 @@
       addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
     }
 
+  word1 = change_address (src, SImode, addr1);
+  word2 = change_address (src, SImode, addr2);
+
   if (BYTES_BIG_ENDIAN)
     {
-      word_high = change_address (src, SImode, addr1);
-      word_low  = change_address (src, SImode, addr2);
+      emit_insn (gen_bswapsi2 (op3_32, word2));
+      emit_insn (gen_bswapsi2 (dest_32, word1));
     }
   else
     {
-      word_high = change_address (src, SImode, addr2);
-      word_low  = change_address (src, SImode, addr1);
+      emit_insn (gen_bswapsi2 (op3_32, word1));
+      emit_insn (gen_bswapsi2 (dest_32, word2));
     }
 
-  emit_insn (gen_bswapsi2 (op3_32, word_low));
-  emit_insn (gen_bswapsi2 (op4_32, word_high));
-  emit_insn (gen_ashldi3 (dest, op3, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op4));
+  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
+  emit_insn (gen_iordi3 (dest, dest, op3));
 }")
 
 (define_split
   [(set (match_operand:DI 0 "indexed_or_indirect_operand" "")
 	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
   [(const_int 0)]
   "
@@ -2475,8 +2470,8 @@
 				    BYTES_BIG_ENDIAN ? 4 : 0);
   rtx addr1;
   rtx addr2;
-  rtx word_high;
-  rtx word_low;
+  rtx word1;
+  rtx word2;
 
   addr1 = XEXP (dest, 0);
   if (GET_CODE (addr1) == PLUS)
@@ -2501,27 +2496,28 @@
       addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
     }
 
+  word1 = change_address (dest, SImode, addr1);
+  word2 = change_address (dest, SImode, addr2);
+
   emit_insn (gen_lshrdi3 (op3, src, GEN_INT (32)));
+
   if (BYTES_BIG_ENDIAN)
     {
-      word_high = change_address (dest, SImode, addr1);
-      word_low  = change_address (dest, SImode, addr2);
+      emit_insn (gen_bswapsi2 (word1, src_si));
+      emit_insn (gen_bswapsi2 (word2, op3_si));
     }
   else
     {
-      word_high = change_address (dest, SImode, addr2);
-      word_low  = change_address (dest, SImode, addr1);
+      emit_insn (gen_bswapsi2 (word2, src_si));
+      emit_insn (gen_bswapsi2 (word1, op3_si));
     }
-  emit_insn (gen_bswapsi2 (word_high, src_si));
-  emit_insn (gen_bswapsi2 (word_low, op3_si));
 }")
 
 (define_split
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
@@ -2544,7 +2540,7 @@
 }")
 
 (define_insn "bswapdi2_32bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,?&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:SI 2 "=&b,&b,X"))]
   "!TARGET_POWERPC64 && (REG_P (operands[0]) || REG_P (operands[1]))"
@@ -2573,7 +2569,8 @@
   if (GET_CODE (addr1) == PLUS)
     {
       emit_insn (gen_add3_insn (op2, XEXP (addr1, 0), GEN_INT (4)));
-      if (TARGET_AVOID_XFORM)
+      if (TARGET_AVOID_XFORM
+	  || REGNO (XEXP (addr1, 1)) == REGNO (dest2))
 	{
 	  emit_insn (gen_add3_insn (op2, XEXP (addr1, 1), op2));
 	  addr2 = op2;
@@ -2581,7 +2578,8 @@
       else
 	addr2 = gen_rtx_PLUS (SImode, op2, XEXP (addr1, 1));
     }
-  else if (TARGET_AVOID_XFORM)
+  else if (TARGET_AVOID_XFORM
+	   || REGNO (addr1) == REGNO (dest2))
     {
       emit_insn (gen_add3_insn (op2, addr1, GEN_INT (4)));
       addr2 = op2;
@@ -2596,6 +2594,8 @@
   word2 = change_address (src, SImode, addr2);
 
   emit_insn (gen_bswapsi2 (dest2, word1));
+  /* The REGNO (dest2) tests above ensure that addr2 has not been trashed,
+     thus allowing us to omit an early clobber on the output.  */
   emit_insn (gen_bswapsi2 (dest1, word2));
 }")
 
Index: gcc/testsuite/gcc.target/powerpc/pr53199.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr53199.c	(revision 206009)
+++ gcc/testsuite/gcc.target/powerpc/pr53199.c	(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-options "-O2 -mcpu=power6 -mavoid-indexed-addresses" } */
-/* { dg-final { scan-assembler-times "lwbrx" 6 } } */
+/* { dg-final { scan-assembler-times "lwbrx" 12 } } */
 /* { dg-final { scan-assembler-times "stwbrx" 6 } } */
 
 /* PR 51399: bswap gets an error if -mavoid-indexed-addresses was used in
@@ -25,6 +25,24 @@
   return __builtin_bswap64 (p[i]);
 }
 
+long long
+load64_reverse_4 (long long dummy __attribute__ ((unused)), long long *p)
+{
+  return __builtin_bswap64 (*p);
+}
+
+long long
+load64_reverse_5 (long long dummy __attribute__ ((unused)), long long *p)
+{
+  return __builtin_bswap64 (p[1]);
+}
+
+long long
+load64_reverse_6 (long long dummy __attribute__ ((unused)), long long *p, int i)
+{
+  return __builtin_bswap64 (p[i]);
+}
+
 void
 store64_reverse_1 (long long *p, long long x)
 {
@@ -44,7 +62,13 @@
 }
 
 long long
-reg_reverse (long long x)
+reg_reverse_1 (long long x)
 {
   return __builtin_bswap64 (x);
 }
+
+long long
+reg_reverse_2 (long long dummy __attribute__ ((unused)), long long x)
+{
+  return __builtin_bswap64 (x);
+}

-- 
Alan Modra
Australia Development Lab, IBM



More information about the Gcc-patches mailing list