This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH:[darwin] fix load of a misaligned double word


Here's the solution I found.

- Don't use mode in legitimate_offset_address_p to distinguish between
  DFmode and DImode mem accesses.  Either mode might end up in a 64-bit
  gpr.  That means we have to allow "(offset & 3) != 0" addresses here.
- Generate a reload in rs6000.c:secondary_reload_class for the ones
  that are invalid.
- Use 'Y' constraints in both movdf_hardfloat64 and movdi_internal64.

That's about all that is needed to ensure we generate correct code.

However, some extra tweaking is needed for optimal code.  Firstly, I
added a splitter to handle the m->r load since if "r" is a base
register, we can reuse "r' to generate a valid address.  This helps
reduce the number of reloads needed.  We can't do this for an r->m
store of course, so that meant using SECONDARY_INPUT_RELOAD_CLASS and
SECONDARY_OUTPUT_RELOAD_CLASS.  invalid_gpr_mem and base_reg_operand
are support functions for the splitter.

Secondly, I found I needed some tweaks in the generic part of reload.
Why?  Well, consider an instruction pattern like
    (set (mem/s:DI (plus:DI (reg:DI 3) (const_int 3))) (reg:DI 120))
This is one that has an offset that is invalid for a gpr store.

When find_reloads evaluates the alternatives available in
movdi_internal64, it will find two that are reasonable to use, the
r->Y store and the f->m store.  Both need a reload.  The former is a
"loser" for 'Y', and the latter a "loser" for 'f'.  That scores 6 points
against each.  Additionally, both alternative gain an extra two points
here:

reload.c:3431
	      if (! (GET_CODE (operand) == REG
		     && REGNO (operand) >= FIRST_PSEUDO_REGISTER)
		  && GET_CODE (operand) != SCRATCH
		  && ! (const_to_mem && constmemok))
		reject += 2;

What goes wrong now is that the r->Y store gains an extra point against
it because it needs an output reload.  The f->m store has no further
black marks.  Thus reload prefers the f->m store even though in practice
it is much worse.  Copying r->f requires a secondary memory reload.

So, we want to encourage reload to avoid the r->f input reload.  You
could do that by disparaging the f->m alternative with a '?', but that
would make reload prefer the r->Y store when storing a floating point
register, and you'd get a secondart memory copy for that case.  Thus,
I think there is a need for DISPARAGE_RELOAD_CLASS.  I chose a value
of 6 somewhat arbitrarily;  It's equivalent to an extra "loser" since
the secondary memory reload requires both a mem store and a mem load.


Here are the results for the testcase in
http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02569.html against current
mainline.  No regressions, and an improvement in f6, f7, and f8.  I'm
currently running a bootstrap and regression test on powerpc-linux and
powerpc64-linux.

--- floatice3.s.mainline	2004-02-27 12:01:59.000000000 +1030
+++ floatice3.s	2004-02-27 13:34:35.000000000 +1030
@@ -11,8 +11,8 @@
 	.type	.f1,@function
 	.globl	.f1
 .f1:
-	addi 3,3,3
 	li 0,1234
+	addi 3,3,3
 	std 0,0(3)
 	blr
 	.long 0
@@ -95,8 +95,8 @@
 	.type	.f5,@function
 	.globl	.f5
 .f5:
-	addi 3,3,3
-	ld 4,0(3)
+	addi 4,3,3
+	ld 4,0(4)
 	blr
 	.long 0
 	.byte 0,0,0,0,0,0,0,0
@@ -112,8 +112,7 @@
 	.type	.f6,@function
 	.globl	.f6
 .f6:
-	addi 3,3,3
-	lfd 1,0(3)
+	lfd 1,3(3)
 	blr
 	.long 0
 	.byte 0,0,0,0,0,0,0,0
@@ -129,9 +128,8 @@
 	.type	.f7,@function
 	.globl	.f7
 .f7:
-	std 4,-16(1)
-	lfd 0,-16(1)
-	stfd 0,3(3)
+	addi 3,3,3
+	std 4,0(3)
 	blr
 	.long 0
 	.byte 0,0,0,0,0,0,0,0
@@ -147,8 +145,7 @@
 	.type	.f8,@function
 	.globl	.f8
 .f8:
-	addi 3,3,3
-	stfd 1,0(3)
+	stfd 0,3(3)
 	blr
 	.long 0
 	.byte 0,0,0,0,0,0,0,0


	* config/rs6000/rs6000.c (invalid_gpr_mem): New function.
	(base_reg_operand): New function.
	(legitimate_offset_address_p): Don't test modes in an attempt to
	distinguish gpr vs fpr mem loads/stores.  Don't prohibit offsets
	invalid for 64-bit gpr loads/stores here.
	(secondary_reload_class): Add "inp" parameter.  Generate a reload
	for 64-bit gpr loads/stores.
	* config/rs6000/rs6000.h (SECONDARY_RELOAD_CLASS): Delete.
	(SECONDARY_INPUT_RELOAD_CLASS, SECONDARY_OUTPUT_RELOAD_CLASS): Define.
	(PREDICATE_CODES): Add invalid_gpr_mem and base_reg_operand.
	(DISPARAGE_RELOAD_CLASS): Define.
	* config/rs6000/rs6000.md (movdf_hardfloat64): Correct attrs.
	Add m->r alternative and split.
	(movdi_internal64): Add m->r alternative and split.
	* reload.c (find_reloads): Invoke DISPARAGE_RELOAD_CLASS.

diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000-protos.h gcc-reload/gcc/config/rs6000/rs6000-protos.h
--- gcc-mainline/gcc/config/rs6000/rs6000-protos.h	2004-02-06 16:05:56.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000-protos.h	2004-02-27 13:14:09.000000000 +1030
@@ -104,7 +104,7 @@ extern int registers_ok_for_quad_peep (r
 extern int addrs_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern enum reg_class secondary_reload_class (enum reg_class,
-					      enum machine_mode, rtx);
+					      enum machine_mode, rtx, int);
 extern int ccr_bit (rtx, int);
 extern int extract_MB (rtx);
 extern int extract_ME (rtx);
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.c gcc-reload/gcc/config/rs6000/rs6000.c
--- gcc-mainline/gcc/config/rs6000/rs6000.c	2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.c	2004-02-27 13:14:09.000000000 +1030
@@ -2420,6 +2420,43 @@ word_offset_memref_operand (rtx op, enum
   return (off % 4) == 0;
 }
 
+/* Return true if operand is a (MEM (PLUS (REG) (offset))) where offset
+   is not divisible by four.  */
+
+int
+invalid_gpr_mem (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx addr;
+  long off;
+
+  if (GET_CODE (op) != MEM)
+    return 0;
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS
+      || GET_CODE (XEXP (addr, 0)) != REG
+      || GET_CODE (XEXP (addr, 1)) != CONST_INT)
+    return 0;
+
+  off = INTVAL (XEXP (addr, 1));
+  return (off & 3) != 0;
+}
+
+/* Return true if operand is a hard register that can be used as a base
+   register.  */
+
+int
+base_reg_operand (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  unsigned int regno;
+
+  if (!REG_P (op))
+    return 0;
+
+  regno = REGNO (op);
+  return regno != 0 && regno <= 31;
+}
+
 /* Return true if either operand is a general purpose register.  */
 
 bool
@@ -2544,18 +2581,14 @@ legitimate_offset_address_p (enum machin
 
     case DFmode:
     case DImode:
-      if (mode == DFmode || !TARGET_POWERPC64)
+      if (!TARGET_POWERPC64)
 	extra = 4;
-      else if (offset & 3)
-	return false;
       break;
 
     case TFmode:
     case TImode:
-      if (mode == TFmode || !TARGET_POWERPC64)
+      if (!TARGET_POWERPC64)
 	extra = 12;
-      else if (offset & 3)
-	return false;
       else
 	extra = 8;
       break;
@@ -8579,12 +8612,14 @@ addrs_ok_for_quad_peep (rtx addr1, rtx a
 
 /* Return the register class of a scratch register needed to copy IN into
    or out of a register in CLASS in MODE.  If it can be done directly,
-   NO_REGS is returned.  */
+   NO_REGS is returned.  INP is nonzero if we are loading the reg, zero
+   for storing.  */
 
 enum reg_class
-secondary_reload_class (enum reg_class class, 
-			enum machine_mode mode ATTRIBUTE_UNUSED,
-			rtx in)
+secondary_reload_class (enum reg_class class,
+			enum machine_mode mode,
+			rtx in,
+			int inp)
 {
   int regno;
 
@@ -8609,6 +8644,15 @@ secondary_reload_class (enum reg_class c
         return BASE_REGS;
     }
 
+  /* A 64-bit gpr load or store using an offset that isn't a multiple of
+     four needs a secondary reload.  */
+  if (TARGET_POWERPC64
+      && GET_CODE (in) == MEM
+      && GET_MODE_UNIT_SIZE (mode) >= 8
+      && (!inp || class != BASE_REGS)
+      && !word_offset_memref_operand (in, mode))
+    return BASE_REGS;
+
   if (GET_CODE (in) == REG)
     {
       regno = REGNO (in);
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.h gcc-reload/gcc/config/rs6000/rs6000.h
--- gcc-mainline/gcc/config/rs6000/rs6000.h	2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.h	2004-02-27 13:14:09.000000000 +1030
@@ -1474,12 +1474,22 @@ enum reg_class
     ? GENERAL_REGS					\
     : (CLASS)))
 
+#define DISPARAGE_RELOAD_CLASS(X, CLASS)			\
+  (GET_CODE (X) == REG						\
+   && REGNO (X) < FIRST_PSEUDO_REGISTER				\
+   && SECONDARY_MEMORY_NEEDED (GET_MODE_CLASS (GET_MODE (X)),	\
+			       CLASS, GET_MODE (X))		\
+   ? 6 : 0)
+
 /* Return the register class of a scratch register needed to copy IN into
    or out of a register in CLASS in MODE.  If it can be done directly,
    NO_REGS is returned.  */
 
-#define SECONDARY_RELOAD_CLASS(CLASS,MODE,IN) \
-  secondary_reload_class (CLASS, MODE, IN)
+#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+  secondary_reload_class (CLASS, MODE, IN, 1)
+
+#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+  secondary_reload_class (CLASS, MODE, IN, 0)
 
 /* If we are copying between FP or AltiVec registers and anything
    else, we need a memory location.  */
@@ -2654,6 +2664,8 @@ extern char rs6000_reg_names[][8];	/* re
   {"lwa_operand", {SUBREG, MEM, REG}},					   \
   {"volatile_mem_operand", {MEM}},					   \
   {"offsettable_mem_operand", {MEM}},					   \
+  {"invalid_gpr_mem", {MEM}},						   \
+  {"base_reg_operand", {REG}},						   \
   {"mem_or_easy_const_operand", {SUBREG, MEM, CONST_DOUBLE}},		   \
   {"add_operand", {SUBREG, REG, CONST_INT}},				   \
   {"non_add_cint_operand", {CONST_INT}},				   \
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.md gcc-reload/gcc/config/rs6000/rs6000.md
--- gcc-mainline/gcc/config/rs6000/rs6000.md	2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.md	2004-02-27 13:32:11.000000000 +1030
@@ -8151,14 +8151,15 @@
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,f,f,m,!cl,!r,!h,!r,!r,!r")
-	(match_operand:DF 1 "input_operand" "r,Y,r,f,m,f,r,h,0,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,Y,!r,f,f,m,!cl,!r,!h,!r,!r,!r")
+	(match_operand:DF 1 "input_operand" "Y,m,r,r,f,m,f,r,h,0,G,H,F"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
   "@
-   std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
+   #
+   std%U0%X0 %1,%0
    mr %0,%1
    fmr %0,%1
    lfd%U1%X1 %0,%1
@@ -8169,8 +8170,21 @@
    #
    #
    #"
-  [(set_attr "type" "*,load,store,fp,fpload,fpstore,mtjmpr,*,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,4,4,4,4,8,12,16")])
+  [(set_attr "type" "load,load,store,*,fp,fpload,fpstore,mtjmpr,*,*,*,*,*")
+   (set_attr "length" "4,8,4,4,4,4,4,4,4,4,8,12,16")])
+
+(define_split
+  [(set (match_operand:DF 0 "base_reg_operand" "")
+	(match_operand:DF 1 "invalid_gpr_mem" ""))]
+  "TARGET_POWERPC64 && no_new_pseudos"
+  [(set (match_dup 3) (match_dup 2))
+   (set (match_dup 0) (match_dup 4))]
+  "
+{
+  operands[2] = XEXP (operands[1], 0);
+  operands[3] = gen_rtx_REG (DImode, REGNO (operands[0]));
+  operands[4] = replace_equiv_address (operands[1], operands[3]);
+}")
 
 (define_insn "*movdf_softfloat64"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=r,Y,r,cl,r,r,r,r,*h")
@@ -8488,15 +8502,16 @@
 }")
 
 (define_insn "*movdi_internal64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,r,?f,f,m,r,*h,*h")
-	(match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,Y,r,r,r,r,r,?f,f,m,r,*h,*h")
+	(match_operand:DI 1 "input_operand" "Y,m,r,r,I,L,nF,R,f,m,f,*h,r,0"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
   "@
-   mr %0,%1
    ld%U1%X1 %0,%1
+   #
    std%U0%X0 %1,%0
+   mr %0,%1
    li %0,%1
    lis %0,%v1
    #
@@ -8507,8 +8522,20 @@
    mf%1 %0
    mt%0 %1
    {cror 0,0,0|nop}"
-  [(set_attr "type" "*,load,store,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
-   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4")])
+  [(set_attr "type" "load,load,store,*,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
+   (set_attr "length" "4,8,4,4,4,4,20,4,4,4,4,4,4,4")])
+
+(define_split
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "invalid_gpr_mem" ""))]
+  "TARGET_POWERPC64 && no_new_pseudos"
+  [(set (match_dup 0) (match_dup 2))
+   (set (match_dup 0) (match_dup 3))]
+  "
+{
+  operands[2] = XEXP (operands[1], 0);
+  operands[3] = replace_equiv_address (operands[1], operands[0]);
+}")
 
 ;; immediate value valid for a single instruction hiding in a const_double
 (define_insn ""
diff -urp -xCVS -x'*~' gcc-mainline/gcc/reload.c gcc-reload/gcc/reload.c
--- gcc-mainline/gcc/reload.c	2004-02-26 13:46:23.000000000 +1030
+++ gcc-reload/gcc/reload.c	2004-02-27 13:14:08.000000000 +1030
@@ -3414,6 +3414,12 @@ find_reloads (rtx insn, int replace, int
 		       && ! const_to_mem)
 		bad = 1;
 
+#ifdef DISPARAGE_RELOAD_CLASS
+	      reject
+		+= DISPARAGE_RELOAD_CLASS (operand,
+					   (enum reg_class) this_alternative[i]);
+#endif
+
 	      /* We prefer to reload pseudos over reloading other things,
 		 since such reloads may be able to be eliminated later.
 		 If we are reloading a SCRATCH, we won't be generating any

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]