This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH:[darwin] fix load of a misaligned double word


Differences from http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02583.html

- Fixed the cause of the linux regressions, which was that constraints
  didn't match the splitter.  ie. the split is now for an m->b
  alternative.
- Fixed the splitter to work with -m32 -mpowerpc64, where addresses are
  SImode.
- Added (modified versions of) Hartmut's reload_in and reload_out patterns.
- Tweaked condition under which secondary_reload_class says a reload is
  needed.
- The patch is generated against gcc-3.4 to make it easier to test there.

Tested with a (crippled) linux->darwin cross that the testcase in
PR13674 now compiles correctly at -O1 as well as -O0, -O2, -O3.  Also
tested with a larger testcase of Brad's.  Bootstrapped and regression
tested powerpc-linux and powerpc64-linux.

Geoff, I hope the following describes what is happening in reload well
enough to justify use of reload_out patterns (and by extension,
reload_in).


Consider the following instruction from Brad's large testcase.

(insn 3173 3172 3175 181 (set (mem:DF (plus:SI (reg/f:SI 1177)
                (const_int 3 [0x3])) [0 S8 A64])
        (reg/v:DF 133 [ ___F64V3 ])) 318 {*movdf_hardfloat64} (insn_list 3163 (insn_list 3172 (nil)))

Note that the address in the instruction is valid for reg 133 as a
pseudo or fpr, but not after reg 133 is assigned a gpr.  Thus, in the
early stage of reload's processing of this instruction, the
strict_memory_address_p check in find_reloads_address fails to detect
any addressing problem.  I believe this is what Ulrich was referring to
in http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02189.html

Unfortunately, reg 133 is assigned gpr0, and find_reloads_address isn't
called again after this assignment is made.  Instead, reload continues
on, and later rs6000.c:secondary_reload_class determines that a base reg
is needed to reload the address.  At around line 6621 (gcc-3.4 source)
reload1.c:emit_output_reload_insns does:

	  /* See if RELOADREG is to be used as a scratch register
	     or as an intermediate register.  */
	  if (rl->secondary_out_icode != CODE_FOR_nothing)
	    {
	      emit_insn ((GEN_FCN (rl->secondary_out_icode)
			  (real_old, second_reloadreg, reloadreg)));
	      special = 1;
	    }
	  else
	    {
	      /* Some irrelevant code here */
	      if (/* there is a tertiary reload */)
		{
		  /* more irrelevant code */	
		}

	      else
		/* Copy between the reload regs here and then to
		   OUT later.  */

		gen_reload (reloadreg, second_reloadreg,
			    rl->opnum, rl->when_needed);
	    }

Without the reload_out pattern, rl->secondary_out_icode will be
CODE_FOR_nothing, and we'll end up calling gen_reload above just to copy
between registers.  That's silly.  The secondary reload is marked as
RELOAD_FOR_OUTPUT_ADDRESS, yet it's used just to move the input value
around.  A little later, we call gen_reload again (line 6693)

(gdb) s
gen_reload (out=0x42eb83e4, in=0x430fc860, opnum=0, type=RELOAD_FOR_OUTPUT) at /src/gcc-3.4/gcc/reload1.c:7291
(gdb) p debug_rtx (out)
(mem:DF (plus:SI (reg/f:SI 9 r9 [1177])
        (const_int 3 [0x3])) [0 S8 A64])
$20 = void
(gdb) p debug_rtx (in)
(reg:DF 2 r2)
$21 = void

And again, gen_reload doesn't have the intelligence to recognize that
the reload is needed for the address.  Instead, it just copies "in" to
"out".  ie. we're left with the following

(gdb) p debug_rtx_list (get_insns(), 99)
(insn 12880 0 12881 (set (reg:DF 2 r2)
        (reg:DF 0 r0)) -1 (nil)
    (nil))

(insn 12881 12880 0 (set (mem:DF (plus:SI (reg/f:SI 9 r9 [1177])
                (const_int 3 [0x3])) [0 S8 A64])
        (reg:DF 2 r2)) -1 (nil)
    (nil))

Which of course has the same problem as our original instruction.

Now, it might be possible to teach emit_output_reload_insns and
gen_reload how to handle RELOAD_FOR_OUTPUT_ADDRESS reloads properly, but
IMHO it's much cleaner to use reload_out patterns.   It also might be
feasible to call find_reloads_address after assigning input regs, but
that might slow down gcc for everybody.

	* config/rs6000/rs6000.c (invalid_gpr_mem): New function.
	(base_reg_operand): New function.
	(legitimate_offset_address_p): Don't test modes in an attempt to
	distinguish gpr vs fpr mem loads/stores.  Don't prohibit offsets
	invalid for 64-bit gpr loads/stores here.  Correct offset range check.
	(secondary_reload_class): Add "inp" parameter.	Generate a reload
	for 64-bit gpr loads/stores.
	* config/rs6000/rs6000.h (SECONDARY_RELOAD_CLASS): Delete.
	(SECONDARY_INPUT_RELOAD_CLASS, SECONDARY_OUTPUT_RELOAD_CLASS): Define.
	(PREDICATE_CODES): Add invalid_gpr_mem and base_reg_operand.
	(DISPARAGE_RELOAD_CLASS): Define.
	* config/rs6000/rs6000-protos.h (secondary_reload_class): Update.
	* config/rs6000/rs6000.md (movdf_hardfloat64): Correct attrs.
	Add m->b alternative and split.
	(movdi_internal64): Replace r->m and m->r with r->Y and Y->r.
	Add m->b alternative and split.
	* reload.c (find_reloads): Invoke DISPARAGE_RELOAD_CLASS.

Index: gcc/config/rs6000/rs6000.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.576.2.8
diff -u -p -r1.576.2.8 rs6000.c
--- gcc/config/rs6000/rs6000.c	12 Feb 2004 10:19:28 -0000	1.576.2.8
+++ gcc/config/rs6000/rs6000.c	4 Mar 2004 11:06:15 -0000
@@ -2409,6 +2398,43 @@ word_offset_memref_operand (rtx op, enum
   return (off % 4) == 0;
 }
 
+/* Return true if operand is a (MEM (PLUS (REG) (offset))) where offset
+   is not divisible by four.  */
+
+int
+invalid_gpr_mem (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx addr;
+  long off;
+
+  if (GET_CODE (op) != MEM)
+    return 0;
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS
+      || GET_CODE (XEXP (addr, 0)) != REG
+      || GET_CODE (XEXP (addr, 1)) != CONST_INT)
+    return 0;
+
+  off = INTVAL (XEXP (addr, 1));
+  return (off & 3) != 0;
+}
+
+/* Return true if operand is a hard register that can be used as a base
+   register.  */
+
+int
+base_reg_operand (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  unsigned int regno;
+
+  if (!REG_P (op))
+    return 0;
+
+  regno = REGNO (op);
+  return regno != 0 && regno <= 31;
+}
+
 /* Return true if either operand is a general purpose register.  */
 
 bool
@@ -2533,18 +2559,16 @@ legitimate_offset_address_p (enum machin
 
     case DFmode:
     case DImode:
-      if (mode == DFmode || !TARGET_POWERPC64)
+      /* Both DFmode and DImode may end up in gprs.  If gprs are 32-bit,
+	 then we need to load/store at both offset and offset+4.  */
+      if (!TARGET_POWERPC64)
 	extra = 4;
-      else if (offset & 3)
-	return false;
       break;
 
     case TFmode:
     case TImode:
-      if (mode == TFmode || !TARGET_POWERPC64)
+      if (!TARGET_POWERPC64)
 	extra = 12;
-      else if (offset & 3)
-	return false;
       else
 	extra = 8;
       break;
@@ -2553,7 +2577,8 @@ legitimate_offset_address_p (enum machin
       break;
     }
 
-  return (offset + extra >= offset) && (offset + extra + 0x8000 < 0x10000);
+  offset += 0x8000;
+  return (offset < 0x10000) && (offset + extra < 0x10000);
 }
 
 static bool
@@ -8376,12 +8501,14 @@ addrs_ok_for_quad_peep (rtx addr1, rtx a
 
 /* Return the register class of a scratch register needed to copy IN into
    or out of a register in CLASS in MODE.  If it can be done directly,
-   NO_REGS is returned.  */
+   NO_REGS is returned.  INP is nonzero if we are loading the reg, zero
+   for storing.  */
 
 enum reg_class
-secondary_reload_class (enum reg_class class, 
-			enum machine_mode mode ATTRIBUTE_UNUSED,
-			rtx in)
+secondary_reload_class (enum reg_class class,
+			enum machine_mode mode,
+			rtx in,
+			int inp)
 {
   int regno;
 
@@ -8406,6 +8533,14 @@ secondary_reload_class (enum reg_class c
         return BASE_REGS;
     }
 
+  /* A 64-bit gpr load or store using an offset that isn't a multiple of
+     four needs a secondary reload.  */
+  if (TARGET_POWERPC64
+      && GET_MODE_UNIT_SIZE (mode) >= 8
+      && (!inp || class != BASE_REGS)
+      && invalid_gpr_mem (in, mode))
+    return BASE_REGS;
+
   if (GET_CODE (in) == REG)
     {
       regno = REGNO (in);
Index: gcc/config/rs6000/rs6000.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.h,v
retrieving revision 1.306
diff -u -p -r1.306 rs6000.h
--- gcc/config/rs6000/rs6000.h	12 Jan 2004 08:11:28 -0000	1.306
+++ gcc/config/rs6000/rs6000.h	4 Mar 2004 11:06:18 -0000
@@ -1471,12 +1472,22 @@ enum reg_class
     ? GENERAL_REGS					\
     : (CLASS)))
 
+#define DISPARAGE_RELOAD_CLASS(X, CLASS)			\
+  (GET_CODE (X) == REG						\
+   && REGNO (X) < FIRST_PSEUDO_REGISTER				\
+   && SECONDARY_MEMORY_NEEDED (GET_MODE_CLASS (GET_MODE (X)),	\
+			       CLASS, GET_MODE (X))		\
+   ? 6 : 0)
+
 /* Return the register class of a scratch register needed to copy IN into
    or out of a register in CLASS in MODE.  If it can be done directly,
    NO_REGS is returned.  */
 
-#define SECONDARY_RELOAD_CLASS(CLASS,MODE,IN) \
-  secondary_reload_class (CLASS, MODE, IN)
+#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+  secondary_reload_class (CLASS, MODE, IN, 1)
+
+#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+  secondary_reload_class (CLASS, MODE, IN, 0)
 
 /* If we are copying between FP or AltiVec registers and anything
    else, we need a memory location.  */
@@ -2666,6 +2666,8 @@ extern char rs6000_reg_names[][8];	/* re
   {"lwa_operand", {SUBREG, MEM, REG}},					   \
   {"volatile_mem_operand", {MEM}},					   \
   {"offsettable_mem_operand", {MEM}},					   \
+  {"invalid_gpr_mem", {MEM}},						   \
+  {"base_reg_operand", {REG}},						   \
   {"mem_or_easy_const_operand", {SUBREG, MEM, CONST_DOUBLE}},		   \
   {"add_operand", {SUBREG, REG, CONST_INT}},				   \
   {"non_add_cint_operand", {CONST_INT}},				   \
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000-protos.h,v
retrieving revision 1.71.2.2
diff -u -p -r1.71.2.2 rs6000-protos.h
--- gcc/config/rs6000/rs6000-protos.h	18 Jan 2004 03:50:35 -0000	1.71.2.2
+++ gcc/config/rs6000/rs6000-protos.h	4 Mar 2004 11:06:04 -0000
@@ -104,7 +104,7 @@ extern int registers_ok_for_quad_peep (r
 extern int addrs_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern enum reg_class secondary_reload_class (enum reg_class,
-					      enum machine_mode, rtx);
+					      enum machine_mode, rtx, int);
 extern int ccr_bit (rtx, int);
 extern int extract_MB (rtx);
 extern int extract_ME (rtx);
Index: gcc/config/rs6000/rs6000.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.md,v
retrieving revision 1.284.4.6
diff -u -p -r1.284.4.6 rs6000.md
--- gcc/config/rs6000/rs6000.md	4 Mar 2004 09:54:21 -0000	1.284.4.6
+++ gcc/config/rs6000/rs6000.md	4 Mar 2004 11:06:22 -0000
@@ -8163,14 +8165,15 @@
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,f,f,m,!cl,!r,!r,!r,!r")
-	(match_operand:DF 1 "input_operand" "r,Y,r,f,m,f,r,h,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,b,!r,f,f,m,!cl,!r,!r,!r,!r")
+	(match_operand:DF 1 "input_operand" "r,Y,m,r,f,m,f,r,h,G,H,F"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
   "@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
+   #
    mr %0,%1
    fmr %0,%1
    lfd%U1%X1 %0,%1
@@ -8180,8 +8183,49 @@
    #
    #
    #"
-  [(set_attr "type" "*,load,store,fp,fpload,fpstore,mtjmpr,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,4,4,4,8,12,16")])
+  [(set_attr "type" "store,load,load,*,fp,fpload,fpstore,mtjmpr,*,*,*,*")
+   (set_attr "length" "4,4,8,4,4,4,4,4,4,8,12,16")])
+
+(define_split
+  [(set (match_operand:DF 0 "base_reg_operand" "")
+	(match_operand:DF 1 "invalid_gpr_mem" ""))]
+  "TARGET_POWERPC64 && no_new_pseudos"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0) (match_dup 4))]
+  "
+{
+  operands[2] = gen_rtx_REG (Pmode, REGNO (operands[0]));
+  operands[3] = XEXP (operands[1], 0);
+  operands[4] = replace_equiv_address (operands[1], operands[2]);
+}")
+
+(define_expand "reload_outdf"
+  [(parallel [(match_operand:DF 0 "invalid_gpr_mem" "")
+	      (match_operand:DF 1 "register_operand" "")
+	      (match_operand:DI 2 "register_operand" "=&b")])]
+  "TARGET_POWERPC64"
+{
+  if (!TARGET_64BIT)
+    operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+  emit_move_insn (operands[2], XEXP (operands[0], 0));
+  operands[0] = replace_equiv_address (operands[0], operands[2]);
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
+
+(define_expand "reload_indf"
+  [(parallel [(match_operand:DF 0 "register_operand" "")
+	      (match_operand:DF 1 "invalid_gpr_mem" "")
+	      (match_operand:DI 2 "register_operand" "=&b")])]
+  "TARGET_POWERPC64"
+{
+  if (!TARGET_64BIT)
+    operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+  emit_move_insn (operands[2], XEXP (operands[1], 0));
+  operands[1] = replace_equiv_address (operands[1], operands[2]);
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
 
 (define_insn "*movdf_softfloat64"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=r,Y,r,cl,r,r,r,r,*h")
@@ -8504,15 +8548,16 @@
 }")
 
 (define_insn "*movdi_internal64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,r,?f,f,m,r,*h,*h")
-	(match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,b,r,r,r,r,r,?f,f,m,r,*h,*h")
+	(match_operand:DI 1 "input_operand" "r,Y,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
   "@
-   mr %0,%1
-   ld%U1%X1 %0,%1
    std%U0%X0 %1,%0
+   ld%U1%X1 %0,%1
+   #
+   mr %0,%1
    li %0,%1
    lis %0,%v1
    #
@@ -8523,8 +8568,51 @@
    mf%1 %0
    mt%0 %1
    {cror 0,0,0|nop}"
-  [(set_attr "type" "*,load,store,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
-   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4")])
+  [(set_attr "type" "store,load,load,*,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
+   (set_attr "length" "4,4,8,4,4,4,20,4,4,4,4,4,4,4")])
+
+(define_split
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "invalid_gpr_mem" ""))]
+  "TARGET_POWERPC64 && no_new_pseudos"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0) (match_dup 4))]
+  "
+{
+  operands[2] = operands[0];
+  if (!TARGET_64BIT)
+    operands[2] = gen_rtx_REG (SImode, REGNO (operands[0]));
+  operands[3] = XEXP (operands[1], 0);
+  operands[4] = replace_equiv_address (operands[1], operands[2]);
+}")
+
+(define_expand "reload_outdi"
+  [(parallel [(match_operand:DI 0 "invalid_gpr_mem" "")
+              (match_operand:DI 1 "register_operand" "")
+              (match_operand:DI 2 "register_operand" "=&b")])]
+  "TARGET_POWERPC64"
+{
+  if (!TARGET_64BIT)
+    operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+  emit_move_insn (operands[2], XEXP (operands[0], 0));
+  operands[0] = replace_equiv_address (operands[0], operands[2]);
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
+
+(define_expand "reload_indi"
+  [(parallel [(match_operand:DI 0 "register_operand" "")
+	      (match_operand:DI 1 "invalid_gpr_mem" "")
+	      (match_operand:DI 2 "register_operand" "=&b")])]
+  "TARGET_POWERPC64"
+{
+  if (!TARGET_64BIT)
+    operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+  emit_move_insn (operands[2], XEXP (operands[1], 0));
+  operands[1] = replace_equiv_address (operands[1], operands[2]);
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
 
 ;; immediate value valid for a single instruction hiding in a const_double
 (define_insn ""
Index: gcc/reload.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reload.c,v
retrieving revision 1.229.4.4
diff -u -p -r1.229.4.4 reload.c
--- gcc/reload.c	12 Feb 2004 16:07:43 -0000	1.229.4.4
+++ gcc/reload.c	4 Mar 2004 11:04:48 -0000
@@ -3412,6 +3412,12 @@ find_reloads (rtx insn, int replace, int
 		       && ! const_to_mem)
 		bad = 1;
 
+#ifdef DISPARAGE_RELOAD_CLASS
+	      reject
+		+= DISPARAGE_RELOAD_CLASS (operand,
+					   (enum reg_class) this_alternative[i]);
+#endif
+
 	      /* We prefer to reload pseudos over reloading other things,
 		 since such reloads may be able to be eliminated later.
 		 If we are reloading a SCRATCH, we won't be generating any

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]