This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH:[darwin] fix load of a misaligned double word
- From: Alan Modra <amodra at bigpond dot net dot au>
- To: gcc-patches at gcc dot gnu dot org, Geoff Keating <geoffk at geoffk dot org>,David Edelsohn <dje at watson dot ibm dot com>, lucier at math dot purdue dot edu,Ulrich Weigand <Ulrich dot Weigand at de dot ibm dot com>,Hartmut Penner <HPENNER at de dot ibm dot com>, fjahanian at apple dot com,pinskia at physics dot uc dot edu
- Date: Fri, 5 Mar 2004 11:56:19 +1030
- Subject: Re: PATCH:[darwin] fix load of a misaligned double word
- References: <OF70C30FAC.BBBD64EA-ONC1256E44.0020862B@de.ibm.com> <20040227020412.GL795@bubble.modra.org> <20040227064216.GN795@bubble.modra.org> <20040227071609.GO795@bubble.modra.org>
Differences from http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02583.html
- Fixed the cause of the linux regressions, which was that constraints
didn't match the splitter. ie. the split is now for an m->b
alternative.
- Fixed the splitter to work with -m32 -mpowerpc64, where addresses are
SImode.
- Added (modified versions of) Hartmut's reload_in and reload_out patterns.
- Tweaked condition under which secondary_reload_class says a reload is
needed.
- The patch is generated against gcc-3.4 to make it easier to test there.
Tested with a (crippled) linux->darwin cross that the testcase in
PR13674 now compiles correctly at -O1 as well as -O0, -O2, -O3. Also
tested with a larger testcase of Brad's. Bootstrapped and regression
tested powerpc-linux and powerpc64-linux.
Geoff, I hope the following describes what is happening in reload well
enough to justify use of reload_out patterns (and by extension,
reload_in).
Consider the following instruction from Brad's large testcase.
(insn 3173 3172 3175 181 (set (mem:DF (plus:SI (reg/f:SI 1177)
(const_int 3 [0x3])) [0 S8 A64])
(reg/v:DF 133 [ ___F64V3 ])) 318 {*movdf_hardfloat64} (insn_list 3163 (insn_list 3172 (nil)))
Note that the address in the instruction is valid for reg 133 as a
pseudo or fpr, but not after reg 133 is assigned a gpr. Thus, in the
early stage of reload's processing of this instruction, the
strict_memory_address_p check in find_reloads_address fails to detect
any addressing problem. I believe this is what Ulrich was referring to
in http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02189.html
Unfortunately, reg 133 is assigned gpr0, and find_reloads_address isn't
called again after this assignment is made. Instead, reload continues
on, and later rs6000.c:secondary_reload_class determines that a base reg
is needed to reload the address. At around line 6621 (gcc-3.4 source)
reload1.c:emit_output_reload_insns does:
/* See if RELOADREG is to be used as a scratch register
or as an intermediate register. */
if (rl->secondary_out_icode != CODE_FOR_nothing)
{
emit_insn ((GEN_FCN (rl->secondary_out_icode)
(real_old, second_reloadreg, reloadreg)));
special = 1;
}
else
{
/* Some irrelevant code here */
if (/* there is a tertiary reload */)
{
/* more irrelevant code */
}
else
/* Copy between the reload regs here and then to
OUT later. */
gen_reload (reloadreg, second_reloadreg,
rl->opnum, rl->when_needed);
}
Without the reload_out pattern, rl->secondary_out_icode will be
CODE_FOR_nothing, and we'll end up calling gen_reload above just to copy
between registers. That's silly. The secondary reload is marked as
RELOAD_FOR_OUTPUT_ADDRESS, yet it's used just to move the input value
around. A little later, we call gen_reload again (line 6693)
(gdb) s
gen_reload (out=0x42eb83e4, in=0x430fc860, opnum=0, type=RELOAD_FOR_OUTPUT) at /src/gcc-3.4/gcc/reload1.c:7291
(gdb) p debug_rtx (out)
(mem:DF (plus:SI (reg/f:SI 9 r9 [1177])
(const_int 3 [0x3])) [0 S8 A64])
$20 = void
(gdb) p debug_rtx (in)
(reg:DF 2 r2)
$21 = void
And again, gen_reload doesn't have the intelligence to recognize that
the reload is needed for the address. Instead, it just copies "in" to
"out". ie. we're left with the following
(gdb) p debug_rtx_list (get_insns(), 99)
(insn 12880 0 12881 (set (reg:DF 2 r2)
(reg:DF 0 r0)) -1 (nil)
(nil))
(insn 12881 12880 0 (set (mem:DF (plus:SI (reg/f:SI 9 r9 [1177])
(const_int 3 [0x3])) [0 S8 A64])
(reg:DF 2 r2)) -1 (nil)
(nil))
Which of course has the same problem as our original instruction.
Now, it might be possible to teach emit_output_reload_insns and
gen_reload how to handle RELOAD_FOR_OUTPUT_ADDRESS reloads properly, but
IMHO it's much cleaner to use reload_out patterns. It also might be
feasible to call find_reloads_address after assigning input regs, but
that might slow down gcc for everybody.
* config/rs6000/rs6000.c (invalid_gpr_mem): New function.
(base_reg_operand): New function.
(legitimate_offset_address_p): Don't test modes in an attempt to
distinguish gpr vs fpr mem loads/stores. Don't prohibit offsets
invalid for 64-bit gpr loads/stores here. Correct offset range check.
(secondary_reload_class): Add "inp" parameter. Generate a reload
for 64-bit gpr loads/stores.
* config/rs6000/rs6000.h (SECONDARY_RELOAD_CLASS): Delete.
(SECONDARY_INPUT_RELOAD_CLASS, SECONDARY_OUTPUT_RELOAD_CLASS): Define.
(PREDICATE_CODES): Add invalid_gpr_mem and base_reg_operand.
(DISPARAGE_RELOAD_CLASS): Define.
* config/rs6000/rs6000-protos.h (secondary_reload_class): Update.
* config/rs6000/rs6000.md (movdf_hardfloat64): Correct attrs.
Add m->b alternative and split.
(movdi_internal64): Replace r->m and m->r with r->Y and Y->r.
Add m->b alternative and split.
* reload.c (find_reloads): Invoke DISPARAGE_RELOAD_CLASS.
Index: gcc/config/rs6000/rs6000.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.576.2.8
diff -u -p -r1.576.2.8 rs6000.c
--- gcc/config/rs6000/rs6000.c 12 Feb 2004 10:19:28 -0000 1.576.2.8
+++ gcc/config/rs6000/rs6000.c 4 Mar 2004 11:06:15 -0000
@@ -2409,6 +2398,43 @@ word_offset_memref_operand (rtx op, enum
return (off % 4) == 0;
}
+/* Return true if operand is a (MEM (PLUS (REG) (offset))) where offset
+ is not divisible by four. */
+
+int
+invalid_gpr_mem (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+ rtx addr;
+ long off;
+
+ if (GET_CODE (op) != MEM)
+ return 0;
+
+ addr = XEXP (op, 0);
+ if (GET_CODE (addr) != PLUS
+ || GET_CODE (XEXP (addr, 0)) != REG
+ || GET_CODE (XEXP (addr, 1)) != CONST_INT)
+ return 0;
+
+ off = INTVAL (XEXP (addr, 1));
+ return (off & 3) != 0;
+}
+
+/* Return true if operand is a hard register that can be used as a base
+ register. */
+
+int
+base_reg_operand (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+ unsigned int regno;
+
+ if (!REG_P (op))
+ return 0;
+
+ regno = REGNO (op);
+ return regno != 0 && regno <= 31;
+}
+
/* Return true if either operand is a general purpose register. */
bool
@@ -2533,18 +2559,16 @@ legitimate_offset_address_p (enum machin
case DFmode:
case DImode:
- if (mode == DFmode || !TARGET_POWERPC64)
+ /* Both DFmode and DImode may end up in gprs. If gprs are 32-bit,
+ then we need to load/store at both offset and offset+4. */
+ if (!TARGET_POWERPC64)
extra = 4;
- else if (offset & 3)
- return false;
break;
case TFmode:
case TImode:
- if (mode == TFmode || !TARGET_POWERPC64)
+ if (!TARGET_POWERPC64)
extra = 12;
- else if (offset & 3)
- return false;
else
extra = 8;
break;
@@ -2553,7 +2577,8 @@ legitimate_offset_address_p (enum machin
break;
}
- return (offset + extra >= offset) && (offset + extra + 0x8000 < 0x10000);
+ offset += 0x8000;
+ return (offset < 0x10000) && (offset + extra < 0x10000);
}
static bool
@@ -8376,12 +8501,14 @@ addrs_ok_for_quad_peep (rtx addr1, rtx a
/* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE. If it can be done directly,
- NO_REGS is returned. */
+ NO_REGS is returned. INP is nonzero if we are loading the reg, zero
+ for storing. */
enum reg_class
-secondary_reload_class (enum reg_class class,
- enum machine_mode mode ATTRIBUTE_UNUSED,
- rtx in)
+secondary_reload_class (enum reg_class class,
+ enum machine_mode mode,
+ rtx in,
+ int inp)
{
int regno;
@@ -8406,6 +8533,14 @@ secondary_reload_class (enum reg_class c
return BASE_REGS;
}
+ /* A 64-bit gpr load or store using an offset that isn't a multiple of
+ four needs a secondary reload. */
+ if (TARGET_POWERPC64
+ && GET_MODE_UNIT_SIZE (mode) >= 8
+ && (!inp || class != BASE_REGS)
+ && invalid_gpr_mem (in, mode))
+ return BASE_REGS;
+
if (GET_CODE (in) == REG)
{
regno = REGNO (in);
Index: gcc/config/rs6000/rs6000.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.h,v
retrieving revision 1.306
diff -u -p -r1.306 rs6000.h
--- gcc/config/rs6000/rs6000.h 12 Jan 2004 08:11:28 -0000 1.306
+++ gcc/config/rs6000/rs6000.h 4 Mar 2004 11:06:18 -0000
@@ -1471,12 +1472,22 @@ enum reg_class
? GENERAL_REGS \
: (CLASS)))
+#define DISPARAGE_RELOAD_CLASS(X, CLASS) \
+ (GET_CODE (X) == REG \
+ && REGNO (X) < FIRST_PSEUDO_REGISTER \
+ && SECONDARY_MEMORY_NEEDED (GET_MODE_CLASS (GET_MODE (X)), \
+ CLASS, GET_MODE (X)) \
+ ? 6 : 0)
+
/* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE. If it can be done directly,
NO_REGS is returned. */
-#define SECONDARY_RELOAD_CLASS(CLASS,MODE,IN) \
- secondary_reload_class (CLASS, MODE, IN)
+#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+ secondary_reload_class (CLASS, MODE, IN, 1)
+
+#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+ secondary_reload_class (CLASS, MODE, IN, 0)
/* If we are copying between FP or AltiVec registers and anything
else, we need a memory location. */
@@ -2666,6 +2666,8 @@ extern char rs6000_reg_names[][8]; /* re
{"lwa_operand", {SUBREG, MEM, REG}}, \
{"volatile_mem_operand", {MEM}}, \
{"offsettable_mem_operand", {MEM}}, \
+ {"invalid_gpr_mem", {MEM}}, \
+ {"base_reg_operand", {REG}}, \
{"mem_or_easy_const_operand", {SUBREG, MEM, CONST_DOUBLE}}, \
{"add_operand", {SUBREG, REG, CONST_INT}}, \
{"non_add_cint_operand", {CONST_INT}}, \
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000-protos.h,v
retrieving revision 1.71.2.2
diff -u -p -r1.71.2.2 rs6000-protos.h
--- gcc/config/rs6000/rs6000-protos.h 18 Jan 2004 03:50:35 -0000 1.71.2.2
+++ gcc/config/rs6000/rs6000-protos.h 4 Mar 2004 11:06:04 -0000
@@ -104,7 +104,7 @@ extern int registers_ok_for_quad_peep (r
extern int addrs_ok_for_quad_peep (rtx, rtx);
extern bool gpr_or_gpr_p (rtx, rtx);
extern enum reg_class secondary_reload_class (enum reg_class,
- enum machine_mode, rtx);
+ enum machine_mode, rtx, int);
extern int ccr_bit (rtx, int);
extern int extract_MB (rtx);
extern int extract_ME (rtx);
Index: gcc/config/rs6000/rs6000.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.md,v
retrieving revision 1.284.4.6
diff -u -p -r1.284.4.6 rs6000.md
--- gcc/config/rs6000/rs6000.md 4 Mar 2004 09:54:21 -0000 1.284.4.6
+++ gcc/config/rs6000/rs6000.md 4 Mar 2004 11:06:22 -0000
@@ -8163,14 +8165,15 @@
; ld/std require word-aligned displacements -> 'Y' constraint.
; List Y->r and r->Y before r->r for reload.
(define_insn "*movdf_hardfloat64"
- [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,f,f,m,!cl,!r,!r,!r,!r")
- (match_operand:DF 1 "input_operand" "r,Y,r,f,m,f,r,h,G,H,F"))]
+ [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,b,!r,f,f,m,!cl,!r,!r,!r,!r")
+ (match_operand:DF 1 "input_operand" "r,Y,m,r,f,m,f,r,h,G,H,F"))]
"TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS
&& (gpc_reg_operand (operands[0], DFmode)
|| gpc_reg_operand (operands[1], DFmode))"
"@
std%U0%X0 %1,%0
ld%U1%X1 %0,%1
+ #
mr %0,%1
fmr %0,%1
lfd%U1%X1 %0,%1
@@ -8180,8 +8183,49 @@
#
#
#"
- [(set_attr "type" "*,load,store,fp,fpload,fpstore,mtjmpr,*,*,*,*")
- (set_attr "length" "4,4,4,4,4,4,4,4,8,12,16")])
+ [(set_attr "type" "store,load,load,*,fp,fpload,fpstore,mtjmpr,*,*,*,*")
+ (set_attr "length" "4,4,8,4,4,4,4,4,4,8,12,16")])
+
+(define_split
+ [(set (match_operand:DF 0 "base_reg_operand" "")
+ (match_operand:DF 1 "invalid_gpr_mem" ""))]
+ "TARGET_POWERPC64 && no_new_pseudos"
+ [(set (match_dup 2) (match_dup 3))
+ (set (match_dup 0) (match_dup 4))]
+ "
+{
+ operands[2] = gen_rtx_REG (Pmode, REGNO (operands[0]));
+ operands[3] = XEXP (operands[1], 0);
+ operands[4] = replace_equiv_address (operands[1], operands[2]);
+}")
+
+(define_expand "reload_outdf"
+ [(parallel [(match_operand:DF 0 "invalid_gpr_mem" "")
+ (match_operand:DF 1 "register_operand" "")
+ (match_operand:DI 2 "register_operand" "=&b")])]
+ "TARGET_POWERPC64"
+{
+ if (!TARGET_64BIT)
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+ emit_move_insn (operands[2], XEXP (operands[0], 0));
+ operands[0] = replace_equiv_address (operands[0], operands[2]);
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+})
+
+(define_expand "reload_indf"
+ [(parallel [(match_operand:DF 0 "register_operand" "")
+ (match_operand:DF 1 "invalid_gpr_mem" "")
+ (match_operand:DI 2 "register_operand" "=&b")])]
+ "TARGET_POWERPC64"
+{
+ if (!TARGET_64BIT)
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+ emit_move_insn (operands[2], XEXP (operands[1], 0));
+ operands[1] = replace_equiv_address (operands[1], operands[2]);
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+})
(define_insn "*movdf_softfloat64"
[(set (match_operand:DF 0 "nonimmediate_operand" "=r,Y,r,cl,r,r,r,r,*h")
@@ -8504,15 +8548,16 @@
}")
(define_insn "*movdi_internal64"
- [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,r,?f,f,m,r,*h,*h")
- (match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
+ [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,b,r,r,r,r,r,?f,f,m,r,*h,*h")
+ (match_operand:DI 1 "input_operand" "r,Y,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
"TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], DImode)
|| gpc_reg_operand (operands[1], DImode))"
"@
- mr %0,%1
- ld%U1%X1 %0,%1
std%U0%X0 %1,%0
+ ld%U1%X1 %0,%1
+ #
+ mr %0,%1
li %0,%1
lis %0,%v1
#
@@ -8523,8 +8568,51 @@
mf%1 %0
mt%0 %1
{cror 0,0,0|nop}"
- [(set_attr "type" "*,load,store,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
- (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4")])
+ [(set_attr "type" "store,load,load,*,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
+ (set_attr "length" "4,4,8,4,4,4,20,4,4,4,4,4,4,4")])
+
+(define_split
+ [(set (match_operand:DI 0 "base_reg_operand" "")
+ (match_operand:DI 1 "invalid_gpr_mem" ""))]
+ "TARGET_POWERPC64 && no_new_pseudos"
+ [(set (match_dup 2) (match_dup 3))
+ (set (match_dup 0) (match_dup 4))]
+ "
+{
+ operands[2] = operands[0];
+ if (!TARGET_64BIT)
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[0]));
+ operands[3] = XEXP (operands[1], 0);
+ operands[4] = replace_equiv_address (operands[1], operands[2]);
+}")
+
+(define_expand "reload_outdi"
+ [(parallel [(match_operand:DI 0 "invalid_gpr_mem" "")
+ (match_operand:DI 1 "register_operand" "")
+ (match_operand:DI 2 "register_operand" "=&b")])]
+ "TARGET_POWERPC64"
+{
+ if (!TARGET_64BIT)
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+ emit_move_insn (operands[2], XEXP (operands[0], 0));
+ operands[0] = replace_equiv_address (operands[0], operands[2]);
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+})
+
+(define_expand "reload_indi"
+ [(parallel [(match_operand:DI 0 "register_operand" "")
+ (match_operand:DI 1 "invalid_gpr_mem" "")
+ (match_operand:DI 2 "register_operand" "=&b")])]
+ "TARGET_POWERPC64"
+{
+ if (!TARGET_64BIT)
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+ emit_move_insn (operands[2], XEXP (operands[1], 0));
+ operands[1] = replace_equiv_address (operands[1], operands[2]);
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+})
;; immediate value valid for a single instruction hiding in a const_double
(define_insn ""
Index: gcc/reload.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reload.c,v
retrieving revision 1.229.4.4
diff -u -p -r1.229.4.4 reload.c
--- gcc/reload.c 12 Feb 2004 16:07:43 -0000 1.229.4.4
+++ gcc/reload.c 4 Mar 2004 11:04:48 -0000
@@ -3412,6 +3412,12 @@ find_reloads (rtx insn, int replace, int
&& ! const_to_mem)
bad = 1;
+#ifdef DISPARAGE_RELOAD_CLASS
+ reject
+ += DISPARAGE_RELOAD_CLASS (operand,
+ (enum reg_class) this_alternative[i]);
+#endif
+
/* We prefer to reload pseudos over reloading other things,
since such reloads may be able to be eliminated later.
If we are reloading a SCRATCH, we won't be generating any
--
Alan Modra
IBM OzLabs - Linux Technology Centre