This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH:[darwin] fix load of a misaligned double word
- From: Alan Modra <amodra at bigpond dot net dot au>
- To: gcc-patches at gcc dot gnu dot org, Geoff Keating <geoffk at geoffk dot org>,David Edelsohn <dje at watson dot ibm dot com>
- Cc: lucier at math dot purdue dot edu,Ulrich Weigand <Ulrich dot Weigand at de dot ibm dot com>,Hartmut Penner <HPENNER at de dot ibm dot com>, fjahanian at apple dot com,pinskia at physics dot uc dot edu
- Date: Fri, 27 Feb 2004 17:12:16 +1030
- Subject: Re: PATCH:[darwin] fix load of a misaligned double word
- References: <OF70C30FAC.BBBD64EA-ONC1256E44.0020862B@de.ibm.com> <20040227020412.GL795@bubble.modra.org>
Here's the solution I found.
- Don't use mode in legitimate_offset_address_p to distinguish between
DFmode and DImode mem accesses. Either mode might end up in a 64-bit
gpr. That means we have to allow "(offset & 3) != 0" addresses here.
- Generate a reload in rs6000.c:secondary_reload_class for the ones
that are invalid.
- Use 'Y' constraints in both movdf_hardfloat64 and movdi_internal64.
That's about all that is needed to ensure we generate correct code.
However, some extra tweaking is needed for optimal code. Firstly, I
added a splitter to handle the m->r load since if "r" is a base
register, we can reuse "r' to generate a valid address. This helps
reduce the number of reloads needed. We can't do this for an r->m
store of course, so that meant using SECONDARY_INPUT_RELOAD_CLASS and
SECONDARY_OUTPUT_RELOAD_CLASS. invalid_gpr_mem and base_reg_operand
are support functions for the splitter.
Secondly, I found I needed some tweaks in the generic part of reload.
Why? Well, consider an instruction pattern like
(set (mem/s:DI (plus:DI (reg:DI 3) (const_int 3))) (reg:DI 120))
This is one that has an offset that is invalid for a gpr store.
When find_reloads evaluates the alternatives available in
movdi_internal64, it will find two that are reasonable to use, the
r->Y store and the f->m store. Both need a reload. The former is a
"loser" for 'Y', and the latter a "loser" for 'f'. That scores 6 points
against each. Additionally, both alternative gain an extra two points
here:
reload.c:3431
if (! (GET_CODE (operand) == REG
&& REGNO (operand) >= FIRST_PSEUDO_REGISTER)
&& GET_CODE (operand) != SCRATCH
&& ! (const_to_mem && constmemok))
reject += 2;
What goes wrong now is that the r->Y store gains an extra point against
it because it needs an output reload. The f->m store has no further
black marks. Thus reload prefers the f->m store even though in practice
it is much worse. Copying r->f requires a secondary memory reload.
So, we want to encourage reload to avoid the r->f input reload. You
could do that by disparaging the f->m alternative with a '?', but that
would make reload prefer the r->Y store when storing a floating point
register, and you'd get a secondart memory copy for that case. Thus,
I think there is a need for DISPARAGE_RELOAD_CLASS. I chose a value
of 6 somewhat arbitrarily; It's equivalent to an extra "loser" since
the secondary memory reload requires both a mem store and a mem load.
Here are the results for the testcase in
http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02569.html against current
mainline. No regressions, and an improvement in f6, f7, and f8. I'm
currently running a bootstrap and regression test on powerpc-linux and
powerpc64-linux.
--- floatice3.s.mainline 2004-02-27 12:01:59.000000000 +1030
+++ floatice3.s 2004-02-27 13:34:35.000000000 +1030
@@ -11,8 +11,8 @@
.type .f1,@function
.globl .f1
.f1:
- addi 3,3,3
li 0,1234
+ addi 3,3,3
std 0,0(3)
blr
.long 0
@@ -95,8 +95,8 @@
.type .f5,@function
.globl .f5
.f5:
- addi 3,3,3
- ld 4,0(3)
+ addi 4,3,3
+ ld 4,0(4)
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -112,8 +112,7 @@
.type .f6,@function
.globl .f6
.f6:
- addi 3,3,3
- lfd 1,0(3)
+ lfd 1,3(3)
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -129,9 +128,8 @@
.type .f7,@function
.globl .f7
.f7:
- std 4,-16(1)
- lfd 0,-16(1)
- stfd 0,3(3)
+ addi 3,3,3
+ std 4,0(3)
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -147,8 +145,7 @@
.type .f8,@function
.globl .f8
.f8:
- addi 3,3,3
- stfd 1,0(3)
+ stfd 0,3(3)
blr
.long 0
.byte 0,0,0,0,0,0,0,0
* config/rs6000/rs6000.c (invalid_gpr_mem): New function.
(base_reg_operand): New function.
(legitimate_offset_address_p): Don't test modes in an attempt to
distinguish gpr vs fpr mem loads/stores. Don't prohibit offsets
invalid for 64-bit gpr loads/stores here.
(secondary_reload_class): Add "inp" parameter. Generate a reload
for 64-bit gpr loads/stores.
* config/rs6000/rs6000.h (SECONDARY_RELOAD_CLASS): Delete.
(SECONDARY_INPUT_RELOAD_CLASS, SECONDARY_OUTPUT_RELOAD_CLASS): Define.
(PREDICATE_CODES): Add invalid_gpr_mem and base_reg_operand.
(DISPARAGE_RELOAD_CLASS): Define.
* config/rs6000/rs6000.md (movdf_hardfloat64): Correct attrs.
Add m->r alternative and split.
(movdi_internal64): Add m->r alternative and split.
* reload.c (find_reloads): Invoke DISPARAGE_RELOAD_CLASS.
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000-protos.h gcc-reload/gcc/config/rs6000/rs6000-protos.h
--- gcc-mainline/gcc/config/rs6000/rs6000-protos.h 2004-02-06 16:05:56.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000-protos.h 2004-02-27 13:14:09.000000000 +1030
@@ -104,7 +104,7 @@ extern int registers_ok_for_quad_peep (r
extern int addrs_ok_for_quad_peep (rtx, rtx);
extern bool gpr_or_gpr_p (rtx, rtx);
extern enum reg_class secondary_reload_class (enum reg_class,
- enum machine_mode, rtx);
+ enum machine_mode, rtx, int);
extern int ccr_bit (rtx, int);
extern int extract_MB (rtx);
extern int extract_ME (rtx);
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.c gcc-reload/gcc/config/rs6000/rs6000.c
--- gcc-mainline/gcc/config/rs6000/rs6000.c 2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.c 2004-02-27 13:14:09.000000000 +1030
@@ -2420,6 +2420,43 @@ word_offset_memref_operand (rtx op, enum
return (off % 4) == 0;
}
+/* Return true if operand is a (MEM (PLUS (REG) (offset))) where offset
+ is not divisible by four. */
+
+int
+invalid_gpr_mem (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+ rtx addr;
+ long off;
+
+ if (GET_CODE (op) != MEM)
+ return 0;
+
+ addr = XEXP (op, 0);
+ if (GET_CODE (addr) != PLUS
+ || GET_CODE (XEXP (addr, 0)) != REG
+ || GET_CODE (XEXP (addr, 1)) != CONST_INT)
+ return 0;
+
+ off = INTVAL (XEXP (addr, 1));
+ return (off & 3) != 0;
+}
+
+/* Return true if operand is a hard register that can be used as a base
+ register. */
+
+int
+base_reg_operand (rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+ unsigned int regno;
+
+ if (!REG_P (op))
+ return 0;
+
+ regno = REGNO (op);
+ return regno != 0 && regno <= 31;
+}
+
/* Return true if either operand is a general purpose register. */
bool
@@ -2544,18 +2581,14 @@ legitimate_offset_address_p (enum machin
case DFmode:
case DImode:
- if (mode == DFmode || !TARGET_POWERPC64)
+ if (!TARGET_POWERPC64)
extra = 4;
- else if (offset & 3)
- return false;
break;
case TFmode:
case TImode:
- if (mode == TFmode || !TARGET_POWERPC64)
+ if (!TARGET_POWERPC64)
extra = 12;
- else if (offset & 3)
- return false;
else
extra = 8;
break;
@@ -8579,12 +8612,14 @@ addrs_ok_for_quad_peep (rtx addr1, rtx a
/* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE. If it can be done directly,
- NO_REGS is returned. */
+ NO_REGS is returned. INP is nonzero if we are loading the reg, zero
+ for storing. */
enum reg_class
-secondary_reload_class (enum reg_class class,
- enum machine_mode mode ATTRIBUTE_UNUSED,
- rtx in)
+secondary_reload_class (enum reg_class class,
+ enum machine_mode mode,
+ rtx in,
+ int inp)
{
int regno;
@@ -8609,6 +8644,15 @@ secondary_reload_class (enum reg_class c
return BASE_REGS;
}
+ /* A 64-bit gpr load or store using an offset that isn't a multiple of
+ four needs a secondary reload. */
+ if (TARGET_POWERPC64
+ && GET_CODE (in) == MEM
+ && GET_MODE_UNIT_SIZE (mode) >= 8
+ && (!inp || class != BASE_REGS)
+ && !word_offset_memref_operand (in, mode))
+ return BASE_REGS;
+
if (GET_CODE (in) == REG)
{
regno = REGNO (in);
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.h gcc-reload/gcc/config/rs6000/rs6000.h
--- gcc-mainline/gcc/config/rs6000/rs6000.h 2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.h 2004-02-27 13:14:09.000000000 +1030
@@ -1474,12 +1474,22 @@ enum reg_class
? GENERAL_REGS \
: (CLASS)))
+#define DISPARAGE_RELOAD_CLASS(X, CLASS) \
+ (GET_CODE (X) == REG \
+ && REGNO (X) < FIRST_PSEUDO_REGISTER \
+ && SECONDARY_MEMORY_NEEDED (GET_MODE_CLASS (GET_MODE (X)), \
+ CLASS, GET_MODE (X)) \
+ ? 6 : 0)
+
/* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE. If it can be done directly,
NO_REGS is returned. */
-#define SECONDARY_RELOAD_CLASS(CLASS,MODE,IN) \
- secondary_reload_class (CLASS, MODE, IN)
+#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+ secondary_reload_class (CLASS, MODE, IN, 1)
+
+#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, IN) \
+ secondary_reload_class (CLASS, MODE, IN, 0)
/* If we are copying between FP or AltiVec registers and anything
else, we need a memory location. */
@@ -2654,6 +2664,8 @@ extern char rs6000_reg_names[][8]; /* re
{"lwa_operand", {SUBREG, MEM, REG}}, \
{"volatile_mem_operand", {MEM}}, \
{"offsettable_mem_operand", {MEM}}, \
+ {"invalid_gpr_mem", {MEM}}, \
+ {"base_reg_operand", {REG}}, \
{"mem_or_easy_const_operand", {SUBREG, MEM, CONST_DOUBLE}}, \
{"add_operand", {SUBREG, REG, CONST_INT}}, \
{"non_add_cint_operand", {CONST_INT}}, \
diff -urp -xCVS -x'*~' gcc-mainline/gcc/config/rs6000/rs6000.md gcc-reload/gcc/config/rs6000/rs6000.md
--- gcc-mainline/gcc/config/rs6000/rs6000.md 2004-02-27 11:10:19.000000000 +1030
+++ gcc-reload/gcc/config/rs6000/rs6000.md 2004-02-27 13:32:11.000000000 +1030
@@ -8151,14 +8151,15 @@
; ld/std require word-aligned displacements -> 'Y' constraint.
; List Y->r and r->Y before r->r for reload.
(define_insn "*movdf_hardfloat64"
- [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,f,f,m,!cl,!r,!h,!r,!r,!r")
- (match_operand:DF 1 "input_operand" "r,Y,r,f,m,f,r,h,0,G,H,F"))]
+ [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,Y,!r,f,f,m,!cl,!r,!h,!r,!r,!r")
+ (match_operand:DF 1 "input_operand" "Y,m,r,r,f,m,f,r,h,0,G,H,F"))]
"TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS
&& (gpc_reg_operand (operands[0], DFmode)
|| gpc_reg_operand (operands[1], DFmode))"
"@
- std%U0%X0 %1,%0
ld%U1%X1 %0,%1
+ #
+ std%U0%X0 %1,%0
mr %0,%1
fmr %0,%1
lfd%U1%X1 %0,%1
@@ -8169,8 +8170,21 @@
#
#
#"
- [(set_attr "type" "*,load,store,fp,fpload,fpstore,mtjmpr,*,*,*,*,*")
- (set_attr "length" "4,4,4,4,4,4,4,4,4,8,12,16")])
+ [(set_attr "type" "load,load,store,*,fp,fpload,fpstore,mtjmpr,*,*,*,*,*")
+ (set_attr "length" "4,8,4,4,4,4,4,4,4,4,8,12,16")])
+
+(define_split
+ [(set (match_operand:DF 0 "base_reg_operand" "")
+ (match_operand:DF 1 "invalid_gpr_mem" ""))]
+ "TARGET_POWERPC64 && no_new_pseudos"
+ [(set (match_dup 3) (match_dup 2))
+ (set (match_dup 0) (match_dup 4))]
+ "
+{
+ operands[2] = XEXP (operands[1], 0);
+ operands[3] = gen_rtx_REG (DImode, REGNO (operands[0]));
+ operands[4] = replace_equiv_address (operands[1], operands[3]);
+}")
(define_insn "*movdf_softfloat64"
[(set (match_operand:DF 0 "nonimmediate_operand" "=r,Y,r,cl,r,r,r,r,*h")
@@ -8488,15 +8502,16 @@
}")
(define_insn "*movdi_internal64"
- [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,r,?f,f,m,r,*h,*h")
- (match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,R,f,m,f,*h,r,0"))]
+ [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,Y,r,r,r,r,r,?f,f,m,r,*h,*h")
+ (match_operand:DI 1 "input_operand" "Y,m,r,r,I,L,nF,R,f,m,f,*h,r,0"))]
"TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], DImode)
|| gpc_reg_operand (operands[1], DImode))"
"@
- mr %0,%1
ld%U1%X1 %0,%1
+ #
std%U0%X0 %1,%0
+ mr %0,%1
li %0,%1
lis %0,%v1
#
@@ -8507,8 +8522,20 @@
mf%1 %0
mt%0 %1
{cror 0,0,0|nop}"
- [(set_attr "type" "*,load,store,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
- (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4")])
+ [(set_attr "type" "load,load,store,*,*,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*")
+ (set_attr "length" "4,8,4,4,4,4,20,4,4,4,4,4,4,4")])
+
+(define_split
+ [(set (match_operand:DI 0 "base_reg_operand" "")
+ (match_operand:DI 1 "invalid_gpr_mem" ""))]
+ "TARGET_POWERPC64 && no_new_pseudos"
+ [(set (match_dup 0) (match_dup 2))
+ (set (match_dup 0) (match_dup 3))]
+ "
+{
+ operands[2] = XEXP (operands[1], 0);
+ operands[3] = replace_equiv_address (operands[1], operands[0]);
+}")
;; immediate value valid for a single instruction hiding in a const_double
(define_insn ""
diff -urp -xCVS -x'*~' gcc-mainline/gcc/reload.c gcc-reload/gcc/reload.c
--- gcc-mainline/gcc/reload.c 2004-02-26 13:46:23.000000000 +1030
+++ gcc-reload/gcc/reload.c 2004-02-27 13:14:08.000000000 +1030
@@ -3414,6 +3414,12 @@ find_reloads (rtx insn, int replace, int
&& ! const_to_mem)
bad = 1;
+#ifdef DISPARAGE_RELOAD_CLASS
+ reject
+ += DISPARAGE_RELOAD_CLASS (operand,
+ (enum reg_class) this_alternative[i]);
+#endif
+
/* We prefer to reload pseudos over reloading other things,
since such reloads may be able to be eliminated later.
If we are reloading a SCRATCH, we won't be generating any
--
Alan Modra
IBM OzLabs - Linux Technology Centre