This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH RFC: Fix ARM bug by splitting up iwmmxt_movsi_insn


On Fri, Oct 10, 2003 at 11:11:10AM +0100, Richard Earnshaw wrote:
> > This simple C++ program breaks the mainline C++ compiler when compiled
> > with
> >     -mcpu=iwmmxt -O2
> > 
> > inline const int&
> > min(const int& __a, const int& __b)
> > { if (__b < __a) return __b; return __a; }
> > int foo (const char* f, const char* e, int max)
> > { return min(max, e - f); }
> > 
> > The result is a crash with "could not split insn".
> > 
> > This happens because the ifcompare_plus_move insn requires a split,
> > but the condition for the corresponding define_split has
> > !TARGET_IWMMXT, with this comment:
> >     ;; Note we have to suppress this split for the iwmmxt because it
> >     ;; creates a conditional movsi and the iwmmxt_movsi_insn pattern
> >     ;; is not predicable.  This sucks.
> > 
> > I was unable to find a simple C test case, although I didn't look too
> > hard.  Note that even minor changes to the C++ test case cause it to
> > compile successfully.
> > 
> > The simple way out of this is to change ifcompare_plus_move and
> > friends to have a condition of !TARGET_IWMMXT.
> > 
> > However, it seems to me that the basic issue here may be wrong.
> > Currently arm_movsi_insn has a condition of !TARGET_IWMMXT, and on the
> > IWMMXT all moves are handled by iwmmxt_movsi_insn, which can not be
> > predicable because (according to the comments) the alternative to
> > store a IWMMXT general register to memory is not predicable.
> > 
> > But perhaps moves for ordinary registers can not continue to be
> > handled by arm_movsi_insn, which is predicable, while moves for IWMMXT
> > registers are handled by iwmmxt_movsi_insn.
> > 
> > That leads to the following patch.  This patch does not introduce any
> > new testsuite compilation failures.  However, most ports don't seem to
> > work this way.  In particular, it's possible that with the original
> > code, in some cases, reload will sometimes choose a register in
> > IWMMXT_GR_REGS, whereas with the patched code, it no longer will.  I'm
> > not sure, and I'd like to hear any feedback which others may have.
> > 
> > (This patch includes a patch to arm_regno_class() which is really a
> > simple bug fix.)
> 
> I'm concerned that the overall approach is wrong here.  In general you 
> can't have two patterns in the MD that match the same pattern.  The 
> compiler will just select the first and ignore the second.  Before 
> register allocation nothing will be using wmmx registers explicitly 
> (unless it's already matching a hard reg) so all the compiler will see 
> when selecting register classes is those classes available for a non wmmx 
> processor.
> 
> I think perhaps the best way to handle this is to force IF conversion to 
> fail by defining a macro that will force the conditionalization to fail 
> (for example, to make IFCVT_MODIFY_INSN return a NULL pattern for the 
> problem cases).  Another alternative is to provide the predicated version 
> of the move insn explicitly, and then to ensure that that version rejects 
> the impossible cases (the predicable attribute simply means that the code 
> generator will generate the predicated version of the pattern directly 
> from the non-predicated version).

Here's another alternative.  I am a bit behind; I did a lot of iWMMXt
work two months ago, and haven't had the chance to clean up, retest,
and submit the patches yet.  There are several other critical bugs,
however, so I'll try to clean them up before more of other people's
time is wasted.

I have _not_ tested this patch on its own, so it's entirely possible
some of my other fixes were needed because of it.  Take it with a grain
of salt or ask me for the rest.  Known issues offhand: stack alignment
violations, esp. involving sibcalls, which could corrupt saved regs;
and a subreg:V2SI buglet.

The basic approach is, we can predicate tmrc/tmcr, so use those instead
of wstr directly.

This patch obviously does in some cases hurt performance, but in other
cases it helps - particularly, it re-enables predicated moves. 
Reviewing it now I remember three real problems:

  - The use of CANNOT_CHANGE_MODE_CLASS is almost 100% guaranteed
    wrong.  I spent two days trying to find another way to avoid the
    resulting crash in reload, though, and couldn't.

  [The problem, IIRC: reload would see that we could store an SImode subreg
   of a DImode value in such a register, and generate something like
     (move:SI (reg:DI wr0) (subreg:SI (reg:DI) 4))
   that we were completely unprepared for.  At least I think that's
   what they looked like.  I don't have a testcase but something in
   the gcc testsuite did it.]

  - I meant to add a peephole or similar to reconstruct un-predicated
    wstr instructions, but never did.

  - Notice that I both fixed arm_output_load_gr and removed the only
    use of it.  I couldn't get leaving that case in the movsi to work,
    but I have completely forgotten what the problem was.  Again,
    something in reload.

Patch is against 3.3 but has not changed much; there's a trivial
conflict at CANNOT_CHANGE_MODE_CLASS.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

diff -Nurp -x gcc.1 -x arm.md.orig gcc-3.3.backport/gcc/config/arm/arm.c gcc-3.3/gcc/config/arm/arm.c
--- gcc-3.3.backport/gcc/config/arm/arm.c	2003-07-28 11:05:23.000000000 -0400
+++ gcc-3.3/gcc/config/arm/arm.c	2003-07-28 11:11:26.000000000 -0400
@@ -9804,6 +9838,9 @@ arm_regno_class (regno)
   if (IS_IWMMXT_REGNUM (regno))
     return IWMMXT_REGS;
 
+  if (IS_IWMMXT_GR_REGNUM (regno))
+    return IWMMXT_GR_REGS;
+
   return FPU_REGS;
 }
 
@@ -12627,28 +12664,20 @@ const char *
 arm_output_load_gr (operands)
      rtx * operands;
 {
-  rtx reg;
   rtx offset;
-  rtx wcgr;
   rtx sum;
   
-  if (GET_CODE (operands [1]) != MEM
-      || GET_CODE (sum = XEXP (operands [1], 0)) != PLUS
-      || GET_CODE (reg = XEXP (sum, 0)) != REG
-      || GET_CODE (offset = XEXP (sum, 1)) != CONST_INT
-      || ((INTVAL (offset) < 1024) && (INTVAL (offset) > -1024)))
-    return "wldrw%?\t%0, %1";
+  if (current_insn_predicate == NULL
+      && (GET_CODE (operands [1]) != MEM
+	  || GET_CODE (sum = XEXP (operands [1], 0)) != PLUS
+	  || GET_CODE (XEXP (sum, 0)) != REG
+	  || GET_CODE (offset = XEXP (sum, 1)) != CONST_INT
+	  || ((INTVAL (offset) < 1024) && (INTVAL (offset) > -1024))))
+    return "wldrw\t%0, %1";
   
   /* Fix up an out-of-range load of a GR register.  */  
-  output_asm_insn ("str%?\t%0, [sp, #-4]!\t@ Start of GR load expansion", & reg);
-  wcgr = operands[0];
-  operands[0] = reg;
-  output_asm_insn ("ldr%?\t%0, %1", operands);
-
-  operands[0] = wcgr;
-  operands[1] = reg;
-  output_asm_insn ("tmcr%?\t%0, %1", operands);
-  output_asm_insn ("ldr%?\t%0, [sp], #4\t@ End of GR load expansion", & reg);
+  output_asm_insn ("ldr%?\t%2, %1", operands);
+  output_asm_insn ("tmcr%?\t%0, %2", operands);
 
   return "";
 }
diff -Nurp -x gcc.1 -x arm.md.orig gcc-3.3.backport/gcc/config/arm/arm.h gcc-3.3/gcc/config/arm/arm.h
--- gcc-3.3.backport/gcc/config/arm/arm.h	2003-07-28 11:05:23.000000000 -0400
+++ gcc-3.3/gcc/config/arm/arm.h	2003-07-28 11:11:26.000000000 -0400
@@ -1195,6 +1202,14 @@ enum reg_class
    or could index an array.  */
 #define REGNO_REG_CLASS(REGNO)  arm_regno_class (REGNO)
 
+/* FPA registers can't do dubreg as all values are reformatted to internal
+   precision.  IWMMXT_GR_REGS is here to prevent reload from loading
+   (SImode subregs of) DImode values into them.  */
+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
+   ? (IWMMXT_GR_REGS == CLASS) ? 1 :			\
+     reg_classes_intersect_p (FPU_REGS, (CLASS)) : 0)
+
 /* The class value for index registers, and the one for base regs.  */
 #define INDEX_REG_CLASS  (TARGET_THUMB ? LO_REGS : GENERAL_REGS)
 #define BASE_REG_CLASS   (TARGET_THUMB ? LO_REGS : GENERAL_REGS)
@@ -1318,21 +1333,33 @@ enum reg_class
        : NO_REGS)) 							\
    : NO_REGS)
 
-/* Return the register class of a scratch register needed to copy IN into
-   or out of a register in CLASS in MODE.  If it can be done directly,
-   NO_REGS is returned.  */
+/* Return the register class of a scratch register needed to copy a
+   register CLASS in MODE to X.  If it can be done directly, NO_REGS
+   is returned.  */
 #define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   (TARGET_ARM ?							\
    (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1)	\
+    ? GENERAL_REGS :						\
+    CLASS == IWMMXT_GR_REGS					\
+    && (!REG_P (X) || !HARD_REGISTER_P (X)			\
+	|| REGNO_REG_CLASS (REGNO (X)) != GENERAL_REGS)		\
     ? GENERAL_REGS : NO_REGS)					\
    : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
    
+/* Return the register class of a scratch register needed to copy X to
+   a register CLASS in MODE.  If it can be done directly, NO_REGS
+   is returned.  */
 /* If we need to load shorts byte-at-a-time, then we need a scratch. */
 #define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   (TARGET_ARM ?							\
    (((CLASS) == IWMMXT_REGS || (CLASS) == IWMMXT_GR_REGS)	\
       && CONSTANT_P (X))					\
    ? GENERAL_REGS :						\
+   CLASS == IWMMXT_GR_REGS					\
+   && (!REG_P (X) || !HARD_REGISTER_P (X)			\
+       || (REGNO_REG_CLASS (REGNO (X)) != GENERAL_REGS		\
+           && REGNO_REG_CLASS (REGNO (X)) != IWMMXT_GR_REGS))	\
+   ? GENERAL_REGS :						\
    (((MODE) == HImode && ! arm_arch4 && TARGET_MMU_TRAPS	\
      && (GET_CODE (X) == MEM					\
 	 || ((GET_CODE (X) == REG || GET_CODE (X) == SUBREG)	\
diff -Nurp -x gcc.1 -x arm.md.orig gcc-3.3.backport/gcc/config/arm/arm.md gcc-3.3/gcc/config/arm/arm.md
--- gcc-3.3.backport/gcc/config/arm/arm.md	2003-07-28 11:05:23.000000000 -0400
+++ gcc-3.3/gcc/config/arm/arm.md	2003-07-28 11:11:26.000000000 -0400
@@ -8806,10 +8806,7 @@
 			 (match_dup 0)
 			 (match_operand 4 "" "")))
    (clobber (reg:CC CC_REGNUM))]
-  ;; Note we have to suppress this split for the iwmmxt because it
-  ;; creates a conditional movsi and the iwmmxt_movsi_insn pattern
-  ;; is not predicable.  This sucks.
-  "TARGET_ARM && reload_completed && ! TARGET_IWMMXT"
+  "TARGET_ARM && reload_completed"
   [(set (match_dup 5) (match_dup 6))
    (cond_exec (match_dup 7)
 	      (set (match_dup 0) (match_dup 4)))]
@@ -8837,10 +8834,7 @@
 			 (match_operand 4 "" "")
 			 (match_dup 0)))
    (clobber (reg:CC CC_REGNUM))]
-  ;; Note we have to suppress this split for the iwmmxt because it
-  ;; creates a conditional movsi and the iwmmxt_movsi_insn pattern
-  ;; is not predicable.  This sucks.
-  "TARGET_ARM && reload_completed && ! TARGET_IWMMXT"
+  "TARGET_ARM && reload_completed"
   [(set (match_dup 5) (match_dup 6))
    (cond_exec (match_op_dup 1 [(match_dup 5) (const_int 0)])
 	      (set (match_dup 0) (match_dup 4)))]
@@ -8861,10 +8855,7 @@
 			 (match_operand 4 "" "")
 			 (match_operand 5 "" "")))
    (clobber (reg:CC CC_REGNUM))]
-  ;; Note we have to suppress this split for the iwmmxt because it
-  ;; creates a conditional movsi and the iwmmxt_movsi_insn pattern
-  ;; is not predicable.  This sucks.
-  "TARGET_ARM && reload_completed && ! TARGET_IWMMXT"
+  "TARGET_ARM && reload_completed"
   [(set (match_dup 6) (match_dup 7))
    (cond_exec (match_op_dup 1 [(match_dup 6) (const_int 0)])
 	      (set (match_dup 0) (match_dup 4)))
@@ -8896,10 +8887,7 @@
 			 (not:SI
 			  (match_operand:SI 5 "s_register_operand" ""))))
    (clobber (reg:CC CC_REGNUM))]
-  ;; Note we have to suppress this split for the iwmmxt because it
-  ;; creates a conditional movsi and the iwmmxt_movsi_insn pattern
-  ;; is not predicable.  This sucks.
-  "TARGET_ARM && reload_completed && ! TARGET_IWMMXT"
+  "TARGET_ARM && reload_completed"
   [(set (match_dup 6) (match_dup 7))
    (cond_exec (match_op_dup 1 [(match_dup 6) (const_int 0)])
 	      (set (match_dup 0) (match_dup 4)))
diff -Nurp -x gcc.1 -x arm.md.orig gcc-3.3.backport/gcc/config/arm/iwmmxt.md gcc-3.3/gcc/config/arm/iwmmxt.md
--- gcc-3.3.backport/gcc/config/arm/iwmmxt.md	2003-07-28 11:05:23.000000000 -0400
+++ gcc-3.3/gcc/config/arm/iwmmxt.md	2003-07-28 11:11:26.000000000 -0400
@@ -91,37 +91,51 @@
    (set_attr "neg_pool_range" "*,1012,*,*,*,*,*,*")]
 )
 
+(define_expand "reload_outsi"
+  [(parallel [(match_operand:SI 0 "general_operand" "=zm")
+	      (match_operand:SI 1 "general_operand" "r")
+	      (match_operand:SI 2 "register_operand" "=&r")])]
+  "TARGET_REALLY_IWMMXT"
+  "
+  emit_insn (gen_movsi (operands[2], operands[1]));
+  emit_insn (gen_movsi (operands[0], operands[2]));
+  DONE;
+  "
+)
+
+(define_expand "reload_insi"
+  [(parallel [(match_operand:SI 0 "general_operand" "=z")
+	      (match_operand:SI 1 "general_operand" "zm")
+	      (match_operand:SI 2 "register_operand" "=&r")])]
+  "TARGET_REALLY_IWMMXT"
+  "
+  emit_insn (gen_movsi (operands[2], operands[1]));
+  emit_insn (gen_movsi (operands[0], operands[2]));
+  DONE;
+  "
+)
+
 (define_insn "*iwmmxt_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r, m,z,r,?z,m,z")
-	(match_operand:SI 1 "general_operand"      "rI,K,mi,r,r,z,m,z,z"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r, m,z,r")
+	(match_operand:SI 1 "general_operand"      "rI,K,mi,r,r,z"))]
   "TARGET_REALLY_IWMMXT
    && (   register_operand (operands[0], SImode)
        || register_operand (operands[1], SImode))"
   "*
    switch (which_alternative)
    {
-   case 0: return \"mov\\t%0, %1\";
-   case 1: return \"mvn\\t%0, #%B1\";
-   case 2: return \"ldr\\t%0, %1\";
-   case 3: return \"str\\t%1, %0\";
-   case 4: return \"tmcr\\t%0, %1\";
-   case 5: return \"tmrc\\t%0, %1\";
-   case 6: return arm_output_load_gr (operands);
-   case 7: return \"wstrw\\t%1, %0\";
-   default:return \"wstrw\\t%1, [sp, #-4]!\;wldrw\\t%0, [sp], #4\\t@move CG reg\";
+   case 0: return \"mov%?\\t%0, %1\";
+   case 1: return \"mvn%?\\t%0, #%B1\";
+   case 2: return \"ldr%?\\t%0, %1\";
+   case 3: return \"str%?\\t%1, %0\";
+   case 4: return \"tmcr%?\\t%0, %1\";
+   case 5: return \"tmrc%?\\t%0, %1\";
   }"
-  [(set_attr "type"           "*,*,load,store1,*,*,load,store1,*")
-   (set_attr "length"         "*,*,*,        *,*,*,  16,     *,8")
-   (set_attr "pool_range"     "*,*,4096,     *,*,*,1024,     *,*")
-   (set_attr "neg_pool_range" "*,*,4084,     *,*,*,   *,  1012,*")
-   ;; Note - the "predicable" attribute is not allowed to have alternatives.
-   ;; Since the wSTRw wCx instruction is not predicable, we cannot support
-   ;; predicating any of the alternatives in this template.  This sucks.
-   (set_attr "predicable"     "no")
-   ;; Also - we have to pretend that these insns clobber the condition code
-   ;; bits as otherwise arm_final_prescan_insn() will try to conditionalize
-   ;; them.
-   (set_attr "conds" "clob")]
+  [(set_attr "type"           "*,*,load,store1,*,*")
+   (set_attr "length"         "*,*,*,        *,*,*")
+   (set_attr "pool_range"     "*,*,4096,     *,*,*")
+   (set_attr "neg_pool_range" "*,*,4084,     *,*,*")
+   (set_attr "predicable"     "yes")]
 )
 
 (define_insn "movv8qi_internal"


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]