This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
PR 28982: reload deals incorrectly with automodified addresses

From: Richard Sandiford <richard at codesourcery dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Fri, 08 Sep 2006 10:50:49 +0100
Subject: PR 28982: reload deals incorrectly with automodified addresses
If reload decides to create an automodification reload -- for example,
if it decides to reload a POST_MODIFY into a base register --
inc_for_reload will use the original base and index values.  However,
if the base or index has been reloaded, inc_for_reload should use the
associated reload register instead.

The symptoms are twofold:

  (1) If there's a reload for the index register, you'll get an ICE such as:

        error: unrecognizable insn:
        (insn 481 479 482 3 (set (reg:SI 1 r1)
                (plus:SI (reg:SI 1 r1)
                    (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                            (const_int 176 [0xb0])) [25 pretmp.56+0 S4 A32]))) -1 (nil)
            (nil))

  (2) If there's a reload for the base register, the modification
      performed by inc_for_reload will be lost by the copy-out for the
      base register reload.

The patch below fixes this by calling find_replacement in inc_for_reload.

Unfortunately, there's a further problem.  Suppose we have a
have a PRE_MODIFY or POST_MODIFY expression (E) and two reloads
(A) and (B) where:

  (A) is a RELOAD_FOR_INPUT for (E)
  (B) reloads (E)'s index

(B) will then be a RELOAD_FOR_INPUT_ADDRESS.  This is incorrect, as
inc_for_reload might only read the index _after_ setting (A)'s reload
register.  (B)'s reload register must therefore live longer than a
normal RELOAD_FOR_INPUT_ADDRESS is required to.

I toyed with various ways of changing the RELOAD_FOR_INPUT_ADDRESS
reloads after the fact, or generating entirely new reloads, but I think
that's too complicated and dangerous.  It gets particularly bad if the
address in the index reload itself needs reloading, or if the index
reload is shared between two address reloads.  I think any such approach
goes too much against the basic design.

I think the best fix is therefore to be conservative and treat the index
reload for a PRE_MODIFY or POST_MODIFY as RELOAD_OTHER.  I think the
only downside is that reload will be overly conservative about the live
range of the index register in some cases.

I've included two testcases, both of them brute-force ways of triggering
base and index spills.  In the first test, the spills themselves do not
need reloads, but in the second, they do.

Regression-tested on arm-linux-gnueabi.  Also bootstrapped & regression
tested on x86_64-linux-gnu as a sanity check.  OK for trunk?

Richard


gcc/
	* reload.c (find_reloads_address_1): Use RELOAD_OTHER for the
	index of a PRE_MODIFY or POST_MODIFY address.
	* reload1.c (inc_for_reload): Use find_replacement on the original
	base and index registers.

gcc/testsuite/
	* gcc.c-torture/execute/pr28982a.c: New test.
	* gcc.c-torture/execute/pr28982b.c: Likewise.

Index: gcc/reload.c
===================================================================
--- gcc/reload.c	(revision 116748)
+++ gcc/reload.c	(working copy)
@@ -5541,12 +5541,18 @@ #define REG_OK_FOR_CONTEXT(CONTEXT, REGN
 	/* Require index register (or constant).  Let's just handle the
 	   register case in the meantime... If the target allows
 	   auto-modify by a constant then we could try replacing a pseudo
-	   register with its equivalent constant where applicable.  */
+	   register with its equivalent constant where applicable.
+
+	   If we later decide to reload the whole PRE_MODIFY or
+	   POST_MODIFY, inc_for_reload might clobber the reload register
+	   before reading the index.  The index register might therefore
+	   need to live longer than a TYPE reload normally would, so be
+	   conservative and class it as RELOAD_OTHER.  */
 	if (REG_P (XEXP (op1, 1)))
 	  if (!REGNO_OK_FOR_INDEX_P (REGNO (XEXP (op1, 1))))
 	    find_reloads_address_1 (mode, XEXP (op1, 1), 1, code, SCRATCH,
-				    &XEXP (op1, 1), opnum, type, ind_levels,
-				    insn);
+				    &XEXP (op1, 1), opnum, RELOAD_OTHER,
+				    ind_levels, insn);
 
 	gcc_assert (REG_P (XEXP (op1, 0)));
 
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(revision 116748)
+++ gcc/reload1.c	(working copy)
@@ -8177,7 +8177,7 @@ delete_address_reloads_1 (rtx dead_insn,
 inc_for_reload (rtx reloadreg, rtx in, rtx value, int inc_amount)
 {
   /* REG or MEM to be copied and incremented.  */
-  rtx incloc = XEXP (value, 0);
+  rtx incloc = find_replacement (&XEXP (value, 0));
   /* Nonzero if increment after copying.  */
   int post = (GET_CODE (value) == POST_DEC || GET_CODE (value) == POST_INC
 	      || GET_CODE (value) == POST_MODIFY);
@@ -8186,7 +8186,7 @@ inc_for_reload (rtx reloadreg, rtx in, r
   rtx add_insn;
   int code;
   rtx store;
-  rtx real_in = in == value ? XEXP (in, 0) : in;
+  rtx real_in = in == value ? incloc : in;
 
   /* No hard register is equivalent to this register after
      inc/dec operation.  If REG_LAST_RELOAD_REG were nonzero,
@@ -8198,7 +8198,7 @@ inc_for_reload (rtx reloadreg, rtx in, r
   if (GET_CODE (value) == PRE_MODIFY || GET_CODE (value) == POST_MODIFY)
     {
       gcc_assert (GET_CODE (XEXP (value, 1)) == PLUS);
-      inc = XEXP (XEXP (value, 1), 1);
+      inc = find_replacement (&XEXP (XEXP (value, 1), 1));
     }
   else
     {
Index: gcc/testsuite/gcc.c-torture/execute/pr28982a.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr28982a.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr28982a.c	(revision 0)
@@ -0,0 +1,65 @@
+/* PR rtl-optimization/28982.  Function foo() does the equivalent of:
+
+     float tmp_results[NVARS];
+     for (int i = 0; i < NVARS; i++)
+       {
+	 int inc = incs[i];
+	 float *ptr = ptrs[i], result = 0;
+	 for (int j = 0; j < n; j++)
+	   result += *ptr, ptr += inc;
+	 tmp_results[i] = result;
+       }
+     memcpy (results, tmp_results, sizeof (results));
+
+   but without the outermost loop.  The idea is to create high register
+   pressure and ensure that some INC and PTR variables are spilled.
+
+   On ARM targets, sequences like "result += *ptr, ptr += inc" can
+   usually be implemented using (mem (post_modify ...)), and we do
+   indeed create such MEMs before reload for this testcase.  However,
+   (post_modify ...) is not a valid address for coprocessor loads, so
+   for -mfloat-abi=softfp, reload reloads the POST_MODIFY into a base
+   register.  GCC did not deal correctly with cases where the base and
+   index of the POST_MODIFY are themselves reloaded.  */
+#define NITER 4
+#define NVARS 20
+#define MULTI(X) \
+  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
+  X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19)
+
+#define DECLAREI(INDEX) inc##INDEX = incs[INDEX]
+#define DECLAREF(INDEX) *ptr##INDEX = ptrs[INDEX], result##INDEX = 0
+#define LOOP(INDEX) result##INDEX += *ptr##INDEX, ptr##INDEX += inc##INDEX
+#define COPYOUT(INDEX) results[INDEX] = result##INDEX
+
+float *ptrs[NVARS];
+float results[NVARS];
+int incs[NVARS];
+
+void __attribute__((noinline))
+foo (int n)
+{
+  int MULTI (DECLAREI);
+  float MULTI (DECLAREF);
+  while (n--)
+    MULTI (LOOP);
+  MULTI (COPYOUT);
+}
+
+float input[NITER * NVARS];
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < NVARS; i++)
+    ptrs[i] = input + i, incs[i] = i;
+  for (i = 0; i < NITER * NVARS; i++)
+    input[i] = i;
+  foo (NITER);
+  for (i = 0; i < NVARS; i++)
+    if (results[i] != i * NITER * (NITER + 1) / 2)
+      return 1;
+  return 0;
+}
Index: gcc/testsuite/gcc.c-torture/execute/pr28982b.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr28982b.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr28982b.c	(revision 0)
@@ -0,0 +1,58 @@
+/* Like pr28982a.c, but with the spill slots outside the range of
+   a single sp-based load on ARM.  This test tests for cases where
+   the addresses in the base and index reloads require further reloads.  */
+#if defined(STACK_SIZE) && STACK_SIZE <= 0x80100
+int main (void) { return 0; }
+#else
+#define NITER 4
+#define NVARS 20
+#define MULTI(X) \
+  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
+  X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19)
+
+#define DECLAREI(INDEX) inc##INDEX = incs[INDEX]
+#define DECLAREF(INDEX) *ptr##INDEX = ptrs[INDEX], result##INDEX = 0
+#define LOOP(INDEX) result##INDEX += *ptr##INDEX, ptr##INDEX += inc##INDEX
+#define COPYOUT(INDEX) results[INDEX] = result##INDEX
+
+float *ptrs[NVARS];
+float results[NVARS];
+int incs[NVARS];
+
+struct big { int i[0x10000]; };
+void __attribute__((noinline))
+bar (struct big b)
+{
+  incs[0] += b.i[0];
+}
+
+void __attribute__((noinline))
+foo (int n)
+{
+  struct big b = {};
+  int MULTI (DECLAREI);
+  float MULTI (DECLAREF);
+  while (n--)
+    MULTI (LOOP);
+  MULTI (COPYOUT);
+  bar (b);
+}
+
+float input[NITER * NVARS];
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < NVARS; i++)
+    ptrs[i] = input + i, incs[i] = i;
+  for (i = 0; i < NITER * NVARS; i++)
+    input[i] = i;
+  foo (NITER);
+  for (i = 0; i < NVARS; i++)
+    if (results[i] != i * NITER * (NITER + 1) / 2)
+      return 1;
+  return 0;
+}
+#endif
Follow-Ups:
- Re: PR 28982: reload deals incorrectly with automodified addresses
  - From: Ian Lance Taylor
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]