This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Painful problems with -fpic implementation on powerpc-sysv



This is an analysis of two very nasty bugs with -fpic/-fPIC and
optimisation on powerpc SVR4 ABI targets.

Two testcases for the bugs are attached below, called 'test3.c' and
'test4.c'.  The original files were from TclX 8.0.

If these are compiled with 'gcc -O2 -fpic test3.c -S', gcc fails with
an internal compiler error.  For test3.c, it's:

test3.c: In function `TclX_MaxObjCmd':
test3.c:12: internal error--insn does not satisfy its constraints:

(insn 48 11 13 (set (reg:SI 11 r11)
        (unspec[ 
                (symbol_ref/u:SI ("*.LC2"))
                (reg:SI 65 lr)
            ]  8)) 398 {*movsi_got_internal} (nil)
    (nil))
toplev.c:1360: Internal compiler error in function fatal_insn


For test4.c:

test4.c: In function `TclX_MaxObjCmd':
test4.c:8: internal error -- needed new GOT register during reload phase to load:

(symbol_ref/u:SI ("*.LC0"))
toplev.c:1360: Internal compiler error in function fatal_insn


The underlying problem for both of these is the same.

The powerpc -fpic implementation works by allocating a pseudo to hold
the GOT pointer, when required, (in the routine rs6000_got_register).
In rs6000_finalise_pic, the pseudo is initialised.  The result is used
in the movsi_got_internal insn.

There are two problems with this:

1. rs6000_finalise_pic is run before reload.  But sometimes reload can
   create a new memory reference (see, for instance, test4.c).  So
   rs6000_finalise_pic does not know where all the memory references
   are, or even if there are any at all.

2. Worse, rs6000_finalise_pic is actually run before scheduling, and
   so the scheduler can get the dependencies wrong if new memory
   references appear later.  For instance, in test3.c after local
   register allocation, there is:

(insn 13 11 38 (set (reg:DI 84)
        (const_double (const_int 0) 0 2146435072)) 425 {*movdi_32} (nil)
    (expr_list:REG_EQUIV (const_double (const_int 0) 0 2146435072)
        (nil)))

(insn 38 13 14 (set (reg:SI 88)
        (unspec[ 
                (const_int 0)
            ]  7)) 512 {init_v4_pic} (nil)
    (nil))

But the first insn, number 13, is a memory reference.  It doesn't look
like it now, but pseudo 84 will be allocated to a floating-point
register, and the only way to load a constant into a FP register is
from memory.  So reload will push the const_double out to a constant
memory location.  The load from memory will require the GOT pointer.
But the GOT pointer is not initialised until the _next_ insn.

Worse, because reload doesn't know any better at this point, it will
decide that pseudo 88 can be left in the link register (which is where
init_v4_pic puts it initially), but it has to be moved to a general
register to be used in movsi_got_internal.

test4.c is similar, but here no init_v4_pic insn is ever emitted, and
this case is caught earlier.


How to fix?  Well, clearly, if the scheduler is going to move insns
around based on what registers they use, then it is impermissible to
add new occurrences of the GOT pseudo after init_v4_pic has been
emitted; they will confuse sched's dependency checking.

A partial patch to try to (1) ensure and (2) enforce this is attached
below.  It is much better at enforcement, so much so that
'make LANGUAGES=c' fails when it tries to compile crtbegin.o with the
patched egcs :-(.  It does however let the tests below compile, and if
you remove
+  if (too_late_for_got)
+    abort();
from the patch, it's somewhat usable---at least, no worse than before.

The problem here is that half the compiler happens after FINALIZE_PIC.
Perhaps init_v4_pic should be shifted down to just before reload?  Or
maybe the (unspec[(const_int 0)] 7) should be inserted directly into
the insns if we need to make a new GOT reference between FINALIZE_PIC
and reload (this happens often so it might hurt performance), then the
patch below can try to avoid creating new GOT references during and
after reload.

Can anyone think of a better way?  The other ports use a fixed
register for the GOT.  The advantage of the rs6000 technique is that
it lets the GOT pointer be spilled to memory, which actually happens
when compiling the internals of printf (among other places); and it
allows reuse of the GOT register if the procedure doesn't actually
access global data.


PS: The change to PREFERRED_RELOAD_CLASS should probably go in no
matter how the -fpic problems are fixed.

-- 
Geoffrey Keating <geoffk@ozemail.com.au>

===File ~/gcc-bugs/test3.c==================================
double
TclX_MaxObjCmd (int objc, double value)
{
    double maxValue = - (__extension__	((union { 
	   unsigned __l __attribute__((__mode__(__DI__)));
	   double __d;})
	{ __l: 0x7ff0000000000000ULL }).__d);
    if (value > maxValue) {
      maxValue = value;
    }
    return 0 ;
}
============================================================

===File ~/gcc-bugs/test4.c==================================
double
TclX_MaxObjCmd (int objc, double value)
{
    return - (__extension__ ((union {
	   unsigned __l __attribute__((__mode__(__DI__)));
	   double __d; })
	{ __l: 0x7ff0000000000000ULL }).__d)  ;
}
============================================================

===File ~/patches/egcs-3.diff===============================
Wed Aug 19 04:26:35 1998  Geoff Keating  <geoffk@ozemail.com.au>

	* config/rs6000/rs6000.c (too_late_for_got): New flag.
	(rs6000_init_expanders): Initialise it.
	(rs6000_finalize_pic): Set it.
	(rs6000_got_register): Check it.
	* config/rs6000/rs6000.h (AVOID_RELOAD_CONST_MEM): Use new flag.
	(PREFERRED_RELOAD_CLASS): Correct definition, to be like sparc.h.
	* reload.c (find_reloads): Use AVOID_RELOAD_CONST_MEM.

--- reload.c~	Wed Aug 19 04:13:00 1998
+++ reload.c	Wed Aug 19 04:21:35 1998
@@ -113,6 +113,10 @@ a register with any other reload.  */
 #ifndef REG_MODE_OK_FOR_BASE_P
 #define REG_MODE_OK_FOR_BASE_P(REGNO, MODE) REG_OK_FOR_BASE_P (REGNO)
 #endif
+
+#ifndef AVOID_RELOAD_CONST_MEM
+#define AVOID_RELOAD_CONST_MEM(X) 0
+#endif
 
 /* The variables set up by `find_reloads' are:
 
@@ -2995,7 +2999,8 @@ find_reloads (insn, replace, ind_levels,
 		  win = 1;
 		if (CONSTANT_P (operand)
 		    /* force_const_mem does not accept HIGH.  */
-		    && GET_CODE (operand) != HIGH)
+		    && GET_CODE (operand) != HIGH
+		    && !AVOID_RELOAD_CONST_MEM (operand))
 		  badop = 0;
 		constmemok = 1;
 		break;
@@ -3071,7 +3076,9 @@ find_reloads (insn, replace, ind_levels,
 			    || (reg_equiv_address[REGNO (operand)] != 0))))
 		  win = 1;
 		/* force_const_mem does not accept HIGH.  */
-		if ((CONSTANT_P (operand) && GET_CODE (operand) != HIGH)
+		if ((CONSTANT_P (operand)
+		     && GET_CODE (operand) != HIGH
+		     && !AVOID_RELOAD_CONST_MEM (operand))
 		    || GET_CODE (operand) == MEM)
 		  badop = 0;
 		constmemok = 1;
--- config/rs6000/rs6000.h~	Mon Aug 17 17:19:30 1998
+++ config/rs6000/rs6000.h	Wed Aug 19 04:25:31 1998
@@ -1105,8 +1105,11 @@ enum reg_class
    floating-point CONST_DOUBLE to force it to be copied to memory.  */
 
 #define PREFERRED_RELOAD_CLASS(X,CLASS)			\
-  ((GET_CODE (X) == CONST_DOUBLE			\
-    && GET_MODE_CLASS (GET_MODE (X)) == MODE_FLOAT)	\
+  (CONSTANT_P (X)					\
+   && ((CLASS) == FLOAT_REGS				\
+       || (GET_MODE_CLASS (GET_MODE (X)) == MODE_FLOAT	\
+	   && (HOST_FLOAT_FORMAT != IEEE_FLOAT_FORMAT	\
+	       || HOST_BITS_PER_INT != BITS_PER_WORD)))	\
    ? NO_REGS : (CLASS))
 
 /* Return the register class of a scratch register needed to copy IN into
@@ -1122,6 +1125,11 @@ enum reg_class
 #define SECONDARY_MEMORY_NEEDED(CLASS1,CLASS2,MODE) \
  ((CLASS1) != (CLASS2) && ((CLASS1) == FLOAT_REGS || (CLASS2) == FLOAT_REGS))
 
+/* There are some times when it is inconvenient to force a constant
+   to memory.  */
+
+#define AVOID_RELOAD_CONST_MEM(X) (flag_pic && too_late_for_got)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
@@ -3165,6 +3173,7 @@ extern int flag_pic;
 extern int optimize;
 extern int flag_expensive_optimizations;
 extern int frame_pointer_needed;
+extern int too_late_for_got;
 
 /* Declare functions in rs6000.c */
 extern void output_options ();
--- config/rs6000/rs6000.c~	Mon Aug 17 17:19:48 1998
+++ config/rs6000/rs6000.c	Wed Aug 19 04:26:16 1998
@@ -2263,16 +2263,22 @@ ccr_bit (op, scc_p)
     }
 }
 
+/* When this is 1, we cannot reliably return the GOT register.  */
+int too_late_for_got = 0;
+
 /* Return the GOT register, creating it if needed.  */
 
 struct rtx_def *
 rs6000_got_register (value)
      rtx value;
 {
+  if (too_late_for_got)
+    abort();
+
   if (!current_function_uses_pic_offset_table || !pic_offset_table_rtx)
     {
       if (reload_in_progress || reload_completed)
-	fatal_insn ("internal error -- needed new GOT register during reload phase to load:", value);
+	abort();
 
       current_function_uses_pic_offset_table = 1;
       pic_offset_table_rtx = gen_rtx_REG (Pmode, GOT_TOC_REGNUM);
@@ -2391,6 +2397,7 @@ rs6000_finalize_pic ()
 	    last_insn = insn;
 	}
 
+      too_late_for_got = 1;
       if (reg)
 	{
 	  rtx init = gen_init_v4_pic (reg);
@@ -2473,6 +2480,7 @@ rs6000_init_expanders ()
   rs6000_fpmem_size = 0;
   rs6000_fpmem_offset = 0;
   pic_offset_table_rtx = (rtx)0;
+  too_late_for_got = 0;
 
   /* Arrange to save and restore machine status around nested functions.  */
   save_machine_status = rs6000_save_machine_status;
============================================================


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]