This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RFA: Prevent double execution of MIPS call stubs

Richard, this is the result of our conversation a few weeks ago.  You
aren't going to like it, I suspect :-)

MIPS lazy binding stubs are similar to PLT entries on other platforms,
though not quite the same.  Every call site loads the address of the
function to call from the GOT into $t9 and jumps to it.  Initially the
GOT entry points to a unique lazy binding stub, which invokes the
dynamic linker to fill in the GOT entry.  Later on the entry points
to the target function in some loaded shared library and the stub is
completely bypassed.

One consequence of this is that hoisting that load from the GOT out
of a loop can do any of these things:

  - Go fast.  If the stub has already run once, then we'll get the
    final procedure address as expected.

  - Crash.  The lazy binding stubs aren't supposed to be called twice.
    This is what happened originally, when Richard first fixed this

  - Go really really REALLY slow.  We will invoke the stub every time
    through the loop.  Some of the libstdc++ tests trigger this, since
    they consist of a single loop in main testing some library
    routine.  They go from thirty seconds to ten minutes.

So what GCC used to do was define every GOT load as an unspec which
referenced a call-clobbered pseudo-register named FAKE_CALL_REGNO.
This let the GOT load be moved freely past other memory references,
which it would never alias, but never past a call.

That stopped working between 4.1 and 4.2 with part of the new dataflow
infrastructure.  It deliberately treats clobbers as weaker than sets,
so it detected that the load used an unspecified value of
FAKE_CALL_REGNO.  It then showed no reservations of changing which
clobbered value reached the load; one from the previous call was fine,
or one from a call later down the loop was fine too.  Good for it!
Bad for MIPS though.

I tried a couple of other things and in the end I ended up doing this
the same way as various other ports, including CRIS.  I changed the
MIPS port to use a MEM for the GOT load, in alias set 0.  Richard
warned me that this would make schedules more constrained; it does,
but benchmarking repeatedly did not reveal any measurable difference
in performance.  So I think this is an acceptable compromise.  If
someone wants to make it go faster now, then I recommend doing it by
fixing this:

      /* This MEM doesn't alias anything - not even alias set 0,
	 though there is no way to record that.  */

Of course we'd have to make calls still interfere with such alias sets
or we'd be right back where we started.

Tested on mips-linux using C, C++, and GDB.  I also bootstrapped
it on mips-linux after much screaming and pounding of fists, but
I couldn't build libjava (problems with tar, zip) and I couldn't
get dejagnu working on the target so I fell back to cross testing.
But the compiler did bootstrap.  And I ran several benchmarks
before trying to quantify the performance impact.

OK to commit?  Any other suggested approaches?  Any brilliant ideas
that would let us hoist the load anyway but avoid this pessimization
on the first execution?

Daniel Jacobowitz

2007-10-17  Daniel Jacobowitz  <>

	(load_call<mode>): Delete.
	* config/mips/mips.c (mips_load_call_address): Use a MEM.
	* config/mips/mips.h: Do not mention FAKE_CALL_REGNO.
	(REGISTER_NAMES): Remove $fakec.

Index: config/mips/
--- config/mips/	(revision 181978)
+++ config/mips/	(working copy)
@@ -41,7 +41,6 @@
    (UNSPEC_GP			25)
@@ -55,8 +54,6 @@
    ;; For MIPS Paired-Singled Floating Point Instructions.
    (UNSPEC_MOVE_TF_PS		200)
@@ -5613,32 +5610,6 @@
 ;;  ....................
-;; Instructions to load a call address from the GOT.  The address might
-;; point to a function or to a lazy binding stub.  In the latter case,
-;; the stub will use the dynamic linker to resolve the function, which
-;; in turn will change the GOT entry to point to the function's real
-;; address.
-;; This means that every call, even pure and constant ones, can
-;; potentially modify the GOT entry.  And once a stub has been called,
-;; we must not call it again.
-;; We represent this restriction using an imaginary fixed register that
-;; acts like a GOT version number.  By making the register call-clobbered,
-;; we tell the target-independent code that the address could be changed
-;; by any call insn.
-(define_insn "load_call<mode>"
-  [(set (match_operand:P 0 "register_operand" "=d")
-	(unspec:P [(match_operand:P 1 "register_operand" "r")
-		   (match_operand:P 2 "immediate_operand" "")
-		   (reg:P FAKE_CALL_REGNO)]
-  "<load>\t%0,%R2(%1)"
-  [(set_attr "type" "load")
-   (set_attr "mode" "<MODE>")
-   (set_attr "length" "4")])
 ;; Sibling calls.  All these patterns use jump instructions.
 ;; If TARGET_SIBCALLS, call_insn_operand will only accept constant
Index: config/mips/mips.c
--- config/mips/mips.c	(revision 181978)
+++ config/mips/mips.c	(working copy)
@@ -3680,14 +3684,19 @@ mips_load_call_address (rtx dest, rtx ad
       && mips_ok_for_lazy_binding_p (addr))
       rtx high, lo_sum_symbol;
+      rtx got_addr, got_mem;
       high = mips_unspec_offset_high (dest, pic_offset_table_rtx,
 				      addr, SYMBOL_GOTOFF_CALL);
       lo_sum_symbol = mips_unspec_address (addr, SYMBOL_GOTOFF_CALL);
-      if (Pmode == SImode)
-	emit_insn (gen_load_callsi (dest, high, lo_sum_symbol));
-      else
-	emit_insn (gen_load_calldi (dest, high, lo_sum_symbol));
+      got_addr = gen_rtx_LO_SUM (Pmode, high, lo_sum_symbol);
+      got_mem = gen_rtx_MEM (Pmode, got_addr);
+      /* This MEM doesn't alias anything - not even alias set 0,
+	 though there is no way to record that.  */
+      set_mem_alias_set (got_mem, new_alias_set ());
+      MEM_NOTRAP_P (got_mem) = 1;
+      emit_move_insn (dest, got_mem);
     emit_move_insn (dest, addr);
Index: config/mips/mips.h
--- config/mips/mips.h	(revision 181978)
+++ config/mips/mips.h	(working copy)
@@ -1349,11 +1349,10 @@ extern const struct mips_rtx_cost_data *
    - 8 condition code registers
    - 2 accumulator registers (hi and lo)
    - 32 registers each for coprocessors 0, 2 and 3
-   - 3 fake registers:
+   - 2 fake registers:
-	- FAKE_CALL_REGNO (see the comment above load_callsi for details)
-   - 3 dummy entries that were used at various times in the past.
+   - 4 dummy entries that were used at various times in the past.
    - 6 DSP accumulator registers (3 hi-lo pairs) for MIPS DSP ASE
    - 6 DSP control registers  */
@@ -2519,7 +2518,7 @@ typedef struct mips_args {
   "$f16", "$f17", "$f18", "$f19", "$f20", "$f21", "$f22", "$f23",	   \
   "$f24", "$f25", "$f26", "$f27", "$f28", "$f29", "$f30", "$f31",	   \
   "hi",   "lo",   "",     "$fcc0","$fcc1","$fcc2","$fcc3","$fcc4",	   \
-  "$fcc5","$fcc6","$fcc7","", "", "$arg", "$frame", "$fakec",		   \
+  "$fcc5","$fcc6","$fcc7","", "", "$arg", "$frame", "",			   \
   "$c0r0", "$c0r1", "$c0r2", "$c0r3", "$c0r4", "$c0r5", "$c0r6", "$c0r7",  \
   "$c0r8", "$c0r9", "$c0r10","$c0r11","$c0r12","$c0r13","$c0r14","$c0r15", \
   "$c0r16","$c0r17","$c0r18","$c0r19","$c0r20","$c0r21","$c0r22","$c0r23", \

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]