This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RFA: Prevent double execution of MIPS call stubs
- From: Daniel Jacobowitz <drow at false dot org>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Richard Sandiford <rsandifo at nildram dot co dot uk>, Eric Christopher <echristo at apple dot com>
- Date: Wed, 17 Oct 2007 09:16:45 -0400
- Subject: RFA: Prevent double execution of MIPS call stubs
Richard, this is the result of our conversation a few weeks ago. You
aren't going to like it, I suspect :-)
MIPS lazy binding stubs are similar to PLT entries on other platforms,
though not quite the same. Every call site loads the address of the
function to call from the GOT into $t9 and jumps to it. Initially the
GOT entry points to a unique lazy binding stub, which invokes the
dynamic linker to fill in the GOT entry. Later on the entry points
to the target function in some loaded shared library and the stub is
completely bypassed.
One consequence of this is that hoisting that load from the GOT out
of a loop can do any of these things:
- Go fast. If the stub has already run once, then we'll get the
final procedure address as expected.
- Crash. The lazy binding stubs aren't supposed to be called twice.
This is what happened originally, when Richard first fixed this
problem.
- Go really really REALLY slow. We will invoke the stub every time
through the loop. Some of the libstdc++ tests trigger this, since
they consist of a single loop in main testing some library
routine. They go from thirty seconds to ten minutes.
So what GCC used to do was define every GOT load as an unspec which
referenced a call-clobbered pseudo-register named FAKE_CALL_REGNO.
This let the GOT load be moved freely past other memory references,
which it would never alias, but never past a call.
That stopped working between 4.1 and 4.2 with part of the new dataflow
infrastructure. It deliberately treats clobbers as weaker than sets,
so it detected that the load used an unspecified value of
FAKE_CALL_REGNO. It then showed no reservations of changing which
clobbered value reached the load; one from the previous call was fine,
or one from a call later down the loop was fine too. Good for it!
Bad for MIPS though.
I tried a couple of other things and in the end I ended up doing this
the same way as various other ports, including CRIS. I changed the
MIPS port to use a MEM for the GOT load, in alias set 0. Richard
warned me that this would make schedules more constrained; it does,
but benchmarking repeatedly did not reveal any measurable difference
in performance. So I think this is an acceptable compromise. If
someone wants to make it go faster now, then I recommend doing it by
fixing this:
/* This MEM doesn't alias anything - not even alias set 0,
though there is no way to record that. */
Of course we'd have to make calls still interfere with such alias sets
or we'd be right back where we started.
Tested on mips-linux using C, C++, and GDB. I also bootstrapped
it on mips-linux after much screaming and pounding of fists, but
I couldn't build libjava (problems with tar, zip) and I couldn't
get dejagnu working on the target so I fell back to cross testing.
But the compiler did bootstrap. And I ran several benchmarks
before trying to quantify the performance impact.
OK to commit? Any other suggested approaches? Any brilliant ideas
that would let us hoist the load anyway but avoid this pessimization
on the first execution?
--
Daniel Jacobowitz
CodeSourcery
2007-10-17 Daniel Jacobowitz <dan@codesourcery.com>
* config/mips/mips.md (UNSPEC_LOAD_CALL, FAKE_CALL_REGNO)
(load_call<mode>): Delete.
* config/mips/mips.c (mips_load_call_address): Use a MEM.
* config/mips/mips.h: Do not mention FAKE_CALL_REGNO.
(REGISTER_NAMES): Remove $fakec.
Index: config/mips/mips.md
===================================================================
--- config/mips/mips.md (revision 181978)
+++ config/mips/mips.md (working copy)
@@ -41,7 +41,6 @@
(UNSPEC_STORE_LEFT 20)
(UNSPEC_STORE_RIGHT 21)
(UNSPEC_LOADGP 22)
- (UNSPEC_LOAD_CALL 23)
(UNSPEC_LOAD_GOT 24)
(UNSPEC_GP 25)
(UNSPEC_MFHILO 26)
@@ -55,8 +54,6 @@
(UNSPEC_ADDRESS_FIRST 100)
- (FAKE_CALL_REGNO 79)
-
;; For MIPS Paired-Singled Floating Point Instructions.
(UNSPEC_MOVE_TF_PS 200)
@@ -5613,32 +5610,6 @@
;;
;; ....................
-;; Instructions to load a call address from the GOT. The address might
-;; point to a function or to a lazy binding stub. In the latter case,
-;; the stub will use the dynamic linker to resolve the function, which
-;; in turn will change the GOT entry to point to the function's real
-;; address.
-;;
-;; This means that every call, even pure and constant ones, can
-;; potentially modify the GOT entry. And once a stub has been called,
-;; we must not call it again.
-;;
-;; We represent this restriction using an imaginary fixed register that
-;; acts like a GOT version number. By making the register call-clobbered,
-;; we tell the target-independent code that the address could be changed
-;; by any call insn.
-(define_insn "load_call<mode>"
- [(set (match_operand:P 0 "register_operand" "=d")
- (unspec:P [(match_operand:P 1 "register_operand" "r")
- (match_operand:P 2 "immediate_operand" "")
- (reg:P FAKE_CALL_REGNO)]
- UNSPEC_LOAD_CALL))]
- "TARGET_USE_GOT"
- "<load>\t%0,%R2(%1)"
- [(set_attr "type" "load")
- (set_attr "mode" "<MODE>")
- (set_attr "length" "4")])
-
;; Sibling calls. All these patterns use jump instructions.
;; If TARGET_SIBCALLS, call_insn_operand will only accept constant
Index: config/mips/mips.c
===================================================================
--- config/mips/mips.c (revision 181978)
+++ config/mips/mips.c (working copy)
@@ -3680,14 +3684,19 @@ mips_load_call_address (rtx dest, rtx ad
&& mips_ok_for_lazy_binding_p (addr))
{
rtx high, lo_sum_symbol;
+ rtx got_addr, got_mem;
high = mips_unspec_offset_high (dest, pic_offset_table_rtx,
addr, SYMBOL_GOTOFF_CALL);
lo_sum_symbol = mips_unspec_address (addr, SYMBOL_GOTOFF_CALL);
- if (Pmode == SImode)
- emit_insn (gen_load_callsi (dest, high, lo_sum_symbol));
- else
- emit_insn (gen_load_calldi (dest, high, lo_sum_symbol));
+
+ got_addr = gen_rtx_LO_SUM (Pmode, high, lo_sum_symbol);
+ got_mem = gen_rtx_MEM (Pmode, got_addr);
+ /* This MEM doesn't alias anything - not even alias set 0,
+ though there is no way to record that. */
+ set_mem_alias_set (got_mem, new_alias_set ());
+ MEM_NOTRAP_P (got_mem) = 1;
+ emit_move_insn (dest, got_mem);
}
else
emit_move_insn (dest, addr);
Index: config/mips/mips.h
===================================================================
--- config/mips/mips.h (revision 181978)
+++ config/mips/mips.h (working copy)
@@ -1349,11 +1349,10 @@ extern const struct mips_rtx_cost_data *
- 8 condition code registers
- 2 accumulator registers (hi and lo)
- 32 registers each for coprocessors 0, 2 and 3
- - 3 fake registers:
+ - 2 fake registers:
- ARG_POINTER_REGNUM
- FRAME_POINTER_REGNUM
- - FAKE_CALL_REGNO (see the comment above load_callsi for details)
- - 3 dummy entries that were used at various times in the past.
+ - 4 dummy entries that were used at various times in the past.
- 6 DSP accumulator registers (3 hi-lo pairs) for MIPS DSP ASE
- 6 DSP control registers */
@@ -2519,7 +2518,7 @@ typedef struct mips_args {
"$f16", "$f17", "$f18", "$f19", "$f20", "$f21", "$f22", "$f23", \
"$f24", "$f25", "$f26", "$f27", "$f28", "$f29", "$f30", "$f31", \
"hi", "lo", "", "$fcc0","$fcc1","$fcc2","$fcc3","$fcc4", \
- "$fcc5","$fcc6","$fcc7","", "", "$arg", "$frame", "$fakec", \
+ "$fcc5","$fcc6","$fcc7","", "", "$arg", "$frame", "", \
"$c0r0", "$c0r1", "$c0r2", "$c0r3", "$c0r4", "$c0r5", "$c0r6", "$c0r7", \
"$c0r8", "$c0r9", "$c0r10","$c0r11","$c0r12","$c0r13","$c0r14","$c0r15", \
"$c0r16","$c0r17","$c0r18","$c0r19","$c0r20","$c0r21","$c0r22","$c0r23", \