This is the mail archive of the mailing list for the GCC project.


Re: RFA: Prevent double execution of MIPS call stubs

Daniel Jacobowitz writes:
> Richard, this is the result of our conversation a few weeks ago.  You
> aren't going to like it, I suspect :-)
>
> MIPS lazy binding stubs are similar to PLT entries on other platforms,
> though not quite the same.  Every call site loads the address of the
> function to call from the GOT into $t9 and jumps to it.  Initially the
> GOT entry points to a unique lazy binding stub, which invokes the
> dynamic linker to fill in the GOT entry.  Later on the entry points
> to the target function in some loaded shared library and the stub is
> completely bypassed.
>
> One consequence of this is that hoisting that load from the GOT out
> of a loop can do any of these things:
>   - Go fast.  If the stub has already run once, then we'll get the
>     final procedure address as expected.
>   - Crash.  The lazy binding stubs aren't supposed to be called twice.
>     This is what happened originally, when Richard first fixed this
>     problem.
>   - Go really really REALLY slow.  We will invoke the stub every time
>     through the loop.  Some of the libstdc++ tests trigger this, since
>     they consist of a single loop in main testing some library
>     routine.  They go from thirty seconds to ten minutes.
>
> So what GCC used to do was define every GOT load as an unspec which
> referenced a call-clobbered pseudo-register named FAKE_CALL_REGNO.
> This let the GOT load be moved freely past other memory references,
> which it would never alias, but never past a call.
>
> That stopped working between 4.1 and 4.2 with part of the new dataflow
> infrastructure.  It deliberately treats clobbers as weaker than sets,
> so it detected that the load used an unspecified value of
> FAKE_CALL_REGNO.  It then had no reservations about changing which
> clobbered value reached the load; one from the previous call was fine,
> or one from a call later down the loop was fine too.  Good for it!
> Bad for MIPS though.
>
> I tried a couple of other things, and in the end I did this
> the same way as various other ports, including CRIS.  I changed the
> MIPS port to use a MEM for the GOT load, in alias set 0.  Richard
> warned me that this would make schedules more constrained; it does,
> but benchmarking repeatedly did not reveal any measurable difference
> in performance.  So I think this is an acceptable compromise.  If
> someone wants to make it go faster now, then I recommend doing it by
> fixing this:
>
>       /* This MEM doesn't alias anything - not even alias set 0,
>          though there is no way to record that.  */
>
> Of course we'd have to make calls still interfere with such alias sets
> or we'd be right back where we started.

This isn't a full fix though: const functions don't clobber memory,
so loads of their address can still be hoisted.  E.g. for:

#include <math.h>

void
foo (double *x, int n)
{
  int i;

  for (i = 0; i < n; i++)
    x[i] = ceil (x[n]);
}

we still hoist &ceil after the patch.

> OK to commit?  Any other suggested approaches?  Any brilliant ideas
> that would let us hoist the load anyway but avoid this pessimization
> on the first execution?

Still thinking...

