This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Expand PIC calls without PLT with -fno-plt


On Wed, May 06, 2015 at 11:26:29AM -0700, H.J. Lu wrote:
> On Wed, May 6, 2015 at 10:35 AM, Rich Felker <dalias@libc.org> wrote:
> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
> >> > The linker would know very well what kind of relocations are used for
> >> > particular PLT slot, and for the new relocations which would resolve to the
> >> > address of the .got.plt slot it could just tweak corresponding 3rd insn
> >> > in the slot, to not jump to first plt slot - 16, but a few bytes before that
> >> > that would just load the address of _G_O_T_ into %ebx and then fallthru
> >> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few ticks
> >> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
> >>
> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT stub.
> >
> > Indeed. And the situation is the same on almost all targets. The only
> > exceptions are those with direct PC-relative addressing (like x86_64)
> > and those with reserved inter-procedural linkage registers and
> > efficient PC-relative address loading via them (like ARM and AArch64).
> > MIPS (o32) is also an interesting exception in that the normal ABI is
> > already PLT-free, and while callees need a PIC register loaded, it's a
> > call-clobbered register, not a call-saved one, so it doesn't make the
> > same kind of trouble.
> >
> > I really don't see a need to make no-PLT code gen support lazy binding
> > when it's necessarily going to be costly to do so, and precludes most
> > of the benefits of the no-PLT approach. Anyone still wanting/needing
> > lazy binding semantics can use PLT, and can even choose on a per-TU
> > basis (or maybe even more fine-grained with pragmas/attributes?).
> > Those of us who are suffering the cost of PLT with no benefits
> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
> > adding -fno-plt) and enjoy something like a 10% performance boost in
> > PIC/PIE.
> >
> 
> There are things the compiler can do for performance and correctness
> if it is told what options will be passed to the linker.  -z now is one and
> -Bsymbolic is another one:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
> 
> I think we should add -fnow and -fsymbolic.  Together with LTO,
> we can generate faster executables as well as shared libraries.

I don't see how knowing about -Bsymbolic can help the compiler
optimize. Without visibility, it can't know whether the symbols will
be defined in the same DSO. With visibility, it already gets the
equivalent hints. Perhaps it helps in the case where the symbol is
already defined (and non-weak) in the same TU, but I think in this
case it should already be optimizing the reference. Symbol
interposition over top of a non-weak symbol from the same TU is always
invalid and the compiler should not be pessimizing code to make it
work.
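
To make the visibility point concrete, here is a minimal sketch. The
function names are made up for illustration and do not come from the
patch or the bug report. It shows two definitions in the same TU, one
with default visibility and one marked hidden; the hidden one is what
already gives the compiler the "defined in this DSO" guarantee without
any -Bsymbolic knowledge.

```c
/* Default visibility: when this TU is built with -fPIC into a shared
   object, the symbol is exported and the compiler may treat calls to
   it as interposable, routing them through the GOT/PLT. */
int scale_default(int x)
{
    return x * 2;
}

/* Hidden visibility: the symbol cannot be referenced from outside the
   DSO, so the compiler knows the call binds locally and can emit a
   direct call (or inline it). */
__attribute__((visibility("hidden")))
int scale_hidden(int x)
{
    return x * 2;
}

/* Caller in the same TU; the two calls may be compiled differently
   under -fPIC even though both callees are defined right here. */
int scale_both(int x)
{
    return scale_default(x) + scale_hidden(x);
}
```

Comparing the output of `gcc -O2 -fPIC -S` for the two calls in
scale_both is one way to see what the compiler assumes in each case;
the exact code will depend on the target and GCC version.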

As for -fnow, I haven't thought about it much but I also don't see
many places where it could help. The only benefit that comes to mind
is on targets with weak memory order, where it would eliminate some of
the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
work on AArch64). It might also benefit PLT calls on such targets, but
you would get a lot more benefit from -fno-plt, and in that case -fnow
would not allow any further optimization.
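
For reference, a sketch of what opting out of the PLT for a single
callee can look like, assuming a GCC new enough to support the `noplt`
function attribute (documented from GCC 6 as the counterpart of
-fno-plt). `name_length` is a hypothetical helper used only for
illustration; compilers that do not know the attribute will typically
just warn and ignore it.

```c
#include <stddef.h>

/* Redeclare strlen with noplt: in position-independent code, calls to
   it are then made through its GOT entry instead of a PLT stub. This
   is the per-function analogue of building the whole TU with
   -fno-plt. */
extern size_t strlen(const char *s) __attribute__((noplt));

/* Hypothetical caller; with -fPIC on x86_64 the call would be expected
   to become an indirect call via the GOT rather than strlen@PLT. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

Since the GOT entry has to be resolved before such a call can be made,
there is no lazy binding left for the callee, which is why nothing
further is gained from also telling the compiler about -z now.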

Rich

