This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Expand PIC calls without PLT with -fno-plt


On Wed, May 6, 2015 at 11:37 AM, Rich Felker <dalias@libc.org> wrote:
> On Wed, May 06, 2015 at 11:26:29AM -0700, H.J. Lu wrote:
>> On Wed, May 6, 2015 at 10:35 AM, Rich Felker <dalias@libc.org> wrote:
>> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
>> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
>> >> > The linker would know very well what kind of relocations are used for
>> >> > a particular PLT slot, and for the new relocations which would resolve to the
>> >> > address of the .got.plt slot it could just tweak the corresponding 3rd insn
>> >> > in the slot to jump not to first plt slot - 16, but to a few bytes before that,
>> >> > to code that would just load the address of _G_O_T_ into %ebx and then fall through
>> >> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few ticks
>> >> > slower in that case, but with no requirement on %ebx to contain _G_O_T_.
>> >>
>> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT stub.
>> >
>> > Indeed. And the situation is the same on almost all targets. The only
>> > exceptions are those with direct PC-relative addressing (like x86_64)
>> > and those with reserved inter-procedural linkage registers and
>> > efficient PC-relative address loading via them (like ARM and AArch64).
>> > MIPS (o32) is also an interesting exception in that the normal ABI is
>> > already PLT-free, and while callees need a PIC register loaded, it's a
>> > call-clobbered register, not a call-saved one, so it doesn't make the
>> > same kind of trouble.
>> >
>> > I really don't see a need to make no-PLT code gen support lazy binding
>> > when it's necessarily going to be costly to do so, and precludes most
>> > of the benefits of the no-PLT approach. Anyone still wanting/needing
>> > lazy binding semantics can use PLT, and can even choose on a per-TU
>> > basis (or maybe even more fine-grained with pragmas/attributes?).
>> > Those of us who are suffering the cost of PLT with no benefits
>> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
>> > adding -fno-plt) and enjoy something like a 10% performance boost in
>> > PIC/PIE.
>> >
>>
>> There are things the compiler can do for performance and correctness
>> if it is told what options will be passed to the linker.  -z now is one and
>> -Bsymbolic is another:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
>>
>> I think we should add -fnow and -fsymbolic.  Together with LTO,
>> we can generate faster executables as well as shared libraries.
>
> I don't see how knowing about -Bsymbolic can help the compiler
> optimize. Without visibility, it can't know whether the symbols will
> be defined in the same DSO. With visibility, it can already do the
> equivalent hints. Perhaps it helps in the case where the symbol is
> already defined (and non-weak) in the same TU, but I think in this
> case it should already be optimizing the reference. Symbol
> interposition over top of a non-weak symbol from the same TU is always
> invalid and the compiler should not be pessimizing code to make it
> work.

-Bsymbolic binds all references to local definitions in shared libraries,
with or without visibility, weak or non-weak.  The compiler can use this
in binds_tls_local_p, and we can generate much better code in shared
libraries.
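
To make the binding question concrete, here is a minimal sketch in C.  The
function names are hypothetical and not taken from GCC or from this thread,
and the comments describe typical ELF behavior, which may differ by target
and compiler version.

```c
/* Hypothetical shared-library source; all names are illustrative only. */

/* Default visibility: when compiled with -fPIC for a shared library, the
   compiler typically has to assume this definition could be interposed,
   so calls to it from the same file are not bound directly. */
int exported_helper(int x)
{
    return x + 1;
}

/* Hidden visibility: the compiler already knows this binds locally and
   can call it directly (or inline it). */
__attribute__((visibility("hidden")))
int hidden_helper(int x)
{
    return x + 1;
}

/* Entry point that calls both helpers. */
int api_entry(int x)
{
    return exported_helper(x) + hidden_helper(x);
}
```

Assuming this is built with something like gcc -O2 -fPIC -shared, adding
-Wl,-Bsymbolic lets the linker bind the exported_helper reference locally,
but by then the compiler has already generated the call as if it could be
interposed.  Telling the compiler up front, as the proposed -fsymbolic
would, is meant to close that gap.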

> As for -fnow, I haven't thought about it much but I also don't see
> many places where it could help. The only benefit that comes to mind
> is on targets with weak memory order, where it would eliminate some of
> the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
> work on AArch64). It might also benefit PLT calls on such targets, but
> you would get a lot more benefit from -fno-plt, and in that case -fnow
> would not allow any further optimization.
>

-fno-plt doesn't work with lazy binding.  -fnow tells the compiler that
lazy binding is not used, so it can optimize without the PLT.  With
-flto -fnow, the compiler can make much better choices.
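
As a rough sketch of the kind of call site this affects, consider the C
function below.  The name name_length is hypothetical; strlen stands in for
any function defined in another shared object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical caller.  strlen is defined in libc, so in PIC/PIE code the
   call normally goes through a PLT stub; with -fno-plt the compiler is
   expected to emit an indirect call through the GOT entry instead. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

Compiling this with gcc -O2 -fPIC -S, once with and once without -fno-plt,
and comparing the generated assembly should show the difference at the
call; the exact instruction sequence depends on the target and the GCC
version.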

-- 
H.J.

