This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH i386] Allow sibcalls in no-PLT PIC


On Fri, May 15, 2015 at 1:42 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, May 15, 2015 at 01:35:14PM -0700, H.J. Lu wrote:
>> On Fri, May 15, 2015 at 1:23 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, May 15, 2015 at 01:08:15PM -0700, H.J. Lu wrote:
>> >> With relax branch in 32-bit, there are 2 cases:
>> >>
>> >> 1. PIC or PIE:  We generate
>> >>
>> >> set up EBX
>> >> relax call foo@PLT
>> >>
>> >> It is almost the same as we do now, except for the relax prefix.
>> >> If foo is defined in another shared library or may be preempted,
>> >> linker will generate
>> >>
>> >> call *foo@GOTPLT(%ebx)
>> >>
>> >> If foo turns out local, linker will output
>> >>
>> >> relax call foo
>> >
>> > This does not address the initial and primary motivation for no-plt on
>> > 32-bit: eliminating the awful codegen constraint costs of the
>> > GOT-register (ebx, and equivalent on other targets) ABI for calling
>> > PLT entries. If instead you generated code that sets up an expression
>> > for the GOT slot using arbitrary registers, and relaxed it to a direct
>> > call (possibly rendering the register setup useless), it would be
>> > comparable to the no-plt approach. So for example:
>> >
>> > set up ecx (or whatever register)
>> > relax call *foo@GOT(%ecx)
>> >
>> > and relax to:
>> >
>> > set up ecx (or whatever register; now useless)
>> > relax call foo
>> >
>> > But the no-plt approach is still superior in that the address load
>> > from the GOT can be hoisted out of loops, etc., resulting in something
>> > like:
>> >
>> > call *%esi
>> >
>> > This could be valuable in loops calling a math function repeatedly,
>> > for example.
>> >
>> > Overall I'm still not a fan of the relaxation approach. There are very
>> > few places it would actually help that couldn't already be improved
>> > better with use of visibility, and it can't give codegen as good as
>> > no-plt option.
>>
>> With no-plt option, compiler has to know if a function is external
>> or may be preempted.
>
> I still don't see significant practical cases where the linker would
> know this but the compiler can't. If you use visibility properly, the
> compiler knows, and if you do LTO and -Bsymbolic[-functions], the
> compiler should have that information available at LTO time (this is
> an enhancement that needs to be made, though).

There is code like

extern void foo (void);

void
bar (void)
{
  foo ();
}

Even with LTO, the compiler may have to assume foo is external
when foo is compiled with LTO.
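
As a rough illustration of what the compiler can and cannot tell from
the source alone, here is a minimal sketch using the GCC visibility
attribute.  The names call_count, foo_default, foo_hidden and
call_both are made up for this example and are not part of the patch.

```c
#include <stdio.h>

/* Hypothetical names, for illustration only. */
static int call_count;

/* Default visibility: when built into a shared object, this symbol
   may be interposed, so a PIC compiler cannot assume a call to it
   binds locally. */
void foo_default(void)
{
    call_count += 1;
}

/* Hidden visibility: the symbol cannot be preempted from outside the
   module, so the compiler is free to emit a direct call. */
__attribute__((visibility("hidden"))) void foo_hidden(void)
{
    call_count += 10;
}

/* Calls both functions and returns the accumulated count. */
int call_both(void)
{
    call_count = 0;
    foo_default();
    foo_hidden();
    return call_count;
}
```

Without such an annotation, a bare "extern void foo (void);" gives the
compiler nothing to go on, which is the case above.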

>> If compiler guessed wrong, the generated
>> DSO or executable will always go through indirect branch even
>> though the target is local.
>
> The only way this is avoided now is with -Bsymbolic[-functions] which
> is not widely used. Otherwise interposition is always allowed for
> default-visibility functions, so I don't see how the indirect branch
> here is suboptimal.

Relax branch is meant to avoid indirect branches to local targets.  If
you don't think an indirect branch to a local target is a performance
issue, relax branch isn't for you.

>> With relax branch, the decision is left
>> to linker.  Of course, EBX must be used unless we add a new PLT
>> relocation for each register used to hold the GOT base, like
>>
>> relax call foo@PLT_ECX
>> relax call foo@PLT_EDX
>
> No, that's not needed. If the linker doesn't make the relaxation, the
> instruction the compiler generated remains in place, and has the
> effective address expression using whichever register it wanted:
>
> relax call *foo@GOT(%ecx)
> relax call *foo@GOT(%edx)
> etc.

Relax branch is only used for direct branches; it isn't for indirect
branches.  I will implement

relax call foo@PLT(%reg)

The compiler can pick any register to hold the GOT base.  Lazy
binding is supported only when EBX is used.
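
The decision the linker would make can be sketched roughly as below.
This is only a model of the behavior described in this thread, not
linker code: the enum and function names are hypothetical, and the
assumption that a non-EBX register gives an indirect call without
lazy binding is my reading of the above.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in GCC or binutils. */
enum got_reg { REG_EBX, REG_ECX, REG_EDX };

enum call_form {
    CALL_DIRECT,         /* relaxed to: call foo */
    CALL_INDIRECT_LAZY,  /* indirect call, GOT base in EBX, lazy binding possible */
    CALL_INDIRECT_EAGER  /* indirect call, GOT base elsewhere, no lazy binding (assumed) */
};

/* Pick the call form for "relax call foo@PLT(%reg)". */
enum call_form relax_call(bool target_is_local, enum got_reg reg)
{
    if (target_is_local)
        return CALL_DIRECT;  /* the register is no longer needed */
    return reg == REG_EBX ? CALL_INDIRECT_LAZY : CALL_INDIRECT_EAGER;
}
```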

> If the linker chooses to relax it to a direct call, no register at all
> is needed, so the linker can just throw this away and use:
>
> call foo
>
> for all of them.
>
> Rich



-- 
H.J.

