This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH i386] Allow sibcalls in no-PLT PIC


On Tue, May 19, 2015 at 05:10:11PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:54 PM, Rich Felker <dalias@libc.org> wrote:
> > On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 1:15 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson <rth@redhat.com> wrote:
> >> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson <rth@redhat.com> wrote:
> >> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >> >>>> I'm still mildly worried that concerns for supporting
> >> >> >>>> relaxation might lead to decisions not to optimize code in ways that
> >> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >> >>>
> >> >> >>> It is.  The relaxation that HJ is working on requires that the reads from the
> >> >> >>> got not be hoisted.  I'm not especially convinced that what he's working on is
> >> >> >>> a win.
> >> >> >>>
> >> >> >>> With LTO, the compiler can do the same job that he's attempting in the linker,
> >> >> >>> without an extra nop.  Without LTO, leaving it to the linker means that you
> >> >> >>> can't hoist the load and hide the memory latency.
> >> >> >>>
> >> >> >>
> >> >> >> My relax approach won't take away any optimization done by the compiler.
> >> >> >> It simply turns an indirect branch into a direct branch with a nop prefix at
> >> >> >> link-time.  I am having a hard time understanding why we shouldn't do it.
> >> >> >
> >> >> > I well understand what you're doing.
> >> >> >
> >> >> > But my point is that the only time the compiler should present you with the
> >> >> > form of indirect branch you're looking for is when there's no place to hoist
> >> >> > the load.
> >> >> >
> >> >> > At which point, is it really worth adding a new relocation to the ABI?  Is it
> >> >> > really worth adding new code to the linker that won't be exercised often?
> >> >>
> >> >> I believe there are plenty of indirect branches via GOT when compiling
> >> >> PIE/PIC with -fno-plt:
> >> >>
> >> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> >> extern void foo (void);
> >> >>
> >> >> void
> >> >> bar (void)
> >> >> {
> >> >>   foo ();
> >> >> }
> >> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> >> [hjl@gnu-6 gcc]$ cat x.s
> >> >> .file "x.c"
> >> >> .section .text.unlikely,"ax",@progbits
> >> >> .LCOLDB0:
> >> >> .text
> >> >> .LHOTB0:
> >> >> .p2align 4,,15
> >> >> .globl bar
> >> >> .type bar, @function
> >> >> bar:
> >> >> .LFB0:
> >> >> .cfi_startproc
> >> >> jmp *foo@GOTPCREL(%rip)
> >> >> .cfi_endproc
> >> >> .LFE0:
> >> >> .size bar, .-bar
> >> >
> >> > I agree these exist. What I question is whether the savings from the
> >> > linker being able to relax this to a direct call in the case where the
> >> > programmer failed to let the compiler make it a direct call to begin
> >> > with (by using hidden or protected visibility) are worth the cost of
> >> > not being able to hoist the load out of loops or schedule it earlier
> >> > in cases where relaxation is not possible because the call target is
> >> > not defined in the same DSO.
> >>
> >> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> >>
> >> [hjl@gnu-mic-2 gas]$ file as-new
> >> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> >> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> >> stripped
> >> [hjl@gnu-mic-2 gas]$
> >>
> >> There are 43:
> >>
> >> ff 25 21 93 2d 00     jmpq   *0x2d9321(%rip)        # 3d5f58 <_DYNAMIC+0x1e8>
> >>
> >> and 1983
> >>
> >> ff 15 eb f4 38 00     callq  *0x38f4eb(%rip)        # 3d60e0 <_DYNAMIC+0x370>
> >
> > How many of those would be relaxed? I suspect it depends a lot on
> > whether libbfd is static or shared.
> 
> When shared libraries are enabled, there are 177 indirect branches
> to locally defined functions.  Any call to a locally defined function
> that isn't compiled with LTO is indirect.

And are the above indirect calls/jumps (1983+43) candidates for
scheduling/hoisting the address load (that's not being done yet), or
are they the ones the compiler opted not to schedule/hoist? The win
from relaxation seems small here, but as long as you're not going to
block optimizations that would preclude relaxing, I don't see any
disadvantages to doing it.
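
For readers following the hoisting question, here is a minimal C sketch of the two call shapes under discussion. It is not from the thread: `got_slot`, `call_reloading`, and `call_hoisted` are hypothetical names, and `got_slot` is only a stand-in for a GOT entry. It is marked `volatile` purely so the compiler keeps the per-iteration load in the first function; a real GOT slot is not volatile. The first loop corresponds roughly to `call *foo@GOTPCREL(%rip)` on every iteration, the form a linker could relax; the second loads the address once into a local, roughly `call *%reg` inside the loop, which is the hoisted form that relaxation could not rewrite.

```c
/* Counts how often the callee ran, so the two variants can be compared. */
static unsigned long calls;

static void foo(void) { calls++; }

/* Hypothetical stand-in for foo's GOT slot (assumption, for illustration). */
static void (*volatile got_slot)(void) = foo;

/* Address is reloaded from the slot at every call site execution. */
unsigned long call_reloading(unsigned n)
{
    calls = 0;
    for (unsigned i = 0; i < n; i++)
        got_slot();                     /* load + indirect call each time */
    return calls;
}

/* Address load is hoisted out of the loop and kept in a local. */
unsigned long call_hoisted(unsigned n)
{
    void (*target)(void) = got_slot;    /* single load before the loop */
    calls = 0;
    for (unsigned i = 0; i < n; i++)
        target();                       /* indirect call through the local */
    return calls;
}
```

Both functions make the same number of calls; they differ only in where the address load happens, which is the trade-off between hiding load latency and keeping the call site relaxable.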

Rich

