This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Expand PIC calls without PLT with -fno-plt


On Mon, 4 May 2015, Jeff Law wrote:
> On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> > On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> > > On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> > > > This patch introduces option -fno-plt that allows to expand calls
> > > > that would go via PLT to load the address of the function
> > > > immediately at call site (which introduces a GOT load).  Cover
> > > > letter explains the motivation for this patch.
> > > >
> > > > New option documentation for invoke.texi is missing from the patch;
> > > > if this is accepted I'll be happy to send a v2 with documentation
> > > > added.
> > > >
> > > >  * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> > > >  indirect call by forcing address into a pseudo with -fno-plt.
> > > >  * common.opt (flag_plt): New option.
> > > OK once you cobble together the invoke.texi changes.
> >
> > Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
> > inline the plt slot's first part, then lazy binding will work fine.
> I must have missed Alan/Michael's message.
> 
> ISTM the win here is that by going through the GOT, you can CSE the GOT
> reference and possibly get some more register allocation freedom.  Is that
> still the case with Alan/Michael's approach?

If the same PLT stubs as today are to be used, that constrains the compiler on
32-bit x86 and possibly other arches where PLT stubs need the GOT pointer in a
specific register.  It's possible to imagine more complex PLT stubs that
obtain the GOT pointer on their own, but in that case you can't let
optimizations such as loop invariant motion move the GOT load away from the
call in a fashion that could result in the PLT stub pointer being reused many
times.
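
To make the hoisting concern concrete, here is a small C model of a single
lazily bound GOT slot.  It is only a sketch: got_slot, lazy_stub, real_callee
and the other names are hypothetical stand-ins, not GCC or dynamic linker
code, and the "stub" simply patches the slot on first use.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of one lazily bound GOT slot (not real ld.so code). */
typedef int (*fn_t)(int);

static int real_callee(int x) { return x + 1; }

static int stub_hits;               /* how many times the lazy stub ran */
static int lazy_stub(int x);
static fn_t got_slot = lazy_stub;   /* before binding, the slot points at the stub */

/* The stub "resolves" the symbol, patches the slot and forwards the call. */
static int lazy_stub(int x)
{
    stub_hits++;
    got_slot = real_callee;
    return real_callee(x);
}

static void reset_model(void)
{
    got_slot = lazy_stub;
    stub_hits = 0;
}

/* Slot is loaded at every call: only the first call goes through the stub. */
static int call_reloading(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += got_slot(i);
    return sum;
}

/* Slot load hoisted out of the loop: the stale stub pointer is reused. */
static int call_hoisted(int n)
{
    fn_t f = got_slot;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += f(i);
    return sum;
}

/* Example driver: call this from main() to compare the two loops. */
static void demo(void)
{
    reset_model();
    assert(call_reloading(4) == 10 && stub_hits == 1);
    printf("reloading: stub ran %d time(s)\n", stub_hits);

    reset_model();
    assert(call_hoisted(4) == 10 && stub_hits == 4);
    printf("hoisted:   stub ran %d time(s)\n", stub_hits);
}
```

Both loops compute the same sum, but in this model the hoisted version enters
the stub on every iteration.  With eager binding the slot already holds the
final address when the loop starts, so hoisting the load is harmless.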

Going ahead with this patch now allows anyone to play with no-PLT codegen on
any architecture.  As you can see from this series, on x86 it uncovered several
codegen blunders (and fixing those should improve normal codegen as well -- so
everybody wins).

Below is my proposed patch for invoke.texi.  Still OK to check in?

	* doc/invoke.texi (Code Generation Options): Add -fno-plt.
	([-fno-plt]): Document.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 520c2c5..fd4199c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1122,7 +1122,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
--fpcc-struct-return  -fpic  -fPIC -fpie -fPIE @gol
+-fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
 -freg-struct-return  -fshort-enums @gol
@@ -23615,6 +23615,16 @@ used during linking.
 @code{__pie__} and @code{__PIE__}.  The macros have the value 1
 for @option{-fpie} and 2 for @option{-fPIE}.
 
+@item -fno-plt
+@opindex fno-plt
+Do not use the PLT for external function calls in position-independent code.
+Instead, load the callee address at call sites from the GOT and branch to it.
+This leads to more efficient code by eliminating PLT stubs and exposing
+GOT loads to optimizations.  On architectures such as 32-bit x86 where
+PLT stubs expect the GOT pointer in a specific register, this gives more
+register allocation freedom to the compiler.  Lazy binding requires use of
+the PLT; with @option{-fno-plt} all external symbols are resolved at load time.
+
 @item -fno-jump-tables
 @opindex fno-jump-tables
 Do not use jump tables for switch statements even where it would be

