This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Expand PIC calls without PLT with -fno-plt


On Wed, May 06, 2015 at 06:24:58PM +0300, Alexander Monakov wrote:
> If the same PLT stubs as today are to be used, it constrains the compiler on
> 32-bit x86 and possibly other arches where PLT stubs need GOT pointer in a
> specific register.  It's possible to imagine more complex PLT stubs that
> obtain GOT pointer on their own, but in that case you can't let optimizations
> such as loop invariant motion move the GOT load away from the call in a
> fashion that could result in PLT stub pointer be reused many times.

Why?
The 32-bit x86 PLT (shouldn't we care much more about x86-64, where this is
a non-issue?) looks like:

4c2b7310 <_Unwind_Find_FDE@plt-0x10>:
4c2b7310:       ff b3 04 00 00 00       pushl  0x4(%ebx)
4c2b7316:       ff a3 08 00 00 00       jmp    *0x8(%ebx)
4c2b731c:       00 00                   add    %al,(%eax)
        ...

4c2b7320 <_Unwind_Find_FDE@plt>:
4c2b7320:       ff a3 0c 00 00 00       jmp    *0xc(%ebx)
4c2b7326:       68 00 00 00 00          push   $0x0
4c2b732b:       e9 e0 ff ff ff          jmp    4c2b7310

4c2b7330 <realloc@plt>:
4c2b7330:       ff a3 10 00 00 00       jmp    *0x10(%ebx)
4c2b7336:       68 08 00 00 00          push   $0x8
4c2b733b:       e9 d0 ff ff ff          jmp    4c2b7310

The linker would know very well what kind of relocations are used for a
particular PLT slot, and for the new relocations, which would resolve to the
address of the .got.plt slot, it could just tweak the corresponding 3rd insn
in the slot so that it jumps not to the first PLT slot - 16, but to a few
bytes before that, to code that would just load the address of _G_O_T_ into
%ebx and then fall through into the 0x4c2b7310 snippet above.  The lazy
binding would be a few ticks slower in that case, but there would be no
requirement on %ebx to contain _G_O_T_.
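
To make the "tweak the 3rd insn" idea concrete, here is a rough sketch of the
displacement arithmetic involved.  The addresses and bytes are taken from the
disassembly above; the 16-byte size of the _G_O_T_-loading code placed in
front of the first PLT slot is purely an assumption for illustration, and the
helper names are made up, not anything the linker provides.

```python
import struct

def jmp_rel32_target(insn_addr, insn_bytes):
    """Return the target of a 5-byte `jmp rel32` (opcode 0xe9)."""
    assert len(insn_bytes) == 5 and insn_bytes[0] == 0xE9
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    # The displacement is relative to the end of the instruction.
    return insn_addr + 5 + rel32

def encode_jmp_rel32(insn_addr, target):
    """Encode a 5-byte `jmp rel32` at insn_addr that jumps to target."""
    return b"\xe9" + struct.pack("<i", target - (insn_addr + 5))

PLT0 = 0x4C2B7310      # the pushl/jmp snippet from the dump above
GOT_LOAD_SIZE = 0x10   # ASSUMPTION: size of a hypothetical "load _G_O_T_
                       # into %ebx" sequence placed right before PLT0

# 3rd insn of the _Unwind_Find_FDE@plt slot, as shown in the dump.
old_insn = bytes.fromhex("e9e0ffffff")
old_target = jmp_rel32_target(0x4C2B732B, old_insn)

# The same insn retargeted to the hypothetical GOT-loading code.
new_insn = encode_jmp_rel32(0x4C2B732B, PLT0 - GOT_LOAD_SIZE)

print(hex(old_target), new_insn.hex())
```

Only the displacement of the final jmp changes; the first two insns of the
slot stay as they are.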

As for hoisting the load of the call address before the loop, with lazy
binding that has the obvious disadvantage that you'd resolve the slot again
and again, if you are unlucky enough that the function hasn't been resolved
yet.  That is, unless the shared PLT stub, after computing _G_O_T_ (for x86),
also rechecks the .got.plt slot and skips the resolver when the slot has
already been filled in.
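
A toy model of that effect, in case it helps.  This is only a simulation
under simplifying assumptions: LazySlot, call_reloading and call_hoisted are
hypothetical names, the "resolver" is just a counter, and nothing here
corresponds to actual ld.so or linker code.

```python
class LazySlot:
    """Toy .got.plt slot: initially points at the lazy-binding stub."""

    def __init__(self, real_function, recheck=False):
        self.real_function = real_function
        self.recheck = recheck        # does the stub recheck the slot?
        self.resolved = False
        self.resolver_calls = 0
        self.value = self.stub        # unresolved: slot points at the stub

    def stub(self, *args):
        # With rechecking, an already-filled slot bypasses the resolver.
        if self.recheck and self.resolved:
            return self.value(*args)
        self.resolver_calls += 1      # stands in for the dynamic resolver
        self.value = self.real_function
        self.resolved = True
        return self.value(*args)

def call_reloading(slot, n):
    """Load the slot before every call, as a normal PLT call would."""
    for i in range(n):
        slot.value(i)

def call_hoisted(slot, n):
    """Load the slot once before the loop (loop invariant motion)."""
    target = slot.value
    for i in range(n):
        target(i)

def square(x):
    return x * x

reloading = LazySlot(square)
call_reloading(reloading, 5)

hoisted = LazySlot(square)
call_hoisted(hoisted, 5)

hoisted_recheck = LazySlot(square, recheck=True)
call_hoisted(hoisted_recheck, 5)

print(reloading.resolver_calls, hoisted.resolver_calls,
      hoisted_recheck.resolver_calls)
```

In this model the hoisted, unresolved case goes through the resolver on every
iteration, while either reloading the slot or rechecking it in the stub keeps
it to a single resolution.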

	Jakub

