This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Alexander Monakov <amonakov at ispras dot ru>
- Cc: Jeff Law <law at redhat dot com>, gcc-patches at gcc dot gnu dot org, Rich Felker <dalias at libc dot org>
- Date: Wed, 6 May 2015 17:45:54 +0200
- Subject: Re: [PATCH] Expand PIC calls without PLT with -fno-plt
- References: <1430757479-14241-1-git-send-email-amonakov at ispras dot ru> <1430757479-14241-6-git-send-email-amonakov at ispras dot ru> <5547AD8D dot 9080806 at redhat dot com> <20150504173955 dot GE1751 at tucnak dot redhat dot com> <5547AF7C dot 9030500 at redhat dot com> <alpine dot LNX dot 2 dot 11 dot 1505061730460 dot 22867 at monopod dot intra dot ispras dot ru>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Wed, May 06, 2015 at 06:24:58PM +0300, Alexander Monakov wrote:
> If the same PLT stubs as today are to be used, it constrains the compiler on
> 32-bit x86 and possibly other arches where PLT stubs need GOT pointer in a
> specific register. It's possible to imagine more complex PLT stubs that
> obtain GOT pointer on their own, but in that case you can't let optimizations
> such as loop invariant motion move the GOT load away from the call in a
> fashion that could result in PLT stub pointer be reused many times.
Why?
The 32-bit x86 PLT (shouldn't we care much more about x86-64, where this is
a non-issue?) looks like:
4c2b7310 <_Unwind_Find_FDE@plt-0x10>:
4c2b7310:   ff b3 04 00 00 00    pushl  0x4(%ebx)
4c2b7316:   ff a3 08 00 00 00    jmp    *0x8(%ebx)
4c2b731c:   00 00                add    %al,(%eax)
        ...
4c2b7320 <_Unwind_Find_FDE@plt>:
4c2b7320:   ff a3 0c 00 00 00    jmp    *0xc(%ebx)
4c2b7326:   68 00 00 00 00       push   $0x0
4c2b732b:   e9 e0 ff ff ff       jmp    4c2b7310
4c2b7330 <realloc@plt>:
4c2b7330:   ff a3 10 00 00 00    jmp    *0x10(%ebx)
4c2b7336:   68 08 00 00 00       push   $0x8
4c2b733b:   e9 d0 ff ff ff       jmp    4c2b7310
The linker would know very well what kind of relocations are used for a
particular PLT slot, and for the new relocations, which would resolve to the
address of the .got.plt slot, it could just tweak the corresponding 3rd insn
in the slot so that it jumps not to first PLT slot - 16, but to a few bytes
before that, to code that would just load the address of _G_O_T_ into %ebx
and then fall through into the 0x4c2b7310 snippet above. Lazy binding would
be a few ticks slower in that case, but there would be no requirement on
%ebx to contain _G_O_T_.
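To make the "tweak the 3rd insn" part concrete, here is a small sketch of the
rel32 arithmetic involved. It only decodes the jmp displacements visible in
the dump above and computes what a retargeted displacement would be. The
helper names are made up for illustration, and the size of the extra
"load _G_O_T_ into %ebx" code is not specified anywhere in this thread, so
any concrete offset fed to the second helper is an assumption.

```c
#include <stdint.h>

/* Target of an x86 `jmp rel32` (opcode e9).  The displacement is relative
   to the address of the next instruction, i.e. insn address + 5.
   Hypothetical helper, not part of any linker. */
static uint32_t jmp_rel32_target(uint32_t insn_addr, int32_t disp)
{
    return insn_addr + 5u + (uint32_t)disp;
}

/* Displacement that would have to be encoded at insn_addr to reach target.
   Hypothetical helper, not part of any linker. */
static int32_t jmp_rel32_disp(uint32_t insn_addr, uint32_t target)
{
    return (int32_t)(target - (insn_addr + 5u));
}
```

With the dump above, `e9 e0 ff ff ff` at 4c2b732b encodes a displacement of
-0x20 and `e9 d0 ff ff ff` at 4c2b733b encodes -0x30, and both land on
4c2b7310. Retargeting a slot to code placed N bytes before 4c2b7310 would
just mean encoding jmp_rel32_disp(slot_jmp_addr, 0x4c2b7310 - N) instead.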
As for hoisting the load of the call address before the loop, with lazy
binding that has the obvious disadvantage that you'd resolve the slot again
and again if you are unlucky enough that the function hasn't been resolved
yet, unless the shared PLT stub, after computing _G_O_T_ (for x86), also
rechecks the .got.plt address.
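A toy model of the hoisting concern, in plain C rather than real PLT code:
a function pointer variable stands in for the .got.plt slot, and a stub
function stands in for the lazy resolver. All names here are invented for
the illustration, and the "recheck" flag is only a guess at what a stub that
rechecks the slot might do.

```c
typedef int (*fn_t)(int);

static int resolve_count;   /* how many times the fake resolver ran */
static int recheck;         /* nonzero: stub re-reads the slot first */

static int real_func(int x) { return x + 1; }
static int lazy_stub(int x);

/* Stand-in for one .got.plt slot; initially points at the lazy stub. */
static fn_t got_slot = lazy_stub;

static int lazy_stub(int x)
{
    if (recheck && got_slot != lazy_stub)
        return got_slot(x);     /* slot was already resolved earlier */
    resolve_count++;            /* the expensive symbol lookup */
    got_slot = real_func;       /* resolver patches the slot */
    return real_func(x);
}

static void reset_slot(int use_recheck)
{
    got_slot = lazy_stub;
    resolve_count = 0;
    recheck = use_recheck;
}

/* Slot is reloaded before every call: resolves at most once. */
static int resolutions_reloading(int n)
{
    reset_slot(0);
    for (int i = 0; i < n; i++)
        (void)got_slot(i);
    return resolve_count;
}

/* Slot load hoisted out of the loop: the stale stub pointer is reused. */
static int resolutions_hoisted(int n, int use_recheck)
{
    reset_slot(use_recheck);
    fn_t f = got_slot;          /* hoisted load, taken before resolution */
    for (int i = 0; i < n; i++)
        (void)f(i);
    return resolve_count;
}
```

In this model, a loop of 10 calls resolves once when the slot is reloaded
each time, 10 times when the load is hoisted while the slot is still
unresolved, and once again when the stub rechecks the slot before resolving.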
Jakub