This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=


On Mon, May 4, 2015 at 7:45 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>
>> We noticed that one of our benchmarks sped up by ~1% when we eliminated
>> PLT stubs for some of the hot external library functions such as memcmp
>> and pow.  The win was from better icache and iTLB performance. The main
>> reason was that the PLT stubs had no spatial locality with the
>> call-sites. I have started looking at ways to tell the compiler to
>> eliminate PLT stubs (in effect, inline them) for specified external
>> functions, for x86_64. I have a proposal and a patch and I would like to
>> hear what you think.
>>
>> This comes with caveats.  This cannot be generally done for all
>> functions marked extern as it is impossible for the compiler to say if a
>> function is "truly extern" (defined in a shared library). If a function
>> is not truly extern (ends up defined in the final executable), then
>> calling it indirectly is a performance penalty as it could have been a
>> direct call.
>
> This can be fixed by Alan's idea.
>
>> Further, the newly created GOT entries are fixed up at
>> start-up and do not get lazily bound.
>
> And this can be fixed by some enhancements in the linker and dynamic
> linker.  The idea is to still generate a PLT stub and make its GOT entry
> point to it initially (like a normal got.plt slot).  Then the first
> indirect call will use the address of the PLT entry (starting lazy resolution)
> and update the GOT slot with the real address, so further indirect calls
> will directly go to the function.
>
> This requires a new asm marker (and hence a new reloc), as normally if
> there's a GOT slot it's filled with the real symbol's address, unlike if
> there's only a got.plt slot.  E.g. a
>
>   call *foo@GOTPLT(%rip)
>
> would generate a GOT slot (and fill its address into the above call insn), but
> generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
>

I added "relax" prefix support to the x86 assembler on the users/hjl/relax
branch at

https://sourceware.org/git/?p=binutils-gdb.git;a=summary

[hjl@gnu-tools-1 relax-3]$ cat r.S
.text
relax jmp foo
relax call foo
relax jmp foo@plt
relax call foo@plt
[hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
[hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

r.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0: 66 e9 00 00 00 00     data16 jmpq 0x6 2: R_X86_64_RELAX_PC32 foo-0x4
   6: 66 e8 00 00 00 00     data16 callq 0xc 8: R_X86_64_RELAX_PC32 foo-0x4
   c: 66 e9 00 00 00 00     data16 jmpq 0x12 e: R_X86_64_RELAX_PLT32 foo-0x4
  12: 66 e8 00 00 00 00     data16 callq 0x18 14: R_X86_64_RELAX_PLT32 foo-0x4
[hjl@gnu-tools-1 relax-3]$

Right now, the relax relocations are treated as PC32/PLT32 relocations.
I am working on linker support.

-- 
H.J.

