[PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations

Sriraman Tallam tmsriram@google.com
Fri Jul 11 17:42:00 GMT 2014


Ping.

On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Uros,
>
>    Could you please review this patch?
>
> Thanks
> Sri
>
> On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Patch Updated.
>>
>> Sri
>>
>> On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Ping.
>>>
>>> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Ping.
>>>>
>>>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Optimize access to globals with -fpie, x86_64 only:
>>>>>
>>>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
>>>>> using the GOT.  This is two instructions, one to get the address of the global
>>>>> from the GOT and the other to get the value.  If it turns out that the global
>>>>> gets defined in the executable at link-time, it still needs to go through the
>>>>> GOT as it is too late then to generate a direct access.
>>>>>
>>>>> Examples:
>>>>>
>>>>> foo.cc
>>>>> ------
>>>>> int a_glob;
>>>>> int main () {
>>>>>   return a_glob; // defined in this file
>>>>> }
>>>>>
>>>>> With -O2 -fpie -pie, the generated code accesses the global directly via a
>>>>> PC-relative instruction:
>>>>>
>>>>> 5e0   <main>:
>>>>>    mov    0x165a(%rip),%eax        # 1c40 <a_glob>
>>>>>
>>>>> foo.cc
>>>>> ------
>>>>>
>>>>> extern int a_glob;
>>>>> int main () {
>>>>>   return a_glob; // declared extern here, defined elsewhere
>>>>> }
>>>>>
>>>>> With -O2 -fpie -pie, the generated code accesses the global via the GOT using
>>>>> two memory loads:
>>>>>
>>>>> 6f0  <main>:
>>>>>    mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
>>>>>    mov    (%rax),%eax
>>>>>
>>>>> This is true even if in the latter case the global was defined in the
>>>>> executable through a different file.
>>>>>
>>>>> Experiments on Google benchmarks show that the extra memory loads degrade
>>>>> performance by 1% to 5%.
>>>>>
>>>>>
>>>>> Solution - Copy Relocations:
>>>>>
>>>>> When the linker supports copy relocations, GCC can always assume that the
>>>>> global will be defined in the executable.  For globals that are truly extern
>>>>> (come from shared objects), the linker will create copy relocations and have
>>>>> them defined in the executable.  As a result, no global access needs to go
>>>>> through the GOT, which improves performance.
>>>>>
>>>>> A patch to the gold linker, submitted recently:
>>>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>>> allows gold to generate copy relocations in -pie mode when necessary.
>>>>>
>>>>> I have added the option -mld-pie-copyrelocs, which enables this when combined
>>>>> with -fpie.  Note that the BFD linker does not support PIE copy relocations
>>>>> yet, so this option cannot be used with it.
>>>>>
>>>>> Please review.
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>>>  address is still legitimate in the presence of copy relocations
>>>>>  and -fpie.
>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>>>
>>>>>
>>>>>
>>>>> Patch attached.
>>>>> Thanks
>>>>> Sri
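
For readers skimming the thread, the two access patterns contrasted in the
quoted message can be modelled roughly in plain C++.  This is only a sketch:
got_slot_for_a_glob is a hypothetical stand-in for a GOT entry (an ordinary
pointer variable, not a real GOT slot), and load_via_got and load_direct are
illustrative names that are not part of the patch.

```cpp
// Stand-in for a global that ends up placed in the executable.
int a_glob = 42;

// Hypothetical model of a GOT entry: a memory slot holding the address
// of the global.  A real GOT is set up by the linker and dynamic loader.
int *got_slot_for_a_glob = &a_glob;

// GOT-style access: two dependent memory loads.
int load_via_got() {
  int *addr = got_slot_for_a_glob;  // load 1: fetch the address from the slot
  return *addr;                     // load 2: fetch the value itself
}

// Direct access: one load from the global's own location, which is what
// a PC-relative mov amounts to.
int load_direct() {
  return a_glob;
}
```

Both functions return the same value; the difference is only the extra
dependent load in the first one, which is the cost the patch aims to avoid.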


