This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: ramrad01 at arm dot com
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, Evgeny Stupachenko <evstupac at gmail dot com>, Sriraman Tallam <tmsriram at google dot com>, Uros Bizjak <ubizjak at gmail dot com>
- Date: Wed, 22 Apr 2015 16:08:20 -0700
- Subject: Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
- References: <20150422163432 dot GA1053 at intel dot com> <CAJA7tRYLZXQhnK5xh+Sr=dD7CZdPnX0u57gtVxhF=bR4B=pgaw at mail dot gmail dot com>
On Wed, Apr 22, 2015 at 3:15 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Normally, with PIE, GCC accesses globals that are extern to the module
>> through the GOT. This takes two instructions: one to load the address of
>> the global from the GOT and another to load its value. Example:
>>
>> ---
>> extern int a_glob;
>> int
>> main ()
>> {
>> return a_glob;
>> }
>> ---
>>
>> With PIE, the generated code accesses the global via the GOT using two
>> memory loads:
>>
>> movq a_glob@GOTPCREL(%rip), %rax
>> movl (%rax), %eax
>>
>> for 64-bit or
>>
>> movl a_glob@GOT(%ecx), %eax
>> movl (%eax), %eax
>>
>> for 32-bit.
>>
>> Some experiments on Google and SPEC CPU benchmarks show that the extra
>> instruction affects performance by 1% to 5%.
>>
>> Solution - Copy Relocations:
>>
>> When the linker supports copy relocations, GCC can always assume that
>> the global will be defined in the executable. For globals that are
>> truly extern (coming from shared objects), the linker will create copy
>> relocations and have them defined in the executable. As a result, no
>> global access needs to go through the GOT, which improves performance.
>> We can generate
>>
>> movl a_glob(%rip), %eax
>>
>> for 64-bit and
>>
>> movl a_glob@GOTOFF(%eax), %eax
>>
>> for 32-bit. This optimization only applies to undefined non-weak
>> non-TLS global data. Undefined weak global or TLS data access still
>> must go through GOT.
>>
>> This patch reverts the legitimate_pic_address_disp_p change made in
>> revision 218397, which only applies to x86-64. Instead, this patch updates
>> targetm.binds_local_p to indicate whether undefined non-weak non-TLS
>> global data is defined locally in PIE. It also introduces a new target
>> hook, binds_tls_local_p, to distinguish TLS variables from non-TLS
>> variables. By default, binds_tls_local_p is the same as binds_local_p.
>>
>> This patch checks at configure time whether the 32-bit and 64-bit linkers
>> support PIE with copy relocs. The 64-bit linker supports it as of
>> binutils 2.25 and the 32-bit linker as of binutils 2.26. This optimization
>> is enabled only if the linker support is available.
>>
>> Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
>> support for copy relocation in PIE. OK for trunk?
>>
>> Thanks.
>
>
> Looking at this, my first reaction was that surely most (if not all?)
> targets that use ELF and have copy relocs would benefit from this?
> Couldn't we find a simpler way for targets to have this support? I
> don't have a more constructive suggestion to make at the minute, but
> getting this to work just from the targetm.binds_local_p (decl)
> interface would probably be better?
default_binds_local_p_3 is a global function that is used to
implement targetm.binds_local_p in the x86 backend. Any backend
can use it to optimize for copy relocations.
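
As a rough illustration of the decision a backend's binds_local_p would
make for data in PIE, here is a minimal standalone sketch in C. It is a
toy model, not code from the patch: struct sym_info, its fields and
pie_data_binds_local are hypothetical names invented for this example,
and it does not use GCC's tree representation or the real
default_binds_local_p_3 interface. It only encodes the rule quoted
above: undefined non-weak non-TLS data can be treated as local when the
linker supports copy relocs in PIE, while weak or TLS data still goes
through the GOT.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a data symbol.  These fields
   are illustrative only and do not mirror GCC internals.  */
struct sym_info
{
  bool defined_in_module;	/* A definition is visible in this module.  */
  bool is_weak;			/* Declared weak.  */
  bool is_tls;			/* Thread-local storage.  */
};

/* Toy model: return true if data symbol SYM may be accessed directly in
   PIE (PC-relative or GOTOFF) instead of through the GOT.
   LINKER_HAS_PIE_COPYRELOC stands in for the configure-time check for
   linker support.  */
bool
pie_data_binds_local (const struct sym_info *sym,
		      bool linker_has_pie_copyreloc)
{
  /* Simplification: anything defined in the module is treated as local;
     visibility and weak definitions are ignored here.  */
  if (sym->defined_in_module)
    return true;

  /* Undefined weak or TLS data must still go through the GOT.  */
  if (sym->is_weak || sym->is_tls)
    return false;

  /* Undefined non-weak non-TLS data: local only if the linker can
     create a copy relocation for it in the executable.  */
  return linker_has_pie_copyreloc;
}
```

With this model, an undefined plain global such as a_glob is reported
as local only when the second argument is true, which corresponds to
emitting the single-load sequence instead of the two-load GOT sequence.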
--
H.J.