This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc


On Wed, Apr 22, 2015 at 3:15 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Normally, with PIE, GCC accesses globals that are extern to the module
>> through the GOT.  This takes two instructions: one to load the address of
>> the global from the GOT and another to load the value.  Example:
>>
>> ---
>> extern int a_glob;
>> int
>> main ()
>> {
>>   return a_glob;
>> }
>> ---
>>
>> With PIE, the generated code accesses the global via the GOT using two
>> memory loads:
>>
>>         movq    a_glob@GOTPCREL(%rip), %rax
>>         movl    (%rax), %eax
>>
>> for 64-bit or
>>
>>         movl    a_glob@GOT(%ecx), %eax
>>         movl    (%eax), %eax
>>
>> for 32-bit.
>>
>> Some experiments on Google and SPEC CPU benchmarks show that the extra
>> instruction costs 1% to 5% in performance.
>>
>> Solution - Copy Relocations:
>>
>> When the linker supports copy relocations, GCC can always assume that
>> the global will be defined in the executable.  For globals that are
>> truly extern (come from shared objects), the linker will create copy
>> relocations and have them defined in the executable.  As a result, no
>> global access needs to go through the GOT, which improves performance.
>> We can generate
>>
>>         movl    a_glob(%rip), %eax
>>
>> for 64-bit and
>>
>>         movl    a_glob@GOTOFF(%eax), %eax
>>
>> for 32-bit.  This optimization only applies to undefined non-weak
>> non-TLS global data.  Accesses to undefined weak global data or TLS
>> data must still go through the GOT.
>>
>> This patch reverts the legitimate_pic_address_disp_p change made in
>> revision 218397, which only applies to x86-64.  Instead, this patch
>> updates targetm.binds_local_p to indicate whether undefined non-weak
>> non-TLS global data is defined locally in PIE.  It also introduces a new
>> target hook, binds_tls_local_p, to distinguish TLS variables from non-TLS
>> variables.  By default, binds_tls_local_p is the same as binds_local_p.
>>
>> This patch checks at configure time whether the 32-bit and 64-bit
>> linkers support PIE with copy relocs.  Support is enabled in the 64-bit
>> linker in binutils 2.25 and in the 32-bit linker in binutils 2.26.  This
>> optimization is enabled only if the linker support is available.
>>
>> Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
>> support for copy relocation in PIE.  OK for trunk?
>>
>> Thanks.
>
>
> Looking at this, my first reaction was that surely most (if not all)
> targets that use ELF and have copy relocs would benefit from this?
> Couldn't we find a simpler way for targets to get this support?  I
> don't have a more constructive suggestion to make at the minute, but
> getting this to work just from the targetm.binds_local_p (decl)
> interface would probably be better?

default_binds_local_p_3 is a global function which is used to
implement targetm.binds_local_p in the x86 backend.  Any backend
can use it to optimize for copy relocation.
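
To make the rule from the patch description concrete, here is a rough
standalone C model of the decision for undefined global data in PIE.  It
is only a sketch: struct undef_data, undef_data_binds_local and
undef_data_access are hypothetical names made up for illustration.  They
are not GCC internals and do not reflect the real signature or full
behavior of default_binds_local_p_3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a global data symbol that is referenced
   but not defined in the module being compiled as PIE.  These fields
   are illustrative only and are not GCC's tree accessors.  */
struct undef_data {
    const char *name;
    bool is_weak;   /* declared with __attribute__((weak)) */
    bool is_tls;    /* declared with __thread */
};

/* Model of the rule in the patch description: undefined non-weak
   non-TLS global data can be treated as defined in the executable,
   but only when the linker was found at configure time to support
   copy relocations in PIE.  */
bool undef_data_binds_local(const struct undef_data *sym,
                            bool linker_pie_copyreloc)
{
    assert(sym != NULL);
    return linker_pie_copyreloc && !sym->is_weak && !sym->is_tls;
}

/* Kind of access the compiler would then emit for the symbol:
   a single direct load, or an address load from the GOT followed
   by a load of the value.  */
const char *undef_data_access(const struct undef_data *sym,
                              bool linker_pie_copyreloc)
{
    return undef_data_binds_local(sym, linker_pie_copyreloc)
               ? "direct"
               : "GOT";
}
```

Under this model, a plain "extern int a_glob;" gets a direct access when
the linker supports PIE with copy relocs, while a weak or __thread
declaration of the same variable, or any of them with an older linker,
still goes through the GOT.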

-- 
H.J.

