This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PING: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc


GCC built with the latest binutils and this patch shows the following
performance improvements:
spec2000INT +3% at "-O2 -m32", +1.5% at "-O2 -m64".

Some other benchmark scores at "-O2" also improved, by up to 6%.
The patch is very effective in PIE mode.

Thanks,
Evgeny


On Tue, May 5, 2015 at 6:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Apr 22, 2015 at 9:34 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Normally, with PIE, GCC accesses globals that are extern to the module
>> through the GOT.  This takes two instructions: one to get the address of
>> the global from the GOT and one to get the value.  Example:
>>
>> ---
>> extern int a_glob;
>> int
>> main ()
>> {
>>   return a_glob;
>> }
>> ---
>>
>> With PIE, the generated code accesses the global via the GOT using two
>> memory loads:
>>
>>         movq    a_glob@GOTPCREL(%rip), %rax
>>         movl    (%rax), %eax
>>
>> for 64-bit or
>>
>>         movl    a_glob@GOT(%ecx), %eax
>>         movl    (%eax), %eax
>>
>> for 32-bit.
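
For readers less familiar with the GOT, the two-load pattern quoted above
can be modeled in plain C.  This is only an illustrative sketch: got_slot,
load_via_got and load_direct are hypothetical names, and a real GOT entry
is filled in by the dynamic linker at load time rather than initialized
statically as it is here.

```c
/* Illustrative model only; all names below are hypothetical.  */

/* Stand-in for a global that would normally live in another module.  */
static int a_glob = 42;

/* Stand-in for the GOT entry: a slot holding the address of a_glob.
   In a real PIE the dynamic linker fills this slot in at load time.  */
static int *const got_slot = &a_glob;

/* GOT-style access: load the address, then load the value.  */
static int
load_via_got (void)
{
  int *addr = got_slot;   /* first load: fetch the address from the slot */
  return *addr;           /* second load: fetch the value itself */
}

/* Direct access: one load, as for a global defined in the executable.  */
static int
load_direct (void)
{
  return a_glob;
}
```

Both functions return the same value; the difference is only the extra
dependent load in load_via_got.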
>>
>> Some experiments on Google and SPEC CPU benchmarks show that the extra
>> instruction reduces performance by 1% to 5%.
>>
>> Solution - Copy Relocations:
>>
>> When the linker supports copy relocations, GCC can always assume that
>> the global will be defined in the executable.  For globals that are
>> truly extern (come from shared objects), the linker will create copy
>> relocations and have them defined in the executable.  The result is that
>> no global access needs to go through the GOT, which improves performance.
>> We can generate
>>
>>         movl    a_glob(%rip), %eax
>>
>> for 64-bit and
>>
>>         movl    a_glob@GOTOFF(%eax), %eax
>>
>> for 32-bit.  This optimization only applies to undefined non-weak
>> non-TLS global data.  Accesses to undefined weak global data or TLS data
>> must still go through the GOT.
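
The eligibility rule in the paragraph above can be written down as a small
predicate.  This is a sketch of the rule as stated in the mail, not the
code in the patch: undefined_data_skips_got and its parameters are
hypothetical names, and it only models undefined global data in PIE.

```c
#include <stdbool.h>

/* Hypothetical model of the rule for UNDEFINED global data in PIE.
   Returns true when the access may bypass the GOT.

   is_weak          - the symbol is declared weak
   is_tls           - the symbol is thread-local
   linker_copyreloc - the linker supports copy relocs in PIE  */
static bool
undefined_data_skips_got (bool is_weak, bool is_tls, bool linker_copyreloc)
{
  /* Without linker support, the symbol cannot be assumed to end up
     defined in the executable.  */
  if (!linker_copyreloc)
    return false;

  /* Weak and TLS data must still go through the GOT.  */
  if (is_weak || is_tls)
    return false;

  return true;
}
```

For example, undefined_data_skips_got (false, false, true) is true, while
making the symbol weak or thread-local turns the result to false.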
>>
>> This patch reverts the legitimate_pic_address_disp_p change made in
>> revision 218397, which only applies to x86-64.  Instead, this patch
>> updates targetm.binds_local_p to indicate if undefined non-weak non-TLS
>> global data is defined locally in PIE.  It also introduces a new target
>> hook, binds_tls_local_p, to distinguish TLS variables from non-TLS
>> variables.  By default, binds_tls_local_p is the same as binds_local_p.
>>
>> This patch checks at configure time whether the 32-bit and 64-bit
>> linkers support PIE with copy reloc.  Support in the 64-bit linker is
>> enabled in binutils 2.25 and support in the 32-bit linker is enabled in
>> binutils 2.26.  This optimization is enabled only if the linker support
>> is available.
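
As a rough illustration of how the two configure results could gate the
optimization per word size, here is a minimal sketch.  The macro names are
the ones from the ChangeLog below; the helper pie_copyreloc_enabled is
hypothetical, and the assumption that each macro expands to 0 or 1 is mine
(the fallback defines exist only to keep the sketch self-contained).

```c
#include <stdbool.h>

/* In a GCC build these would come from the generated config header.
   The fallbacks below are an assumption made for this sketch only.  */
#ifndef HAVE_LD_64BIT_PIE_COPYRELOC
#define HAVE_LD_64BIT_PIE_COPYRELOC 0
#endif
#ifndef HAVE_LD_32BIT_PIE_COPYRELOC
#define HAVE_LD_32BIT_PIE_COPYRELOC 0
#endif

/* Hypothetical helper: report whether the linker detected at configure
   time supports PIE with copy reloc for the requested word size.  */
static bool
pie_copyreloc_enabled (bool is_64bit)
{
  return is_64bit ? HAVE_LD_64BIT_PIE_COPYRELOC != 0
                  : HAVE_LD_32BIT_PIE_COPYRELOC != 0;
}
```

With the fallback values, both word sizes report the feature as
unavailable, which corresponds to building against a linker without the
support.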
>>
>> Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
>> support for copy relocation in PIE.  OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> gcc/
>>
>>         PR target/65846
>>         * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
>>         (HAVE_LD_64BIT_PIE_COPYRELOC): This.
>>         (HAVE_LD_32BIT_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
>>         linker supports PIE with copy reloc.
>>         * output.h (default_binds_tls_local_p): New.
>>         (default_binds_local_p_3): Add 2 bool arguments.
>>         * target.def (binds_tls_local_p): New target hook.
>>         * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
>>         with targetm.binds_tls_local_p.
>>         (default_binds_local_p_3): Add a bool argument to indicate TLS
>>         variable and a bool argument to indicate if an undefined non-TLS
>>         non-weak data is local.  Double check TLS variable.  If an
>>         undefined non-TLS non-weak data is local, treat it as defined
>>         locally.
>>         (default_binds_local_p): Pass false and false to
>>         default_binds_local_p_3.
>>         (default_binds_local_p_2): Likewise.
>>         (default_binds_local_p_1): Likewise.
>>         (default_binds_tls_local_p): New.
>>         * config.in: Regenerated.
>>         * configure: Likewise.
>>         * doc/tm.texi: Likewise.
>>         * config/i386/i386.c (legitimate_pic_address_disp_p): Don't
>>         check HAVE_LD_PIE_COPYRELOC here.
>>         (ix86_binds_local): New.
>>         (ix86_binds_tls_local_p): Likewise.
>>         (ix86_binds_local_p): Use it.
>>         (TARGET_BINDS_TLS_LOCAL_P): New.
>>         * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.
>>
>> gcc/testsuite/
>>
>>         PR target/65846
>>         * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
>>         * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
>>         * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
>>         * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
>>         * gcc.target/i386/pr32219-9.c: Likewise.
>>         * gcc.target/i386/pr32219-10.c: New file.
>>
>>         * lib/target-supports.exp (check_effective_target_pie_copyreloc):
>>         Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC
>>         instead of HAVE_LD_PIE_COPYRELOC.
>
> Richard, Jeff,
>
> Can you review this patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01331.html
>
> Thanks.
>
>
>
> --
> H.J.

