This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Sat, Mar 10, 2012 at 10:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>>>>> The x86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>> by checking
>>>>>>>
>>>>>>>         movq foo@gottpoff(%rip), %reg
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>         addq foo@gottpoff(%rip), %reg
>>>>>>>
>>>>>>> It relies on the REX prefix so that it never has to look at the last
>>>>>>> byte of the previous instruction.  With 32-bit Pmode, we may not have
>>>>>>> the REX prefix, and the last byte of the previous instruction may be an
>>>>>>> offset which happens to look like a REX prefix.  The IE->LE optimization
>>>>>>> will then generate a corrupted binary.  This patch makes sure we always
>>>>>>> output a REX prefix for UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>>
>>>>>> Actually, the linker has:
>>>>>>
>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>       /* Check transition from IE access model:
>>>>>>                 mov foo@gottpoff(%rip), %reg
>>>>>>                 add foo@gottpoff(%rip), %reg
>>>>>>        */
>>>>>>
>>>>>>       /* Check REX prefix first.  */
>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>         {
>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>             {
>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>>               if (ABI_64_P (abfd))
>>>>>>                 return FALSE;
>>>>>>             }
>>>>>>         }
>>>>>>       else
>>>>>>         {
>>>>>>           /* X32 may not have any REX prefix.  */
>>>>>>           if (ABI_64_P (abfd))
>>>>>>             return FALSE;
>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>             return FALSE;
>>>>>>         }
>>>>>>
>>>>>> So it should handle the case without REX just fine.  If it doesn't,
>>>>>> then this is a bug in binutils.
>>>>>>
>>>>>
>>>>> The last byte of the displacement in the previous instruction
>>>>> may happen to look like a REX byte.  In that case, the linker
>>>>> will overwrite the last byte of the previous instruction and
>>>>> generate the wrong instruction sequence.
>>>>>
>>>>> I need to update the linker to enforce the REX byte check.
>>>>
>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>> strictly, we have to use existing DImode patterns only.  This also
>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>> they explicitly state movq and addq.  If this is not the case, then we
>>>> need a new TLS specification for X32.
>>>
>>> Here is a patch to properly generate the X32 IE sequence.
>>>
>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>
>>>                    x86-64                              x32
>>> GD
>>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;         leaq foo@tlsgd(%rip),%rdi;
>>>   .word 0x6666; rex64; call __tls_get_addr@plt   .word 0x6666; rex64; call __tls_get_addr@plt
>>>
>>> GD->IE optimization
>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>
>>> GD->LE optimization
>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>
>>> LD
>>>   leaq foo@tlsld(%rip),%rdi;                     leaq foo@tlsld(%rip),%rdi;
>>>   call __tls_get_addr@plt                        call __tls_get_addr@plt
>>>
>>> LD->LE optimization
>>>   .word 0x6666; .byte 0x66; movq %fs:0, %rax     nopl 0x0(%rax); movl %fs:0, %eax
>>>
>>> IE
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>
>>>   or                                             Not supported if Pmode == SImode
>>>
>>>   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>
>>> IE->LE optimization
>>>
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>>
>>>   to
>>>
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   addq $foo@tpoff, %reg64                        addl $foo@tpoff, %reg32
>>>
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32
>>>
>>>   or
>>>
>>>   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>
>>>   to
>>>
>>>   movq $foo@tpoff, %reg64                        movq $foo@tpoff, %reg64
>>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>>
>>> LE
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   leaq x@tpoff(%reg64),%reg64                    leal x@tpoff(%reg32),%reg32
>>>
>>>   or
>>>
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32
>>>
>>>   or
>>>
>>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>>   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32
>>>
>>>   or
>>>
>>>   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32
>>>
>>>
>>> The X32 TLS implementation is straightforward, except for IE:
>>>
>>> 1. Since the address-size override applies only to the (reg32) part of
>>> fs:(reg32), we can't use it as a memory operand.  This patch changes
>>> ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.
>>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>>> any instructions between MOV and ADD, which may interfere with the
>>> linker IE->LE optimization, since the last byte of the instruction
>>> before ADD may look like a REX prefix.  This patch adds
>>> tls_initial_exec_x32 to make sure that we always have
>>>
>>>         movl %fs:0, %reg32
>>>         addl x@gottpoff(%rip), %reg32
>>>
>>> so that the last byte of the instruction before ADD will never be
>>> a REX byte.  Tested on Linux/x32.
>>>
>>> 2012-03-09  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>         * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>         if Pmode != word_mode.
>>>         (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>         Pmode == SImode for x32.
>>>
>>>         * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>         (tls_initial_exec_x32): Likewise.
>>
>> Nice solution!
>>
>> OK for mainline.
>
> Done.
>
>> BTW: Did you investigate the issue with memory aliasing?
>>
>
> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32,
> which loads the address of the TLS symbol.
>
> Thanks.
>

Since we must use %reg64 in the %fs:(%reg) memory operand, as in

        movq x@gottpoff(%rip),%reg64;
        mov %fs:(%reg64),%reg

this patch optimizes x32 TLS IE loads and stores by wrapping %reg64
inside an UNSPEC when Pmode == SImode.  OK for trunk?

Thanks.

--
H.J.
---
2012-03-11  H.J. Lu  <hongjiu.lu@intel.com>

        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
        (*tls_initial_exec_x32_store): Likewise.
Attachment: gcc-x32-tls-2.patch
Description: Text document