This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: RFC: Use 32-byte PLT to preserve bound registers
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "x86-64-abi at googlegroups dot com" <x86-64-abi at googlegroups dot com>, GCC Development <gcc at gcc dot gnu dot org>, GNU C Library <libc-alpha at sourceware dot org>, Binutils <binutils at sourceware dot org>
- Date: Mon, 18 Nov 2013 11:18:32 -0800
- Subject: Re: RFC: Use 32-byte PLT to preserve bound registers
- Authentication-results: sourceware.org; auth=none
- References: <CAMe9rOq7jOjFZK4ocMFVWwyw861gHaD0za2AbAjcL+PMNFD=0Q at mail dot gmail dot com>
There is a typo in the pushq offset computation. It should be
pushq_offset += ((unsigned char *) pushq_offset)[-6] == 0xf2 ? 1 : 0
instead of
pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
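
For illustration, the corrected computation could be sketched in C roughly
as follows. This is only a sketch under assumptions: the names
undo_prelink_pushq and sample_plt are hypothetical, all values are byte
offsets from the start of the PLT or GOT rather than real addresses, and
the PLT image is made up with placeholder displacements.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative PLT image (not from a real binary): PLT0, one 16-byte
   PLT1 and one 32-byte BND PLT1.  All displacements are placeholders. */
static const unsigned char sample_plt[64] = {
    /* 0x00: PLT0 */
    0xff, 0x35, 0x08, 0x00, 0x00, 0x00,         /* pushq GOT+8(%rip) */
    0xf2, 0xff, 0x25, 0x00, 0x10, 0x00, 0x00,   /* bnd jmpq *GOT+16(%rip) */
    0x0f, 0x1f, 0x00,                           /* nopl (%rax) */
    /* 0x10: 16-byte PLT1, pushq at 0x16 */
    0xff, 0x25, 0x00, 0x00, 0x00, 0x00,         /* jmpq *name@GOTPCREL(%rip) */
    0x68, 0x00, 0x00, 0x00, 0x00,               /* pushq $index */
    0xe9, 0x00, 0x00, 0x00, 0x00,               /* jmpq PLT0 */
    /* 0x20: 32-byte PLT1, pushq at 0x27 */
    0xf2, 0xff, 0x25, 0x00, 0x00, 0x00, 0x00,   /* bnd jmpq *name@GOTPCREL(%rip) */
    0x68, 0x01, 0x00, 0x00, 0x00,               /* pushq $index */
    0xf2, 0xe9, 0x00, 0x00, 0x00, 0x00,         /* bnd jmpq PLT0 */
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00,   /* nopl 0(%rax) */
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00,   /* nopl 0(%rax) */
};

/* Hypothetical helper.  got1_value is what prelink left in GOT[1]
   (plt_base + 0x16, here with plt_base == 0); got_slot_off and got3_off
   are the byte offsets of the slot being undone and of GOT[3]. */
static size_t undo_prelink_pushq(const unsigned char *plt, size_t got1_value,
                                 size_t got_slot_off, size_t got3_off)
{
    assert(got_slot_off >= got3_off);

    /* Each 8-byte GOT entry corresponds to one 16-byte PLT block. */
    size_t pushq = got1_value + (got_slot_off - got3_off) * 2;

    /* In a 16-byte PLT1 the pushq sits at entry + 6, so pushq - 6 is the
       first byte of the entry.  A BND prefix (0xf2) there means the
       pushq is one byte further on. */
    assert(pushq >= 6);
    pushq += plt[pushq - 6] == 0xf2 ? 1 : 0;
    return pushq;
}
```

With this image, GOT[1] holding 0x16 and GOT[3] at byte offset 24, the slot
at offset 24 maps to 0x16 and the slot at offset 32 maps to 0x27. Both are
the 0x68 opcode of a pushq.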
H.J.
----
On Mon, Nov 18, 2013 at 11:03 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Here is a proposal to use 32-byte PLT to preserve bound registers.
> Any comments?
>
> BTW, we are working on another proposal to use a second PLT
> section with 8-byte or 16-byte memory overhead, instead of
> 24-byte overhead.
>
> --
> H.J.
> ---
> Intel MPX:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64. Branch instructions without a BND prefix clear the bound
> registers. Branch instructions with a BND prefix keep bound register
> contents. This leads to 2 requirements for the 64-bit MPX run-time:
>
> 1. The dynamic linker (ld.so) should save and restore bound registers
> during symbol lookup.
> 2. Change the current 16-byte PLT0:
>
> ff 35 08 00 00 00 pushq GOT+8(%rip)
> ff 25 00 10 00 00 jmpq *GOT+16(%rip)
> 0f 1f 40 00 nopl 0x0(%rax)
>
> and 16-byte PLT1:
>
> ff 25 00 00 00 00 jmpq *name@GOTPCREL(%rip)
> 68 00 00 00 00 pushq $index
> e9 00 00 00 00 jmpq PLT0
>
> both of which clear bound registers, so that they preserve bound
> registers instead.
>
> We use 2 new relocations:
>
> #define R_X86_64_PC32_BND 39 /* PC relative 32 bit signed with BND prefix */
> #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */
>
> to mark branch instructions with BND prefix.
>
> When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocation, it switches to a different PLT0:
>
> ff 35 08 00 00 00 pushq GOT+8(%rip)
> f2 ff 25 00 10 00 00 bnd jmpq *GOT+16(%rip)
> 0f 1f 00 nopl (%rax)
>
> to preserve bound registers for symbol lookup. For a symbol with
> R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, the linker will
> use a 32-byte PLT1:
>
> f2 ff 25 00 00 00 00 bnd jmpq *name@GOTPCREL(%rip)
> 68 00 00 00 00 pushq $index
> f2 e9 00 00 00 00 bnd jmpq PLT0
> 0f 1f 80 00 00 00 00 nopl 0(%rax)
> 0f 1f 80 00 00 00 00 nopl 0(%rax)
>
> Prelink stores the offset of the pushq of the first PLT1
> (plt_base + 0x16) in GOT[1], and the value of GOT[1] is stored in
> GOT[3]. We can undo prelink in the GOT by computing the corresponding
> pushq offset with
>
> GOT[1] + (GOT offset - &GOT[3]) * 2
>
> This depends on each pushq being 16 bytes apart and each GOT entry
> being 8 bytes. To support prelink, each 16-byte block in the PLT must
> have an 8-byte entry in the GOT. The linker allocates two 8-byte GOT
> entries for each 32-byte PLT1. Then we can undo prelink by computing
> the corresponding pushq offset with
>
> pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
> pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
>
> For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocations, this approach increases PLT size by 16 bytes and
> GOT size by 8 bytes. That is 24 bytes in total.
>
> Pros: No additional sections are needed.
> Cons: 24-byte memory overhead for each symbol with BND relocation.
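
As a rough check of the 24-byte figure, the section sizes can be modeled
with a small C sketch. The names below are hypothetical, and the count of
3 reserved GOT entries is an assumption inferred from GOT[3] being the
first PLT slot in the text above.

```c
#include <stddef.h>

/* Sizes from the proposal.  GOT_RESERVED is an assumption (GOT[0..2]). */
enum {
    PLT0_SIZE      = 16,
    PLT1_SIZE      = 16,
    PLT1_BND_SIZE  = 32,
    GOT_ENTRY_SIZE = 8,
    GOT_RESERVED   = 3
};

/* PLT size for n_plain 16-byte PLT1 entries and n_bnd 32-byte ones. */
static size_t plt_size(size_t n_plain, size_t n_bnd)
{
    return PLT0_SIZE + n_plain * PLT1_SIZE + n_bnd * PLT1_BND_SIZE;
}

/* GOT size: one 8-byte entry per 16-byte PLT block, so two per BND PLT1. */
static size_t got_size(size_t n_plain, size_t n_bnd)
{
    return GOT_ENTRY_SIZE * (GOT_RESERVED + n_plain + 2 * n_bnd);
}

/* Extra bytes compared with giving every symbol a 16-byte PLT1. */
static size_t bnd_overhead(size_t n_plain, size_t n_bnd)
{
    size_t with_bnd = plt_size(n_plain, n_bnd) + got_size(n_plain, n_bnd);
    size_t baseline = plt_size(n_plain + n_bnd, 0)
                      + got_size(n_plain + n_bnd, 0);
    return with_bnd - baseline;
}
```

Under these assumptions, bnd_overhead(0, 1) comes out to 24, and the
overhead grows by 24 bytes for each further symbol with a BND relocation.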
--
H.J.