This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RFC: Use 32-byte PLT to preserve bound registers
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "x86-64-abi at googlegroups dot com" <x86-64-abi at googlegroups dot com>, GCC Development <gcc at gcc dot gnu dot org>, GNU C Library <libc-alpha at sourceware dot org>, Binutils <binutils at sourceware dot org>
- Date: Mon, 18 Nov 2013 11:03:21 -0800
- Subject: RFC: Use 32-byte PLT to preserve bound registers
Here is a proposal to use 32-byte PLT to preserve bound registers.
Any comments?
BTW, we are working on another proposal that uses a second PLT
section with 8-byte or 16-byte memory overhead, instead of the
24-byte overhead here.
--
H.J.
---
Intel MPX:
http://software.intel.com/sites/default/files/319433-015.pdf
introduces 4 bound registers, which will be used for parameter passing
in x86-64. Branch instructions clear the bound registers unless they
carry the BND prefix, in which case the bound register contents are kept.
This leads to 2 requirements for the 64-bit MPX run-time:
1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:
  ff 35 08 00 00 00    pushq  GOT+8(%rip)
  ff 25 10 00 00 00    jmpq   *GOT+16(%rip)
  0f 1f 40 00          nopl   0x0(%rax)
and 16-byte PLT1:
  ff 25 00 00 00 00    jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00       pushq  $index
  e9 00 00 00 00       jmpq   PLT0
both of which clear bound registers, so that bound registers are preserved.
We use 2 new relocations:
#define R_X86_64_PC32_BND 39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */
to mark branch instructions with BND prefix.
When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocation, it switches to a different 16-byte PLT0:
  ff 35 08 00 00 00       pushq     GOT+8(%rip)
  f2 ff 25 10 00 00 00    bnd jmpq  *GOT+16(%rip)
  0f 1f 00                nopl      (%rax)
to preserve bound registers for symbol lookup. For a symbol with
R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, the linker will use
a 32-byte PLT1:
  f2 ff 25 00 00 00 00    bnd jmpq  *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq     $index
  f2 e9 00 00 00 00       bnd jmpq  PLT0
  0f 1f 80 00 00 00 00    nopl      0(%rax)
  0f 1f 80 00 00 00 00    nopl      0(%rax)
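To see why the BND variant of PLT1 no longer fits in 16 bytes, the
instruction lengths can be added up. This is only a sketch: the helper
names are made up for illustration, and a real linker would record the
BND property per symbol across all of its relocations rather than look
at a single relocation type.

```c
#include <assert.h>
#include <stddef.h>

/* Relocation numbers as given in the proposal. */
#define R_X86_64_PC32_BND  39
#define R_X86_64_PLT32_BND 40

enum { PLT_SLOT = 16 };   /* PLT entries are laid out in 16-byte units */

/* Nonzero if the relocation marks a branch carrying the BND prefix. */
static int is_bnd_reloc(unsigned int r_type)
{
    return r_type == R_X86_64_PC32_BND || r_type == R_X86_64_PLT32_BND;
}

/* Code bytes needed by one PLT1 entry, before any nop padding. */
static size_t plt1_code_bytes(int bnd)
{
    size_t jmp_got  = 6 + (bnd ? 1 : 0);  /* ff 25 disp32, plus f2 prefix */
    size_t push_idx = 5;                  /* 68 imm32                     */
    size_t jmp_plt0 = 5 + (bnd ? 1 : 0);  /* e9 rel32, plus f2 prefix     */
    return jmp_got + push_idx + jmp_plt0;
}

/* Entry size after rounding up to a whole number of 16-byte slots. */
static size_t plt1_entry_size(unsigned int r_type)
{
    size_t code = plt1_code_bytes(is_bnd_reloc(r_type));
    size_t size = (code + PLT_SLOT - 1) / PLT_SLOT * PLT_SLOT;
    assert(size == 16 || size == 32);
    return size;
}
```

The two f2 prefixes bring the code to 18 bytes, so the entry is rounded
up to 32 bytes and the remaining 14 bytes are filled by the two 7-byte
nopl instructions.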
Prelink stores the offset of the pushq of the first PLT1 (plt_base + 0x16)
in GOT[1], and the same value is stored in GOT[3]. We can undo prelink in
GOT by computing the corresponding pushq offset with
  GOT[1] + (GOT offset - &GOT[3]) * 2
This relies on the pushq instructions being 16 bytes apart and each GOT
entry being 8 bytes.
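As a worked example of the 16-byte case, the formula can be written as a
small C helper. The function and parameter names are illustrative only,
and plain integers stand in for addresses.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are illustrative.
   got1:      value prelink left in GOT[1] (plt_base + 0x16)
   got3_addr: address of GOT[3], the first GOT slot backing a PLT1
   slot_addr: address of the GOT slot being reset */
static uint64_t legacy_pushq_addr(uint64_t got1, uint64_t got3_addr,
                                  uint64_t slot_addr)
{
    assert(slot_addr >= got3_addr && (slot_addr - got3_addr) % 8 == 0);
    /* 8-byte GOT slots map onto 16-byte PLT entries, hence the factor 2. */
    return got1 + (slot_addr - got3_addr) * 2;
}
```

With plt_base = 0x1000 and GOT[3] at 0x3018, the slot at 0x3028 is two
entries further on, so legacy_pushq_addr(0x1016, 0x3018, 0x3028) gives
0x1036, i.e. 32 bytes past the first pushq.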
To support prelink, each 16-byte block in PLT must have an 8-byte entry
in GOT, so the linker allocates two 8-byte GOT entries for each 32-byte
PLT1. Then we can undo prelink by computing the corresponding pushq offset
with
  pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
  pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
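The 0xf2 test can be tried out on a small in-memory PLT image. The sketch
below is not ld.so code: the helper names are invented, offsets into a
byte array replace real addresses, and it assumes that the first of the
two GOT slots of a 32-byte PLT1 is the one used by the symbol, which the
proposal does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { GOT_SLOT = 8, FIRST_PUSHQ = 0x16 };

/* Write a legacy 16-byte PLT1 at p (displacements left as zeros). */
static void emit_legacy_plt1(unsigned char *p)
{
    static const unsigned char t[16] = {
        0xff, 0x25, 0, 0, 0, 0,          /* jmpq *name@GOTPCREL(%rip)     */
        0x68, 0, 0, 0, 0,                /* pushq $index                  */
        0xe9, 0, 0, 0, 0                 /* jmpq PLT0                     */
    };
    memcpy(p, t, sizeof t);
}

/* Write a 32-byte BND PLT1 at p (displacements left as zeros). */
static void emit_bnd_plt1(unsigned char *p)
{
    static const unsigned char t[32] = {
        0xf2, 0xff, 0x25, 0, 0, 0, 0,    /* bnd jmpq *name@GOTPCREL(%rip) */
        0x68, 0, 0, 0, 0,                /* pushq $index                  */
        0xf2, 0xe9, 0, 0, 0, 0,          /* bnd jmpq PLT0                 */
        0x0f, 0x1f, 0x80, 0, 0, 0, 0,    /* nopl 0(%rax)                  */
        0x0f, 0x1f, 0x80, 0, 0, 0, 0     /* nopl 0(%rax)                  */
    };
    memcpy(p, t, sizeof t);
}

/* Offset from the PLT start of the pushq for GOT slot n, where n == 0
   stands for GOT[3]. n must name the first slot of a PLT1 entry. */
static size_t pushq_offset_for_slot(const unsigned char *plt, size_t n)
{
    size_t off = FIRST_PUSHQ + n * GOT_SLOT * 2;
    /* In a BND entry, off + 6 is the f2 prefix of "bnd jmpq PLT0". */
    off += plt[off + 6] == 0xf2 ? 1 : 0;
    assert(plt[off] == 0x68);            /* must land on the pushq opcode */
    return off;
}

/* Build PLT0 + one legacy PLT1 + one BND PLT1, then look up slot n. */
static size_t demo_pushq_offset(size_t n)
{
    unsigned char plt[64] = {0};         /* bytes 0..15 stand in for PLT0 */
    emit_legacy_plt1(plt + 16);          /* backed by GOT[3]              */
    emit_bnd_plt1(plt + 32);             /* backed by GOT[4] and GOT[5]   */
    return pushq_offset_for_slot(plt, n);
}
```

For the legacy entry, the byte tested is the low byte of the rel32 in
jmpq PLT0. As long as PLT entries sit at multiples of 16 from PLT0, that
byte is itself a multiple of 16 and so should never equal 0xf2, which is
presumably why the check is unambiguous.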
For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocations, this approach increases PLT size by 16 bytes and
GOT size by 8 bytes. That is 24 bytes in total.
Pros: No additional sections are needed.
Cons: 24-byte memory overhead for each symbol with BND relocation.
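
The overhead figure follows directly from the sizes above. A minimal
sketch, with a hypothetical helper name, for estimating the cost of n
symbols with BND relocations:

```c
#include <assert.h>
#include <stddef.h>

/* Extra bytes for n symbols that have BND relocations. */
static size_t bnd_overhead_bytes(size_t n)
{
    size_t extra_plt = 32 - 16;  /* 32-byte PLT1 instead of 16-byte     */
    size_t extra_got = 8;        /* second GOT slot kept for prelink    */
    assert(extra_plt + extra_got == 24);
    return n * (extra_plt + extra_got);
}
```

For example, 1000 such symbols would cost 24000 additional bytes.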