This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


RFC: Use 32-byte PLT to preserve bound registers


Here is a proposal to use a 32-byte PLT to preserve bound registers.
Any comments?

BTW, we are working on another proposal to use a second PLT
section with 8 byte or 16 byte memory overhead, instead of
24 byte overhead.

-- 
H.J.
---
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Branch instructions without a BND prefix clear the bound
registers; branch instructions with a BND prefix keep their contents.
This leads to 2 requirements for the 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00        pushq  GOT+8(%rip)
  ff 25 00 10 00 00        jmpq   *GOT+16(%rip)
  0f 1f 40 00              nopl   0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00        jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00           pushq  $index
  e9 00 00 00 00           jmpq   PLT0

both of which clear bound registers, so that they preserve them instead.

We use 2 new relocations:

#define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocation, it switches to a different PLT0:

  ff 35 08 00 00 00           pushq      GOT+8(%rip)
  f2 ff 25 00 10 00 00        bnd jmpq   *GOT+16(%rip)
  0f 1f 00                    nopl       (%rax)

to preserve bound registers for symbol lookup.  For a symbol with
R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, the linker will use
a 32-byte PLT1:

  f2 ff 25 00 00 00 00        bnd jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00        pushq       $index
  f2 e9 00 00 00 00           bnd jmpq   PLT0
  0f 1f 80 00 00 00 00        nopl       0(%rax)
  0f 1f 80 00 00 00 00        nopl       0(%rax)
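
As a cross-check of the byte counts, the two BND layouts can be written
out as C byte arrays.  This is only a sketch: the array and constant
names are hypothetical, and every displacement and the relocation index
are zeroed placeholders that the linker would fill in.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical names; bytes follow the BND listings, with all
   displacements and the relocation index left as zero placeholders. */
static const unsigned char bnd_plt0[] = {
    0xff, 0x35, 0, 0, 0, 0,          /* pushq GOT+8(%rip)             */
    0xf2, 0xff, 0x25, 0, 0, 0, 0,    /* bnd jmpq *GOT+16(%rip)        */
    0x0f, 0x1f, 0x00                 /* nopl (%rax)                   */
};

static const unsigned char bnd_plt1[] = {
    0xf2, 0xff, 0x25, 0, 0, 0, 0,    /* bnd jmpq *name@GOTPCREL(%rip) */
    0x68, 0, 0, 0, 0,                /* pushq $index                  */
    0xf2, 0xe9, 0, 0, 0, 0,          /* bnd jmpq PLT0                 */
    0x0f, 0x1f, 0x80, 0, 0, 0, 0,    /* nopl 0(%rax)                  */
    0x0f, 0x1f, 0x80, 0, 0, 0, 0     /* nopl 0(%rax)                  */
};

/* Offsets inside the 32-byte entry, derived from the lengths above. */
enum {
    BND_PLT1_PUSHQ    = 7,   /* after the 7-byte bnd jmpq             */
    BND_PLT1_JMP_PLT0 = 12   /* after the 5-byte pushq                */
};

static_assert(sizeof bnd_plt0 == 16, "BND PLT0 stays 16 bytes");
static_assert(sizeof bnd_plt1 == 32, "BND PLT1 is 32 bytes");
```

The BND prefix moves pushq from offset 6 to offset 7 within the entry,
which is what the prelink adjustment has to account for.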

Prelink stores the offset of the pushq of PLT1 (plt_base + 0x16) in GOT[1],
and the value of GOT[1] is stored in GOT[3].  We can undo prelink in the
GOT by computing the corresponding pushq offset with

GOT[1] + (GOT offset - &GOT[3]) * 2

This relies on consecutive pushq instructions being 16 bytes apart and
each GOT entry being 8 bytes.  To support prelink, each 16-byte block in
the PLT must have an 8-byte entry in the GOT.  The linker allocates two
8-byte entries in the GOT for each 32-byte PLT1.  Then we can undo prelink
by computing the corresponding pushq offset with

pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
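
The two-step computation can be sketched as a C helper.  The function
and parameter names are hypothetical, and the addresses are assumed to
be already adjusted for the load address.  The first step lands 6 bytes
into a PLT entry.  In a 32-byte entry the pushq sits at byte 7 because
of the BND prefix, and the byte 6 positions further on (byte 12 of the
entry) is the 0xf2 prefix of "bnd jmpq PLT0", which is what the test
looks for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from ld.so.
   got1:          value prelink left in GOT[1] (address of first pushq)
   got3_addr:     address of GOT[3], the first PLT slot in the GOT
   got_slot_addr: address of the GOT slot being reset */
static uintptr_t
plt_pushq_address(uintptr_t got1, uintptr_t got3_addr,
                  uintptr_t got_slot_addr)
{
    assert(got_slot_addr >= got3_addr);

    /* 8-byte GOT slots map to 16-byte PLT blocks, hence the factor 2. */
    uintptr_t pushq = got1 + (got_slot_addr - got3_addr) * 2;

    /* A BND entry has 0xf2 six bytes past this point; its pushq is
       one byte later than in a 16-byte entry. */
    const unsigned char *p = (const unsigned char *) pushq;
    if (p[6] == 0xf2)
        pushq += 1;

    return pushq;
}
```

For a 16-byte entry the byte tested is part of the jmpq displacement,
so this sketch inherits whatever assumption the proposal makes about
that byte not being 0xf2.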

For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocations, this approach increases PLT size by 16 bytes and
GOT size by 8 bytes.  That is 24 bytes in total.
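
The size accounting can be written down as a short sketch.  The struct
and function names are hypothetical, and the three reserved GOT slots
(GOT[0] to GOT[2]) are an assumption based on GOT[3] being the first
PLT slot.

```c
#include <stddef.h>

struct plt_got_size {
    size_t plt;   /* bytes of PLT */
    size_t got;   /* bytes of GOT used for the PLT */
};

/* Hypothetical helper: n_plain symbols use 16-byte PLT1 entries,
   n_bnd symbols use 32-byte PLT1 entries with two GOT slots each. */
static struct plt_got_size
lazy_plt_got_size(size_t n_plain, size_t n_bnd)
{
    struct plt_got_size s;
    s.plt = 16 + 16 * n_plain + 32 * n_bnd;   /* PLT0 + PLT1 entries */
    s.got = 8 * (3 + n_plain + 2 * n_bnd);    /* reserved + PLT slots */
    return s;
}
```

Moving one symbol from the plain count to the BND count grows the PLT
by 16 bytes and the GOT by 8 bytes, matching the 24 bytes stated above.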

Pros: No additional sections are needed.
Cons: 24-byte memory overhead for each symbol with BND relocation.

