This is the mail archive of the mailing list for the GCC project.

Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath wrote:
> Will an MPX-using binary require an MPX-supporting dynamic linker to run
> correctly?
> * An old dynamic linker won't clobber %bndN directly, so that's not a
>   problem.

These are my answers and likely incorrect.

It will clobber the registers indirectly, though, as soon as it
executes a branching instruction.  The effect will be that calls from
bnd-checked code to bnd-checked code through the dynamic linker will
not succeed.

I have not yet seen the changes this will require to the ABI, but I'm
making the natural assumptions: the first four pointer arguments to a
function will be associated with a pair of bound registers, and
similarly for a returned pointer.  I don't know what the proposal is
for struct parameters and return values.

> * Does having the bounds registers set have any effect on regular/legacy
>   code, or only when bndc[lun] instructions are used?

As far as I can tell, only when the bndXX instructions are used,
though I'd be happy to hear otherwise.

>   If it doesn't affect normal instructions, then I don't entirely
>   understand why it would matter to clear %bnd* when entering or leaving
>   legacy code.  Is it solely for the case of legacy code returning a
>   pointer value, so that the new code would expect the new ABI wherein
>   %bnd0 has been set to correspond to the pointer returned in %rax?

There is no problem with clearing the bnd registers when calling in or
out of legacy code.  The issue is avoiding clearing the bnd registers
when calling from bnd-enabled code to bnd-enabled code.

> * What's the effect of entering the dynamic linker via "bnd jmp"
>   (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
>   dynamic linker will leave %bndN et al exactly as they are, until its
>   first unadorned branching instruction implicitly clears them.  So the
>   only problem would be if the work _dl_runtime_{resolve,profile} does
>   before its first branch/call were affected by the %bndN state.

It's not a problem.

> In a related vein, what's the effect of entering some legacy code via
> "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?
> * If the state of %bndN et al does not affect legacy code directly, then
>   it's not a problem.  The legacy code will eventually use an unadorned
>   branch instruction, and that will implicitly clear %bnd*.  (Even if
>   it's a leaf function that's entirely branch-free, its return will
>   count as such an unadorned branch instruction.)


> * If that's not the case, ....

It is the case.

> I can't tell if you are proposing that a single object might contain
> both 16-byte and 32-byte PLT slots next to each other in the same .plt
> section.  That seems like a bad idea.  I can think of two things off
> hand that expect PLT entries to be of uniform size, and there may well
> be more.
> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>   it doesn't.

This seems fixable.  Of course, we could also keep PLT entries the same
length by changing what they contain.  The current PLT entries are

    jmpq *GOT(sym)
    pushq offset
    jmpq plt0

The linker or dynamic linker initializes *GOT(sym) to point to the
second instruction in this sequence.  So we can keep the PLT at 16
bytes by simply changing it to jump somewhere else.

    bnd jmpq *GOT(sym)
    .skip 9
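The byte budget behind the ".skip 9" can be checked with a little
arithmetic.  This is only a sketch: the instruction lengths are the
usual x86-64 encodings (ff 25 disp32, 68 imm32, e9 rel32, with f2 as
the bnd prefix), and the constant names are made up for illustration.

```python
# Instruction lengths in bytes, assuming the usual x86-64 encodings.
JMP_INDIRECT_RIP = 6   # ff 25 disp32 : jmpq *GOT(sym)
PUSH_IMM32 = 5         # 68 imm32     : pushq offset
JMP_REL32 = 5          # e9 rel32     : jmpq plt0
BND_PREFIX = 1         # f2           : bnd prefix on a branch

PLT_ENTRY_SIZE = 16    # size of a conventional PLT entry

# Conventional entry: exactly fills the 16-byte slot.
conventional = JMP_INDIRECT_RIP + PUSH_IMM32 + JMP_REL32

# Entry that only holds "bnd jmpq *GOT(sym)": the rest is padding.
bnd_jump = BND_PREFIX + JMP_INDIRECT_RIP
padding = PLT_ENTRY_SIZE - bnd_jump

# Entry that keeps all three instructions and prefixes both jumps.
all_prefixed = bnd_jump + PUSH_IMM32 + BND_PREFIX + JMP_REL32

print(conventional, bnd_jump, padding, all_prefixed)
```

With these lengths, keeping all three instructions and prefixing both
jumps no longer fits in a 16-byte slot, which is why the push and the
second jump have to go somewhere else.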

We have the linker or dynamic linker fill in *GOT(sym) to point to the
matching entry in a second PLT.  When the dynamic linker is involved, we
use another DT tag to point to the second PLT.  The offsets are
consistent: each symbol has one entry in each PLT, at the same index, so
the dynamic linker can compute the right value.  Then in the second PLT
we have the sequence

    pushq offset
    bnd jmpq plt0

That gives the dynamic linker the offset that it needs to update
*GOT(sym) to point to the runtime symbol value.  So we get slightly
worse instruction cache handling the first time a function is called,
but after that we are the same as before.  And PLT entries are the
same size as always so everything is simpler.
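To make "the dynamic linker can compute the right value" concrete,
here is a rough sketch of the address computation.  It assumes that
second-PLT entries are padded to 16 bytes and that slot numbers start
at 0 and are shared by both PLTs; neither is stated above, and the
function and constant names are hypothetical.

```python
PLT2_ENTRY_SIZE = 16  # assumption: second-PLT entries are padded to 16 bytes

def lazy_got_value(plt2_base, slot):
    """Address that *GOT(sym) holds before the symbol is resolved.

    plt2_base is the address the (hypothetical) new DT tag gives for the
    second PLT; slot is the symbol's index, shared by both PLTs.
    """
    return plt2_base + slot * PLT2_ENTRY_SIZE

# Example: second PLT at 0x2000, symbol in slot 3.
print(hex(lazy_got_value(0x2000, 3)))
```

After the first call the dynamic linker overwrites *GOT(sym) with the
real symbol address, so the second PLT is only touched once per symbol.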

The special DT tag will tell the dynamic linker to apply the special
processing.  No attribute is needed to change behaviour.  The issue
then is: a program linked in this way will not work with an old
dynamic linker, because the old dynamic linker will not initialize
GOT(sym) to the right value.  That is a problem for any scheme, so I
think that is OK.  But if that is a concern, we could actually handle
it by generating two PLTs: one conventional PLT, and another as I just
outlined.  The linker branches to the new PLT, and initializes
GOT(sym) to point to the old PLT.  The dynamic linker spots this
because it recognizes the new DT tags, and cunningly rewrites the GOT
to point to the new PLT.  Cost is an extra jump the first time a
function is called when using the old dynamic linker.
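A toy model of that fallback, to show who writes what into the GOT.
Everything here is an assumption made for illustration: the tag name
DT_HYPOTHETICAL_BNDPLT, the 16-byte second-PLT entries, the PLT0 slot
at the start of the old PLT, and GOT(sym) pointing at the pushq that
sits 6 bytes into the old PLT entry.

```python
PLT_ENTRY_SIZE = 16
PUSH_OFFSET = 6  # pushq follows the 6-byte jmpq *GOT(sym) in an old entry

def static_link_got(old_plt_base, nslots):
    """Static linker: point each GOT slot at the pushq in the old PLT."""
    # (i + 1) skips PLT0, which is assumed to occupy the first slot.
    return [old_plt_base + (i + 1) * PLT_ENTRY_SIZE + PUSH_OFFSET
            for i in range(nslots)]

def dynamic_linker_setup(got, tags, knows_new_tag):
    """Old linker leaves the GOT alone; new linker redirects it."""
    if knows_new_tag and "DT_HYPOTHETICAL_BNDPLT" in tags:
        plt2_base = tags["DT_HYPOTHETICAL_BNDPLT"]
        return [plt2_base + i * PLT_ENTRY_SIZE for i in range(len(got))]
    return list(got)

tags = {"DT_HYPOTHETICAL_BNDPLT": 0x3000}
got = static_link_got(0x1000, 2)
old = dynamic_linker_setup(got, tags, knows_new_tag=False)
new = dynamic_linker_setup(got, tags, knows_new_tag=True)
print([hex(a) for a in old], [hex(a) for a in new])
```

In this model the old dynamic linker ends up running the conventional
pushq/jmpq pair, while the new one sends the first call through the
second PLT instead.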

