This is the mail archive of the mailing list for the GCC project.


[PATCH][RFA/RFC] Stack clash mitigation 0/9

This patch series is designed to mitigate the problems exposed by the
stack-clash exploits.  As I've noted before, the way to address this
class of problems is via a good stack probing strategy.

This has taken much longer than expected to pull together for
submission.  Sorry about that.  However, the delay has led to some clear
improvements on ppc, aarch64 and s390, as well as tests that are part of
the testsuite rather than just being eyeballed.

This series introduces -fstack-check=clash which is a variant of
-fstack-check designed to prevent "jumping the stack" as seen in the
stack-clash exploits.

The key ideas here:

Individual stack allocations are never more than PROBE_INTERVAL in size
(4k by default).  Larger allocations are broken up into PROBE_INTERVAL
chunks and each chunk is probed as it is allocated.

No combination of stack allocations can exceed PROBE_INTERVAL bytes
without probing.  I.e., if we have an allocation of 2k and a later
allocation of 3k, then there must be a stack probe into the first 4k of
allocated space that executes between the two allocations.

We must consider an environment where code compiled without stack
probing is linked statically or dynamically with code that is compiled
with stack probing.  That is actually the most likely scenario for an
indefinite period of time.  Thus we have to consider the possibility of
a hostile caller in the call stack.

We need not guarantee enough stack space to handle a signal if a probe
hits the guard page.


Probes come in two forms.  They can be explicit or implicit.

Explicit probes are emitted by prologue generation or dynamic stack
allocation routines.  These are net new code and avoiding them when it
is safe to do so helps reduce the overhead of stack probing.

Implicit probes are "probes" that occur as a natural side effect of the
existing code or guarantees provided by the ABI.  They are essentially
free and may allow the compiler to avoid some explicit probes.

Examples of implicit probes include

  1. An ISA whose call instruction pushes the return address onto
     the stack (x86)

  2. An ABI that mandates *sp always contain a backchain pointer (ppc)

  3. A prologue that stores a register into the stack.  We exploit this
     on aarch64 and s390.  On s390 register saves go into the caller's
     stack frame; on aarch64 register saves hit newly allocated
     space in the callee's frame.  We can exploit both to avoid
     some explicit probing.

I've done implementations for x86, ppc, aarch64 and s390, and the
included tests have been checked against those targets.

This patch does not change the probing insn itself.  We've had various
discussions on-list on a better probe insn for x86.  I think the
consensus is to avoid read-modify-write insns.  A testb may ultimately
be best.  This is IMHO an independent implementation detail for each
target and should be handled as a follow-up.  But if folks insist, it's
a trivial change to make as it doesn't fundamentally affect how all this
stuff works.

Other targets that have an existing -fstack-check=specific, but for
which I have not added a -fstack-check=clash implementation get partial
protection against stack clash as well.  This is a side effect of
keeping some of the early code we'd hoped to use to avoid writing a new
probe implementation for each target.


To give a sense of the overhead: in the testing I performed, just 1.5%
of the routines in glibc needed probing in their prologues (x86).  IIRC
each and every one of those routines needed just 1-4 inlined probes.

Significantly more functions need alloca space probed (IIRC ~5%), but
given the amazingly inefficient alloca code, I can't believe anyone will
ever notice the probing overhead.


Patch #1 contains the new option -fstack-check=clash and some dejagnu
infrastructure (most of which is unused until later patches).

Patch #2 adds the new style probing support to the alloca/vla area and
indirects uses of STACK_CHECK_PROTECT through get_stack_check_protect.

Patch #3 adds some generic dumping support for use by the target prologue

Patch #4 introduces the x86 specific bits

Patch #5 addresses combine-stack-adjustments interactions with the new
probing code
Patch #6 adds PPC support

Patch #7 adds aarch64 support

Patch #8 adds s390 support

The patch series has been bootstrapped and regression tested on
s390x-linux-gnu (another respin of this is still in progress).

Additionally, each target has been bootstrapped with -fstack-check=clash
enabled by default, the testsuite run and checked for glaring errors.

Earlier versions have also bootstrapped on 32bit PPC and 32bit s390.

Earlier versions have also been used to build and regression test
glibc-2.17 with -fstack-check=clash enabled by default.  The resulting
x86 and x86_64 libraries also were scanned to verify proper probing.
Similarly for x86_64 builds with the trunk glibc.

An earlier version was also used to build a particular application (that
shall remain nameless) that had the most large frames in RHEL 7.  When
-fstack-check=clash was enabled the scanner was able to verify all the
large frames had been broken down into PROBE_INTERVAL chunks with proper
probing on x86_64.

There are some dependencies between patches.  Prior versions were
bootstrapped and regression tested at each step.  I have not gone
through that rigor after the various rebases and tweaks over the last
few weeks.  If partial patches are approved for installing on the
trunk, I will verify they work across the set of targets noted above
before installing them.

Let the comments, questions, flames and suggestions fly :-)

