This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

From: Jeff Law <law at redhat dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: nd <nd at arm dot com>
Date: Fri, 14 Jul 2017 00:14:29 -0600
Subject: Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08
Authentication-results: sourceware.org; auth=none
Authentication-results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=law at redhat dot com
Dkim-filter: OpenDKIM Filter v2.11.0 mx1.redhat.com C08A24E326
Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com C08A24E326
References: <AM5PR0802MB2610577C26553C3E7C2B8FCC83AE0@AM5PR0802MB2610.eurprd08.prod.outlook.com>

On 07/11/2017 06:20 PM, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> aarch64 is the first target that does not have any implicit probes in
>> the caller.  Thus at prologue entry it must make conservative
>> assumptions about the offset of the most recent probed address relative
>> to the stack pointer.
> 
> No - like I mentioned before that's not correct nor acceptable as it would imply
> that ~70% of functions need a probe at entry. I did a quick run across SPEC and
> found the outgoing argument size is > 1024 in just 9 functions out of 147000!
> Those 9 were odd special cases due to auto generated code to interface between
> C and Fortran. This is extremely unlikely to occur anywhere else. So even assuming
> an unchecked caller, large outgoing arguments are simply not a realistic threat.
So would a half-half (2k caller/2k callee) split like Florian has
proposed work for you?  ie, we simply declare a caller that pushes the
stack pointer 2k or more into the guard as broken?

While I'm not really comfortable with the 2k/2k split in general, I
think I can support it from a Red Hat point of view -- largely because
we use 64k pages in RHEL.  So our guard is actually 64k.  Having a
hostile call chain push 2k into the guard isn't so bad in that case.

In fact, with a 64k guard, the dynamic area dwarfs the outgoing args as
the area of most concern.  And with that size guard we're far more
likely to see an attempted jump with an unconstrained alloca rather than
with a fairly large alloca.

And if there's an attacker controlled unconstrained alloca, then, well,
we've lost because no matter the size of the guard, the attacker can
jump it and there isn't a damn thing the callee can do about it.

> 
> Therefore even when using a tiny 4K probe size we can safely adjust SP by 3KB
> before needing an explicit probe - now only 0.6% of functions need a probe.
> If we choose a proper minimum probe distance, say 64KB, explicit probes are
> basically non-existent (just 35 functions, or ~0.02% of all functions are > 64KB).
> Clearly inserting probes can be the default as the impact on code quality is negligible.
I believe Richard Earnshaw indicated that 4k pages were in use on some
aarch64 systems, so I didn't try to use a larger probe interval.  Though
I certainly considered it.  I think that's a decision that belongs in
the hands of the target maintainers.  Though I think it's best if you're
going to use a larger probe interval to mandate a suitable page size in
the ABI.

> 
> With respect to implementation it is relatively easy to decide in aarch64_layout_frame
> which frames need probes and where. I don't think keeping a running offset of the last
> probe/store is useful, it'll just lead to inefficiencies and bugs. 
It's certainly possible.  My goal was the minimize probing while
providing stack-clash safety, even if it meant a more complex tracking
scheme.

Some (simpler) tracking is still needed because allocations happen in
potentially 3 places for aarch64.  There's almost certainly cases where
none of them individually are larger than PROBE_INTERVAL, but as a group
they could be.

But that would certainly be simpler than the tracking I did at the cost
of sometimes probing when it's not strictly necessary.  Again, given the
lack of implicit probes prior to function entry I wanted to do as good a
job as posssible exploiting the implicit probes generated in the prologue.

So how about this.

First we have to disallow any individual allocation from allocating more
than PROBE_INTERVAL bytes.

If the total frame size is less than X (where we're still discussing X),
we emit no probes.

If the total frame size is greater than X, then after the first
allocation we probe the highest address within in the just allocated
area and we probe every PROBE_INTERVAL bytes thereafter as space is
allocated.

PROBE_INTERVAL is currently 4k, but the aarch64 maintainers can choose
to change that.  Note that significantly larger probe intervals may
require tweaking the sequences -- I'm not familiar enough with the
various immediate field limitations on aarch64 instructions to be sure
either way on that issue.

The patch doesn't deal
> with the delayed stores due to shrinkwrapping for example. Inserting probes before
> the prolog would be easier, eg.
Yea, I'm glad you brought up shrink-wrapping.  It's definitely a
concern.  Though probably less so when your suggestions from above.

Jeff

Follow-Ups:
- Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08
  - From: Wilco Dijkstra

References:
- [PATCH][RFA/RFC] Stack clash mitigation patch 07/08
  - From: Wilco Dijkstra

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]