This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)


On Mon, Nov 27, 2017 at 03:48:41PM +0000, Szabolcs Nagy wrote:
> On 28/10/17 05:08, Jeff Law wrote:
> > On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> >> For larger frames the first oddity is that there are now 2 separate params
> >> controlling how probes are generated:
> >>
> >> stack-clash-protection-guard-size (default 12, but set to 16 on AArch64)
> >> stack-clash-protection-probe-interval (default 12)
> >>
> >> I don't see how this makes sense. These values are closely related, so if
> >> one is different from the other, probing becomes ineffective/incorrect. 
> >> For example we generate code that trivially bypasses the guard despite
> >> all the probing:
> > My hope would be that we simply don't ever use the params.  They were
> > done as much for *you* to experiment with as anything.  I'd happy just
> > delete them as there's essentially no guard rails to ensure their values
> > are sane.
> 
> so is there a consensus now that 64k guard size is what
> gcc stack probing will assume?

I would very much prefer the compiler never assume more than 1 guard
page is present. I understand the performance argument, but I think
it's mostly in artificial scenarios where it makes a difference:

If a function has 4k and actually uses it, you're almost certainly
looking at >4k cycles, and a few cycles to probe the guard page are
irrelevant (dominated by the actual work).

If a function has >4k of local data and code paths that can easily be
determined don't use it (e.g. the big data is in a conditional block),
the compiler should shrink-wrap them anyway in order to avoid
pathological blowing away of the stack like what LLVM does all over
the place (hoisting stack allocations up as far as it can). The probe
can then likewise be moved with the shrinkwrapping.

The only remaining case is functions which have >4k of local data
that's mostly unused, but no easy way to determine that it's not used,
or where which small part is actually used is data-dependant. This is
the case where the probe is mildly costly, but it seems very unlikely
to be a common case in real-world usage.

> if so i can propose a patch to glibc to actually have
> that much guard by default in threads.. (i think it
> makes sense on all 64bit targets to have bigger guard
> and a consensus here would help making that change)

I don't object to making musl libc default to >64k (I'd prefer 68k to
avoid preserving alignment mod a large power of 2) guard size on
64-bit archs (on 32-bit, including ILP32, though, it's prohibitive
because it exhausts virtual address space; this may affect aarch64's
ILP32 model). It doesn't have any significant cost and it's useful
hardening. But I think it would be unfortunate if smaller guard sizes,
which applications can request/set, were left unsafe due to the
compiler making a tradeoff for performance that doesn't actually get
you any measureable real-world performance benefits.

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]