This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Jeff Law <law at redhat dot com>, Jan Hubicka <jh at suse dot cz>, Eric Botcazou <ebotcazou at adacore dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 20 Jun 2017 12:18:33 +0200
- Subject: Re: RFC: stack/heap collision vulnerability and mitigation with GCC
- References: <email@example.com> <20170619172932.GV2123@tucnak> <firstname.lastname@example.org> <20170619175149.GY2123@tucnak> <CAFULd4Z98CnqziqTg4GKE8uomepdzkpSeMQ40n1kF=cpA973PA@mail.gmail.com>
On Tue, Jun 20, 2017 at 10:03 AM, Uros Bizjak <email@example.com> wrote:
> On Mon, Jun 19, 2017 at 7:51 PM, Jakub Jelinek <firstname.lastname@example.org> wrote:
>> On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
>>> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
>>> > Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe the stack;
>>> > it is shorter, but is it actually faster, or as slow as movq $0, (%rsp)
>>> > or movl $0, (%esp)?
>>> Florian raised this privately to me as well. There's a couple of issues.
>>> 1. Is there a performance penalty/gain for sub-word operations? If not,
>>> we can improve things slightly there. Even if it's performance
>>> neutral we can probably do better on code size.
>> CCing Uros and Honza here. I believe there are, at least on x86, penalties
>> for 2-byte and maybe for 1-byte operands, and then sometimes stalls when you
>> write or read in a different size than a recent write or read.
> Don't use orq $0, (%rsp), as this is a high latency RMW insn.
Well, but _maybe_ it's optimized because ORing with 0 never changes anything?
At least it would be nice if it only triggered the page-fault side effect
and did not consume other CPU resources.
I guess micro-benchmark plus performance counters might tell.
> movq $0x0, (%rsp) is fast, but it is also quite a long insn.
> push $0x0 moves the stack pointer by 4 or 8 bytes, depending on the
> target word size. A push insn also updates the stack engine's SP delta,
> so an explicit update of SP (effectively another ALU operation) is
> required if SP is later referenced from an insn other than
> push/pop/call/ret. There are no non-word-sized register pushes.
I only suggested push $0x0 because it doesn't leave a window open for an
async signal to arrive while %rsp points somewhere we haven't probed yet.
> I think that for the purpose of a stack probe we can write a byte to
> the end of the stack, i.e.
> movb $0x0, (%rsp).
> This is a relatively short insn, and it operates in the same way for
> 32-bit and 64-bit targets. There are no issues with partial memory
> stalls, since nothing immediately reads a different-sized value from
> the written location.