This is the mail archive of the
mailing list for the GCC project.
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
- From: Jeff Law <law at redhat dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, Jakub Jelinek <jakub at redhat dot com>, Eric Botcazou <ebotcazou at adacore dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 19 Jun 2017 16:08:46 -0600
- Subject: Re: RFC: stack/heap collision vulnerability and mitigation with GCC
- References: <email@example.com> <20170619172932.GV2123@tucnak> <759F8732-F3ED-4778-9CD6-9A4DF1015D44@gmail.com> <3FD871AF-91A5-4C77-B5CF-A1E66C02E486@gmail.com>
On 06/19/2017 12:02 PM, Richard Biener wrote:
> On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener <firstname.lastname@example.org> wrote:
>> On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek <email@example.com> wrote:
>>> On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>>>> After much poking around I concluded that we really need to do
>>>> allocation and probing via a "moving sp" strategy. Probing into
>>>> unallocated areas runs afoul of valgrind, so that's a non-starter.
>>>> Allocating stack space, then probing the pages within the space is
>>>> vulnerable to async signal delivery between the allocation point and
>>>> probe point. If that occurs the signal handler could end up running
>>>> on a stack that has collided with the heap.
>>>> Ideally we would allocate and probe a page as an atomic unit (which is
>>>> feasible on PPC). Alternatively, due to ISA restrictions, allocate a
>>>> page, then probe the page as distinct instructions. The latter
>>>> has a race, but we'd have to take the async signal in a single
>>>> instruction window.
>>> And if the allocation is only a page at a time, the single insn race
>>> can be mitigated in the kernel (probe (read-only is fine) the word at the
>>> stack pointer when setting up a signal frame for async signal).
>>>> So, time to open the discussion to questions & comments.
>>>> I've got patches I need to clean up and post for comments that implement
>>>> this for x86, ppc, aarch64 and s390. x86 and ppc are IMHO in good
>>>> shape. There's an unhandled case for s390. I've got evaluation
>>>> to do on aarch64.
>>> In the patches Jeff is going to post, we have (at least for
>>> -fasynchronous-unwind-tables which is on by default on e.g. x86)
>>> precise unwind info even with the new stack check mode.
>>> ira.c currently has:
>>> /* We need the frame pointer to catch stack overflow exceptions if
>>> the stack pointer is moving (as for the alloca case just above). */
>>> || (STACK_CHECK_MOVING_SP
>>> && flag_stack_check
>>> && flag_exceptions
>>> && cfun->can_throw_non_call_exceptions)
>>> For alloca we have a frame pointer for other reasons, the question is
>>> if we really need this hunk even if we provided proper unwind info
>>> even for the Ada -fstack-check mode. Or, if we provide proper unwind info
>>> for -fasynchronous-unwind-tables, if the above could not be also
>>> && !flag_asynchronous_unwind_tables. Eric, what exactly is the reason
>>> for the above, is it just lack of proper CFI notes, or something else?
>>> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>>> or movl $0, (%esp) ?
>> It at least has the chance of bypassing all of the store queue in CPUs
>> and thus cause no cacheline allocation or trigger prefetching.
>> Not sure if any of that is done though.
>> Performance counters might tell.
>> Otherwise incrementing SP by 4095 and then pushing al would work as
>> well (and be similarly short as the or).
> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095 would solve the signal atomicity as well. Might be larger and somewhat interfere with CPUs stack engine. Who knows...
Happy to rely on Honza or Uros for guidance on that. Though we do have
to maintain proper stack alignment, right?
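
For readers following the thread, the "moving sp" strategy quoted above can be illustrated with a small model in C. This is only a sketch under stated assumptions, not code from the patches: the `stack_model` struct, the single 4096-byte guard page, and all function names are hypothetical, and the "stack" is just a pair of integers rather than real memory. It shows why lowering sp by at most one page and probing after each step always lands a probe in the guard page, while one large unprobed adjustment can jump straight past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumed page and guard size for this model */

/* Hypothetical model: the stack grows down from sp; the guard page
   covers [guard_lo, guard_lo + PAGE_SIZE); anything below is "heap". */
struct stack_model {
    size_t sp;        /* current stack pointer (an address in the model) */
    size_t guard_lo;  /* lowest address of the guard page */
};

/* A probe touches the word at addr; in this model it "faults"
   (returns true) when addr lies inside the guard page. */
static bool probe_faults(const struct stack_model *m, size_t addr)
{
    return addr >= m->guard_lo && addr < m->guard_lo + PAGE_SIZE;
}

/* Moving-sp allocation: lower sp by at most one page, probe at the
   new sp, repeat. Returns true if the whole allocation completed,
   false if a probe hit the guard page first. */
static bool alloc_moving_sp(struct stack_model *m, size_t size)
{
    while (size > 0) {
        size_t step = size < PAGE_SIZE ? size : PAGE_SIZE;
        assert(m->sp >= step);  /* the model does not handle wraparound */
        m->sp -= step;
        if (probe_faults(m, m->sp))
            return false;       /* the kernel would deliver a fault here */
        size -= step;
    }
    return true;
}

/* Unprobed allocation: one large adjustment with no probes. Returns
   true if sp ended up below the guard page, i.e. inside the "heap". */
static bool alloc_unprobed_lands_in_heap(struct stack_model *m, size_t size)
{
    assert(m->sp >= size);
    m->sp -= size;
    return m->sp < m->guard_lo;
}
```

In this model the gap between the sp adjustment and the probe is a single step of at most one page, which corresponds to the single-instruction window discussed above. The model says nothing about signal delivery or stack alignment; those remain open questions in the thread.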