RFC: stack/heap collision vulnerability and mitigation with GCC

Richard Biener richard.guenther@gmail.com
Tue Jun 20 10:18:00 GMT 2017


On Tue, Jun 20, 2017 at 10:03 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 7:51 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
>>> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
>>> >
>>> > Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> > while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>>> > or movl $0, (%esp) ?
>>> Florian raised this privately to me as well.  There's a couple issues.
>>>
>>> 1. Is there a performance penalty/gain for sub-word operations?  If not,
>>>    we can improve things slightly there.  Even if it's performance
>>>    neutral we can probably do better on code size.
>>
>> CCing Uros and Honza here, I believe there are at least on x86 penalties
>> for 2-byte, maybe for 1-byte and then sometimes some stalls when you
>> write or read in a different size from a recent write or read.
>
> Don't use orq $0, (%rsp), as this is a high latency RMW insn.

Well, but _maybe_ it's optimized because ORing with 0 never changes anything?
At least it would be nice if it would only trigger the page-fault side-effect
and then not consume other CPU resources.

I guess micro-benchmark plus performance counters might tell.
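
Something along these lines could serve as a starting point for such a
micro-benchmark. It is only a sketch: the helper names (probe_or, probe_mov,
probe_movb, ns_per_probe) are made up for illustration and are not GCC code,
and the probes target an ordinary memory slot rather than (%rsp) so that the
plain stores cannot clobber live stack data. On non-x86-64 hosts it falls back
to equivalent C accesses.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helpers: each one touches *p the way the corresponding
   probe instruction would touch (%rsp). */
static inline void probe_or(volatile uint64_t *p)
{
#if defined(__x86_64__)
    /* Read-modify-write: loads, ORs with 0, stores the same value back. */
    __asm__ volatile("orq $0, %0" : "+m"(*p) : : "cc");
#else
    *p |= 0;
#endif
}

static inline void probe_mov(volatile uint64_t *p)
{
#if defined(__x86_64__)
    /* Plain 8-byte store of zero. */
    __asm__ volatile("movq $0, %0" : "=m"(*p));
#else
    *p = 0;
#endif
}

static inline void probe_movb(volatile uint64_t *p)
{
#if defined(__x86_64__)
    /* One-byte store of zero to the first byte of the slot. */
    __asm__ volatile("movb $0, %0" : "=m"(*(volatile uint8_t *)p));
#else
    *(volatile uint8_t *)p = 0;
#endif
}

typedef void (*probe_fn)(volatile uint64_t *);

/* Average wall-clock nanoseconds per call of fn over iters iterations.
   The indirect call adds a constant overhead to every variant. */
static double ns_per_probe(probe_fn fn, long iters)
{
    volatile uint64_t slot = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        fn(&slot);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling ns_per_probe(probe_or, N), ns_per_probe(probe_mov, N) and
ns_per_probe(probe_movb, N) for some large N gives three numbers to compare.
Hammering a single cached slot probably exaggerates the RMW cost compared to
real probes that touch a fresh page each time, so the result would still need
to be cross-checked with performance counters.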

> movq $0x0, (%rsp) is fast, but also quite long insn.
>
> push $0x0 decrements the stack pointer by 4 or 8 bytes, depending on
> target word size. Push insn also updates delta stack pointer, so
> update of SP is required (effectively, another ALU operation) if SP is
> later referenced from insn other than push/pop/call/ret. There are no
> non-word-sized register pushes.

I only suggested push $0x0 because that doesn't leave the window
open for the async signal where %rsp points somewhere we didn't probe yet.
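
To spell out that window, here is a toy model of it. This is not real code
generation: addresses are plain integers, the stack grows downward, and the
names (stack_model, signal_safe, and so on) are invented for illustration. A
signal is assumed to be able to arrive between any two instructions, so the
condition is checked after each step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a downward-growing stack. */
struct stack_model {
    size_t sp;            /* current stack pointer */
    size_t lowest_probed; /* lowest address written so far */
};

/* The condition from the text: sp must point at an address that has
   already been probed whenever a signal could arrive. */
static bool signal_safe(const struct stack_model *m)
{
    return m->sp >= m->lowest_probed;
}

/* "sub $n, %sp": moves sp without touching memory. */
static void step_sub(struct stack_model *m, size_t n)
{
    m->sp -= n;
}

/* "or/mov $0, (%sp)": touches memory at the current sp. */
static void step_store_at_sp(struct stack_model *m)
{
    if (m->sp < m->lowest_probed)
        m->lowest_probed = m->sp;
}

/* "push $0": adjusts sp and writes the new top in one instruction. */
static void step_push(struct stack_model *m, size_t word)
{
    m->sp -= word;
    if (m->sp < m->lowest_probed)
        m->lowest_probed = m->sp;
}

/* Returns true if some instruction boundary leaves sp unprobed. */
static bool sub_then_probe_has_window(size_t n)
{
    struct stack_model m = { (size_t)1 << 20, (size_t)1 << 20 };

    step_sub(&m, n);
    bool window = !signal_safe(&m); /* a signal could arrive here */
    step_store_at_sp(&m);
    return window || !signal_safe(&m);
}

static bool push_has_window(size_t word)
{
    struct stack_model m = { (size_t)1 << 20, (size_t)1 << 20 };

    step_push(&m, word);
    return !signal_safe(&m);
}
```

In this model the sub-then-store sequence has one instruction boundary where
sp sits below everything probed so far, while the push variant has none.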

> I think that for the purpose of stack probe, we can write a byte to
> the end of the stack, so
>
> movb $0x0, (%rsp).
>
> This is relatively short insn, and operates in the same way for 32bit
> and 64bit targets. There are no issues with partial memory stalls
> since nothing immediately reads a different sized value from the
> written location.
>
> Uros.


