[PATCH][RFA/RFC] Stack clash mitigation patch 07/08 V2
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Fri Jul 21 18:17:00 GMT 2017
Jeff Law wrote:
> Examples please? We should be probing the outgoing args at the probe
> interval once the total static frame is greater than 3k. The dynamic
> space should be probed by generic code.
OK, here are a few simple examples that enable a successful jump of the stack
guard despite -fstack-clash-protection:
int t1(int x)
{
  char arr[3000];
  return arr[x];
}

int t2(int x)
{
  char *p = __builtin_alloca (4050);
  x = t1 (x);
  return p[x];
}

#define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X
#define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X)
void out1(ARG192(__int128));

int t3(int x)
{
  if (x < 1000)
    return t1 (x) + 1;
  out1 (ARG192(0));
  return 0;
}
This currently generates:
t1:
	sub sp, sp, #3008
	add x1, sp, 8
	ldrb w0, [x1, w0, sxtw]
	add sp, sp, 3008
	ret
t2:
	stp x29, x30, [sp, -16]!
	mov x1, 4072
	add x29, sp, 0
	sub sp, sp, #4080
	mov x2, sp
	str xzr, [sp, x1]
	bl t1
	ldrb w0, [x2, w0, sxtw]
	add sp, x29, 0
	ldp x29, x30, [sp], 16
	ret
t3:
	stp x29, x30, [sp, -16]!
	cmp w0, 999
	add x29, sp, 0
	sub sp, sp, #3008
	bgt .L15
	bl t1
	add w0, w0, 1
.L14:
	add sp, x29, 0
	ldp x29, x30, [sp], 16
	ret
As you can see, t2 allocates 4080 bytes on the stack but probes only the top
(SP+4072), leaving 4072 bytes below that probe unprobed. When it calls t1, t1
drops the stack by another 3008 bytes without a probe (as it is allowed to for up
to 3KB), so now we've got a distance of almost 7KB (4072 + 3008 = 7080 bytes)
between 2 probes, well beyond the 4KB probe interval.
Similarly, functions with large outgoing arguments are not correctly probed.
t3 creates 3008 bytes of outgoing area without a probe and then calls t1,
which decrements the stack by another 3008 bytes without a probe.
Both t2 and t3 must probe SP+1024 to ensure the callee can adjust SP by up to
3KB before emitting a probe: 1KB in the caller plus 3KB in the callee stays
within the 4KB probe interval.
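The arithmetic behind both failures can be checked with a small C sketch. The names `unprobed_gap`, `gap_is_safe` and `PROBE_INTERVAL` are hypothetical (not from the patch or GCC), and the sketch assumes the 4KB probe interval discussed in this thread, treating a gap of at most one interval as safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a 4KB probe interval, as discussed in this thread. */
#define PROBE_INTERVAL 4096L

/* Hypothetical helper, not GCC code.  The caller's lowest probe lies
   CALLER_PROBE_HEIGHT bytes above SP at the call; the callee then lowers
   SP by CALLEE_DROP bytes without probing.  Returns the distance from
   that probe down to the callee's SP.  */
static long unprobed_gap(long caller_probe_height, long callee_drop)
{
    assert(caller_probe_height >= 0 && callee_drop >= 0);
    return caller_probe_height + callee_drop;
}

/* Rule of thumb assumed here: a gap no larger than one probe interval
   cannot skip over the guard.  */
static bool gap_is_safe(long gap)
{
    return gap <= PROBE_INTERVAL;
}
```

With the numbers above, `unprobed_gap(4072, 3008)` gives 7080 bytes for t2, and `unprobed_gap(3008, 3008)` gives 6016 bytes for t3 (its last probe is the frame record store just above the 3008-byte outgoing area); both exceed 4096. A caller probe at SP+1024 followed by a 3KB callee drop gives exactly 4096 bytes, which passes.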
> What we should be doing, per your request, is emit an initial probe if we
> know the function is going to require probing of any form. Then we emit
> probes at 4k intervals. At least that's how I understood your
> simplification. So for a 7k stack that's two probes -- one at *sp at
> the start of the prologue then the second after the first 4k is allocated.
There is no benefit in doing an initial probe that way. We don't have to probe
the first 3KB even if the function has a larger stack, since the caller
guarantees a probe within 1KB above the incoming SP. So if we have a 6KB stack,
we can drop the stack by 3KB, probe, drop it by another 3KB, then push the
callee-saves. If the stack is larger than 7KB, we probe at 7KB, then 11KB, etc.
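The probe placement for the initial adjustment can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patch itself: `initial_probe_offsets` and its constants are hypothetical, and it assumes a 4KB probe interval, a 3KB unprobed allowance (because the caller probes within 1KB of its SP), and that the callee-save stores at the bottom of the frame act as the final implicit probe.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions (not taken from the patch): 4KB probe interval, and up to
   3KB may be allocated before the first probe because the caller has
   probed within 1KB above the incoming SP.  */
#define PROBE_INTERVAL     4096L
#define UNPROBED_ALLOWANCE 3072L

/* Hypothetical helper: writes the offsets below the incoming SP at which
   explicit probes are needed for a frame of FRAME_SIZE bytes and returns
   how many were written (at most MAX_PROBES).  */
static size_t initial_probe_offsets(long frame_size, long *offsets,
                                    size_t max_probes)
{
    size_t n = 0;
    assert(frame_size >= 0);
    /* Probe at 3KB, then every 4KB (7KB, 11KB, ...), but only strictly
       inside the frame: at the bottom, the callee-save stores are
       assumed to serve as the probe.  */
    for (long off = UNPROBED_ALLOWANCE; off < frame_size && n < max_probes;
         off += PROBE_INTERVAL)
        offsets[n++] = off;
    return n;
}
```

For a 6KB frame this yields a single probe at 3KB, a 7KB frame still needs only that one, and larger frames add probes at 7KB, 11KB and so on; a frame of 3KB or less needs no explicit probe.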
> > I don't understand what last_probe_offset is for, it should always be zero
> > after saving the callee-saves (though currently it is not set correctly). And it
> > has to be set to a fixed value to limit the maximum outgoing args when doing
> > final_adjust.
> It's supposed to catch the case where neither the initial adjustment nor
> final adjustment in and of themselves require probes, but the
> combination would.
That's not possible: the two cases are completely independent, given that the
callee-saves always provide an implicit probe (you can't have outgoing arguments
if there is no call, and you can't have a call without saving LR in the callee-saves).
> My understanding was that you didn't want that level of tracking around
> the callee saves. It's also not clear if that will work in the presence
> of separate shrink wrapping.
Shrink-wrapping will be safe once I ensure LR is at the bottom of the callee-saves.
With this fix, we know for sure that if there is a call, we've saved LR at SP+0
before adjusting SP by final_adjust. This means we don't need to figure out where
the last probe was: it's always SP+0 if final_adjust != 0, which simplifies things.
> I have no idea what you mean by setting to a fixed value to limit the
> maximum outgoing args. We don't limit the maximum outgoing args -- we
> probe that space just like any other as we cross 4k boundaries.
No, that's not correct: remember we want to reserve 1024 bytes for the outgoing
area. So we must probe if the outgoing area is larger than 1024 bytes, not 4KB
(and we need to probe again at 5KB, 9KB, etc.).
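The rule for the outgoing area (final_adjust) can be sketched in C as follows. `outgoing_probe_offsets` and its constants are hypothetical; the sketch assumes a 4KB probe interval, a 1024-byte reserve so the callee can drop SP by 3KB unprobed, and that LR was saved at SP+0 before final_adjust (the top of the outgoing area), so that slot already counts as a probe. Offsets are measured upwards from the final SP.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4KB probe interval; the caller keeps a probe within 1KB
   above its final SP; LR was saved at the top of the outgoing area
   (SP+0 before final_adjust) and counts as a probe.  */
#define PROBE_INTERVAL   4096L
#define OUTGOING_RESERVE 1024L

/* Hypothetical helper: for an outgoing-argument area of OUTGOING bytes,
   writes probe offsets measured upwards from the final SP and returns
   their count (at most MAX_PROBES).  */
static size_t outgoing_probe_offsets(long outgoing, long *offsets,
                                     size_t max_probes)
{
    size_t n = 0;
    assert(outgoing >= 0);
    /* SP+1024, SP+5120, SP+9216, ... stopping below the LR slot at
       SP+OUTGOING, which is already a probe.  */
    for (long off = OUTGOING_RESERVE; off < outgoing && n < max_probes;
         off += PROBE_INTERVAL)
        offsets[n++] = off;
    return n;
}
```

For t3's 3008-byte outgoing area this yields a single probe at SP+1024, while an area of at most 1024 bytes needs none; a 5121-byte area needs a second probe at SP+5120, matching the 1KB/5KB/9KB thresholds.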
> We're clearly not communicating well. Example would probably help.
My t3 example above shows how you can easily jump the stack guard using lots of
outgoing args (but only if you call a different function with no outgoing args first!).
Wilco