Bug 81715 - asan-stack=1 redzone allocation is too inflexible
Summary: asan-stack=1 redzone allocation is too inflexible
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: sanitizer (show other bugs)
Version: 7.1.1
: P3 normal
Target Milestone: ---
Assignee: Martin Liška
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-04 10:23 UTC by Arnd Bergmann
Modified: 2017-09-21 12:27 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-09-15 00:00:00


Attachments
preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed (213.17 KB, application/gzip)
2017-09-15 13:52 UTC, Arnd Bergmann
Details
preprocessed net/wireless/nl80211.c, compressed (449.35 KB, application/gzip)
2017-09-18 13:25 UTC, Arnd Bergmann
Details
gcc8-pr81715.patch (1.47 KB, patch)
2017-09-20 15:53 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Arnd Bergmann 2017-08-04 10:23:09 UTC
In the Linux kernel, we have several instances of code exceeding the permitted stack frame limit (usually 1kb or 2kb per function) when asan-stack=1 is used.

A typical pattern is something like

extern void f_multiple(unsigned char *arg, unsigned long count);
static inline void f_single(unsigned char arg)
{
      f_multiple(&arg, 1);
}
void g(void)
{
     f_single(0);
     f_single(1);
     f_single(2);
     /* ... */
     f_single(100);
}

gcc -Wall -c test.c -O2 -fsanitize=kernel-address --param asan-stack=1 -Wframe-larger-than=0
test.c: In function ‘g’:
test.c:15:1: warning: the frame size of 288 bytes is larger than 0 bytes [-Wframe-larger-than=]

Here, each call to f_single() allocates a new stack location for its argument, and adds a 32-byte redzone before and after each byte to catch out-of-bounds access on the pointer.

In comparison, clang/llvm appears to to better in two ways here:

1. As of https://github.com/llvm-mirror/llvm/commit/daa1bf3b74054#diff-a6f91f41a097bdf01d36783f8bec4ed6R43, the redzone size is dynamically adapted to the object size, using only 16 bytes for variables up to four bytes, but also using much larger redzones for large stack objects.

2. In some cases, the stack for the temporary objects gets reused between calls, leading to the stack usage to no longer scale linearly with the number of calls to the inline helper.

I have a workaround for the kernel that marks all inline functions as __attribute__((noinline)) when I found a code path that has an excessive stack usage with asan-stack using an affected gcc version (gcc-4.9 and higher, I have not tried versions higher than 7.1.1).

It would be good to improve the stack frame allocation here in a future gcc release so we can turn off that workaround again in the kernel when using newer compilers.
Comment 1 Martin Liška 2017-09-15 12:37:24 UTC
May I please ask you for pre-processed source file. Note that with GCC 7 the situation can be even worse as we added -fsanitize-address-use-after-scope.

Having the example I can do some measurements and we'll think about improvements.
Comment 2 Arnd Bergmann 2017-09-15 13:52:48 UTC
Created attachment 42178 [details]
preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed

This is one of the typical files showing the behavior, with stack sizes up to 8 KB using gcc, but only a few hundred bytes using clang:

$ /home/arnd/cross-gcc/bin/x86_64-linux-gcc-8.0.0 -c /tmp/stv090x.i  -Werror=frame-larger-than=256 -O2 -fsanitize=kernel-address  --param asan-stack=0 # no warnings

$ /home/arnd/cross-gcc/bin/x86_64-linux-gcc-8.0.0 -c stv090x.i  -Werror=frame-larger-than=128 -O2 -fsanitize=kernel-address  --param asan-stack=1
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_i2c_gate_ctrl':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:810:1: error: the frame size of 288 bytes is larger than 256 bytes [-Werror=frame-larger-than=]

/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_blind_search':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:2067:1: error: the frame size of 8688 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_optimize_track':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3089:1: error: the frame size of 5872 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_search':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3505:1: error: the frame size of 8016 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_attach':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:4998:1: error: the frame size of 1152 bytes is larger than 256 bytes [-Werror=frame-larger-than=]

$ /home/arnd/cross-gcc/bin/clang  -Wframe-larger-than=256 /tmp/stv090x.i -c -Wall -Wno-constant-logical-operand  -O2 -Wno-unused-value -Wno-pointer-sign -fsanitize=kernel-address  -mllvm -asan-stack=0
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3173:34: warning: stack frame size of 264 bytes in function
      'stv090x_algo' [-Wframe-larger-than=]
static enum stv090x_signal_state stv090x_algo(struct stv090x_state *state)

$ /home/arnd/cross-gcc/bin/clang  -Wframe-larger-than=256 /tmp/stv090x.i -c -Wall -Wno-constant-logical-operand  -O2 -Wno-unused-value -Wno-pointer-sign -fsanitize=kernel-address  -mllvm -asan-stack=1
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:756:12: warning: stack frame size of 280 bytes in function
      'stv090x_write_reg' [-Wframe-larger-than=]
static int stv090x_write_reg(struct stv090x_state *state, unsigned int reg, u8 data)
           ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3173:34: warning: stack frame size of 376 bytes in function
      'stv090x_algo' [-Wframe-larger-than=]
static enum stv090x_signal_state stv090x_algo(struct stv090x_state *state)
                                 ^
Comment 3 Arnd Bergmann 2017-09-15 16:03:42 UTC
(In reply to Arnd Bergmann from comment #2)
> Created attachment 42178 [details]
> preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed
> 
> This is one of the typical files showing the behavior, with stack sizes up
> to 8 KB using gcc, but only a few hundred bytes using clang:
> 

Looking at it again, it was probably not the best example, in this case the difference between clang and gcc is that clang happens to inline 
stv090x_write_regs while gcc doesn't inline it. If I mark that function as __always_inline, gcc produces good code as well, but that is a little besides the point. In a lot of other examples the function that gets called is from another file, so we can't inline it. I can attach a few other examples if that helps, or you could just replace the function with an extern declaration for testing.
Comment 4 Arnd Bergmann 2017-09-18 13:25:40 UTC
Created attachment 42195 [details]
preprocessed net/wireless/nl80211.c, compressed

This is another file that shows the problem, in fact we hit this with every allmodconfig build using gcc-5 through gcc-8.0.0:

/home/arnd/cross-gcc/bin/x86_64-linux-gcc-5.4.1 -c nl80211.i -O2  --std=gnu89 -Wall -Wno-unused -Wno-pointer-sign -Wframe-larger-than=1024 -fno-strict-aliasing -fno-strict-overflow -fsanitize=kernel-address --param asan-stack=1  -fconserve-stack/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_add_commands_unsplit':
/git/arm-soc/net/wireless/nl80211.c:1411:1: warning: the frame size of 2208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_get_mesh_config':
/git/arm-soc/net/wireless/nl80211.c:5779:1: warning: the frame size of 2048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_send_wiphy':
/git/arm-soc/net/wireless/nl80211.c:1905:1: warning: the frame size of 3856 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
...
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_send_station.isra.47':
/git/arm-soc/net/wireless/nl80211.c:4476:1: warning: the frame size of 2240 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_dump_station':
/git/arm-soc/net/wireless/nl80211.c:4529:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_dump_scan':
/git/arm-soc/net/wireless/nl80211.c:7837:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }

In comparison:
/home/arnd/cross-gcc/bin/clang -c nl80211.i -O2 --std=gnu89 -Wno-gnu -Wall -Wno-unused -Wno-pointer-sign -Wno-tautological-compare -Wno-constant-logical-operand -no-integrated-as -Wframe-larger-than=1024 -fsanitize=kernel-address -mllvm -asan-stack=1
/git/arm-soc/net/wireless/nl80211.c:1420:12: warning: stack frame size of 1816 bytes in function 'nl80211_send_wiphy'
      [-Wframe-larger-than=]
static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
           ^
/git/arm-soc/net/wireless/nl80211.c:14117:6: warning: stack frame size of 1112 bytes in function
      'cfg80211_del_sta_sinfo' [-Wframe-larger-than=]
void cfg80211_del_sta_sinfo(struct net_device *dev, const u8 *mac_addr,
     ^
/git/arm-soc/net/wireless/nl80211.c:9717:12: warning: stack frame size of 1144 bytes in function
      'cfg80211_cqm_rssi_update' [-Wframe-larger-than=]
static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
           ^
/git/arm-soc/net/wireless/nl80211.c:4531:12: warning: stack frame size of 1112 bytes in function 'nl80211_get_station'
      [-Wframe-larger-than=]
static int nl80211_get_station(struct sk_buff *skb, struct genl_info *info)
           ^
/git/arm-soc/net/wireless/nl80211.c:4478:12: warning: stack frame size of 1240 bytes in function
      'nl80211_dump_station' [-Wframe-larger-than=]
static int nl80211_dump_station(struct sk_buff *skb,

The proposed workaround in the kernel is https://patchwork.kernel.org/patch/9787449/, but it would be good to not need this for future compilers.
Comment 5 Martin Liška 2017-09-19 13:13:10 UTC
I can confirm that for the biggest function 'nl80211_send_wiphy', it really contains majority of stack variables which are 4B large. Having an adaptive redzone mechanism, one can effectively save half of stack. That will still not be enough for 1024B.

Anyway, I there's another example where one needs to limit (or disable ASAN) in kernel: https://patchwork.kernel.org/patch/9653417/

Jakub what's your opinion on this?
Comment 6 Arnd Bergmann 2017-09-19 21:42:04 UTC
(In reply to Martin Liška from comment #5)
> I can confirm that for the biggest function 'nl80211_send_wiphy', it really
> contains majority of stack variables which are 4B large. Having an adaptive
> redzone mechanism, one can effectively save half of stack. That will still
> not be enough for 1024B.

To clarify the limit: The test output was from a 64-bit build, which currently has a 2048 byte limit in the kernel, while 32-bit targets use a 1024 byte limit. 

My series that avoids the problem here and in other files by adding optional 'noinline' attributes has a follow-up series that lowers the 64-bit limit to 1536 bytes after addressing all the warnings, but that is not currently posted for inclusion. If we can get gcc-8 to stay below 2048 bytes for all files without the noinline annotation, we could still keep using that limit.
 
> Anyway, I there's another example where one needs to limit (or disable ASAN)
> in kernel: https://patchwork.kernel.org/patch/9653417/

If I remember correctly, that one only happens with  -fsanitize-address-use-after-scope, and I had a workaround that avoids this by changing the kernel code.
Comment 7 Martin Liška 2017-09-20 10:04:52 UTC
Ok, I'm quite opened for changes that will make smaller red zones for smaller variables. However, in case of sanitization-aware inlining, it's probably too complicated and I would rather use no_inline attribute on places where needed.
Comment 8 Arnd Bergmann 2017-09-20 10:14:54 UTC
(In reply to Martin Liška from comment #7)
> Ok, I'm quite opened for changes that will make smaller red zones for
> smaller variables. However, in case of sanitization-aware inlining, it's
> probably too complicated and I would rather use no_inline attribute on
> places where needed.

Ok, makes sense.

What do you think of a possible optimization of the way that the stack variables get allocated (regardless of asan-stack), to allow gcc to reuse the stack locations for multiple instances of inlining the same function? I guess we don't want to do that for -fsanitize-address-use-after-scope, but for all other cases it sounds like a useful optimization that would drastically reduce the frame size of nl80211_send_wiphy() and many other functions.
When I looked at this in the past, I found that clang does this more often than gcc already, and it did not seem to be impacted by enabling or disabling -fsanitize=kernel-address or asan-stack=1.
Comment 9 Jakub Jelinek 2017-09-20 11:15:15 UTC
Unless -fsanitize-user-after-scope (default for -fsanitize=address, but not for -fsanitize=kernel-address), GCC does reuse stack slots.  Just try say:
void foo (int *, int *, int *, int *, int *, int *);

void
bar ()
{
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int g, b, c, d, e, f;
    foo (&g, &b, &c, &d, &e, &f);
  }
  {
    int h, b, c, d, e, f;
    foo (&h, &b, &c, &d, &e, &f);
  }
  {
    int i, b, c, d, e, f;
    foo (&i, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
}

int
baz (int *p)
{
  return *p;
}
Comment 10 Arnd Bergmann 2017-09-20 12:08:05 UTC
As far as I can tell, gcc doesn't merge stack slots that came from inline functions, as in comment 1, or this example:

void baz (int *, int *, int *, int *, int *, int *);
static inline void foo (int a, int b, int c, int d, int e, int f)
{
    baz (&a, &b, &c, &d, &e, &f);
}
void
bar (int a, int b, int c, int d, int e, int f)
{
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
}

The frame sizes I see here are

gcc-8 -O2:                                            192 bytes
gcc-8 -O2 -fsanitize=address:                        3120 bytes
gcc-8 -O2 -fsanitize=kernel-address:                  192 bytes
gcc-8 -O2 -fsanitize=address asan-stack=0:            192 bytes
gcc-8 -O2 -fsanitize=kernel-address asan-stack=1:    3120 bytes
clang -O2:                                             72 bytes
clang -O2 -fsanitize=address:                          88 bytes
clang -O2 -fsanitize=kernel-address:                  888 bytes
clang -O2 -fsanitize=address asan-stack=0:            104 bytes
clang -O2 -fsanitize=kernel-address asan-stack=0:     104 bytes

(note: clang -fsanitize=kernel-address defaults to asan-stack=1, while gcc defaults to asan-stack=0. gcc-5 and gcc-8 have identical output).
Comment 11 Jakub Jelinek 2017-09-20 12:17:30 UTC
(In reply to Arnd Bergmann from comment #10)
> As far as I can tell, gcc doesn't merge stack slots that came from inline
> functions, as in comment 1, or this example:
> 
> void baz (int *, int *, int *, int *, int *, int *);
> static inline void foo (int a, int b, int c, int d, int e, int f)
> {
>     baz (&a, &b, &c, &d, &e, &f);
> }
> void
> bar (int a, int b, int c, int d, int e, int f)
> {
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
> }
> 
> The frame sizes I see here are
> 
> gcc-8 -O2:                                            192 bytes
> gcc-8 -O2 -fsanitize=address:                        3120 bytes
> gcc-8 -O2 -fsanitize=kernel-address:                  192 bytes
> gcc-8 -O2 -fsanitize=address asan-stack=0:            192 bytes
> gcc-8 -O2 -fsanitize=kernel-address asan-stack=1:    3120 bytes
> clang -O2:                                             72 bytes
> clang -O2 -fsanitize=address:                          88 bytes
> clang -O2 -fsanitize=kernel-address:                  888 bytes
> clang -O2 -fsanitize=address asan-stack=0:            104 bytes
> clang -O2 -fsanitize=kernel-address asan-stack=0:     104 bytes
> 
> (note: clang -fsanitize=kernel-address defaults to asan-stack=1, while gcc
> defaults to asan-stack=0. gcc-5 and gcc-8 have identical output).

That is unrelated to sanitization, we don't merge them with just -O2 either.
We do handle:
void baz (int *, int *, int *, int *, int *, int *);
static inline void foo (int a, int b, int c, int d, int e, int f)
{
    int a2 = a, b2 = b, c2 = c, d2 = d, e2 = e, f2 = f;
    baz (&a2, &b2, &c2, &d2, &e2, &f2);
}
void
bar (int a, int b, int c, int d, int e, int f)
{
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
}
though; the problem is that while for this testcase we have CLOBBER stmts for the addressable variables, when inlining a function which has addressable arguments the inliner doesn't add CLOBBER stmts for the arguments turned into variables.  Let me fix that.
Comment 12 Jakub Jelinek 2017-09-20 15:53:05 UTC
Created attachment 42212 [details]
gcc8-pr81715.patch

Untested fix.
Comment 13 Arnd Bergmann 2017-09-20 19:58:42 UTC
Tested the fix with an x86 allmodconfig kernel (linux-next, with -fsanitize-address-use-after-scope disabled manually). With an arbitrary limit of 1500 bytes (the default is no limit when -fsanitize=kernel-address is used), I get 46 warnings in 22 files without the fix, including the two files I attached earlier.

With the patch applied, only six warnings remain, and they are all below 2048 bytes (which I would then suggest as the new warning limit for sanitized kernels):

net/caif/cfctrl.c:555:1: error: the frame size of 1568 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/crypto/qat/qat_common/qat_hal.c:963:1: error: the frame size of 1800 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/input/mouse/cyapa_gen5.c:2434:1: error: the frame size of 1920 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/isdn/hardware/eicon/message.c:5984:1: error: the frame size of 2016 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/media/tuners/tda8290.c:310:1: error: the frame size of 1664 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:335:1: error: the frame size of 1712 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]

This is wonderful, it almost solves the entire problem that I have spent several  weeks on finding workarounds for over the past year.

I would still like to see the redzone size change we discussed earlier, to help with the remaining warnings (I'd have to look at the six files I mentioned to see if they would benefit, will open another PR if I see a third problem in there) and with the -fsanitize-address-use-after-scope case. I'll also try to work around unpatched gcc-5/6/7 compilers by using the local variable trick from comment 11 for the worst cases.
Comment 14 Jakub Jelinek 2017-09-20 20:11:25 UTC
Note the patch failed bootstrap, but just the first hunk from it passed bootstrap and is being regtested right now.  I'll need to debug what's wrong with the retval clobbers tomorrow.
Comment 15 Jakub Jelinek 2017-09-21 12:27:06 UTC
Author: jakub
Date: Thu Sep 21 12:26:34 2017
New Revision: 253065

URL: https://gcc.gnu.org/viewcvs?rev=253065&root=gcc&view=rev
Log:
	PR sanitizer/81715
	* tree-inline.c (expand_call_inline): Emit clobber stmts for
	VAR_DECLs to which addressable non-volatile parameters are mapped
	and for id->retvar after the return value assignment.  Clear
	id->retval and id->retbnd after inlining.

	* g++.dg/tree-ssa/pr8781.C (noop): Change argument type from
	const predicate to const predicate & to avoid UB.
	* g++.dg/opt/pr81715.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/opt/pr81715.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr8781.C
    trunk/gcc/tree-inline.c