Bug 81715 - asan-stack=1 redzone allocation is too inflexible
Summary: asan-stack=1 redzone allocation is too inflexible
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: sanitizer (show other bugs)
Version: 7.1.1
: P3 normal
Target Milestone: 9.0
Assignee: Martin Liška
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-04 10:23 UTC by Arnd Bergmann
Modified: 2018-02-21 10:24 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-09-15 00:00:00


Attachments
preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed (213.17 KB, application/gzip)
2017-09-15 13:52 UTC, Arnd Bergmann
Details
preprocessed net/wireless/nl80211.c, compressed (449.35 KB, application/gzip)
2017-09-18 13:25 UTC, Arnd Bergmann
Details
gcc8-pr81715.patch (1.47 KB, patch)
2017-09-20 15:53 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Arnd Bergmann 2017-08-04 10:23:09 UTC
In the Linux kernel, we have several instances of code exceeding the permitted stack frame limit (usually 1kb or 2kb per function) when asan-stack=1 is used.

A typical pattern is something like

extern void f_multiple(unsigned char *arg, unsigned long count);
static inline void f_single(unsigned char arg)
{
      f_multiple(&arg, 1);
}
void g(void)
{
     f_single(0);
     f_single(1);
     f_single(2);
     /* ... */
     f_single(100);
}

gcc -Wall -c test.c -O2 -fsanitize=kernel-address --param asan-stack=1 -Wframe-larger-than=0
test.c: In function ‘g’:
test.c:15:1: warning: the frame size of 288 bytes is larger than 0 bytes [-Wframe-larger-than=]

Here, each call to f_single() allocates a new stack location for its argument, and adds a 32-byte redzone before and after each byte to catch out-of-bounds access on the pointer.

In comparison, clang/llvm appears to to better in two ways here:

1. As of https://github.com/llvm-mirror/llvm/commit/daa1bf3b74054#diff-a6f91f41a097bdf01d36783f8bec4ed6R43, the redzone size is dynamically adapted to the object size, using only 16 bytes for variables up to four bytes, but also using much larger redzones for large stack objects.

2. In some cases, the stack for the temporary objects gets reused between calls, leading to the stack usage to no longer scale linearly with the number of calls to the inline helper.

I have a workaround for the kernel that marks all inline functions as __attribute__((noinline)) when I found a code path that has an excessive stack usage with asan-stack using an affected gcc version (gcc-4.9 and higher, I have not tried versions higher than 7.1.1).

It would be good to improve the stack frame allocation here in a future gcc release so we can turn off that workaround again in the kernel when using newer compilers.
Comment 1 Martin Liška 2017-09-15 12:37:24 UTC
May I please ask you for pre-processed source file. Note that with GCC 7 the situation can be even worse as we added -fsanitize-address-use-after-scope.

Having the example I can do some measurements and we'll think about improvements.
Comment 2 Arnd Bergmann 2017-09-15 13:52:48 UTC
Created attachment 42178 [details]
preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed

This is one of the typical files showing the behavior, with stack sizes up to 8 KB using gcc, but only a few hundred bytes using clang:

$ /home/arnd/cross-gcc/bin/x86_64-linux-gcc-8.0.0 -c /tmp/stv090x.i  -Werror=frame-larger-than=256 -O2 -fsanitize=kernel-address  --param asan-stack=0 # no warnings

$ /home/arnd/cross-gcc/bin/x86_64-linux-gcc-8.0.0 -c stv090x.i  -Werror=frame-larger-than=128 -O2 -fsanitize=kernel-address  --param asan-stack=1
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_i2c_gate_ctrl':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:810:1: error: the frame size of 288 bytes is larger than 256 bytes [-Werror=frame-larger-than=]

/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_blind_search':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:2067:1: error: the frame size of 8688 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_optimize_track':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3089:1: error: the frame size of 5872 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_search':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3505:1: error: the frame size of 8016 bytes is larger than 256 bytes [-Werror=frame-larger-than=]
 }
 ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c: In function 'stv090x_attach':
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:4998:1: error: the frame size of 1152 bytes is larger than 256 bytes [-Werror=frame-larger-than=]

$ /home/arnd/cross-gcc/bin/clang  -Wframe-larger-than=256 /tmp/stv090x.i -c -Wall -Wno-constant-logical-operand  -O2 -Wno-unused-value -Wno-pointer-sign -fsanitize=kernel-address  -mllvm -asan-stack=0
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3173:34: warning: stack frame size of 264 bytes in function
      'stv090x_algo' [-Wframe-larger-than=]
static enum stv090x_signal_state stv090x_algo(struct stv090x_state *state)

$ /home/arnd/cross-gcc/bin/clang  -Wframe-larger-than=256 /tmp/stv090x.i -c -Wall -Wno-constant-logical-operand  -O2 -Wno-unused-value -Wno-pointer-sign -fsanitize=kernel-address  -mllvm -asan-stack=1
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:756:12: warning: stack frame size of 280 bytes in function
      'stv090x_write_reg' [-Wframe-larger-than=]
static int stv090x_write_reg(struct stv090x_state *state, unsigned int reg, u8 data)
           ^
/git/arm-soc/drivers/media/dvb-frontends/stv090x.c:3173:34: warning: stack frame size of 376 bytes in function
      'stv090x_algo' [-Wframe-larger-than=]
static enum stv090x_signal_state stv090x_algo(struct stv090x_state *state)
                                 ^
Comment 3 Arnd Bergmann 2017-09-15 16:03:42 UTC
(In reply to Arnd Bergmann from comment #2)
> Created attachment 42178 [details]
> preprocessed linux/drivers/media/dvb-frontends/stv090x.c, compressed
> 
> This is one of the typical files showing the behavior, with stack sizes up
> to 8 KB using gcc, but only a few hundred bytes using clang:
> 

Looking at it again, it was probably not the best example, in this case the difference between clang and gcc is that clang happens to inline 
stv090x_write_regs while gcc doesn't inline it. If I mark that function as __always_inline, gcc produces good code as well, but that is a little besides the point. In a lot of other examples the function that gets called is from another file, so we can't inline it. I can attach a few other examples if that helps, or you could just replace the function with an extern declaration for testing.
Comment 4 Arnd Bergmann 2017-09-18 13:25:40 UTC
Created attachment 42195 [details]
preprocessed net/wireless/nl80211.c, compressed

This is another file that shows the problem, in fact we hit this with every allmodconfig build using gcc-5 through gcc-8.0.0:

/home/arnd/cross-gcc/bin/x86_64-linux-gcc-5.4.1 -c nl80211.i -O2  --std=gnu89 -Wall -Wno-unused -Wno-pointer-sign -Wframe-larger-than=1024 -fno-strict-aliasing -fno-strict-overflow -fsanitize=kernel-address --param asan-stack=1  -fconserve-stack/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_add_commands_unsplit':
/git/arm-soc/net/wireless/nl80211.c:1411:1: warning: the frame size of 2208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_get_mesh_config':
/git/arm-soc/net/wireless/nl80211.c:5779:1: warning: the frame size of 2048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_send_wiphy':
/git/arm-soc/net/wireless/nl80211.c:1905:1: warning: the frame size of 3856 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
...
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_send_station.isra.47':
/git/arm-soc/net/wireless/nl80211.c:4476:1: warning: the frame size of 2240 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_dump_station':
/git/arm-soc/net/wireless/nl80211.c:4529:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^
/git/arm-soc/net/wireless/nl80211.c: In function 'nl80211_dump_scan':
/git/arm-soc/net/wireless/nl80211.c:7837:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }

In comparison:
/home/arnd/cross-gcc/bin/clang -c nl80211.i -O2 --std=gnu89 -Wno-gnu -Wall -Wno-unused -Wno-pointer-sign -Wno-tautological-compare -Wno-constant-logical-operand -no-integrated-as -Wframe-larger-than=1024 -fsanitize=kernel-address -mllvm -asan-stack=1
/git/arm-soc/net/wireless/nl80211.c:1420:12: warning: stack frame size of 1816 bytes in function 'nl80211_send_wiphy'
      [-Wframe-larger-than=]
static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
           ^
/git/arm-soc/net/wireless/nl80211.c:14117:6: warning: stack frame size of 1112 bytes in function
      'cfg80211_del_sta_sinfo' [-Wframe-larger-than=]
void cfg80211_del_sta_sinfo(struct net_device *dev, const u8 *mac_addr,
     ^
/git/arm-soc/net/wireless/nl80211.c:9717:12: warning: stack frame size of 1144 bytes in function
      'cfg80211_cqm_rssi_update' [-Wframe-larger-than=]
static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
           ^
/git/arm-soc/net/wireless/nl80211.c:4531:12: warning: stack frame size of 1112 bytes in function 'nl80211_get_station'
      [-Wframe-larger-than=]
static int nl80211_get_station(struct sk_buff *skb, struct genl_info *info)
           ^
/git/arm-soc/net/wireless/nl80211.c:4478:12: warning: stack frame size of 1240 bytes in function
      'nl80211_dump_station' [-Wframe-larger-than=]
static int nl80211_dump_station(struct sk_buff *skb,

The proposed workaround in the kernel is https://patchwork.kernel.org/patch/9787449/, but it would be good to not need this for future compilers.
Comment 5 Martin Liška 2017-09-19 13:13:10 UTC
I can confirm that for the biggest function 'nl80211_send_wiphy', it really contains majority of stack variables which are 4B large. Having an adaptive redzone mechanism, one can effectively save half of stack. That will still not be enough for 1024B.

Anyway, I there's another example where one needs to limit (or disable ASAN) in kernel: https://patchwork.kernel.org/patch/9653417/

Jakub what's your opinion on this?
Comment 6 Arnd Bergmann 2017-09-19 21:42:04 UTC
(In reply to Martin Liška from comment #5)
> I can confirm that for the biggest function 'nl80211_send_wiphy', it really
> contains majority of stack variables which are 4B large. Having an adaptive
> redzone mechanism, one can effectively save half of stack. That will still
> not be enough for 1024B.

To clarify the limit: The test output was from a 64-bit build, which currently has a 2048 byte limit in the kernel, while 32-bit targets use a 1024 byte limit. 

My series that avoids the problem here and in other files by adding optional 'noinline' attributes has a follow-up series that lowers the 64-bit limit to 1536 bytes after addressing all the warnings, but that is not currently posted for inclusion. If we can get gcc-8 to stay below 2048 bytes for all files without the noinline annotation, we could still keep using that limit.
 
> Anyway, I there's another example where one needs to limit (or disable ASAN)
> in kernel: https://patchwork.kernel.org/patch/9653417/

If I remember correctly, that one only happens with  -fsanitize-address-use-after-scope, and I had a workaround that avoids this by changing the kernel code.
Comment 7 Martin Liška 2017-09-20 10:04:52 UTC
Ok, I'm quite opened for changes that will make smaller red zones for smaller variables. However, in case of sanitization-aware inlining, it's probably too complicated and I would rather use no_inline attribute on places where needed.
Comment 8 Arnd Bergmann 2017-09-20 10:14:54 UTC
(In reply to Martin Liška from comment #7)
> Ok, I'm quite opened for changes that will make smaller red zones for
> smaller variables. However, in case of sanitization-aware inlining, it's
> probably too complicated and I would rather use no_inline attribute on
> places where needed.

Ok, makes sense.

What do you think of a possible optimization of the way that the stack variables get allocated (regardless of asan-stack), to allow gcc to reuse the stack locations for multiple instances of inlining the same function? I guess we don't want to do that for -fsanitize-address-use-after-scope, but for all other cases it sounds like a useful optimization that would drastically reduce the frame size of nl80211_send_wiphy() and many other functions.
When I looked at this in the past, I found that clang does this more often than gcc already, and it did not seem to be impacted by enabling or disabling -fsanitize=kernel-address or asan-stack=1.
Comment 9 Jakub Jelinek 2017-09-20 11:15:15 UTC
Unless -fsanitize-user-after-scope (default for -fsanitize=address, but not for -fsanitize=kernel-address), GCC does reuse stack slots.  Just try say:
void foo (int *, int *, int *, int *, int *, int *);

void
bar ()
{
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int g, b, c, d, e, f;
    foo (&g, &b, &c, &d, &e, &f);
  }
  {
    int h, b, c, d, e, f;
    foo (&h, &b, &c, &d, &e, &f);
  }
  {
    int i, b, c, d, e, f;
    foo (&i, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
  {
    int a, b, c, d, e, f;
    foo (&a, &b, &c, &d, &e, &f);
  }
}

int
baz (int *p)
{
  return *p;
}
Comment 10 Arnd Bergmann 2017-09-20 12:08:05 UTC
As far as I can tell, gcc doesn't merge stack slots that came from inline functions, as in comment 1, or this example:

void baz (int *, int *, int *, int *, int *, int *);
static inline void foo (int a, int b, int c, int d, int e, int f)
{
    baz (&a, &b, &c, &d, &e, &f);
}
void
bar (int a, int b, int c, int d, int e, int f)
{
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
}

The frame sizes I see here are

gcc-8 -O2:                                            192 bytes
gcc-8 -O2 -fsanitize=address:                        3120 bytes
gcc-8 -O2 -fsanitize=kernel-address:                  192 bytes
gcc-8 -O2 -fsanitize=address asan-stack=0:            192 bytes
gcc-8 -O2 -fsanitize=kernel-address asan-stack=1:    3120 bytes
clang -O2:                                             72 bytes
clang -O2 -fsanitize=address:                          88 bytes
clang -O2 -fsanitize=kernel-address:                  888 bytes
clang -O2 -fsanitize=address asan-stack=0:            104 bytes
clang -O2 -fsanitize=kernel-address asan-stack=0:     104 bytes

(note: clang -fsanitize=kernel-address defaults to asan-stack=1, while gcc defaults to asan-stack=0. gcc-5 and gcc-8 have identical output).
Comment 11 Jakub Jelinek 2017-09-20 12:17:30 UTC
(In reply to Arnd Bergmann from comment #10)
> As far as I can tell, gcc doesn't merge stack slots that came from inline
> functions, as in comment 1, or this example:
> 
> void baz (int *, int *, int *, int *, int *, int *);
> static inline void foo (int a, int b, int c, int d, int e, int f)
> {
>     baz (&a, &b, &c, &d, &e, &f);
> }
> void
> bar (int a, int b, int c, int d, int e, int f)
> {
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
>     foo (a, b, c, d, e, f);
> }
> 
> The frame sizes I see here are
> 
> gcc-8 -O2:                                            192 bytes
> gcc-8 -O2 -fsanitize=address:                        3120 bytes
> gcc-8 -O2 -fsanitize=kernel-address:                  192 bytes
> gcc-8 -O2 -fsanitize=address asan-stack=0:            192 bytes
> gcc-8 -O2 -fsanitize=kernel-address asan-stack=1:    3120 bytes
> clang -O2:                                             72 bytes
> clang -O2 -fsanitize=address:                          88 bytes
> clang -O2 -fsanitize=kernel-address:                  888 bytes
> clang -O2 -fsanitize=address asan-stack=0:            104 bytes
> clang -O2 -fsanitize=kernel-address asan-stack=0:     104 bytes
> 
> (note: clang -fsanitize=kernel-address defaults to asan-stack=1, while gcc
> defaults to asan-stack=0. gcc-5 and gcc-8 have identical output).

That is unrelated to sanitization, we don't merge them with just -O2 either.
We do handle:
void baz (int *, int *, int *, int *, int *, int *);
static inline void foo (int a, int b, int c, int d, int e, int f)
{
    int a2 = a, b2 = b, c2 = c, d2 = d, e2 = e, f2 = f;
    baz (&a2, &b2, &c2, &d2, &e2, &f2);
}
void
bar (int a, int b, int c, int d, int e, int f)
{
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
    foo (a, b, c, d, e, f);
}
though; the problem is that while for this testcase we have CLOBBER stmts for the addressable variables, when inlining a function which has addressable arguments the inliner doesn't add CLOBBER stmts for the arguments turned into variables.  Let me fix that.
Comment 12 Jakub Jelinek 2017-09-20 15:53:05 UTC
Created attachment 42212 [details]
gcc8-pr81715.patch

Untested fix.
Comment 13 Arnd Bergmann 2017-09-20 19:58:42 UTC
Tested the fix with an x86 allmodconfig kernel (linux-next, with -fsanitize-address-use-after-scope disabled manually). With an arbitrary limit of 1500 bytes (the default is no limit when -fsanitize=kernel-address is used), I get 46 warnings in 22 files without the fix, including the two files I attached earlier.

With the patch applied, only six warnings remain, and they are all below 2048 bytes (which I would then suggest as the new warning limit for sanitized kernels):

net/caif/cfctrl.c:555:1: error: the frame size of 1568 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/crypto/qat/qat_common/qat_hal.c:963:1: error: the frame size of 1800 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/input/mouse/cyapa_gen5.c:2434:1: error: the frame size of 1920 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/isdn/hardware/eicon/message.c:5984:1: error: the frame size of 2016 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/media/tuners/tda8290.c:310:1: error: the frame size of 1664 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:335:1: error: the frame size of 1712 bytes is larger than 1500 bytes [-Werror=frame-larger-than=]

This is wonderful, it almost solves the entire problem that I have spent several  weeks on finding workarounds for over the past year.

I would still like to see the redzone size change we discussed earlier, to help with the remaining warnings (I'd have to look at the six files I mentioned to see if they would benefit, will open another PR if I see a third problem in there) and with the -fsanitize-address-use-after-scope case. I'll also try to work around unpatched gcc-5/6/7 compilers by using the local variable trick from comment 11 for the worst cases.
Comment 14 Jakub Jelinek 2017-09-20 20:11:25 UTC
Note the patch failed bootstrap, but just the first hunk from it passed bootstrap and is being regtested right now.  I'll need to debug what's wrong with the retval clobbers tomorrow.
Comment 15 Jakub Jelinek 2017-09-21 12:27:06 UTC
Author: jakub
Date: Thu Sep 21 12:26:34 2017
New Revision: 253065

URL: https://gcc.gnu.org/viewcvs?rev=253065&root=gcc&view=rev
Log:
	PR sanitizer/81715
	* tree-inline.c (expand_call_inline): Emit clobber stmts for
	VAR_DECLs to which addressable non-volatile parameters are mapped
	and for id->retvar after the return value assignment.  Clear
	id->retval and id->retbnd after inlining.

	* g++.dg/tree-ssa/pr8781.C (noop): Change argument type from
	const predicate to const predicate & to avoid UB.
	* g++.dg/opt/pr81715.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/opt/pr81715.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr8781.C
    trunk/gcc/tree-inline.c
Comment 16 Jakub Jelinek 2017-10-27 13:57:47 UTC
Did this change help?  Would it be beneficial to backport to 7.x or even 6.x?  As is it is certainly too dangerous, but wonder about additionally guarding the 3 hunks either with (flag_sanitize & SANITIZE_ADDRESS) != 0 or even (flag_sanitize & SANITIZE_KERNEL_ADDRESS) != 0, so it would affect only people using -fsanitize={,kernel-}address or even just -fsanitize=kernel-address.
Comment 17 Jakub Jelinek 2017-10-27 20:34:07 UTC
Author: jakub
Date: Fri Oct 27 20:33:35 2017
New Revision: 254179

URL: https://gcc.gnu.org/viewcvs?rev=254179&root=gcc&view=rev
Log:
	Backported from mainline
	2017-09-21  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/81715
	* tree-inline.c (expand_call_inline): Emit clobber stmts for
	VAR_DECLs to which addressable non-volatile parameters are mapped
	and for id->retvar after the return value assignment, though
	for -fsanitize=kernel-address only.  Clear id->retval and id->retbnd
	after inlining.

	* g++.dg/asan/pr81715.C: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/g++.dg/asan/pr81715.C
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/tree-inline.c
Comment 18 Jakub Jelinek 2018-01-18 20:31:17 UTC
Author: jakub
Date: Thu Jan 18 20:30:33 2018
New Revision: 256861

URL: https://gcc.gnu.org/viewcvs?rev=256861&root=gcc&view=rev
Log:
	PR sanitizer/81715
	PR testsuite/83882
	* function.h (gimplify_parameters): Add gimple_seq * argument.
	* function.c: Include gimple.h and options.h.
	(gimplify_parameters): Add cleanup argument, add CLOBBER stmts
	for the added local temporaries if needed.
	* gimplify.c (gimplify_body): Adjust gimplify_parameters caller,
	if there are any parameter cleanups, wrap whole body into a
	try/finally with the cleanups.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c
    trunk/gcc/function.h
    trunk/gcc/gimplify.c
Comment 19 Martin Liška 2018-02-19 14:19:11 UTC
As it's fixed on GCC-7 and currect trunk, can we Jakub close that?
Comment 20 Jakub Jelinek 2018-02-19 14:22:04 UTC
I haven't heard any answer to #c16 whether it actually helped the kernel or not.
Comment 21 Martin Liška 2018-02-19 14:27:40 UTC
I see, let me unassign that.
Comment 22 Arnd Bergmann 2018-02-19 14:59:55 UTC
(In reply to Jakub Jelinek from comment #20)
> I haven't heard any answer to #c16 whether it actually helped the kernel or
> not.

Sorry about that. Yes, it definitely helped the kernel a lot. At this point, we also have localized workarounds to the same effect (using local variables instead of accessing inline function arguments) in all functions that exceeded the arbitrary 2048 byte stack size limit, and backported into the 4.4 kernel and later, but with the newer gcc releases, we also get a lower stack consumption for lots of other functions that were high but below that limit.

I had hoped that we could also do this on gcc-7 branch without KASAN, as high stack consumption is always problematic for the kernel, and the same functions that got bumped over the warning limit with KASAN still suffer from wasted stack space on older compilers without KASAN. Since you consider that too invasive for the stable releases, my current workaround has to suffice.

One side issue that is not solved at all by the patch is -fsanitize-address-use-after-scope, since that still leads to extreme stack usage in the kernel. The problem here is that it forces many local variables into separate stack slots even when they could get reused without -fsanitize-address-use-after-scope, making it still actively dangerous to run kernels built with this option.
My workaround in the kernel is now to have that option disabled by default and only enabled when users explicitly turn it on. I still think it would be nice to address that in the way I originally suggested, by copying the behavior that LLVM uses with its variably sized redzone area.
Comment 23 Martin Liška 2018-02-19 18:39:05 UTC
> One side issue that is not solved at all by the patch is
> -fsanitize-address-use-after-scope, since that still leads to extreme stack
> usage in the kernel. The problem here is that it forces many local variables
> into separate stack slots even when they could get reused without
> -fsanitize-address-use-after-scope, making it still actively dangerous to
> run kernels built with this option.

Note that's crucial to have separate stack slots to properly catch usage of a stack variable that's our of scope.

> My workaround in the kernel is now to have that option disabled by default
> and only enabled when users explicitly turn it on. I still think it would be
> nice to address that in the way I originally suggested, by copying the
> behavior that LLVM uses with its variably sized redzone area.

That's definitely possible for GCC 9. Question is whether such change will be sufficient for you. Do you expect it will reduce stack usage in the desired way?
Comment 24 Arnd Bergmann 2018-02-20 08:57:24 UTC
(In reply to Martin Liška from comment #23)

> That's definitely possible for GCC 9. Question is whether such change will
> be sufficient for you. Do you expect it will reduce stack usage in the
> desired way?

I've recreated my original finding, comparing a clang-5 release against a recent gcc-8 snapshot. Building an x86 allmodconfig kernel with clang, I get 78 -fsanitize-address-use-after-scope warnings against a 2048 byte limit, the largest ones are:

drivers/usb/misc/sisusbvga/sisusb.c:1880:12: warning: stack frame size of 6776 bytes in function 'sisusb_init_gfxcore' [-Wframe-larger-than=]
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c:1521:1: warning: stack frame size of 5176 bytes in function 'gk104_ram_new_' [-Wframe-larger-than=]
drivers/usb/misc/sisusbvga/sisusb.c:1750:12: warning: stack frame size of 5112 bytes in function 'sisusb_set_default_mode' [-Wframe-larger-than=]
drivers/net/wireless/atmel/atmel.c:1307:5: warning: stack frame size of 5016 bytes in function 'atmel_open' [-Wframe-larger-than=]
net/core/ethtool.c:2549:5: warning: stack frame size of 4568 bytes in function 'dev_ethtool' [-Wframe-larger-than=]
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:19216:6: warning: stack frame size of 4312 bytes in function 'wlc_phy_init_nphy' [-Wframe-larger-than=]
drivers/media/usb/em28xx/em28xx-dvb.c:1129:12: warning: stack frame size of 3992 bytes in function 'em28xx_dvb_init' [-Wframe-larger-than=]
drivers/staging/media/atomisp/pci/atomisp2/css2400/sh_css.c:6802:24: warning: stack frame size of 3960 bytes in function 'load_capture_binaries' [-Wframe-larger-than=]
drivers/staging/wlan-ng/cfg80211.c:454:12: warning: stack frame size of 3864 bytes in function 'prism2_connect' [-Wframe-larger-than=]
drivers/staging/wilc1000/host_interface.c:2480:13: warning: stack frame size of 3704 bytes in function 'host_if_work' [-Wframe-larger-than=]


With gcc-8, the same configuration has 179 warnings, including:

drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: warning: the frame size of 23632 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:4515:1: warning: the frame size of 14056 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:3879:1: warning: the frame size of 11504 bytes is larger than 2048 bytes [-Wframe-larger-than=]
lib/atomic64_test.c:250:1: warning: the frame size of 11192 bytes is larger than 2048 bytes [-Wframe-larger-than=]
lib/atomic64_test.c:148:1: warning: the frame size of 10360 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt73usb.c:1294:1: warning: the frame size of 8680 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:8655:1: warning: the frame size of 6456 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3090:1: warning: the frame size of 5872 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt61pci.c:1647:1: warning: the frame size of 5792 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/media/dvb-frontends/stv090x.c:1595:1: warning: the frame size of 5304 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/scsi/fnic/fnic_trace.c:451:1: warning: the frame size of 5000 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:2417:1: warning: the frame size of 4912 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/media/dvb-frontends/stv090x.c:4265:1: warning: the frame size of 4840 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Comparing against a 3072 byte limit, I get 18 warnings for clang vs 54 for gcc-8. The detailed analysis of some of those warnings last year had shown that the difference can be traced almost entirely to simple scalar variables that use 64 bytes redzone with gcc but only 16 bytes with clang.
Comment 25 Martin Liška 2018-02-20 16:14:29 UTC
(In reply to Arnd Bergmann from comment #24)
> (In reply to Martin Liška from comment #23)
> 
> > That's definitely possible for GCC 9. Question is whether such change will
> > be sufficient for you. Do you expect it will reduce stack usage in the
> > desired way?
> 
> I've recreated my original finding, comparing a clang-5 release against a
> recent gcc-8 snapshot. Building an x86 allmodconfig kernel with clang, I get
> 78 -fsanitize-address-use-after-scope warnings against a 2048 byte limit,
> the largest ones are:
> 
> drivers/usb/misc/sisusbvga/sisusb.c:1880:12: warning: stack frame size of
> 6776 bytes in function 'sisusb_init_gfxcore' [-Wframe-larger-than=]
> drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c:1521:1: warning: stack
> frame size of 5176 bytes in function 'gk104_ram_new_' [-Wframe-larger-than=]
> drivers/usb/misc/sisusbvga/sisusb.c:1750:12: warning: stack frame size of
> 5112 bytes in function 'sisusb_set_default_mode' [-Wframe-larger-than=]
> drivers/net/wireless/atmel/atmel.c:1307:5: warning: stack frame size of 5016
> bytes in function 'atmel_open' [-Wframe-larger-than=]
> net/core/ethtool.c:2549:5: warning: stack frame size of 4568 bytes in
> function 'dev_ethtool' [-Wframe-larger-than=]
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:19216:6:
> warning: stack frame size of 4312 bytes in function 'wlc_phy_init_nphy'
> [-Wframe-larger-than=]
> drivers/media/usb/em28xx/em28xx-dvb.c:1129:12: warning: stack frame size of
> 3992 bytes in function 'em28xx_dvb_init' [-Wframe-larger-than=]
> drivers/staging/media/atomisp/pci/atomisp2/css2400/sh_css.c:6802:24:
> warning: stack frame size of 3960 bytes in function 'load_capture_binaries'
> [-Wframe-larger-than=]
> drivers/staging/wlan-ng/cfg80211.c:454:12: warning: stack frame size of 3864
> bytes in function 'prism2_connect' [-Wframe-larger-than=]
> drivers/staging/wilc1000/host_interface.c:2480:13: warning: stack frame size
> of 3704 bytes in function 'host_if_work' [-Wframe-larger-than=]
> 
> 
> With gcc-8, the same configuration has 179 warnings, including:
> 
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: warning: the frame
> size of 23632 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:4515:1: warning: the frame
> size of 14056 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:3879:1: warning: the frame
> size of 11504 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> lib/atomic64_test.c:250:1: warning: the frame size of 11192 bytes is larger
> than 2048 bytes [-Wframe-larger-than=]
> lib/atomic64_test.c:148:1: warning: the frame size of 10360 bytes is larger
> than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt73usb.c:1294:1: warning: the frame size
> of 8680 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger
> than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:8655:1: warning: the frame
> size of 6456 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/media/dvb-frontends/stv090x.c:3090:1: warning: the frame size of
> 5872 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt61pci.c:1647:1: warning: the frame size
> of 5792 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/media/dvb-frontends/stv090x.c:1595:1: warning: the frame size of
> 5304 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/scsi/fnic/fnic_trace.c:451:1: warning: the frame size of 5000 bytes
> is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:2417:1: warning: the frame
> size of 4912 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/media/dvb-frontends/stv090x.c:4265:1: warning: the frame size of
> 4840 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> Comparing against a 3072 byte limit, I get 18 warnings for clang vs 54 for
> gcc-8. The detailed analysis of some of those warnings last year had shown
> that the difference can be traced almost entirely to simple scalar variables
> that use 64 bytes redzone with gcc but only 16 bytes with clang.

Ok, I don't have problem to implement the similar behavior in GCC 9. Looks most
of warnings are in drivers. That should not be problem as I guess KASAN build is
mainly used in a qemu machine (with syzkaller)? Thus exotic drivers should not
be needed?

I have question about clang where they use for instance:
else if (Size <= 128) Res = Size + 32;

So:

$ cat test.c
int
main (void)
{
  char my_char[N];
  char *ptr, *ptr2;
  {
    char my_char2[N];
    ptr = &my_char[N - 1];
    ptr2 = &my_char2[N - 1];
  }

  return *(ptr2+8);
}

$ clang -v
clang version 5.0.1 (tags/RELEASE_501/final 312548)

$ clang test.c  -fsanitize-address-use-after-scope -fsanitize=address -D N=128 && ./a.out
=================================================================
==18745==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffdb27 at pc 0x0000004e7f29 bp 0x7fffffffd9d0 sp 0x7fffffffd9c8
READ of size 1 at 0x7fffffffdb27 thread T0
    #0 0x4e7f28 in main (/tmp/a.out+0x4e7f28)
    #1 0x7ffff6eb1f49 in __libc_start_main (/lib64/libc.so.6+0x20f49)
    #2 0x41c199 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

Address 0x7fffffffdb27 is located in stack of thread T0 at offset 327 in frame
    #0 0x4e7d9f in main (/tmp/a.out+0x4e7d9f)

  This frame has 2 object(s):
    [32, 160) 'my_char'
    [192, 320) 'my_char2' <== Memory access at offset 327 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/tmp/a.out+0x4e7f28) in main
Shadow bytes around the buggy address:
  0x10007fff7b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b30: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x10007fff7b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b50: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
=>0x10007fff7b60: f8 f8 f8 f8[f3]f3 f3 f3 00 00 00 00 00 00 00 00
  0x10007fff7b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The middle red zone is only 32B. So I don't understand why 'Size' not used for red zone
calculation?
Comment 26 Arnd Bergmann 2018-02-20 21:11:23 UTC
(In reply to Martin Liška from comment #25)
> (In reply to Arnd Bergmann from comment #24)
> 
> Ok, I don't have problem to implement the similar behavior in GCC 9. Looks
> most
> of warnings are in drivers. That should not be problem as I guess KASAN
> build is
> mainly used in a qemu machine (with syzkaller)? Thus exotic drivers should
> not
> be needed?

I actually have no idea in what other ways it may be used, though I didn't
think that running syzkaller was the only use case. It always feels like most
bugs in the kernel are in obscure drivers, but then most of the kernel code
consists of obscure drivers ;-)

Here are some warnings in code that is actually being run. For the full output
I see on linux-next, have a look at https://pastebin.com/CMJiUsuR. There
are a couple of other warnings mixed in there as well that I'm working on
addressing, but it's mainly the stack overflow once I turn on CONFIG_KASAN_EXTRA.

arch/x86/kernel/cpu/mshyperv.c:261:1: warning: the frame size of 2704 bytes is larger than 2048 bytes [-Wframe-larger-than=]
arch/x86/kvm/emulate.c:2552:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/acpi/nfit/core.c:3168:1: warning: the frame size of 3952 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/firmware/efi/test/efi_test.c:688:1: warning: the frame size of 2400 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table.c:83:1: warning: the frame size of 3760 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/md/md.c:8561:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/bonding/bond_netlink.c:677:1: warning: the frame size of 2096 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/btrfs/relocation.c:1202:1: warning: the frame size of 4272 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/jbd2/commit.c:1128:1: warning: the frame size of 3728 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/nfs/pnfs.c:1892:1: warning: the frame size of 2672 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/ntfs/mft.c:2756:1: warning: the frame size of 2352 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/userfaultfd.c:1824:1: warning: the frame size of 2256 bytes is larger than 2048 bytes [-Wframe-larger-than=]
fs/xfs/libxfs/xfs_rmap.c:1334:1: warning: the frame size of 2384 bytes is larger than 2048 bytes [-Wframe-larger-than=]
kernel/rcu/tree.c:2282:1: warning: the frame size of 3160 bytes is larger than 2048 bytes [-Wframe-larger-than=]
lib/rbtree.c:481:1: warning: the frame size of 2520 bytes is larger than 2048 bytes [-Wframe-larger-than=]
mm/khugepaged.c:1560:1: warning: the frame size of 2976 bytes is larger than 2048 bytes [-Wframe-larger-than=]
mm/migrate.c:2129:1: warning: the frame size of 2104 bytes is larger than 2048 bytes [-Wframe-larger-than=]
mm/page_alloc.c:3247:1: warning: the frame size of 4584 bytes is larger than 2048 bytes [-Wframe-larger-than=]
mm/vmscan.c:1350:1: warning: the frame size of 5072 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1446:1: warning: the frame size of 2592 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/core/ethtool.c:2832:1: warning: the frame size of 3376 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/core/rtnetlink.c:1631:1: warning: the frame size of 2272 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/mac80211/util.c:2188:1: warning: the frame size of 2464 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/rxrpc/recvmsg.c:603:1: warning: the frame size of 2424 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/sctp/socket.c:7271:1: warning: the frame size of 2704 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1938:1: warning: the frame size of 4248 bytes is larger than 2048 bytes [-Wframe-larger-than=]

> The middle red zone is only 32B. So I don't understand why 'Size' not used
> for red zone
> calculation?

No idea.
Comment 27 Martin Liška 2018-02-20 21:42:52 UTC
Let me decrypt how clang generates the red zones. I can probably quickly come up with a patch that will do the dynamic red zone size allocation. Having that you'll be able to rebuild kernel and catch remaining issues.
Comment 28 Martin Liška 2018-02-21 10:24:57 UTC
(In reply to Martin Liška from comment #27)
> Let me decrypt how clang generates the red zones. I can probably quickly
> come up with a patch that will do the dynamic red zone size allocation.
> Having that you'll be able to rebuild kernel and catch remaining issues.

So the functions looks as follows:

static size_t VarAndRedzoneSize(size_t Size, size_t Granularity,
                                size_t Alignment) {
  size_t Res = 0;
  if (Size <= 4)  Res = 16;
  else if (Size <= 16) Res = 32;
  else if (Size <= 128) Res = Size + 32;
  else if (Size <= 512) Res = Size + 64;
  else if (Size <= 4096) Res = Size + 128;
  else                   Res = Size + 256;
  return alignTo(std::max(Res, 2 * Granularity), Alignment);
}

It's bit confusing as it's size of variable plus red zone in shadow memory. So to calculate just the size of redzone one has to subtract Size. And there's some rounding up according to alignment.

I've prepared table with size of variable and corresponding middle end red zone size:

+---------------+-----+------+---------+
| Variable size | GCC | LLVM |  ratio  |
+---------------+-----+------+---------+
|             1 |  63 |   15 | 23.81%  |
|             4 |  60 |   12 | 20.00%  |
|             8 |  56 |   24 | 42.86%  |
|            12 |  52 |   20 | 38.46%  |
|            16 |  48 |   16 | 33.33%  |
|            32 |  32 |   32 | 100.00% |
|            40 |  56 |   40 | 71.43%  |
|            50 |  46 |   46 | 100.00% |
|            64 |  32 |   32 | 100.00% |
|            96 |  32 |   32 | 100.00% |
|           128 |  32 |   32 | 100.00% |
|           129 |  63 |   79 | 125.40% |
|           196 |  60 |   76 | 126.67% |
|           256 |  32 |   64 | 200.00% |
|           257 |  63 |   79 | 125.40% |
|           511 |  33 |   65 | 196.97% |
|           512 |  32 |   64 | 200.00% |
|          1024 |  32 |  128 | 400.00% |
|          1025 |  63 |  143 | 226.98% |
|          2048 |  32 |  128 | 400.00% |
|          4096 |  32 |  128 | 400.00% |
|          8192 |  32 |  256 | 800.00% |
+---------------+-----+------+---------+

As seen (and mentioned by you), for small variables we have bigger red zones.