Bug 55739 - asan doesn't work on common symbols
Summary: asan doesn't work on common symbols
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: sanitizer (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-19 13:20 UTC by H.J. Lu
Modified: 2013-01-17 17:48 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A prototype (1.23 KB, application/octet-stream)
2013-01-14 22:11 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2012-12-19 13:20:07 UTC
[hjl@gnu-6 bound-1]$ cat c.c
#include <stdio.h>

int c[30];

void
foo (int a[])
{
  printf ("10: %d\n", a[10]);
}
[hjl@gnu-6 bound-1]$ cat m.c
#include <stdio.h>

extern int c[];

void foo (int []);

int
main ()
{
  c[30] = 1;
  foo (c);
  printf ("30: %d\n", c[30]);
  return 0;
}
[hjl@gnu-6 bound-1]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan   -c -o m.o m.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan   -c -o c.o c.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -o x m.o c.o
./x
10: 0
30: 1
[hjl@gnu-6 bound-1]$
Comment 1 Jakub Jelinek 2012-12-19 13:26:34 UTC
That is not a bug, it can't.  Asan needs to insert padding after it, but you don't know if the symbol will be defined by the current CU, or some other, and whether there will be padding inserted after it or not.  Just use -fno-common if the program doesn't assume common symbol behavior to instrument those.
Comment 2 H.J. Lu 2012-12-19 13:42:12 UTC
If upper address or size of the common symbol is
available to ASAN at compile time as a special
symbol generated by assembler/linker, will it
help?
Comment 3 Jakub Jelinek 2012-12-19 13:49:35 UTC
That would look like too big hack.
Perhaps we could emit the common symbols as .bss .weak objects with padding, and register as global a local alias to those symbols.  Not sure if it would have the right semantics of common symbols though (non-common definitions would be strong symbols and thus would win over the common ones, or if only common definitions are used, then one of them would win).  Would bloat .bss of course, and could do surprising things if one uses a weak non-common definition in one CU and common (non-weak) symbol in another one.
Comment 4 H.J. Lu 2012-12-19 16:58:34 UTC
The symbol size is always available at link-time or run-time.
We just never find a use for it in program itself.  We can add
relocations for "foo@BOUND", which resolve to original
calculation of symbol "foo" + size of "foo".
Comment 5 Kostya Serebryany 2012-12-21 09:46:41 UTC
Just for the record:
llvm implementation of asan does not catch these either for the same reason.
It would be interesting to find a way to implement this in both compilers.
Comment 6 H.J. Lu 2013-01-14 22:11:16 UTC
Created attachment 29165 [details]
A prototype

If as, ld and ld.so provide size info via "symbol@size", we can do

.LASAN0:
# __beg:
	.quad	common_data
# __size:
	.quad	common_data@size
# __size_with_redzone:
	.quad	common_data@size + 40
# __name:
	.quad	.LC0
# __has_dynamic_init:
	.quad	0

[hjl@gnu-6 pr55739]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan   -c -o m.o m.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan   -c -o c.o c.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c size.S
/export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -o x m.o c.o size.o
./x
10: 0
29: 1
=================================================================
==1454== ERROR: AddressSanitizer: global-buffer-overflow on address 0x000002e70a18 at pc 0x401dc5 bp 0x7fffffffde90 sp 0x7fffffffde70
WRITE of size 4 at 0x000002e70a18 thread T0
    #0 0x401dc4 (/export/home/hjl/bugs/gcc/pr55739/x+0x401dc4)
0x000002e70a18 is located 0 bytes to the right of global variable 'c (d.c)' (0x2e709a0) of size 120
  'c (d.c)' is ascii string ''
Shadow bytes around the buggy address:
  0x1000005ce0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000005ce140: 00 00 00[f9]f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x1000005ce150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000005ce190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
Stats: 0M malloced (0M for red zones) by 0 calls
Stats: 0M realloced by 0 calls
Stats: 0M freed by 0 calls
Stats: 0M really freed by 0 calls
Stats: 0M (0M-0M) mmaped; 0 maps, 0 unmaps
  mmaps   by size class: 
  mallocs by size class: 
  frees   by size class: 
  rfrees  by size class: 
Stats: malloc large: 0 small slow: 0
Stats: StackDepot: 0 ids; 0M mapped
==1454== ABORTING
make: *** [all] Error 1
[hjl@gnu-6 pr55739]$ readelf -s c.o | grep common_data
    17: 0000000000000020   120 OBJECT  GLOBAL DEFAULT  COM common_data
[hjl@gnu-6 pr55739]$
Comment 7 H.J. Lu 2013-01-15 02:19:55 UTC
There are already

R_386_SIZE32         38      word32          Z + A
R_X86_64_SIZE32      32      word32          Z + A
R_X86_64_SIZE64      33      word64          Z + A

They should work here.
Comment 8 H.J. Lu 2013-01-17 16:40:48 UTC
(In reply to comment #7)
> There are already
> 
> R_386_SIZE32         38      word32          Z + A
> R_X86_64_SIZE32      32      word32          Z + A
> R_X86_64_SIZE64      33      word64          Z + A

Their support has been checked into glibc and binutils.
Can address sanitizer use them?
Comment 9 Andrew Pinski 2013-01-17 16:43:46 UTC
(In reply to comment #8)
> Their support has been checked into glibc and binutils.
> Can address sanitizer use them?

What about all the other targets that asan now supports?
Comment 10 H.J. Lu 2013-01-17 16:48:53 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Their support has been checked into glibc and binutils.
> > Can address sanitizer use them?
> 
> What about all the other targets that asan now supports?

Those targets won't support common symbols until they implement
size relocation.
Comment 11 Jakub Jelinek 2013-01-17 16:49:58 UTC
Ugh, no, that is way too premature.  This really shouldn't be a dynamic relocation.  And asan shouldn't be registering the same (common or in the end non-common) var multiple times from different shared libraries or binaries.

For this we should have a way to refer using non-dynamic relocation to the common symbol (like R_*_32 or R_*_64, but resolved to the local copy of the symbol, not global one; or can we have .hidden aliases to common symbols?), and this size relocation should be also resolved at link time.
Comment 12 H.J. Lu 2013-01-17 16:57:29 UTC
Size relocation means that all instances of

# __beg:
    .quad    common_data
# __size:
    .quad    common_data@size
# __size_with_redzone:
    .quad    common_data@size + 40

resolve to the same address and size at link-time or run-time.
Can we take advantage of it?
Comment 13 Jakub Jelinek 2013-01-17 17:48:18 UTC
No idea why you keep mentioning
       .quad    common_data
       .quad    common_data@size
       .quad    common_data@size + 40
That is nothing even close to what asan needs.  Asan for what is normally
emitted e.g. as:
.comm common_data,15,4
needs to emit
.comm common_data,64,32
instead, and then a hidden alias to common_data (currently not allowed by as) or some new relocations which will result in the address of the common_data copy in current executable resp. shared library, followed by 15 (the size of the real common_data), followed by common_data@size (which will be either 15 or 64, depending on how large is common_data actually allocated).
This still relies on the the linker always choosing the highest alignment from
all the alignments of the same var (because libasan relies that all such variables are at least 32 bytes aligned), and if merging of common vars in whatever order always results in highest size being picked, then we even don't need any of this @size stuff at all.  The problem is that if you link say a shared library or executable where you have some -fsanitize=address compiled common vars and then some non-sanitized object with the var initialized (i.e. non-common), then the non-common var wins, but a) doesn't get appropriately aligned (so, impossible to be passed to asan register function), and b) not appropriately sized.
Consider: 1.c:
asm (".globl foo; .comm foo,4,4;");
2.c:
asm (".globl foo; .comm foo,64,32;");
3.c:
asm (".globl foo; .comm foo,4,4;");
4.c:
int foo = 6;

(1.c and 3.c emulate -fno-sanitize=address common var, 4.c non-sanitized non-common var, 2.c sanitized common var).  Now,
gcc -shared -fpic -o test.so 1.c 2.c 3.c
seems to DTRT, foo is 32-byte aligned and 64-bytes long, so we could register
it as foo's hidden alias, 4, 64.  But if you
gcc -shared -fpic -o test.so 1.c 2.c 3.c 4.c
you get:
/usr/bin/ld: Warning: alignment 4 of symbol `foo' in /tmp/ccoadfJM.o is smaller than 32 in /tmp/cc8Dhbe7.o
/usr/bin/ld: Warning: size of symbol `foo' changed from 64 in /tmp/cc8Dhbe7.o to 4 in /tmp/ccoadfJM.o

No @size stuff helps with this.