[hjl@gnu-6 bound-1]$ cat c.c #include <stdio.h> int c[30]; void foo (int a[]) { printf ("10: %d\n", a[10]); } [hjl@gnu-6 bound-1]$ cat m.c #include <stdio.h> extern int c[]; void foo (int []); int main () { c[30] = 1; foo (c); printf ("30: %d\n", c[30]); return 0; } [hjl@gnu-6 bound-1]$ make /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c -o m.o m.c /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c -o c.o c.c /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -o x m.o c.o ./x 10: 0 30: 1 [hjl@gnu-6 bound-1]$
That is not a bug, it can't. Asan needs to insert padding after it, but you don't know if the symbol will be defined by the current CU, or some other, and whether there will be padding inserted after it or not. Just use -fno-common if the program doesn't assume common symbol behavior to instrument those.
If upper address or size of the common symbol is available to ASAN at compile time as a special symbol generated by assembler/linker, will it help?
That would look like too big hack. Perhaps we could emit the common symbols as .bss .weak objects with padding, and register as global a local alias to those symbols. Not sure if it would have the right semantics of common symbols though (non-common definitions would be strong symbols and thus would win over the common ones, or if only common definitions are used, then one of them would win). Would bloat .bss of course, and could do surprising things if one uses a weak non-common definition in one CU and common (non-weak) symbol in another one.
The symbol size is always available at link-time or run-time. We just never find a use for it in program itself. We can add relocations for "foo@BOUND", which resolve to original calculation of symbol "foo" + size of "foo".
Just for the record: llvm implementation of asan does not catch these either for the same reason. It would be interesting to find a way to implement this in both compilers.
Created attachment 29165 [details] A prototype If as, ld and ld.so provide size info via "symbol@size", we can do .LASAN0: # __beg: .quad common_data # __size: .quad common_data@size # __size_with_redzone: .quad common_data@size + 40 # __name: .quad .LC0 # __has_dynamic_init: .quad 0 [hjl@gnu-6 pr55739]$ make /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c -o m.o m.c /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c -o c.o c.c /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -c size.S /export/build/gnu/gcc/build-x86_64-linux/gcc/../../release/usr/gcc-4.8.0/bin/gcc -fsanitize=address -O -static-libasan -o x m.o c.o size.o ./x 10: 0 29: 1 ================================================================= ==1454== ERROR: AddressSanitizer: global-buffer-overflow on address 0x000002e70a18 at pc 0x401dc5 bp 0x7fffffffde90 sp 0x7fffffffde70 WRITE of size 4 at 0x000002e70a18 thread T0 #0 0x401dc4 (/export/home/hjl/bugs/gcc/pr55739/x+0x401dc4) 0x000002e70a18 is located 0 bytes to the right of global variable 'c (d.c)' (0x2e709a0) of size 120 'c (d.c)' is ascii string '' Shadow bytes around the buggy address: 0x1000005ce0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x1000005ce140: 00 00 00[f9]f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x1000005ce150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000005ce190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap righ redzone: fb Freed Heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe Stats: 0M malloced (0M for red zones) by 0 calls Stats: 0M realloced by 0 calls Stats: 0M freed by 0 calls Stats: 0M really freed by 0 calls Stats: 0M (0M-0M) mmaped; 0 maps, 0 unmaps mmaps by size class: mallocs by size class: frees by size class: rfrees by size class: Stats: malloc large: 0 small slow: 0 Stats: StackDepot: 0 ids; 0M mapped ==1454== ABORTING make: *** [all] Error 1 [hjl@gnu-6 pr55739]$ readelf -s c.o | grep common_data 17: 0000000000000020 120 OBJECT GLOBAL DEFAULT COM common_data [hjl@gnu-6 pr55739]$
There are already R_386_SIZE32 38 word32 Z + A R_X86_64_SIZE32 32 word32 Z + A R_X86_64_SIZE64 33 word64 Z + A They should work here.
(In reply to comment #7) > There are already > > R_386_SIZE32 38 word32 Z + A > R_X86_64_SIZE32 32 word32 Z + A > R_X86_64_SIZE64 33 word64 Z + A Their support has been checked into glibc and binutils. Can address sanitizer use them?
(In reply to comment #8) > Their support has been checked into glibc and binutils. > Can address sanitizer use them? What about all the other targets that asan now supports?
(In reply to comment #9) > (In reply to comment #8) > > Their support has been checked into glibc and binutils. > > Can address sanitizer use them? > > What about all the other targets that asan now supports? Those targets won't support common symbols until they implement size relocation.
Ugh, no, that is way too premature. This really shouldn't be a dynamic relocation. And asan shouldn't be registering the same (common or in the end non-common) var multiple times from different shared libraries or binaries. For this we should have a way to refer using non-dynamic relocation to the common symbol (like R_*_32 or R_*_64, but resolved to the local copy of the symbol, not global one; or can we have .hidden aliases to common symbols?), and this size relocation should be also resolved at link time.
Size relocation means that all instances of # __beg: .quad common_data # __size: .quad common_data@size # __size_with_redzone: .quad common_data@size + 40 resolve to the same address and size at link-time or run-time. Can we take advantage of it?
No idea why you keep mentioning .quad common_data .quad common_data@size .quad common_data@size + 40 That is nothing even close to what asan needs. Asan for what is normally emitted e.g. as: .comm common_data,15,4 needs to emit .comm common_data,64,32 instead, and then a hidden alias to common_data (currently not allowed by as) or some new relocations which will result in the address of the common_data copy in current executable resp. shared library, followed by 15 (the size of the real common_data), followed by common_data@size (which will be either 15 or 64, depending on how large is common_data actually allocated). This still relies on the the linker always choosing the highest alignment from all the alignments of the same var (because libasan relies that all such variables are at least 32 bytes aligned), and if merging of common vars in whatever order always results in highest size being picked, then we even don't need any of this @size stuff at all. The problem is that if you link say a shared library or executable where you have some -fsanitize=address compiled common vars and then some non-sanitized object with the var initialized (i.e. non-common), then the non-common var wins, but a) doesn't get appropriately aligned (so, impossible to be passed to asan register function), and b) not appropriately sized. Consider: 1.c: asm (".globl foo; .comm foo,4,4;"); 2.c: asm (".globl foo; .comm foo,64,32;"); 3.c: asm (".globl foo; .comm foo,4,4;"); 4.c: int foo = 6; (1.c and 3.c emulate -fno-sanitize=address common var, 4.c non-sanitized non-common var, 2.c sanitized common var). Now, gcc -shared -fpic -o test.so 1.c 2.c 3.c seems to DTRT, foo is 32-byte aligned and 64-bytes long, so we could register it as foo's hidden alias, 4, 64. But if you gcc -shared -fpic -o test.so 1.c 2.c 3.c 4.c you get: /usr/bin/ld: Warning: alignment 4 of symbol `foo' in /tmp/ccoadfJM.o is smaller than 32 in /tmp/cc8Dhbe7.o /usr/bin/ld: Warning: size of symbol `foo' changed from 64 in /tmp/cc8Dhbe7.o to 4 in /tmp/ccoadfJM.o No @size stuff helps with this.
Note -fno-common is default since r10-4867 so I think it is best if we close this as won't fix.