Created attachment 51167 [details] the preprocessed file Is an issue with GCC 10.3 and GCC 11.1 C language x86_64-linux-gnu Ubuntu 11.1.0-1ubuntu1~21.04 Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.1.0-1ubuntu1~21.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2 Command: gcc bug.i -o bug -fsanitize=address -pthread && ./bug Output: ==164550==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0) #0 0x7f649a4fdd68 in AsanCheckFailed ../../../../src/libsanitizer/asan/asan_rtl.cpp:74 #1 0x7f649a51e69e in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78 #2 0x7f649a5034ec in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned long, __asan::AsanThread::StackFrameAccess*) ../../../../src/libsanitizer/asan/asan_thread.cpp:367 #3 0x7f649a46deab in __asan::GetStackAddressInformation(unsigned long, unsigned long, __asan::StackAddressDescription*) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:203 #4 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:455 #5 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:439 #6 0x7f649a471a94 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) ../../../../src/libsanitizer/asan/asan_errors.cpp:389 #7 0x7f649a4fd385 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ../../../../src/libsanitizer/asan/asan_report.cpp:476 #8 0x7f649a494038 in __interceptor_sigaltstack ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:9996 #9 0x7f649a512bfd in __sanitizer::UnsetAlternateSignalStack() ../../../../src/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:194 #10 0x7f649a5029ec in __asan::AsanThread::Destroy() ../../../../src/libsanitizer/asan/asan_thread.cpp:104 #11 0x7f649a426430 in __nptl_deallocate_tsd nptl/pthread_create.c:303 #12 0x7f649a427470 in __nptl_deallocate_tsd nptl/pthread_create.c:258 #13 0x7f649a427470 in start_thread nptl/pthread_create.c:484 #14 0x7f649a349d52 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x117d52) .i file attached. Additional notes: does not trigger a segfault when not using a sanitizer. Removing the sanitizer and then adding the -fstack-protector-all flag does not crash the program. That makes me think it might not be a problem with the code.
Cannot reproduce that with gcc version 10.3.1 20210707 [revision 048117e16c77f82598fca9af585500572d46ad73] (SUSE Linux) and gcc version 11.1.1 20210625 [revision 62bbb113ae68a7e724255e17143520735bcb9ec9] (SUSE Linux)
I have the very same crash with the multi-threaded app. The test-case from this ticket doesn't reproduce it for me either, but my app crashes nevertheless. So I debugged it a bit myself. gcc-11.2.1. The crash happens here: https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc#L10168 Here asan checks that sigaltstack() didn't corrupt anything while writing the "old setting" to "oss" ptr. Next, some check is later fails here: https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#340 Asan failed to find the canary value kCurrentStackFrameMagic. The search was done the following way: it walks the shadow stack down, and looks for the kAsanStackLeftRedzoneMagic to find the bottom of redzone. Then, at the bottom of redzone, it looks for the canary value. I checked that the lowest canary value is overwritten by the call to GetAltStackSize(). It uses SIGSTKSZ macro: https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp.html#170 which expands into a getconf() call, so eats up quite a lot. Now I am not entirely sure what conclusion can be derived out of that. I think that the culprit is probably here: https://code.woboq.org/gcc/libsanitizer/asan/asan_interceptors_memintrinsics.h.html#26 They say that they expect 16 bytes of a redzone, but it seems to be completely exhausted with all canaries overwritten. Does something of the above makes sense? This is the first time I am looking into an asan code.
Why does it check for a redzone on a non-leaf function? GetAltStackSize() calls to a glibc's getconf and that overwrites a canary. Maybe it shouldn't use/check the redzone on a non-leaf function?
Thread 3 "X ev" hit Breakpoint 4, __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190 190 void UnsetAlternateSignalStack() { (gdb) n 194 altstack.ss_size = GetAltStackSize(); // Some sane value required on Darwin. (gdb) p /x $rsp $128 = 0x7fffee0a0ce0 (gdb) p &oldstack $129 = (stack_t *) 0x7fffee0a0d00 (gdb) p /x *(int *)0x7fffee0a0cc0 <== canary address $130 = 0x41b58ab3 (gdb) p 0x7fffee0a0ce0-0x7fffee0a0cc0 $132 = 32 Here we can see that before a call to GetAltStackSize(), rsp is 32 bytes above the lowest canary value. After the call, there is no more canary because 32 bytes are quickly overwritten by a call to getconf().
Another problem here seems to be that pthread_cancel() doesn't unpoison the cancelled thread's stack. This causes dtors to run on a randomly poisoned stack, depending on where the cancellation happened. That explains the "random" nature of a crash, and the fact that pthread_cancel() is in a test-case attached to that ticket, and in my program as well. So, the best diagnostic I can come up with, is that after pthread_cancel() we have this: --- #0 __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190 #1 0x00007ffff7672f0d in __asan::AsanThread::Destroy (this=0x7ffff358e000) at ../../../../libsanitizer/asan/asan_thread.cpp:104 #2 0x00007ffff69d2c61 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74 #3 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23 #4 0x00007ffff69d5948 in start_thread (arg=<optimized out>) at pthread_create.c:446 #5 0x00007ffff6a5a640 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 --- And its running on a stack previously poisoned before pthread_cancel(). Then it detects the access to poisoned area and is trying to do a stack trace. But that fails too because the redzone canary is overwritten. So all we get is a crash.
I think the fix (of at least 1 problem here) would be to move this line: https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#109 upwards, before this: https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#103 It will then unpoison stack before playing its sigaltstack games. But I don't know how to test that idea.
Created attachment 52221 [details] test case This is a reproducer for both problems. $ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O1 to see the canary overwrite problem. $ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O0 to see the poisoned stack after pthread_cancel() problem.
Please report the problem to upstream libsanitizer project: https://github.com/llvm/llvm-project/issues
(In reply to Martin Liška from comment #8) > Please report the problem to upstream libsanitizer project: > https://github.com/llvm/llvm-project/issues I already did: https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891 But URL is different, should I also report that to llvm-project?
(In reply to Stas Sergeev from comment #9) > (In reply to Martin Liška from comment #8) > > Please report the problem to upstream libsanitizer project: > > https://github.com/llvm/llvm-project/issues > > I already did: > https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891 > But URL is different, should I also report > that to llvm-project? That location is fine, however, they have a duplicated bugzilla.
The third bug here seems to be that __asan_handle_no_return: https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L602 also calls sigaltstack() before unpoisoning stacks. I believe this makes the problem much more reproducible, for example the test-case with longjmp() is likely possible too. I've found about that instance by trying to call __asan_handle_no_return() manually as a pthread cleanup handler, in a hope to work around the destructor bug. But it appears __asan_handle_no_return() does the same thing. So the fix should be to move this line: https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L607 above PlatformUnpoisonStacks() call.
*** Bug 103978 has been marked as a duplicate of this bug. ***
Found another problem. https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.cpp#L53 The comment above that line talks about SS_AUTODISARM, but the line itself does not account for any flags. In a mean time, linux returns SS_DISABLE in combination with flags, like SS_AUTODISARM. So the "!=" check should not be used. My app probes for SS_AUTODISARM by trying to set it, and after that, asan breaks. This is quite cludgy though. Should the check be changed to if (!(signal_stack.ss_flags & SS_DISABLE)) or maybe linux should not return any flags together with SS_DISABLE? man page talks "strange things" on that subject.
(In reply to Stas Sergeev from comment #13) > Found another problem. > https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix. > cpp#L53 > The comment above that line talks about > SS_AUTODISARM, but the line itself does > not account for any flags. In a mean time, > linux returns SS_DISABLE in combination > with flags, like SS_AUTODISARM. So the > "!=" check should not be used. > > My app probes for SS_AUTODISARM by trying > to set it, and after that, asan breaks. > This is quite cludgy though. > Should the check be changed to > if (!(signal_stack.ss_flags & SS_DISABLE)) > or maybe linux should not return any flags > together with SS_DISABLE? > man page talks "strange things" on that subject. Please report to upstream as well.
(In reply to Martin Liška from comment #14) > Please report to upstream as well. I'd like some guidance on how should that be addressed, because that will allow to specify the upstream. I am not entirely sure that linux is doing the right thing, and I am not sure man page even makes sense saying that: --- The old_ss.ss_flags may return either of the following values: SS_ONSTACK SS_DISABLE SS_AUTODISARM --- ... because what I see is the return of "SS_DISABLE|SS_AUTODISARM", which is what I write to flags for probing. This is cludgy. Does anyone know what fix should that get?
I think I'll propose to apply something like this to linux kernel: diff --git a/kernel/signal.c b/kernel/signal.c index 6f3476dc7873..0549212a8dd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -4153,6 +4153,7 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp, if (ss_mode == SS_DISABLE) { ss_size = 0; ss_sp = NULL; + ss_flags = SS_DISABLE; } else { if (unlikely(ss_size < min_ss_size)) ret = -ENOMEM;
I sent the small patch-set here: https://lore.kernel.org/lkml/20220126191441.3380389-1-stsp2@yandex.ru/ but it is so far ignored by kernel developers. Someone from this bugzilla should give me an Ack or Review, or this won't float.
(In reply to Stas Sergeev from comment #5) > And its running on a stack previously > poisoned before pthread_cancel(). And the reason for that is because the glibc in use is the one not built with -fsanitize=address. When it calls its __do_cancel() which has attribute "noreturn", __asan_handle_noreturn() is not being called. Therefore the canceled thread remains with the poison below SP. I believe the glibc re-built with asan would not exhibit the crash. Note: all URLs above where I was pointing to the code, now either are a dead links or point to wrong lines. Its quite a shame that such a bug remains unfixed after a complete explanation was provided, but now that explanation is rotten...
*** Bug 109198 has been marked as a duplicate of this bug. ***