Bug 101476 - AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
Summary: AddressSanitizer check failed, points out a (potentially) non-existing stack ...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: sanitizer (show other bugs)
Version: 11.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 103978 109198 (view as bug list)
Depends on:
Blocks:
 
Reported: 2021-07-16 19:43 UTC by Franek
Modified: 2023-03-19 16:58 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-07-22 00:00:00


Attachments
the preprocessed file (506 bytes, text/plain)
2021-07-16 19:43 UTC, Franek
Details
test case (231 bytes, text/plain)
2022-01-18 23:01 UTC, Stas Sergeev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Franek 2021-07-16 19:43:56 UTC
Created attachment 51167 [details]
the preprocessed file

Is an issue with GCC 10.3 and GCC 11.1
C language
x86_64-linux-gnu
Ubuntu 11.1.0-1ubuntu1~21.04

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.1.0-1ubuntu1~21.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2

Command: gcc bug.i -o bug -fsanitize=address -pthread && ./bug
Output:

==164550==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
    #0 0x7f649a4fdd68 in AsanCheckFailed ../../../../src/libsanitizer/asan/asan_rtl.cpp:74
    #1 0x7f649a51e69e in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78
    #2 0x7f649a5034ec in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned long, __asan::AsanThread::StackFrameAccess*) ../../../../src/libsanitizer/asan/asan_thread.cpp:367
    #3 0x7f649a46deab in __asan::GetStackAddressInformation(unsigned long, unsigned long, __asan::StackAddressDescription*) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:203
    #4 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:455
    #5 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:439
    #6 0x7f649a471a94 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) ../../../../src/libsanitizer/asan/asan_errors.cpp:389
    #7 0x7f649a4fd385 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ../../../../src/libsanitizer/asan/asan_report.cpp:476
    #8 0x7f649a494038 in __interceptor_sigaltstack ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:9996
    #9 0x7f649a512bfd in __sanitizer::UnsetAlternateSignalStack() ../../../../src/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:194
    #10 0x7f649a5029ec in __asan::AsanThread::Destroy() ../../../../src/libsanitizer/asan/asan_thread.cpp:104
    #11 0x7f649a426430 in __nptl_deallocate_tsd nptl/pthread_create.c:303
    #12 0x7f649a427470 in __nptl_deallocate_tsd nptl/pthread_create.c:258
    #13 0x7f649a427470 in start_thread nptl/pthread_create.c:484
    #14 0x7f649a349d52 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x117d52)


.i file attached.

Additional notes: does not trigger a segfault when not using a sanitizer. Removing the sanitizer and then adding the -fstack-protector-all flag does not crash the program. That makes me think it might not be a problem with the code.
Comment 1 Martin Liška 2021-07-22 10:19:24 UTC
Cannot reproduce that with

gcc version 10.3.1 20210707 [revision 048117e16c77f82598fca9af585500572d46ad73] (SUSE Linux) 

and

gcc version 11.1.1 20210625 [revision 62bbb113ae68a7e724255e17143520735bcb9ec9] (SUSE Linux)
Comment 2 Stas Sergeev 2022-01-18 17:01:29 UTC
I have the very same crash with the
multi-threaded app. The test-case from
this ticket doesn't reproduce it for
me either, but my app crashes nevertheless.
So I debugged it a bit myself.
gcc-11.2.1.

The crash happens here:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc#L10168
Here asan checks that sigaltstack()
didn't corrupt anything while writing
the "old setting" to "oss" ptr.
Next, some check is later fails here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#340
Asan failed to find the canary value
kCurrentStackFrameMagic. The search
was done the following way: it walks
the shadow stack down, and looks for
the kAsanStackLeftRedzoneMagic to find
the bottom of redzone. Then, at the
bottom of redzone, it looks for the
canary value. I checked that the lowest
canary value is overwritten by the call
to GetAltStackSize(). It uses SIGSTKSZ
macro:
https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp.html#170
which expands into a getconf()
call, so eats up quite a lot.

Now I am not entirely sure what conclusion
can be derived out of that. I think that
the culprit is probably here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_interceptors_memintrinsics.h.html#26
They say that they expect 16 bytes of
a redzone, but it seems to be completely
exhausted with all canaries overwritten.

Does something of the above makes sense?
This is the first time I am looking into
an asan code.
Comment 3 Stas Sergeev 2022-01-18 17:56:23 UTC
Why does it check for a redzone
on a non-leaf function? GetAltStackSize()
calls to a glibc's getconf and that
overwrites a canary.
Maybe it shouldn't use/check the redzone
on a non-leaf function?
Comment 4 Stas Sergeev 2022-01-18 18:14:28 UTC
Thread 3 "X ev" hit Breakpoint 4, __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
190	void UnsetAlternateSignalStack() {
(gdb) n
194	  altstack.ss_size = GetAltStackSize();  // Some sane value required on Darwin.
(gdb) p /x $rsp
$128 = 0x7fffee0a0ce0
(gdb) p &oldstack
$129 = (stack_t *) 0x7fffee0a0d00
(gdb) p /x *(int *)0x7fffee0a0cc0  <== canary address
$130 = 0x41b58ab3
(gdb) p 0x7fffee0a0ce0-0x7fffee0a0cc0
$132 = 32

Here we can see that before a
call to GetAltStackSize(), rsp
is 32 bytes above the lowest
canary value. After the call,
there is no more canary because
32 bytes are quickly overwritten
by a call to getconf().
Comment 5 Stas Sergeev 2022-01-18 20:24:47 UTC
Another problem here seems to be
that pthread_cancel() doesn't unpoison
the cancelled thread's stack.
This causes dtors to run on a
randomly poisoned stack, depending
on where the cancellation happened.
That explains the "random" nature of
a crash, and the fact that pthread_cancel()
is in a test-case attached to that ticket,
and in my program as well.

So, the best diagnostic I can come up
with, is that after pthread_cancel() we
have this:
---
#0  __sanitizer::UnsetAlternateSignalStack ()
    at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
#1  0x00007ffff7672f0d in __asan::AsanThread::Destroy (this=0x7ffff358e000)
    at ../../../../libsanitizer/asan/asan_thread.cpp:104
#2  0x00007ffff69d2c61 in __GI___nptl_deallocate_tsd ()
    at nptl_deallocate_tsd.c:74
#3  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4  0x00007ffff69d5948 in start_thread (arg=<optimized out>)
    at pthread_create.c:446
#5  0x00007ffff6a5a640 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
---

And its running on a stack previously
poisoned before pthread_cancel().
Then it detects the access to poisoned
area and is trying to do a stack trace.
But that fails too because the redzone
canary is overwritten.
So all we get is a crash.
Comment 6 Stas Sergeev 2022-01-18 22:23:15 UTC
I think the fix (of at least 1 problem here)
would be to move this line:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#109
upwards, before this:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#103
It will then unpoison stack before
playing its sigaltstack games.
But I don't know how to test that idea.
Comment 7 Stas Sergeev 2022-01-18 23:01:35 UTC
Created attachment 52221 [details]
test case

This is a reproducer for both problems.

$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O1
to see the canary overwrite problem.

$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O0
to see the poisoned stack after pthread_cancel()
problem.
Comment 8 Martin Liška 2022-01-19 09:04:49 UTC
Please report the problem to upstream libsanitizer project:
https://github.com/llvm/llvm-project/issues
Comment 9 Stas Sergeev 2022-01-19 14:02:24 UTC
(In reply to Martin Liška from comment #8)
> Please report the problem to upstream libsanitizer project:
> https://github.com/llvm/llvm-project/issues

I already did:
https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
But URL is different, should I also report
that to llvm-project?
Comment 10 Martin Liška 2022-01-19 14:13:54 UTC
(In reply to Stas Sergeev from comment #9)
> (In reply to Martin Liška from comment #8)
> > Please report the problem to upstream libsanitizer project:
> > https://github.com/llvm/llvm-project/issues
> 
> I already did:
> https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
> But URL is different, should I also report
> that to llvm-project?

That location is fine, however, they have a duplicated bugzilla.
Comment 11 Stas Sergeev 2022-01-20 09:58:56 UTC
The third bug here seems to be
that __asan_handle_no_return:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L602
also calls sigaltstack() before
unpoisoning stacks. I believe this
makes the problem much more reproducible,
for example the test-case with longjmp()
is likely possible too. I've found about
that instance by trying to call
__asan_handle_no_return() manually as a
pthread cleanup handler, in a hope to
work around the destructor bug. But it
appears __asan_handle_no_return() does
the same thing.
So the fix should be to move this line:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L607
above PlatformUnpoisonStacks() call.
Comment 12 Andrew Pinski 2022-01-21 09:43:17 UTC
*** Bug 103978 has been marked as a duplicate of this bug. ***
Comment 13 Stas Sergeev 2022-01-25 10:35:35 UTC
Found another problem.
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.cpp#L53
The comment above that line talks about
SS_AUTODISARM, but the line itself does
not account for any flags. In a mean time,
linux returns SS_DISABLE in combination
with flags, like SS_AUTODISARM. So the
"!=" check should not be used.

My app probes for SS_AUTODISARM by trying
to set it, and after that, asan breaks.
This is quite cludgy though.
Should the check be changed to
if (!(signal_stack.ss_flags & SS_DISABLE))
or maybe linux should not return any flags
together with SS_DISABLE?
man page talks "strange things" on that subject.
Comment 14 Martin Liška 2022-01-25 11:25:15 UTC
(In reply to Stas Sergeev from comment #13)
> Found another problem.
> https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.
> cpp#L53
> The comment above that line talks about
> SS_AUTODISARM, but the line itself does
> not account for any flags. In a mean time,
> linux returns SS_DISABLE in combination
> with flags, like SS_AUTODISARM. So the
> "!=" check should not be used.
> 
> My app probes for SS_AUTODISARM by trying
> to set it, and after that, asan breaks.
> This is quite cludgy though.
> Should the check be changed to
> if (!(signal_stack.ss_flags & SS_DISABLE))
> or maybe linux should not return any flags
> together with SS_DISABLE?
> man page talks "strange things" on that subject.

Please report to upstream as well.
Comment 15 Stas Sergeev 2022-01-25 11:31:34 UTC
(In reply to Martin Liška from comment #14)
> Please report to upstream as well.

I'd like some guidance on how should that
be addressed, because that will allow to
specify the upstream.
I am not entirely sure that linux is doing
the right thing, and I am not sure man page
even makes sense saying that:
---
The old_ss.ss_flags may return either of the following values:

       SS_ONSTACK
       SS_DISABLE
       SS_AUTODISARM
---

... because what I see is the return of
"SS_DISABLE|SS_AUTODISARM", which is what I
write to flags for probing.
This is cludgy.
Does anyone know what fix should that get?
Comment 16 Stas Sergeev 2022-01-25 18:28:13 UTC
I think I'll propose to apply something like this to linux kernel:

diff --git a/kernel/signal.c b/kernel/signal.c
index 6f3476dc7873..0549212a8dd6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4153,6 +4153,7 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
                if (ss_mode == SS_DISABLE) {
                        ss_size = 0;
                        ss_sp = NULL;
+                       ss_flags = SS_DISABLE;
                } else {
                        if (unlikely(ss_size < min_ss_size))
                                ret = -ENOMEM;
Comment 17 Stas Sergeev 2022-02-11 12:45:30 UTC
I sent the small patch-set here:
https://lore.kernel.org/lkml/20220126191441.3380389-1-stsp2@yandex.ru/
but it is so far ignored by kernel developers.
Someone from this bugzilla should give me an
Ack or Review, or this won't float.
Comment 18 Stas Sergeev 2022-10-18 18:03:21 UTC
(In reply to Stas Sergeev from comment #5)
> And its running on a stack previously
> poisoned before pthread_cancel().

And the reason for that is because
the glibc in use is the one not built
with -fsanitize=address. When it calls
its __do_cancel() which has attribute
"noreturn", __asan_handle_noreturn()
is not being called. Therefore the
canceled thread remains with the
poison below SP.
I believe the glibc re-built with asan
would not exhibit the crash.

Note: all URLs above where I was pointing
to the code, now either are a dead links
or point to wrong lines. Its quite a shame
that such a bug remains unfixed after a
complete explanation was provided, but now
that explanation is rotten...
Comment 19 Andrew Pinski 2023-03-19 16:58:33 UTC
*** Bug 109198 has been marked as a duplicate of this bug. ***