101476 – AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel

Bug 101476 - AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel

Summary: AddressSanitizer check failed, points out a (potentially) non-existing stack ...

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	sanitizer (show other bugs)
Version:	11.1.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (2):	103978 109198 (view as bug list)
Depends on:
Blocks:

Reported:	2021-07-16 19:43 UTC by Franek
Modified:	2023-03-19 16:58 UTC (History)
CC List:	9 users (show)

See Also:	https://github.com/google/sanitizers/issues/1171
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-07-22 00:00:00

Attachments
the preprocessed file (506 bytes, text/plain) 2021-07-16 19:43 UTC, Franek	Details
test case (231 bytes, text/plain) 2022-01-18 23:01 UTC, Stas Sergeev	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Franek 2021-07-16 19:43:56 UTC

Created attachment 51167 [details]
the preprocessed file

Is an issue with GCC 10.3 and GCC 11.1
C language
x86_64-linux-gnu
Ubuntu 11.1.0-1ubuntu1~21.04

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.1.0-1ubuntu1~21.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2

Command: gcc bug.i -o bug -fsanitize=address -pthread && ./bug
Output:

==164550==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
    #0 0x7f649a4fdd68 in AsanCheckFailed ../../../../src/libsanitizer/asan/asan_rtl.cpp:74
    #1 0x7f649a51e69e in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78
    #2 0x7f649a5034ec in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned long, __asan::AsanThread::StackFrameAccess*) ../../../../src/libsanitizer/asan/asan_thread.cpp:367
    #3 0x7f649a46deab in __asan::GetStackAddressInformation(unsigned long, unsigned long, __asan::StackAddressDescription*) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:203
    #4 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:455
    #5 0x7f649a46f2e8 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:439
    #6 0x7f649a471a94 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) ../../../../src/libsanitizer/asan/asan_errors.cpp:389
    #7 0x7f649a4fd385 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ../../../../src/libsanitizer/asan/asan_report.cpp:476
    #8 0x7f649a494038 in __interceptor_sigaltstack ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:9996
    #9 0x7f649a512bfd in __sanitizer::UnsetAlternateSignalStack() ../../../../src/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:194
    #10 0x7f649a5029ec in __asan::AsanThread::Destroy() ../../../../src/libsanitizer/asan/asan_thread.cpp:104
    #11 0x7f649a426430 in __nptl_deallocate_tsd nptl/pthread_create.c:303
    #12 0x7f649a427470 in __nptl_deallocate_tsd nptl/pthread_create.c:258
    #13 0x7f649a427470 in start_thread nptl/pthread_create.c:484
    #14 0x7f649a349d52 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x117d52)


.i file attached.

Additional notes: does not trigger a segfault when not using a sanitizer. Removing the sanitizer and then adding the -fstack-protector-all flag does not crash the program. That makes me think it might not be a problem with the code.

Comment 1 Martin Liška 2021-07-22 10:19:24 UTC

Cannot reproduce that with

gcc version 10.3.1 20210707 [revision 048117e16c77f82598fca9af585500572d46ad73] (SUSE Linux) 

and

gcc version 11.1.1 20210625 [revision 62bbb113ae68a7e724255e17143520735bcb9ec9] (SUSE Linux)

Comment 2 Stas Sergeev 2022-01-18 17:01:29 UTC

I have the very same crash with the
multi-threaded app. The test-case from
this ticket doesn't reproduce it for
me either, but my app crashes nevertheless.
So I debugged it a bit myself.
gcc-11.2.1.

The crash happens here:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc#L10168
Here asan checks that sigaltstack()
didn't corrupt anything while writing
the "old setting" to "oss" ptr.
Next, some check is later fails here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#340
Asan failed to find the canary value
kCurrentStackFrameMagic. The search
was done the following way: it walks
the shadow stack down, and looks for
the kAsanStackLeftRedzoneMagic to find
the bottom of redzone. Then, at the
bottom of redzone, it looks for the
canary value. I checked that the lowest
canary value is overwritten by the call
to GetAltStackSize(). It uses SIGSTKSZ
macro:
https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp.html#170
which expands into a getconf()
call, so eats up quite a lot.

Now I am not entirely sure what conclusion
can be derived out of that. I think that
the culprit is probably here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_interceptors_memintrinsics.h.html#26
They say that they expect 16 bytes of
a redzone, but it seems to be completely
exhausted with all canaries overwritten.

Does something of the above makes sense?
This is the first time I am looking into
an asan code.

Comment 3 Stas Sergeev 2022-01-18 17:56:23 UTC

Why does it check for a redzone
on a non-leaf function? GetAltStackSize()
calls to a glibc's getconf and that
overwrites a canary.
Maybe it shouldn't use/check the redzone
on a non-leaf function?

Comment 4 Stas Sergeev 2022-01-18 18:14:28 UTC

Thread 3 "X ev" hit Breakpoint 4, __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
190	void UnsetAlternateSignalStack() {
(gdb) n
194	  altstack.ss_size = GetAltStackSize();  // Some sane value required on Darwin.
(gdb) p /x $rsp
$128 = 0x7fffee0a0ce0
(gdb) p &oldstack
$129 = (stack_t *) 0x7fffee0a0d00
(gdb) p /x *(int *)0x7fffee0a0cc0  <== canary address
$130 = 0x41b58ab3
(gdb) p 0x7fffee0a0ce0-0x7fffee0a0cc0
$132 = 32

Here we can see that before a
call to GetAltStackSize(), rsp
is 32 bytes above the lowest
canary value. After the call,
there is no more canary because
32 bytes are quickly overwritten
by a call to getconf().

Comment 5 Stas Sergeev 2022-01-18 20:24:47 UTC

Another problem here seems to be
that pthread_cancel() doesn't unpoison
the cancelled thread's stack.
This causes dtors to run on a
randomly poisoned stack, depending
on where the cancellation happened.
That explains the "random" nature of
a crash, and the fact that pthread_cancel()
is in a test-case attached to that ticket,
and in my program as well.

So, the best diagnostic I can come up
with, is that after pthread_cancel() we
have this:
---
#0  __sanitizer::UnsetAlternateSignalStack ()
    at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
#1  0x00007ffff7672f0d in __asan::AsanThread::Destroy (this=0x7ffff358e000)
    at ../../../../libsanitizer/asan/asan_thread.cpp:104
#2  0x00007ffff69d2c61 in __GI___nptl_deallocate_tsd ()
    at nptl_deallocate_tsd.c:74
#3  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4  0x00007ffff69d5948 in start_thread (arg=<optimized out>)
    at pthread_create.c:446
#5  0x00007ffff6a5a640 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
---

And its running on a stack previously
poisoned before pthread_cancel().
Then it detects the access to poisoned
area and is trying to do a stack trace.
But that fails too because the redzone
canary is overwritten.
So all we get is a crash.

Comment 6 Stas Sergeev 2022-01-18 22:23:15 UTC

I think the fix (of at least 1 problem here)
would be to move this line:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#109
upwards, before this:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#103
It will then unpoison stack before
playing its sigaltstack games.
But I don't know how to test that idea.

Comment 7 Stas Sergeev 2022-01-18 23:01:35 UTC

Created attachment 52221 [details]
test case

This is a reproducer for both problems.

$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O1
to see the canary overwrite problem.

$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O0
to see the poisoned stack after pthread_cancel()
problem.

Comment 8 Martin Liška 2022-01-19 09:04:49 UTC

Please report the problem to upstream libsanitizer project:
https://github.com/llvm/llvm-project/issues

Comment 9 Stas Sergeev 2022-01-19 14:02:24 UTC

(In reply to Martin Liška from comment #8)
> Please report the problem to upstream libsanitizer project:
> https://github.com/llvm/llvm-project/issues

I already did:
https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
But URL is different, should I also report
that to llvm-project?

Comment 10 Martin Liška 2022-01-19 14:13:54 UTC

(In reply to Stas Sergeev from comment #9)
> (In reply to Martin Liška from comment #8)
> > Please report the problem to upstream libsanitizer project:
> > https://github.com/llvm/llvm-project/issues
> 
> I already did:
> https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
> But URL is different, should I also report
> that to llvm-project?

That location is fine, however, they have a duplicated bugzilla.

Comment 11 Stas Sergeev 2022-01-20 09:58:56 UTC

The third bug here seems to be
that __asan_handle_no_return:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L602
also calls sigaltstack() before
unpoisoning stacks. I believe this
makes the problem much more reproducible,
for example the test-case with longjmp()
is likely possible too. I've found about
that instance by trying to call
__asan_handle_no_return() manually as a
pthread cleanup handler, in a hope to
work around the destructor bug. But it
appears __asan_handle_no_return() does
the same thing.
So the fix should be to move this line:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L607
above PlatformUnpoisonStacks() call.

Comment 12 Andrew Pinski 2022-01-21 09:43:17 UTC

*** Bug 103978 has been marked as a duplicate of this bug. ***

Comment 13 Stas Sergeev 2022-01-25 10:35:35 UTC

Found another problem.
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.cpp#L53
The comment above that line talks about
SS_AUTODISARM, but the line itself does
not account for any flags. In a mean time,
linux returns SS_DISABLE in combination
with flags, like SS_AUTODISARM. So the
"!=" check should not be used.

My app probes for SS_AUTODISARM by trying
to set it, and after that, asan breaks.
This is quite cludgy though.
Should the check be changed to
if (!(signal_stack.ss_flags & SS_DISABLE))
or maybe linux should not return any flags
together with SS_DISABLE?
man page talks "strange things" on that subject.

Comment 14 Martin Liška 2022-01-25 11:25:15 UTC

(In reply to Stas Sergeev from comment #13)
> Found another problem.
> https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.
> cpp#L53
> The comment above that line talks about
> SS_AUTODISARM, but the line itself does
> not account for any flags. In a mean time,
> linux returns SS_DISABLE in combination
> with flags, like SS_AUTODISARM. So the
> "!=" check should not be used.
> 
> My app probes for SS_AUTODISARM by trying
> to set it, and after that, asan breaks.
> This is quite cludgy though.
> Should the check be changed to
> if (!(signal_stack.ss_flags & SS_DISABLE))
> or maybe linux should not return any flags
> together with SS_DISABLE?
> man page talks "strange things" on that subject.

Please report to upstream as well.

Comment 15 Stas Sergeev 2022-01-25 11:31:34 UTC

(In reply to Martin Liška from comment #14)
> Please report to upstream as well.

I'd like some guidance on how should that
be addressed, because that will allow to
specify the upstream.
I am not entirely sure that linux is doing
the right thing, and I am not sure man page
even makes sense saying that:
---
The old_ss.ss_flags may return either of the following values:

       SS_ONSTACK
       SS_DISABLE
       SS_AUTODISARM
---

... because what I see is the return of
"SS_DISABLE|SS_AUTODISARM", which is what I
write to flags for probing.
This is cludgy.
Does anyone know what fix should that get?

Comment 16 Stas Sergeev 2022-01-25 18:28:13 UTC

I think I'll propose to apply something like this to linux kernel:

diff --git a/kernel/signal.c b/kernel/signal.c
index 6f3476dc7873..0549212a8dd6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4153,6 +4153,7 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
                if (ss_mode == SS_DISABLE) {
                        ss_size = 0;
                        ss_sp = NULL;
+                       ss_flags = SS_DISABLE;
                } else {
                        if (unlikely(ss_size < min_ss_size))
                                ret = -ENOMEM;

Comment 17 Stas Sergeev 2022-02-11 12:45:30 UTC

I sent the small patch-set here:
https://lore.kernel.org/lkml/20220126191441.3380389-1-stsp2@yandex.ru/
but it is so far ignored by kernel developers.
Someone from this bugzilla should give me an
Ack or Review, or this won't float.

Comment 18 Stas Sergeev 2022-10-18 18:03:21 UTC

(In reply to Stas Sergeev from comment #5)
> And its running on a stack previously
> poisoned before pthread_cancel().

And the reason for that is because
the glibc in use is the one not built
with -fsanitize=address. When it calls
its __do_cancel() which has attribute
"noreturn", __asan_handle_noreturn()
is not being called. Therefore the
canceled thread remains with the
poison below SP.
I believe the glibc re-built with asan
would not exhibit the crash.

Note: all URLs above where I was pointing
to the code, now either are a dead links
or point to wrong lines. Its quite a shame
that such a bug remains unfixed after a
complete explanation was provided, but now
that explanation is rotten...

Comment 19 Andrew Pinski 2023-03-19 16:58:33 UTC

*** Bug 109198 has been marked as a duplicate of this bug. ***