Bug 71890 - When using setjmp/longjmp for context switching, libgcc crashes the process while unwinding in the thread cancel signal handler on x86_64
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcc
Version: unknown
Importance: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-15 07:08 UTC by wgkun
Modified: 2016-07-19 01:55 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
test case to reproduce crash problem (3.70 KB, text/plain)
2016-07-15 07:08 UTC, wgkun

Description wgkun 2016-07-15 07:08:18 UTC
Created attachment 38911
test case to reproduce crash problem

This problem occurs when we implement a user-space context-switching framework with setjmp and longjmp.
The attached file is a simple test case that reproduces the problem. We create a thread with pthread_create and mmap two memory blocks to serve as its stack pool, then use setjmp and longjmp to make the thread switch between these two stacks. We call the stack that pthread_create allocates for the thread the original stack, and the two mmap'ed stacks stack 1 and stack 2. The thread switches from the original stack to stack 1 only once, right after it is created, and from then on switches only between stack 1 and stack 2. If we release stack 1 while the thread is running on stack 2 and then cancel the thread, libgcc crashes the process while unwinding in the cancel handler: it tries to access an address on stack 1, which has already been released. However, if we instead release stack 2 and cancel the thread, libgcc works correctly.
We first found this problem on Wind River's commercial version and then reproduced it on other free releases.
We have tested on x86_64, MIPS, and PPC, and found that it only happens on x86_64.
Compile the test case with "gcc -lpthread my_test.c -o my_test".
If -fno-asynchronous-unwind-tables is used so that the unwind tables are not generated, the process does not crash.
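
For readers without the attachment: the crash is reported in the unwinding that happens when a thread is cancelled. The sketch below is not the attached test case. It only illustrates, under the assumption of a glibc-style pthread implementation with deferred cancellation, that cancelling a thread runs its cleanup handlers, which is the step where the thread's stack gets walked. The names cancel_demo, worker and on_cancel are made up for illustration.

```c
#include <pthread.h>
#include <unistd.h>

static volatile int cleanup_ran;

/* Cleanup handler: runs while the cancelled thread is being torn down. */
static void on_cancel(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_cancel, NULL);
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);     /* never reached; pairs with the push above */
    return NULL;
}

/* Returns 0 if the thread was cancelled and its cleanup handler ran. */
int cancel_demo(void)
{
    pthread_t t;
    void *res = NULL;

    cleanup_ran = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    return (res == PTHREAD_CANCELED && cleanup_ran) ? 0 : 1;
}
```

Calling cancel_demo() from main and building with "gcc -pthread" should be enough to try it. In this sketch the thread stays on the stack that pthread_create gave it, so nothing goes wrong.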

Version information:
1. 
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/4.9.3/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /build/distro/work/shared/gcc-4.9.3/configure --build=none --host=x86_64-linux-gnu --target=x86_64-linux-gnu --prefix=/usr --with-sysroot=/ --with-build-sysroot=/build/distro/work/x86_64/rootfs/x86_64-linux-gnu --disable-nls --disable-bootstrap --enable-languages=c,c++ --with-system-zlib --enable-shared --disable-static --with-pkgversion=distro-v2.5-sctpmh --disable-install-libiberty --with-arch=core2 --disable-multilib
Thread model: posix
gcc version 4.9.3 (distro-v2.5-sctpmh) 

2.
$ gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.5/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=i386-redhat-linux
Thread model: posix
gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)

3.
Thread model: posix
gcc version 4.4.1 (Wind River Linux Sourcery G++ 4.4a-450)
Comment 1 Andrew Pinski 2016-07-15 07:26:16 UTC
I don't think this is a valid thing to do with setjmp and longjmp.

Why are you not using makecontext/setcontext/getcontext/swapcontext instead?

Also, why do you think this is a libgcc bug? If you try to unwind the stack using gdb, you will get the same behavior, because the thread is now on the other stack but the stack it came from has since been released.
Comment 2 wgkun 2016-07-19 01:55:58 UTC
(In reply to Andrew Pinski from comment #1)
> I don't think this is a valid thing to do with setjmp and longjmp.
> 
> Why are you not using makecontext/setcontext/getcontext/swapcontext instead?
> 
> Also, why do you think this is a libgcc bug? If you try to unwind the
> stack using gdb, you will get the same behavior, because the thread is now
> on the other stack but the stack it came from has since been released.

Thanks. I switched to makecontext/swapcontext and it works well.
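
As an illustration of that approach (not the code actually used here), a minimal sketch of switching between two mmap'ed stacks with getcontext/makecontext/swapcontext might look like the following. STACK_SIZE, run_context_demo and the trace array are assumptions made up for the example, and error checking on the context calls is omitted for brevity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on some systems */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* illustrative size, not from the report */

static ucontext_t main_ctx, ctx1, ctx2;
static int trace[4];
static int trace_len;

/* Runs on stack 1: record a step, hand over to stack 2, record again. */
static void on_stack1(void)
{
    trace[trace_len++] = 1;
    swapcontext(&ctx1, &ctx2);
    trace[trace_len++] = 3;
    /* Returning from here resumes ctx1.uc_link, i.e. main_ctx. */
}

/* Runs on stack 2: record a step and switch back to stack 1. */
static void on_stack2(void)
{
    trace[trace_len++] = 2;
    swapcontext(&ctx2, &ctx1);
    /* Not resumed again in this sketch. */
}

static void *alloc_stack(void)
{
    void *p = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns the number of recorded steps (3 on success), or -1 on error. */
int run_context_demo(void)
{
    void *s1 = alloc_stack();
    void *s2 = alloc_stack();

    if (s1 == NULL || s2 == NULL)
        return -1;
    trace_len = 0;

    getcontext(&ctx1);
    ctx1.uc_stack.ss_sp = s1;
    ctx1.uc_stack.ss_size = STACK_SIZE;
    ctx1.uc_link = &main_ctx;
    makecontext(&ctx1, on_stack1, 0);

    getcontext(&ctx2);
    ctx2.uc_stack.ss_sp = s2;
    ctx2.uc_stack.ss_size = STACK_SIZE;
    ctx2.uc_link = &main_ctx;
    makecontext(&ctx2, on_stack2, 0);

    /* Original stack -> stack 1 -> stack 2 -> stack 1 -> original stack. */
    swapcontext(&main_ctx, &ctx1);

    /* Back on the original stack, so neither mmap'ed stack is in use. */
    munmap(s1, STACK_SIZE);
    munmap(s2, STACK_SIZE);
    return trace_len;
}
```

Note that this sketch releases the stacks only after control has returned to the original stack. It does not reproduce the scenario from the description, where stack 1 is released while the thread is still running on stack 2.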

But, back to this problem, I still think something is wrong in the toolchain, either gcc or libgcc.
gcc generates the asynchronous unwind tables and libgcc uses them to unwind. As I understand it, when unwinding a thread, the unwinder should not access any context that no longer belongs to that thread.
And why does this only happen on x86_64? Is it related to the x86_64 ABI's specific definition of unwind tables, which differs somewhat from standard DWARF?
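
To make the question concrete, here is a small sketch of how a table-driven walk over the current call stack can be observed through libgcc's _Unwind_Backtrace interface from <unwind.h>. It is only an illustration: as far as I know the cancellation path uses forced unwinding rather than this call, and count_frames, count_frame and the 64-frame cap are assumptions made up for the example. The point is that the walk goes from each frame to its caller using the unwind information, and the unwinder has no way to know whether the memory holding a caller's frame is still mapped.

```c
#include <stdio.h>
#include <unwind.h>

struct walk {
    int frames;
};

/* Called by the unwinder once per frame found via the unwind tables. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct walk *w = arg;
    void *ip = (void *)_Unwind_GetIP(ctx);

    printf("frame %d: ip=%p\n", w->frames, ip);
    w->frames++;
    /* Arbitrary cap so the sketch cannot walk forever. */
    return w->frames < 64 ? _URC_NO_REASON : _URC_END_OF_STACK;
}

/* Returns how many frames the unwinder reported for the current stack. */
int count_frames(void)
{
    struct walk w = { 0 };

    _Unwind_Backtrace(count_frame, &w);
    return w.frames;
}
```

If the chain of frames leads back onto a stack that has been unmapped, as in the stack 1 case from the description, a walk like this would presumably read unmapped memory in the same way.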