Bug 91737 - On Alpine Linux (libmusl) a statically linked C++ program which throws the first exception in two threads at the same time can busy spin on shutdown after main().
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcc
Version: 8.3.0
Importance: P3 normal
Target Milestone: 10.3
Assignee: Not yet assigned to anyone
Reported: 2019-09-11 10:29 UTC by Max Neunhöffer
Modified: 2020-07-23 06:51 UTC
CC List: 1 user

Last reconfirmed: 2019-09-17 00:00:00


Attachments
Program exposing the problem when compiled under Alpine Linux and linked statically. (444 bytes, text/x-csrc)
2019-09-11 10:29 UTC, Max Neunhöffer

Description Max Neunhöffer 2019-09-11 10:29:40 UTC
Created attachment 46870
Program exposing the problem when compiled under Alpine Linux and linked statically.

My statically linked program runs a busy loop after main() has terminated, provided that the very first exception that is thrown in the run is thrown in two threads at the very same time. The attached program shows the problem.
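
The attachment itself is not reproduced on this page. As a rough sketch of the triggering pattern only (an assumption about its shape, not the attached file; the function name `throw_in_two_threads` is made up for illustration), two threads can be held at a spin barrier so that each throws its first exception at about the same time:

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>

// Hypothetical helper: starts two threads that each throw and catch one
// exception as close to simultaneously as a spin barrier allows.
// Returns how many threads caught their own exception.
int throw_in_two_threads() {
    std::atomic<int> ready{0};
    std::atomic<int> caught{0};

    auto worker = [&] {
        ready.fetch_add(1);
        while (ready.load() < 2) {
            // Spin until both threads have arrived.
        }
        try {
            throw std::runtime_error("first exception");
        } catch (const std::runtime_error&) {
            caught.fetch_add(1);
        }
    };

    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return caught.load();
}
```

On a correctly synchronized unwinder this simply returns 2. The hang described here would only show up later, at process exit, and only in the static musl configuration.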

This only happens on Linux, only if the C library is not glibc, and only if the executable is statically linked and does not explicitly use the `pthread_cancel` function.

I tested with g++ 8.3.0, but I think the problem is present in other versions as well.

Here is what is happening: In the file `libgcc/unwind-dw2-fde.c` in the function `_Unwind_Find_FDE` there is a mutex which is only acquired if the underlying program is detected to be "multi-threaded". This test for multi-threadedness is done differently on various platforms (see file `libgcc/gthr-posix.h` lines 156 to 306). On Linux, when the C library is neither glibc nor Android's bionic, a weak reference to the symbol `pthread_cancel` is used. If the underlying program does not explicitly use `pthread_cancel` (and few C++ programs will, because cancelling threads is not in the C++ standard), and the linking is done statically, the program will link `libpthread` but will not contain the symbol `pthread_cancel`. In this case the mutex is not used at all.
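
The detection idiom described above can be sketched roughly as follows. This is a simplified illustration for GCC or Clang on an ELF target, not the libgcc source; the names `weak_pthread_cancel`, `looks_multithreaded` and `OptionalLock` are hypothetical.

```cpp
#include <mutex>
#include <pthread.h>

// Weak alias for pthread_cancel. If nothing else pulls pthread_cancel into
// the link (typical for a static link that never calls it), the address of
// this alias is null.
static __typeof__(pthread_cancel) weak_pthread_cancel
    __attribute__((__weakref__("pthread_cancel")));

// The heuristic under discussion: "threads exist if the symbol exists".
bool looks_multithreaded() {
    static void* const active_ptr =
        reinterpret_cast<void*>(&weak_pthread_cancel);
    return active_ptr != nullptr;
}

// Takes the mutex only when the heuristic says threads are active.
class OptionalLock {
public:
    OptionalLock(std::mutex& m, bool active) : mutex_(m), engaged_(active) {
        if (engaged_) mutex_.lock();
    }
    ~OptionalLock() {
        if (engaged_) mutex_.unlock();
    }
    OptionalLock(const OptionalLock&) = delete;
    OptionalLock& operator=(const OptionalLock&) = delete;

    bool engaged() const { return engaged_; }

private:
    std::mutex& mutex_;
    bool engaged_;
};
```

A caller would write `OptionalLock guard(mutex, looks_multithreaded());`. When the heuristic wrongly returns false in a program that does have threads, the critical section runs with no lock at all.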

If the first exception in the program is now thrown in two threads concurrently, the function `_Unwind_Find_FDE` will move an object from the static list `unseen_objects` to the static list `seen_objects` and a data race occurs. Sometimes, the same object is moved twice from one list to the other. As a result, the `seen_objects` list ends in an object which points to itself (with the `next` pointer).
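
To see how a double move can produce a self-referencing node, here is a simplified model that replays the bad interleaving sequentially. It assumes a sorted insert into a singly linked list; the struct layout and the names `insert_seen` and `double_insert_makes_self_loop` are illustrative and not taken from libgcc.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the unwinder's per-object record.
struct Object {
    std::uintptr_t pc_begin;
    Object* next;
};

// Sorted insert into a singly linked list, descending by pc_begin.
void insert_seen(Object** head, Object* ob) {
    assert(ob != nullptr);
    Object** p = head;
    while (*p != nullptr && (*p)->pc_begin >= ob->pc_begin) {
        p = &(*p)->next;
    }
    ob->next = *p;
    *p = ob;
}

// Both "threads" picked up the same object before either unlinked it from
// the unseen list, so the same object is inserted twice.
bool double_insert_makes_self_loop() {
    Object ob{0x1000, nullptr};
    Object* seen = nullptr;
    insert_seen(&seen, &ob);  // first thread
    insert_seen(&seen, &ob);  // second thread, unaware of the first
    return seen == &ob && ob.next == &ob;
}
```

On the second insert the walk stops at `ob.next`, which is then overwritten with `ob` itself, so the list tail points back at its own node.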

In this case, on shutdown, well after main() and all static destructors, the function `__deregister_frame_info_bases` will busy loop running around the circular data structure `seen_objects`.
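
A small sketch of why such a traversal never ends: a walk that stops only at the sought node or at a null `next` has no exit on a self-loop. The step cap below exists purely so the example terminates; the names are hypothetical and the real function is not structured this way.

```cpp
#include <cstddef>

struct Object {
    Object* next;
};

// Walks the list looking for target, giving up after max_steps nodes.
// Returns the number of nodes visited.
std::size_t walk_until(const Object* head, const Object* target,
                       std::size_t max_steps) {
    std::size_t steps = 0;
    for (const Object* p = head; p != nullptr && steps < max_steps;
         p = p->next) {
        ++steps;
        if (p == target) break;
    }
    return steps;
}
```

On a well-formed list the walk ends after at most as many steps as there are nodes. On a list whose last node points to itself, a search for an object that is not in the list always runs into the cap.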

I think this is over-optimized: the multi-threadedness detection does not work for many statically linked programs when libmusl is the underlying C library.
Comment 1 Andrew Pinski 2019-09-13 23:22:24 UTC
This has been discussed over and over again.  This is an issue in the libc, in your case libmusl, and not in GCC.
Glibc has a similar bug, and how to fix it has been discussed.
The way Glibc is going to fix it (though it has not yet) is that libpthread.a will really just include one object file which includes all of the pthread library.

libmusl should fix it a similar way.
Comment 2 Rich Felker 2019-09-17 14:01:25 UTC
This is absolutely a bug in libgcc, not musl. Weak references are not a valid way to determine if a program is multithreaded. Some distros build all of glibc's libpthread.a as a single object file to *work around* bugs in libgcc and other software, which largely defeats the purpose of static linking and is not an option for musl. If gcc refuses to fix this we can ship patches, but I'd rather get it fixed correctly.
Comment 3 Rich Felker 2019-09-17 14:16:09 UTC
Please reopen. (I thought I could, but apparently I can't...?) RESOLVED/MOVED makes no sense. It should either be opened or CLOSED/WONTFIX if the latter is really going to be gcc's position on this issue.
Comment 4 Rich Felker 2019-09-17 14:27:51 UTC
The corresponding fixes for libgfortran and libstdc++ were made back in 2015. From the converted repo mirror I use, it looks like this was svn revision 222329 but I may be mistaken (really looking forward to official move to git...).

I was aware of this because we used to have the patch in the musl-cross-make patchset and it was removed at some point because it was upstreamed. It looks like it was just overlooked that libgcc[_eh] also had this kind of weak reference use, or maybe the use was introduced later and not noticed.
Comment 5 nsz 2019-09-17 14:30:51 UTC
(In reply to Andrew Pinski from comment #1)
> Glibc has a similar bug, and how to fix it has been discussed.
> The way Glibc is going to fix it (though it has not yet) is that
> libpthread.a will really just include one object file which includes all
> of the pthread library.

citation needed.

the plan in glibc is to provide a "is single threaded" api.
https://sourceware.org/ml/libc-alpha/2019-08/msg00438.html

once that's in then in principle any library (like libstdc++)
can do single thread optimizations without hacks.

(another glibc plan is to move libpthread.so into libc.so
so there are no awkward internal abis between them and then
avoiding pthread dependency is no longer relevant.)

i think that should work for the unwinder in libgcc too.

on the musl side, we want to disable this hack before that
happens, it's better to not do any single thread optimizations
than silently breaking things.

so the right fix is something equivalent to
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222329
i.e. libgcc should be compiled with GTHREAD_USE_WEAK=0 on *musl*.
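
A rough sketch of what switching the weak-reference path off amounts to, assuming that with the switch at 0 the "threads active" check unconditionally reports true. `SKETCH_USE_WEAK` and `sketch_threads_active` are hypothetical stand-ins, not the real macro or function, and the layout is illustrative only.

```cpp
#include <pthread.h>

// Hypothetical stand-in for the GTHREAD_USE_WEAK switch.
#ifndef SKETCH_USE_WEAK
#define SKETCH_USE_WEAK 0
#endif

#if SKETCH_USE_WEAK
// Weak-reference heuristic: active only if pthread_cancel was linked in.
static __typeof__(pthread_cancel) sketch_weak_cancel
    __attribute__((__weakref__("pthread_cancel")));

inline bool sketch_threads_active() {
    static void* const p = reinterpret_cast<void*>(&sketch_weak_cancel);
    return p != nullptr;
}
#else
// Heuristic disabled: always assume threads exist, so locks are always taken.
inline bool sketch_threads_active() {
    return true;
}
#endif
```

The cost is an uncontended lock in programs that really are single threaded. The benefit is that a statically linked multi-threaded program can no longer be misclassified.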
Comment 6 nsz 2019-11-18 12:49:14 UTC
fixed in r278399 for gcc-10
Comment 7 Jakub Jelinek 2020-05-07 11:56:10 UTC
GCC 10.1 has been released.
Comment 8 Richard Biener 2020-07-23 06:51:51 UTC
GCC 10.2 is released, adjusting target milestone.