Bug 58142

Summary: _pthread_tsd_cleanup called before destructors are called
Product: gcc Reporter: Soonho Kong <soonhok>
Component: libstdc++Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: daniel.kruegler, der-martin-stumpf, drikosev, egallager, iains, webrown.cpp
Priority: P3    
Version: 4.8.1   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78968
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84497
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80816
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52268
Host: Target: darwin
Build: Known to work:
Known to fail: Last reconfirmed:
Attachments: preprocessed input file

Description Soonho Kong 2013-08-12 21:48:10 UTC
Created attachment 30638 [details]
preprocessed input file

It seems when a thread is finished, its thread cleanup routine is called before destructors of TLS(Thread Local Storage) variables are called and it causes (possible) segmentation faults. I provided a simplified small program which reproduces the behavior. Even if it doesn't generate a segmentation fault, running valgrind over it indicates the same problem is going on in run-time.

This problem happens only on OSX. When I tried the same C++ code on Ubuntu12.04 with g++-4.8.1. There was no problem. I also tried with clang++-3.3 on OSX. There was no problem either.


1. The exact version of GCC:

   gcc-4.8.1

2. the system type: 

   OSX 10.8.4, Darwin air 12.4.0 Darwin Kernel Version 12.4.0

3. the options given when GCC was configured/built:

   g++-4.8 -std=c++11 thread.cpp -O thread

4. the complete command line that triggers the bug;

   valgrind thread

5. the compiler output (error messages, warnings, etc.); and
the preprocessed file (*.i*) that triggers the bug, generated by adding -save-temps to the complete compilation command, or, in the case of a bug report for the GNAT front end, a complete set of source files (see below).

Preprocessed file is attached. Here is the original source code (much shorter):
======================================================================
#include <thread>
#include <iostream>
#include <mutex>
#include <vector>

static void foo() {
    static thread_local std::vector<int> v(1024);
    if (v.size() != 1024) {
        std::cerr << "Error\n";
        exit(1);
    }
}

static void tst1() {
    unsigned n = 5;
    for (unsigned i = 0; i < n; i++) {
        std::thread t([](){ foo(); });
        t.join();
    }
}

int main() {
    tst1();
}
======================================================================

The following is the output of valgrind:

...

==18408== Invalid read of size 8
==18408==    at 0x1000021B3: std::_Vector_base<int, std::allocator<int> >::~_Vector_base() (in ./a.out)
==18408==    by 0x100002054: std::vector<int, std::allocator<int> >::~vector() (in ./a.out)
==18408==    by 0xB9F5: (anonymous namespace)::run(void*) (in /usr/local/lib/libstdc++.6.dylib)
==18408==    by 0x29CA01: _pthread_exit (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0x29C7AC: _pthread_start (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0x2891E0: thread_start (in /usr/lib/system/libsystem_c.dylib)
==18408==  Address 0x1000257a8 is 8 bytes inside a block of size 32 free'd
==18408==    at 0x5632: free (in /usr/local/Cellar/valgrind/3.8.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==18408==    by 0x1CAA12: emutls_destroy (in /usr/local/lib/libgcc_s.1.dylib)
==18408==    by 0x101: ???
==18408==    by 0xB0080E9F: ???
==18408==    by 0xB008186F: ???
==18408==    by 0x2A34DF: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0xB008105F: ???

...
Comment 1 Martin Stumpf 2017-12-12 14:41:19 UTC
I can confirm that this in fact causes segfaults, which is the reason why we had to switch to clang++ under MacOS.

Source Code: =========================
#include <thread>
#include <vector>
#include <unordered_map>

class MyClass
{
    public:
        MyClass(){
            printf("%p\n", this);
        }
        ~MyClass(){
            printf("~%p\n", this);
        }

        std::unordered_map<int, int> member;
};

thread_local MyClass threadLocalObject;

int main(){
    std::vector<std::thread> threads;

    for (int i = 0; i < 40; ++i){
        threads.emplace_back([&](){
            printf("%ld\n", threadLocalObject.member.size());
        });
    }

    for (auto &thread : threads)
        thread.join();

    return 0;
}
======================

Compiled with "g++-6 -std=c++11 -O0 -g" on MacOS 10.12.6, g++-6 (Homebrew GCC 6.4.0) 6.4.0.

Segfaults with:
malloc: *** error for object 0xa38333130303962: pointer being freed was not allocated
Comment 2 Ev Drikos 2019-02-01 20:30:28 UTC
Hello,

I cannot reproduce this problem in a newer MacOS (10.12 & 10.13) when a
program has been compiled by the newer GNU GCC versions (ie 7.3 & 8.2).

To my understanding, the problem discussed in PR/58142 has been solved
for the newer Darwin systems by the solution applied to PR/78968.

Is PR/58142 still open just for older OS X systems?

Thanks,
Ev. Drikos
Comment 3 Jonathan Wakely 2019-02-04 09:32:04 UTC
I think it's still open because nobody has looked at it again recently, not because of any conscious decision.

I'm changing the component to libstdc++ as the runtime library is responsible for the cleanup.

Iain, what do you think? If comment 2 is right, then GCC 7.1 and later will use __cxa_thread_atexit from libc and the destruction order will be correct. I don't know when Darwin libc acquired that function, maybe it's in all supported versions of Darwin but we just didn't use it until the PR 78968 change.
Comment 4 Iain Sandoe 2019-02-04 09:50:44 UTC
(In reply to Jonathan Wakely from comment #3)
> I think it's still open because nobody has looked at it again recently, not
> because of any conscious decision.
> 
> I'm changing the component to libstdc++ as the runtime library is
> responsible for the cleanup.
> 
> Iain, what do you think? If comment 2 is right, then GCC 7.1 and later will
> use __cxa_thread_atexit from libc and the destruction order will be correct.
> I don't know when Darwin libc acquired that function, maybe it's in all
> supported versions of Darwin but we just didn't use it until the PR 78968
> change.

A quick look says that __cxa_thread_atexit exists in libc from Darwin13, macOS 10.9 / Mavericks onwards.

So it's not present in the system mentioned in the OP and Comment #1.
.. it will not be present in Darwin9 and 10 which I still build and test for, but it's present in all versions "supported" by the vendor.

We have previously worked around such issues by having a version-specific CRT, but not sure if that's applicable in this case.

NOTE 1: Darwin uses GCC's emulatedTLS, and I have some concerns that there might be C++ issues with initialisers (and, thus possibly DTORs) anyway (see 84497).

NOTE 2: (I don't think it's relevant, but just for completeness) Darwin's linker doesn't support ctor/dtor priorities.
Comment 5 Jonathan Wakely 2019-02-04 10:03:11 UTC
Thanks (and thanks Ev. Drikos for the update).

I'm tempted to close this either FIXED or WONTFIX then. I don't think fixing it is really possible for older versions of Darwin so not much point keeping the bug open to pretend we're going to do anything about it.
Comment 6 Iain Sandoe 2019-02-04 11:18:47 UTC
So .. I had a look in the sources for libc.
cxa_thread_atexit is implemented as a wrapper function that calls _tlv_atexit (provided in lib/system/libdyld.dylib, which is part of the umbrella "libSystem" which includes libc).  dyld is Darwin's dynamic linker, BTW.

That function exists on Darwin11 and 12 so (in principle) we could provide the missing wrapper as a CRT for Darwin11 and 12.

However, I don't think that going to happen any time soon - someone wishing to make it work locally could effectively do the same thing.

my_cxa_thread_at_exit (void (*fun)(void*), void *a)
{
  _tlv_atexit (fun, a);
}


So, I'd say [at least] we could demote this to P4 or lower, I'd not object to closing it as "WONTFIX" either

Apparently it's "just working" with emuTLS - but only when the OS has the callback on thread exit.  There _might_ be some other thread exit callback that could be used - I'd need to see what dyld is actually implementing for tlv_atexit.
Comment 7 Iain Sandoe 2019-02-04 11:24:14 UTC
(In reply to Iain Sandoe from comment #6)
> I'd need to see what dyld is actually implementing for
> tlv_atexit.

Seems to be layered on top of pthread_get/set_specific - so, in principle, we could do something similar for earlier systems.

However, not high on (my) priority list at present.
Comment 8 Ev Drikos 2019-02-05 20:44:07 UTC
(In reply to Iain Sandoe from comment #4)
> ...
> 
> A quick look says that __cxa_thread_atexit exists in libc from Darwin13,
> macOS 10.9 / Mavericks onwards.
> ...

This is a very thorough analysis that essentially answers my question. Thanks

BTW, perhaps this problem wasn't OS X specific only but was perhaps affecting MinGW users as well. The PR/80816 has been reported also with similar symptoms to PR/58142 as explained at:
  
https://sourceforge.net/p/mingw-w64/bugs/727/

Ev. Drikos
Comment 9 Jonathan Wakely 2019-02-05 20:49:13 UTC
Yes, it's almost certainly not specific to OS X, but will affect other targets without __cxa_thread_atexit or __cxa_thread_atexit_impl in libc (possibly only ones with emulated TLS, I'm not sure).
Comment 10 Ev Drikos 2019-02-06 08:40:05 UTC
(In reply to Jonathan Wakely from comment #9)
> ...

I'm not sure either. But I've confirmed in macOS (10.13) the following simple hack.

If the routine that initialises the Emulated TLS created two keys and then deleted
the first, immediately after, the former key would be reassigned to the one used in
function "__cxa_thread_atexit". The destructors would run in the desired order.

Ideally, this routine would like to know, if the following two symbols are defined:
 _GLIBCXX_HAVE___CXA_THREAD_ATEXIT,     and 
 _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL 

To avoid the hack then, one would need to know which "__cxa_thread_atexit" is used. 

Ev. Drikos
Comment 11 Eric Gallager 2019-03-01 13:36:04 UTC
(In reply to Iain Sandoe from comment #4)
> (In reply to Jonathan Wakely from comment #3)
> > I think it's still open because nobody has looked at it again recently, not
> > because of any conscious decision.
> > 
> > I'm changing the component to libstdc++ as the runtime library is
> > responsible for the cleanup.
> > 
> > Iain, what do you think? If comment 2 is right, then GCC 7.1 and later will
> > use __cxa_thread_atexit from libc and the destruction order will be correct.
> > I don't know when Darwin libc acquired that function, maybe it's in all
> > supported versions of Darwin but we just didn't use it until the PR 78968
> > change.
> 
> A quick look says that __cxa_thread_atexit exists in libc from Darwin13,
> macOS 10.9 / Mavericks onwards.
> 
> So it's not present in the system mentioned in the OP and Comment #1.
> .. it will not be present in Darwin9 and 10 which I still build and test
> for, but it's present in all versions "supported" by the vendor.
> 
> We have previously worked around such issues by having a version-specific
> CRT, but not sure if that's applicable in this case.
> 
> NOTE 1: Darwin uses GCC's emulatedTLS, and I have some concerns that there
> might be C++ issues with initialisers (and, thus possibly DTORs) anyway (see
> 84497).

and also bug 52268 

> 
> NOTE 2: (I don't think it's relevant, but just for completeness) Darwin's
> linker doesn't support ctor/dtor priorities.