The following piece of code around emutls.c:138 uses double-checked locking
to initialize a TLS offset to the next available value.

  pointer offset = obj->loc.offset;

  if (__builtin_expect (offset == 0, 0))
    {
      static __gthread_once_t once = __GTHREAD_ONCE_INIT;
      __gthread_once (&once, emutls_init);
      __gthread_mutex_lock (&emutls_mutex);
      offset = obj->loc.offset;
      if (offset == 0)
        {
          offset = ++emutls_size;
          obj->loc.offset = offset;
        }
      __gthread_mutex_unlock (&emutls_mutex);
    }

  struct __emutls_array *arr = __gthread_getspecific (emutls_key);

This code needs more barriers to be correct. For example, it is entirely
possible for the emutls_key value used in the last line of the snippet to be
prefetched before obj->loc.offset is loaded, and, consequently, before
emutls_init is called. In short, there needs to be a happens-before
relationship between the obj->loc.offset assignment and the unprotected read
on the first line. This looks unlikely on x86, but it may be a much bigger
deal on ARM.

This was detected with ThreadSanitizer
(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) on an Android
device.

==822== WARNING: Possible data race during read of size 4 at 0x44B74: {{{
==822==    T14 (L{}):
==822==     #0  0xB7E0: __emutls_get_address gcc/emutls.c:138
==822==     #1  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #2  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==   Concurrent write(s) happened at (OR AFTER) these points:
==822==    T12 (L{L312}):
==822==     #0  0xB80C: __emutls_get_address gcc/emutls.c:145
==822==     #1  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #2  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==   Address 0x44B74 is 8 bytes inside data symbol "__emutls_v._ZN27NegativeTests_PerThreadTestL17per_thread_globalE"
==822==   Locks involved in this report (reporting last lock sites): {L312}
==822==    L312 (0x44C18)
==822==     #0  0x80108084: pthread_mutex_lock tsan/ts_valgrind_intercepts.c:934
==822==     #1  0xB808: __emutls_get_address gcc/gthr-posix.h:768
==822==     #2  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #3  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==   Race verifier data: 0xB7E0,0xB80C
==822== }}}
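To make the problem easier to see outside of libgcc, here is a distilled,
compilable model of the pattern using pthreads directly (all names here are
hypothetical stand-ins, not the real emutls.c symbols): a thread that takes
the fast path performs no synchronizing operation at all before reading the
key, so nothing orders that read after the initializer's write.

  /* Distilled model of the racy pattern in __emutls_get_address.  */
  #include <pthread.h>
  #include <stddef.h>

  static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_once_t init_once = PTHREAD_ONCE_INIT;
  static pthread_key_t key;        /* stands in for emutls_key */
  static size_t obj_offset;        /* stands in for obj->loc.offset */
  static size_t size_counter;      /* stands in for emutls_size */

  static void
  do_init (void)                   /* stands in for emutls_init */
  {
    pthread_key_create (&key, NULL);
  }

  void *
  get_address (void)
  {
    size_t offset = obj_offset;    /* plain load: the racing read */
    if (__builtin_expect (offset == 0, 0))
      {
        pthread_once (&init_once, do_init);
        pthread_mutex_lock (&init_mutex);
        offset = obj_offset;
        if (offset == 0)
          obj_offset = offset = ++size_counter;  /* the racing write */
        pthread_mutex_unlock (&init_mutex);
      }
    /* On the fast path nothing above synchronizes with do_init, so this
       read of "key" may observe a stale, uninitialized value.  */
    return pthread_getspecific (key);
  }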
> This looks unlikely on x86, but it may be a much bigger deal on ARM.

This code should not be used on GNU/Linux on most targets anyway; ARM Linux
supports TLS natively. I am not saying this is not a bug, but I think you
should figure out why native TLS is not being used for ARM in your code.
Are you sure this isn't a false positive?

The way I read this code, it is certainly possible for the optimizer (or the
processor) to prefetch emutls_key before the load of offset:

  __gthread_key_t prefetch = emutls_key;

  pointer offset = obj->loc.offset;
  if (__builtin_expect (offset == 0, 0))
    {
      ...
    }

  struct __emutls_array *arr = __gthread_getspecific (prefetch);

But the compiler should see the memory barriers within the if path and
insert

  if (__builtin_expect (offset == 0, 0))
    {
      ...
      __gthread_mutex_unlock (&emutls_mutex);
      prefetch = emutls_key;
    }

and the processor had better cancel any speculative prefetch when it sees
the explicit barriers.

I'm also assuming here that Android is using the same gthr-posix.h that is
used by desktop glibc Linux, and so there isn't a mistake in the gthread
macros causing a lack of barriers...
Never mind. Talking this over with Torvald, I see the error. You are correct.
Created attachment 28798 [details]
proposed patch

This should fix the race that I eventually saw (read), as well as one that
Torvald pointed out that I didn't see (write; acq_rel to unrelated address?).
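For readers without the attachment handy, the general shape of such a fix is
to make the fast-path read an acquire load and the publishing store a release
store. A sketch only, assuming the surrounding emutls.c context ("pointer",
"obj", the gthread wrappers); the attached patch may differ in detail:

  pointer offset = __atomic_load_n (&obj->loc.offset, __ATOMIC_ACQUIRE);

  if (__builtin_expect (offset == 0, 0))
    {
      static __gthread_once_t once = __GTHREAD_ONCE_INIT;
      __gthread_once (&once, emutls_init);
      __gthread_mutex_lock (&emutls_mutex);
      offset = obj->loc.offset;
      if (offset == 0)
        {
          offset = ++emutls_size;
          /* Release store: pairs with the acquire load above, so any
             thread whose fast-path load sees a nonzero offset also sees
             everything written by emutls_init and by this critical
             section.  */
          __atomic_store_n (&obj->loc.offset, offset, __ATOMIC_RELEASE);
        }
      __gthread_mutex_unlock (&emutls_mutex);
    }

  struct __emutls_array *arr = __gthread_getspecific (emutls_key);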
Looks good to me
There seems to be a similar bug in code generated for function static
variables. The fast-path load is a plain load rather than an atomic acquire
load.

On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>>> That's what llvm does as well. But it inserts a fast path before
>>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>>> same?
>>
>> Yes, except it isn't __atomic_load_*, but plain memory read.
>>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>>   if (_3 == 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 8>;
>>
>> <bb 8>:
>>   fast path, whatever;
>>
>> <bb 3>:
>>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>>   ...
>>
>> So, right now tsan would just instrument it as __tsan_read1 from
>> &_ZGVZ3foovE1a rather than any atomic load.
>
> Looks like a bug. That needs to be load-acquire with proper compiler
> and hardware memory ordering.
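The same acquire/release pairing applies to the guard fast path. A sketch in
plain C of what a correctly synchronized lowering would have to look like
(hypothetical model only; the compiler actually emits GIMPLE and calls
__cxa_guard_acquire/__cxa_guard_release on the slow path):

  #include <pthread.h>

  static char guard;               /* stands in for _ZGVZ3foovE1a */
  static int value;                /* the function-local static */
  static pthread_mutex_t guard_mutex = PTHREAD_MUTEX_INITIALIZER;

  int
  get_value (void)
  {
    /* Fast path: must be an acquire load, so a thread that sees
       guard != 0 also sees the fully initialized value.  */
    if (__atomic_load_n (&guard, __ATOMIC_ACQUIRE) == 0)
      {
        pthread_mutex_lock (&guard_mutex);   /* models __cxa_guard_acquire */
        if (guard == 0)                      /* re-check under the lock */
          {
            value = 42;                      /* run the initializer */
            /* Release store: publishes the initialization before the
               guard flag becomes visible to fast-path readers.  */
            __atomic_store_n (&guard, 1, __ATOMIC_RELEASE);
          }
        pthread_mutex_unlock (&guard_mutex); /* models __cxa_guard_release */
      }
    return value;
  }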
(In reply to comment #6)
> There seems to be a similar bug in code generated for function static
> variables. The fast-path load is a plain load rather than an atomic
> acquire load.

Haven't looked at the details, but this indeed looks similar.
Author: rth
Date: Wed Nov 28 21:01:26 2012
New Revision: 193907

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193907
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Avoid race condition between
        obj->loc.offset read and emutls_key initialization.

Modified:
    trunk/libgcc/ChangeLog
    trunk/libgcc/emutls.c
Author: rth
Date: Thu Nov 29 21:06:02 2012
New Revision: 193958

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193958
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Avoid race condition between
        obj->loc.offset read and emutls_key initialization.

Modified:
    branches/gcc-4_7-branch/libgcc/ChangeLog
    branches/gcc-4_7-branch/libgcc/emutls.c
Author: rth
Date: Thu Nov 29 21:11:00 2012
New Revision: 193959

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193959
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Add memory barrier before
        referencing emutls_key.

Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/emutls.c
All open branches fixed.