Bug 48076

Summary: Unsafe double checked locking in __emutls_get_address
Product: gcc Reporter: Evgeniy Stepanov <eugeni.stepanov>
Component: libgccAssignee: Richard Henderson <rth>
Status: RESOLVED FIXED    
Severity: normal CC: dodji, dvyukov, ian, jakub, jyasskin, kcc, rth, torvald
Priority: P3    
Version: 4.4.3   
Target Milestone: 4.6.4   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2012-11-27 00:00:00
Attachments: proposed patch

Description Evgeniy Stepanov 2011-03-11 16:01:21 UTC
The following piece of code around emutls.c:138 uses double checked locking to initialize a tls offset to the next available value.

  pointer offset = obj->loc.offset;

  if (__builtin_expect (offset == 0, 0))
    {
      static __gthread_once_t once = __GTHREAD_ONCE_INIT;
      __gthread_once (&once, emutls_init);
      __gthread_mutex_lock (&emutls_mutex);
      offset = obj->loc.offset;
      if (offset == 0)
        {
          offset = ++emutls_size;
          obj->loc.offset = offset;
        }
      __gthread_mutex_unlock (&emutls_mutex);
    }

  struct __emutls_array *arr = __gthread_getspecific (emutls_key);

This code needs more barriers to be correct. For example, it is entirely possible for emutls_key value used in the last line of the code snippet to be prefetched before obj->loc.offset is loaded, and, consequently, before emutls_init is called. In short, there needs to be a happens-before relationship between obj->loc.offset assignment and the unprotected read at the first line.

This looks unlikely on x86, but it may be a much bigger deal on ARM.

This was detected with ThreadSanitizer (http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) on an Android device.

==822== WARNING: Possible data race during read of size 4 at 0x44B74: {{{
==822==    T14 (L{}):
==822==     #0  0xB7E0: __emutls_get_address gcc/emutls.c:138
==822==     #1  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #2  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==   Concurrent write(s) happened at (OR AFTER) these points:
==822==    T12 (L{L312}):
==822==     #0  0xB80C: __emutls_get_address gcc/emutls.c:145
==822==     #1  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #2  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==   Address 0x44B74 is 8 bytes inside data symbol "__emutls_v._ZN27NegativeTests_PerThreadTestL17per_thread_globalE"
==822==   Locks involved in this report (reporting last lock sites): {L312}
==822==    L312 (0x44C18)
==822==     #0  0x80108084: pthread_mutex_lock tsan/ts_valgrind_intercepts.c:934
==822==     #1  0xB808: __emutls_get_address gcc/gthr-posix.h:768
==822==     #2  0x1BFD5: NegativeTests_PerThreadTest::RealWorker() unittest/racecheck_unittest.cc:5665
==822==     #3  0x80107324: ThreadSanitizerStartThread tsan/ts_valgrind_intercepts.c:679
==822==    Race verifier data: 0xB7E0,0xB80C
==822== }}}
Comment 1 Andrew Pinski 2011-03-11 16:33:49 UTC
>This looks unlikely on x86, but it may be a much bigger deal on ARM.

This code should not be used on GNU/Linux on most targets anyways.  ARM Linux supports TLS natively.

I am not saying this is not a bug but I think you should figure out why native TLS is not be used for arm for your code.
Comment 2 Richard Henderson 2012-11-27 18:01:17 UTC
Are you sure this isn't a false-positive?

The way I read this code, it is certainly possible for the optimizer
(or the processor) to prefetch emutls_key before the load of offset:

  __gthread_key_t prefetch = emutls_key;
  pointer offset = obj->loc.offset;

  if (__builtin_expect (offset == 0, 0))
    {
      ...
    }

  struct __emutls_array *arr = __gthread_getspecific (prefetch);

But the compiler should see the memory barriers within the if path
and insert

  if (__builtin_expect (offset == 0, 0))
    {
      ...
      __gthread_mutex_unlock (&emutls_mutex);
      prefetch = emutls_key;
    }

and the processor had better cancel any speculative prefetch when
it sees the explicit barriers.

I'm also assuming here that Android is using the same gthr-posix.h
that is used by desktop glibc linux, and so there isn't a mistake
in the gthread macros causing a lack of barrier...
Comment 3 Richard Henderson 2012-11-27 18:18:23 UTC
Nevermind.  Talking this over with Torvald I see the error.
You are correct.
Comment 4 Richard Henderson 2012-11-27 22:05:34 UTC
Created attachment 28798 [details]
proposed patch

This should fix the race that I eventually saw (read), as well as one that
Torvald pointed out that I didn't see (write; acq_rel to unrelated address?).
Comment 5 Dmitry Vyukov 2012-11-28 03:51:50 UTC
Looks good to me
Comment 6 Dmitry Vyukov 2012-11-28 03:56:50 UTC
There seems to be a similar bug in code generated for function static variables.
The fast-path load is a plain load rather than atomic acquire load.

On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>>> That's what llvm does as well. But it inserts a fast path before
>>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>>> same?
>>
>> Yes, except it isn't __atomic_load_*, but plain memory read.
>>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>>   if (_3 == 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 8>;
>>
>>   <bb 8>:
>>   fast path, whatever;
>>
>>   <bb 3>:
>>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>>   ...
>>
>> So, right now tsan would just instrument it as __tsan_read1 from
>> &_ZGVZ3foovE1a rather than any atomic load.
>
>
> Looks like a bug. That needs to be load-acquire with proper compiler
> and hardware memory ordering.
Comment 7 torvald 2012-11-28 14:29:47 UTC
(In reply to comment #6)
> There seems to be a similar bug in code generated for function static
> variables.
> The fast-path load is a plain load rather than atomic acquire load.

Haven't looked at the details, but this indeed looks similar.
Comment 8 Richard Henderson 2012-11-28 21:01:29 UTC
Author: rth
Date: Wed Nov 28 21:01:26 2012
New Revision: 193907

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193907
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Avoid race condition between
        obj->loc.offset read and emutls_key initialization.

Modified:
    trunk/libgcc/ChangeLog
    trunk/libgcc/emutls.c
Comment 9 Richard Henderson 2012-11-29 21:06:07 UTC
Author: rth
Date: Thu Nov 29 21:06:02 2012
New Revision: 193958

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193958
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Avoid race condition between
        obj->loc.offset read and emutls_key initialization.

Modified:
    branches/gcc-4_7-branch/libgcc/ChangeLog
    branches/gcc-4_7-branch/libgcc/emutls.c
Comment 10 Richard Henderson 2012-11-29 21:11:05 UTC
Author: rth
Date: Thu Nov 29 21:11:00 2012
New Revision: 193959

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193959
Log:
PR libgcc/48076
        * emutls.c (__emutls_get_address): Add memory barrier before
        referencing emutls_key.

Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/emutls.c
Comment 11 Richard Henderson 2012-11-29 21:14:10 UTC
All open branches fixed.