Bug 54129 - emulated __thread variables and pthread_*specific data
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2012-07-30 19:29 UTC by blucia
Modified: 2012-07-30 21:00 UTC

See Also:
Known to work:
Known to fail:
Last reconfirmed:

A test program that shows the behavior. (333 bytes, application/octet-stream)
2012-07-30 19:29 UTC, blucia

Description blucia 2012-07-30 19:29:43 UTC
Created attachment 27899
A test program that shows the behavior.

I have written a short program that works on Linux but does not work on Mac OS X.  The program uses __thread variables, which I understand are emulated on Darwin.  It also uses pthread_key_create(..., thread_destructor) to register the function thread_destructor() to run when threads end.  The test program is attached.

thread_destructor() accesses the __thread variables.  It looks like the __thread variables are being zeroed before thread_destructor() runs.  Some of those __thread variables are pointers, leading to segfaults when thread_destructor() dereferences them.

This may not be a bug.  I have not read the relevant parts of the language specification.  It may be undefined to mix pthread_setspecific and __thread.  I may be using some library functions wrong. However, it is definitely the case that the behavior on Darwin is different from the behavior on Linux.

Note that this simple program is not the only place I've seen this.  I am building a debugging/monitoring runtime system that makes heavy use of these facilities and it first experienced the problem.  The attached program is (pretty much) minimal to manifest the problem.
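
For readers without the attachment, a program of the kind described might look roughly like the sketch below. It is not the attached TLSBug.c: the names foo_was_set, destructor_calls, thread_main and run_tls_demo are made up for illustration, and the NULL guard in the destructor is an addition so that the sketch records what it sees instead of crashing.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static __thread int *foo;     /* thread-local; emulated on some targets */
static pthread_key_t key;
static int destructor_calls;  /* how many times thread_destructor ran */
static int foo_was_set;       /* 1 if foo was non-NULL inside the destructor */

/* Runs at thread exit because a non-NULL value was stored under key. */
static void thread_destructor(void *value)
{
    ++destructor_calls;
    printf("Thread Destructor Called\n");
    printf("Thread says foo == %p\n", (void *)foo);
    foo_was_set = (foo != NULL);
    if (foo != NULL)          /* guard added for this sketch only */
        printf("Thread says *foo == %d\n", *foo);
    free(value);              /* value is the pointer given to pthread_setspecific */
}

static void *thread_main(void *arg)
{
    (void)arg;
    foo = malloc(sizeof *foo);
    if (foo == NULL)
        abort();
    *foo = 17;
    /* The destructor is only called for keys holding a non-NULL value. */
    if (pthread_setspecific(key, foo) != 0)
        abort();
    printf("Thread says foo == %p\n", (void *)foo);
    printf("Thread says *foo == %d\n", *foo);
    return NULL;
}

/* Hypothetical driver: returns 1 if the destructor still saw foo, else 0. */
int run_tls_demo(void)
{
    pthread_t t;

    destructor_calls = 0;
    foo_was_set = 0;
    if (pthread_key_create(&key, thread_destructor) != 0)
        abort();
    if (pthread_create(&t, NULL, thread_main, NULL) != 0)
        abort();
    if (pthread_join(t, NULL) != 0)
        abort();
    pthread_key_delete(key);
    return foo_was_set;
}
```

Calling run_tls_demo() from main() and printing its result shows whether foo survived until the destructor on a given platform.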

Program compiled with:
gcc -std=c99 ./TLSBug.c -o TLSBug

Output of test program on Mac OS X 10.6:
redoldfence:MacTLSBugTest blucia0a$ ./TLSBug 
Thread says foo == 0x100200130
Thread says *foo == 17
Thread Destructor Called
Thread says foo == 0x0
Segmentation fault

Output of test program on Linux:
blucia@pango:~$ ./TLSBug 
Thread says foo == 0x2564160
Thread says *foo == 17
Thread Destructor Called
Thread says foo == 0x2564160
Thread says *foo == 17
Comment 1 Andrew Pinski 2012-07-30 19:56:19 UTC
I don't think this is even defined behavior.  

Is the order in which pthread_*specific data is destroyed defined?  I think this is the biggest issue.
Comment 2 blucia 2012-07-30 20:25:52 UTC
The man page for pthread_key_create says:

"An optional destructor function may be associated with each key value.  ...  The order of destructor calls is unspecified if more than one destructor exists for a thread when it exits."

That's fine, but I did not register any destructor function for the __thread variables that are getting zeroed!  In my program, only one destructor function exists.  

It seems a little weird that those __thread variables are being zeroed at all.  Again, I'm not sure what the spec says, but it seems like a better default behavior would be to let them retain their values until termination, unless explicitly altered elsewhere.  Doubly so, because it already does that on Linux, and copying that behavior makes code more portable.
Comment 3 Andrew Pinski 2012-07-30 20:29:05 UTC
> In my program, only one destructor function exists.  

Yes, your source only has one, but there are really two in the program: one for the __thread implementation and the one in your source.

So I think we might declare this unspecified behavior, in the same way that the order of destructor calls is unspecified.
Comment 4 blucia 2012-07-30 20:40:31 UTC
I don't really see your point.  Where is the code in the destructor for the __thread variables?  For the pthread_key_create vars, I wrote down what I want to do to the data, and the destructor does it (in thread_destructor).   

I could write a destructor that doesn't do anything, and just keeps getting called on my non-NULL thread specific data.   The man page suggests I can handle the arbitrary order problem by cycling through my destructors up to PTHREAD_DESTRUCTOR_ITERATIONS times without NULLing any of my data.

From the man page:
"If, after all the destructors have been called for all non-NULL values with associated destructors, there are still some non-NULL values with associated destructors, then the process is repeated.  If, after at least [PTHREAD_DESTRUCTOR_ITERATIONS] iterations of destructor calls for outstanding non-NULL values, there are still some non-NULL values with associated destructors, the implementation stops calling destructors."

The problem is that the current behavior always NULLs out the __thread data on the first whack.  I want to be able to let the __thread destructor run (in any order) and NOT zero the data on its first time through (regardless of the order).  Then, eventually, my pthread_key_create-registered destructor runs, and accesses the non-zero data.  Later, like the manpage says, my destructors get called again, and eventually the __thread destructor NULLs out the __thread data (perhaps based on some global state that indicates things are safe to delete).
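
The repeated-rounds mechanism quoted from the man page can be sketched as follows. POSIX sets a key's value to NULL before calling its destructor, so a destructor that wants another round has to store the value again with pthread_setspecific. The names deferring_destructor, rounds, worker and run_deferred_demo are hypothetical, and deferring only on the first round is an arbitrary choice for the sketch. It illustrates the iteration mechanism only; it does not bring back __thread data that another destructor has already freed.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key;
static int rounds;  /* how many times the destructor ran for the one thread */

/* Defers cleanup once by re-arming the key, then frees on the next round. */
static void deferring_destructor(void *value)
{
    ++rounds;
    if (rounds < 2) {
        /* The key was reset to NULL before this call; store the value
           again so the destructor is invoked in a later round. */
        if (pthread_setspecific(key, value) != 0)
            abort();
        return;
    }
    free(value);
}

static void *worker(void *arg)
{
    int *data;

    (void)arg;
    data = malloc(sizeof *data);
    if (data == NULL)
        abort();
    *data = 17;
    if (pthread_setspecific(key, data) != 0)
        abort();
    return NULL;
}

/* Hypothetical driver: returns the number of destructor rounds observed. */
int run_deferred_demo(void)
{
    pthread_t t;

    rounds = 0;
    if (pthread_key_create(&key, deferring_destructor) != 0)
        abort();
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        abort();
    if (pthread_join(t, NULL) != 0)
        abort();
    pthread_key_delete(key);
    return rounds;
}
```

The number of deferrals must stay below PTHREAD_DESTRUCTOR_ITERATIONS, otherwise the implementation may stop calling the destructor and the data is leaked.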
Comment 5 Andrew Pinski 2012-07-30 20:50:21 UTC
>Where is the code in the destructor for the __thread variables? 

in libgcc/emutls.c .

The code is:
static void
emutls_destroy (void *ptr)
{
  struct __emutls_array *arr = ptr;
  pointer size = arr->size;
  pointer i;

  for (i = 0; i < size; ++i)
    {
      if (arr->data[i])
        free (arr->data[i][-1]);
    }

  free (ptr);
}

static void
emutls_init (void)
{
  __GTHREAD_MUTEX_INIT_FUNCTION (&emutls_mutex);
  if (__gthread_key_create (&emutls_key, emutls_destroy) != 0)
    abort ();
}

--- CUT ----
So all it does is free the current thread's memory.

__gthread_key_create is a simple wrapper around pthread_key_create.
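
A simplified model of this arrangement is sketched below. It is not the libgcc code: emu_key, emu_get, emu_destroy, user_key and the other names are hypothetical, and the model assumes that emulated storage is allocated lazily and zero-filled on first access. Under that assumption, if the storage destructor happens to run before the application's destructor, the application's destructor gets a fresh zeroed slot rather than the old value, which would be consistent with the foo == 0x0 output in the report.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t emu_key;        /* hypothetical stand-in for emutls_key */
static pthread_key_t user_key;       /* the application's own key */
static int seen_in_destructor = -1;  /* value user_destroy observed */

/* Key destructor for the emulated storage: just frees the slot. */
static void emu_destroy(void *slot)
{
    free(slot);
}

/* Return this thread's slot, allocating a zeroed one on first use. */
static int *emu_get(void)
{
    int *slot = pthread_getspecific(emu_key);

    if (slot == NULL) {
        slot = calloc(1, sizeof *slot);
        if (slot == NULL || pthread_setspecific(emu_key, slot) != 0)
            abort();
    }
    return slot;
}

/* Application destructor that reads the emulated thread-local value. */
static void user_destroy(void *value)
{
    (void)value;
    /* If emu_destroy already ran, emu_key holds NULL again and emu_get()
       returns a new zeroed slot; a later round of emu_destroy frees it. */
    seen_in_destructor = *emu_get();
}

static void *worker(void *arg)
{
    (void)arg;
    *emu_get() = 17;
    /* Any non-NULL value makes user_destroy run at thread exit. */
    if (pthread_setspecific(user_key, &user_key) != 0)
        abort();
    return NULL;
}

/* Hypothetical driver: returns 17 or 0 depending on destructor order. */
int run_emulation_model(void)
{
    pthread_t t;

    seen_in_destructor = -1;
    if (pthread_key_create(&emu_key, emu_destroy) != 0)
        abort();
    if (pthread_key_create(&user_key, user_destroy) != 0)
        abort();
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        abort();
    if (pthread_join(t, NULL) != 0)
        abort();
    pthread_key_delete(user_key);
    pthread_key_delete(emu_key);
    return seen_in_destructor;
}
```

Which of the two values comes back depends on the order in which the pthread implementation runs the two destructors, and that order is unspecified.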
Comment 6 blucia 2012-07-30 21:00:56 UTC
Thanks for pointing out where that code is. 

I still think this is weird (i.e., possibly a bug) for two reasons:
1) It differs from Linux behavior.  I'm sure lots of things differ though, so I understand pushing it off.
2) It is inflexible in how __thread vars are cleaned up.  Is it possible to virtualize the emutls cleanup function?  I understand that might be crazy and complex, so I understand pushing that off too.

Thanks again for discussing this.  I suspect you'll close it as not-a-bug, but it is disappointing that this portability problem exists.