This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR


> -----Original Message-----
> From: gcc-owner On Behalf Of Alan Modra
> Sent: 31 August 2004 08:38

> The library asks each of its threads to shut down. The threads respond by
> calling a framework function which notifies the waiting thread and then calls
> pthread_exit(). As soon as all slave threads have notified the master, the
> master unloads their shared library. This can happen between the notification
> and the call to pthread_exit(), leading to a situation where pthread_exit()
> is called on a stack which contains frames from unloaded functions.

  As far as I can see, this is a completely ordinary bog-standard race
condition design fault in the application, which is attempting to do
something completely invalid.  This is a textbook example of how not to do
what they're trying to do.

  There is a time window between the slave thread sending the notification
and it actually exiting and being in a terminated state; therefore it is
utterly incorrect for the master to try to deduce that the thread is in a
terminated state from the fact that the notification has been sent.

  It's hard to say for sure without full design details, but from what's
written above I can't see any protection against the following case either:
the very final thread sends its notification, the scheduler preempts it and
schedules the master, the master receives the notification and unloads the
shared library, and then the slave thread resumes with the same instruction
pointer / context that it had before it was preempted, and tries to execute
in what is now presumably completely invalid and unmapped memory space.
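
  The window described above can be made visible with a minimal sketch. It
assumes POSIX semaphores on Linux (build with -pthread). All names here
(slave, notified, resume, slave_exited, slave_alive_at_notification) are
hypothetical and not taken from the application under discussion, and the
"resume" semaphore is only a stand-in for the scheduler preempting the slave
right after it has sent its notification.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

static sem_t notified;          /* slave -> master: "I am shutting down" */
static sem_t resume;            /* stand-in for the scheduler resuming the slave */
static atomic_int slave_exited; /* set only at the very end of the slave */

static void *slave(void *arg)
{
    (void)arg;
    sem_post(&notified);                 /* the notification is sent here ... */
    while (sem_wait(&resume) != 0) { }   /* ... but the thread is preempted */
    atomic_store(&slave_exited, 1);      /* still running library code */
    return NULL;
}

/* Returns 1 if the slave was still alive when the master saw the notification. */
int slave_alive_at_notification(void)
{
    pthread_t tid;
    int rc, alive;

    atomic_store(&slave_exited, 0);
    rc = sem_init(&notified, 0, 0);
    assert(rc == 0);
    rc = sem_init(&resume, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, slave, NULL);
    assert(rc == 0);

    while (sem_wait(&notified) != 0) { } /* master receives the notification */
    alive = !atomic_load(&slave_exited); /* unloading now would pull the code
                                            out from under a live thread */
    sem_post(&resume);                   /* let the slave continue */
    rc = pthread_join(tid, NULL);        /* only now has it terminated */
    assert(rc == 0);

    sem_destroy(&notified);
    sem_destroy(&resume);
    return alive;
}
```

  Calling slave_alive_at_notification() returns 1: at the moment the master
sees the notification, the slave has not yet terminated.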

> This pattern is sufficiently common that Win32 actually provides an API for
> this case: FreeLibraryAndExitThread. Implementing an equivalent API for NPTL
> would be difficult, given the current implementation of pthread_exit().

  To me this reads as an admission that they know they've got a race
condition.  Their observation that a different OS provides a mechanism that
programmers can use in the design of their applications to avoid a similar
race condition[*] does not in any way mean that their code is valid, merely
that other people have noticed the problem and gone to some trouble to fix
it, whereas they wish merely to neglect it.

  Crossing-your-fingers-and-hoping with race conditions never works.  They
_always_ bite you, and surprisingly often they do so sooner rather than
later.  The only fix for race conditions is to use correct rather than
incorrect application design.

  Why doesn't the application use pthread_join if it wants to know when a
thread has terminated?  That is the *correct* way to detect the condition
that the application is currently falsely deducing from the notification
signal.  The application could even retain its current design and only call
pthread_join on the very final slave thread, and only do so *after* having
received the exit notification anyway, and then nothing would be likely to
go wrong.
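
  A minimal sketch of that ordering follows, again assuming POSIX threads
and semaphores on Linux (build with -pthread). The names (worker, notify,
library_loaded, ran_after_unload, shutdown_with_join) are hypothetical, and
the library_loaded flag is only a stand-in for the point where the real
application would unload its shared library. For simplicity the sketch joins
every worker rather than only the final one.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NUM_WORKERS 4

static sem_t notify;                /* posted once per worker on shutdown */
static atomic_int library_loaded;   /* stand-in for the mapped shared library */
static atomic_int ran_after_unload; /* workers that ran after the "unload" */

static void *worker(void *arg)
{
    (void)arg;
    sem_post(&notify);              /* notify the master */
    /* Race window: the thread is still executing "library" code here. */
    if (!atomic_load(&library_loaded))
        atomic_fetch_add(&ran_after_unload, 1);
    pthread_exit(NULL);
}

/* Returns how many workers were caught running after the "unload". */
int shutdown_with_join(void)
{
    pthread_t tid[NUM_WORKERS];
    int i, rc;

    atomic_store(&library_loaded, 1);
    atomic_store(&ran_after_unload, 0);
    rc = sem_init(&notify, 0, 0);
    assert(rc == 0);

    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }

    /* The notifications say shutdown has begun, not that it has finished. */
    for (i = 0; i < NUM_WORKERS; i++)
        while (sem_wait(&notify) != 0) { }

    /* pthread_join returns only once the thread has really terminated. */
    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }

    atomic_store(&library_loaded, 0); /* the real unload would go here */
    sem_destroy(&notify);
    return atomic_load(&ran_after_unload);
}
```

  Because the flag is cleared only after every pthread_join has returned, no
worker can observe the "unloaded" state, and shutdown_with_join() returns 0.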

    cheers, 
      DaveK

[*] but not the same one: the situation would only be the same, and
FreeLibraryAndExitThread would only be a suitable solution, if the last
_library_ thread were responsible for doing the unload, or if the master
thread were itself the last thread running in the library (which amounts to
more-or-less the same thing, really)
-- 
Can't think of a witty .sigline today....

