This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

Hi Jakub,

Thanks for your feedback! See below.

On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek <> wrote:
> On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
>> Got total silence the last 4 times I posted this, and users have been
>> bugging me about it offline, so trying again.
>> This patch fixes a showstopper problem preventing the transparent use
>> of OpenMP in scientific libraries, esp. with Python. Specifically, it
>> is currently not possible to use GNU OpenMP -- even in a limited,
>> temporary manner -- in any program that uses (or might use) fork() for
>> parallelism, even if the fork() and the use of OpenMP occur at totally
>> different times. This limitation is unique to GNU OpenMP -- every
>> competing OpenMP implementation already contains something like this
>> patch. While technically not fully POSIX-compliant (because POSIX
>> gives much much weaker guarantees around fork() than any real Unix),
>> the approach used in this patch (a) performs only POSIX-compliant
>> operations when the host program is itself fully POSIX-compliant, and
>> (b) actually works perfectly reliably in practice on all commonly used
>> platforms I'm aware of.
> 1) gomp_we_are_forked in your patch will attempt to free the pool
>    of the thread that encounters it, which is racy; consider a program
>    after fork calling pthread_create several times, each thread
>    thusly created then ~ at the same time doing #pragma omp parallel
>    and the initial thread too.  You really should clean up the pool
>    data structure only in the initial thread and nowhere else;
>    for native TLS (non-emulated, IE model) the best would be to have a flag
>    in the gomp_thread_pool structure,
>    struct gomp_thread *thr = gomp_thread ();
>    if (thr && thr->thread_pool)
>      thr->thread_pool->after_fork = true;
>    should in that case be safe in the atfork child handler.
>    For !HAVE_TLS or emulated TLS not sure if it is completely safe,
>    it would call pthread_getspecific.  Perhaps just don't register
>    atfork handler on those targets at all?

Good point. The updated patch below takes a slightly different
approach. I moved we_are_forked to the per-thread struct, and then I
moved the setting of it into the *parent* process's fork handlers --
the before-fork handler toggles it to true, then the child spawns off
and inherits this setting, and then the parent after-fork handler
toggles it back again. (Since it's per-thread, there's no race
condition here.) This lets us remove the child after-fork handler
entirely, and -- since the parent handlers aren't subject to any
restrictions on what they can call -- it works on all platforms
regardless of the TLS implementation.

> 2) can you explain why are you removing the cleanups from
>    gomp_free_pool_helper ?

They aren't removed, but rather moved from the helper function (which
runs in the helper threads) into gomp_free_thread_pool (which runs in
the main thread) -- which makes it easier to run the appropriate
cleanups even in the case where the helper threads aren't running.
(But see below -- we might prefer to drop this part of the patch

> 3) you can call pthread_atfork many times (once for each pthread
>    that creates a thread pool), that is undesirable, you want to do that
>    only if the initial thread creates thread pool

Good point. I've moved the pthread_atfork call to initialize_team,
which is an __attribute__((constructor)).

I am a little uncertain whether this is the best approach, though,
because of the comment in team_destructor about wanting to correctly
handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
is that there's no way to unregister handlers, so if dlopen/dlclose is
important (is it?) then we can't call pthread_atfork from

If this is a problem, then we could delay the pthread_atfork until
e.g. the first thread pool is spawned -- would this be preferred?

> 4) the testcase is clearly not portable enough, should be probably limited
>    to *-*-linux* only, fork etc. will likely not work on many targets.

I think it should work on pretty much any target that has fork(); we
definitely care about having this functionality on e.g. OS X. I've
added some genericish target specifications.

> In any case, even with the patch, are you aware that you'll leak megabytes
> of thread stacks etc.?

Well, err, I wasn't, no :-). Thanks for pointing that out.

To me this does clinch the argument that a better approach would be
the one I suggested in
i.e., of tracking whether any threadprivate variables were present,
and if not then simply shutting down the thread pools before forking.
But this would be a much more invasive change to gomp (I wouldn't know
where to start).

In the mean time, the current patch is still worthwhile. The cost is
not that bad: I wouldn't think of it as "leaking" so much as "overhead
of supporting OMP->fork->OMP". No-one forks a child which forks a
child which forks a child etc., so the cost is pretty much bounded in
practice. The most common use case is probably using fork() to spawn a
set of worker processes, which will end up COW-sharing the thread
stacks (which will just end up resting peacefully in swap). And when
doing computational work where working set sizes are often in the
gigabytes, spending a few megabytes is small change -- esp. compared
to the current cost, which involves first wasting hours of programmer
time figuring out why things are just locking up, and then (in many
cases) having to rewrite the code entirely because there's no fix for
this, you just have to redesign you parallelization architecture to
either avoid OMP to avoid fork().

However, the thread stack issue does make me wonder if it's worth
spending so much effort on cleaning up a few semaphores and mutexes.
So I split the patch into two parts. The first enables the basic
functionality and passes the test, but it doesn't even try to clean up
the thread pool -- it just forgets that it existed and moves on. The
second patch goes on top of the first, and adds the best-effort
cleanup of synchronization objects and easily free-able heap. So patch
#1 alone will do the job, and patch #2 is optional -- applying means
we leak a bit yes, but does increase the chance of portability
gremlins cropping up.

Changelog for patch #1:

2014-10-19  Nathaniel J. Smith  <>

        * libgomp.h (struct gomp_thread): New member we_are_forked.
        * team.c (gomp_thread_start): Add we_are_forked to gomp_thread
        (gomp_after_fork_parent_callback): New functions.
        (initialize_team): Register atfork handlers.
        (gomp_team_start): Check for fork on entry, and clear thread state
        if found.
        * testsuite/libgomp.c/fork-1.c: New test.

Changelog for patch #2:

2014-10-19  Nathaniel J. Smith  <>

        * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
        (gomp_free_thread): Delegate implementation to...
        (gomp_free_thread_pool): ...this new function. Like old
        gomp_free_thread, but does per-thread cleanup, and has option to
        skip everything that involves interacting with actual threads,
        which is useful when called after fork.
        (gomp_team_start): Call gomp_free_thread_pool to release (some)
        resources after fork.


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh

Attachment: gomp-safe-fork-patch-v3-part1.diff
Description: Text document

Attachment: gomp-safe-fork-patch-v3-part2.diff
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]