This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork


Ping^2.

On Tue, Oct 28, 2014 at 6:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
> Ping.
>
> On 19 Oct 2014 23:44, "Nathaniel Smith" <njs@pobox.com> wrote:
>>
>> Hi Jakub,
>>
>> Thanks for your feedback! See below.
>>
>> On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
>> >> Got total silence the last 4 times I posted this, and users have been
>> >> bugging me about it offline, so trying again.
>> >>
>> >> This patch fixes a showstopper problem preventing the transparent use
>> >> of OpenMP in scientific libraries, esp. with Python. Specifically, it
>> >> is currently not possible to use GNU OpenMP -- even in a limited,
>> >> temporary manner -- in any program that uses (or might use) fork() for
>> >> parallelism, even if the fork() and the use of OpenMP occur at totally
>> >> different times. This limitation is unique to GNU OpenMP -- every
>> >> competing OpenMP implementation already contains something like this
>> >> patch. While technically not fully POSIX-compliant (because POSIX
>> >> gives much much weaker guarantees around fork() than any real Unix),
>> >> the approach used in this patch (a) performs only POSIX-compliant
>> >> operations when the host program is itself fully POSIX-compliant, and
>> >> (b) actually works perfectly reliably in practice on all commonly used
>> >> platforms I'm aware of.
>> >
>> > 1) gomp_we_are_forked in your patch will attempt to free the pool
>> >    of the thread that encounters it, which is racy; consider a program
>> >    after fork calling pthread_create several times, each thread
>> >    thusly created then ~ at the same time doing #pragma omp parallel
>> >    and the initial thread too.  You really should clean up the pool
>> >    data structure only in the initial thread and nowhere else;
>> >    for native TLS (non-emulated, IE model) the best would be to have a
>> > flag
>> >    in the gomp_thread_pool structure,
>> >    struct gomp_thread *thr = gomp_thread ();
>> >    if (thr && thr->thread_pool)
>> >      thr->thread_pool->after_fork = true;
>> >    should in that case be safe in the atfork child handler.
>> >    For !HAVE_TLS or emulated TLS not sure if it is completely safe,
>> >    it would call pthread_getspecific.  Perhaps just don't register
>> >    atfork handler on those targets at all?
>>
>> Good point. The updated patch below takes a slightly different
>> approach. I moved we_are_forked to the per-thread struct, and then I
>> moved the setting of it into the *parent* process's fork handlers --
>> the before-fork handler toggles it to true, then the child spawns off
>> and inherits this setting, and then the parent after-fork handler
>> toggles it back again. (Since it's per-thread, there's no race
>> condition here.) This lets us remove the child after-fork handler
>> entirely, and -- since the parent handlers aren't subject to any
>> restrictions on what they can call -- it works on all platforms
>> regardless of the TLS implementation.
>>
>> > 2) can you explain why are you removing the cleanups from
>> >    gomp_free_pool_helper ?
>>
>> They aren't removed, but rather moved from the helper function (which
>> runs in the helper threads) into gomp_free_thread_pool (which runs in
>> the main thread) -- which makes it easier to run the appropriate
>> cleanups even in the case where the helper threads aren't running.
>> (But see below -- we might prefer to drop this part of the patch
>> entirely.)
>>
>> > 3) you can call pthread_atfork many times (once for each pthread
>> >    that creates a thread pool), that is undesirable, you want to do that
>> >    only if the initial thread creates thread pool
>>
>> Good point. I've moved the pthread_atfork call to initialize_team,
>> which is an __attribute__((constructor)).
>>
>> I am a little uncertain whether this is the best approach, though,
>> because of the comment in team_destructor about wanting to correctly
>> handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
>> is that there's no way to unregister handlers, so if dlopen/dlclose is
>> important (is it?) then we can't call pthread_atfork from
>> initialize_team.
>>
>> If this is a problem, then we could delay the pthread_atfork until
>> e.g. the first thread pool is spawned -- would this be preferred?
>>
>> > 4) the testcase is clearly not portable enough, should be probably
>> > limited
>> >    to *-*-linux* only, fork etc. will likely not work on many targets.
>>
>> I think it should work on pretty much any target that has fork(); we
>> definitely care about having this functionality on e.g. OS X. I've
>> added some genericish target specifications.
>>
>> > In any case, even with the patch, are you aware that you'll leak
>> > megabytes
>> > of thread stacks etc.?
>>
>> Well, err, I wasn't, no :-). Thanks for pointing that out.
>>
>> To me this does clinch the argument that a better approach would be
>> the one I suggested in
>>    https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html
>> i.e., of tracking whether any threadprivate variables were present,
>> and if not then simply shutting down the thread pools before forking.
>> But this would be a much more invasive change to gomp (I wouldn't know
>> where to start).
>>
>> In the mean time, the current patch is still worthwhile. The cost is
>> not that bad: I wouldn't think of it as "leaking" so much as "overhead
>> of supporting OMP->fork->OMP". No-one forks a child which forks a
>> child which forks a child etc., so the cost is pretty much bounded in
>> practice. The most common use case is probably using fork() to spawn a
>> set of worker processes, which will end up COW-sharing the thread
>> stacks (which will just end up resting peacefully in swap). And when
>> doing computational work where working set sizes are often in the
>> gigabytes, spending a few megabytes is small change -- esp. compared
>> to the current cost, which involves first wasting hours of programmer
>> time figuring out why things are just locking up, and then (in many
>> cases) having to rewrite the code entirely because there's no fix for
>> this, you just have to redesign you parallelization architecture to
>> either avoid OMP to avoid fork().
>>
>> However, the thread stack issue does make me wonder if it's worth
>> spending so much effort on cleaning up a few semaphores and mutexes.
>> So I split the patch into two parts. The first enables the basic
>> functionality and passes the test, but it doesn't even try to clean up
>> the thread pool -- it just forgets that it existed and moves on. The
>> second patch goes on top of the first, and adds the best-effort
>> cleanup of synchronization objects and easily free-able heap. So patch
>> #1 alone will do the job, and patch #2 is optional -- applying means
>> we leak a bit yes, but does increase the chance of portability
>> gremlins cropping up.
>>
>> Changelog for patch #1:
>>
>> 2014-10-19  Nathaniel J. Smith  <njs@pobox.com>
>>
>>         * libgomp.h (struct gomp_thread): New member we_are_forked.
>>         * team.c (gomp_thread_start): Add we_are_forked to gomp_thread
>>         initialization.
>>         (gomp_before_fork_callback)
>>         (gomp_after_fork_parent_callback): New functions.
>>         (initialize_team): Register atfork handlers.
>>         (gomp_team_start): Check for fork on entry, and clear thread state
>>         if found.
>>         * testsuite/libgomp.c/fork-1.c: New test.
>>
>> Changelog for patch #2:
>>
>> 2014-10-19  Nathaniel J. Smith  <njs@pobox.com>
>>
>>         * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
>>         thread.
>>         (gomp_free_thread): Delegate implementation to...
>>         (gomp_free_thread_pool): ...this new function. Like old
>>         gomp_free_thread, but does per-thread cleanup, and has option to
>>         skip everything that involves interacting with actual threads,
>>         which is useful when called after fork.
>>         (gomp_team_start): Call gomp_free_thread_pool to release (some)
>>         resources after fork.
>>
>> -n
>>
>> --
>> Nathaniel J. Smith
>> Postdoctoral researcher - Informatics - University of Edinburgh
>> http://vorpus.org



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]