This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] [libgomp] make it possible to use OMP on both sides of a fork


Problem: A common use case for OMP is to accelerate the internal
workings of an otherwise serial interface. For example, OpenBLAS in
some settings will internally use OMP to accelerate the implementation
of matrix-matrix multiply (DGEMM). When DGEMM is called, an OMP
section is started, the work is done, the OMP section exits, the
program returns to serial mode, and DGEMM returns. All this is
entirely transparent to the user -- in fact, it's common for users to
switch between different linear algebra cores (BLAS libraries) without
recompiling, so it's impossible for code that uses linear algebra to
know which underlying library is in use, or how it has been compiled.

However, in order to support some corners of the OMP spec, it is
important that the threads that were started to implement an OMP
parallel section be kept around, in case another OMP section is
started later. (AFAICT this is only true when "threadprivate"
variables are in use. Unfortunately, AFAICT there is currently no way
to determine whether this is the case -- such variables are handled
directly by GCC without calling into libgomp, so we can't tell at
runtime whether they exist.)

And, this causes a big problem and abstraction leak: it means that if
you use OMP (e.g., by multiplying two matrices), and then fork, and
then the child also uses OMP (e.g., by also multiplying two matrices),
then the child immediately deadlocks (as OMP waits for threads that it
thinks still exist, but that disappeared during the fork). The result
is that it simply *is not possible to know* whether fork() will
actually work as advertised, even when writing purely serial code, if
that code happens to do seemingly innocent things like linear algebra.
And this then ends up causing surprising wreckage in far-flung parts
of the numerical ecosystem (e.g., here's someone trying to figure out
why their web site's task manager crashes whenever they try to plot a
graph: https://github.com/celery/celery/issues/1842).
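
The sequence described above (OMP in the parent, fork, OMP in the
child) can be sketched as a small C helper. This is a hypothetical
illustration, not the test included in the patch: the names
run_parallel_region and child_omp_survives_fork are made up here, and
the alarm() timeout is only there so that a hung child gets killed
instead of blocking the parent forever. It is meant to be built with
-fopenmp; without that flag the pragmas are ignored and nothing can
hang.

```c
/* Hypothetical reproducer sketch; not the test shipped with the patch. */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int thread_visits;

/* Run one trivial OMP parallel region (runs serially without -fopenmp). */
static void run_parallel_region(void)
{
    #pragma omp parallel
    {
        #pragma omp atomic
        thread_visits++;
    }
}

/*
 * Use OMP in the parent, fork, then use OMP again in the child.
 * timeout_seconds must be > 0 (alarm(0) would cancel the alarm).
 * Returns 1 if the child finished its parallel region and exited
 * normally, 0 if it did not (e.g. it hung and was killed by SIGALRM),
 * and -1 if fork() or waitpid() failed.
 */
int child_omp_survives_fork(unsigned timeout_seconds)
{
    run_parallel_region();          /* parent: runtime may cache worker threads */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        alarm(timeout_seconds);     /* default SIGALRM action ends a stuck child */
        run_parallel_region();      /* child: the cached workers no longer exist */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling child_omp_survives_fork(5) from main() and printing the result
is enough to try it. Given the behaviour described above, a return
value of 0 would be the expected symptom on an affected libgomp with
more than one thread, but the result depends on the runtime and the
thread count.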

(Somewhat more impassioned rant and references to previous discussions
here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035)

In practice, GOMP seems to be the only OMP implementation that suffers
from this problem; people who encounter this problem are often advised
to switch to icc.

There does not appear to be any fully POSIX-compliant way to solve
this problem (not least because in a strict reading of the POSIX spec,
you aren't guaranteed to be able to do practically *anything* after a
fork() in any program which has ever called a pthread_* function). In
a less strict reading, we might expect to be okay if no threads are
actually running at the time that fork() is called -- but, we can't
shut down OMP threads before forking, because of the issue with
threadprivate variables -- it might change the behaviour of compliant
programs.

But in practice, if the fork() occurs at a time when every thread is
just sitting waiting on a barrier, then we can be pretty sure that
libc etc. will be in a generally thread-consistent state. And in
practice, the few truly dangerous operations we need to clean up after
the fact -- e.g., destroying that barrier -- do seem to work, at least
on Linux. The attached patch, therefore, takes this strategy.
Crucially, it should have no impact on compliant programs, because it
doesn't actually do anything except set/check a single global variable
until the user actually enters an OMP section in the child, at which
point they have already violated POSIX, so we might as well cross our
fingers and hope for the best. (At the very least, the included test
does fail on Linux x86-64 without the patch, and passes with the
patch.)

Other options that might be worth considering:
-- Adding some way for libgomp to determine whether threadprivate
variables are in use, and then using this information to shut down
threads in a pre-fork handler iff doing so is safe.
-- Instead of trying to clean up the various mutex/barrier/semaphore
detritus left in the child by the evaporating threads, we could simply
leak them. I don't know which is worse in practice: a small leak (once
per child process), or the risk that the various *_destroy functions
will blow up (as POSIX allows them to do).

ChangeLog:

2014-02-12  Nathaniel J. Smith  <njs@pobox.com>

    * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
    thread.
    (gomp_free_thread): Delegate implementation to...
    (gomp_free_thread_pool): ...this new function. Like old
    gomp_free_thread, but does per-thread cleanup, and has option to
    skip everything that involves interacting with actual threads,
    which is useful when called after fork.
    (gomp_after_fork_callback): New function.
    (gomp_team_start): Register atfork handler, and check for fork on
    entry.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Attachment: gomp-safe-fork-patch.diff
Description: Text document

