Bug 60035 - [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
Summary: [PATCH] make it possible to use OMP on both sides of a fork (without violatin...
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: libgomp (show other bugs)
Version: unknown
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-02 22:51 UTC by Nathaniel J. Smith
Modified: 2018-12-13 17:52 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
patch to make openmp -> quiesce -> fork -> openmp work (2.65 KB, patch)
2014-02-02 22:51 UTC, Nathaniel J. Smith
Details | Diff
patch to make openmp -> quiesce -> fork -> openmp work (updated) (2.77 KB, patch)
2014-04-05 13:21 UTC, Nathaniel J. Smith
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nathaniel J. Smith 2014-02-02 22:51:28 UTC
Created attachment 32019 [details]
patch to make openmp -> quiesce -> fork -> openmp work

This is a re-open of #52303 and #58378, with more arguments, and a proposed patch that fixes the problem without violating the openmp standard.

Background: Almost all scientific/numerical code delegates linear algebra operations to some optimized BLAS library. Currently, the main contenders for this library are:
1) ATLAS: free software, but uses extensive build-time configuration, which means it must be re-compiled from source by every user to achieve competitive performance.
2) MKL: proprietary, but technically excellent.
3) OpenBLAS: free software, but uses OpenMP for threading, which means that any program which does linear algebra and also expects fork() to work is screwed [1], at least when using GCC.

This means that for projects like numpy, which are used in a very large range of downstream products, we are pretty much screwed too. Many of our users use fork(), for various good reasons that I can elaborate if desired, so we can't just recommend OpenBLAS in general -- ATLAS or MKL are superior for . But recompiling ATLAS is difficult, so we can't recommend that as a general solution, or provide it in pre-compiled downloads. So what we end up doing is shipping slow, unoptimized BLAS, while all the major "scientific python" distros ship MKL; and we also get constantly pressured by users to either ship binaries with MKL or with OpenBLAS built with icc; and we field a new bug report every week or two from people who use OpenBLAS without realizing it and are experiencing mysterious hangs. (Or sometimes other projects get caught in the crossfire, e.g. [2] which is someone trying to figure out why their web-app can't generate plot graphics when using the celery job queue manager.) Meanwhile people are waiting with bated breath for clang to get an openmp implementation so that they can shift their whole stack over there, solely because of this one bug.

Basically the current situation is causing ongoing pain for a wide range of people and makes free software uncompetitive with proprietary software for scientific code using Python in general. But it doesn't have to be this way! In actual practice on real implementations -- regardless of what POSIX says -- it's perfectly safe to use arbitrary POSIX APIs after fork, so long as all threads are in a known, quiescent state when the fork occurs.

The attached patch has essentially no impact on compliant OpenMP-using programs; in particular, and unlike the patch in #58378, it has no affect on the behavior of the parent process, and in the child process it does nothing that violates POSIX unless the user has violated POSIX first. But it makes it safe in practice to use OpenMP encapsulated within a serial library API, without mysterious breakage depending on far away parts of the program, and in particular should fix the OpenBLAS issue.

Test case included in patch is by Olivier Grisel, from #58378. Patch is against current gcc svn trunk (r206297).

[1] https://github.com/xianyi/OpenBLAS/issues/294
[2] https://github.com/celery/celery/issues/1842
Comment 1 Conrad S 2014-02-07 05:08:38 UTC
It would be a good idea to post this to the gcc-patches mailing list.

See http://gcc.gnu.org/lists.html
Comment 2 Nathaniel J. Smith 2014-02-12 22:38:29 UTC
Good point -- sent.

http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
Comment 3 Gregory R. Warnes 2014-02-14 00:03:48 UTC
+1   

I've just spent several days tracking down the cause of the mysterious hangs in processes forked by R (http://www.r-project.org).   A resolution of this issue would be very helpful.
Comment 4 larsmans 2014-04-05 12:47:21 UTC
Nathaniel, could you apply the cosmetic changes suggested at http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00860.html? I'd hate to see this patch go to waste.
Comment 5 Nathaniel J. Smith 2014-04-05 13:21:32 UTC
Created attachment 32548 [details]
patch to make openmp -> quiesce -> fork -> openmp work (updated)

Updated based on feedback from Richard Henderson
Comment 6 Nathaniel J. Smith 2014-04-05 13:23:41 UTC
(In reply to larsmans from comment #4)
> Nathaniel, could you apply the cosmetic changes suggested at
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00860.html? I'd hate to see
> this patch go to waste.

If you look at that thread then you'll see I did resend the patch with those fixes -- I've just attached the updated patch to this bug report as well, thanks for the catch.

My guess is that no-one will pay much attention to this until gcc re-enters phase 1 in any case.
Comment 7 larsmans 2014-04-05 13:29:34 UTC
Phase 1? (Not familiar with the GCC dev cycle.)
Comment 8 Nathaniel J. Smith 2014-04-05 13:43:08 UTC
(In reply to larsmans from comment #7)
> Phase 1? (Not familiar with the GCC dev cycle.)

Sorry, meant "stage 1". GCC trunk is (IIUC) currently in RC-bug-fixes-only pre-release freeze mode.

http://gcc.gnu.org/develop.html
Comment 9 pouar 2017-11-18 01:35:08 UTC
Any news on this? as I would like to use OpenMP with BLAS but I'm currently stuck with pthreads because of this.
Comment 10 Jakub Jelinek 2018-12-12 18:21:35 UTC
GCC 9 has omp_pause_resource and omp_pause_resource_all APIs as required by OpenMP 5.0 for this.
Comment 11 Jeff Hammond 2018-12-13 14:46:26 UTC
Thanks for sharing. I’ve seen that bug or closely related ones before. This
is definitely one of the motivating examples for this feature set.

The only question is how many years before it gets adopted (which requires
everybody to use the latest OpenMP implementations, of course)...

Jeff

On Wed, Dec 12, 2018 at 7:21 PM jakub at gcc dot gnu.org <
gcc-bugzilla@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035
>
> --- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> GCC 9 has omp_pause_resource and omp_pause_resource_all APIs as required by
> OpenMP 5.0 for this.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
Comment 12 Jeff Hammond 2018-12-13 14:49:52 UTC
I apologize for stupidly misinterpreting the automated message as something
else. My email client did not show the true sender address.
Comment 13 Nathaniel J. Smith 2018-12-13 17:52:12 UTC
Unfortunately, AFAICT, the omp_pause_resource APIs don't actually solve the problem.

They're fine and useful if you have a single piece of code that wants to use both omp and fork(). But, this was never a *huge* issue, because if you knew you were using omp then you also knew that fork() wasn't going to work, and could use some workaround.

The really nasty case is when the code using omp and the code using fork() are in entirely different pieces of code, that don't know about each other (e.g., different shared libraries). That's the motivating use case for this patch. I don't see how the omp_pause_resource APIs help with this. The best you could do is to set up a pre-fork hook to call omp_pause_resource_all, but that would be equivalent to my first patch that got rejected for breaking standards-compliant programs.

(In practice, the effect of this issue has been that the whole scientific python ecosystem simply avoids omp where-ever possible. That's why no one's been nagging about this patch. It still seems like a shame to me that all this work goes into the omp runtime and then it gets ruled out for so many users for such a trivial thing, but so it goes I guess.)