Created attachment 32019
patch to make openmp -> quiesce -> fork -> openmp work
This is a re-open of #52303 and #58378, with more arguments, and a proposed patch that fixes the problem without violating the openmp standard.
Background: Almost all scientific/numerical code delegates linear algebra operations to some optimized BLAS library. Currently, the main contenders for this library are:
1) ATLAS: free software, but uses extensive build-time configuration, which means it must be re-compiled from source by every user to achieve competitive performance.
2) MKL: proprietary, but technically excellent.
3) OpenBLAS: free software, but uses OpenMP for threading, which means that any program which does linear algebra and also expects fork() to work is screwed, at least when using GCC.
This means that for projects like numpy, which are used in a very large range of downstream products, we are pretty much screwed too. Many of our users use fork(), for various good reasons that I can elaborate if desired, so we can't just recommend OpenBLAS in general -- ATLAS or MKL are superior for . But recompiling ATLAS is difficult, so we can't recommend that as a general solution, or provide it in pre-compiled downloads. So what we end up doing is shipping slow, unoptimized BLAS, while all the major "scientific python" distros ship MKL; and we also get constantly pressured by users to either ship binaries with MKL or with OpenBLAS built with icc; and we field a new bug report every week or two from people who use OpenBLAS without realizing it and are experiencing mysterious hangs. (Or sometimes other projects get caught in the crossfire, e.g.  which is someone trying to figure out why their web-app can't generate plot graphics when using the celery job queue manager.) Meanwhile people are waiting with bated breath for clang to get an openmp implementation so that they can shift their whole stack over there, solely because of this one bug.
Basically the current situation is causing ongoing pain for a wide range of people and makes free software uncompetitive with proprietary software for scientific code using Python in general. But it doesn't have to be this way! In actual practice on real implementations -- regardless of what POSIX says -- it's perfectly safe to use arbitrary POSIX APIs after fork, so long as all threads are in a known, quiescent state when the fork occurs.
The attached patch has essentially no impact on compliant OpenMP-using programs; in particular, and unlike the patch in #58378, it has no effect on the behavior of the parent process, and in the child process it does nothing that violates POSIX unless the user has violated POSIX first. But it makes it safe in practice to use OpenMP encapsulated within a serial library API, without mysterious breakage depending on far away parts of the program, and in particular should fix the OpenBLAS issue.
Test case included in patch is by Olivier Grisel, from #58378. Patch is against current gcc svn trunk (r206297).
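For readers who want to see the "threads -> quiesce -> fork -> threads" pattern in isolation, here is a minimal sketch in Python. It is an analogy only, not the attached patch and not libgomp code: a `ThreadPoolExecutor` stands in for the OpenMP thread pool, and the names `parallel_sum`, `sum_in_forked_child`, `_get_pool` and `_forget_pool_in_child` are hypothetical. The assumption illustrated is that the library only needs to forget its stale pool in the child, leaving the parent untouched, so that the next call starts fresh workers.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Lazily created worker pool; a hypothetical stand-in for a threading
# runtime's internal pool (this is not how libgomp is implemented).
_pool = None


def _get_pool():
    """Create the worker pool on first use and reuse it afterwards."""
    global _pool
    if _pool is None:
        _pool = ThreadPoolExecutor(max_workers=4)
    return _pool


def _forget_pool_in_child():
    """Runs only in the forked child: the parent's worker threads do not
    exist there, so drop the stale handle instead of waiting on it."""
    global _pool
    _pool = None


# Nothing is registered for the parent side, so the parent keeps its pool.
os.register_at_fork(after_in_child=_forget_pool_in_child)


def parallel_sum(values, chunks=4):
    """Serial-looking API that uses worker threads internally."""
    values = list(values)
    step = max(1, -(-len(values) // chunks))  # ceiling division
    parts = [values[i:i + step] for i in range(0, len(values), step)]
    return sum(_get_pool().map(sum, parts))


def sum_in_forked_child(values):
    """Fork, call parallel_sum in the child, and report whether the
    child exited cleanly with the expected result."""
    expected = sum(values)
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            status = 0 if parallel_sum(values) == expected else 1
        finally:
            os._exit(status)  # skip the parent's atexit handlers
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Calling `parallel_sum(...)` in the parent, then `sum_in_forked_child(...)`, then `parallel_sum(...)` again exercises the same sequence as the bug title. Without the `after_in_child` hook, the child would submit work to a pool whose threads no longer exist and could wait forever, which is the kind of hang described above. Recent Python versions may emit a DeprecationWarning when forking while worker threads are alive.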
It would be a good idea to post this to the gcc-patches mailing list.
Good point -- sent.
I've just spent several days tracking down the cause of the mysterious hangs in processes forked by R (http://www.r-project.org). A resolution of this issue would be very helpful.
Nathaniel, could you apply the cosmetic changes suggested at http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00860.html? I'd hate to see this patch go to waste.
Created attachment 32548
patch to make openmp -> quiesce -> fork -> openmp work (updated)
Updated based on feedback from Richard Henderson
(In reply to larsmans from comment #4)
> Nathaniel, could you apply the cosmetic changes suggested at
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00860.html? I'd hate to see
> this patch go to waste.
If you look at that thread then you'll see I did resend the patch with those fixes -- I've just attached the updated patch to this bug report as well, thanks for the catch.
My guess is that no-one will pay much attention to this until gcc re-enters phase 1 in any case.
Phase 1? (Not familiar with the GCC dev cycle.)
(In reply to larsmans from comment #7)
> Phase 1? (Not familiar with the GCC dev cycle.)
Sorry, meant "stage 1". GCC trunk is (IIUC) currently in RC-bug-fixes-only pre-release freeze mode.
Any news on this? I would like to use OpenMP with BLAS, but I'm currently stuck with pthreads because of this.