This is the mail archive of the
mailing list for the GNU Fortran project.
Re: forall vs OpenMP
Anton Shterenlikht wrote:
On Sun, Apr 08, 2012 at 08:10:39PM +0200, Thomas Koenig wrote:
This is a very strong statement.
I tried to find any generic
guidelines regarding when
forall is preferable to OpenMP
parallelisation for simple
loops, but couldn't find any.
For gfortran, the general guideline is to avoid forall. It does not
parallelize on its own. It is likely not to be any better than the
equivalent DO loop, and sometimes it is much worse.
Why is this? Is it a "design feature"
of the compiler or simply because
forall is poorly implemented in
gfortran right now?
A bit of all.
First - and most importantly: The FORALL construct is similar to an
assignment - and in assignments, the right-hand side has to be evaluated
before it is assigned to the left-hand side.
For example, if you have:
integer, pointer :: a(:), b(:)
A = B ! Or something similar with FORALL
This can be converted into a DO loop - however, only if one knows that
the RHS and the LHS are different variables. If you had:
A => B(10:1:-1)
a loop of the kind
do i = 1, 10
  A(i) = B(i)
end do
will give the wrong result. Thus, the compiler has to create a temporary
variable - which is slow.
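
The aliasing problem above can be reproduced with a small, self-contained program. This is only a sketch: the program name and the `storage` array are made up for illustration and are not part of the original example.

```fortran
program alias_demo
  implicit none
  integer, target  :: storage(10)   ! hypothetical backing array
  integer, pointer :: a(:), b(:)
  integer :: i

  b => storage
  a => b(10:1:-1)                   ! a(i) is the same element as b(11-i)

  ! Array assignment: the whole RHS is evaluated before anything is
  ! stored, so the compiler has to go through a temporary here.
  storage = [(i, i = 1, 10)]
  a = b
  print '(a, 10i3)', 'array assignment:', storage   ! 10 9 8 7 6 5 4 3 2 1

  ! Element-wise loop: the second half of the iterations reads
  ! elements that the first half has already overwritten.
  storage = [(i, i = 1, 10)]
  do i = 1, 10
    a(i) = b(i)
  end do
  print '(a, 10i3)', 'element-wise DO: ', storage   ! 1 2 3 4 5 5 4 3 2 1
end program alias_demo
```

The array assignment reverses the data, as the language requires. The hand-written loop produces a mirrored first half instead, which is why the compiler cannot use such a loop unless it can prove that LHS and RHS do not overlap.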
If you write a DO loop by hand, you implicitly assume
that there is no nontrivial dependency between the LHS and the RHS.
gfortran tries to optimize assignments and FORALL statements, but in
some cases (as with the example above), the compiler simply cannot know
whether there is some dependency - and will generate slower code. On the
other hand, FORALL is internally translated into a loop.
While some work has been spent on optimizing FORALL, it is only rarely used
and optimizing is difficult. Thus, there is room for improvement for
gfortran (and other compilers). By contrast, loops are very widely used.
Thus, even if they were more difficult to optimize, the machinery is
implemented in compilers. Bottom line: The performance of FORALL is the
same as for loops - unless the compiler has to use a temporary variable
with FORALL - then it is much slower.
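
A case where FORALL and a plain loop really differ, and where FORALL therefore needs a temporary (or some other trick), might look like the following. This is an illustrative sketch; the names `x`, `y` and `n` are made up for the example.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: x(n), y(n), i

  x = [(i, i = 1, n)]
  y = x

  ! FORALL: every x(i-1) is read before any x(i) is written,
  ! so the old values have to be kept somewhere.
  forall (i = 2:n) x(i) = x(i-1)

  ! DO: each iteration sees the value stored by the previous one.
  do i = 2, n
    y(i) = y(i-1)
  end do

  print '(a, 5i3)', 'FORALL:', x   ! 1 1 2 3 4
  print '(a, 5i3)', 'DO:    ', y   ! 1 1 1 1 1
end program forall_vs_do
```

The two constructs give different results, so a compiler may only turn the FORALL into the straightforward loop when it can rule out this kind of overlap.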
Fortran 2008 has a better replacement for FORALL - which is also more
powerful: DO CONCURRENT. Here, the user ensures that there is no
dependency, so the iterations can be run in any order. This allows
some more optimizations on the compiler side (though most compilers do
not make use of it). It also allows better (auto)parallelization or
optional parallelization. (Currently, DO CONCURRENT is handled as a
normal DO loop in gfortran, though improvements are planned - including
optional thread-based parallelization.)
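
As a minimal sketch of the syntax (array names and sizes are arbitrary, chosen only for illustration), a DO CONCURRENT loop whose iterations are independent could be written as:

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  b = [(real(i), i = 1, n)]

  ! Each iteration writes only a(i) and reads only b(i), so the
  ! iterations do not depend on each other. The programmer, not the
  ! compiler, is responsible for making sure this holds.
  do concurrent (i = 1:n)
    a(i) = 2.0 * b(i) + 1.0
  end do

  print '(8f6.1)', a
end program do_concurrent_demo
```

If an iteration read a value written by another iteration, the loop would not be a valid DO CONCURRENT and the result would be unpredictable.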
* FORALL is intrinsically slower than manual loops, unless the compiler
knows that the FORALL assignment can be translated trivially (no
difficult dependency). [Language constraints]
* FORALL does not imply parallelization any more than a normal loop -
and is internally translated into such a loop. [Implementation choice,
shared by nearly all compilers. Maybe some vector computers handled it
* Compilers can parallelize normal loops (incl. FORALL and DO CONCURRENT)
via autoparallelization, though they often don't do a good job. But
without extra flags, normal loops, FORALL and DO CONCURRENT are not run
in parallel (with most compilers, incl. gfortran).
* DO CONCURRENT of Fortran 2008 is FORALL done correctly. It allows some
optimizations for serial code and makes automatic parallelization easier
for the compiler. (Currently handled as normal loop in gfortran. Could
be optionally parallelized or better optimized, but that's not yet
* OpenMP: An explicit parallelization typically works best as one can
then manually balance the cost of forking new threads vs. saving time
through parallel processing. Note that this is independent of whether one
uses a normal DO loop, FORALL or DO CONCURRENT. However, gfortran currently
does not support OMP WORKSHARE for FORALL - only for assignments.
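
To make the OpenMP point concrete, here is a small sketch of an explicitly parallelized loop and of WORKSHARE applied to an array assignment. The array names and the size `n` are arbitrary. With gfortran the directives only take effect when compiling with -fopenmp; without that flag they are treated as comments and the program runs serially.

```fortran
program omp_demo
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i

  b = 1.0

  ! Explicit parallel loop: the iterations are divided among the threads.
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do
  !$omp end parallel do

  ! WORKSHARE applied to a whole-array assignment.
  !$omp parallel workshare
  c = a + b
  !$omp end parallel workshare

  print *, sum(c)
end program omp_demo
```

For a loop this small the cost of starting the threads will probably outweigh any gain, which is the balancing act mentioned above.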