This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



Re: forall vs OpenMP


Anton Shterenlikht wrote:
> On Sun, Apr 08, 2012 at 08:10:39PM +0200, Thomas Koenig wrote:
>>> I tried to find any generic
>>> guidelines regarding when
>>> forall is preferable to OpenMP
>>> parallelisation for simple
>>> loops, but couldn't find any.
>> For gfortran, the general guideline is to avoid forall.  It does not
>> parallelize on its own.  It is likely not to be any better than the
>> equivalent DO loop, and sometimes it is much worse.
> This is a very strong statement.
>
> Why is this? Is it a "design feature"
> of the compiler or simply because
> forall is poorly implemented in
> gfortran right now?

A bit of all.


First - and most importantly: The FORALL construct is similar to an assignment - and in assignments, the right-hand side has to be evaluated before it is assigned to the left-hand side.

For example, if you have:

subroutine foo(a, b)
  integer, pointer :: a(:), b(:)
  a = b   ! Or something similar with FORALL
end subroutine foo

This can be converted into a DO loop - however, only if one knows that the RHS and the LHS do not overlap in memory. If you had:
  A => B(10:1:-1)
a loop of the kind
  do i = 1, 10
    A(i) = B(i)
  end do
will give the wrong result, because elements of B are overwritten before they have been read. Thus, the compiler has to create a temporary variable - which is slow.
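
The aliasing problem can be reproduced with a small self-contained program. This is only a sketch: the program name alias_demo and the second array c are made up for illustration, and b is given the TARGET attribute so that the pointer association is legal.

```fortran
program alias_demo
  implicit none
  integer, target  :: b(10), c(10)   ! c is a hypothetical second copy of the data
  integer, pointer :: a(:)
  integer :: i

  b = [(i, i = 1, 10)]
  c = b

  ! Array assignment: the whole RHS is evaluated before anything is
  ! stored, so b ends up reversed.
  a => b(10:1:-1)
  a = b
  print '(10i3)', b        ! 10  9  8  7  6  5  4  3  2  1

  ! Naive element-wise loop over the same aliasing pattern: elements
  ! of c are overwritten before they are read.
  a => c(10:1:-1)
  do i = 1, 10
     a(i) = c(i)
  end do
  print '(10i3)', c        !  1  2  3  4  5  5  4  3  2  1

  ! Self-checks so that the program fails loudly if the semantics differ.
  if (any(b /= [(11 - i, i = 1, 10)])) error stop "array assignment mismatch"
  if (any(c /= [1, 2, 3, 4, 5, 5, 4, 3, 2, 1])) error stop "loop mismatch"
end program alias_demo
```

The first result is what the language requires for the assignment; the second is what a loop without a temporary produces for the same data.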


If you write a DO loop manually, you implicitly assume that there is no nontrivial dependency between the LHS and the RHS.

gfortran tries to optimize assignments and FORALL statements, but in some cases (as with the example above), the compiler simply cannot know whether there is some dependency - and will generate slower code. On the other hand, FORALL is internally translated into a loop.

While some work has been spent on optimizing FORALL, it is only rarely used and optimizing it is difficult. Thus, there is room for improvement in gfortran (and other compilers). By contrast, loops are very widely used. Thus, even if they were more difficult to optimize, the machinery is already implemented in compilers. Bottom line: The performance of FORALL is the same as for loops - unless the compiler has to use a temporary variable with FORALL - then it is much slower.


Fortran 2008 has a better replacement for FORALL - which is also more powerful: DO CONCURRENT. Here, the user guarantees that there is no dependency between iterations, which allows the compiler to run through the loop in any order. This allows some more optimizations on the compiler side (though most compilers do not make use of it). It also allows better automatic or optional parallelization. (Currently, DO CONCURRENT is handled as a normal DO loop in gfortran, though improvements are planned - including optional thread-based parallelization.)
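
To make the difference concrete, here is a minimal sketch (the program name and the arrays a and b are invented for illustration). The FORALL reads and writes the same array, so all right-hand sides have to be evaluated before any element is stored. The DO CONCURRENT only reads a and only writes b, so its iterations are independent and may run in any order.

```fortran
program forall_vs_do_concurrent
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), b(n), i

  a = [(i, i = 1, n)]

  ! FORALL: every a(i-1) on the RHS is the OLD value, so the array is
  ! shifted by one element rather than filled with a(1).
  forall (i = 2:n) a(i) = a(i-1)
  print '(8i3)', a         !  1  1  2  3  4  5  6  7

  ! DO CONCURRENT: the programmer asserts that no iteration depends on
  ! another one. Here that holds: a is only read, b is only written.
  b = 0
  do concurrent (i = 2:n)
     b(i) = 2 * a(i-1)
  end do
  print '(8i3)', b         !  0  2  2  4  6  8 10 12

  ! Self-checks
  if (any(a /= [1, 1, 2, 3, 4, 5, 6, 7])) error stop "forall mismatch"
  if (any(b /= [0, 2, 2, 4, 6, 8, 10, 12])) error stop "do concurrent mismatch"
end program forall_vs_do_concurrent
```

Writing a(i) = a(i-1) inside the DO CONCURRENT instead would violate the independence the user promises, and the result would then depend on the execution order.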



Bottom line:


* FORALL is intrinsically slower than manual loops, unless the compiler knows that the FORALL assignment can be translated trivially (no difficult dependency). [Language constraints]

* FORALL does not imply parallelization any more than a normal loop - and is internally translated into such a loop. [Implementation choice, shared by nearly all compilers. Maybe some vector computers handled it differently]

* Compilers can parallelize normal loops (incl. FORALL and DO CONCURRENT) via autoparallelization, though they often don't do a good job. But without extra flags, normal loops, FORALL and DO CONCURRENT are not run in parallel (with most compilers, incl. gfortran).

* DO CONCURRENT of Fortran 2008 is FORALL done correctly. It allows some optimizations for serial code and makes automatic parallelization easier for the compiler. (Currently handled as normal loop in gfortran. Could be optionally parallelized or better optimized, but that's not yet implemented).

* OpenMP: An explicit parallelization typically works best as one can then manually balance the cost of forking new threads vs. saving time through parallel processing. Note that that's independent of the use of a normal DO loop, FORALL and DO CONCURRENT. However, gfortran currently does not support OMP WORKSHARE for FORALL - only for assignments.
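
As a rough sketch of the explicit OpenMP route (the program name, array names and the size n are arbitrary): a PARALLEL DO on a normal loop, followed by a WORKSHARE on a plain array assignment, which is the case gfortran does support. This assumes compilation with gfortran -fopenmp; without that flag the directives are treated as comments and the program runs serially with the same result.

```fortran
program omp_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  integer :: i

  x = 1.0

  ! Explicitly parallel DO loop; the loop index i is private by default.
  !$omp parallel do
  do i = 1, n
     y(i) = 2.0 * x(i)
  end do
  !$omp end parallel do

  ! WORKSHARE on an array assignment (not on a FORALL).
  !$omp parallel workshare
  y = y + x
  !$omp end parallel workshare

  ! Every element should now be 2.0*1.0 + 1.0 = 3.0.
  if (any(y /= 3.0)) error stop "unexpected result"
  print *, "all elements equal 3.0"
end program omp_sketch
```

For an array this small the cost of starting the threads probably outweighs any gain, which is exactly the trade-off one has to balance manually.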


Tobias


