Bug 60661 - DO CONCURRENT with MASK: Avoid using a temporary for the mask
Summary: DO CONCURRENT with MASK: Avoid using a temporary for the mask
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 4.9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 44646
 
Reported: 2014-03-25 22:52 UTC by Tobias Burnus
Modified: 2018-10-30 08:57 UTC
2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-08-10 00:00:00


Description Tobias Burnus 2014-03-25 22:52:25 UTC
Currently, gfortran generates a temporary for the mask, as shown below. The question is whether the temporary can be avoided by moving the mask expression into the loop.

I think that usually works, but not always. It works when:
a) the variable in the mask does not occur on the LHS of an assignment or as an intent(out) or intent(inout) argument of a pure subroutine, or
b) the variable occurs only with the same array index as later in the body of the DO CONCURRENT loop.

I am not sure whether something related to FORALL prevents this optimization.

I think the simplest fix would be to transform
  DO CONCURRENT(i=1:n, mask(i))
     ...
  END DO
to
  DO CONCURRENT(i=1:n)
    IF (.not. mask(i)) CYCLE
    ...
  END DO
in the front-end (FE) optimization.
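A minimal C sketch of the two strategies, assuming a mask that satisfies condition b) above: the mask reads a(i) only at the iteration's own index and the body writes only a(i). The function names, the mask a(i) > 0 and the body a(i) = a(i) * 0.5 are hypothetical, and indices are 0-based; this is an illustration of the idea, not gfortran's generated code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mask: reads a[i] only, i.e. the same index the body writes. */
static bool mask_at(const double *a, size_t i)
{
    return a[i] > 0.0;
}

/* Current scheme: evaluate the mask for every index into a temporary,
   then execute the body for the active indices only. */
void halve_positive_with_temp(double *a, size_t n)
{
    assert(n == 0 || a != NULL);
    if (n == 0)
        return;
    bool *masktmp = malloc(n * sizeof *masktmp);
    if (masktmp == NULL)
        abort();
    for (size_t i = 0; i < n; i++)
        masktmp[i] = mask_at(a, i);
    for (size_t i = 0; i < n; i++)
        if (masktmp[i])
            a[i] *= 0.5;
    free(masktmp);
}

/* Proposed scheme: DO CONCURRENT (i=1:n) + IF (.not. mask(i)) CYCLE.
   No temporary; the mask is evaluated right before the body of iteration i. */
void halve_positive_inline(double *a, size_t n)
{
    assert(n == 0 || a != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!mask_at(a, i))
            continue;               /* plays the role of CYCLE */
        a[i] *= 0.5;
    }
}
```

Both functions turn {1.0, -2.0, 4.0, 0.0} into {0.5, -2.0, 2.0, 0.0}. The sketch says nothing about masks that violate both conditions; for those the inline form can change the result.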


"7.2.4.2.3 Evaluation of the mask expression
The scalar-mask-expr, if any, is evaluated for each combination of index-name values. If there is no scalar-mask-expr, it is as if it appeared with the value true. The index-name variables may be primaries in the scalar-mask-expr. The set of active combinations of index-name values is the subset of all possible combinations (7.2.4.2.2) for which the scalar-mask-expr has the value true."

C736 (R752) The scalar-mask-expr shall be scalar and of type logical.
C737 (R752) Any procedure referenced in the scalar-mask-expr, including one referenced by a defined operation, shall be a pure procedure (12.7).


    forall (i=start:end:stride, maskexpr)
      e<i> = f<i>
      g<i> = h<i>
    end forall
   (where e,f,g,h<i> are arbitrary expressions possibly involving i)
   Translates to:
    count = ((end - start + stride) / stride)
    masktmp(:) = maskexpr(:)

    maskindex = 0;
    for (i = start; i <= end; i += stride)
      {
        if (masktmp[maskindex++])
          e<i> = f<i>
      }
    maskindex = 0;
    for (i = start; i <= end; i += stride)
      {
        if (masktmp[maskindex++])
          g<i> = h<i>
      }
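For concreteness, the translation scheme above can be written out in C for one hypothetical choice of expressions: maskexpr is e(i) < f(i), the first statement is e(i) = f(i) and the second is g(i) = h(i). This is an illustrative sketch, not gfortran's actual output; it assumes stride > 0 (as the i <= end loop test implies) and takes (end - start + stride) / stride as the iteration count. Because the first statement makes the mask false for every active i, re-evaluating the mask before the second statement would skip it, which is why FORALL needs the saved masktmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of:
     forall (i=start:end:stride, e(i) < f(i))
       e(i) = f(i)
       g(i) = h(i)
     end forall
   Assumes stride > 0 and that every i in start..end is a valid index. */
void forall_translated(int start, int end, int stride,
                       double *e, const double *f,
                       double *g, const double *h)
{
    assert(stride > 0);
    if (end < start)
        return;                                     /* zero-trip loop */
    int count = (end - start + stride) / stride;    /* number of iterations */
    bool *masktmp = malloc((size_t)count * sizeof *masktmp);
    if (masktmp == NULL)
        abort();

    /* masktmp(:) = maskexpr(:): evaluated once, before any assignment. */
    int maskindex = 0;
    for (int i = start; i <= end; i += stride)
        masktmp[maskindex++] = e[i] < f[i];

    maskindex = 0;
    for (int i = start; i <= end; i += stride)
        if (masktmp[maskindex++])
            e[i] = f[i];

    /* Re-evaluating e(i) < f(i) here would be false for every active i,
       so the second statement must use the saved mask. */
    maskindex = 0;
    for (int i = start; i <= end; i += stride)
        if (masktmp[maskindex++])
            g[i] = h[i];

    free(masktmp);
}
```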
Comment 1 Tobias Burnus 2014-03-25 23:00:42 UTC
Note that one needs to be careful to handle OpenACC/OpenMP correctly to make sure that, e.g., "!$acc loop" remains attached to the loop it belongs to.
Comment 2 Tobias Burnus 2014-03-27 06:54:46 UTC
Quote from the standard: http://mailman.j3-fortran.org/pipermail/j3/2014-March/007259.html

The key paragraph is [176:22]:

"At the completion of the execution of the DO statement, the execution cycle begins."

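Figuring out the list of active index values (i.e., evaluating the mask) is part of the execution of the DO CONCURRENT statement [176:20-21]. Hence, in general, the mask has to be evaluated for all index values before the first iteration of the loop body is executed.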
Comment 3 Thomas Koenig 2014-03-30 22:20:39 UTC
We have to be a bit careful about statements like

  do concurrent(i=1:n, a(i)>sum(a)/n)
    a(i) = a(i) * 0.5
  end do

where the mask really has to be evaluated before the execution
of the loop body itself, because the body modifies a and thus sum(a).
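A small C experiment (hypothetical function names, 0-based indices) illustrates why this example needs care. If the mask a(i) > sum(a)/n is naively moved into the loop body, sum(a) is recomputed after earlier iterations have already halved elements, so the set of active indices changes. For a = {10, 3, 0} the mean is 13/3, so only the first element should be halved; the naive version also halves the second one, because after the first iteration the mean has dropped to 8/3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static double sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Standard semantics: the active set is fixed before the body runs. */
void halve_above_mean_reference(double *a, size_t n)
{
    assert(n > 0 && a != NULL);
    bool *active = malloc(n * sizeof *active);
    if (active == NULL)
        abort();
    for (size_t i = 0; i < n; i++)
        active[i] = a[i] > sum(a, n) / (double)n;
    for (size_t i = 0; i < n; i++)
        if (active[i])
            a[i] *= 0.5;
    free(active);
}

/* Naive rewrite: the whole mask, including sum(a), is evaluated inside
   the loop, after earlier iterations may already have modified a. */
void halve_above_mean_naive(double *a, size_t n)
{
    assert(n > 0 && a != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(a[i] > sum(a, n) / (double)n))
            continue;
        a[i] *= 0.5;
    }
}
```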
Comment 4 Thomas Koenig 2014-08-10 15:50:18 UTC
For

do concurrent(i=1:n, a(i)>sum(a)/n)

we currently evaluate the sum for every index value. This can
definitely be improved by hoisting expressions which
do not depend on the index variable out of the mask.
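Hoisting the loop-invariant part could look like the following C sketch (hypothetical function name, 0-based indices). Once sum(a)/n is computed before the loop, the remaining mask a(i) > threshold reads a only at the iteration's own index, which only that iteration writes, so the IF (.not. mask) CYCLE form should no longer need a temporary for this example. Combining the hoisting with the CYCLE rewrite in this way is an assumption of the sketch, not something the report specifies.

```c
#include <assert.h>
#include <stddef.h>

void halve_above_mean_hoisted(double *a, size_t n)
{
    assert(n > 0 && a != NULL);

    /* Loop-invariant part of the mask, evaluated once with the original a. */
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    const double threshold = s / (double)n;   /* sum(a)/n */

    for (size_t i = 0; i < n; i++) {
        if (!(a[i] > threshold))
            continue;                         /* IF (.not. mask(i)) CYCLE */
        a[i] *= 0.5;
    }
}
```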