This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] [parloop branch] Supporting reductions for automatic parallelization


...
> > >
> >
> > Dan,
> > What are the reduction patterns not recognized by
> > vect_is_simple_reduction that can be found using the SCC?
>
> Anything that involves multiple operations

a couple of examples of this appear in PR32824 (where intermediate
assignments, leftovers from LIM, appear before and after the actual
summation stmt) and in PR25809 and PR29170 (where type casts appear before
and after the actual summation):
      # sum_23 = PHI <sum_11(4), 0(2)>
      sum.0_9 = (short unsigned int) sum_23;
      D.2031_10 = D.2029_8 + sum.0_9;
      sum_11 = (short int) D.2031_10;
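
For illustration only, here is a C sketch of the kind of source that can give rise to the cast pattern above. The function names are made up, and I am assuming a `short` accumulator summing an array; the actual testcases in the PRs may look different. The second function writes the casts out by hand, mirroring the three GIMPLE statements.

```c
#include <stddef.h>

/* Hypothetical reproducer: a 'short' accumulator summed in a loop. */
short sum_short(const short *a, size_t n)
{
  short sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* The same loop with the casts spelled out, one line per GIMPLE stmt.
   Converting an out-of-range value to 'short' is implementation-defined
   in ISO C; GCC documents it as reduction modulo 2^16.  */
short sum_short_explicit(const short *a, size_t n)
{
  short sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned short s = (unsigned short) sum;          /* sum.0_9   */
      unsigned short t = (unsigned short) (a[i] + s);   /* D.2031_10 */
      sum = (short) t;                                  /* sum_11    */
    }
  return sum;
}
```

Both functions compute the same value; the only statement that actually accumulates is the addition in the middle, which is what a more generic detector would have to see through the casts.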

> (even if they are the same
> operation) for the reduction (IE sum = sum + foo + bar)

examples of this are all the "*no-reassoc*.c" testcases in the vectorizer
testsuite, where the following summation:
      sum += (i + j);
looks like this (if reassociation is enabled):
      # sum_22 = PHI <sum_8(3), 0(7)>
      D.2511_7 = j_21 + sum_22;
      sum_8 = D.2511_7 + i_20;
whereas our reduction-pattern detection recognizes only this form (the same
summation, without reassociation):
      # sum_22 = PHI <sum_8(3), 0(7)>
      D.2511_7 = i_20 + j_21;
      sum_8 = D.2511_7 + sum_22;
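
To make the difference between the two forms concrete, here is a small C sketch (hypothetical function names, and arrays `x` and `y` standing in for `i` and `j`) with the two statement orders written out by hand. I use unsigned arithmetic so that regrouping the additions cannot change the result.

```c
#include <stddef.h>

/* Form the detector recognizes: the addends are combined first and the
   accumulator is added in the last statement.  */
unsigned sum_grouped(const unsigned *x, const unsigned *y, size_t n)
{
  unsigned sum = 0;
  for (size_t k = 0; k < n; k++)
    {
      unsigned t = x[k] + y[k];   /* D.2511_7 = i_20 + j_21   */
      sum = t + sum;              /* sum_8 = D.2511_7 + sum_22 */
    }
  return sum;
}

/* Reassociated form: the accumulator enters in the first statement, so
   the reduction is spread over two additions.  */
unsigned sum_reassociated(const unsigned *x, const unsigned *y, size_t n)
{
  unsigned sum = 0;
  for (size_t k = 0; k < n; k++)
    {
      unsigned t = y[k] + sum;    /* D.2511_7 = j_21 + sum_22 */
      sum = t + x[k];             /* sum_8 = D.2511_7 + i_20  */
    }
  return sum;
}
```

The two loops are equivalent; in the second one the PHI result is used by an intermediate statement rather than by the statement that defines the latch value, which is the case a single-statement pattern match does not cover.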

> Anything that also uses the phi result in the loop (but not the sum
> result)

not sure what you mean by this?

> Anything that involves using a variable from an outer loop in an inner
> loop sum reduction.

Is this what you meant?:
      for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            sum += (i + j);
        }
      }
This reduction indeed is not captured in the outer-loop context, but is
detected and vectorized in the inner-loop context. We definitely need
something more generic to detect the reduction at the outer-loop level.
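
As a rough sketch of what detecting the reduction at the outer-loop level would allow (this is not what any pass does today, and the function names and chunking scheme are my own assumptions): the outer iterations can be split into chunks, each chunk accumulates into its own private partial sum starting from 0, and the partial sums are added at the end.

```c
/* Reference version: the nested loop from above, with N as a parameter. */
unsigned long sum_nested(unsigned n)
{
  unsigned long sum = 0;
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      sum += (i + j);
  return sum;
}

/* Hypothetical outer-loop split: each chunk of outer iterations could run
   on its own thread, since it only touches its private 'partial'.
   Requires nchunks > 0.  */
unsigned long sum_outer_chunks(unsigned n, unsigned nchunks)
{
  unsigned long total = 0;
  for (unsigned c = 0; c < nchunks; c++)
    {
      unsigned lo = (unsigned) ((unsigned long) n * c / nchunks);
      unsigned hi = (unsigned) ((unsigned long) n * (c + 1) / nchunks);
      unsigned long partial = 0;          /* private accumulator */
      for (unsigned i = lo; i < hi; i++)
        for (unsigned j = 0; j < n; j++)
          partial += (i + j);
      total += partial;                   /* final combine step */
    }
  return total;
}
```

The split is only valid because the whole nest is a sum reduction on `sum`, which is exactly the fact that has to be proven at the outer-loop level.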

>
> Really, any cycle that is not absolutely completely trivial.  This is
> okay to do simple pattern matching, but if you really want to try to
> win on all the reductions you can, you should do it right.  (I
> understand the vectorizer does what it does because it was simple and
> gets 80% of all cases).

I think by now we have collected enough examples to justify going for
something more generic, especially since there's another pass (autopar)
that needs it (as was also agreed at the Loop Optimizations BOF,
http://gcc.gnu.org/wiki/LoopOptimizationsBOF)
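
To give a feel for what "more generic" could mean, here is a toy sketch in C. It is not GCC code and not a real SCC algorithm: the `struct stmt` IR, the operation set and all names are invented for illustration. It walks from the latch value back towards the PHI result and accepts the cycle if the running value passes only through additions, casts and copies, and through exactly one operand of each statement. It ignores other uses of the intermediate values and any overflow concerns about the casts.

```c
#include <stddef.h>
#include <string.h>

enum op { OP_PLUS, OP_CAST, OP_COPY, OP_MULT };

/* Toy SSA statement: lhs = rhs1 <op> rhs2 (rhs2 is NULL for unary ops). */
struct stmt
{
  const char *lhs;
  enum op op;
  const char *rhs1;
  const char *rhs2;
};

enum dep { DEP_NONE, DEP_CHAIN, DEP_BAD };

static const struct stmt *
find_def (const struct stmt *body, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (body[i].lhs, name) == 0)
      return &body[i];
  return NULL;
}

/* DEP_NONE:  NAME does not depend on the PHI result.
   DEP_CHAIN: NAME carries the running sum through allowed stmts only.
   DEP_BAD:   NAME depends on the PHI result in some other way.  */
static enum dep
classify (const struct stmt *body, size_t n, const char *name,
          const char *phi, size_t depth)
{
  if (strcmp (name, phi) == 0)
    return DEP_CHAIN;
  const struct stmt *def = find_def (body, n, name);
  if (def == NULL)
    return DEP_NONE;            /* defined outside the toy loop body */
  if (depth >= n)
    return DEP_BAD;             /* guard against malformed input */

  enum dep a = classify (body, n, def->rhs1, phi, depth + 1);
  enum dep b = def->rhs2
               ? classify (body, n, def->rhs2, phi, depth + 1)
               : DEP_NONE;
  if (a == DEP_BAD || b == DEP_BAD)
    return DEP_BAD;
  if (a == DEP_NONE && b == DEP_NONE)
    return DEP_NONE;
  if (a == DEP_CHAIN && b == DEP_CHAIN)
    return DEP_BAD;             /* e.g. sum + sum is not a plain sum */
  if (def->op == OP_MULT)
    return DEP_BAD;             /* running value goes through a multiply */
  return DEP_CHAIN;
}

/* Returns 1 if LATCH_VALUE reaches PHI_RESULT through a sum-only cycle. */
int
is_sum_reduction (const struct stmt *body, size_t n,
                  const char *phi_result, const char *latch_value)
{
  return classify (body, n, latch_value, phi_result, 0) == DEP_CHAIN;
}
```

Fed with the cast example and the reassociated example from earlier in this mail, this toy check accepts both, while it rejects a cycle in which the PHI result is first multiplied and then added.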

dorit

>
> --Dan

