This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Optimizations documentation


Dorit Nuzman/Haifa/IBM wrote on 03/01/2008 18:53:25:

> > g++-4.3 seems well ahead of other compilers in ability to vectorize STL
> > iterators:
> >
> > http://softwarecommunity.intel.com/Wiki/HighPerformanceComputing/688.htm
> >

> good to know.
> I browsed over the posting above; do you have a summary table where
> one could easily see which loops get vectorized by other compilers
> and not by gcc (and vice versa)?
>

This is an old debt: a while back Tim sent me a detailed report off-line
showing which C++ tests (originally from the Dongarra loops suite) were
vectorized by current g++ or icpc, or both, as well as when the
vectorization by icpc required a pragma, or was partial. I went over the
loops that were reported to be vectorized by icc but not by gcc, to see
which features we are missing. There are 23 such loops (out of a total of
77). They fall into the following 7 categories:

(1) Scalar evolution analysis fails with "evolution of base is not affine".
This happens in the 3 loops in lines 4267, 4204 and 511.
Here is an example:
  for (i__ = 1; i__ <= i__2; ++i__)
    {
      a[i__] = (b[i__] + b[im1] + b[im2]) * .333f;
      im2 = im1;
      im1 = i__;
    }
Missed optimization PR to be opened.
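
To make the problem concrete, here is a rough source-level sketch of what
the vectorizer would have to see. It is not a proposed GCC change. The
function names are made up, and since the snippet above does not show how
im1 and im2 are initialized, they are passed in as parameters. After the
first two iterations im1 == i-1 and im2 == i-2, so peeling those two
iterations leaves a loop whose accesses are all affine in i:

```c
#include <assert.h>

/* Original shape: im1 and im2 trail the induction variable by one and two
   iterations, but only from the third iteration on.  Arrays are 1-based
   (f2c style), so a and b need at least n + 1 elements. */
void wrap_original(float *a, const float *b, int n, int im1, int im2)
{
    assert(n >= 0);
    for (int i = 1; i <= n; ++i) {
        a[i] = (b[i] + b[im1] + b[im2]) * .333f;
        im2 = im1;
        im1 = i;
    }
}

/* Hand-peeled shape: the first two iterations use the incoming im1/im2,
   the remaining ones index b with i - 1 and i - 2 directly. */
void wrap_peeled(float *a, const float *b, int n, int im1, int im2)
{
    assert(n >= 0);
    if (n >= 1)
        a[1] = (b[1] + b[im1] + b[im2]) * .333f;  /* incoming im1, im2 */
    if (n >= 2)
        a[2] = (b[2] + b[1] + b[im1]) * .333f;    /* im1 is 1, im2 is old im1 */
    for (int i = 3; i <= n; ++i)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * .333f;
}
```

Both versions add the three terms in the same order, so they should
produce bit-identical results as long as a and b do not overlap.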

(2) Function calls inside a loop. These are calls to the math functions
sin/cos, which I expect would be vectorized if a suitable SIMD math library
were available.
This happens in the loop in line 6932.
I think there's an open PR for this one (at least for powerpc/Altivec?) -
need to look/open.

(3) This one is the most dominant missed optimization: if-conversion is
failing to if-convert, most likely due to the very limited handling of
loads/stores (i.e. load/store hoisting/sinking is required).
This happens in the 13 loops in lines 4085, 4025, 3883, 3818, 3631, 355,
3503, 2942, 877, 6740, 6873, 5191, 7943.
There is ongoing work towards addressing this issue - see
http://gcc.gnu.org/ml/gcc/2007-07/msg00942.html,
http://gcc.gnu.org/ml/gcc/2007-09/msg00308.html. (I think Victor Kaplansky
is currently working on this).
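
As an illustration only (the actual failing loops are not reproduced in
this mail, so the loop below is a hypothetical stand-in with made-up
names), this is the kind of transformation that is needed: the conditional
store is replaced by an unconditional load, a select, and an unconditional
store, which leaves straight-line code in the loop body:

```c
#include <assert.h>

/* Branchy form: the store to a[i] is guarded by a condition. */
void cond_store_branchy(float *a, const float *b, const float *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (b[i] > 0.f)
            a[i] = b[i] * c[i];
    }
}

/* Hand if-converted form: the load of a[i] is hoisted above the condition
   and the store is sunk below it, so every iteration executes the same
   sequence of loads, a select and a store. */
void cond_store_ifconverted(float *a, const float *b, const float *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        float old = a[i];                 /* unconditional load */
        float val = b[i] * c[i];          /* computed on every iteration */
        a[i] = (b[i] > 0.f) ? val : old;  /* unconditional store */
    }
}
```

Note that the second form writes a[i] on every iteration, so it is only
safe when a[i] is known to be writable and nobody else observes the extra
stores. That is part of why the compiler has to be careful here.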

(4) A scalar variable whose address is taken outside the loop (in an
enclosing outer-loop) is analyzed by the data-references analysis, which
fails because the address of the variable is invariant in the loop.
Here's an example:
  for (nl = 1; nl <= i__1; ++nl)
    {
      sum = 0.f;
      for (i__ = 1; i__ <= i__2; ++i__)
        {
          a[i__] = c__[i__] + d__[i__];
          b[i__] = c__[i__] + e[i__];
          sum += a[i__] + b[i__];
        }
      dummy_ (ld, n, &a[1], &b[1], &c__[1], &d__[1], &e[1], &aa[aa_offset],
              &bb[bb_offset], &cc[cc_offset], &sum);
    }
Analysis of 'sum' fails with "FAILED as dr address is invariant".
This happens in the 2 loops in lines 5053 and 332.
I think there is a missed optimization PR for this one already. Need to
look/open.
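
A possible source-level workaround, sketched below, is to accumulate into a
local whose address is never taken and copy it into the escaping variable
only after the inner loop. I have not verified that this makes the loop
vectorize. The names are hypothetical, and consume() is a stand-in for the
external dummy_ routine, reduced to the one argument that matters here:

```c
#include <assert.h>

/* Stand-in for dummy_: it only needs to receive the address of the sum. */
static float observed_total;
static void consume(float *sum) { observed_total += *sum; }

/* Original shape: sum lives in memory because &sum escapes in the outer
   loop, so the inner loop updates an invariant address. */
void sum_address_taken(float *a, float *b, const float *c, const float *d,
                       const float *e, int n, int reps)
{
    assert(n >= 0);
    float sum;
    for (int nl = 1; nl <= reps; ++nl) {
        sum = 0.f;
        for (int i = 1; i <= n; ++i) {
            a[i] = c[i] + d[i];
            b[i] = c[i] + e[i];
            sum += a[i] + b[i];
        }
        consume(&sum);
    }
}

/* Workaround shape: the inner loop reduces into s, whose address is never
   taken; only the copy made after the loop escapes. */
void sum_local_copy(float *a, float *b, const float *c, const float *d,
                    const float *e, int n, int reps)
{
    assert(n >= 0);
    for (int nl = 1; nl <= reps; ++nl) {
        float s = 0.f;
        for (int i = 1; i <= n; ++i) {
            a[i] = c[i] + d[i];
            b[i] = c[i] + e[i];
            s += a[i] + b[i];
        }
        float sum = s;
        consume(&sum);
    }
}
```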

(5) Reduction and induction that involve multiplication (i.e. 'prod *= CST'
or 'prod *= a[i]') are currently not supported by the vectorizer. It should
be trivial to add support for this feature (for reduction, it shouldn't be
much more than adding a case for MULT_EXPR in
tree-vectorizer.c:reduction_code_for_scalar_code, I think).
This happens in the 2 loops in lines 4921 and 4632.
A missed-optimization PR to be opened.
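
For reference, this is roughly what a vectorized product reduction
computes, written out by hand in scalar C with four "lanes" (the lane
count and function names are my own choice for the sketch, and this is not
the code the vectorizer would emit). Each lane starts at 1, the
multiplicative identity, and the lanes are combined after the loop:

```c
#include <assert.h>

/* Plain scalar reduction: prod *= a[i]. */
float prod_scalar(const float *a, int n)
{
    assert(n >= 0);
    float prod = 1.f;
    for (int i = 0; i < n; ++i)
        prod *= a[i];
    return prod;
}

/* Four partial products, one per lane, combined in an epilogue.
   Leftover elements are handled by a scalar tail loop. */
float prod_four_lanes(const float *a, int n)
{
    assert(n >= 0);
    float lane[4] = {1.f, 1.f, 1.f, 1.f};
    int i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] *= a[i + k];
    float prod = (lane[0] * lane[1]) * (lane[2] * lane[3]);
    for (; i < n; ++i)
        prod *= a[i];
    return prod;
}
```

The lane version multiplies in a different order than the scalar loop, so
for floating point the results can differ in the last bits. The
transformation is therefore only legal for floats when reassociation is
allowed.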

(6) Loop distribution is required to break a dependence. This may already
be handled by Sebastian's loop-distribution pass that will be incorporated
in 4.4.
Here is an example:
  for (i__ = 2; i__ <= i__2; ++i__)
    {
      a[i__] += c__[i__] * d__[i__];
      b[i__] = a[i__] + d__[i__] + b[i__ - 1];
    }
This happens in the loop in line 2136.
Need to check if we need to open a missed optimization PR for this.
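
Written out by hand, the distributed form of this loop would look something
like the sketch below (function names are hypothetical, and the arrays are
assumed not to overlap). The first loop has no loop-carried dependence and
is a vectorization candidate. The second keeps the recurrence on b and
stays scalar:

```c
#include <assert.h>

/* Original fused loop; arrays are 1-based with at least n + 1 elements. */
void fused(float *a, float *b, const float *c, const float *d, int n)
{
    assert(n >= 0);
    for (int i = 2; i <= n; ++i) {
        a[i] += c[i] * d[i];
        b[i] = a[i] + d[i] + b[i - 1];
    }
}

/* Distributed form: the statement on a is split into its own loop. */
void distributed(float *a, float *b, const float *c, const float *d, int n)
{
    assert(n >= 0);
    for (int i = 2; i <= n; ++i)      /* independent iterations */
        a[i] += c[i] * d[i];
    for (int i = 2; i <= n; ++i)      /* recurrence on b[i - 1] */
        b[i] = a[i] + d[i] + b[i - 1];
}
```

The split is legal here because the second statement only reads a[i],
which the first loop has already finished writing by then.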

(7) A dependence, similar to one that predictive commoning (or even PRE)
would create, is present in the loop:
  for (i__ = 1; i__ <= i__2; ++i__)
    {
      a[i__] = (b[i__] + x) * .5f;
      x = b[i__];
    }
This happens in the loop in line 3003.
The vectorizer needs to be extended to handle such cases.
A missed optimization PR to be opened (if one doesn't exist already).
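
To show what the vectorizer would need to recognize: from the second
iteration on, x is simply b[i-1], so the scalar carried across iterations
can be replaced by a shifted load once the first iteration is peeled. A
hand-written sketch, with hypothetical function names and assuming a and b
do not overlap:

```c
#include <assert.h>

/* Original shape: x carries b[i] into the next iteration.
   Returns the final value of x; arrays are 1-based. */
float carried_scalar(float *a, const float *b, int n, float x)
{
    assert(n >= 0);
    for (int i = 1; i <= n; ++i) {
        a[i] = (b[i] + x) * .5f;
        x = b[i];
    }
    return x;
}

/* Rewritten shape: only the first iteration uses the incoming x, the rest
   read b[i - 1] directly and carry no scalar between iterations. */
float shifted_load(float *a, const float *b, int n, float x)
{
    assert(n >= 0);
    if (n >= 1) {
        a[1] = (b[1] + x) * .5f;
        x = b[n];                     /* value x would hold after the loop */
    }
    for (int i = 2; i <= n; ++i)
        a[i] = (b[i] + b[i - 1]) * .5f;
    return x;
}
```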


Thanks, Tim, for the helpful report,

dorit

> thanks,
> dorit

