This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Optimizations documentation


Hi,

Dorit Nuzman/Haifa/IBM wrote on 14/02/2008 17:02:45:

> This is an old debt: A while back Tim had sent me a detailed report
> offline showing which C++ tests (originally from the Dongarra loops
> suite) were vectorized by current g++ or icpc, or both, as well as
> when the vectorization by icpc required a pragma, or was partial. I
> went over the loops that were reported to be vectorized by icc but
> not by gcc, to see which features we are missing. There are 23 such
> loops (out of a total of 77). They fall into the following 7 categories:
>
> (1) scalar evolution analysis fails with "evolution of base is not
> affine".
> This happens in the 3 loops in lines 4267, 4204 and 511.
> Here is an example:
>  for (i__ = 1; i__ <= i__2; ++i__)
>         {
>           a[i__] = (b[i__] + b[im1] + b[im2]) * .333f;
>           im2 = im1;
>           im1 = i__;
>         }
> Missed optimization PR to be opened.

I opened PR35224.
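
To make the pattern concrete, here is a minimal sketch (not taken from the test suite or from GCC) of why im1 and im2 defeat an affine evolution and how peeling two iterations turns them into plain i-1 and i-2 accesses. The function names and the idea of passing the initial im1/im2 values as parameters are assumptions made for illustration; the original message does not say how they are initialized.

```c
/* Original shape: im1 and im2 are "wrap-around" variables.  Their value
   in the first iterations comes from outside the loop, so the index of
   b[im1] and b[im2] is not an affine function of i.  */
void smooth_wraparound(float *a, const float *b, int n, int im1, int im2)
{
    for (int i = 1; i <= n; ++i) {
        a[i] = (b[i] + b[im1] + b[im2]) * .333f;
        im2 = im1;
        im1 = i;
    }
}

/* Hand-peeled shape: after two iterations im1 == i - 1 and im2 == i - 2,
   so the remaining loop has only affine accesses.  */
void smooth_peeled(float *a, const float *b, int n, int im1, int im2)
{
    if (n >= 1)
        a[1] = (b[1] + b[im1] + b[im2]) * .333f;
    if (n >= 2)
        a[2] = (b[2] + b[1] + b[im1]) * .333f;
    for (int i = 3; i <= n; ++i)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * .333f;
}
```

Both functions perform the same additions in the same order, so they should produce identical results; only the second exposes the accesses in a form that is affine in i.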

>
> (2) Function calls inside a loop. These are calls to the math
> functions sin/cos, which I expect would be vectorized if the proper
> simd math lib was available.
> This happens in the loop in line 6932.
> I think there's an open PR for this one (at least for
> powerpc/Altivec?) - need to look/open.

There is PR22226.

>
> (3) This one is the most dominant missed optimization: if-conversion
> is failing to if-convert, most likely due to the very limited
> handling of loads/stores (i.e. load/store hoisting/sinking is required).
> This happens in the 13 loops in lines 4085, 4025, 3883, 3818, 3631,
> 355, 3503, 2942, 877, 6740, 6873, 5191, 7943.
> There is ongoing work towards addressing this issue - see
> http://gcc.gnu.org/ml/gcc/2007-07/msg00942.html and
> http://gcc.gnu.org/ml/gcc/2007-09/msg00308.html. (I think Victor
> Kaplansky is currently working on this.)
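
As an illustration of what store sinking means for if-conversion, the sketch below contrasts a conditional store with a select-based form. It is a hand-written example under the assumption that a[] is always writable and is not accessed concurrently; the function names are hypothetical and none of this is taken from the loops in the report or from GCC's if-conversion pass.

```c
/* Branchy form: the store to a[i] (and the load of b[i]) happens only
   when the condition holds, so the loop body contains control flow.  */
void cond_copy_branch(float *a, const float *b, const float *c, int n)
{
    for (int i = 0; i < n; ++i)
        if (c[i] > 0.0f)
            a[i] = b[i];
}

/* If-converted form: the store is sunk below the condition and made
   unconditional; the old value is written back when the condition is
   false.  A vectorized version would also load b[i] unconditionally.  */
void cond_copy_select(float *a, const float *b, const float *c, int n)
{
    for (int i = 0; i < n; ++i) {
        float old = a[i];
        a[i] = (c[i] > 0.0f) ? b[i] : old;
    }
}
```

The second form is only equivalent when the extra loads and stores are harmless, which is the reason a compiler cannot apply this rewrite blindly.
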
>
> (4) A scalar variable, whose address is taken outside the loop (in
> an enclosing outer-loop) is analyzed by the data-references
> analysis, which fails because it is invariant.
> Here's an example:
>   for (nl = 1; nl <= i__1; ++nl)
>     {
>       sum = 0.f;
>       for (i__ = 1; i__ <= i__2; ++i__)
>         {
>           a[i__] = c__[i__] + d__[i__];
>           b[i__] = c__[i__] + e[i__];
>           sum += a[i__] + b[i__];
>         }
>       dummy_ (ld, n, &a[1], &b[1], &c__[1], &d__[1], &e[1],
>               &aa[aa_offset], &bb[bb_offset], &cc[cc_offset], &sum);
>     }
> Analysis of 'sum' fails with "FAILED as dr address is invariant".
> This happens in the 2 loops in lines 5053 and 332.
> I think there is a missed optimization PR for this one already. Need
> to look/open.
>

The related PRs are PR33245 and PR33244. There is also a FIXME comment in
tree-data-ref.c just before the point where the analysis fails with
"FAILED as dr address is invariant":

      /* FIXME -- data dependence analysis does not work correctly for
         objects with invariant addresses.  Let us fail here until the
         problem is fixed.  */


> (5) Reduction and induction that involve multiplication (i.e. 'prod
> *= CST' or 'prod *= a[i]') are currently not supported by the
> vectorizer. It should be trivial to add support for this feature
> (for reduction, it shouldn't be much more than adding a case for
> MULT_EXPR in tree-vectorizer.c:reduction_code_for_scalar_code, I think).
> This happens in the 2 loops in lines 4921 and 4632.
> A missed-optimization PR to be opened.

Opened PR35226.
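
For reference, a minimal sketch of what a vectorized multiplication reduction amounts to, written in scalar C with four partial products standing in for vector lanes. The lane count of 4 and the function names are assumptions for illustration; this is not the vectorizer's implementation. Note that the lane form reassociates the multiplications, so for floating point it is only valid when reassociation is permitted.

```c
/* Scalar reduction: one multiplication chain of length n.  */
float prod_scalar(const float *a, int n)
{
    float prod = 1.0f;
    for (int i = 0; i < n; ++i)
        prod *= a[i];
    return prod;
}

/* Lane form: four independent partial products (initialized to the
   multiplicative identity), combined after the main loop, followed by
   a scalar epilogue for the leftover elements.  */
float prod_lanes(const float *a, int n)
{
    float lane[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    int i = 0;

    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] *= a[i + k];

    float prod = (lane[0] * lane[1]) * (lane[2] * lane[3]);

    for (; i < n; ++i)
        prod *= a[i];
    return prod;
}
```

With inputs that are powers of two, both functions give bit-identical results; for general inputs the rounding may differ because the multiplication order changes.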

>
> (6) loop distribution is required to break a dependence. This may
> already be handled by Sebastian's loop-distribution pass that will
> be incorporated in 4.4.
> Here is an example:
>  for (i__ = 2; i__ <= i__2; ++i__)
>         {
>           a[i__] += c__[i__] * d__[i__];
>           b[i__] = a[i__] + d__[i__] + b[i__ - 1];
>         }
> This happens in the loop in line 2136.
> Need to check if we need to open a missed optimization PR for this.

I don't think that this is a loop distribution issue. The dependence
between the store to a[i] and the load from a[i] doesn't prevent
vectorization. The problematic one is between the store to b[i] and the
load from b[i-1] in the second statement.
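
A small sketch of the two dependences, assuming the arrays do not overlap (function names are made up for illustration). Splitting the loop by hand shows that the a[i] store/load pair stays within one iteration, while the b[i-1] load forms a recurrence of distance 1 that remains in whichever loop holds the second statement.

```c
/* Loop as in the report: both statements in one body.  */
void fused(float *a, float *b, const float *c, const float *d, int n)
{
    for (int i = 2; i <= n; ++i) {
        a[i] += c[i] * d[i];
        b[i] = a[i] + d[i] + b[i - 1];
    }
}

/* Hand-distributed form, valid when a, b, c and d do not alias.  */
void distributed(float *a, float *b, const float *c, const float *d, int n)
{
    /* No loop-carried dependence: iteration i touches only a[i].  */
    for (int i = 2; i <= n; ++i)
        a[i] += c[i] * d[i];

    /* Loop-carried dependence of distance 1: b[i] needs b[i - 1] from
       the previous iteration, so this loop is still a serial chain.  */
    for (int i = 2; i <= n; ++i)
        b[i] = a[i] + d[i] + b[i - 1];
}
```

Distribution would therefore isolate the recurrence on b, but it would not remove it.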

>
> (7) A dependence, similar to one that would be created by
> predictive commoning (or even PRE), is present in the loop:
>  for (i__ = 1; i__ <= i__2; ++i__)
>         {
>           a[i__] = (b[i__] + x) * .5f;
>           x = b[i__];
>         }
> This happens in the loop in line 3003.
> The vectorizer needs to be extended to handle such cases.
> A missed optimization PR to be opened (if one doesn't exist already).

I opened a new PR - 35229. (PR33244 is somewhat related).
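
To show what the scalar x stands for, here is a minimal hand-written sketch (hypothetical function names; the initial value of x is passed as a parameter because the message does not say where it comes from). After the first iteration x always equals b[i-1], so peeling that iteration leaves a loop with no scalar carried between iterations.

```c
/* Shape from the report: x carries b[i] into the next iteration.  */
void avg_carried(float *a, const float *b, int n, float x)
{
    for (int i = 1; i <= n; ++i) {
        a[i] = (b[i] + x) * .5f;
        x = b[i];
    }
}

/* Peeled shape: the first iteration uses the incoming x, the rest read
   b[i - 1] directly.  This is roughly the reverse of what predictive
   commoning would do to the second loop.  */
void avg_peeled(float *a, const float *b, int n, float x)
{
    if (n >= 1)
        a[1] = (b[1] + x) * .5f;
    for (int i = 2; i <= n; ++i)
        a[i] = (b[i] + b[i - 1]) * .5f;
}
```

The second loop contains only affine loads from b and a store to a, which is the form the vectorizer already handles.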

Ira

