[Bug tree-optimization/91732] Adding omp simd pragma prevents vectorization

Wed Sep 11 12:32:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732

--- Comment #3 from Jed Brown <jed at 59A2 dot org> ---
> why not use gsym[Q*2*j+i] instead of g[j][0] and similarly gsym[Q*2-j*Q+i] instead of g[j][1]?

The pattern here is that gsym is packed storage of a symmetric 2x2 matrix,
while g unpacks it so that inner loops (intended for unrolling) can be written
using index notation. This case (a finite element quadrature routine for 2D
anisotropic Poisson) is reduced from more complicated examples (such as 3D
nonlinear solid and fluid mechanics) where this technique provides substantial
clarity and correspondence to mathematical notation. The suggested
transformation (eliminating the temporary g[][] in exchange for fancy indexing
of gsym) is problematic when representing higher-order tensors
(https://en.wikipedia.org/wiki/Voigt_notation#Mnemonic_rule).
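
To illustrate the point about higher-order cases, here is a minimal sketch of the same unpacking idea for a symmetric 3x3 tensor. It assumes the common Voigt ordering (xx, yy, zz, yz, xz, xy) and the same component-major layout as gsym above. The names voigt_index, unpack_sym3 and ssym are hypothetical and do not come from the reduced test case.

```c
#include <stddef.h>

/* Assumed Voigt ordering for a symmetric 3x3 tensor: xx, yy, zz, yz, xz, xy.
 * voigt_index[j][k] is the packed slot that holds entry (j,k). */
static const int voigt_index[3][3] = {
  {0, 5, 4},
  {5, 1, 3},
  {4, 3, 2},
};

/* Hypothetical helper: unpack quadrature point i of a packed symmetric
 * tensor (6 component arrays of length Q, stored back to back) into a
 * full 3x3 temporary, so that inner loops can use plain s[j][k]. */
static inline void unpack_sym3(size_t Q, size_t i, const double *ssym,
                               double s[3][3]) {
  for (int j = 0; j < 3; j++)
    for (int k = 0; k < 3; k++)
      s[j][k] = ssym[Q * voigt_index[j][k] + i];
}
```

No single affine expression in j and k reproduces this table, which is why the unpacked temporary keeps the inner loops readable.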

It's also sometimes desirable to roll the second loop instead of repeating, in
which case you don't get to have a different indexing rule for g[j][0] and
g[j][1].

  for (int i=0; i<Q; i++) {
    const double g[2][2] = {{gsym[Q*0+i], gsym[Q*2+i]},
                            {gsym[Q*2+i], gsym[Q*1+i]}};
    for (int j=0; j<2; j++) {
      dv[Q*j+i] = 0;
      for (int k=0; k<2; k++)
        dv[Q*j+i] += g[j][k] * du[Q*k+i];
    }
  }

Fortunately, we generally get good codegen for more complicated cases when
using "#pragma GCC ivdep". Without any pragmas the results are hit-or-miss
(but good in this case), and restrict becomes important when there are more
arrays. I filed this report specifically because adding a semantically
correct "omp simd" prevented vectorization.
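
For anyone who wants to compare the three variants, a self-contained sketch of the loop above might look like the following. The function name apply_sym2, its signature, and the USE_OMP_SIMD / USE_IVDEP macros are assumptions made for illustration, not the exact code attached to this report.

```c
#include <stddef.h>

/* Hypothetical wrapper around the loop above; name and signature are
 * assumptions. gsym holds Q values each of g00, g11, g01 (in that order);
 * du and dv hold Q values per component. */
void apply_sym2(int Q, const double *restrict gsym,
                const double *restrict du, double *restrict dv) {
  /* Select a variant at compile time with -DUSE_OMP_SIMD or -DUSE_IVDEP
   * (hypothetical macros); with neither defined, no pragma is used. */
#if defined(USE_OMP_SIMD)
#pragma omp simd
#elif defined(USE_IVDEP)
#pragma GCC ivdep
#endif
  for (int i = 0; i < Q; i++) {
    /* Unpack the symmetric 2x2 matrix at quadrature point i. */
    const double g[2][2] = {{gsym[Q*0+i], gsym[Q*2+i]},
                            {gsym[Q*2+i], gsym[Q*1+i]}};
    for (int j = 0; j < 2; j++) {
      dv[Q*j+i] = 0;
      for (int k = 0; k < 2; k++)
        dv[Q*j+i] += g[j][k] * du[Q*k+i];
    }
  }
}
```

The "omp simd" variant needs -fopenmp-simd (or -fopenmp) to take effect, and adding -fopt-info-vec makes GCC report which loops it vectorized.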