[Bug tree-optimization/98563] [10/11 Regression] vectorization fails while it worked on gcc 9 and earlier since r10-2271-gd81ab49d0586fca0

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Jan 26 13:34:59 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98563

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> (In reply to Jakub Jelinek from comment #7)
> > I'm afraid no.
> > The vectorization can handle addresses into the simd arrays, but right now
> > only if it accesses the whole element, i.e. when we can turn the simd array
> > into a vector register (or set thereof) that holds the variable.
> > In this case that is not the case, as in the end it uses the real and imag
> > parts separately.
> > So, either it can be handled in SRA, or we'd need to teach the vectorizer to
> > permute those for us.
> 
> Hmm, I see.  The vectorizer can in theory handle "existing" vectors
> (currently only enabled for basic-block SLP though).  But of course the
> first hurdle is to not treat those as memory accesses (thus ignore the
> data-ref analysis failure or somehow make that treat the SIMD_LANE
> indexing "nicely").
> 
> When we see
> 
>   _13 = .GOMP_SIMD_LANE (simduid.0_12(D), 0);
> 
> can we compute how _13 evolves with loop iteration?  Thus, can we
> SCEV analyze it?  Isn't it something like { .GOMP_SIMD_LANE_START
> (simduid.0_12(D), .GOMP_SIMD_LANE_STEP (simduid.0_12(D), 0) }, thus an
> affine evolution in the end?

_13 has modulo semantics in the loop: it takes the values 0, 1, ..., vf-1,
0, 1, ..., vf-1, etc., where vf is the vectorization factor of the loop, so it
is periodic rather than affine.
The intent is that after successful vectorization the array can be promoted to
a vector holding those per-lane values (or a set of vectors; it is a software
vector, not necessarily a hardware vector), and after unsuccessful
vectorization it shrinks to a single-element array, i.e. a scalar.
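
To illustrate the modulo semantics, here is a minimal scalar sketch in plain C.
It is only a model, not what GCC emits: VF = 4 is an assumed vectorization
factor, and simd_lane() and copy_via_simd_array() are hypothetical names
standing in for .GOMP_SIMD_LANE and for a loop with a privatized variable.

```c
#include <stddef.h>

/* Assumed vectorization factor; the real vf is chosen by the vectorizer. */
enum { VF = 4 };

/* Hypothetical model of the value .GOMP_SIMD_LANE yields in scalar
   iteration i: 0, 1, ..., VF-1, 0, 1, ... (periodic, not affine). */
static unsigned simd_lane(unsigned i)
{
    return i % VF;
}

/* Copy n doubles through a per-lane "simd array" that stands in for a
   privatized scalar: each lane only ever touches its own slot. */
void copy_via_simd_array(double *dst, const double *src, size_t n)
{
    double tem[VF];   /* one slot per lane */

    for (size_t i = 0; i < n; i++) {
        unsigned lane = simd_lane((unsigned) i);
        tem[lane] = src[i];   /* store to this lane's private copy */
        dst[i] = tem[lane];   /* read it back from the same slot */
    }
}
```

With VF slots the array maps onto one software vector; with a vectorization
factor of 1 the same array would have a single element and behave as a scalar.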

> Simplified C testcase:
> 
> typedef _Complex double cplx;
> void foo (cplx *);
> void test(cplx* __restrict__ a, const cplx* b, double c, int N)
> {
>   cplx tem;
> #pragma omp simd private (tem)
>   for (int i=0; i<8*N; i++) {
>       __real tem = __real b[i];
>       __imag tem = __imag b[i];
>       __real a[i] = __real tem;
>       __imag a[i] = __imag tem;
>   }
>   foo (&tem);

The private clause means the value is undefined at the end of the construct.
If you want to inspect the value afterwards, the possible clauses are
lastprivate (the scalar variable receives the value from the last iteration)
or reduction (in that case it combines all the vector elements using a base
language reduction operator or a user-defined function).
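
A minimal sketch of the two alternatives, using plain double instead of the
_Complex type from the testcase to keep it short. The function names
last_copied() and sum_all() are made up for illustration; compile with
-fopenmp or -fopenmp-simd for the pragmas to take effect (without them the
loops simply run as ordinary scalar code).

```c
/* lastprivate: after the loop, tem holds the value assigned in the
   sequentially last iteration, so reading it is well defined
   (assuming n > 0, i.e. the loop ran at least once). */
double last_copied(double *restrict a, const double *b, int n)
{
    double tem = 0.0;

#pragma omp simd lastprivate(tem)
    for (int i = 0; i < n; i++) {
        tem = b[i];
        a[i] = tem;
    }
    return tem;
}

/* reduction: each lane accumulates its own partial sum, and the partial
   sums are combined with + at the end of the construct. */
double sum_all(const double *b, int n)
{
    double sum = 0.0;

#pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += b[i];
    return sum;
}
```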

