[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
rguenther at suse dot de
gcc-bugzilla@gcc.gnu.org
Fri Aug 28 07:46:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 27 Aug 2015, wschmidt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
>
> --- Comment #22 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #21)
> > (In reply to Bill Schmidt from comment #20)
>
> ...<snip>...
> >
> > I see it only failing due to cost issues (tried ppc64le and -mcpu=power8).
> > The unaligned loads cost 3 and we end up with
> >
> > t.f90:8:0: note: Cost model analysis:
> > Vector inside of loop cost: 40
> > Vector prologue cost: 8
> > Vector epilogue cost: 4
> > Scalar iteration cost: 12
> > Scalar outside cost: 6
> > Vector outside cost: 12
> > prologue iterations: 0
> > epilogue iterations: 0
> > t.f90:8:0: note: cost model: the vector iteration cost = 40 divided by the
> > scalar iteration cost = 12 is greater or equal to the vectorization factor =
> > 1.
> >
> > Note that we are (still) not very good in estimating the SLP cost as we
> > account 4 vector loads here (because we essentially will end up with
> > 4 different permutations used), so the "unaligned" part is accounted for
> > too much and likely the permutation cost as well. Both are a limitation
> > of the SLP data structures and not easily fixable. With
> > -fvect-cost-model=unlimited I see both loops vectorized.
>
> Yes, I get these same results for the loop vectorizer (using -O2
> -ftree-vectorize -mcpu=power8 -ffast-math). But I was looking at the failure
> to do SLP vectorization. In comment 19 you indicated this was now working,
> presumably on x86, but for Power we fail to SLP-vectorize
> fast-math-pr37021.f90:9:0.
Err, I meant loop SLP vectorization as opposed to loop vectorization
with interleaving. Basic-block SLP doesn't work because (at least)
it does not handle reductions yet (I have done some early work here
but wasn't able to finish it).
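
For readers following the quoted cost model dump, the arithmetic behind
the rejection can be sketched as below. This is only an illustration
that takes the dump message literally; the function names are
hypothetical and are not GCC's internal API, and the real vectorizer
cost model considers more than this single comparison.

```python
# Illustrative sketch only: helper names are hypothetical, not GCC internals.
# The numbers are the ones printed in the quoted cost model dump.

def vector_outside_cost(prologue_cost, epilogue_cost):
    """Cost paid once outside the loop body: prologue plus epilogue."""
    return prologue_cost + epilogue_cost


def vector_iteration_too_expensive(vec_inside_cost, scalar_iter_cost, vf):
    """True when 'vector iteration cost / scalar iteration cost >= VF'.

    Written as a multiplication so no division is needed: one vector
    iteration replaces vf scalar iterations, so it has to cost less
    than vf scalar iterations to be worthwhile.
    """
    return vec_inside_cost >= scalar_iter_cost * vf


if __name__ == "__main__":
    # Values from the dump: prologue 8, epilogue 4, inside 40, scalar 12, VF 1.
    print("vector outside cost:", vector_outside_cost(8, 4))
    print("rejected at VF=1:", vector_iteration_too_expensive(40, 12, 1))
    # Hypothetical VF of 4 (not from the dump), for comparison only.
    print("rejected at VF=4:", vector_iteration_too_expensive(40, 12, 4))
```

With the dumped values, 40 is not smaller than 12 * 1, which matches
the "greater or equal to the vectorization factor" note, and the
outside cost of 12 is the sum of the prologue (8) and epilogue (4)
costs.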