This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 27 Aug 2015 20:55:53 +0000
- Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
- Auto-submitted: auto-generated
- References: <bug-37021-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
--- Comment #22 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #21)
> (In reply to Bill Schmidt from comment #20)
...<snip>...
>
> I see it only failing due to cost issues (tried ppc64le and -mcpu=power8).
> The unaligned loads cost 3 and we end up with
>
> t.f90:8:0: note: Cost model analysis:
> Vector inside of loop cost: 40
> Vector prologue cost: 8
> Vector epilogue cost: 4
> Scalar iteration cost: 12
> Scalar outside cost: 6
> Vector outside cost: 12
> prologue iterations: 0
> epilogue iterations: 0
> t.f90:8:0: note: cost model: the vector iteration cost = 40 divided by the
> scalar iteration cost = 12 is greater or equal to the vectorization factor =
> 1.
>
> Note that we are (still) not very good at estimating the SLP cost as we
> account 4 vector loads here (because we essentially will end up with
> 4 different permutations used), so the "unaligned" part is accounted for
> too much and likely the permutation cost as well. Both are a limitation
> of the SLP data structures and not easily fixable. With
> -fvect-cost-model=unlimited I see both loops vectorized.
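The profitability decision in the quoted dump comes down to simple arithmetic. The Python sketch below restates it using only the numbers shown in the dump. The function names are hypothetical, and the sketch mirrors only the condition printed in the "cost model:" message. It does not reproduce GCC's actual cost model, which weighs more factors (peeling, versioning, trip-count thresholds).

```python
def loop_is_profitable(vec_inside_cost: int, scalar_iter_cost: int, vf: int) -> bool:
    # One vector iteration replaces `vf` scalar iterations, so vectorizing only
    # pays off when vf scalar iterations cost strictly more than one vector
    # iteration. The dump reports the failing case as
    # "vector cost / scalar cost is greater or equal to the vectorization factor".
    return scalar_iter_cost * vf > vec_inside_cost


def min_profitable_vf(vec_inside_cost: int, scalar_iter_cost: int) -> int:
    # Smallest vectorization factor that passes the strict check above.
    return vec_inside_cost // scalar_iter_cost + 1


def total_costs(n_iters, vf, vec_inside, vec_outside, scalar_iter, scalar_outside):
    # Simplified whole-loop estimate (assumes n_iters is a multiple of vf,
    # so no scalar epilogue iterations are needed).
    vector = vec_inside * (n_iters // vf) + vec_outside
    scalar = scalar_iter * n_iters + scalar_outside
    return vector, scalar


# Numbers from the quoted dump (ppc64le, -mcpu=power8).
VEC_INSIDE, VEC_OUTSIDE = 40, 12      # outside = prologue 8 + epilogue 4
SCALAR_ITER, SCALAR_OUTSIDE = 12, 6
VF = 1

print(loop_is_profitable(VEC_INSIDE, SCALAR_ITER, VF))   # False
print(min_profitable_vf(VEC_INSIDE, SCALAR_ITER))        # 4
print(total_costs(100, VF, VEC_INSIDE, VEC_OUTSIDE, SCALAR_ITER, SCALAR_OUTSIDE))  # (4012, 1206)
```

With VF = 1, the vector body (40) would have to cost less than a single scalar iteration (12). In this simplified model, the loop therefore loses at every trip count unless the permutation and unaligned-load costs come down. Passing -fvect-cost-model=unlimited skips the check entirely.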

Yes, I get the same results for the loop vectorizer (using -O2
-ftree-vectorize -mcpu=power8 -ffast-math). But I was looking at the failure
to do SLP vectorization. In comment 19 you indicated this was now working,
presumably on x86, but on Power we still fail to SLP-vectorize
fast-math-pr37021.f90:9:0.
However, with today's trunk my SLP dump looks slightly different, so I need to
take another look at whether this is still failing due to alignment or for
some other reason. I'll comment again once I've dug into it further.
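Richard's remark that the SLP cost counts four vector loads, "because we essentially will end up with 4 different permutations used", can be illustrated at the lane level. The Python sketch below rests on several assumptions. It assumes the loop body is a complex multiply-accumulate `s = s + a*b` over interleaved (real, imag) doubles, as the bug title suggests. It also assumes that one 128-bit vector holds exactly one complex(kind=8) value, which matches VF = 1 in the dump. All helper names are hypothetical. The sketch models only the data movement and is not GCC's generated code.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]  # one 2-lane vector of doubles: (lane0, lane1) = (re, im)


def permute(v: Vec2, sel: Tuple[int, int]) -> Vec2:
    # Build a new 2-lane vector by selecting lanes of v, like a vector shuffle.
    return (v[sel[0]], v[sel[1]])


def vmul(x: Vec2, y: Vec2) -> Vec2:
    return (x[0] * y[0], x[1] * y[1])


def vaddsub(x: Vec2, y: Vec2) -> Vec2:
    # lane0: x - y, lane1: x + y (the sign pattern of a complex product).
    return (x[0] - y[0], x[1] + y[1])


def cmul_acc(acc: Vec2, a: Vec2, b: Vec2) -> Vec2:
    # Each loaded vector feeds two differently permuted copies:
    a_re = permute(a, (0, 0))  # [ar, ar]
    a_im = permute(a, (1, 1))  # [ai, ai]
    b_id = permute(b, (0, 1))  # [br, bi]
    b_sw = permute(b, (1, 0))  # [bi, br]
    # [ar*br - ai*bi, ar*bi + ai*br]
    prod = vaddsub(vmul(a_re, b_id), vmul(a_im, b_sw))
    return (acc[0] + prod[0], acc[1] + prod[1])


def complex_dot(a: List[Vec2], b: List[Vec2]) -> Vec2:
    acc: Vec2 = (0.0, 0.0)
    for x, y in zip(a, b):
        acc = cmul_acc(acc, x, y)
    return acc


A = [(1.0, 2.0), (3.0, -1.0)]
B = [(4.0, 0.5), (-2.0, 1.5)]
print(complex_dot(A, B))  # (-1.5, 15.0)
```

Each of the two loads (`a` and `b`) is consumed under two distinct lane selections. This yields four permuted vectors per iteration, which is one plausible way to end up with the four permuted loads the cost model charges for.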