This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication
- From: "mkuvyrkov at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 12 Dec 2016 10:03:09 +0000
- Subject: [Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication
- Auto-submitted: auto-generated
- References: <bug-18438-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
--- Comment #12 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #11)
> (In reply to Maxim Kuvyrkov from comment #9)
> > which then becomes for aarch64:
> > .L4:
> >         ld2     {v0.2d - v1.2d}, [x1]
> >         add     w2, w2, 1
> >         cmp     w2, w7
> >         eor     v0.16b, v2.16b, v0.16b
> >         umov    x4, v0.d[1]
> >         st1     {v0.d}[0], [x1]
> >         add     x1, x1, 32
> >         str     x4, [x1, -16]
> >         bcc     .L4
>
>
> What I did for thunderx was create a vector cost model which caused this
> loop not to be vectorized, to keep the regression from happening. Note this
> might actually be better code for some microarchitectures. I need to check
> with the new processor we have in house, but that is next week or so. I don't
> know how much I can share next week, though.
You are making a point that is orthogonal to this bug report: whether or not to
vectorize such a loop. But if the loop is vectorized, then on any
microarchitecture it is better to emit a single "st2" than the "umov; st1; str"
sequence.
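The scalar loop from comment #9 is not reproduced in this message, so the sketch below is a hypothetical reconstruction from the access pattern of the quoted aarch64 code. It assumes that only the even 64-bit elements are modified and that v2 holds a loop-invariant value k in both lanes. The function names xor_even_scalar and xor_even_ld2_st2_model are illustrative only. The second function is a plain-C model of an "ld2; eor; st2" loop, not NEON intrinsics: each iteration de-interleaves 32 bytes into even and odd lanes, XORs the even lanes, then stores both lanes back interleaved, as a single st2 would.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar loop (assumed, not taken from the bug report):
   only the even 64-bit elements a[0], a[2], a[4], ... are modified. */
void xor_even_scalar(uint64_t *a, size_t pairs, uint64_t k)
{
    for (size_t i = 0; i < pairs; i++)
        a[2 * i] ^= k;
}

/* Plain-C model of an "ld2; eor; st2" vector loop. Each iteration handles
   two pairs (32 bytes), matching "add x1, x1, 32" in the quoted code. */
void xor_even_ld2_st2_model(uint64_t *a, size_t pairs, uint64_t k)
{
    size_t i = 0;
    for (; i + 2 <= pairs; i += 2) {
        uint64_t *p = a + 2 * i;
        /* ld2: de-interleave into even lanes (v0) and odd lanes (v1). */
        uint64_t v0[2] = { p[0], p[2] };
        uint64_t v1[2] = { p[1], p[3] };
        /* eor: XOR both even lanes with the loop-invariant value. */
        v0[0] ^= k;
        v0[1] ^= k;
        /* st2: re-interleave and store both vectors with one instruction;
           the odd lanes are written back with the values just loaded. */
        p[0] = v0[0];
        p[1] = v1[0];
        p[2] = v0[1];
        p[3] = v1[1];
    }
    /* Scalar epilogue for an odd number of pairs. */
    for (; i < pairs; i++)
        a[2 * i] ^= k;
}
```

In a single-threaded run both functions leave the array in the same state. The difference is that the st2 form also stores the unchanged odd elements, whereas the quoted "umov; st1; str" sequence writes only the even ones and needs a lane extract plus two separate stores to do so.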