This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438

--- Comment #12 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #11)
> (In reply to Maxim Kuvyrkov from comment #9)
> > which then becomes for aarch64:
> > .L4:
> > 	ld2	{v0.2d - v1.2d}, [x1]
> > 	add	w2, w2, 1
> > 	cmp	w2, w7
> > 	eor	v0.16b, v2.16b, v0.16b
> > 	umov	x4, v0.d[1]
> > 	st1	{v0.d}[0], [x1]
> > 	add	x1, x1, 32
> > 	str	x4, [x1, -16]
> > 	bcc	.L4
> 
> 
> What I did for thunderx was create a vector cost model which caused this
> loop not to be vectorized, to keep the regression from happening.  Note this
> might actually be better code for some micro arch. I need to check with the
> new processor we have in house but that is next week or so.  I don't know how
> much I can share next week though.

You are making a point that is orthogonal to this bug report: whether or not
to vectorize such a loop.  But if the loop is vectorized, then on any
microarchitecture it is better to have "st2" than "umov; st1; str".
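
For reference, the quoted assembly loads pairs of 64-bit elements with
"ld2", XORs only the first element of each pair with a loop-invariant
vector, and writes those elements back to [x1] and [x1, 16].  A loop of
roughly the following shape would match that access pattern.  This is a
sketch inferred from the assembly alone, not the test case from the bug
report; the function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the loop shape (an assumption, not the
   original test case): XOR every even-indexed 64-bit element with a
   loop-invariant mask and leave the odd-indexed elements untouched.
   n is the number of (even, odd) element pairs in the array. */
void xor_even_elements(uint64_t *a, size_t n, uint64_t mask)
{
    for (size_t i = 0; i < n; i++)
        a[2 * i] ^= mask;   /* stride-2 load and stride-2 store */
}
```

The stride-2 store is the part under discussion: once the loop is
vectorized, the modified lanes have to be written back to every other
element, which is where the choice between "st2" and "umov; st1; str"
arises.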
