This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication
- From: "pinskia at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 28 Jan 2017 08:36:31 +0000
- Subject: [Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication
- Auto-submitted: auto-generated
- References: <bug-18438-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #12)
> You are making an orthogonal point to this bug report: whether or not to
> vectorize such a loop. But if loop is vectorized, then on any
> microarchitecture it is better to have "st2" vs "umov; st1; str".
Yes, but thinking about the problem some more, I do think there are some vector
cost model issues in the aarch64 backend, where we don't model int vs floating
point cost differences. For example, ^ on a scalar int might be one cycle while
the vector form is 4 cycles, but floating point scalar addition is 4 cycles and
floating point vector addition is also just 4 cycles. The cost structure has
only one scalar and one vector statement cost, so it cannot express that:
struct cpu_vector_cost
{
  const int scalar_stmt_cost;  /* Cost of any scalar operation,
                                  excluding load and store. */
  ...
  const int vec_stmt_cost;     /* Cost of any vector operation,
                                  excluding load, store, permute,
                                  vector-to-scalar and
                                  scalar-to-vector operation. */
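
To make the point concrete, here is a minimal sketch of what splitting those two
fields by type could look like. The struct split_vector_cost, its field names,
and the helper vector_stmt_is_cheaper are hypothetical and are not part of GCC;
the cycle counts are the example numbers from the paragraph above, and the
vectorization factor passed in is an assumption made by the caller.

```c
#include <assert.h>

/* Hypothetical per-type variant of the two fields quoted above.
   These names are illustrative only; they are not GCC's actual
   cpu_vector_cost layout.  */
struct split_vector_cost
{
  int scalar_int_stmt_cost;  /* Scalar integer operation.  */
  int scalar_fp_stmt_cost;   /* Scalar floating-point operation.  */
  int vec_int_stmt_cost;     /* Vector integer operation.  */
  int vec_fp_stmt_cost;      /* Vector floating-point operation.  */
};

/* Example cycle counts taken from the comment: scalar int 1, scalar FP 4,
   vector int 4, vector FP 4.  */
static const struct split_vector_cost example_cost = { 1, 4, 4, 4 };

/* Return nonzero when one vector statement costs strictly less than the
   VF scalar statements it would replace.  VF (the number of lanes) is
   supplied by the caller and is an assumption of this sketch.  */
static int
vector_stmt_is_cheaper (int scalar_cost, int vec_cost, int vf)
{
  assert (vf > 0);
  return vec_cost < scalar_cost * vf;
}
```

With an assumed vectorization factor of 4, the integer case compares 4 against
1 * 4 and is not a win, while the floating point case compares 4 against 4 * 4
and clearly is. A single scalar_stmt_cost / vec_stmt_cost pair cannot give both
answers.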
Anyways I filed PR 79262 for the regression.