[Bug tree-optimization/92822] [10 Regression] vfma_laneq_f32 and vmul_laneq_f32 are broken on aarch64 after r278938
rsandifo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jan 27 09:46:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
           What    |Removed                    |Added
----------------------------------------------------------------------------
             CC|                           |rsandifo at gcc dot gnu.org
       Assignee|rguenth at gcc dot gnu.org |rsandifo at gcc dot gnu.org
--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I think this is mostly a target problem. We weren't providing
patterns to extract a 64-bit vector from a 128-bit vector,
despite that being very much a native operation.
Adding those fixes most of the problems. What's left is that:
v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, ?, ? }>;
...extract first half of v4sf_b...
gets filled out as:
v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 2, 3 }>;
...extract first half of v4sf_b...
and we never recover from the awkwardness of that permute.
The easiest fix seems to be to extend the partial duplicate
into a full duplicate, rather than leaving a partial duplicate
followed by a partial blend.