[Bug tree-optimization/92822] [10 Regression] vfma_laneq_f32 and vmul_laneq_f32 are broken on aarch64 after r278938

rsandifo at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jan 27 09:46:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I think this is mostly a target problem.  We weren't providing
patterns to extract a 64-bit vector from a 128-bit vector,
despite that being very much a native operation.
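
For reference, this is the operation that the vget_low_f32 and
vget_high_f32 intrinsics expose at the source level; a minimal
sketch, for illustration only and not part of the fix:

  #include <arm_neon.h>

  /* Extract a 64-bit half of a 128-bit vector.  The low half
     should be free, since the D register is the low half of the
     corresponding Q register; the high half is a single DUP/EXT.  */
  float32x2_t low_half (float32x4_t v)  { return vget_low_f32 (v); }
  float32x2_t high_half (float32x4_t v) { return vget_high_f32 (v); }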

Adding those patterns fixes most of the problems.  What's left is
that a permute whose trailing lanes are don't-cares (the "?"s below):

  v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, ?, ? }>;
  ...extract first half of v4sf_b...

gets filled out as:

  v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 2, 3 }>;
  ...extract first half of v4sf_b...

and we never recover from the awkwardness of that permute.
The easiest fix seems to be to fill out a partial duplicate
as a full duplicate, rather than as a partial duplicate
followed by a partial blend.
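
Concretely, the difference between the two fill-outs can be seen
with GCC's generic vector extensions (a sketch for illustration,
not taken from the patch):

  typedef float v4sf __attribute__ ((vector_size (16)));
  typedef int v4si __attribute__ ((vector_size (16)));

  /* Current fill-out: duplicate lane 1 into the first half and
     pass lanes 2 and 3 through -- a partial duplicate plus a
     partial blend, with no single-instruction lowering.  */
  v4sf fill_as_blend (v4sf a)
  {
    return __builtin_shuffle (a, a, (v4si) { 1, 1, 2, 3 });
  }

  /* Proposed fill-out: duplicate lane 1 across all four lanes,
     a single DUP; the 64-bit half we go on to extract is the
     same either way.  */
  v4sf fill_as_dup (v4sf a)
  {
    return __builtin_shuffle (a, a, (v4si) { 1, 1, 1, 1 });
  }

Since only the first two lanes are live, any fill-out of the
don't-care lanes is correct; choosing the full duplicate just
picks the one the target can do in one instruction.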

