[Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack

Tue Jun 22 12:43:10 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #11 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
FWIW, you could try something similar to how aarch64 handles this
for Advanced SIMD, with a combination of:

- TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
- TARGET_VECTORIZE_RELATED_MODE.

We get the optimal code for these tests on aarch64, even when
the loop vectorizer is used.  E.g.:

void bar (short unsigned int * p1, short unsigned int * p2, int * restrict p3)
{
  vector(4) int vect__11.26;
  vector(4) int vect__8.25;
  vector(4) short unsigned int vect__7.24;
  vector(4) int vect__5.21;
  vector(4) short unsigned int vect__4.20;

  <bb 2> [local count: 214748371]:
  vect__4.20_34 = MEM <vector(4) short unsigned int> [(short unsigned int *)p1_15(D)];
  vect__5.21_35 = (vector(4) int) vect__4.20_34;
  vect__7.24_38 = MEM <vector(4) short unsigned int> [(short unsigned int *)p2_16(D)];
  vect__8.25_39 = (vector(4) int) vect__7.24_38;
  vect__11.26_40 = vect__5.21_35 + vect__8.25_39;
  MEM <vector(4) int> [(int *)p3_17(D)] = vect__11.26_40;
  return;
}

which for -O2 -ftree-vectorize is produced by the loop vectorizer
rather than SLP.
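
For reference, a source function along the following lines would be
expected to give a dump of that shape.  This is only a sketch
reconstructed from the GIMPLE above: the trip count of 4 and the
exact loop form are assumptions, and it is not necessarily the
testcase attached to the PR.

```c
/* Hypothetical reconstruction: widen two unsigned short inputs to int
   and add them, four elements in total.  The signature matches the
   dump; the loop body and trip count are assumed.  */
void
bar (unsigned short *p1, unsigned short *p2, int *restrict p3)
{
  for (int i = 0; i < 4; i++)
    p3[i] = p1[i] + p2[i];
}
```

The point of the dump is that both loads stay as 4-element unsigned
short vectors and are converted directly to vector(4) int, instead of
loading 8 elements and splitting them with vec_unpack.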