[Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
rsandifo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Jun 22 12:43:10 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
--- Comment #11 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
FWIW, you could try something similar to how aarch64 handles this
for Advanced SIMD, with a combination of:
- TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
- TARGET_VECTORIZE_RELATED_MODE.
We get the optimal code for these tests on aarch64, even when
the loop vectorizer is used. E.g.:
void bar (short unsigned int * p1, short unsigned int * p2, int * restrict p3)
{
  vector(4) int vect__11.26;
  vector(4) int vect__8.25;
  vector(4) short unsigned int vect__7.24;
  vector(4) int vect__5.21;
  vector(4) short unsigned int vect__4.20;

  <bb 2> [local count: 214748371]:
  vect__4.20_34 = MEM <vector(4) short unsigned int> [(short unsigned int *)p1_15(D)];
  vect__5.21_35 = (vector(4) int) vect__4.20_34;
  vect__7.24_38 = MEM <vector(4) short unsigned int> [(short unsigned int *)p2_16(D)];
  vect__8.25_39 = (vector(4) int) vect__7.24_38;
  vect__11.26_40 = vect__5.21_35 + vect__8.25_39;
  MEM <vector(4) int> [(int *)p3_17(D)] = vect__11.26_40;
  return;

}
which for -O2 -ftree-vectorize is produced by the loop vectorizer
rather than SLP.
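
For reference, a source function of roughly the following shape would be expected to give a dump like the one above. This is a sketch reconstructed from the dump, not necessarily the exact testcase attached to the PR: the parameter types and the name bar come from the dump, while the loop form and the trip count of 4 are assumptions inferred from the vector(4) types.

```c
/* Sketch only: reconstructed from the GIMPLE dump above.
   The trip count of 4 is an assumption based on the vector(4) types.  */
void
bar (unsigned short *p1, unsigned short *p2, int *__restrict p3)
{
  /* Each 16-bit element is widened to int before the add, which is
     what the (vector(4) int) conversions in the dump correspond to.  */
  for (int i = 0; i != 4; i++)
    p3[i] = p1[i] + p2[i];
}
```

Compiling this with -O2 -ftree-vectorize -fdump-tree-optimized on an aarch64 target should show whether the two 64-bit loads are widened directly, as above, or go through vec_unpack on a full-width vector.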