[Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jun 17 06:38:10 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-06-17
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, so the difference is that we use loop vect for 'foo' but fail to do that
for 'bar', where BB vect succeeds instead. Disabling loop vect while keeping
BB vect enabled also produces optimal code for 'foo' (complete unrolling
happens before BB vect):
foo:
.LFB0:
        .cfi_startproc
        vpmovzxwd       (%rsi), %ymm0
        vpmovzxwd       (%rdi), %ymm1
        vpaddd  %ymm1, %ymm0, %ymm0
        vmovdqu %ymm0, (%rdx)
        vzeroupper
The key difference in the vectorizer is that BB vect supports different
vector sizes in the same instance, whereas the loop vectorizer can only use
a single vector size.
There are some related PRs in that context.
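
To make the "different vector sizes in the same instance" point concrete,
here is a minimal sketch using the GCC generic vector extension. It assumes
a compiler that provides __builtin_convertvector (GCC 9 or later, or clang).
The names add8_mixed, v8hu and v8si are made up for this illustration, and
this is not how the vectorizer represents things internally. It only mirrors
the shape of the sequence above: load eight 16-bit lanes (128 bits), widen
them to eight 32-bit lanes (256 bits), then add.

```c
#include <string.h>

/* Hypothetical type names: 8 x unsigned short = 128 bits,
   8 x int = 256 bits.  Same lane count, different vector size.  */
typedef unsigned short v8hu __attribute__ ((vector_size (16)));
typedef int v8si __attribute__ ((vector_size (32)));

/* Illustrative helper, not part of GCC: adds eight shorts from p1 and p2
   and stores eight ints to p3.  */
void
add8_mixed (const unsigned short *p1, const unsigned short *p2, int *p3)
{
  v8hu a, b;

  /* Unaligned 128-bit loads.  */
  memcpy (&a, p1, sizeof a);
  memcpy (&b, p2, sizeof b);

  /* Lane-wise zero-extension from 16-bit to 32-bit elements; the result
     vector is twice as wide as the source vector.  */
  v8si wa = __builtin_convertvector (a, v8si);
  v8si wb = __builtin_convertvector (b, v8si);

  /* 256-bit add, then an unaligned 256-bit store.  */
  v8si sum = wa + wb;
  memcpy (p3, &sum, sizeof sum);
}
```

A loop vectorizer restricted to a single vector size would instead load
sixteen shorts per 256-bit vector and need an unpack step to split them into
two vectors of eight ints, which is where vec_unpack comes in.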
void
foo (unsigned short* p1, unsigned short* p2, int* __restrict p3)
{
  for (int i = 0 ; i != 32; i++)
    p3[i] = p1[i] + p2[i];
  return;
}
is never optimized optimally because it has too many iterations for complete
unrolling to trigger (the default for --param max-completely-peel-times is 16).
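
One way to experiment with this limit is to block the loop by hand so that
the inner trip count stays below it. The sketch below is only an assumption
about what might help: foo_blocked, N and BLOCK are hypothetical names, and
I have not verified that GCC then completely unrolls the inner loop and
BB-vectorizes it into the vpmovzxwd sequence.

```c
/* Hypothetical constants: 32 iterations split into blocks of 8, so the
   inner loop stays under the default max-completely-peel-times of 16.  */
enum { N = 32, BLOCK = 8 };

/* Same computation as the 32-iteration foo, restructured as 4 x 8.  */
void
foo_blocked (unsigned short *p1, unsigned short *p2, int *__restrict p3)
{
  for (int b = 0; b != N; b += BLOCK)
    /* Constant trip count of 8: a candidate for complete unrolling.  */
    for (int i = 0; i != BLOCK; i++)
      p3[b + i] = p1[b + i] + p2[b + i];
}
```

Alternatively, one could try compiling the original loop with
--param max-completely-peel-times=32 to see whether complete unrolling then
triggers; whether that yields the optimal code is likewise untested here.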
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations