[Bug tree-optimization/103903] Loops handling r,g,b values are not vectorized to use power of 2 vectors even if they can
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Jan 5 07:43:35 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103903
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
If you fix the loop to do
  for (i=0;i<100000;i++)
  {
    dest[i].r/=src[i].r;
    dest[i].g/=src[i].g;
    dest[i].b/=src[i].b;
  }
it's vectorized just fine (with larger than necessary VF):
.L2:
        movaps  dest+16(%rax), %xmm1
        movaps  dest+32(%rax), %xmm0
        addq    $48, %rax
        divps   src-32(%rax), %xmm1
        movaps  dest-48(%rax), %xmm2
        divps   src-16(%rax), %xmm0
        divps   src-48(%rax), %xmm2
        movaps  %xmm1, dest-32(%rax)
        movaps  %xmm2, dest-48(%rax)
        movaps  %xmm0, dest-16(%rax)
        cmpq    $1200000, %rax
        jne     .L2
so not sure what you are asking for? Is the unrolling harmful? It should
be doable to do the "re-rolling" on the fly in some cases but it might be
some work to tie that in.
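
For illustration only, here is a minimal sketch of why the loop body above is
a valid vectorization. It assumes each channel is divided by the matching
source channel (which is what the three divps do) and that struct rgb is three
packed floats. Four pixels are then 12 floats, i.e. three 4-lane vectors per
iteration. The names div_per_pixel, div_flat_unrolled and check_equivalence
and the small N are hypothetical and not taken from the bug report.

```c
#include <assert.h>
#include <string.h>

#define N 8      /* hypothetical small size, multiple of 4; the report uses 100000 */
#define LANES 4  /* floats per 128-bit SSE register */

struct rgb { float r, g, b; };

/* The flat view below relies on the struct having no padding (assumption). */
_Static_assert(sizeof(struct rgb) == 3 * sizeof(float), "struct rgb must be packed");

/* Per-pixel loop: each channel divided by the matching source channel. */
static void div_per_pixel(struct rgb *d, const struct rgb *s, int n)
{
    for (int i = 0; i < n; i++) {
        d[i].r /= s[i].r;
        d[i].g /= s[i].g;
        d[i].b /= s[i].b;
    }
}

/* Flat view: 4 pixels = 12 floats = three 4-lane groups per iteration,
   mirroring the three divps instructions in the loop body above. */
static void div_flat_unrolled(float *d, const float *s, int nfloats)
{
    for (int k = 0; k < nfloats; k += 3 * LANES)
        for (int v = 0; v < 3; v++)
            for (int l = 0; l < LANES; l++)
                d[k + v * LANES + l] /= s[k + v * LANES + l];
}

/* Returns 1 if both formulations produce bit-identical results. */
int check_equivalence(void)
{
    struct rgb src[N], a[N];
    float fs[3 * N], fd[3 * N];

    for (int i = 0; i < N; i++) {
        src[i] = (struct rgb){ i + 1.0f, i + 2.0f, i + 3.0f };
        a[i]   = (struct rgb){ 10.0f * i, 20.0f * i, 30.0f * i };
    }
    /* Copy into flat float arrays instead of casting pointers. */
    memcpy(fs, src, sizeof src);
    memcpy(fd, a, sizeof a);

    div_per_pixel(a, src, N);
    div_flat_unrolled(fd, fs, 3 * N);

    return memcmp(fd, a, sizeof a) == 0;
}
```

Calling check_equivalence() from a small test driver should return 1. The
sketch only models the unrolled form (a VF of 4 pixels); it does not attempt
the "re-rolling" mentioned above.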