[Bug tree-optimization/103903] Loops handling r,g,b values are not vectorized to use power of 2 vectors even if they can

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Jan 5 07:43:35 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103903

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
If you fix the loop to do

  for (i=0;i<100000;i++)
  {
          dest[i].r/=src[i].r;
          dest[i].g/=src[i].g;
          dest[i].b/=src[i].b;
  }

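it's vectorized just fine, though with a larger than necessary vectorization
factor (VF): each iteration handles 48 bytes, i.e. four structs, as three
16-byte vectors: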

.L2:
        movaps  dest+16(%rax), %xmm1
        movaps  dest+32(%rax), %xmm0
        addq    $48, %rax
        divps   src-32(%rax), %xmm1
        movaps  dest-48(%rax), %xmm2
        divps   src-16(%rax), %xmm0
        divps   src-48(%rax), %xmm2
        movaps  %xmm1, dest-32(%rax)
        movaps  %xmm2, dest-48(%rax)
        movaps  %xmm0, dest-16(%rax)
        cmpq    $1200000, %rax
        jne     .L2

So I'm not sure what you are asking for.  Is the unrolling harmful?  It should
be doable to do the "re-rolling" on the fly in some cases, but it might be
some work to tie that in.
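
For reference, here is a minimal self-contained sketch of the fixed loop next
to a hand "re-rolled" variant that treats the data as one flat run of floats.
The struct name `rgb`, the function names, the flat arrays and the fill values
are assumptions made for illustration; they are not taken from the bug report.
The sketch also assumes the struct is three floats with no padding, which is
consistent with the 1200000-byte bound in the assembly above.

```c
#include <assert.h>
#include <stddef.h>

#define N 100000

/* Assumed layout: three floats per element, no padding. */
struct rgb { float r, g, b; };
_Static_assert(sizeof(struct rgb) == 3 * sizeof(float), "unexpected padding");

struct rgb src[N], dest[N];

/* Hypothetical flat copies of the same data, used only for comparison. */
static float flat_src[3 * N], flat_dest[3 * N];

/* The fixed loop from the comment: each channel divided by the same channel. */
void div_rgb(void)
{
    for (size_t i = 0; i < N; i++) {
        dest[i].r /= src[i].r;
        dest[i].g /= src[i].g;
        dest[i].b /= src[i].b;
    }
}

/* Hand "re-rolled" form: one element-wise division over n floats. */
void div_flat(float *d, const float *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] /= s[i];
}

/* Fill both representations identically, run both loops, compare results. */
void check_equivalence(void)
{
    for (size_t i = 0; i < N; i++) {
        float *s = &flat_src[3 * i], *d = &flat_dest[3 * i];
        /* Divisors are kept nonzero. */
        src[i].r = s[0] = 1.0f + (float)(i % 7);
        src[i].g = s[1] = 2.0f + (float)(i % 5);
        src[i].b = s[2] = 3.0f + (float)(i % 3);
        dest[i].r = d[0] = (float)i;
        dest[i].g = d[1] = (float)i + 0.5f;
        dest[i].b = d[2] = (float)i + 0.25f;
    }

    div_rgb();
    div_flat(flat_dest, flat_src, 3 * (size_t)N);

    for (size_t i = 0; i < N; i++) {
        assert(dest[i].r == flat_dest[3 * i + 0]);
        assert(dest[i].g == flat_dest[3 * i + 1]);
        assert(dest[i].b == flat_dest[3 * i + 2]);
    }
}
```

Compiling this with something like `gcc -O3 -S` and comparing the two loop
bodies is one way to see whether the struct version gets unrolled by three
vectors while the flat version uses a single vector per iteration.  The exact
output depends on the GCC version and target.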

