Bug 103903 - Loops handling r,g,b values are not vectorized to use power of 2 vectors even if they can
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 12.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2022-01-04 15:17 UTC by Jan Hubicka
Modified: 2022-01-05 09:49 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-01-04 00:00:00


Description Jan Hubicka 2022-01-04 15:17:52 UTC
This is another testcase coming from Firefox's LightPixel. I am not sure if this is a duplicate, but I think it is quite common in programs dealing with RGB values.

To match the code generated for the flat-array loop further down, we would need to move from SLP vectorizing the 3 parallel computations to vectorizing the loop.

struct a {float r,g,b;};
struct a src[100000], dest[100000];

void
test ()
{
  int i;
  for (i=0;i<100000;i++)
  {
          dest[i].r/=src[i].g;
          dest[i].g/=src[i].g;
          dest[i].b/=src[i].b;
  }
}

is vectorized to do 3 operations at a time, while the equivalent:

float src[300000], dest[300000];

void
test ()
{
  int i;
  for (i=0;i<300000;i++)
  {
          dest[i]/=src[i];
  }
}

runs faster.
Comment 1 Andrew Pinski 2022-01-04 19:18:35 UTC
Basically this is re-rolling.

PR 99412 is another example of re-rolling; there might be others.
Comment 2 Richard Biener 2022-01-05 07:43:35 UTC
If you fix the loop to do

  for (i=0;i<100000;i++)
  {
          dest[i].r/=src[i].r;
          dest[i].g/=src[i].g;
          dest[i].b/=src[i].b;
  }

it's vectorized just fine (with larger than necessary VF):

.L2:
        movaps  dest+16(%rax), %xmm1
        movaps  dest+32(%rax), %xmm0
        addq    $48, %rax
        divps   src-32(%rax), %xmm1
        movaps  dest-48(%rax), %xmm2
        divps   src-16(%rax), %xmm0
        divps   src-48(%rax), %xmm2
        movaps  %xmm1, dest-32(%rax)
        movaps  %xmm2, dest-48(%rax)
        movaps  %xmm0, dest-16(%rax)
        cmpq    $1200000, %rax
        jne     .L2

so not sure what you are asking for?  Is the unrolling harmful?  It should
be doable to do the "re-rolling" on the fly in some cases but it might be
some work to tie that in.
Comment 3 Jan Hubicka 2022-01-05 09:49:25 UTC
Aha, sorry. I did not spot the typo in cut&paste.  Unrolling is fine.  I need to figure out why in the real testcase we don't do the same transformation.