[Bug tree-optimization/68558] New: Fails to SLP loop

Thu Nov 26 14:44:00 GMT 2015

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68558

            Bug ID: 68558
           Summary: Fails to SLP loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
            Blocks: 53947
  Target Milestone: ---

void IMB_double_fast_x (int *destf, int *dest, int y, int *p1f)
{
  int i;
  for (i = y; i > 0; i--)
    {
      *dest++ = 0;
      destf[0] = p1f[0];
      destf[1] = p1f[1];
      destf[2] = p1f[2];
      destf[3] = p1f[3];
      destf[4] = p1f[8];
      destf[5] = p1f[9];
      destf[6] = p1f[10];
      destf[7] = p1f[11];
      destf += 8;
      p1f += 12;
    }
}

fails to SLP because of

t.c:4:3: note: Detected interleaving store of size 8 starting with *destf_37 = _13;
t.c:4:3: note: Detected interleaving load of size 12 starting with _13 = *p1f_39;
t.c:4:3: note: Data access with gaps requires scalar epilogue loop
...
t.c:4:3: note: Build SLP failed: the number of interleaved loads is greater than the SLP group size _13 = *p1f_39;
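The group sizes in these notes can be checked by hand. Per iteration the store group covers the 8 ints destf[0..7], while the load group spans the 12 ints p1f[0..11] (the pointer advances by 12), of which only offsets 0-3 and 8-11 are read. A minimal sketch of that bookkeeping follows. It uses a hypothetical helper, count_gaps, which is not part of GCC:

```c
#include <assert.h>

/* Hypothetical helper (not GCC code): given the offsets one iteration
   reads from a load group of GROUP_SIZE consecutive elements, return
   how many slots of the group are never read, i.e. the gaps.  */
static int count_gaps (const int *offsets, int n, int group_size)
{
  int used[64] = { 0 };
  int gaps = 0;
  assert (group_size > 0 && group_size <= 64);
  for (int i = 0; i < n; i++)
    {
      assert (offsets[i] >= 0 && offsets[i] < group_size);
      used[offsets[i]] = 1;
    }
  for (int j = 0; j < group_size; j++)
    if (!used[j])
      gaps++;
  return gaps;
}
```

With the testcase's offsets {0, 1, 2, 3, 8, 9, 10, 11} and a group size of 12, this returns 4, which is the gap p1f[4..7]. The mismatch between the 12-element load group and the 8-lane store group is what the last note objects to.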

splitting the load group doesn't help because then we'll hit

t.c:4:3: note: Build SLP failed: different interleaving chains in one node

splitting the store group into vector-size pieces would generally make sense
but may have interesting effects on SLP discovery; for example, without also
splitting the loads we will hit the first issue above.
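Assume 128-bit vectors, which hold four ints. Splitting the 8-element store group then yields two pieces, destf[0..3] and destf[4..7]. Each piece is fed by a contiguous run of p1f: elements 0..3 and elements 8..11. The desired code is therefore roughly two vector loads and two vector stores per iteration. The sketch below writes that shape by hand using GCC's generic vector extension. It is an illustration, not GCC output. The names IMB_double_fast_x_split and check_split are made up:

```c
#include <assert.h>
#include <string.h>

/* Assumption: 128-bit vectors of four ints, via GCC's generic vector
   extension (also accepted by Clang).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical hand-written sketch (not GCC output) of the testcase
   after splitting the 8-element store group into two 4-lane pieces.
   Each piece is fed by a contiguous run of the 12-element load group,
   so no permutation is needed; p1f[4..7] are never touched.  memcpy
   is used because destf and p1f need not be 16-byte aligned.  */
void IMB_double_fast_x_split (int *destf, int *dest, int y, int *p1f)
{
  for (int i = y; i > 0; i--)
    {
      v4si lo, hi;
      *dest++ = 0;
      memcpy (&lo, p1f, sizeof lo);        /* load  p1f[0..3]    */
      memcpy (&hi, p1f + 8, sizeof hi);    /* load  p1f[8..11]   */
      memcpy (destf, &lo, sizeof lo);      /* store destf[0..3]  */
      memcpy (destf + 4, &hi, sizeof hi);  /* store destf[4..7]  */
      destf += 8;
      p1f += 12;
    }
}

/* Self-check against the scalar semantics of the testcase for Y
   iterations (Y <= 4 because of the fixed-size buffers here).  */
void check_split (int y)
{
  int src[48], got[32], want[32], z[4];
  assert (y >= 0 && y <= 4);
  for (int k = 0; k < 48; k++)
    src[k] = k;
  /* Lane j of iteration it reads p1f[j] for j < 4, else p1f[j + 4].  */
  for (int it = 0; it < y; it++)
    for (int j = 0; j < 8; j++)
      want[it * 8 + j] = src[it * 12 + (j < 4 ? j : j + 4)];
  IMB_double_fast_x_split (got, z, y, src);
  for (int k = 0; k < y * 8; k++)
    assert (got[k] == want[k]);
  for (int it = 0; it < y; it++)
    assert (z[it] == 0);
}
```

GCC typically lowers fixed-size memcpy calls like these to single unaligned vector moves. That is why memcpy is used here rather than dereferencing a v4si pointer, which would assume alignment.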

The best fix would be to lift the above restrictions and let permutation
support decide whether it can create the required loads or not.
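The permutation-based approach could start from a simpler question: does a given vector-sized piece of the store group read consecutive load-group elements? If it does, a plain vector load suffices. If it does not, a permutation of loaded vectors is needed. The sketch below is a simplified, hypothetical classification of that kind and is not GCC's algorithm. It deliberately ignores the target-specific part, namely which permutations the hardware can actually perform:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not GCC's algorithm: a piece of NUNITS store
   lanes whose lane k reads load-group element IDX[k] can be fed by one
   contiguous vector load iff those elements are consecutive.  */
static bool piece_is_contiguous (const int *idx, int nunits)
{
  for (int k = 1; k < nunits; k++)
    if (idx[k] != idx[0] + k)
      return false;
  return true;
}

/* Count the NUNITS-lane pieces of a store group of STORE_GROUP_SIZE
   lanes that would need a permutation of loaded vectors, given that
   store lane j reads load-group element LOAD_IDX[j].  */
static int pieces_needing_permute (const int *load_idx,
                                   int store_group_size, int nunits)
{
  int count = 0;
  assert (nunits > 0 && store_group_size % nunits == 0);
  for (int p = 0; p < store_group_size; p += nunits)
    if (!piece_is_contiguous (load_idx + p, nunits))
      count++;
  return count;
}
```

In the testcase, lane j of destf reads element {0, 1, 2, 3, 8, 9, 10, 11}[j] of the p1f group. With nunits = 4, neither piece needs a permutation. If all 8 lanes are treated as one piece (nunits = 8), that piece does.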

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations