[Bug tree-optimization/66036] New: strided group loads are not vectorized

rguenth at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org>
Wed May 6 12:41:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66036

            Bug ID: 66036
           Summary: strided group loads are not vectorized
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For example

struct Xd {
    double x;
    double y;
};
double testd (struct Xd *x, int stride, int n)
{
  int i;
  double sum = 0.;
  for (i = 0; i < n; ++i)
    { 
      sum += x[i*stride].x;
      sum += x[i*stride].y;
    }
  return sum;
}

or similar cases without reduction (simple case)

void testi (int *p, int *q, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}

or the more complex case

void testi2 (int *q, short *p, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}

These are not vectorized because here the SLP group is smaller than the
vector size and thus requires two "scalar" loads and a vector build from
them (movhps/movlps on x86_64).  The more complex form occurs in
SPEC CPUv6.
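
Not part of the report, just for illustration: a hand-vectorized SSE2
sketch of testi2 showing the load/build/widen sequence the vectorizer
would have to emit (the name testi2_sse2 and the n % 2 == 0
simplification are assumptions of this sketch, not GCC output):

#include <emmintrin.h>

void testi2_sse2 (int *q, short *p, int stride, int n)
{
  int i;
  for (i = 0; i < n; i += 2)   /* assumes n is even for brevity */
    {
      /* Each group is only 4 shorts (8 bytes), half an SSE vector,
         so two iterations' groups are loaded separately ...  */
      __m128i g0 = _mm_loadl_epi64 ((const __m128i *) &p[i*stride]);
      __m128i g1 = _mm_loadl_epi64 ((const __m128i *) &p[(i+1)*stride]);
      /* ... and combined into one 16-byte vector, the movlps/movhps
         style vector build mentioned above.  */
      __m128i v8hi = _mm_unpacklo_epi64 (g0, g1);
      /* Sign-extend the 8 shorts to 2x4 ints and store contiguously.  */
      __m128i sign = _mm_cmpgt_epi16 (_mm_setzero_si128 (), v8hi);
      _mm_storeu_si128 ((__m128i *) &q[i*4],
                        _mm_unpacklo_epi16 (v8hi, sign));
      _mm_storeu_si128 ((__m128i *) &q[(i+1)*4],
                        _mm_unpackhi_epi16 (v8hi, sign));
    }
}

The store side (q[i*4+...]) is already a contiguous full-width vector
store; the missing piece is the strided, smaller-than-vector group load.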


