[Bug tree-optimization/66036] New: strided group loads are not vectorized
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed May 6 12:41:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66036
Bug ID: 66036
Summary: strided group loads are not vectorized
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
For example
struct Xd {
  double x;
  double y;
};

double testd (struct Xd *x, int stride, int n)
{
  int i;
  double sum = 0.;
  for (i = 0; i < n; ++i)
    {
      sum += x[i*stride].x;
      sum += x[i*stride].y;
    }
  return sum;
}
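For reference, a hand-vectorized sketch of this reduction with SSE2 intrinsics (an illustration of the desired code, not proposed vectorizer output; testd_vec is a made-up name): since struct Xd is a pair of adjacent doubles, each strided group can be fetched with a single 16-byte load.

```c
#include <emmintrin.h>

struct Xd {
  double x;
  double y;
};

/* Hand-vectorized testd: one unaligned vector load per group replaces
   the two scalar loads, and the reduction stays in a vector register
   until a final horizontal sum.  */
double testd_vec (struct Xd *x, int stride, int n)
{
  __m128d vsum = _mm_setzero_pd ();
  int i;
  for (i = 0; i < n; ++i)
    /* Loads {x, y} of one group in a single instruction.  */
    vsum = _mm_add_pd (vsum, _mm_loadu_pd (&x[i*stride].x));
  /* Horizontal sum: add the high lane (y) into the low lane (x).  */
  vsum = _mm_add_sd (vsum, _mm_unpackhi_pd (vsum, vsum));
  return _mm_cvtsd_f64 (vsum);
}
```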
or similar cases without reduction (simple case)
void testi (int *p, short *q, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}
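A corresponding hand-vectorized sketch for this simple case (SSE2 intrinsics, illustration only; testi_vec is a made-up name): the group of four ints is fetched with one vector load, truncated to 16 bits, and stored as four shorts. The shift pair keeps the saturation of _mm_packs_epi32 from firing, so the result matches the scalar int-to-short conversion.

```c
#include <emmintrin.h>

/* Hand-vectorized testi: one vector load per group of four ints,
   truncate each lane to 16 bits, store four shorts at once.  */
void testi_vec (int *p, short *q, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      __m128i v = _mm_loadu_si128 ((__m128i *) &p[i*stride]);
      /* Sign-extend the low 16 bits of each lane, i.e. truncate to
         short, so the saturating pack below is exact.  */
      v = _mm_srai_epi32 (_mm_slli_epi32 (v, 16), 16);
      v = _mm_packs_epi32 (v, v);
      _mm_storel_epi64 ((__m128i *) &q[i*4], v);
    }
}
```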
or the more complex case, where the strided loads are shorts that must be widened to int
void testi2 (int *q, short *p, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}
because here the SLP group is smaller than the vector size and thus requires
two "scalar" loads plus a vector build from them (x86_64 movlps/movhps).
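In the complex case the group load is additionally a widening one: four shorts must be loaded and sign-extended to four ints. A hand-vectorized SSE2 sketch (illustration only; testi2_vec is a made-up name):

```c
#include <emmintrin.h>

/* Hand-vectorized testi2: one 64-bit load fetches the group of four
   shorts, which are then sign-extended to ints in a vector register.  */
void testi2_vec (int *q, short *p, int stride, int n)
{
  __m128i zero = _mm_setzero_si128 ();
  int i;
  for (i = 0; i < n; ++i)
    {
      __m128i v = _mm_loadl_epi64 ((__m128i *) &p[i*stride]);
      /* Interleave each short into the high half of a 32-bit lane,
         then arithmetic-shift right to sign-extend.  */
      __m128i w = _mm_srai_epi32 (_mm_unpacklo_epi16 (zero, v), 16);
      _mm_storeu_si128 ((__m128i *) &q[i*4], w);
    }
}
```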
The more complex form happens in SPEC CPUv6.