The test case below fails to vectorize.

gcc version 7.0.0 20160724 (experimental) (GCC)

gcc -Ofast -mavx -fvect-cost-model=unlimited slp.c -S -fdump-tree-slp-all

struct st {
  double x;
  double y;
  double z;
  double p;
  double q;
} *obj;

double a, b, c;

void slp_test()
{
  obj->x = a*a + 3.0;
  obj->y = b*b + c;
  obj->z = a + b*3.0;
  obj->p = a + b*3.0;
  obj->q = a + b + c;
}

LLVM is able to SLP-vectorize this. It looks like it creates the vectors [a, c] and [b*3.0, b*b] and does a vector add.

GCC does not SLP-vectorize it. Group splitting does not work either: I expected the group to be split so that these two statements get vectorized:

  obj->z = a + b*3.0;
  obj->p = a + b*3.0;

Another case:

struct st {
  double x;
  double y;
  double z;
  double p;
  double q;
} *obj;

double a, b, c;

void slp_test()
{
  obj->x = a*b;
  obj->y = b + c;
  obj->z = a + b*3.0;
  obj->p = a + b*3.0;
  obj->q = a + b + c;
}

LLVM forms the vectors [b*3.0, a+b] and [a, c] and does a vector addition.
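For reference, here is a hand-written sketch of the two-lane pairings described above, using the GCC/Clang vector extension (`__attribute__((vector_size))`). The typedef `v2df` and the functions `slp_test1_paired` / `slp_test2_paired` are hypothetical names introduced for illustration. This only shows which statements end up sharing a vector add. It is not LLVM's actual output, and statements outside the pair are left scalar.

```c
/* Two doubles per vector (GCC/Clang vector extension). */
typedef double v2df __attribute__((vector_size(16)));

struct st { double x; double y; double z; double p; double q; } *obj;
double a, b, c;

/* First test case: lanes [z, y] = [a, c] + [b*3.0, b*b]. */
void slp_test1_paired(void)
{
    v2df lhs = { a, c };              /* built from scalars */
    v2df mul_l = { b, b };
    v2df mul_r = { 3.0, b };
    v2df sum = lhs + mul_l * mul_r;   /* [a + b*3.0, c + b*b] */

    obj->x = a * a + 3.0;             /* not part of the pair: scalar */
    obj->y = sum[1];
    obj->z = sum[0];
    obj->p = sum[0];                  /* same expression as z; reused here (illustrative choice) */
    obj->q = a + b + c;               /* scalar */
}

/* Second test case: lanes [z, q] = [b*3.0, a+b] + [a, c]. */
void slp_test2_paired(void)
{
    v2df lhs = { b * 3.0, a + b };    /* built from scalars */
    v2df rhs = { a, c };
    v2df sum = lhs + rhs;             /* [b*3.0 + a, (a+b) + c] */

    obj->x = a * b;                   /* scalar */
    obj->y = b + c;                   /* scalar */
    obj->z = sum[0];
    obj->p = sum[0];                  /* same expression as z */
    obj->q = sum[1];
}
```

The inputs in a quick check such as a = 2.0, b = 3.0, c = 5.0 keep every intermediate exact, so the vector and scalar forms compare equal regardless of FP contraction.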
Confirmed. I think doing it as [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a] would be "optimal" (not factoring in vector construction cost, of course). The issue is how SLP construction works and how many operand swaps / builds from scalars it does.

One issue is that we try with a group size of 5 at all. Fixing that doesn't fix it, though, because we do not consider building a vector from scalars until we have tried swapping the operands of the parent op (and if that fails we don't go back to building the children from scalars). Only trying with a group size of 4 would also regress the case where we'd have split after the first element.

That said, the whole SLP discovery needs a different algorithmic approach to fix cases like this.
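A minimal sketch of that four-lane layout for the first test case, written with the GCC vector extension and assuming a 32-byte vector of four doubles (one AVX register under -mavx). `v4df` and `slp_test_4lane` are hypothetical names. As in the comment, vector construction cost is ignored, and the fifth store (q) is left scalar because it does not fit in a group of 4.

```c
/* Four doubles per vector (GCC/Clang vector extension). */
typedef double v4df __attribute__((vector_size(32)));

struct st { double x; double y; double z; double p; double q; } *obj;
double a, b, c;

void slp_test_4lane(void)
{
    v4df m0 = { a, b, b, b };          /* built from scalars */
    v4df m1 = { a, b, 3.0, 3.0 };
    v4df ad = { 3.0, c, a, a };
    v4df r = m0 * m1 + ad;             /* [a*a+3.0, b*b+c, b*3.0+a, b*3.0+a] */

    obj->x = r[0];
    obj->y = r[1];
    obj->z = r[2];
    obj->p = r[3];
    obj->q = a + b + c;                /* fifth element stays scalar */
}
```

Swapping the addition operands in lanes 2 and 3 (b*3.0 + a instead of a + b*3.0) does not change the result, since IEEE addition is commutative.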