We do not support interleaving of accesses in the inner loop, but SLP
should be possible if the group is contiguous with respect to the outer
loop evolution.

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i];
      a[i] = res;
    }
}

or

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i] + b[j * 16 + 2*i + 1];
      a[i] = res;
    }
}

should be possible to vectorize (the former is with a gap, the latter
not). In practice this is likely relevant for both image (pixel, w/ and
w/o gap) and complex numbers.
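For illustration only, here is a hand-written sketch of the second (gap-free) test case after outer-loop vectorization with an assumed vectorization factor of 2. The names foo_ref and foo_outer_vf2 and the choice of VF are hypothetical and not taken from the vectorizer. The point is that for two adjacent outer iterations, the inner-loop accesses form one contiguous group of four doubles, which is what an SLP load could cover.

```c
/* Scalar reference: the gap-free test case from the report, renamed.  */
void foo_ref (double * __restrict a, double *b, int n)
{
  (void) n;  /* unused, kept to match the original signature */
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i] + b[j * 16 + 2*i + 1];
      a[i] = res;
    }
}

/* Hypothetical hand-transformed form, assuming VF = 2 on the outer loop.
   Lanes 0 and 1 correspond to outer iterations i and i + 1.  */
void foo_outer_vf2 (double * __restrict a, double *b, int n)
{
  (void) n;
  for (int i = 0; i < 1024; i += 2)
    {
      double res0 = a[i], res1 = a[i + 1];
      for (int j = 0; j < 8; ++j)
        {
          /* b[j*16 + 2*i .. j*16 + 2*i + 3] is contiguous with respect
             to the outer loop: one 4-element load, no gap.  */
          const double *p = &b[j * 16 + 2*i];
          res0 += p[0] + p[1];   /* lane 0: outer iteration i     */
          res1 += p[2] + p[3];   /* lane 1: outer iteration i + 1 */
        }
      a[i] = res0;
      a[i + 1] = res1;
    }
}
```

Each lane performs the same additions in the same order as the scalar loop, so the results should match exactly. For the first test case the same group would use only p[0] and p[2], leaving a gap at p[1] and p[3].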
Confirmed.