void test1(int *p, int *t, int N) {
    for (int i = 0; i != N; i++)
        *t += p[i];
}

void test2(int *p, int *t, int N) {
    if (N > 1024) // hint, N is not small
        for (int i = 0; i != N; i++)
            *t += p[i];
}

void test3(int *p, int *t, int N) {
    if (N > 1024) { // hint, N is not small
        int s = 0;
        for (int i = 0; i != N; i++)
            s += p[i];
        *t += s;
    }
}

test3 is successfully vectorized with LLVM, GCC, and ICC. Sadly, only ICC can catch test1 and test2.

https://godbolt.org/z/PzoYd4eEK
I suspect the vectorizer is not adding an alias check in the case of reduction.
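For illustration, the kind of runtime alias check meant here can be written by hand at the source level. This is only a sketch of manual loop versioning, not what any compiler actually emits. The name test1_versioned is hypothetical, and it assumes N >= 0 and that p points to at least N ints.

```c
#include <stdint.h>

/* Hypothetical hand-versioned variant of test1 (assumes N >= 0).
   Check once whether the int at *t overlaps p[0..N). If it does not,
   accumulate in a local so the reduction can stay in a register;
   otherwise fall back to the original store-per-iteration loop. */
void test1_versioned(int *p, int *t, int N) {
    uintptr_t lo = (uintptr_t)p;
    uintptr_t hi = (uintptr_t)(p + N);
    uintptr_t ta = (uintptr_t)t;
    int overlaps = (ta < hi) && (ta + sizeof(int) > lo);

    if (!overlaps) {
        int s = 0;
        for (int i = 0; i != N; i++)
            s += p[i];
        *t += s;
    } else {
        /* t may alias some p[i]: keep the original semantics exactly. */
        for (int i = 0; i != N; i++)
            *t += p[i];
    }
}
```

The pointers are compared as uintptr_t values because relational comparison of pointers into unrelated objects is undefined in C.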
The issue is that t can point anywhere into p[]. Andrew is correct that we could in theory do a runtime check, but vectorization relies on reductions being done on registers, and thus on store-motion having already taken place, and store-motion does not do any runtime alias checks. The fix at the source level is to add __restrict__ to p, for example, or to perform the store-motion yourself as you've done in test3. In principle the vectorizer could also do reduction vectorization on stride-zero memory accesses, but currently we give up on such stores completely.
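A minimal sketch of the __restrict__ suggestion, applied to test1. The name test1_restrict is hypothetical. __restrict__ is the GCC/Clang extension spelling of C99 restrict, and whether a given compiler version then vectorizes the loop should be checked in its output rather than assumed.

```c
/* Hypothetical variant of test1: __restrict__ on p promises that the
   ints read through p are not modified through any other pointer in
   this function, so *t cannot be one of the p[i]. */
void test1_restrict(int *__restrict__ p, int *t, int N) {
    for (int i = 0; i != N; i++)
        *t += p[i];
}
```

Calling this with t pointing into p[0..N) breaks that promise and is undefined behaviour, so the qualifier is only appropriate when the caller can guarantee the two do not overlap.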