(from https://stackoverflow.com/q/57465290/1918193)

long RegularTest(int n) {
  long sum = 0;
  for (int i = 0; i < n; ++i)
    if (i % 2 != 0)
      sum += i + 1;
  return sum;
}

Compiling with -O3 -march=skylake, this gets vectorized, but the result has

  # vect_vec_iv_.14_60 = PHI <{ 0, 1, 2, 3, 4, 5, 6, 7 }(5), vect_vec_iv_.14_61(6)>
  vect_vec_iv_.14_61 = vect_vec_iv_.14_60 + { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect__3.17_66 = vect_vec_iv_.14_60 + { 2, 2, 2, 2, 2, 2, 2, 2 };

(those are the only uses of vect_vec_iv_.14_6[01])

If we are only ever going to use x+2, why not use that instead, initialize with {2,3,4,...}, and skip the +2 at every iteration?

(there are other things to discuss about optimizing this testcase, for instance clang is clever enough to unroll by a factor of 2 and remove the condition, but let's stick to the induction variable for this PR)
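To make the suggestion concrete, here is a minimal scalar sketch that emulates the 8 lanes with a plain array. It is not GCC output; the names LANES, sum_with_offset and sum_folded are hypothetical and exist only for illustration. The first function mirrors the shape of the dump above (IV starting at {0,...,7}, with an extra +2 on every iteration), the second folds the offset into the initial value.

```c
#define LANES 8  /* hypothetical: matches the 8-element vectors in the dump */

/* Shape of the code as dumped: the IV starts at {0,...,7} and its
   only use is iv + 2, recomputed on every iteration. */
long sum_with_offset(int iters)
{
    int iv[LANES];
    long sum = 0;
    for (int l = 0; l < LANES; ++l)
        iv[l] = l;
    for (int k = 0; k < iters; ++k) {
        for (int l = 0; l < LANES; ++l) {
            int used = iv[l] + 2;   /* extra add per iteration */
            sum += used;
            iv[l] += LANES;         /* IV step of {8,...,8} */
        }
    }
    return sum;
}

/* Proposed shape: start the IV at {2,...,9} and use it directly. */
long sum_folded(int iters)
{
    int iv[LANES];
    long sum = 0;
    for (int l = 0; l < LANES; ++l)
        iv[l] = l + 2;              /* offset folded into the initial value */
    for (int k = 0; k < iters; ++k) {
        for (int l = 0; l < LANES; ++l) {
            sum += iv[l];           /* no per-iteration + 2 */
            iv[l] += LANES;
        }
    }
    return sum;
}
```

Both functions accumulate the same values; the second just saves one vector add per iteration.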
(In reply to Marc Glisse from comment #0)
> If we are only ever going to use x+2, why not use that instead, initialize
> with {2,3,4,...}, and skip the +2 at every iteration?

Or since we have another variable

  # vect_vec_iv_.13_57 = PHI <{ 1, 2, 3, 4, 5, 6, 7, 8 }(5), vect_vec_iv_.13_58(6)>
  vect_vec_iv_.13_58 = vect_vec_iv_.13_57 + { 8, 8, 8, 8, 8, 8, 8, 8 };

use that one +1 and maintain one less variable.
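A rough scalar sketch of that alternative, again emulating the lanes with arrays. The names two_ivs and one_iv and the way the two values are combined (a product accumulated into acc) are made up for illustration and do not correspond to anything in the dump. The point is only that a value derived from an IV starting at {0,...,7} plus 2 can equally be derived from the IV starting at {1,...,8} plus 1, so a single IV is enough.

```c
#define LANES 8  /* hypothetical: matches the 8-element vectors in the dump */

/* Two IVs kept live: a starts at {0,...,7}, b at {1,...,8}. */
long two_ivs(int iters)
{
    int a[LANES], b[LANES];
    long acc = 0;
    for (int l = 0; l < LANES; ++l) {
        a[l] = l;
        b[l] = l + 1;
    }
    for (int k = 0; k < iters; ++k) {
        for (int l = 0; l < LANES; ++l) {
            acc += (long)(a[l] + 2) * b[l];  /* uses a + 2 and b */
            a[l] += LANES;
            b[l] += LANES;
        }
    }
    return acc;
}

/* One IV: a + 2 is rewritten as b + 1, so a is no longer needed. */
long one_iv(int iters)
{
    int b[LANES];
    long acc = 0;
    for (int l = 0; l < LANES; ++l)
        b[l] = l + 1;
    for (int k = 0; k < iters; ++k) {
        for (int l = 0; l < LANES; ++l) {
            acc += (long)(b[l] + 1) * b[l];  /* same values, one IV */
            b[l] += LANES;
        }
    }
    return acc;
}
```

This keeps the per-iteration add but frees a vector register and one IV update, whereas folding the offset into the initial value removes the add but keeps two IVs.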
Confirmed. Note that IVOPTs does not even look at vector IVs, and unfortunately the vectorizer itself doesn't try to be clever in any way here.

Indeed there's a lot to be desired in the vector code we generate for this testcase... Even when not vectorized, unrolling the loop to eliminate the conditional would be profitable, I guess (there is no convenient pass to do this though).
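For reference, the effect of unrolling by 2 and dropping the conditional can be written by hand roughly as follows. This is a source-level sketch of the idea, not what any compiler pass emits; UnrolledTest is a hypothetical name, and RegularTest is repeated from comment #0 so the two can be compared.

```c
/* Original testcase from comment #0. */
long RegularTest(int n)
{
    long sum = 0;
    for (int i = 0; i < n; ++i)
        if (i % 2 != 0)
            sum += i + 1;
    return sum;
}

/* Hand-unrolled by 2: even values of i contribute nothing, so only
   the odd ones (1, 3, 5, ...) are visited and the test disappears. */
long UnrolledTest(int n)
{
    long sum = 0;
    for (int i = 1; i < n; i += 2)
        sum += i + 1;
    return sum;
}
```

With the condition gone, the remaining loop is a plain strided reduction.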