(Taken from a Stack Overflow question about a bug in LLVM; replace -1 with -2 if you want to test LLVM and avoid the bug.)

gcc -O3 fails to vectorize the following program because it sees control flow in the loop. If I move i++ before the "if", whose condition then becomes i == 0, we still fail to vectorize because we get confused about the number of iterations. Finally, if I stop at i == 2048, we do vectorize, but the generated code could do with some improvements (that would be for a different PR though).

#include <stdint.h>
#include <string.h>

int main() {
  uint32_t i = 0;
  uint32_t count = 0;
  while (1) {
    float n;
    memcpy(&n, &i, sizeof(float));
    if (n >= 0.0f && n <= 1.0f)
      count++;
    if (i == -1)
      break;
    i++;
  }
  return count;
}
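For reference, here is a sketch of the third variant mentioned in the description (increment before the exit test, stop at i == 2048), which the description says does get vectorized. The function name count_first_2048 is made up for this illustration, and the loop is wrapped in a function rather than main so it can be called from a test. It assumes float is a 4-byte IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the bounded form of the testcase.
   Assumes sizeof(float) == 4 and IEEE 754 binary32 floats. */
uint32_t count_first_2048(void)
{
  uint32_t i = 0;
  uint32_t count = 0;
  while (1) {
    float n;
    /* Reinterpret the bit pattern of i as a float. */
    memcpy(&n, &i, sizeof(float));
    if (n >= 0.0f && n <= 1.0f)
      count++;
    /* Increment first, then test, so the latch block stays empty. */
    i++;
    if (i == 2048)
      break;
  }
  return count;
}
```

Under those assumptions the bit patterns 0 through 2047 are +0.0 followed by positive subnormals, so every one of them falls inside [0, 1] and the function should return 2048.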
If-conversion doesn't happen because the vectorizer wouldn't be happy with the loop anyway: its latch block is not empty, since i is incremented after the exit test. If you write the testcase as

int main() {
  unsigned int i = 0;
  unsigned int count = 0;
  while (1) {
    float n;
    __builtin_memcpy(&n, &i, sizeof(float));
    if (n >= 0.0f && n <= 1.0f)
      count++;
    i++;
    if (i == 0)
      break;
  }
  return count;
}

then it would work if the vectorizer didn't check the 'wrong' number of iterations when bailing out for zero. Basically, it's hard to write a loop that iterates UINT_MAX times with a uint induction variable ;)

And yeah, I fixed a bug similar to LLVM's for 4.9 recently ... (PR59058). It left the above missed-optimization hole.
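To make the iteration-count problem concrete, here is a small arithmetic sketch. It is not GCC's code; the helper names body_count_32 and body_count_64 are invented for illustration, and it rests on the assumption that the count being tested for zero is "latch executions + 1" computed in the 32-bit type of the induction variable. In the rewritten loop the back edge is taken UINT32_MAX times and the body runs once more than that.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from GCC.
   latch_count is the number of times the loop's back edge is taken. */

/* Body executions computed in the 32-bit type of the induction
   variable: latch_count + 1, with unsigned wraparound. */
uint32_t body_count_32(uint32_t latch_count)
{
  return (uint32_t)(latch_count + 1u);
}

/* The same quantity computed in a wider type, where it cannot wrap. */
uint64_t body_count_64(uint32_t latch_count)
{
  return (uint64_t)latch_count + 1u;
}
```

With latch_count == UINT32_MAX, body_count_32 wraps around to 0 while body_count_64 gives 4294967296, so a check for "zero iterations" done on the 32-bit value would reject a loop that in fact runs 2^32 times.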