Consider the following short program. If it is compiled MyFunc1 is vectorized but MyFunc2 is not. Both fuctions differ only in the empty loop, which is a comment in MyMunc2. My compiler is the latest gcc41 snapshot (20050618) Michael Cieslinski double MyFunc1 (int size) { int len = size + 1; double Data[16] = {0}; // for (int i=3; i<len; i++) {} // empty loop for (int i=0; i<len; i++) Data[i] = Data[i]>=0 ? Data[i] : -Data[i]; return Data[1]; } double MyFunc2 (int size) { int len = size + 1; double Data[16] = {0}; for (int i=3; i<len; i++) {} // empty loop for (int i=0; i<len; i++) Data[i] = Data[i]>=0 ? Data[i] : -Data[i]; return Data[1]; } Output from gcc: g++41c -O2 -ftree-vectorize -c vec-test.cpp -ftree-vectorizer-verbose=5 vec-test.cpp:7: note: accesses have the same alignment. vec-test.cpp:7: note: LOOP VECTORIZED. vec-test.cpp:1: note: vectorized 1 loops in function. vec-test.cpp:17: note: not vectorized: unsupported data-type vec-test.cpp:19: note: not vectorized: number of iterations cannot be computed. vec-test.cpp:13: note: vectorized 0 loops in function. g++41c -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.1-20050618/configure --prefix=/usr/local/gcc41c -- program-suffix=41c --with-arch=opteron --enable-languages=c,c++ --enable- checking Thread model: posix gcc version 4.1.0 20050618 (experimental)
Confirmed, I think this is a bug in scaler evolution though.
The problem occurs in decision whether the number of loop iterations is greater than zero. The (single) predecessor edge is checked for being EDGE_TRUE_VALUE or EDGE_FALSE_VALUE, and the corresponding predicate is used to make the decision. In the first case (single loop) BB 0 contains predicate 'len>0', its TRUE successor is BB 1, and the fallthru successor of BB 1 is BB 2 - the loop. The condition to check is 'len <= 0', which is therefore simplified to FALSE. In the second case, however, the control flow is more complicated. The loop is in BB 6, its predecessor is BB 3, which has 2 predecessors: BB 5 (with predicate 'len > 0'), and BB 2 - the first loop. The first loop is also guarded by 'len > 0', but this information is not propagated.
More precisely, the code when it comes to loop optimizer looks basically as if (len > 3) something; else if (len > 0) something_else; else return; for (i = 0; i < len; i++) whatever; So indeed, len > 0 on each path that reaches the loop, but it is not trivial to deduce. One solution is to use the results of VRP here. However, VRP uses scev analysis, and adding yet another cycle into the current completely incomprehensible nest of dependences between # of iterations analysis and SCEV seems quite scary to me (and there are other technical difficulties with this solution -- running VRP twice is bad from efficiency reasons, but keeping results of VRP valid is not entirely trivial and may prevent some optimizations). The other possibility is to extend the current oracle-like approach (# of iterations analysis basically asks whether a given condition -- len > 0 in this case -- is true, and answer is computed on-demand by traversing SSA and dominance tree) to handle also this case. This would however need to have the ASSERT_EXPRs (so we either would need to insert them before loop optimizer, or keep them valid since VRP, neither of which seems to be good for performance and memory consumption).
Link to vectorizer missed-optimization meta-bug.
This is fixed, even if I disable VRP / jump-threading and substitute the info from comment #3 in the code: double MyFunc2 (int size) { int len = size + 1; double Data[16] = {0}; if (len > 3) ; else if (len > 0) ; else return 0; for (int i=0; i<len; i++) Data[i] = Data[i]>=0 ? Data[i] : -Data[i]; return Data[1]; }