gcc 4.5 cannot vectorize this simple loop:

void foo(int a[], int n) {
  int i;
  for (i = 1; i < n; i++)
    a[i] = a[0];
}

"gcc -O3 -fdump-tree-vect-all -c foo.c" shows:

foo.c:3: note: not vectorized: unhandled data-ref
foo.c:3: note: bad data references.
foo.c:1: note: vectorized 0 loops in function.

It seems gcc gets confused by a[0] and gives up on vectorization. There is no dependence in this loop, so we should teach gcc to handle a[0] and vectorize it.
Actually, the load of a[0] should be hoisted out of the loop, since a[0] is not changed anywhere inside the loop.
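As a source-level sketch of what load motion should achieve here, the variant below reads a[0] once before the loop. The stores only touch a[1]..a[n-1], so a[0] is loop-invariant. The name foo_hoisted is hypothetical. The n < 2 guard keeps a[0] from being read when the original loop would not run at all.

```c
/* foo as written in the report: a[0] is re-read on every iteration. */
void foo(int a[], int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = a[0];
}

/* Hypothetical hand-hoisted variant. The stores only target a[1]..a[n-1],
   so a[0] cannot change inside the loop and can be loaded once. */
void foo_hoisted(int a[], int n)
{
    int i;
    int a0;
    if (n < 2)
        return;             /* original loop body never runs: read nothing */
    a0 = a[0];              /* single load, outside the loop */
    for (i = 1; i < n; i++)
        a[i] = a0;          /* store of a loop-invariant value */
}
```

After hoisting, the loop is just a store of an invariant value. That shape is typically straightforward for a vectorizer.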
So currently, inside LIM (which does load motion in general), we have:

D.2724_7 = a_6(D) + D.2723_5;
D.2725_8 = *a_6(D);
*D.2724_7 = D.2725_8;

But LIM/the alias oracle does not know that D.2723_5 has a range of [4, n_3*4]. That range means D.2724_7 can never equal a_6, so we don't pull the load from a_6 out of the loop.
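The missing reasoning step can be shown as an interval disjointness test on byte offsets from the common base a_6. This is a sketch only, not GCC's alias oracle. The function may_overlap and its parameters are hypothetical, and a 4-byte int is assumed, matching the *4 stride in the dump.

```c
#include <stdbool.h>

/* Hypothetical helper. Two accesses off the same base pointer: access A
   starts at some byte offset in [a_lo, a_hi] and is a_size bytes wide;
   access B likewise. Returns false only when they can never overlap. */
bool may_overlap(long a_lo, long a_hi, long a_size,
                 long b_lo, long b_hi, long b_size)
{
    /* A always ends before B can start. */
    if (a_hi + a_size <= b_lo)
        return false;
    /* B always ends before A can start. */
    if (b_hi + b_size <= a_lo)
        return false;
    return true;
}
```

For this loop, the store *D.2724_7 has offset D.2723_5 in [4, n_3*4] and the load *a_6(D) has offset 0, each 4 bytes wide. The second check (0 + 4 <= 4) already proves disjointness, so only the lower bound 4 matters. A store range starting at offset 0 would make the test return true.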
This is related to PR 29751, but that PR uses only a simple method and does not handle this case, because here we need range information.
Here is another, similar but more general case. We know that a(j) and a(i) never access the same memory location. Intel ifort can vectorize this triangular loop:

      do 10 j = 1,n
        do 20 i = j+1, n
          a(i) = a(i) - aa(i,j) * a(j)
 20     continue
 10   continue
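Here is a C rendering of the Fortran loop above, used so that all new examples share one language. It shows why a(j) can be hoisted even though j is not a compile-time constant: every store in the inner loop targets a(i) with i >= j+1. The mapping rests on several assumptions. Indices become 0-based. aa is stored column-major with leading dimension n. Implicit REAL becomes float. The names tri_naive and tri_hoisted are hypothetical.

```c
/* Hypothetical C rendering of the Fortran loop (0-based, aa column-major). */
void tri_naive(float *a, const float *aa, int n)
{
    for (int j = 0; j < n; j++)
        for (int i = j + 1; i < n; i++)
            a[i] = a[i] - aa[i + j * n] * a[j];   /* a[j] re-read each time */
}

/* Same computation with a[j] hoisted. Inner-loop stores only touch
   a[j+1]..a[n-1], so a[j] is invariant for the whole inner loop. */
void tri_hoisted(float *a, const float *aa, int n)
{
    for (int j = 0; j < n; j++) {
        float aj = a[j];                          /* loaded once per j */
        for (int i = j + 1; i < n; i++)
            a[i] = a[i] - aa[i + j * n] * aj;
    }
}
```

a[j] is still updated by earlier outer iterations, so it is loaded once per j rather than once for the whole nest. The hoisting argument relies only on the inner loop's store range. It does not depend on whether aa aliases a.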
(In reply to comment #4)
> Here is another similar case but more general.

Actually, it is a totally different case. Please file a new bug with that case, though there might already be a bug about that one.
> Actually it is a totally different case. Please file a new bug with that case;
> though there might already be a bug about that one.

I could not see the difference, even though j is not a compile-time constant (it is invariant in the innermost loop). To put it another way: GCC does not hoist a[j] out of a loop that modifies a[i] for i in [j+1, n].
So even though we can vectorize this loop these days, the non-vectorized loop still performs the load on every iteration. At -O2:

.L3:
        movl    (%ecx), %edx
        addl    $4, %eax
        movl    %edx, -4(%eax)
        cmpl    %ebx, %eax
        jne     .L3
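To make the redundant load easier to see, here is a hypothetical C reading of the .L3 loop. The register roles are inferred from the instructions and are not taken from a GCC dump: %ecx holds a, %eax is the running destination pointer, and %ebx is assumed to hold a + n. The name foo_as_emitted is illustrative only.

```c
/* Hypothetical C reading of the -O2 .L3 loop (register roles assumed). */
void foo_as_emitted(int *a, int n)
{
    if (n < 2)
        return;                 /* loop body only runs for n >= 2 */
    int *p = a + 1;             /* %eax */
    int *end = a + n;           /* %ebx (assumed) */
    do {
        int tmp = *a;           /* movl (%ecx), %edx: a[0] reloaded each time */
        p += 1;                 /* addl $4, %eax */
        p[-1] = tmp;            /* movl %edx, -4(%eax): stores a[1]..a[n-1] */
    } while (p != end);         /* cmpl %ebx, %eax; jne .L3 */
}
```

With the load hoisted, the movl (%ecx), %edx would sit before .L3. The loop would then be just the increment, the store, and the compare-and-branch.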