The modified testcase from PR18767 shows a problem where the loop counter variable still remains in the vectorized loop. Compiling the modified testcase with 'g++ -O2 -march=pentium4 -ftree-vectorize' produces the following code for the first loop:

	...
	leal	-24(%ebp), %esi
	leal	-40(%ebp), %ebx
	leal	-56(%ebp), %ecx
	xorl	%eax, %eax
	xorl	%edx, %edx
.L2:
	addl	$1, %eax
	movaps	(%edx,%esi), %xmm0
	mulps	(%ebx,%edx), %xmm0
	movaps	%xmm0, (%edx,%ecx)
	addl	$16, %edx
	cmpl	$1, %eax
	jne	.L2
	...

It looks as if the compiler does not figure out that the conditional jump is never taken. However, with 'g++ -O2 -march=pentium4 -ftree-vectorize -funroll-loops' the generated code is a lot better:

	...
	movaps	-24(%ebp), %xmm0
	mulps	-40(%ebp), %xmm0
	movaps	%xmm0, -56(%ebp)
	...

Uros.
I think this is related to PR 18557.
Fixed on the mainline:

_Z6foobarv:
.LFB2:
	pushl	%ebp
.LCFI0:
	movl	%esp, %ebp
.LCFI1:
	subl	$56, %esp
.LCFI2:
	movaps	-40(%ebp), %xmm0
	mulps	-24(%ebp), %xmm0
	movaps	%xmm0, -56(%ebp)
	fldz
	fadds	-56(%ebp)
	fadds	-52(%ebp)
	fadds	-48(%ebp)
	fadds	-44(%ebp)
	leave
	ret
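
For readers without the testcase at hand, the shape of source that leads to listings like the ones in this report can be sketched as follows. This is a hypothetical reconstruction and not the actual PR18767 testcase: the array names, the initial values, and the alignas(16) annotations are assumptions made so that the example has defined behavior. Only the function name foobar is taken from the mangled symbol _Z6foobarv. Because the inputs here are constants, a compiler may fold the whole computation, so this sketch is not expected to reproduce the assembly above exactly.

```cpp
// Hypothetical sketch of a loop whose vectorized form has a trip count of one.
// Not the PR18767 testcase; names and values are illustrative assumptions.

float foobar()
{
    // 16-byte alignment (assumed) is what allows aligned SSE loads (movaps).
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    alignas(16) float c[4];

    // Four floats fill one 128-bit SSE register, so after vectorization
    // this loop runs exactly once: one load, one mulps, one store.
    for (int i = 0; i < 4; ++i)
        c[i] = a[i] * b[i];

    // Scalar reduction over the four products, comparable to the
    // fldz / fadds sequence in the fixed listing.
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += c[i];

    return sum;
}
```

With a vectorized trip count known to be one at compile time, the counter increment, the compare, and the backward jump are all dead, which is what the mainline output reflects.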