[PATCH][RFC] Add versioning for constant strides for vectorization
Ira Rosen
IRAR@il.ibm.com
Sun Jan 25 14:08:00 GMT 2009
gcc-patches-owner@gcc.gnu.org wrote on 23/01/2009 18:08:43:
>
> This patch adds the capability to the vectorizer to perform versioning
> for the case of a constant (suitable) stride. For example for
>
> subroutine to_product_of(self,a,b,a1,a2)
> complex(kind=8) :: self (:)
> complex(kind=8), intent(in) :: a(:,:)
> complex(kind=8), intent(in) :: b(:)
> integer a1,a2
> do i = 1,a1
> do j = 1,a2
> self(i) = self(i) + a(j,i)*b(j)
> end do
> end do
> end subroutine
>
> we can only apply vectorization if the strides of the fastest dimension
> of self, a and b are one (they are loaded from the passed array
> descriptors and thus appear as (loop invariant) variables).
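
For illustration only (this is not code from the patch): versioning on the stride can be pictured at the source level as below. It is a minimal sketch in C that uses plain double instead of complex(kind=8) and models only the inner reduction over j. The function name dot_strided and the parameters sa and sb (element strides as they would be read from the array descriptors) are hypothetical. Vectorizing a floating-point reduction like this also needs reassociation to be permitted, which the fast-math prefix of the testcase names suggests.

```c
#include <stddef.h>

/* Hypothetical scalar model of the inner loop: sum += a(j) * b(j),
   where sa and sb are element strides taken from array descriptors. */
double dot_strided(const double *a, ptrdiff_t sa,
                   const double *b, ptrdiff_t sb, size_t n)
{
    double sum = 0.0;

    if (sa == 1 && sb == 1) {
        /* Versioned copy: both strides are known to be one here, so the
           accesses are contiguous and the loop is a vectorization candidate. */
        for (size_t j = 0; j < n; j++)
            sum += a[j] * b[j];
    } else {
        /* Fallback copy: arbitrary strides, left as a scalar loop. */
        for (size_t j = 0; j < n; j++)
            sum += a[(ptrdiff_t)j * sa] * b[(ptrdiff_t)j * sb];
    }
    return sum;
}
```

The stride test is loop invariant, so it is evaluated once before either copy of the loop runs.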
>
> During the implementation of this I noticed that peeling for
> number of iterations (we have to unroll the above loop twice, and so
> for an odd number of iterations have an epilogue loop for the remaining
> iteration(s)) does not play well with versioning and we end up
> vectorizing the wrong loop. So I just disabled versioning if we
> apply peeling with an epilogue loop and instead attach the versioning
> condition to the pre-condition of the main loop that skips directly
> to the epilogue if the number of iterations is too small. We obviously
> can use the epilogue loop as the non-vectorized version.
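
Again for illustration only, and not taken from the patch: the scheme described above, where the stride check is folded into the pre-condition of the main loop and the epilogue serves as the non-vectorized version, might look roughly like this at the source level. VF (an assumed vectorization factor of two) and the name dot_guarded are hypothetical, and the two partial sums stand in for the lanes of a vector accumulator.

```c
#include <stddef.h>

enum { VF = 2 };  /* assumed vectorization factor: two iterations per step */

double dot_guarded(const double *a, ptrdiff_t sa,
                   const double *b, ptrdiff_t sb, size_t n)
{
    double sum = 0.0;
    size_t j = 0;

    /* Combined pre-condition: enough iterations AND unit strides.
       If either part fails, control skips directly to the epilogue. */
    if (n >= VF && sa == 1 && sb == 1) {
        double s0 = 0.0, s1 = 0.0;  /* two lanes of a notional vector sum */
        for (; j + VF <= n; j += VF) {
            s0 += a[j] * b[j];
            s1 += a[j + 1] * b[j + 1];
        }
        sum = s0 + s1;
    }

    /* Epilogue: finishes the leftover iterations after the main loop and
       doubles as the non-vectorized version when the guard failed (j == 0). */
    for (; j < n; j++)
        sum += a[(ptrdiff_t)j * sa] * b[(ptrdiff_t)j * sb];

    return sum;
}
```

With this shape only one scalar copy of the loop exists, so no separate versioned copy is needed.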
>
> This patch also inserts an extra copyprop and dce pass before the
> vectorizer so it can recognize the reduction in the above testcase
> (LIM has made that reduction non-obvious). So I noticed that
> copyprop does not preserve loop-closed SSA form and fixed that as well.
>
> Some earlier version bootstrapped and tested ok on
> x86_64-unknown-linux-gnu, a final attempt is still running.
>
> I haven't performance-tested this extensively yet, but it might need
> cost-model adjustments and/or need to wait until we have profile
> feedback to properly seed vectorizer analysis here. A micro-benchmark
> based on the above loop shows around 15% improvement on AMD K10.
>
> Feedback (and ppc testing) is still welcome of course.
Regtested on powerpc64-suse-linux.
>
> 2009-01-23 Richard Guenther <rguenther@suse.de>
>
>
> * gcc.dg/vect/fast-math-vect-complex-5.c: New testcase.
> * gfortran.dg/vect/fast-math-vect-complex-1.f90: Likewise.
> * gfortran.dg/vect/fast-math-vect-stride-1.f90: Likewise.
The Fortran testcases require vect_double as well.
The testcases get vectorized on powerpc64-suse-linux if the type is
_Complex float/complex(kind=4).
Thanks,
Ira