[PATCH] New early loop unrolling pass
Richard Guenther
rguenther@suse.de
Thu May 1 13:34:00 GMT 2008
On Thu, 1 May 2008, Dominique Dhumieres wrote:
> > Of course, the patch didn't change. I don't consider this a show-stopper
> > as it obviously just exposes bugs in the vectorizer or its cost model.
> > There is plenty of time to address this during stage1/2 or ignore it.
> >
> > It would be helpful if you could provide a reduced runtime testcase with
> > just one loop that shows this regression.
>
> I am not sure the problem is a bug in the vectorizer; rather, the
> early unrolling seems too aggressive and prevents the vectorization of
> the unrolled loop, as shown by the following reduced test:
>
> integer, parameter :: n = 1000000
> integer :: i, j, k
> real(8) :: pi, sum1, sum2, theta, phi, sini, cosi, dotp
> real(8) :: a(3), b(9,3), c(3)
> pi = acos(-1.0d0)
> theta = pi/9.0d0
> phi = pi/4.5d0
> do k = 1, 9
>    b(k,1) = 0.5d0*cos(k*phi)*sin(k*theta)
>    b(k,2) = 0.5d0*sin(k*phi)*sin(k*theta)
>    b(k,3) = 0.5d0*cos(k*theta)
> end do
> theta = pi/real(n,kind=8)
> sum2 = 0.0
> do i = 1, n
>    sini = sin(i*theta)
>    cosi = cos(i*theta)
>    phi = pi/4.5d0
>    sum1 = 0.0d0
>    do j = 1, 9
>       c(1) = 0.5d0*cos(j*phi)*sini
>       c(2) = 0.5d0*sin(j*phi)*sini
>       c(3) = 0.5d0*cosi
>       do k = 1, 9
> !        a(1) = b(k,1) - c(1)
> !        a(2) = b(k,2) - c(2)
> !        a(3) = b(k,3) - c(3)
>          a = b(k,:) - c
>          dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
> !        dotp = dot_product(a,a)
>          sum1 = sum1 + dotp
>       end do
>    end do
>    sum2 = sum2 + sum1/81.0d0
> end do
> print *, 3.0d0*sum2/(4.0d0*pi*real(n,kind=8))
> end
>
> [ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
> test_vect.f90:24: note: LOOP VECTORIZED.
> test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:1: note: vectorized 1 loops in function.
>
> The 'k' loop is vectorized if one implicit loop is left inside (either
> "a = b(k,:) - c" or "dotp = dot_product(a,a)"), but when these two
> implicit loops are unrolled by hand, it seems that the 'k' loop is now
> unrolled, preventing any vectorization:
>
> do k = 1, 9
>    a(1) = b(k,1) - c(1)
>    a(2) = b(k,2) - c(2)
>    a(3) = b(k,3) - c(3)
> ! a = b(k,:) - c
>    dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
> ! dotp = dot_product(a,a)
>    sum1 = sum1 + dotp
> end do
>
> [ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
> test_vect.f90:20: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:1: note: vectorized 0 loops in function.
Thanks for the testcase!
> Am I correct to understand that the vectorizer operates on loops only?
> If yes, vectorizable loops should probably not be unrolled (at least not
> without care).
Yes, the vectorizer operates on loops only, but it can also vectorize
scalar code if it is inside a loop. Thus the unroller does not
unroll the outermost loop (so it seems for your testcase the
loops over i and j are still preserved). I'll have to analyze
what kind of vectorization is applied without unrolling and I
suspect a simple (hopefully ;)) missed-optimization with the vectorizer.
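
For readers who want to experiment outside Fortran, the two forms of the
'k' loop discussed above can be sketched in C roughly as follows. This is
only an illustration: the function names, the NPTS/NDIM macros and the flat
array layout (chosen to mimic Fortran's column-major b(9,3)) are assumptions
and do not come from the testcase. Whether either form is vectorized depends
on the compiler version, target and flags, so the sketch makes no claim
about that; the vectorizer's own report is the thing to check.

```c
#include <stddef.h>

#define NPTS 9 /* points per inner loop, as in the testcase (assumed name) */
#define NDIM 3 /* components per point (assumed name) */

/* Rolled form: the per-component subtraction stays a loop, analogous to
   the array statement "a = b(k,:) - c". b holds NDIM*NPTS values with
   component d of point k at b[d*NPTS + k]. */
double sum_sq_rolled(const double *b, const double *c)
{
    double sum = 0.0;
    for (int k = 0; k < NPTS; k++) {
        double a[NDIM];
        for (int d = 0; d < NDIM; d++)
            a[d] = b[d * NPTS + k] - c[d];
        sum += a[0] * a[0] + a[1] * a[1] + a[2] * a[2];
    }
    return sum;
}

/* Hand-unrolled form: the three subtractions are written out as scalar
   statements, analogous to a(1) = b(k,1) - c(1) and so on. With a
   constant trip count of 9, an early unroller may also unroll the k
   loop itself, leaving only straight-line code in the enclosing loop. */
double sum_sq_unrolled(const double *b, const double *c)
{
    double sum = 0.0;
    for (int k = 0; k < NPTS; k++) {
        const double a0 = b[0 * NPTS + k] - c[0];
        const double a1 = b[1 * NPTS + k] - c[1];
        const double a2 = b[2 * NPTS + k] - c[2];
        sum += a0 * a0 + a1 * a1 + a2 * a2;
    }
    return sum;
}
```

Both functions compute the same value, so any difference in the vectorizer
report between them (for example with -ftree-vectorizer-verbose=2, as used
above) would come from how the loops are presented to the optimizers and
not from the arithmetic.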
So ..
> Last question: should I continue to use pr34265, or close it and open a
> new PR?
.. please open a new PR with the above testcase.
Thanks!
Richard.