[PATCH] New early loop unrolling pass
Dominique Dhumieres
dominiq@lps.ens.fr
Thu May 1 13:29:00 GMT 2008
> Of course, the patch didn't change. I don't consider this a show-stopper
> as it obviously just exposes bugs in the vectorizer or its cost model.
> There is plenty of time to address this during stage1/2 or ignore it.
>
> It would be helpful if you could provide a reduced runtime testcase with
> just one loop that shows this regression.
I am not sure the problem is a bug in the vectorizer; rather, it seems that
the early unrolling is too aggressive and prevents the vectorization of the
unrolled loop, as shown by the following reduced test:
integer, parameter :: n = 1000000
integer :: i, j, k
real(8) :: pi, sum1, sum2, theta, phi, sini, cosi, dotp
real(8) :: a(3), b(9,3), c(3)
pi = acos(-1.0d0)
theta = pi/9.0d0
phi = pi/4.5d0
do k = 1, 9
   b(k,1) = 0.5d0*cos(k*phi)*sin(k*theta)
   b(k,2) = 0.5d0*sin(k*phi)*sin(k*theta)
   b(k,3) = 0.5d0*cos(k*theta)
end do
theta = pi/real(n,kind=8)
sum2 = 0.0
do i = 1, n
   sini = sin(i*theta)
   cosi = cos(i*theta)
   phi = pi/4.5d0
   sum1 = 0.0d0
   do j = 1, 9
      c(1) = 0.5d0*cos(j*phi)*sini
      c(2) = 0.5d0*sin(j*phi)*sini
      c(3) = 0.5d0*cosi
      do k =1, 9
!        a(1) = b(k,1) - c(1)
!        a(2) = b(k,2) - c(2)
!        a(3) = b(k,3) - c(3)
         a = b(k,:) - c
         dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
!        dotp = dot_product(a,a)
         sum1 = sum1 +dotp
      end do
   end do
   sum2 = sum2 + sum1/81.0d0
end do
print *, 3.0d0*sum2/(4.0d0*pi*real(n,kind=8))
end
[ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
test_vect.f90:24: note: LOOP VECTORIZED.
test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
test_vect.f90:1: note: vectorized 1 loops in function.
The 'k' loop is vectorized if one implicit loop is left inside (either
"a = b(k,:) - c" or "dotp = dot_product(a,a)"). When both implicit loops
are unrolled by hand, however, the 'k' loop itself seems to be unrolled,
which prevents any vectorization:
      do k =1, 9
         a(1) = b(k,1) - c(1)
         a(2) = b(k,2) - c(2)
         a(3) = b(k,3) - c(3)
!        a = b(k,:) - c
         dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
!        dotp = dot_product(a,a)
         sum1 = sum1 +dotp
      end do
[ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
test_vect.f90:20: note: not vectorized: unsupported data-type complex(kind=8)
test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
test_vect.f90:1: note: vectorized 0 loops in function.
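The two forms of the 'k' loop are meant to compute the same sum, so any
difference in the compiler output should come from the optimizers alone.
A minimal sketch to check this in isolation, assuming an arbitrary fixed c
(the program name compare_forms, the variables s_section and s_unrolled,
the values in c, and the tolerance are illustrative and not part of the
test above):

```fortran
program compare_forms
   implicit none
   integer :: k
   real(8) :: pi, theta, phi, dotp, s_section, s_unrolled
   real(8) :: a(3), b(9,3), c(3)

   ! Same initialisation of b as in the reduced test.
   pi = acos(-1.0d0)
   theta = pi/9.0d0
   phi = pi/4.5d0
   do k = 1, 9
      b(k,1) = 0.5d0*cos(k*phi)*sin(k*theta)
      b(k,2) = 0.5d0*sin(k*phi)*sin(k*theta)
      b(k,3) = 0.5d0*cos(k*theta)
   end do

   ! Assumption: any fixed c will do for the comparison.
   c = (/ 0.1d0, 0.2d0, 0.3d0 /)

   ! Form 1: one implicit loop (array section) left inside the 'k' loop.
   s_section = 0.0d0
   do k = 1, 9
      a = b(k,:) - c
      dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
      s_section = s_section + dotp
   end do

   ! Form 2: both implicit loops unrolled by hand.
   s_unrolled = 0.0d0
   do k = 1, 9
      a(1) = b(k,1) - c(1)
      a(2) = b(k,2) - c(2)
      a(3) = b(k,3) - c(3)
      dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
      s_unrolled = s_unrolled + dotp
   end do

   print *, s_section, s_unrolled
   ! Loose tolerance, since -ffast-math may reorder the additions.
   if (abs(s_section - s_unrolled) > 1.0d-12) stop 1
   print *, 'both forms agree'
end program compare_forms
```

Building this with the same flags as above should show, through the
-ftree-vectorizer-verbose notes, which of the two 'k' loops is vectorized
within a single function.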
Am I correct in understanding that the vectorizer operates on loops only?
If so, vectorizable loops should probably not be unrolled (at least not
without care).
Last question: should I continue to use pr34265, or close it and open a
new PR?
Cheers
Dominique