[PATCH] New early loop unrolling pass

Richard Guenther rguenther@suse.de
Thu May 1 13:34:00 GMT 2008


On Thu, 1 May 2008, Dominique Dhumieres wrote:

> > Of course, the patch didn't change.  I don't consider this a show-stopper
> > as it obviously just exposes bugs in the vectorizer or its cost model.
> > There is plenty of time to address this during stage1/2 or ignore it.
> > 
> > It would be helpful if you could provide a reduced runtime testcase with
> > just one loop that shows this regression.
> 
> I am not sure the problem is a bug in the vectorizer, but rather that the
> early unrolling is too aggressive and prevents the vectorization of the
> unrolled loop, as shown by the following reduced test:
> 
> integer, parameter :: n = 1000000
> integer  :: i, j, k
> real(8)  :: pi, sum1, sum2, theta, phi, sini, cosi, dotp
> real(8)  :: a(3), b(9,3), c(3)
> pi = acos(-1.0d0)
> theta = pi/9.0d0
> phi = pi/4.5d0
> do k = 1, 9
>    b(k,1) = 0.5d0*cos(k*phi)*sin(k*theta)
>    b(k,2) = 0.5d0*sin(k*phi)*sin(k*theta)
>    b(k,3) = 0.5d0*cos(k*theta)
> end do
> theta = pi/real(n,kind=8)
> sum2 = 0.0
> do i = 1, n
>     sini = sin(i*theta)
>     cosi = cos(i*theta)
>     phi = pi/4.5d0
>     sum1 = 0.0d0
>     do j = 1, 9
> 	c(1) = 0.5d0*cos(j*phi)*sini
> 	c(2) = 0.5d0*sin(j*phi)*sini
> 	c(3) = 0.5d0*cosi
> 	do k =1, 9
> !           a(1) = b(k,1) - c(1)
> !           a(2) = b(k,2) - c(2)
> !           a(3) = b(k,3) - c(3)
> 	   a = b(k,:) - c
> 	   dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
> !           dotp = dot_product(a,a)
> 	   sum1 = sum1 +dotp
> 	end do
>     end do
>     sum2 = sum2 + sum1/81.0d0
> end do
> print *, 3.0d0*sum2/(4.0d0*pi*real(n,kind=8))
> end
> 
> [ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
> test_vect.f90:24: note: LOOP VECTORIZED.
> test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:1: note: vectorized 1 loops in function.
> 
> The 'k' loop is vectorized if one implicit loop is left inside (either
> "a = b(k,:) - c" or "dotp = dot_product(a,a)"), but when these two implicit
> loops are unrolled by hand, it seems that the 'k' loop is now unrolled,
> preventing any vectorization:
> 
>      do k =1, 9
> 	a(1) = b(k,1) - c(1)
> 	a(2) = b(k,2) - c(2)
> 	a(3) = b(k,3) - c(3)
> !         a = b(k,:) - c
> 	dotp = a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
> !         dotp = dot_product(a,a)
> 	sum1 = sum1 +dotp
>      end do
> 
> [ibook-dhum] bug/timing% gfc -O3 -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorizer-verbose=2 test_vect.f90
> test_vect.f90:20: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:8: note: not vectorized: unsupported data-type complex(kind=8)
> test_vect.f90:1: note: vectorized 0 loops in function.

Thanks for the testcase!

> Am I correct to understand that the vectorizer operates on loops only?
> If yes, vectorizable loops should probably not be unrolled (at least not
> without care).

Yes, the vectorizer operates on loops only, but it can also vectorize
scalar code if it is inside a loop.  Thus the unroller does not
unroll the outermost loop (so it seems that for your testcase the
loops over i and j are still preserved).  I'll have to analyze
what kind of vectorization is applied without unrolling, and I
suspect a simple (hopefully ;)) missed optimization in the vectorizer.
So ..

> Last question: should I continue to use pr34265, or close it and open a
> new PR?

.. please open a new PR with the above testcase.
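
For the PR it may help to have both forms of the 'k' loop side by side
in one file.  The following is only a minimal sketch: the program name
and the values stored in b and c are illustrative assumptions, not taken
from the testcase above.  It checks nothing more than that the two forms
compute the same sum, so that any difference in the vectorizer report
comes from how the loop is optimized and not from the source.

```fortran
! Sketch only: the program name and the data in b and c are
! illustrative assumptions, not part of the original testcase.
program k_loop_variants
  implicit none
  integer :: k
  real(8) :: a(3), b(9,3), c(3)
  real(8) :: sum_array, sum_scalar

  ! Fill b and c with arbitrary values.
  do k = 1, 9
     b(k,1) = 0.1d0*k
     b(k,2) = 0.2d0*k
     b(k,3) = 0.3d0*k
  end do
  c(1) = 0.5d0
  c(2) = 0.25d0
  c(3) = 0.125d0

  ! Variant 1: array syntax leaves an implicit loop inside the 'k' loop.
  sum_array = 0.0d0
  do k = 1, 9
     a = b(k,:) - c
     sum_array = sum_array + a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
  end do

  ! Variant 2: the implicit loop unrolled by hand.
  sum_scalar = 0.0d0
  do k = 1, 9
     a(1) = b(k,1) - c(1)
     a(2) = b(k,2) - c(2)
     a(3) = b(k,3) - c(3)
     sum_scalar = sum_scalar + a(1)*a(1) + a(2)*a(2) + a(3)*a(3)
  end do

  ! Both variants should agree up to rounding.
  print *, sum_array, sum_scalar
  print *, abs(sum_array - sum_scalar) <= 1.0d-12*abs(sum_array)
end program k_loop_variants
```

Compiling this with the options quoted above, including
-ftree-vectorizer-verbose=2, should report on each 'k' loop separately.
Which of them gets vectorized depends on the compiler revision, so no
expected output is given here.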

Thanks!
Richard.


