This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][RFC] Add versioning for constant strides for vectorization
- From: Richard Guenther <rguenther at suse dot de>
- To: Dominique Dhumieres <dominiq at lps dot ens dot fr>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Sun, 25 Jan 2009 13:57:04 +0100 (CET)
- Subject: Re: [PATCH][RFC] Add versioning for constant strides for vectorization
- References: <20090125113951.E436B3BABA@mailhost.lps.ens.fr>
On Sun, 25 Jan 2009, Dominique Dhumieres wrote:
> Richard,
>
> > This patch adds the capability to the vectorizer to perform versioning
> > for the case of a constant (suitable) stride.
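The idea of versioning a loop on its stride can be sketched by hand. This is an illustrative C sketch, not code from the patch. It assumes the "suitable" constant is a unit stride, and that the stride is otherwise only known at run time, as with the strides of Fortran assumed-shape arrays. The function `axpy_strided` and its parameters are hypothetical. With versioning, the compiler itself emits both copies and the runtime test, and only the unit-stride copy is vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y(i) += alpha * x(i) over n elements spaced
   `stride` elements apart, where stride is only known at run time.
   A compiler doing stride versioning would emit both copies below
   plus the runtime test that selects between them. */
void axpy_strided(float *restrict y, const float *restrict x,
                  float alpha, ptrdiff_t stride, ptrdiff_t n)
{
    assert(n >= 0);
    if (stride == 1) {
        /* Versioned copy: unit stride, so consecutive iterations touch
           consecutive elements and the loop can be vectorized. */
        for (ptrdiff_t i = 0; i < n; i++)
            y[i] += alpha * x[i];
    } else {
        /* Fallback copy: arbitrary stride, kept as scalar code. */
        for (ptrdiff_t i = 0; i < n; i++)
            y[i * stride] += alpha * x[i * stride];
    }
}
```

Both copies compute the same result; the versioned one only exposes contiguous accesses to the vectorizer.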
>
> I have applied the patch on i686-apple-darwin9 (Core2 2.1 GHz, 4 MB cache,
> 2 GB RAM). It regtested without regressions. However, the following test:
>
> program mymatmul
>   implicit none
>   integer, parameter :: n = 2000
>   real, dimension(n,n) :: rr, ri
>   complex, dimension(n,n) :: a, b, c
>   real :: t1, t2
>   integer :: i, j, k
>
>   call random_number (rr)
>   call random_number (ri)
>   a = cmplx (rr, ri)
>   call random_number (rr)
>   call random_number (ri)
>   b = cmplx (rr, ri)
>
>   call cpu_time (t1)
>
>   c = cmplx (0., 0.)
>   do j = 1, n
>     do k = 1, n
>       do i = 1, n
>         c(i,j) = c(i,j) + a(i,k) * b(k,j)
>       end do
>     end do
>   end do
>
>   call cpu_time (t2)
>   write (*,'(F8.4)') t2-t1
>
> end program mymatmul
>
> did not vectorize:
We should be able to handle that, I think. Can you file a bug report,
please? It should be the same as simply vectorizing
j = random
k = random
tmp = b(k,j)
do i = 1, n
  c(i,j) = c(i,j) + a(i,k) * tmp
end do
That works with real (non-complex) data, though.
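To make concrete what the vectorizer has to handle once b(k,j) is hoisted, the loop can be rendered in C: a unit-stride complex multiply-add down one column. This is an illustrative sketch, not GCC code; `column_update` and `matmul_jki` are hypothetical names, and the arrays are assumed to be n-by-n and column-major as in Fortran, so element (i,j) sits at index i + j*n.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Inner loop of the simplified case: c(:,j) += a(:,k) * tmp, with
   tmp = b(k,j) hoisted out of the i loop. ccol and acol point at
   column j of c and column k of a; in column-major storage the i
   index walks contiguous memory (unit stride). */
void column_update(float complex *restrict ccol,
                   const float complex *restrict acol,
                   float complex tmp, ptrdiff_t n)
{
    assert(n >= 0);
    for (ptrdiff_t i = 0; i < n; i++)
        ccol[i] += acol[i] * tmp;   /* complex multiply-add */
}

/* The full j-k-i nest from the test program, built on column_update.
   Hypothetical helper; element (i,j) is stored at index i + j*n. */
void matmul_jki(float complex *restrict c, const float complex *restrict a,
                const float complex *restrict b, ptrdiff_t n)
{
    assert(n >= 0);
    for (ptrdiff_t j = 0; j < n; j++)
        for (ptrdiff_t k = 0; k < n; k++)
            column_update(&c[j * n], &a[k * n], b[k + j * n], n);
}
```

When compiling the C version with GCC, keep in mind that the default C99 complex-multiplication semantics add a NaN-recovery check to each product; -fcx-limited-range (also implied by -ffast-math) removes it.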
> I can only report some timings with the Polyhedron test suite:
Thanks.
> The timings show a ~10% improvement for capacita.f90, offset by a ~10%
> degradation for fatigue.f90. All the other times are within the noise.
>
> Thanks for the patch.
>
> Dominique
>
> PS: Most of the time in capacita and tfft is spent in FFT subroutines that
> are not vectorized. Can anything be done to change that?
Not easily, I guess. Looking at SPEC CPU2006 tonto recently, I noticed
that GFortran inserts many temporaries for intrinsics, which automatically
makes vectorization harder (or at least means it is no longer the
bottleneck). For example, for
x = sum (a(:,i)*b(:,i))
it puts a(:,i)*b(:,i) into a temporary array before doing the
reduction via libgfortran. But this is something that should be
addressed by using middle-end arrays, once I manage to spend some
time on that project again.
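The difference between materializing the product and fusing it into the reduction can be sketched in C. This is a hand-written illustration under stated assumptions: `dot_with_temporary` is only a hypothetical stand-in for the front end's array temporary followed by the library reduction, not GFortran's or libgfortran's actual code, and `dot_fused` shows the single-loop form that avoiding the temporary would give the vectorizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for x = sum(a(:,i)*b(:,i)) as described above: the
   elementwise product goes into a temporary array, then a separate
   pass reduces it (playing the role of the library call). */
float dot_with_temporary(const float *a, const float *b, ptrdiff_t n)
{
    assert(n >= 0);
    float *tmp = malloc((size_t)n * sizeof *tmp);  /* the array temporary */
    if (n > 0 && tmp == NULL)
        abort();
    for (ptrdiff_t i = 0; i < n; i++)
        tmp[i] = a[i] * b[i];
    float s = 0.0f;
    for (ptrdiff_t i = 0; i < n; i++)   /* separate reduction pass */
        s += tmp[i];
    free(tmp);
    return s;
}

/* Fused form: no temporary, one loop that the vectorizer can treat
   as a multiply-add reduction. */
float dot_fused(const float *a, const float *b, ptrdiff_t n)
{
    assert(n >= 0);
    float s = 0.0f;
    for (ptrdiff_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

Note that GCC only vectorizes a floating-point sum like the one in `dot_fused` when it is allowed to reassociate the additions, for example under -ffast-math (which enables -fassociative-math).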
Richard.