This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][RFC] Add versioning for constant strides for vectorization


On Sun, 25 Jan 2009, Dominique Dhumieres wrote:

> Richard,
> 
> > This patch adds the capability to the vectorizer to perform versioning
> > for the case of a constant (suitable) stride.
> 
> I have applied the patch on i686-apple-darwin9 (Core2 2.1 GHz, 4 MB cache, 
> 2 GB RAM). It regtested without regression. However, the following test:
> 
> program mymatmul
>   implicit none
>   integer, parameter :: n = 2000
>   real, dimension(n,n) :: rr, ri
>   complex, dimension(n,n) :: a,b,c
>   real :: t1, t2
>   integer :: i, j, k
> 
>   call random_number (rr)
>   call random_number (ri)
>   a = cmplx (rr, ri)
>   call random_number (rr)
>   call random_number (ri)
>   b = cmplx (rr, ri)
> 
>   call cpu_time (t1)
> 
>   c = cmplx (0., 0.)
>   do j = 1, n
>      do k = 1, n
>         do i = 1, n
>            c(i,j) = c(i,j) + a(i,k) * b(k,j)
>         end do
>      end do
>   end do
> 
>   call cpu_time (t2)
>   write (*,'(F8.4)') t2-t1
> 
> end program mymatmul
> 
> did not vectorize:

We should be able to handle that, I think.  Can you file a bug report,
please?  It should be the same as simply vectorizing

 j = random
 k = random
 tmp = b(k,j)
 do i = 1, n
   c(i,j) = c(i,j) + a(i,k) * tmp
 end do

It works with real data, though.
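
For a bug report, the reduced loop could be wrapped in a self-contained
program along the following lines.  This is only a sketch: the program
name, the smaller array size and the fixed values chosen for j and k are
assumptions made for illustration and are not part of the original test.

```fortran
program reduced_complex
  implicit none
  integer, parameter :: n = 500          ! assumed size, smaller than the original
  real, dimension(n,n) :: rr, ri
  complex, dimension(n,n) :: a, b, c
  complex :: tmp
  integer :: i, j, k

  call random_number (rr)
  call random_number (ri)
  a = cmplx (rr, ri)
  call random_number (rr)
  call random_number (ri)
  b = cmplx (rr, ri)
  c = cmplx (0., 0.)

  ! Any valid indices will do; they only need to be loop invariant.
  j = n / 2
  k = n / 3

  ! Hoist the loop-invariant load so the loop body is a plain
  ! "c = c + a * scalar" update over a unit-stride column.
  tmp = b(k,j)
  do i = 1, n
     c(i,j) = c(i,j) + a(i,k) * tmp
  end do

  ! Print a value that depends on the result so the loop is not removed.
  write (*,*) sum (c(:,j))
end program reduced_complex
```

Changing the `complex` declarations to `real` (and the initialisation
accordingly) gives the variant with real data for comparison.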

> I can only report some timing with the polyhedron test suite:

Thanks.

> The timing shows a ~10% improvement for capacita.f90 compensated by a ~10% 
> degradation for fatigue.f90. All the other times are within the noise.
> 
> Thanks for the patch.
> 
> Dominique
> 
> PS Most of the time in capacita and tfft is spent in FFT subroutines that 
> are not vectorized. Anything that can be done to change that?

Not easily, I guess.  When I looked at SPEC 2006 tonto recently,
I noticed that GFortran inserts many temporaries for intrinsics, which
automatically makes vectorization harder (or at least no longer the
bottleneck).  Like for

  x = sum (a(:,i)*b(:,i))

where it puts a(:,i)*b(:,i) into a temporary array before doing the
reduction via libgfortran.  But well, this is something that should
be addressed by using middle-end arrays once I manage to spend some
time on that project again.
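
As an illustration of the difference, the same reduction can be written
as an explicit loop with a scalar accumulator, which needs no array
temporary.  This is a minimal sketch: the program name, the array sizes
and the test values are made up for the example, and whether a particular
GFortran version creates a temporary for the intrinsic form depends on
the version and options used.

```fortran
program dot_column
  implicit none
  integer, parameter :: n = 1000, m = 4   ! assumed sizes
  real, dimension(n,m) :: a, b
  real :: x, y
  integer :: i, k

  a = 1.0
  b = 2.0
  i = 2

  ! Intrinsic form, as in the message above.
  x = sum (a(:,i)*b(:,i))

  ! Explicit form: scalar accumulator, unit-stride accesses, no temporary.
  y = 0.0
  do k = 1, n
     y = y + a(k,i) * b(k,i)
  end do

  ! Both forms should agree up to rounding.
  write (*,*) x, y
  if (abs (x - y) > 1.0e-3 * abs (x)) stop 1
end program dot_column
```

Note that vectorizing a floating-point reduction like this generally
requires allowing reassociation (for example with -ffast-math), since
the vectorized loop sums the elements in a different order.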

Richard.

