This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)

From: "jv244 at cam dot ac dot uk" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 4 Jul 2007 09:23:42 -0000
Subject: [Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)
References: <bug-25621-6642@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org


------- Comment #6 from jv244 at cam dot ac dot uk  2007-07-04 09:23 -------
(In reply to comment #5)
> You can also try to tune --param max-variable-expansions-in-unroller. The
> default is to add one expansion (which seems to be the most helpful due to the
> fact that adding more expansions can increase register pressure).
> 

There seems to be no effect from --param max-variable-expansions-in-unroller: I
get the same timings for all values.

I do notice that ifort is twice as fast as gfortran on the original loop on my
machine (core2):

> gfortran -O3 -ffast-math -ftree-vectorize -march=native  -funroll-loops -fvariable-expansion-in-unroller --param max-variable-expansions-in-unroller=4 pr25621.f90
> ./a.out
 default loop  0.868054000000000
 hand optimized loop  0.864054000000000

> ifort -xT -O3 pr25621.f90
pr25621.f90(32) : (col. 0) remark: LOOP WAS VECTORIZED.
pr25621.f90(33) : (col. 0) remark: LOOP WAS VECTORIZED.
pr25621.f90(9) : (col. 2) remark: LOOP WAS VECTORIZED.
> ./a.out
 default loop  0.440027000000000
 hand optimized loop  0.876055000000000

It looks like ifort vectorizes the first loop, whereas gfortran does not (it
reports 'unsupported use in stmt'). For reference, the same gfortran run
without -fvariable-expansion-in-unroller:

> gfortran -O3 -ffast-math -ftree-vectorize -march=native  -funroll-loops pr25621.f90
> ./a.out
 default loop   1.29608100000000
 hand optimized loop  0.860054000000000

The code actually used for testing is:

! simple loop
! assume N is even
SUBROUTINE S31(a,b,c,N)
 IMPLICIT NONE
 integer :: N
 real*8  :: a(N),b(N),c
 integer :: i
 c=0.0D0
 DO i=1,N
   c=c+a(i)*b(i)
 ENDDO
END SUBROUTINE

! 'improved' loop
SUBROUTINE S32(a,b,c,N)
 IMPLICIT NONE
 integer :: N
 real*8  :: a(N),b(N),c,tmp
 integer :: i
 c=0.0D0
 tmp=0.0D0
 DO i=1,N,2
    c=c+a(i)*b(i)
    tmp=tmp+a(i+1)*b(i+1)
 ENDDO
 c=c+tmp
END SUBROUTINE

integer, parameter :: N=1024
real*8  :: a(N),b(N),c,tmp,t1,t2

a=0.0_8
b=0.0_8
DO i=1,2000000
   CALL S31(a,b,c,N)
ENDDO

CALL CPU_TIME(t1)
DO i=1,1000000
   CALL S31(a,b,c,N)
ENDDO
CALL CPU_TIME(t2)
write(6,*) "default loop", t2-t1
CALL CPU_TIME(t1)
DO i=1,1000000
   CALL S32(a,b,c,N)
ENDDO
CALL CPU_TIME(t2)
write(6,*) "hand optimized loop",t2-t1
END
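
For illustration only, the hand split in S32 can be pushed one step further. The
sketch below is a hypothetical S34 (the name and the routine are not part of the
test case, and none of the timings above were measured with it). It keeps four
independent partial sums, which is in spirit what allowing more expansions in
the unroller would produce, and it assumes N is a multiple of 4.

```fortran
! Hypothetical four-accumulator variant of S32 (illustration only).
! Assumes N is a multiple of 4.
SUBROUTINE S34(a,b,c,N)
 IMPLICIT NONE
 integer :: N
 real*8  :: a(N),b(N),c,t1,t2,t3
 integer :: i
 c=0.0D0
 t1=0.0D0
 t2=0.0D0
 t3=0.0D0
 ! four independent dependency chains instead of one
 DO i=1,N,4
    c=c+a(i)*b(i)
    t1=t1+a(i+1)*b(i+1)
    t2=t2+a(i+2)*b(i+2)
    t3=t3+a(i+3)*b(i+3)
 ENDDO
 ! combine the partial sums; the summation order differs from S31,
 ! which is why the compiler needs -ffast-math to do this itself
 c=(c+t1)+(t2+t3)
END SUBROUTINE
```

It could be timed in the same way as S32, by adding a third CPU_TIME block that
calls S34(a,b,c,N). More partial sums also mean more live registers, which is
the register-pressure concern raised in comment #5.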





-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621
