This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)
- From: "jv244 at cam dot ac dot uk" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 4 Jul 2007 09:23:42 -0000
- Subject: [Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)
- References: <bug-25621-6642@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #6 from jv244 at cam dot ac dot uk 2007-07-04 09:23 -------
(In reply to comment #5)
> You can also try to tune --param max-variable-expansions-in-unroller. The
> default is to add one expansion (which seems to be the most helpful due to the
> fact that adding more expansions can increase register pressure).
>
There seems to be no effect from --param max-variable-expansions-in-unroller;
I get the same timings for all values.
I do notice that ifort is about twice as fast as gfortran on the original loop
on my machine (Core 2):
> gfortran -O3 -ffast-math -ftree-vectorize -march=native -funroll-loops -fvariable-expansion-in-unroller --param max-variable-expansions-in-unroller=4 pr25621.f90
> ./a.out
default loop 0.868054000000000
hand optimized loop 0.864054000000000
> ifort -xT -O3 pr25621.f90
pr25621.f90(32) : (col. 0) remark: LOOP WAS VECTORIZED.
pr25621.f90(33) : (col. 0) remark: LOOP WAS VECTORIZED.
pr25621.f90(9) : (col. 2) remark: LOOP WAS VECTORIZED.
> ./a.out
default loop 0.440027000000000
hand optimized loop 0.876055000000000
It looks like ifort vectorizes the first loop, whereas gfortran does not
(it reports 'unsupported use in stmt'). For reference, the same gfortran build
without -fvariable-expansion-in-unroller:
> gfortran -O3 -ffast-math -ftree-vectorize -march=native -funroll-loops pr25621.f90
> ./a.out
default loop 1.29608100000000
hand optimized loop 0.860054000000000
The code actually used for testing is:
! simple loop
! assume N is even
SUBROUTINE S31(a,b,c,N)
  IMPLICIT NONE
  integer :: N
  real*8 :: a(N),b(N),c
  integer :: i
  c=0.0D0
  DO i=1,N
    c=c+a(i)*b(i)
  ENDDO
END SUBROUTINE

! 'improved' loop
SUBROUTINE S32(a,b,c,N)
  IMPLICIT NONE
  integer :: N
  real*8 :: a(N),b(N),c,tmp
  integer :: i
  c=0.0D0
  tmp=0.0D0
  DO i=1,N,2
    c=c+a(i)*b(i)
    tmp=tmp+a(i+1)*b(i+1)
  ENDDO
  c=c+tmp
END SUBROUTINE

integer, parameter :: N=1024
real*8 :: a(N),b(N),c,tmp,t1,t2
a=0.0_8
b=0.0_8
DO i=1,2000000
  CALL S31(a,b,c,N)
ENDDO
CALL CPU_TIME(t1)
DO i=1,1000000
  CALL S31(a,b,c,N)
ENDDO
CALL CPU_TIME(t2)
write(6,*) "default loop", t2-t1
CALL CPU_TIME(t1)
DO i=1,1000000
  CALL S32(a,b,c,N)
ENDDO
CALL CPU_TIME(t2)
write(6,*) "hand optimized loop",t2-t1
END
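
For comparison, here is a minimal sketch of what a wider hand expansion of the
sum might look like, with four partial sums instead of the two in S32. The
subroutine S34 and the small driver are hypothetical and not part of the test
case above. The sketch assumes N is a multiple of 4. Like S32, it reorders the
floating-point additions, which is why the compiler can only do this itself
under -ffast-math. Whether this matches any particular value of
--param max-variable-expansions-in-unroller is an assumption that the timings
above do not establish.

```fortran
! Hypothetical S34: not part of the original test case.
! Same dot product as S31, split into four independent partial sums.
! Assumes N is a multiple of 4.
SUBROUTINE S34(a,b,c,N)
  IMPLICIT NONE
  integer :: N
  real*8 :: a(N),b(N),c,p1,p2,p3
  integer :: i
  c=0.0D0
  p1=0.0D0
  p2=0.0D0
  p3=0.0D0
  DO i=1,N,4
    ! four accumulators with no dependence on each other
    c=c+a(i)*b(i)
    p1=p1+a(i+1)*b(i+1)
    p2=p2+a(i+2)*b(i+2)
    p3=p3+a(i+3)*b(i+3)
  ENDDO
  ! combine the partial sums once, after the loop
  c=c+p1+p2+p3
END SUBROUTINE

! Small driver: print the four-way sum next to the DOT_PRODUCT intrinsic
PROGRAM check_s34
  IMPLICIT NONE
  integer, parameter :: N=1024
  real*8 :: a(N),b(N),c
  a=1.0D0
  b=2.0D0
  CALL S34(a,b,c,N)
  write(6,*) "four-way sum", c
  write(6,*) "dot_product ", DOT_PRODUCT(a,b)
END PROGRAM
```

S34 could be timed by adding a third CPU_TIME block to the driver above, in the
same way as S31 and S32. More accumulators shorten the dependency chain on c,
but, as noted in comment #5, they can also increase register pressure.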
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621