[Bug tree-optimization/61000] New: No loop interchange for inner loop along the slow index

Tue Apr 29 13:00:00 GMT 2014

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000

            Bug ID: 61000
           Summary: No loop interchange for inner loop along the slow
                    index
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
                CC: grosser at gcc dot gnu.org, mircea.namolaru at inria dot fr

Graphite does not interchange the loops when the inner loop runs along the
slow (second) index of an array, as in the first loop nest of this test:

[Book15] Fortran/omp_tst% cat loop.f90
module parms
implicit none
private
integer, parameter, public :: num = 1024
integer, parameter, public :: dp = kind(0.0d0)

end module parms

program loops
use parms
implicit none
real(kind=dp), dimension(:, :), &
allocatable :: a, c
integer :: i, j, k, n_iter=100
integer(8) :: start, finish, counts
allocate(a(num,num),c(num,num))

call random_number(a)
c = 0
call system_clock (start, counts)
do k=1,n_iter
  do i=1,num
!    c(i,1) = 0.5*(a(i,2) - a(i,num))
!    c(i,num) = 0.5*(a(i,1) - a(i,num-1))
    do j=2,num-1
      c(i,j) = 0.5*(a(i,j+1) - a(i,j-1))
    end do
  end do
end do
call system_clock (finish)
print *, sum(abs(c)) ! To ensure computation
print *, "Elapsed time =" ,&
(finish - start) / real(counts, kind=8), "seconds"

c = 0
call system_clock (start, counts)
do k=1,n_iter
!  do i=1,num
!    c(i,1) = 0.5*(a(i,2) - a(i,num))
!    c(i,num) = 0.5*(a(i,1) - a(i,num-1))
!  end do
  do j=2,num-1
    do i=1,num
      c(i,j) = 0.5*(a(i,j+1) - a(i,j-1))
    end do
  end do
end do
call system_clock (finish)
print *, sum(abs(c)) ! To ensure computation
print *, "Elapsed time =" ,&
(finish - start) / real(counts, kind=8), "seconds"

end program loops
[Book15] Fortran/omp_tst% gfc -Ofast -floop-interchange loop.f90 
[Book15] Fortran/omp_tst% time a.out
   174350.51293227341     
 Elapsed time =   2.1943769999999998      seconds
   174350.51293227341     
 Elapsed time =  0.14006299999999999      seconds
2.347u 0.011s 0:02.36 99.5%    0+0k 0+0io 30pf+0w

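The first nest (inner loop over j) takes about 2.19s, against 0.14s for the
second nest (inner loop over i), i.e. it is roughly 16 times slower. This may
be a duplicate of pr36011, but the timings are not affected by adding
-fno-tree-pre -fno-tree-loop-im. Note that gcc with -floop-interchange is able
to optimize the matrix product (see pr14741 and pr60997).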
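
For reference, a minimal sketch of why the loop order matters in this test. It
is not part of loop.f90: the program name stride_sketch and the variable
elem_bytes are made up for illustration, and it assumes 8-bit bytes.
storage_size is a Fortran 2008 intrinsic. Fortran stores arrays with the first
index varying fastest, so an inner loop over j jumps a whole column between
consecutive accesses:

```fortran
program stride_sketch
  implicit none
  integer, parameter :: num = 1024          ! same extent as in loop.f90
  integer, parameter :: dp = kind(0.0d0)    ! same kind as in loop.f90
  real(kind=dp), allocatable :: c(:, :)
  integer :: elem_bytes                     ! hypothetical helper variable

  allocate(c(num, num))

  ! storage_size returns the element size in bits; assume 8 bits per byte.
  elem_bytes = storage_size(c) / 8

  ! Fortran array element order: the first index is contiguous in memory,
  ! the second index is separated by one full column.
  print *, "bytes between c(i,j) and c(i+1,j):", elem_bytes
  print *, "bytes between c(i,j) and c(i,j+1):", elem_bytes * size(c, 1)
end program stride_sketch
```

With 8-byte reals, the second distance is 1024 times the first, which is
consistent with the first nest making much poorer use of the cache.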

This also affects the polyhedron test air.f90. With the following patch

--- air.f90    2009-08-28 14:22:26.000000000 +0200
+++ air_va.f90    2014-04-19 13:10:44.000000000 +0200
@@ -400,8 +400,8 @@
 !
 ! COMPUTE THE FLUX TERMS
 !
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !
@@ -657,8 +657,8 @@
       ENDDO
 !
 ! COMPUTE THE FLUX TERMS
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !

the execution time drops from 3.2s to 2.7s with -Ofast, whether or not
-floop-interchange is given.