[Bug tree-optimization/61000] New: No loop interchange for inner loop along the slow index
dominiq at lps dot ens.fr
gcc-bugzilla@gcc.gnu.org
Tue Apr 29 13:00:00 GMT 2014
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000
Bug ID: 61000
Summary: No loop interchange for inner loop along the slow index
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dominiq at lps dot ens.fr
CC: grosser at gcc dot gnu.org, mircea.namolaru at inria dot fr
Graphite does not perform the loop interchange when the inner loop runs along
the slow (last) index of an array:
[Book15] Fortran/omp_tst% cat loop.f90
module parms
  implicit none
  private
  integer, parameter, public :: num = 1024
  integer, parameter, public :: dp = kind(0.0d0)
end module parms
program loops
  use parms
  implicit none
  real(kind=dp), dimension(:, :), &
       allocatable :: a, c
  integer :: i, j, k, n_iter=100
  integer(8) :: start, finish, counts
  allocate(a(num,num),c(num,num))
  call random_number(a)
  c = 0
  call system_clock (start, counts)
  do k=1,n_iter
     do i=1,num
        ! c(i,1) = 0.5*(a(i,2) - a(i,num))
        ! c(i,num) = 0.5*(a(i,1) - a(i,num-1))
        do j=2,num-1
           c(i,j) = 0.5*(a(i,j+1) - a(i,j-1))
        end do
     end do
  end do
  call system_clock (finish)
  print *, sum(abs(c)) ! To ensure computation
  print *, "Elapsed time =" ,&
       (finish - start) / real(counts, kind=8), "seconds"
  c = 0
  call system_clock (start, counts)
  do k=1,n_iter
     ! do i=1,num
     ! c(i,1) = 0.5*(a(i,2) - a(i,num))
     ! c(i,num) = 0.5*(a(i,1) - a(i,num-1))
     ! end do
     do j=2,num-1
        do i=1,num
           c(i,j) = 0.5*(a(i,j+1) - a(i,j-1))
        end do
     end do
  end do
  call system_clock (finish)
  print *, sum(abs(c)) ! To ensure computation
  print *, "Elapsed time =" ,&
       (finish - start) / real(counts, kind=8), "seconds"
end program loops
[Book15] Fortran/omp_tst% gfc -Ofast -floop-interchange loop.f90
[Book15] Fortran/omp_tst% time a.out
174350.51293227341
Elapsed time = 2.1943769999999998 seconds
174350.51293227341
Elapsed time = 0.14006299999999999 seconds
2.347u 0.011s 0:02.36 99.5% 0+0k 0+0io 30pf+0w
This may be a duplicate of PR 36011, but the timings do not change when
-fno-tree-pre -fno-tree-loop-im are added. Note that gcc with -floop-interchange
is able to optimize the matrix product (see PR 14741 and PR 60997).
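To make the "slow index" concrete, here is a minimal sketch (not part of
loop.f90; the program name and the variables base, next_i and next_j are
illustrative assumptions) that prints the distance in memory between
neighbouring elements along each index. Fortran stores arrays in column-major
order, so stepping the first index moves by one element while stepping the
second index moves by num elements, which is why the first loop nest above
walks memory with a large stride.

```fortran
program slow_index_stride
  ! Illustrative only: measures the byte distance between neighbouring
  ! elements of a num x num array along each index.
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: dp = kind(0.0d0)
  integer, parameter :: num = 1024   ! same extent as in loop.f90
  real(kind=dp), allocatable, target :: a(:,:)
  integer(c_intptr_t) :: base, next_i, next_j

  allocate(a(num,num))
  ! Reinterpret the C addresses as integers so they can be subtracted.
  base   = transfer(c_loc(a(1,1)), base)
  next_i = transfer(c_loc(a(2,1)), base)
  next_j = transfer(c_loc(a(1,2)), base)

  print *, "bytes between a(i,j) and a(i+1,j):", next_i - base
  print *, "bytes between a(i,j) and a(i,j+1):", next_j - base

  ! Column-major layout: one step in j spans a whole column of num elements.
  if (next_j - base /= num * (next_i - base)) error stop "unexpected layout"
end program slow_index_stride
```

The second distance should be num times the first, so the inner j loop of the
first nest touches a different cache line on every iteration, while the
interchanged nest reads consecutive elements.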
This also affects the Polyhedron test air.f90. With the following patch,
--- air.f90 2009-08-28 14:22:26.000000000 +0200
+++ air_va.f90 2014-04-19 13:10:44.000000000 +0200
@@ -400,8 +400,8 @@
!
! COMPUTE THE FLUX TERMS
!
- DO i = 1 , MXPx
- DO j = 1 , MXPy
+ DO j = 1 , MXPy
+ DO i = 1 , MXPx
!
! compute vanleer fluxes
!
@@ -657,8 +657,8 @@
ENDDO
!
! COMPUTE THE FLUX TERMS
- DO i = 1 , MXPx
- DO j = 1 , MXPy
+ DO j = 1 , MXPy
+ DO i = 1 , MXPx
!
! compute vanleer fluxes
!
the execution time goes from 3.2s to 2.7s (with -Ofast, with or without
-floop-interchange).