When scheduling an additional DSE pass the bwaves kernel (gfortran.dg/pr81303.f) is no longer interchanged because prepare_perfect_loop_nest tries to use a too outer loop where we fail to compute access functions for the data references. If artificially restricting the nest to only cover the two innermost loops the interchange still works. A modified testcase simulating the effect of the DSE is provided below: ! { dg-options "-O3 -ffast-math -floop-interchange -fdump-tree-linterchange-details" } subroutine mat_times_vec(y,x,a,axp,ayp,azp,axm,aym,azm, $ nb,nx,ny,nz) implicit none integer nb,nx,ny,nz,i,j,k,m,l,kit,im1,ip1,jm1,jp1,km1,kp1 real*8 y(nb,nx,ny,nz),x(nb,nx,ny,nz),tem real*8 a(nb,nb,nx,ny,nz), 1 axp(nb,nb,nx,ny,nz),ayp(nb,nb,nx,ny,nz),azp(nb,nb,nx,ny,nz), 2 axm(nb,nb,nx,ny,nz),aym(nb,nb,nx,ny,nz),azm(nb,nb,nx,ny,nz) do k=1,nz km1=mod(k+nz-2,nz)+1 kp1=mod(k,nz)+1 do j=1,ny jm1=mod(j+ny-2,ny)+1 jp1=mod(j,ny)+1 do i=1,nx im1=mod(i+nx-2,nx)+1 ip1=mod(i,nx)+1 do l=1,nb tem=0.0 do m=1,nb tem=tem+ 1 a(l,m,i,j,k)*x(m,i,j,k)+ 2 axp(l,m,i,j,k)*x(m,ip1,j,k)+ 3 ayp(l,m,i,j,k)*x(m,i,jp1,k)+ 4 azp(l,m,i,j,k)*x(m,i,j,kp1)+ 5 axm(l,m,i,j,k)*x(m,im1,j,k)+ 6 aym(l,m,i,j,k)*x(m,i,jm1,k)+ 7 azm(l,m,i,j,k)*x(m,i,j,km1) enddo y(l,i,j,k)=tem enddo enddo enddo enddo return end ! { dg-final { scan-tree-dump-times "is interchanged" 1 "linterchange" } }
prepare_perfect_loop_nest fails doing compute_access_strides and I wonder if we should do prepare_data_references data-ref analysis with respect to the BBs containing loop and only after collecting all datarefs instantiate the access fns up to the outermost loop possible. I don't remember exactly how we deal with this but it looks like compute_access_stride doesn't look at the access function as computed by dataref analysis but instead it analyzes things itself and could in theory instantiate_scev one-loop-at-a-time communicating "failure" upwards?
Created attachment 50523 [details] DSE patch For reference this is the patch adding an additional DSE pass which fails the existing gfortran.dg/pr81303.f and it's vect/ variant.
Sth as simple (and brute-force) as the following fixes this. Somehow SCEV must already know the "point of failure" though and eventually always instantiating from loop to loop_nest in steps might be more efficient than trying all possibilities on failure (or maybe we should do that on failure). diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index f45b9364644..215587a8406 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -1320,7 +1320,14 @@ compute_access_stride (class loop *loop_nest, class loop *loop, } tree scev_base = build_fold_addr_expr (ref); tree scev = analyze_scalar_evolution (loop, scev_base); - scev = instantiate_scev (loop_preheader_edge (loop_nest), loop, scev); + tree orig_scev = scev; + scev = instantiate_scev (loop_preheader_edge (loop_nest), loop, orig_scev); + while (chrec_contains_undetermined (scev) && loop_nest != loop) + { + loop_nest = loop_nest->inner; + scev = instantiate_scev (loop_preheader_edge (loop_nest), + loop, orig_scev); + } if (! chrec_contains_undetermined (scev)) { tree sl = scev;
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:6ff66d1ea48960fe96bb51a750c01135e65fe452 commit r12-125-g6ff66d1ea48960fe96bb51a750c01135e65fe452 Author: Richard Biener <rguenther@suse.de> Date: Wed Apr 7 14:53:40 2021 +0200 tree-optimization/99956 - improve loop interchange When we apply store motion and DSE manually to the bwaves kernel in gfortran.dg/pr81303.f loop interchange no longer happens because the perfect nest considered covers outer loops we cannot analyze strides for. The following compensates for this by shrinking the nest in this analysis which was already possible but on a too coarse granularity. It shares the shrinked nest with the rest of the DRs so the complexity overhead should be negligible. 2021-04-07 Richard Biener <rguenther@suse.de> PR tree-optimization/99956 * gimple-loop-interchange.cc (compute_access_stride): Try instantiating the access in a shallower loop nest if instantiating failed. (compute_access_strides): Pass adjustable loop_nest to compute_access_stride. * gfortran.dg/pr99956.f: New testcase.
Fixed for GCC 12.