Bug 99956 - loop interchange fails when altering bwaves inner loop
Summary: loop interchange fails when altering bwaves inner loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: 12.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
  Show dependency treegraph
 
Reported: 2021-04-07 12:21 UTC by Richard Biener
Modified: 2024-11-26 04:55 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.0
Known to fail:
Last reconfirmed: 2021-04-07 00:00:00


Attachments
DSE patch (1.73 KB, patch)
2021-04-07 12:29 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2021-04-07 12:21:05 UTC
When scheduling an additional DSE pass the bwaves kernel (gfortran.dg/pr81303.f)
is no longer interchanged because prepare_perfect_loop_nest tries to use
a too outer loop where we fail to compute access functions for the data references.  If artificially restricting the nest to only cover the two innermost
loops the interchange still works.

A modified testcase simulating the effect of the DSE is provided below:

! { dg-options "-O3 -ffast-math -floop-interchange -fdump-tree-linterchange-details" }

        subroutine mat_times_vec(y,x,a,axp,ayp,azp,axm,aym,azm,
     $  nb,nx,ny,nz)
        implicit none
        integer nb,nx,ny,nz,i,j,k,m,l,kit,im1,ip1,jm1,jp1,km1,kp1

        real*8 y(nb,nx,ny,nz),x(nb,nx,ny,nz),tem

        real*8 a(nb,nb,nx,ny,nz),
     1  axp(nb,nb,nx,ny,nz),ayp(nb,nb,nx,ny,nz),azp(nb,nb,nx,ny,nz),
     2  axm(nb,nb,nx,ny,nz),aym(nb,nb,nx,ny,nz),azm(nb,nb,nx,ny,nz)


      do k=1,nz
         km1=mod(k+nz-2,nz)+1
         kp1=mod(k,nz)+1
         do j=1,ny
            jm1=mod(j+ny-2,ny)+1
            jp1=mod(j,ny)+1
            do i=1,nx
               im1=mod(i+nx-2,nx)+1
               ip1=mod(i,nx)+1
               do l=1,nb
                  tem=0.0
                  do m=1,nb
                     tem=tem+
     1               a(l,m,i,j,k)*x(m,i,j,k)+
     2               axp(l,m,i,j,k)*x(m,ip1,j,k)+
     3               ayp(l,m,i,j,k)*x(m,i,jp1,k)+
     4               azp(l,m,i,j,k)*x(m,i,j,kp1)+
     5               axm(l,m,i,j,k)*x(m,im1,j,k)+
     6               aym(l,m,i,j,k)*x(m,i,jm1,k)+
     7               azm(l,m,i,j,k)*x(m,i,j,km1)
                  enddo
                  y(l,i,j,k)=tem
               enddo
            enddo
         enddo
        enddo
        return
        end

! { dg-final { scan-tree-dump-times "is interchanged" 1 "linterchange" } }
Comment 1 Richard Biener 2021-04-07 12:28:24 UTC
prepare_perfect_loop_nest fails doing compute_access_strides and I wonder if
we should do prepare_data_references data-ref analysis with respect to the
BBs containing loop and only after collecting all datarefs instantiate
the access fns up to the outermost loop possible.

I don't remember exactly how we deal with this but it looks like
compute_access_stride doesn't look at the access function as computed by
dataref analysis but instead it analyzes things itself and could in theory
instantiate_scev one-loop-at-a-time communicating "failure" upwards?
Comment 2 Richard Biener 2021-04-07 12:29:09 UTC
Created attachment 50523 [details]
DSE patch

For reference this is the patch adding an additional DSE pass which fails the
existing gfortran.dg/pr81303.f and it's vect/ variant.
Comment 3 Richard Biener 2021-04-07 12:46:25 UTC
Sth as simple (and brute-force) as the following fixes this.  Somehow SCEV
must already know the "point of failure" though and eventually always
instantiating from loop to loop_nest in steps might be more efficient than
trying all possibilities on failure (or maybe we should do that on failure).

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index f45b9364644..215587a8406 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -1320,7 +1320,14 @@ compute_access_stride (class loop *loop_nest, class loop *loop,
     }
   tree scev_base = build_fold_addr_expr (ref);
   tree scev = analyze_scalar_evolution (loop, scev_base);
-  scev = instantiate_scev (loop_preheader_edge (loop_nest), loop, scev);
+  tree orig_scev = scev;
+  scev = instantiate_scev (loop_preheader_edge (loop_nest), loop, orig_scev);
+  while (chrec_contains_undetermined (scev) && loop_nest != loop)
+    {
+      loop_nest = loop_nest->inner;
+      scev = instantiate_scev (loop_preheader_edge (loop_nest),
+                              loop, orig_scev);
+    }
   if (! chrec_contains_undetermined (scev))
     {
       tree sl = scev;
Comment 4 GCC Commits 2021-04-26 11:58:34 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:6ff66d1ea48960fe96bb51a750c01135e65fe452

commit r12-125-g6ff66d1ea48960fe96bb51a750c01135e65fe452
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Apr 7 14:53:40 2021 +0200

    tree-optimization/99956 - improve loop interchange
    
    When we apply store motion and DSE manually to the bwaves kernel
    in gfortran.dg/pr81303.f loop interchange no longer happens because
    the perfect nest considered covers outer loops we cannot analyze
    strides for.  The following compensates for this by shrinking the
    nest in this analysis which was already possible but on a too coarse
    granularity.  It shares the shrinked nest with the rest of the DRs
    so the complexity overhead should be negligible.
    
    2021-04-07  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/99956
            * gimple-loop-interchange.cc (compute_access_stride):
            Try instantiating the access in a shallower loop nest
            if instantiating failed.
            (compute_access_strides): Pass adjustable loop_nest
            to compute_access_stride.
    
            * gfortran.dg/pr99956.f: New testcase.
Comment 5 Richard Biener 2021-04-26 11:59:38 UTC
Fixed for GCC 12.