Bug 29648 - Inlining only done for contained procedures
Summary: Inlining only done for contained procedures
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: unknown
Importance: P3 enhancement
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Reported: 2006-10-30 11:42 UTC by Philippe Schaffnit
Modified: 2009-05-02 18:05 UTC
CC List: 2 users

Last reconfirmed: 2006-10-31 05:34:33


Description Philippe Schaffnit 2006-10-30 11:42:30 UTC
As can be found at http://gcc.gnu.org/ml/fortran/2005-07/msg00286.html, GFortran doesn't do inlining. Unfortunately I cannot implement it myself, and I don't know how hard it would be, but it would certainly help a lot with several codes I know.
Comment 1 Paul Thomas 2006-10-31 05:34:33 UTC
As the link says, inlining is implemented for contained procedures.  I have changed the summary to reflect this.

Comment 2 Philippe Schaffnit 2006-10-31 09:13:16 UTC
Right!

Thanks!
Comment 3 Bud Davis 2007-12-01 02:24:34 UTC
In case someone does not know what a contained procedure is (I sure didn't without getting out the Metcalf and Reid book), here is an example:

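       program fred
         implicit none
         integer :: j
         j = 0
         call a(j)          ! call to the contained subroutine a
         print *, j
       contains             ! procedures after CONTAINS are internal to fred
         subroutine a(i)
           integer, intent(inout) :: i
           i = i + 2
         end subroutine a
       end program fred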

Compile it with no optimization and look at the assembler code, then compile it with -O2: the subroutine is inlined by gfortran.

Not really relevant to this PR, but something I learned and verified that might be of use.

--bud
Comment 4 Dominique d'Humieres 2007-12-03 07:45:23 UTC
The code in comment #3 is indeed inlined, but some cases are not. For instance, if you compile the Polyhedron test 'channel' with -O3 -ffast-math -funroll-loops and grep the assembler for _ddx, you get:

_ddx.837:
        call    _ddx.837
        call    _ddx.837

If you apply the following patch to channel.f90, i.e., do the inlining yourself,

--- channel.f90 2005-10-11 22:53:32.000000000 +0200
+++ chan.v2.f90 2007-11-29 21:30:25.000000000 +0100
@@ -145,10 +145,22 @@
 
     ! ------ interior calculations ------ !
 
-    dudx = ddx(u(:,:,mid))
-    dvdy = ddy(v(:,:,mid))
-    dhdx = ddx(h(:,:,mid))
-    dhdy = ddy(h(:,:,mid))
+    dudx(2:M-1,:) = u(3:M,: ,mid)-u(1:M-2,: ,mid)    ! interior points
+    dudx(1,:) = 2*(u(2,: ,mid)-u(  1,: ,mid))
+    dudx(M,:) = 2*(u(M,: ,mid)-u(M-1,: ,mid))
+    
+    dvdy(:,2:N-1) = v(:,3:N ,mid)-v(:,1:N-2 ,mid)    ! interior points
+    dvdy(:,1) = 2*(v(:,2 ,mid)-v(:,  1 ,mid))
+    dvdy(:,N) = 2*(v(:,N ,mid)-v(:,N-1 ,mid))
+
+    dhdx(2:M-1,:) = h(3:M,: ,mid)-h(1:M-2,: ,mid)    ! interior points
+    dhdx(1,:) = 2*(h(2,: ,mid)-h(  1,: ,mid))
+    dhdx(M,:) = 2*(h(M,: ,mid)-h(M-1,: ,mid))
+
+    dhdy(:,2:N-1) = h(:,3:N ,mid)-h(:,1:N-2 ,mid)    ! interior points
+    dhdy(:,1) = 2*(h(:,2 ,mid)-h(:,  1 ,mid))
+    dhdy(:,N) = 2*(h(:,N ,mid)-h(:,N-1 ,mid))
+
 
     u(2:M-1,1:N,new) = u(2:M-1,1:N,old) &               ! interior u points
         +2.d0*dt*f(2:M-1,1:N)*v(2:M-1,1:N,mid) &
@@ -234,38 +246,6 @@
                 0.5*(v(i,j,mid)+v(i,j-1,mid))
 
 !------------------------------------------------------------
-contains
-!------------------------------------------------------------
-    function ddx(array)
-    implicit double precision (a-h,o-z)
-    double precision::          array(:,:)
-    double precision::          ddx(size(array,dim=1),size(array,dim=2))
-
-    I = size(array,dim=1)
-    J = size(array,dim=2)
-
-    ddx(2:I-1,1:J) = array(3:I,1:J)-array(1:I-2,1:J)    ! interior points
-
-    ddx(1,1:J) = 2*(array(2,1:J)-array(  1,1:J))
-    ddx(I,1:J) = 2*(array(I,1:J)-array(I-1,1:J))
-
-    end function ddx
-
-    function ddy(array)
-    implicit double precision (a-h,o-z)
-    double precision::          array(:,:)
-    double precision::          ddy(size(array,dim=1),size(array,dim=2))
-
-    I = size(array,dim=1)
-    J = size(array,dim=2)
-
-    ddy(1:I,2:J-1) = array(1:I,3:J)-array(1:I,1:J-2)    ! interior points
-
-    ddy(1:I,1) = 2*(array(1:I,2)-array(1:I,  1))
-    ddy(1:I,J) = 2*(array(1:I,J)-array(1:I,J-1))
-
-    end function ddy
-!------------------------------------------------------------
 end program sw
 
 !------------------------------------------------------------

the timing on an Intel Core2Duo at 2.16GHz goes from 4s to 2.2s.

So my question is: what rules does GCC apply for inlining? I understand that with -Os one rule is that inlining must not increase the code size, but what happened in the case of channel.f90 with -O3?
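A minimal sketch, assuming a free-form (.f90) source file, that checks the hand-inlined statements from the patch compute the same values as a call to the contained ddx function. The program name check_ddx, the 5x4 array and its values are hypothetical; a 2-D array stands in for the u(:,:,mid) slice, and ddx is adapted with implicit none and explicit declarations instead of implicit double precision (a-h,o-z).

```fortran
! Hypothetical self-check: ddx is adapted from channel.f90; the program
! name, array sizes and test data are made up for illustration.
program check_ddx
  implicit none
  integer, parameter :: M = 5, N = 4
  double precision :: u(M, N), via_call(M, N), inlined(M, N)
  integer :: i, j

  ! Test data with exactly representable values: u(i,j) = i**2 + j
  do j = 1, N
    do i = 1, M
      u(i, j) = dble(i)**2 + dble(j)
    end do
  end do

  ! Version 1: call the contained function, as the unpatched channel.f90 does
  via_call = ddx(u)

  ! Version 2: the same stencil written out by hand, as in the patch
  inlined(2:M-1, :) = u(3:M, :) - u(1:M-2, :)     ! interior points
  inlined(1, :) = 2*(u(2, :) - u(1, :))           ! first row, one-sided
  inlined(M, :) = 2*(u(M, :) - u(M-1, :))         ! last row, one-sided

  ! Stop with an error if any element differs between the two versions
  if (any(via_call /= inlined)) error stop 'ddx call and inlined code differ'
  print '(a)', 'ddx call and hand-inlined code agree'

contains

  ! Contained function from channel.f90: differences along dimension 1
  function ddx(array)
    double precision, intent(in) :: array(:, :)
    double precision :: ddx(size(array, dim=1), size(array, dim=2))
    integer :: ni, nj
    ni = size(array, dim=1)
    nj = size(array, dim=2)
    ddx(2:ni-1, 1:nj) = array(3:ni, 1:nj) - array(1:ni-2, 1:nj)   ! interior points
    ddx(1, 1:nj) = 2*(array(2, 1:nj) - array(1, 1:nj))
    ddx(ni, 1:nj) = 2*(array(ni, 1:nj) - array(ni-1, 1:nj))
  end function ddx

end program check_ddx
```

Whether gfortran actually inlines the ddx call here depends on its heuristics and options, as discussed in this report; the values being compared are the same either way.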


Comment 5 Steven Bosscher 2007-12-03 10:13:16 UTC
Inlining is driven by heuristics.  See ipa-inline.c.  Heuristics cannot be perfect for all applications, of course.  The current tuning of the heuristics is based on SPEC2k scores on Opteron, i.e. mostly for programs written in C and C++.  Maybe for Fortran the current heuristics do not lead to the best possible results.
Comment 6 Dominique d'Humieres 2007-12-03 12:41:15 UTC
From the GCC manual:

-finline-limit=n
... . The default value of n is 600.  ...

This does not seem accurate: ddx and ddy are inlined with n=318 but not with n=317 (compile times of 2.7s and 1.6s, and execution times of 2.5s and 4.1s, respectively).
Note that with the default settings the compile and execution times are 1.6s and 4.1s, the same as with n=317, where ddx and ddy are not inlined. For the patched source they are 1.7s and 2.2s, respectively.

Comment 7 Francois-Xavier Coudert 2009-05-02 14:31:08 UTC
It's now working with -fwhole-file.
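To illustrate the case the original summary excluded, a minimal sketch of the program from comment 3 with subroutine a moved out of fred into a separate external subroutine in the same free-form file. Per this report, such a call was not inlined before -fwhole-file, presumably because each program unit was translated on its own; with -fwhole-file (target milestone 4.5.0 above) the whole file is processed together. Whether the call is actually inlined still depends on the optimization level and heuristics.

```fortran
! Sketch: comment 3's program with subroutine a turned into an external
! procedure in the same source file instead of a contained one.
program fred
  implicit none
  integer :: j
  external :: a        ! a is an external procedure, not contained in fred
  j = 0
  call a(j)
  print *, j           ! list-directed output of the updated value
end program fred

! External subroutine: a separate program unit in the same source file
subroutine a(i)
  implicit none
  integer, intent(inout) :: i
  i = i + 2
end subroutine a
```

To compare, compile the file with -O2 -S, with and without -fwhole-file, and look for a call to a_ (gfortran's default symbol name for the external a) in the assembler, as suggested in comment 3. Later GCC releases enabled whole-file mode by default, so the flag may be unnecessary there.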