This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [Patch] C++ inlining heuristics changed (3.0.1)


On Wed, Aug 22, 2001 at 04:26:18PM +0200, Kurt Garloff wrote:
>  ^ (Max size of inlineable fn)
>  |
>  |----------------.                             <-- (b)
>  |                 .
>  |                  .
>  |                   .  slope =: (d)
>  |                    .
>  |                     .
>  |                      .
>  |                       ---------------------  <-- (c)
>  +------------------------------------------------->
>                                     (Size of inlined code:
> 		   ^		     Already inlined + current)
>                  |
> 		  (a)
> 
> This way, we would avoid the theoretical infinite inlining.
> I'll start to play with this a bit.
> I guess, a good starting value for (a) is 600, for (b) 300 and I'd choose
> the slope (d) to be 0.1, i.e. we will reach the inlining limit just before
> 3300 and only inline extremely small functions afterwards.
> 
> Does this make sense?
> Would a patch like this be acceptable for 3.0.2?

OK; I did a lot of benchmarking and
- left (a) at 600
- set (b) to (a)/2 (as before)
- set (c) to 130
- used a slope (d) of -1/16.

Some comments:
* Increasing (a) over 600 only increased compile time significantly
* Decreasing (a) below 500 started to yield worse results
* The optimum of 130 for (c) was found by a lot of compilations:
  Up (and including) to 130 (corresponding to -finline-limit=260), 
  the code size shrinked (!) when increasing this number. So, the
  smallest executables (on ix86) are produced for -finline-limit=260
  and this patch (see numbers below)
* If (b) is smaller than (c), just the (b) limit will apply
* Just to be really sure not to get into trouble (infinite inlining
  with following OOM), I also cut the minimum inline limit (c) after we
  reached 64 * (a).
* (d) has also been experimentally determined.

> Please compare to the numbers reported by myself in
> http://gcc.gnu.org/ml/gcc/2001-08/msg00919.html
>
> inline   compile     size      size       spec      spec
> limit    time(u)   double      cplx     double      cplx
>
>   200     1:38      71764     80365      0.814     0.517
>   400     2:07      82612     89308      0.776     0.562
>   600     2:41      88468    101444      0.870     0.846
>  1000     3:18      87788    103412      0.869     0.844
>  2000     3:49      89148    110732      0.873     0.847
>  5000     3:58      89148    109852      0.872     0.845
> 10000     3:55      89148    109852      0.872     0.846
>
> Just for comparison:
> 2.95.3    3:35      88364    123636      0.912     0.846
> 3.0.0     3:48      89488    109872      0.873     0.800
> 3.0.1     2:50      95620        -       0.320         -
>
> The results look very nice, IMHO.

Here the new results:
 
   200      1:38      71404     81796      0.877     0.519
   260      1:38      68636     73276      0.877     0.610
   300      1:55      77212     81460      0.877     0.651
   400      2:07      79108     82492      0.805     0.637
   500      2:28      84228     97924      0.870     0.839
   600      2:49      83772     98108      0.869     0.845
  1000      3:24      87348    104156      0.869     0.844
  2000      3:37      89148    109924      0.872     0.845
  
Those results look even better: Executables are slightly smaller and run at
slightly higher speed than with the first patch and much better than plain
3.0.1.

I'd like to have this patch included in 3.0.2. (Unless somebody can come
with an inliner for 3.0.2 that does not cut inlining at leaves but at the
trunk of the call tree.)

Complete patch (against 3.0.1) attached.

The copyright for this little contribution is herewith assigned to the FSF.
I did bootstrap the compiler on two machines (both ix86). If more
architectures are required, please tell me; I could check on some SuSE
machines ... 
OTOH, I have difficulties to imagine how this patch could break
bootstrapping, first because it's trivial code and second because it's 
C++ only.
make info and make dvi both succeeded.

Running the testuite on gum07:
garloff@gum07:/usr/src/gcc-2.96/i686-pc-linux-gnu $ make check    
[...]
WARNING: Couldn't find tool init file
Test Run By garloff on Wed Aug 22 21:32:07 2001
Native configuration is i686-pc-linux-gnu

                === libstdc++-v3 tests ===
		
Schedule of variations:
    unix
    
    Running target unix
    Using /usr/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/src/gcc-2.96/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /usr/src/gcc-2.96/libstdc++-v3/testsuite/libstdc++-v3.dg/dg.exp ...
ERROR: tcl error sourcing
/usr/src/gcc-2.96/libstdc++-v3/testsuite/libstdc++-v3.dg/dg.exp.
ERROR: couldn't compile regular expression pattern: quantifier operand invalid
    while executing
"regsub "^$srcdir/?" $prog "" name"
    (procedure "dg-test" line 37)
    invoked from within
"dg-test $testcase $flags ${default-extra-flags}"
[...]

Hmmm, is this box too old?
		

Please apply!

Regards,
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)
--- ./gcc/cp/optimize.c.orig	Thu Jun  7 00:50:22 2001
+++ ./gcc/cp/optimize.c	Wed Aug 22 20:20:26 2001
@@ -621,7 +621,23 @@
      inline_data *id;
 {
   int inlinable;
-
+  /* garloff@suse.de, 2001-08-22: The C++ inline throttling has
+   * bad side effects, sometimes, in our top-down inlining: we
+   * may end up inlining the trunk and not the leaves of the call tree,
+   * because we inlined too much before. Poor performance is the result.
+   * Real solution is bottom up inlining; here we just use a better
+   * heuristics: don't cut off inlining completely, but drop it off
+   * slowly. The further we are beyond the max limit the smaller the
+   * function needs to be to still get inlined.
+   */
+  int max_inline_single = MAX_INLINE_INSNS/2;
+  int max_inline_recursive = MAX_INLINE_INSNS;
+  /* Functions that small can always be inlined:
+   * We have to trade arg saving and the call and ret insns against 
+   * the function length itself, here.
+   */
+  int min_inline = 130;
+  
   /* If we've already decided this function shouldn't be inlined,
      there's no need to check again.  */
   if (DECL_UNINLINABLE (fn))
@@ -641,7 +657,7 @@
   else if (varargs_function_p (fn))
     ;
   /* We can't inline functions that are too big.  */
-  else if (DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS)
+  else if (DECL_NUM_STMTS (fn) * INSNS_PER_STMT > max_inline_single)
     ;
   /* All is well.  We can inline this function.  Traditionally, GCC
      has refused to inline functions using alloca, or functions whose
@@ -655,11 +671,22 @@
 
   /* Even if this function is not itself too big to inline, it might
      be that we've done so much inlining already that we don't want to
-     risk inlining any more.  */
-  if ((DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT 
-      > MAX_INLINE_INSNS)
-    inlinable = 0;
-
+     risk inlining any more. 
+     Only if it's smaller tham min_inline, we still go ahead, unless
+     we're really far beyond the recursive inlining limit. */
+  if ((DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT
+       > max_inline_recursive 
+      && ( DECL_NUM_STMTS (fn) * INSNS_PER_STMT > min_inline
+	   || (DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT
+	   > max_inline_recursive * 64 ) ) {
+    /* Use a linear function with a slope of -0.0625
+     * we could also use an int approx of sqrt or similar things */
+    signed int max_curr = max_inline_single
+	- ( (DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT
+	    - max_inline_recursive ) / 16;
+    if ((signed int)(DECL_NUM_STMTS (fn) * INSNS_PER_STMT) > max_curr)
+      inlinable = 0;
+  }
   /* We can inline a template instantiation only if it's fully
      instantiated.  */
   if (inlinable
--- ./gcc/cp/ChangeLog.orig	Mon Aug 20 20:48:06 2001
+++ ./gcc/cp/ChangeLog	Wed Aug 22 20:32:01 2001
@@ -1,3 +1,10 @@
+2001-08-22  Kurt Garloff  <garloff@suse.de>
+
+	* optimize.c (inlinable_function_p): Change heuristics of inlining
+	heuristics: Rather than allow one single function to exhaust the
+	limit, allow only half way. Afterwards don't cut abruptly, but
+	get more and more restrictive until a minimum size.
+
 2001-08-19  Release Manager
 
 	* GCC 3.0.1 Released.
--- ./gcc/cp/NEWS.orig	Wed Aug 22 20:37:43 2001
+++ ./gcc/cp/NEWS	Wed Aug 22 20:37:14 2001
@@ -1,3 +1,13 @@
+*** Changes in GCC 3.0.2:
+
+* The meaning of the -finline-limit changed for C++: We allow single
+  functions up to half the size. Once we have exhausted the size by
+  recursive inlining, we start to decrease the acceptable size for
+  inlining slowly up to until we're at 8 times the number. Then only
+  very small functions are still inlined.
+  Results in much improved C++ runtime performance (with about the
+  same compile time).
+
 *** Changes in GCC 3.0.1:
 
 * -fhonor-std and -fno-honor-std have been removed. -fno-honor-std was
--- ./gcc/doc/gcc.info-4.orig	Mon Aug 20 21:07:55 2001
+++ ./gcc/doc/gcc.info-4	Wed Aug 22 20:46:12 2001
@@ -732,6 +732,14 @@
      presumably means slower programs).  This option is particularly
      useful for programs that use inlining heavily such as those based
      on recursive templates with C++.
+     Note that for C++, the rules are slightly more complicated. The
+     maximum size of a function that may be inlined is half of N. For
+     recursive inlining, we start to decrease the acceptable size for
+     inlining once we have inlined N pseudo instructions until we reach
+     8 times N. Then only very small functions may still be inlined.
+     Experimenting with this parameter may be useful to improve
+     runtime performance (e.g. N=1500) or to decrease the size of the
+     executable (e.g. N=260).
 
      _Note:_ pseudo instruction represents, in this particular context,
      an abstract measurement of function's size.  In no way, it

PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]