This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [Patch] inlining tweaking for 2.95.3


On Thu, Aug 23, 2001 at 12:34:44PM +0200, Kurt Garloff wrote:
> As the idea that we want to give a preference to leaves seems to be a good
> one, I give them a preference by a factor of two (max_insns).
> This way, the normal threshold can be set a bit lower, resulting in reduced
> memory consumption and lowering the chance of consuming ridiculous amounts
> of memory because of excessive inlining.
> 
> Then, I was playing with the number and found an astonishingly low number
> needed to yield maximum performance. Compile time is almost halved for my
> tests.

Unfortunately, after benchmarking many more apps, I found runtime performance
ranging from 20% worse to 5% better; I think we want to avoid the 20% worse
case within the 2.95 tree. (Well, 3.0.1 did yield degradations of more than a
factor of 2 compared to 3.0, but that's another story.)

The patch should be included, but we may want to discuss the exact default
value for the inline-limit:
* With 400, I still get considerably better compile time with slightly
  smaller binaries; runtime performance varies from 5% worse to 5%
  better. (The average seems to be slightly worse, about 1%.)
* With 1000, we would almost restore the 2.95.3 behaviour, except for
  crazily huge inline functions. We want to avoid those anyway, both to
  improve runtime performance and to avoid consuming excessive resources.
  (I do consider this a bug fix; I have a little C++ test program that
  makes G++-2.95.3 consume > 0.5GB of memory with optimization. Without, it
  gets compiled in a few seconds with only tens of MB of memory consumption.)

Note that code that does not do heavy inlining, such as most C code, is not
affected at all by the patch, unless you use -finline-functions (which -O3
implies).
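
To make the effect of the limit concrete, here is a minimal sketch of the
quick size rejection as it appears in the attached hunk for integrate.c.
It is an illustration only: too_large_to_inline, insn_uids and is_leaf are
made-up names standing in for function_cannot_inline_p, get_max_uid () and
current_function_is_leaf, and the other checks of the real function are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the early size check in the patched
   function_cannot_inline_p.  Returns a reason string if the function
   is rejected, or NULL if it passes this particular check.

   insn_uids    models get_max_uid ()            (assumption)
   is_leaf      models current_function_is_leaf  (assumption)
   inline_limit models inline_max_insns, i.e. the -finline-limit value  */
static const char *
too_large_to_inline (int insn_uids, int is_leaf, int inline_limit)
{
  int max_insns = inline_limit;

  assert (inline_limit > 0);

  /* Prefer leaf functions: the patch multiplies their limit by 3.  */
  if (is_leaf)
    max_insns *= 3;

  /* If it is not even close, do not look any further.  */
  if (insn_uids > 3 * max_insns)
    return "function too large to be inline";

  return NULL;
}
```

With the proposed default of 400, this early check rejects a non-leaf
function once the uid count exceeds 1200, and a leaf function once it
exceeds 3600.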

> I would be delighted to get feedback on this patch.
> I'd e.g. expect code which uses a lot of inlining (as most C++ code does) to
> compile significantly faster. I'd expect KDE to compile in half of the time
> e.g. and have slightly smaller executables.

Now, I was too optimistic here, I guess. But something like 15% improvement
still seems reasonable to me.

> Is anybody able to find runtime performance pessimizations?

I'd like to get reports, especially with the second version of the patch.
If you find a pessimization, what is the effect of changing X in
-finline-limit-X to e.g. 200, 260, 320, 500, 750, 1000, 2000?

New patch (defaulting to 400) attached.
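
Besides the new default, the patch also retunes INTEGRATE_THRESHOLD. The
following sketch merely restates the old and new macro bodies from the
integrate.c hunk as plain C functions so the numbers can be compared; nargs
stands in for list_length (DECL_ARGUMENTS (DECL)), and the function names
are invented for this illustration.

```c
/* Old INTEGRATE_THRESHOLD from 2.95.3, written as a function.
   optimize_size is nonzero when optimizing for size;
   nargs is the number of declared arguments (assumed >= 0).  */
static int
old_integrate_threshold (int optimize_size, int nargs)
{
  return optimize_size
	 ? (1 + (3 * nargs) / 2)
	 : (8 * (8 + nargs));
}

/* New INTEGRATE_THRESHOLD from the attached patch: 2 insns for the
   call/ret plus about 1.5 per argument when optimizing for size, and a
   slightly lower base otherwise.  Integer division truncates, exactly
   as in the macro.  */
static int
new_integrate_threshold (int optimize_size, int nargs)
{
  return optimize_size
	 ? (2 + (1 + 3 * nargs) / 2)
	 : (8 * (7 + nargs));
}
```

For a two-argument function, for example, the normal threshold drops from
80 to 72, while the size-optimizing threshold rises from 4 to 5.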

> And, yeah, I would appreciate seeing this patch in 2.95.4.

This one should be ready for inclusion, I believe.

> Will there be one?

?
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)
diff -u gcc-2.95.3.orig/gcc/ChangeLog gcc-2.95.3/gcc/ChangeLog
--- gcc-2.95.3.orig/gcc/ChangeLog	Fri Mar 16 13:52:02 2001
+++ gcc-2.95.3/gcc/ChangeLog	Fri Aug 24 07:57:47 2001
@@ -1,3 +1,11 @@
+2001-08-23  Kurt Garloff  <kurt@garloff.de>
+	
+	* integrate.c (function_cannot_inline_p): Reduce max size for
+	inlining from 10000 to 400, triple this value (i.e. 1200) for
+	leaf functions. Fine tune INTEGRATE_THRESHOLD.
+	* toplev.c (rest_of_compilation): Set current_function_is_leaf 
+	for function_cannot_inline_p
+
 Fri Mar 16 12:46:19 GMT 2001 Bernd Schmidt  (bernds@redhat.com)
 
 	* gcc-2.95.3 Released.
Only in gcc-2.95.3/gcc: ChangeLog~
diff -u gcc-2.95.3.orig/gcc/integrate.c gcc-2.95.3/gcc/integrate.c
--- gcc-2.95.3.orig/gcc/integrate.c	Mon Apr 26 01:35:12 1999
+++ gcc-2.95.3/gcc/integrate.c	Fri Aug 24 07:51:35 2001
@@ -53,11 +53,11 @@
    This is overridden on RISC machines.  */
 #ifndef INTEGRATE_THRESHOLD
 /* Inlining small functions might save more space then not inlining at
-   all.  Assume 1 instruction for the call and 1.5 insns per argument.  */
+   all.  Assume 2 instructions for the call/ret and 1.5 insns per argument.  */
 #define INTEGRATE_THRESHOLD(DECL) \
   (optimize_size \
-   ? (1 + (3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
-   : (8 * (8 + list_length (DECL_ARGUMENTS (DECL)))))
+   ? (2 + (1 + 3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
+   : (8 * (7 + list_length (DECL_ARGUMENTS (DECL)))))
 #endif
 
 static rtx initialize_for_inline	PROTO((tree, int, int, int, int));
@@ -91,10 +91,12 @@
    function.  Increasing values mean more agressive inlining.
    This affects currently only functions explicitly marked as
    inline (or methods defined within the class definition for C++).
-   The default value of 10000 is arbitrary but high to match the
-   previously unlimited gcc capabilities.  */
+   The default value of 400 is much lower than before and
+   matches better with the 3.0.1 numbers.
+   We allow triple the size for leaf functions.
+ */
 
-int inline_max_insns = 10000;
+int inline_max_insns = 400;
 
 
 /* Returns the Ith entry in the label_map contained in MAP.  If the
@@ -154,6 +156,10 @@
   if (current_function_cannot_inline)
     return current_function_cannot_inline;
 
+  /* Prefer leaf functions */
+  if (current_function_is_leaf)
+    max_insns *= 3;
+
   /* If its not even close, don't even look.  */
   if (get_max_uid () > 3 * max_insns)
     return N_("function too large to be inline");
Only in gcc-2.95.3/gcc: integrate.c~
Common subdirectories: gcc-2.95.3.orig/gcc/intl and gcc-2.95.3/gcc/intl
diff -u gcc-2.95.3.orig/gcc/invoke.texi gcc-2.95.3/gcc/invoke.texi
--- gcc-2.95.3.orig/gcc/invoke.texi	Thu Jan 25 15:03:17 2001
+++ gcc-2.95.3/gcc/invoke.texi	Fri Aug 24 07:55:49 2001
@@ -2351,11 +2351,15 @@
 inline (ie marked with the inline keyword or defined within the class 
 definition in c++).  @var{n} is the size of functions that can be inlined in 
 number of pseudo instructions (not counting parameter handling).  The default
-value of n is 10000.  Increasing this value can result in more inlined code at
-the cost of compilation time and memory consumption.  Decreasing usually makes
-the compilation faster and less code will be inlined (which presumably 
-means slower programs).  This option is particularly useful for programs that 
-use inlining heavily such as those based on recursive templates with c++.
+value of n is 400.  Increasing this value (to e.g. 750) can result in more
+inlined code at the cost of compilation time and memory consumption and will
+result in larger executables.  Too much inlining can decrease performance
+again because of the limited size of your CPU's instruction cache.
+Decreasing (e.g. to 150) usually makes the compilation faster and less code
+will be inlined (which presumably means smaller executables but slower
+programs).  This option is particularly useful for programs that use
+inlining heavily such as those based on recursive templates with c++.
+Note that functions at the leaf of the call tree get a bonus.
 
 @emph{Note:} pseudo instruction represents, in this particular context, an
 abstract measurement of function's size.  In no way, it represents a count
Only in gcc-2.95.3/gcc: invoke.texi~
diff -u gcc-2.95.3.orig/gcc/toplev.c gcc-2.95.3/gcc/toplev.c
--- gcc-2.95.3.orig/gcc/toplev.c	Thu Aug 23 07:51:53 2001
+++ gcc-2.95.3/gcc/toplev.c	Thu Aug 23 09:41:44 2001
@@ -3623,6 +3623,7 @@
       if (DECL_INLINE (decl) || flag_inline_functions)
 	TIMEVAR (integration_time,
 		 {
+		   current_function_is_leaf = leaf_function_p ();
 		   lose = function_cannot_inline_p (decl);
 		   if (lose || ! optimize)
 		     {
Only in gcc-2.95.3/gcc: toplev.c~
