This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



[Patch] inlining tweaking for 2.95.3


Hi,

encouraged by the success of tweaking the 3.0.1 tree inliner (C++),
http://gcc.gnu.org/ml/gcc/2001-08/msg01087.html
I had a look at 2.95.3. We only have the normal RTL inliner here, in
integrate.c.

As giving a preference to leaf functions seems to be a good idea, I give
them a preference by a factor of two (max_insns).
This way, the normal threshold can be set a bit lower, resulting in reduced
memory consumption and lowering the chance of consuming ridiculous amounts
of memory because of excessive inlining.
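
To make the leaf preference concrete, here is a minimal sketch of the size check as the attached patch changes it. The helper names effective_max_insns and rejected_early are hypothetical and do not exist in GCC; the real logic lives in function_cannot_inline_p and uses get_max_uid () and current_function_is_leaf.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: the insn limit applied to a
   function, given the base limit and whether the function is a leaf.
   Leaves get twice the base limit.  */
static int
effective_max_insns (int max_insns, bool is_leaf)
{
  return is_leaf ? 2 * max_insns : max_insns;
}

/* Hypothetical helper mirroring the early "not even close" rejection,
   which compares the maximum insn uid against 3 * max_insns.  */
static bool
rejected_early (int max_uid, int max_insns, bool is_leaf)
{
  return max_uid > 3 * effective_max_insns (max_insns, is_leaf);
}
```

With a base limit of 240, a leaf function is allowed 480 insns, and the early rejection moves from a uid above 720 to a uid above 1440.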

Then I played with the number and found that an astonishingly low value is
enough to yield maximum performance. Compile time is almost halved for my
tests.

Here are the benchmark results; this time I additionally benchmarked the
tests with std::complex<double> type.
All tests on iPIII-700 (Coppermine) under Linux 2.4.7
                    
max-insn  compile    stripped sizes (B)         TBCI bench specs
          time (u)  double    cplx  std::c    double    cplx  std::c
 100:         1:40   76868  100532   82756     0.919   0.816   0.698
 130:         1:46   80644  102724   85148     0.921   0.847   0.698
 150:         1:47   82484  102620   86460     0.916   0.855   0.782
 180:         1:58   82300  102220  100372     0.918   0.839   0.779
 200:         1:53   82332  104252  100644     0.923   0.846   0.779
 240:         2:08   82364  111268  102588     0.922   0.848   0.781
 240 -O3:     2:11   84660  111364  102436     0.904   0.849   0.776
 300:         2:06   84572  114316  102660     0.921   0.844   0.781
 400:         2:45   88364  114220  104828     0.920   0.846   0.783
 500:         2:44   88364  114180  109644     0.922   0.851   0.780
 600:         3:28   88364  122004  109636     0.917   0.847   0.780

For reference:

2.95.3:       3:37   88364  123636  109636     0.922   0.784   0.780
3.0.0:        3:48   89488  109872             0.873   0.800
3.0.1:        2:50   95620                     0.320
3.0.1/wp:     2:49   83772   98108   93396     0.869   0.845   0.937



From those numbers, I'd go for 200. In the patch (attached), I chose 240
to be a bit safer against regressions from 2.95.3. The optimum is
probably dependent on the platform, and as we're coming from 10000
(ridiculous value, BTW), I chose a value slightly above the one found
optimal during my tests.

If you compare the max-insns number to 3.0.1 with my patch, note that you
should multiply the 2.95.3 numbers by 2 to have a similar effect.

What do we learn?
* 2.95.3 performs amazingly well with the patch.
* I came up with a simple approach to make the 3.0.1 inlining heuristics
  perform much better. Maybe we can even go back to more simplistic code:
  Just pick a rather low value (300) and make sure we give a bonus to the
  leaves, as we do in 2.95.3 with my patch.
  Additionally, I'd keep the throttling to prevent infinite recursion when
  inlining, but start to use it much, much later than with plain 3.0.1.
  (Before my patch, the limit for a single function was the same as the
  recursive limit, which yielded very poor results.)
* The INTEGRATE_THRESHOLD seems to work well on 2.95.3; maybe we could
  compute the min_insns in the tree inliner of 3.0.1 with a similar formula.
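
For illustration, the INTEGRATE_THRESHOLD formulas from the attached patch can be written as plain functions of the argument count. This is only a sketch: the function names are hypothetical, and nargs stands in for list_length (DECL_ARGUMENTS (DECL)).

```c
/* Hypothetical stand-ins for INTEGRATE_THRESHOLD; nargs replaces
   list_length (DECL_ARGUMENTS (DECL)).  All divisions are integer
   divisions, as in the macro.  */

/* optimize_size case before the patch: 1 insn for the call plus
   1.5 insns per argument, rounded down.  */
static int
threshold_size_old (int nargs)
{
  return 1 + (3 * nargs) / 2;
}

/* optimize_size case with the patch: 2 insns for call/ret plus
   1.5 insns per argument, rounded up.  */
static int
threshold_size_new (int nargs)
{
  return 2 + (1 + 3 * nargs) / 2;
}

/* Case without optimize_size, unchanged by the patch.  */
static int
threshold_speed (int nargs)
{
  return 8 * (8 + nargs);
}
```

For a function with three arguments, the size-optimizing threshold goes from 5 to 7 insns, while the threshold used otherwise stays at 88.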

I would be delighted to get feedback on this patch.
I'd expect, for example, code which uses a lot of inlining (as most C++
code does) to compile significantly faster. I'd expect KDE to compile in
half the time and to have somewhat smaller executables.
Is anybody able to find runtime performance pessimizations?

And, yeah, I would appreciate seeing this patch in 2.95.4.
Will there be one?

Regards,
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)
diff -u gcc-2.95.3.orig/gcc/ChangeLog gcc-2.95.3/gcc/ChangeLog
--- gcc-2.95.3.orig/gcc/ChangeLog	Fri Mar 16 13:52:02 2001
+++ gcc-2.95.3/gcc/ChangeLog	Thu Aug 23 11:38:34 2001
@@ -1,3 +1,11 @@
+2001-08-23  Kurt Garloff  <kurt@garloff.de>
+	
+	* integrate.c (function_cannot_inline_p): Reduce max size for
+	inlining from 10000 to 240, twice this value (i.e. 480) for leaf
+	functions. Round up in INTEGRATE_THRESHOLD.
+	* toplev.c (rest_of_compilation): Set current_function_is_leaf 
+	for function_cannot_inline_p
+
 Fri Mar 16 12:46:19 GMT 2001 Bernd Schmidt  (bernds@redhat.com)
 
 	* gcc-2.95.3 Released.
Only in gcc-2.95.3/gcc: ChangeLog~
diff -u gcc-2.95.3.orig/gcc/integrate.c gcc-2.95.3/gcc/integrate.c
--- gcc-2.95.3.orig/gcc/integrate.c	Mon Apr 26 01:35:12 1999
+++ gcc-2.95.3/gcc/integrate.c	Thu Aug 23 11:38:47 2001
@@ -53,10 +53,10 @@
    This is overridden on RISC machines.  */
 #ifndef INTEGRATE_THRESHOLD
 /* Inlining small functions might save more space then not inlining at
-   all.  Assume 1 instruction for the call and 1.5 insns per argument.  */
+   all.  Assume 2 instructions for the call/ret and 1.5 insns per argument.  */
 #define INTEGRATE_THRESHOLD(DECL) \
   (optimize_size \
-   ? (1 + (3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
+   ? (2 + (1 + 3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
    : (8 * (8 + list_length (DECL_ARGUMENTS (DECL)))))
 #endif
 
@@ -91,10 +91,12 @@
    function.  Increasing values mean more agressive inlining.
    This affects currently only functions explicitly marked as
    inline (or methods defined within the class definition for C++).
-   The default value of 10000 is arbitrary but high to match the
-   previously unlimited gcc capabilities.  */
+   The default value of 240 is much lower than before and
+   matches better with the 3.0.1 numbers.
+   We allow double the size for leaf functions.
+ */
 
-int inline_max_insns = 10000;
+int inline_max_insns = 240;
 
 
 /* Returns the Ith entry in the label_map contained in MAP.  If the
@@ -154,6 +156,10 @@
   if (current_function_cannot_inline)
     return current_function_cannot_inline;
 
+  /* Prefer leaf functions */
+  if (current_function_is_leaf)
+    max_insns *= 2;
+
   /* If its not even close, don't even look.  */
   if (get_max_uid () > 3 * max_insns)
     return N_("function too large to be inline");
@@ -228,7 +234,7 @@
   result = DECL_RTL (DECL_RESULT (fndecl));
   if (result && GET_CODE (result) == PARALLEL)
     return N_("inline functions not supported for this return value type");
-
+	
   return 0;
 }
 
Only in gcc-2.95.3/gcc: integrate.c~
diff -u gcc-2.95.3.orig/gcc/toplev.c gcc-2.95.3/gcc/toplev.c
--- gcc-2.95.3.orig/gcc/toplev.c	Thu Aug 23 07:51:53 2001
+++ gcc-2.95.3/gcc/toplev.c	Thu Aug 23 09:41:44 2001
@@ -3623,6 +3623,7 @@
       if (DECL_INLINE (decl) || flag_inline_functions)
 	TIMEVAR (integration_time,
 		 {
+		   current_function_is_leaf = leaf_function_p ();
 		   lose = function_cannot_inline_p (decl);
 		   if (lose || ! optimize)
 		     {
Only in gcc-2.95.3/gcc: toplev.c~
