This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
[Patch] inlining tweaking for 2.95.3
- To: gcc at gcc dot gnu dot org
- Subject: [Patch] inlining tweaking for 2.95.3
- From: Kurt Garloff <kurt at garloff dot de>
- Date: Thu, 23 Aug 2001 12:34:44 +0200
- Cc: Bernd Schmidt <bernds at redhat dot com>
- Organization: TU/e(NL), SuSE(DE)
Hi,
encouraged by the success of tweaking the 3.0.1 tree inliner (C++),
http://gcc.gnu.org/ml/gcc/2001-08/msg01087.html
I had a look at 2.95.3. We only have the normal RTL inliner here, in
integrate.c.
As the idea of giving a preference to leaf functions seems to be a good
one, I give them a preference by a factor of two (max_insns).
This way, the normal threshold can be set a bit lower, resulting in reduced
memory consumption and lowering the chance of consuming ridiculous amounts
of memory because of excessive inlining.
Then I played with the number and found that an astonishingly low value
is enough to yield maximum performance. Compile time is almost halved for my
tests.
Here are the benchmark results; this time I additionally benchmarked the
tests with std::complex<double> type.
All tests on iPIII-700 (Coppermine) under Linux 2.4.7
max-insn   compile      stripped sizes (B)        TBCI bench specs
          time (u)  double    cplx  std::c  double    cplx  std::c
100:          1:40   76868  100532   82756   0.919   0.816   0.698
130:          1:46   80644  102724   85148   0.921   0.847   0.698
150:          1:47   82484  102620   86460   0.916   0.855   0.782
180:          1:58   82300  102220  100372   0.918   0.839   0.779
200:          1:53   82332  104252  100644   0.923   0.846   0.779
240:          2:08   82364  111268  102588   0.922   0.848   0.781
240 -O3:      2:11   84660  111364  102436   0.904   0.849   0.776
300:          2:06   84572  114316  102660   0.921   0.844   0.781
400:          2:45   88364  114220  104828   0.920   0.846   0.783
500:          2:44   88364  114180  109644   0.922   0.851   0.780
600:          3:28   88364  122004  109636   0.917   0.847   0.780
For reference:
2.95.3:       3:37   88364  123636  109636   0.922   0.784   0.780
3.0.0: 3:48 89488 109872 0.873 0.800
3.0.1: 2:50 95620 0.320
3.0.1/wp:     2:49   83772   98108   93396   0.869   0.845   0.937
From those numbers, I'd go for 200. In the patch (attached), I chose 240
to be a bit safer against regressions from 2.95.3. The optimum probably
depends on the platform, and as we're coming from 10000
(a ridiculous value, BTW), I chose a value slightly above the one found
optimal in my tests.
If you compare the max-insns number to 3.0.1 with my patch, note that you
should multiply the 2.95.3 numbers by 2 to have a similar effect.
What do we learn?
* 2.95.3 performs amazingly well with the patch.
* I came up with a simple approach to make 3.0.1 inlining heuristics perform
much better. Maybe we can even go back to more simplistic code:
Just pick a rather low value (300) and make sure we give a bonus to the
leaves, as we do in 2.95.3 with my patch.
Additionally, I'd keep the throttling to prevent infinite recursion when
inlining, but start to use it much, much later than with plain 3.0.1.
(Before my patch, the limit for a single function was the same as the
recursive limit, which yielded very poor results.)
* The INTEGRATE_THRESHOLD seems to work well on 2.95.3; maybe we could
compute the min_insns in the tree inliner of 3.0.1 with a similar formula.
I would be delighted to get feedback on this patch.
I'd expect code which uses a lot of inlining (as most C++ code does) to
compile significantly faster. For example, I'd expect KDE to compile in
half the time and to have somewhat smaller executables.
Is anybody able to find runtime performance pessimizations?
And, yeah, I would appreciate seeing this patch in 2.95.4.
Will there be one?
Regards,
--
Kurt Garloff <kurt@garloff.de> [Eindhoven, NL]
Physics: Plasma simulations <K.Garloff@Phys.TUE.NL> [TU Eindhoven, NL]
Linux: SCSI, Security <garloff@suse.de> [SuSE Nuernberg, DE]
(See mail header or public key servers for PGP2 and GPG public keys.)
diff -u gcc-2.95.3.orig/gcc/ChangeLog gcc-2.95.3/gcc/ChangeLog
--- gcc-2.95.3.orig/gcc/ChangeLog Fri Mar 16 13:52:02 2001
+++ gcc-2.95.3/gcc/ChangeLog Thu Aug 23 11:38:34 2001
@@ -1,3 +1,11 @@
+2001-08-23 Kurt Garloff <kurt@garloff.de>
+
+ * integrate.c (function_cannot_inline_p): Reduce max size for
+ inlining from 10000 to 240, twice this value (i.e. 480) for leaf
+ functions. Round up in INTEGRATE_THRESHOLD.
+ * toplev.c (rest_of_compilation): Set current_function_is_leaf
+ for function_cannot_inline_p
+
Fri Mar 16 12:46:19 GMT 2001 Bernd Schmidt (bernds@redhat.com)
* gcc-2.95.3 Released.
Only in gcc-2.95.3/gcc: ChangeLog~
diff -u gcc-2.95.3.orig/gcc/integrate.c gcc-2.95.3/gcc/integrate.c
--- gcc-2.95.3.orig/gcc/integrate.c Mon Apr 26 01:35:12 1999
+++ gcc-2.95.3/gcc/integrate.c Thu Aug 23 11:38:47 2001
@@ -53,10 +53,10 @@
This is overridden on RISC machines. */
#ifndef INTEGRATE_THRESHOLD
/* Inlining small functions might save more space then not inlining at
- all. Assume 1 instruction for the call and 1.5 insns per argument. */
+ all. Assume 2 instruction for the call/ret and 1.5 insns per argument. */
#define INTEGRATE_THRESHOLD(DECL) \
(optimize_size \
- ? (1 + (3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
+ ? (2 + (1 + 3 * list_length (DECL_ARGUMENTS (DECL))) / 2) \
: (8 * (8 + list_length (DECL_ARGUMENTS (DECL)))))
#endif
@@ -91,10 +91,12 @@
function. Increasing values mean more agressive inlining.
This affects currently only functions explicitly marked as
inline (or methods defined within the class definition for C++).
- The default value of 10000 is arbitrary but high to match the
- previously unlimited gcc capabilities. */
+ The default value of 240 is much lower than before and
+ matches better with the 3.0.1 numbers.
+ We allow double the size for leaf functions.
+ */
-int inline_max_insns = 10000;
+int inline_max_insns = 240;
/* Returns the Ith entry in the label_map contained in MAP. If the
@@ -154,6 +156,10 @@
if (current_function_cannot_inline)
return current_function_cannot_inline;
+ /* Prefer leaf functions */
+ if (current_function_is_leaf)
+ max_insns *= 2;
+
/* If its not even close, don't even look. */
if (get_max_uid () > 3 * max_insns)
return N_("function too large to be inline");
@@ -228,7 +234,7 @@
result = DECL_RTL (DECL_RESULT (fndecl));
if (result && GET_CODE (result) == PARALLEL)
return N_("inline functions not supported for this return value type");
-
+
return 0;
}
Only in gcc-2.95.3/gcc: integrate.c~
diff -u gcc-2.95.3.orig/gcc/toplev.c gcc-2.95.3/gcc/toplev.c
--- gcc-2.95.3.orig/gcc/toplev.c Thu Aug 23 07:51:53 2001
+++ gcc-2.95.3/gcc/toplev.c Thu Aug 23 09:41:44 2001
@@ -3623,6 +3623,7 @@
if (DECL_INLINE (decl) || flag_inline_functions)
TIMEVAR (integration_time,
{
+ current_function_is_leaf = leaf_function_p ();
lose = function_cannot_inline_p (decl);
if (lose || ! optimize)
{
Only in gcc-2.95.3/gcc: toplev.c~