This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



--param inline-unit-growth tweak


Hi,
the inliner currently limits program size growth to PARAM_INLINE_UNIT_GROWTH% of
the original program size, measured before the inliner is invoked.  Unfortunately
this value is not very well defined: with early inlining we do a lot of inlining
that is not included in this number; additionally we have mandatory inlining via
always_inline and the flatten construct, which we don't want to block inlining
of small functions; and finally inlining of very small functions reduces the
estimated program size a lot.

In fact for tramp3d, early inlining reduces the estimates about 5 times.  This
includes all inlining of functions smaller than the expected call overhead.  In
addition to this we manage to devirtualize/constant propagate calls via hooks,
and the real inliner then finds additional small functions to inline.  This
accounts for about another 15% of program size (this is actually a surprise to
me - I would like to see some kind of evidence that this indeed all comes from
devirtualized calls, but it ought to).

As a result, setting the parameter to 0 now allows 15% growth for tramp3d (about
26% growth for Gerald's testcase), and this number closely depends on how the
early inliner and the real inliner cooperate.  I hope that the early inliner
will eventually be able to deal with functions called via hooks and do basic
simple devirtualization; if everything goes well, that would reduce this
"reserve" to 0%.  Improving the early inliner would thus significantly reduce
the amount of work done by the late inliner.  In short, even if it doesn't seem
to, the parameter has pretty unobvious consequences.
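
To make the "reserve" concrete, here is a small C sketch of the arithmetic
under the old meaning of the parameter.  The function names and the unit sizes
used in the note below (1150 and 1000 insns) are made up for illustration and
do not come from ipa-inline.c or from the tramp3d measurements; the sizes are
only chosen so that the result matches the 15% figure quoted above.

```c
/* Illustrative sketch only: these helpers are hypothetical and are not
   part of ipa-inline.c.  */

/* Old meaning: the limit is computed once, from the unit size estimated
   before the inliner runs.  */
long
old_limit (long initial_insns, int growth_percent)
{
  return initial_insns * (100 + growth_percent) / 100;
}

/* Growth that the old limit effectively permits, measured against the
   smallest unit size reached after size-reducing inlining.  The result
   is in percent, rounded down.  */
long
effective_growth_percent (long initial_insns, long min_insns,
                          int growth_percent)
{
  long limit = old_limit (initial_insns, growth_percent);
  return (limit - min_insns) * 100 / min_insns;
}
```

With these made-up sizes, effective_growth_percent (1150, 1000, 0) evaluates
to 15, and with the old default of 50 it evaluates to 72.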

This patch changes the parameter to mean actual unit growth.  The greedy
algorithm of the inliner is implemented in such a way that inlining that
reduces code size is performed first and code-expanding inlining starts only
afterwards (naturally some code-expanding inlining might lead to further
code-size-reducing inlining).

Now the inliner tracks the minimal unit size reached during the inlining
process and the allowed growth is based on this number, so setting it to 0
would actually mean inlining to accommodate -Os (this is however not
implemented by this patch).
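
The difference between the two meanings can be sketched with a toy simulation
in C.  This is only a rough model under stated assumptions: limit_for and
simulate_inlining are hypothetical names, the array of size deltas stands in
for the edges the heap would deliver, and the PARAM_LARGE_UNIT_INSNS floor is
ignored.  As in the real loop, the limit is checked before each step, so the
last step may overshoot it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the limit computation; it ignores the
   large-unit-insns floor applied by the real code.  */
long
limit_for (long base_insns, int growth_percent)
{
  return base_insns * (100 + growth_percent) / 100;
}

/* Apply the size DELTAS in order until the unit exceeds its limit and
   return the final unit size.  When TRACK_MINIMUM is nonzero the limit is
   recomputed whenever a new minimal size is reached (new meaning);
   otherwise it stays based on INITIAL_INSNS (old meaning).  */
long
simulate_inlining (long initial_insns, const int *deltas, size_t n,
                   int growth_percent, int track_minimum)
{
  long overall_insns = initial_insns;
  long min_insns = overall_insns;
  long max_insns = limit_for (overall_insns, growth_percent);

  for (size_t i = 0; i < n && overall_insns <= max_insns; i++)
    {
      overall_insns += deltas[i];
      if (track_minimum && overall_insns < min_insns)
        {
          /* New minimum reached: rebase the allowed growth on it.  */
          min_insns = overall_insns;
          max_insns = limit_for (min_insns, growth_percent);
        }
    }
  return overall_insns;
}
```

For example, starting at 1150 insns with deltas {-100, -50, 250, 250, 250, 250}
and a growth of 60, the sketch stops at 1750 insns when tracking the minimum
and at 2000 insns otherwise.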

Given the "reserve" numbers above, this naturally leads to less inlining.  I did
benchmarking on Gerald's DLV application with growth set to 50, 60 and 70% (this
is the only benchmark I really know to be pushing this limit up - Tramp3d and
SPEC are both very happy with much smaller growths):


[0]: /usr/bin/time ./dl-mainline
[1]: /usr/bin/time ./dl-patch50
[2]: /usr/bin/time ./dl-patch60
[3]: /usr/bin/time ./dl-patch70

                     |     [0]      |     [1]      |     [2]      |     [3]      |
---------------------+--------------+--------------+--------------+--------------+
      STRATCOMP1-ALL |  2.18 (0.01) |  2.19 (0.00) |  2.14 (0.01) |  2.16 (0.00) |
   STRATCOMP-770.2-Q |  0.48 (0.00) |  0.50 (0.00) |  0.48 (0.00) |  0.48 (0.01) |
               2QBF1 | 11.10 (0.09) | 11.30 (0.09) | 11.45 (0.07) | 11.10 (0.09) |
          PRIMEIMPL2 |  4.99 (0.02) |  5.67 (0.02) |  5.24 (0.02) |  5.01 (0.01) |
       3COL-SIMPLEX1 |  4.98 (0.01) |  5.35 (0.02) |  5.02 (0.02) |  5.06 (0.10) |
        3COL-RANDOM1 |  5.72 (0.04) |  6.32 (0.10) |  6.17 (0.11) |  5.69 (0.03) |
          HP-RANDOM1 |  6.21 (0.10) |  6.41 (0.09) |  6.12 (0.03) |  6.35 (0.28) |
       HAMCYCLE-FREE |  0.72 (0.00) |  0.74 (0.00) |  0.73 (0.01) |  0.72 (0.00) |
             DECOMP2 |  7.80 (0.03) |  7.57 (0.04) |  7.45 (0.02) |  7.51 (0.04) |
        BW-P5-nopush |  4.41 (0.03) |  4.57 (0.03) |  4.50 (0.04) |  4.40 (0.01) |
       BW-P5-pushbin |  3.51 (0.04) |  3.62 (0.03) |  3.56 (0.03) |  3.52 (0.02) |
     BW-P5-nopushbin |  1.17 (0.01) |  1.23 (0.01) |  1.18 (0.02) |  1.18 (0.01) |
        HANOI-Towers |  2.38 (0.01) |  2.44 (0.04) |  2.35 (0.03) |  2.36 (0.07) |
              RAMSEY |  5.36 (0.00) |  5.70 (0.03) |  5.32 (0.03) |  5.34 (0.03) |
             CRISTAL |  6.25 (0.25) |  6.56 (0.17) |  6.09 (0.19) |  6.33 (0.23) |
           21-QUEENS |  5.15 (0.02) |  4.97 (0.04) |  5.26 (0.02) |  5.13 (0.02) |
   MSTDir[V=13,A=40] |  8.22 (0.03) |  8.64 (0.03) |  8.33 (0.02) |  8.29 (0.02) |
   MSTDir[V=15,A=40] |  8.21 (0.03) |  8.63 (0.03) |  8.30 (0.02) |  8.29 (0.03) |
 MSTUndir[V=13,A=40] |  4.52 (0.02) |  4.69 (0.02) |  4.53 (0.02) |  4.51 (0.02) |
         TIMETABLING |  6.06 (0.01) |  6.39 (0.02) |  6.06 (0.03) |  6.08 (0.01) |
---------------------+--------------+--------------+--------------+--------------+

It is clear that keeping the limit at 50% brings some runtime regressions, so
I've bumped it to 60%, which doesn't seem too bad and still produces a smaller
binary (1.77MB instead of 1.82MB when statically linked against the same
libstdc++, 2m34s of compilation instead of 2m55s).

I've regtested and bootstrapped the following patch and committed it (in order
to hit tonight's C++ benchmark testers).  This is just the start of the planned
inliner tuning process and is primarily meant to ensure some sort of
consistency and to limit hidden dependencies between the various parameters.
I certainly hope I will be able to actually reduce the value of this
parameter rather than bumping it up.

Honza

	* ipa-inline.c (initial_insns, max_insns): Delete.
	(compute_max_insns): New function.
	(cgraph_decide_inlining_of_small_function): Use it; take minimal amount
	of insns as base for code growth.
	(cgraph_decide_inlining): Make initial_insns local; do not compute
	max_insns.
	* params.def (PARAM_INLINE_UNIT_GROWTH): Set to 60.
	* invoke.texi (inline-unit-growth): Update docs.
Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 121142)
+++ ipa-inline.c	(working copy)
@@ -169,9 +169,7 @@ cgraph_decide_inlining_incrementally (st
 /* Statistics we collect about inlining algorithm.  */
 static int ncalls_inlined;
 static int nfunctions_inlined;
-static int initial_insns;
 static int overall_insns;
-static int max_insns;
 static gcov_type max_count;
 
 /* Estimate size of the function after inlining WHAT into TO.  */
@@ -753,6 +751,19 @@ cgraph_set_inline_failed (struct cgraph_
       e->inline_failed = reason;
 }
 
+/* Given whole compilation unit estimate of INSNS, compute how large we can
+   allow the unit to grow.  */
+static int
+compute_max_insns (int insns)
+{
+  int max_insns = insns;
+  if (max_insns < PARAM_VALUE (PARAM_LARGE_UNIT_INSNS))
+    max_insns = PARAM_VALUE (PARAM_LARGE_UNIT_INSNS);
+
+  return max_insns = ((HOST_WIDEST_INT) max_insns
+	              * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
+}
+
 /* We use greedy algorithm for inlining of small functions:
    All inline candidates are put into prioritized heap based on estimated
    growth of the overall number of instructions and then update the estimates.
@@ -768,6 +779,7 @@ cgraph_decide_inlining_of_small_function
   const char *failed_reason;
   fibheap_t heap = fibheap_new ();
   bitmap updated_nodes = BITMAP_ALLOC (NULL);
+  int min_insns, max_insns;
 
   if (dump_file)
     fprintf (dump_file, "\nDeciding on smaller functions:\n");
@@ -796,6 +808,10 @@ cgraph_decide_inlining_of_small_function
 	    edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge), edge);
 	  }
     }
+
+  max_insns = compute_max_insns (overall_insns);
+  min_insns = overall_insns;
+
   while (overall_insns <= max_insns && (edge = fibheap_extract_min (heap)))
     {
       int old_insns = overall_insns;
@@ -923,6 +939,14 @@ cgraph_decide_inlining_of_small_function
 		   edge->caller->global.insns,
 		   overall_insns - old_insns);
 	}
+      if (min_insns > overall_insns)
+	{
+	  min_insns = overall_insns;
+	  max_insns = compute_max_insns (min_insns);
+
+	  if (dump_file)
+	    fprintf (dump_file, "New minimal insns reached: %i\n", min_insns);
+	}
     }
   while ((edge = fibheap_extract_min (heap)) != NULL)
     {
@@ -949,6 +973,7 @@ cgraph_decide_inlining (void)
     XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   int old_insns = 0;
   int i;
+  int initial_insns;
 
   max_count = 0;
   for (node = cgraph_nodes; node; node = node->next)
@@ -965,13 +990,6 @@ cgraph_decide_inlining (void)
   overall_insns = initial_insns;
   gcc_assert (!max_count || (profile_info && flag_branch_probabilities));
 
-  max_insns = overall_insns;
-  if (max_insns < PARAM_VALUE (PARAM_LARGE_UNIT_INSNS))
-    max_insns = PARAM_VALUE (PARAM_LARGE_UNIT_INSNS);
-
-  max_insns = ((HOST_WIDEST_INT) max_insns
-	       * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
-
   nnodes = cgraph_postorder (order);
 
   if (dump_file)
@@ -996,12 +1014,10 @@ cgraph_decide_inlining (void)
       /* Handle nodes to be flattened, but don't update overall unit size.  */
       if (lookup_attribute ("flatten", DECL_ATTRIBUTES (node->decl)) != NULL)
         {
-	  int old_overall_insns = overall_insns;
   	  if (dump_file)
     	    fprintf (dump_file,
 	     	     "Flattening %s\n", cgraph_node_name (node));
 	  cgraph_decide_inlining_incrementally (node, INLINE_ALL, 0);
-	  overall_insns = old_overall_insns;
         }
 
       if (!node->local.disregard_inline_limits)
Index: params.def
===================================================================
--- params.def	(revision 121142)
+++ params.def	(working copy)
@@ -199,7 +199,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
 DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
 	 "inline-unit-growth",
 	 "how much can given compilation unit grow because of the inlining (in percent)",
-	 50, 0, 0)
+	 60, 0, 0)
 DEFPARAM(PARAM_INLINE_CALL_COST,
 	 "inline-call-cost",
 	 "expense of call operation relative to ordinary arithmetic operations",
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 121142)
+++ doc/invoke.texi	(working copy)
@@ -6096,7 +6096,7 @@ before applying @option{--param inline-u
 @item inline-unit-growth
 Specifies maximal overall growth of the compilation unit caused by inlining.
 This parameter is ignored when @option{-funit-at-a-time} is not used.
-The default value is 50 which limits unit growth to 1.5 times the original
+The default value is 60 which limits unit growth to 1.6 times the original
 size.
 
 @item large-stack-frame

