This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Inirect inlining in early inliner


Hi,
this patch teach early inliner to handle indirect inlining and devirtualization.
Basically I want here:

extern void inlined ();
void inline_me_too (void);
void inline_through_me (void (*ptr)(void));
void
inline_me (void)
{
  inlined();
}

void main(void)
{
  inline_through_me (inline_me);
  inline_through_me (inline_me_too);
}
void
inline_through_me (void (*ptr)(void))
{
  ptr();
}

void
inline_me_too (void)
{
  inlined();
}

My motivation si basically driven by libstdc++ headers where some functions
just wraps virtual function call.  Inlining them makes it possible to
devirtualize the call early and get better output of early inlining.

With current cost metrics we are basically able to inline only very simple
wrappers doing nothing but virtual call, still this enables over 200 more
functions to be inlined in tramp3d and results in better ipa-pure-const
results.

I handle this by iteration: first we decide inlining, perform it and inliner's
argument subtitution results in creation of new direct function calls.  Early
inliner is re-done then and if new oppurtunities are found, we iterate.  I
expect this scheme to be overall cheaper than doing proper jump functions that
ivolve walking whole function body, while this involve just one extra cheap
pass over cgraph edges in average case.

There is however important problem with flatten and alwaysinline that when used
in cycle results in infinite iteration (such as testcase flatten-2.c).  I made
early inliner to ignore these attributes in subsequent iterations that seems
safe enough, still for degenerate case where function inlining does not affect
function size metric (so inlining would iterate since function would not grow)
I added bound on number of iterations.

I don't think there is another resonable way to prevent the infinite inlining
chain since we don't want to build info on SCCs and we don't preserve inlining
history after inlining the calls so we can't easilly detect a cycle.

I will wait for couple days for comments on the patch before commiting and will
also re-benchmark this patch.  On pretty-ipa similar patch caused noticeable
improvmenets in libstdc++ benchamrks and DLV.

Bootstrapped/regtested x86_64-linux with Richi's SRA patch reverted.
The patch caused number of type checking error in libstdc++ testcases.

Honza

	* doc/invoke.texi (max-early-inliner-iterations): New flag.
	* ipa-inline.c (enum inlining_mode): New INLINE_SIZE_NORECURSIVE.
	(try_inline): Fix return value.
	(cgraph_decide_inlining_incrementally): Honor new value.
	(cgraph_early_inlining): Handle indirect inlining.
	* params.def (PARAM_EARLY_INLINER_MAX_ITERATIONS): New.

	* testsuite/gcc.dg/tree-ssa/inline-3.c: New testcase

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 147344)
+++ doc/invoke.texi	(working copy)
@@ -7461,6 +7461,12 @@ given call expression.  This parameter l
 whose probability exceeds given threshold (in percents).  The default value is
 10.
 
+@item max-early-inliner-iterations
+@itemx max-early-inliner-iterations
+Limit of iterations of early inliner.  This basically bounds number of nested
+indirect calls early inliner can resolve.  Deeper chains are still handled by
+late inlining.
+
 @item inline-call-cost
 Specify cost of call instruction relative to simple arithmetics operations
 (having cost of 1).  Increasing this cost disqualifies inlining of non-leaf
Index: testsuite/gcc.dg/tree-ssa/inline-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/inline-3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/inline-3.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-einline2" } */
+extern void inlined ();
+void inline_me_too (void);
+void inline_through_me (void (*ptr)(void));
+void
+inline_me (void)
+{
+  inlined();
+}
+
+void main(void)
+{
+  inline_through_me (inline_me);
+  inline_through_me (inline_me_too);
+}
+void
+inline_through_me (void (*ptr)(void))
+{
+  ptr();
+}
+
+void
+inline_me_too (void)
+{
+  inlined();
+}
+/* { dg-final { scan-tree-dump-times "Inlining inline_me " 1 "einline2"} } */
+/* { dg-final { scan-tree-dump-times "Inlining inline_me_too " 1 "einline2"} } */
Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 147344)
+++ ipa-inline.c	(working copy)
@@ -152,6 +152,7 @@ along with GCC; see the file COPYING3.  
 enum inlining_mode {
   INLINE_NONE = 0,
   INLINE_ALWAYS_INLINE,
+  INLINE_SIZE_NORECURSIVE,
   INLINE_SIZE,
   INLINE_ALL
 };
@@ -1269,6 +1270,7 @@ try_inline (struct cgraph_edge *e, enum 
   struct cgraph_node *callee = e->callee;
   enum inlining_mode callee_mode = (enum inlining_mode) (size_t) callee->aux;
   bool always_inline = e->callee->local.disregard_inline_limits;
+  bool inlined = false;
 
   /* We've hit cycle?  */
   if (callee_mode)
@@ -1323,9 +1325,10 @@ try_inline (struct cgraph_edge *e, enum 
 
       if (mode == INLINE_ALL || always_inline)
 	cgraph_decide_inlining_incrementally (e->callee, mode, depth + 1);
+      inlined = true;
     }
   callee->aux = (void *)(size_t) callee_mode;
-  return true;
+  return inlined;
 }
 
 /* Decide on the inlining.  We do so in the topological order to avoid
@@ -1348,7 +1351,7 @@ cgraph_decide_inlining_incrementally (st
 
   old_mode = (enum inlining_mode) (size_t)node->aux;
 
-  if (mode != INLINE_ALWAYS_INLINE
+  if (mode != INLINE_ALWAYS_INLINE && mode != INLINE_SIZE_NORECURSIVE
       && lookup_attribute ("flatten", DECL_ATTRIBUTES (node->decl)) != NULL)
     {
       if (dump_file)
@@ -1362,69 +1365,70 @@ cgraph_decide_inlining_incrementally (st
   node->aux = (void *)(size_t) mode;
 
   /* First of all look for always inline functions.  */
-  for (e = node->callees; e; e = e->next_callee)
-    {
-      if (!e->callee->local.disregard_inline_limits
-	  && (mode != INLINE_ALL || !e->callee->local.inlinable))
-	continue;
-      if (gimple_call_cannot_inline_p (e->call_stmt))
-	continue;
-      /* When the edge is already inlined, we just need to recurse into
-	 it in order to fully flatten the leaves.  */
-      if (!e->inline_failed && mode == INLINE_ALL)
-	{
-          inlined |= try_inline (e, mode, depth);
-	  continue;
-	}
-      if (dump_file)
-	{
-	  indent_to (dump_file, depth);
-	  fprintf (dump_file,
-		   "Considering to always inline inline candidate %s.\n",
-		   cgraph_node_name (e->callee));
-	}
-      if (cgraph_recursive_inlining_p (node, e->callee, &e->inline_failed))
-	{
-	  if (dump_file)
-	    {
-	      indent_to (dump_file, depth);
-	      fprintf (dump_file, "Not inlining: recursive call.\n");
-	    }
-	  continue;
-	}
-      if (!tree_can_inline_p (node->decl, e->callee->decl))
-	{
-	  gimple_call_set_cannot_inline (e->call_stmt, true);
-	  if (dump_file)
-	    {
-	      indent_to (dump_file, depth);
-	      fprintf (dump_file,
-		       "Not inlining: Target specific option mismatch.\n");
-	    }
-	  continue;
-	}
-      if (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (node->decl))
-	  != gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->callee->decl)))
-	{
-	  if (dump_file)
-	    {
-	      indent_to (dump_file, depth);
-	      fprintf (dump_file, "Not inlining: SSA form does not match.\n");
-	    }
+  if (mode != INLINE_SIZE_NORECURSIVE)
+    for (e = node->callees; e; e = e->next_callee)
+      {
+	if (!e->callee->local.disregard_inline_limits
+	    && (mode != INLINE_ALL || !e->callee->local.inlinable))
 	  continue;
-	}
-      if (!e->callee->analyzed && !e->callee->inline_decl)
-	{
-	  if (dump_file)
-	    {
-	      indent_to (dump_file, depth);
-	      fprintf (dump_file,
-		       "Not inlining: Function body no longer available.\n");
-	    }
+	if (gimple_call_cannot_inline_p (e->call_stmt))
 	  continue;
-	}
-      inlined |= try_inline (e, mode, depth);
-    }
+	/* When the edge is already inlined, we just need to recurse into
+	   it in order to fully flatten the leaves.  */
+	if (!e->inline_failed && mode == INLINE_ALL)
+	  {
+	    inlined |= try_inline (e, mode, depth);
+	    continue;
+	  }
+	if (dump_file)
+	  {
+	    indent_to (dump_file, depth);
+	    fprintf (dump_file,
+		     "Considering to always inline inline candidate %s.\n",
+		     cgraph_node_name (e->callee));
+	  }
+	if (cgraph_recursive_inlining_p (node, e->callee, &e->inline_failed))
+	  {
+	    if (dump_file)
+	      {
+		indent_to (dump_file, depth);
+		fprintf (dump_file, "Not inlining: recursive call.\n");
+	      }
+	    continue;
+	  }
+	if (!tree_can_inline_p (node->decl, e->callee->decl))
+	  {
+	    gimple_call_set_cannot_inline (e->call_stmt, true);
+	    if (dump_file)
+	      {
+		indent_to (dump_file, depth);
+		fprintf (dump_file,
+			 "Not inlining: Target specific option mismatch.\n");
+	      }
+	    continue;
+	  }
+	if (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (node->decl))
+	    != gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->callee->decl)))
+	  {
+	    if (dump_file)
+	      {
+		indent_to (dump_file, depth);
+		fprintf (dump_file, "Not inlining: SSA form does not match.\n");
+	      }
+	    continue;
+	  }
+	if (!e->callee->analyzed && !e->callee->inline_decl)
+	  {
+	    if (dump_file)
+	      {
+		indent_to (dump_file, depth);
+		fprintf (dump_file,
+			 "Not inlining: Function body no longer available.\n");
+	      }
+	    continue;
+	  }
+	inlined |= try_inline (e, mode, depth);
+      }
 
   /* Now do the automatic inlining.  */
   if (mode != INLINE_ALL && mode != INLINE_ALWAYS_INLINE)
@@ -1459,7 +1463,7 @@ cgraph_decide_inlining_incrementally (st
 	/* When the function body would grow and inlining the function won't
 	   eliminate the need for offline copy of the function, don't inline.
 	 */
-	if ((mode == INLINE_SIZE
+	if (((mode == INLINE_SIZE || mode == INLINE_SIZE_NORECURSIVE)
 	     || (!flag_inline_functions
 		 && !DECL_DECLARED_INLINE_P (e->callee->decl)))
 	    && (cgraph_estimate_size_after_inlining (1, e->caller, e->callee)
@@ -1531,15 +1535,22 @@ cgraph_early_inlining (void)
 {
   struct cgraph_node *node = cgraph_node (current_function_decl);
   unsigned int todo = 0;
+  int iterations = 0;
 
   if (sorrycount || errorcount)
     return 0;
-  if (cgraph_decide_inlining_incrementally (node, INLINE_SIZE, 0))
+  while (cgraph_decide_inlining_incrementally (node,
+  					       iterations
+					       ? INLINE_SIZE_NORECURSIVE : INLINE_SIZE, 0)
+	 && iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
     {
       timevar_push (TV_INTEGRATION);
-      todo = optimize_inline_calls (current_function_decl);
+      todo |= optimize_inline_calls (current_function_decl);
+      iterations++;
       timevar_pop (TV_INTEGRATION);
     }
+  if (dump_file)
+    fprintf (dump_file, "Iterations: %i\n", iterations);
   cfun->always_inline_functions_inlined = true;
   return todo;
 }
Index: params.def
===================================================================
--- params.def	(revision 147344)
+++ params.def	(working copy)
@@ -139,6 +139,14 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO
 	  "Inline recursively only when the probability of call being executed exceeds the parameter",
 	  10, 0, 0)
 
+/* Limit of iterations of early inliner.  This basically bounds number of
+   nested indirect calls early inliner can resolve.  Deeper chains are still
+   handled by late inlining.  */
+DEFPARAM (PARAM_EARLY_INLINER_MAX_ITERATIONS,
+	  "max-early-inliner-iterations",
+	  "The maximum number of nested indirect inlining performed by early inliner",
+	  10, 0, 0)
+
 /* Limit the number of expansions created by the variable expansion
    optimization to avoid register pressure.  */
 DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]