This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR37448 (and dups?)


On Thu, 18 Feb 2016, Richard Biener wrote:

> On Thu, 18 Feb 2016, Jan Hubicka wrote:
> 
> > > 
> > > This fixes the IPA inline analysis quadraticness that arises when we
> > > inline functions into the same function many times via
> > > inline_to_all_callers as we do in the testcase for PR37448.  In that
> > > case updating the overall summary of the caller is done for each
> > > inlined call instead of just once.
> > > 
> > > The solution is to keep track of which nodes we inlined to and
> > > delay the summary update.
> > > 
> > > Honza, I believe this is safe as the callers summary should only
> > > matter for further inline decisions and not this one (inlining
> > > into all callers).  Double-checking is appreciated - there should
> > > be no code changes by this patch (I verified this on the PR37448
> > > testcase).
> > 
> > I think it is not the case if one function contains multiple calls
> > and eventually hits large function growth.  In that case we inline
> > just some callers and not the others.
> 
> Are you sure?  I thought we compute that up-front ... at least
> we make sure we can_inline_edge_p all calls before even trying
> to inline all calls - that looks somewhat redundant then if it
> can happen that we give up anyway.  Ah, so can_inline_edge_p has
> 
>   /* Check if caller growth allows the inlining.  */
>   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
>            && !disregard_limits
>            && !lookup_attribute ("flatten",
>                                  DECL_ATTRIBUTES (caller->decl))
>            && !caller_growth_limits (e))
>     inlinable = false;
> 
> and we check that from when we do the inlining and upfront from
> want_inline_function_to_all_callers_p ... we also give up
> if we inline a recursive function it seems.
> 
> > For next stage1 I plan to finish the overahaul to sreals that will let
> > me to make update_overall_summary incremental (i.e. O(size_of_inlined_function)
> > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > also solve these issues.
> 
> Hopefully.
> 
> > One can probably do a pesimistic estimate on code size of the function 
> > (i.e. add the size of inlined function) and only if that hits the large 
> > function growth do the exact calculation. Or we can just decide that it 
> > is OK to ignore large function growht in this partiuclar case.
> 
> Ignoring it is probably not a good idea and will just lead to a different
> degenerate case.  As we update the summary afterwards it's probably
> ok to do incremental (pessimistic) updates on the size anyway?  In
> inline_merge_summary I mean?  Or should I open-code that into
> inline_call if ! update_overall_summary?

So like below - I update self-size by the growth estimate which is
supposed to match, this should keep the overall function growth
limits working, not sure about the stack stuff but that doesn't seem
to be updated by update_overall_summaries either.

This also fixes a similar issue in early inlining where we can then
trivially delay update_overall_summaries.  I remember seeing early
inline analysis behave similarly quadratic.

It leaves alone recursive and IPA small function inlining as those
update overall unit size as well - I first thought about somehow
making overall summary update lazy, but that looks quite complicated
and IIRC the round-off errors issue was when re-computing the
key for the fibheap after doing one inlining, right?

I hope you can get to use sreals early in next stage1 though I wonder
how that will avoid all corner-cases.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
and branch?

Thanks,
Richard.

2016-02-18  Richard Biener  <rguenther@suse.de>

	PR ipa/37448
	* ipa-inline-transform.c (inline_call): When not updating
	overall summaries adjust self size by the growth estimate.
	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
	hash-set, do not update overall summaries here.  Renamed from ...
	(inline_to_all_callers): ... this which is now wrapping the
	above and performing delayed overall summary update.
	(early_inline_small_functions): Delay updating of the overall
	summary.

Index: gcc/ipa-inline-transform.c
===================================================================
--- gcc/ipa-inline-transform.c	(revision 233498)
+++ gcc/ipa-inline-transform.c	(working copy)
@@ -300,10 +300,12 @@ inline_call (struct cgraph_edge *e, bool
   struct cgraph_edge *curr = e;
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
   bool new_edges_found = false;
+  int estimated_growth;
 
+  if (! update_overall_summary)
+    estimated_growth = estimate_edge_growth (e);
   /* This is used only for assert bellow.  */
 #if 0
-  int estimated_growth = estimate_edge_growth (e);
   bool predicated = inline_edge_summary (e)->predicate != NULL;
 #endif
 
@@ -373,7 +375,13 @@ inline_call (struct cgraph_edge *e, bool
     new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
   check_speculations (e->callee);
   if (update_overall_summary)
-   inline_update_overall_summary (to);
+    inline_update_overall_summary (to);
+  else
+    /* Update self size by the estimate so overall function growth limits
+       work for further inlining into this function.  Before inlining
+       the function we inlined to again we expect the caller to update
+       the overall summary.  */
+    inline_summaries->get (to)->size += estimated_growth;
   new_size = inline_summaries->get (to)->size;
 
   if (callee->calls_comdat_local)
Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 233498)
+++ gcc/ipa-inline.c	(working copy)
@@ -2163,7 +2163,8 @@ flatten_function (struct cgraph_node *no
    recursion.  */
 
 static bool
-inline_to_all_callers (struct cgraph_node *node, void *data)
+inline_to_all_callers_1 (struct cgraph_node *node, void *data,
+			 hash_set<cgraph_node *> *callers)
 {
   int *num_calls = (int *)data;
   bool callee_removed = false;
@@ -2193,7 +2194,10 @@ inline_to_all_callers (struct cgraph_nod
 		   inline_summaries->get (node->callers->caller)->size);
 	}
 
-      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
+      /* Remember which callers we inlined to, delaying updating the
+	 overall summary.  */
+      callers->add (node->callers->caller);
+      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
       if (dump_file)
 	fprintf (dump_file,
 		 " Inlined into %s which now has %i size\n",
@@ -2211,6 +2215,23 @@ inline_to_all_callers (struct cgraph_nod
   return false;
 }
 
+/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
+   update.  */
+
+static bool
+inline_to_all_callers (struct cgraph_node *node, void *data)
+{
+  hash_set<cgraph_node *> callers;
+  bool res = inline_to_all_callers_1 (node, data, &callers);
+  /* Perform the delayed update of the overall summary of all callers
+     processed.  This avoids quadratic behavior in the cases where
+     we have a lot of calls to the same function.  */
+  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
+       i != callers.end (); ++i)
+    inline_update_overall_summary (*i);
+  return res;
+}
+
 /* Output overall time estimate.  */
 static void
 dump_overall_stats (void)
@@ -2590,10 +2611,13 @@ early_inline_small_functions (struct cgr
 	fprintf (dump_file, " Inlining %s into %s.\n",
 		 xstrdup_for_dump (callee->name ()),
 		 xstrdup_for_dump (e->caller->name ()));
-      inline_call (e, true, NULL, NULL, true);
+      inline_call (e, true, NULL, NULL, false);
       inlined = true;
     }
 
+  if (inlined)
+    inline_update_overall_summary (node);
+
   return inlined;
 }
 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]