This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Partial inlining


Hi,
this patch implements function splitting that allows partial inlining.
Functions are split if there is BB whose frequency is smaller than frequency of
entry BB such that the control flow must past through it just once and
afterwards we don't use values computed beforehand.

I tested the patch on spec2k with and without LTO, spec2k6 and C++ benchmarks.
The storry is always pretty much the same, the patch allows to reduce function
size for inlining from 50 to 40 estimated instructions.  This is still a lot.
However previously we hit number of issues on spec2000 that are gone now.  First
offender is botan that gets huge (>40%) regressions.  It might be interesting to
analyze those.  Offnoise changes on spec2k6 are as follows:

SPEC2k6 time (bigger is better):
milc 6%
povray 4.2%
lbm -2%
wrf 1.1%

SPEC2k6 size (smaller is better):
Perlbench -1.3%
Bzip2 0.3%
gcc 0.1%
gobmk 0.4%
hmmer 0.2%
sjeng 0.1%
libquantum -36%
h264ref -0.7%
omnetpp 0.6%
astar +4%
xlanbmk -1%

So largest gain is probably 36% code size saving at libquantum and 6% speedup
on milc.

We get regression in lbm and some small code size growth when splitting makes
inlining possible, but the inlining is useless.

Total compile time on spec2k6 has reduced a bit (as result of libquantum I
guess), but within noise factor.

Bootstrap time seems also comparable even if we do quite a lot of splitting.

On GCC LTO bootstrap we actually do quite a lot of useful splitting, in
cgraph_node, some vector functions (splitting away the error code) and in many
places where we have cheap guard in header of fucntion.

Due to nature of the problem, there are many possible improvements to the pass,
I want to track them incrementally to avoid pass getting too complicated
without too much of testing.  In particular I think it would be useful

1) resolve some stupid limitations we have now (return by value, EH code etc)
2) make it split large functions (i.e. not to help inlining, but to speed up
   compilation)
3) Add support for passing non-SSA values (currently if split part needs something
   non-SSA we just give up)
4) Make split part to recompute cheap values computed in header
5) Allow splitting into multiple parts
6) ...

There is also problem with debug quality.  Currently the split function will
appear as two functions in backtrace.  I hoped to solve it same way as openMP
but as Jakub told me we don't solve it there and we would probably need dwarf
extension to properly declare split function.
Well, in any case we should be able to fix things back when the split part gets
inlined back to header.

I had the patch in my local tree for a while and it is not causing troubles,
does it seem OK for mainline?

Bootstrapped/regtested x86_64-linux.

Honza

	* tree-pass.h (pass_split_functions): Declare.
	* opts.c (decode_options): Enable function splitting at -O2
	* timevar.def (TV_IPA_FNSPLIT): New macro.
	* ipa-split.c: New file.
	* common.opt (-fpartial-inlining): New flag.
	* Makefile.in (ipa-split.o): New object file.
	* passes.c (init_optimization_passes): Add ipa-split.
	* params.def (max-inline-insns-auto): Reduce max-inline-insns-auto to 40.
	(partial-inlining-entry-probability): New parameters.
	* doc/invoke.texi (-fpartial-inlining): New.
	
	* testsuite/gcc.dg/tree-ssa/ipa-split.c

/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-fnsplit" } */
int test2(a)
{
   if (a<100)
     return 1;
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   do_something_big ();
   return 0;
}

test()
{
  test2(10);
  test2(20);
}
/* { dg-final { scan-tree-dump-times "Splitting function" 1 "fnsplit"} } */
/* { dg-final { cleanup-tree-dump "fnsplit" } } */
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 160109)
+++ tree-pass.h	(working copy)
@@ -442,6 +442,7 @@ extern struct gimple_opt_pass pass_build
 extern struct gimple_opt_pass pass_local_pure_const;
 extern struct gimple_opt_pass pass_tracer;
 extern struct gimple_opt_pass pass_warn_unused_result;
+extern struct gimple_opt_pass pass_split_functions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_function_and_variable_visibility;
Index: opts.c
===================================================================
--- opts.c	(revision 160109)
+++ opts.c	(working copy)
@@ -910,6 +910,7 @@ decode_options (unsigned int argc, const
   opt2 = (optimize >= 2);
   flag_inline_small_functions = opt2;
   flag_indirect_inlining = opt2;
+  flag_partial_inlining = opt2;
   flag_thread_jumps = opt2;
   flag_crossjumping = opt2;
   flag_optimize_sibling_calls = opt2;
Index: timevar.def
===================================================================
--- timevar.def	(revision 160109)
+++ timevar.def	(working copy)
@@ -52,6 +52,7 @@ DEFTIMEVAR (TV_CGRAPH                , "
 DEFTIMEVAR (TV_CGRAPHOPT             , "callgraph optimization")
 DEFTIMEVAR (TV_VARPOOL               , "varpool construction")
 DEFTIMEVAR (TV_IPA_CONSTANT_PROP     , "ipa cp")
+DEFTIMEVAR (TV_IPA_FNSPLIT           , "ipa function splitting")
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IO     , "ipa lto gimple I/O")
 DEFTIMEVAR (TV_IPA_LTO_DECL_IO       , "ipa lto decl I/O")
 DEFTIMEVAR (TV_IPA_LTO_DECL_INIT_IO  , "ipa lto decl init I/O")
Index: ipa-split.c
===================================================================
--- ipa-split.c	(revision 0)
+++ ipa-split.c	(revision 0)
@@ -0,0 +1,1069 @@
+/* Function splitting pass
+   Copyright (C) 2010
+   Free Software Foundation, Inc.
+   Contributed by Jan Hubicka  <jh@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* The purpose of this pass is to split function bodies to improve
+   inlining.  I.e. for function of the form:
+
+   func (...)
+     {
+       if (cheap_test)
+	 something_small
+       else
+	 something_big
+     }
+
+   Produce:
+
+   func.part (...)
+     {
+	something_big
+     }
+
+   func (...)
+     {
+       if (cheap_test)
+	 something_small
+       else
+	 func.part (...);
+     }
+
+   When func becomes inlinable and when cheap_test is often true, inlining func,
+   but not fund.part leads to performance imrovement similar as inlining original
+   func while the code size growth is smaller.
+
+   The pass is organized in three stages:
+   1) Collect local info about basic block into BB_INFO structure and
+      compute function body estimated size and time.
+   2) Via DFS walk find all possible basic blocks where we can split
+      and chose best one.
+   3) If split point is found, split at the specified BB by creating a clone
+      and updating function to call it.  
+
+   The decisions what functions to split are in execute_split_functions
+   and consider_split.  
+
+   There are several possible future improvements for this pass including:
+
+   1) Splitting to break up large functions
+   2) Splitting to reduce stack frame usage
+   3) Allow split part of function to use values computed in the header part.
+      The values needs to be passed to split function, perhaps via same interface
+      as for nested functions or as argument.
+   4) Support for simple rematerialization.  I.e. when split part use
+      value computed in header from function parameter in very cheap way, we
+      can just recompute it.
+   5) Support splitting of nested functions.
+   6) Support non-SSA arguments.  
+   7) There is nothing preventing us from producing multiple parts of single function
+      when needed or splitting also the parts.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "target.h"
+#include "cgraph.h"
+#include "ipa-prop.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
+#include "flags.h"
+#include "timevar.h"
+#include "diagnostic.h"
+#include "tree-dump.h"
+#include "tree-inline.h"
+#include "fibheap.h"
+#include "params.h"
+#include "gimple-pretty-print.h"
+
+/* Per basic block info.  */
+
+typedef struct
+{
+  unsigned int size;
+  unsigned int time;
+} bb_info;
+DEF_VEC_O(bb_info);
+DEF_VEC_ALLOC_O(bb_info,heap);
+
+static VEC(bb_info, heap) *bb_info_vec;
+
+/* Description of split point.  */
+
+struct split_point
+{
+  /* Size of the partitions.  */
+  unsigned int header_time, header_size, split_time, split_size;
+
+  /* SSA names that need to be passed into spit funciton.  */
+  bitmap ssa_names_to_pass;
+
+  /* Basic block where we split (that will become entry point of new function.  */
+  basic_block entry_bb;
+
+  /* Basic blocks we are splitting away.  */
+  bitmap split_bbs;
+};
+
+/* Best split point found.  */
+
+struct split_point best_split_point;
+
+/* Callback for walk_stmt_load_store_addr_ops.  If T is non-ssa automatic
+   variable, check it if it is present in bitmap passed via DATA.  */
+
+static bool
+test_nonssa_use (gimple stmt ATTRIBUTE_UNUSED, tree t,
+	         void *data ATTRIBUTE_UNUSED)
+{
+  t = get_base_address (t);
+
+  if (t && !is_gimple_reg (t)
+      && ((TREE_CODE (t) == VAR_DECL && auto_var_in_fn_p (t, current_function_decl))
+	  || (TREE_CODE (t) == PARM_DECL)))
+    return bitmap_bit_p ((bitmap)data, DECL_UID (t));
+  return false;
+}
+
+/* Dump split point CURRENT.  */
+
+static void
+dump_split_point (FILE * file, struct split_point *current)
+{
+  fprintf (file,
+	   "Split point at BB %i header time:%i header size: %i split time: %i split size: %i\n  bbs: ",
+	   current->entry_bb->index, current->header_time,
+	   current->header_size, current->split_time, current->split_size);
+  dump_bitmap (file, current->split_bbs);
+  fprintf (file, "  SSA names to pass: ");
+  dump_bitmap (file, current->ssa_names_to_pass);
+}
+
+/* We found an split_point CURRENT.  NON_SSA_VARS is bitmap of all non ssa
+   variables used and RETURN_BB is return basic block.
+   See if we can split function here.  */
+
+static void
+consider_split (struct split_point *current, bitmap non_ssa_vars,
+		basic_block return_bb)
+{
+  tree parm;
+  unsigned int num_args = 0;
+  unsigned int call_overhead;
+  edge e;
+  edge_iterator ei;
+  if (dump_file)
+    dump_split_point (dump_file, current);
+
+  /* Do not split when we would end up calling function anyway.  */
+  if (current->entry_bb->frequency
+      >= (ENTRY_BLOCK_PTR->frequency
+	  * PARAM_VALUE (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY) / 100))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Refused: split BB frequency is too large.\n");
+      return;
+    }
+
+  if (!current->header_size)
+    {
+      if (dump_file)
+	fprintf (dump_file, "  Refused: header empty\n");
+      gcc_unreachable ();
+      return;
+    }
+
+  /* FIXME: We can do better: if the split region start with a loop and there is only
+     one entry point from outer wrold, we can update PHI.  */
+  if (!gsi_end_p (gsi_start_phis (current->entry_bb)))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Refused: entry BB has PHI\n");
+      return;
+    }
+
+
+  /* See what argument we will pass to the split function and compute
+     call overhead.  */
+  call_overhead = eni_size_weights.call_cost;
+  for (parm = DECL_ARGUMENTS (current_function_decl); parm;
+       parm = TREE_CHAIN (parm))
+    {
+      if (!is_gimple_reg (parm))
+	{
+	  if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "  Refused: need to pass non-ssa param values\n");
+	      return;
+	    }
+	}
+      else if (gimple_default_def (cfun, parm)
+	       && bitmap_bit_p (current->ssa_names_to_pass,
+				SSA_NAME_VERSION (gimple_default_def
+						  (cfun, parm))))
+	{
+	  if (!VOID_TYPE_P (TREE_TYPE (parm)))
+	    call_overhead += estimate_move_cost (TREE_TYPE (parm));
+	  num_args++;
+	}
+    }
+  if (!VOID_TYPE_P (TREE_TYPE (current_function_decl)))
+    call_overhead += estimate_move_cost (TREE_TYPE (current_function_decl));
+
+  if (current->split_size <= call_overhead)
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Refused: split size is smaller than call overhead\n");
+      return;
+    }
+  if (current->header_size + call_overhead
+      >= (unsigned int)(DECL_DECLARED_INLINE_P (current_function_decl)
+			? MAX_INLINE_INSNS_SINGLE
+			: MAX_INLINE_INSNS_AUTO))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Refused: header size is too large for inline candidate\n");
+      return;
+    }
+
+  /* FIXME: we currently can pass only SSA function parameters to the split
+     arguments.  Once parm_adjustment infrastructure is supported by clonning,
+     we can pass more than that.  */
+  if (num_args != bitmap_count_bits (current->ssa_names_to_pass))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Refused: need to pass non-param values\n");
+      return;
+    }
+
+  /* When there are non-ssa vars used in the split region, see if they
+     are used in the header region.  If so, reject the split.
+     FIXME: we can use nested function support to access both.  */
+  if (!bitmap_empty_p (non_ssa_vars))
+    {
+      basic_block bb;
+      FOR_EACH_BB (bb)
+	if (!bitmap_bit_p (current->split_bbs, bb->index))
+	  {
+	    gimple_stmt_iterator bsi;
+	    for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	      {
+		if (is_gimple_debug (gsi_stmt (bsi)))
+		  continue;
+		if (walk_stmt_load_store_addr_ops
+		    (gsi_stmt (bsi), non_ssa_vars, test_nonssa_use,
+		     test_nonssa_use, test_nonssa_use))
+		  {
+		    if (dump_file)
+		      fprintf (dump_file,
+			       "  Refused: split part has non-ssa uses\n");
+		    return;
+		  }
+	      }
+	    for (bsi = gsi_start_phis (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	      {
+		if (is_gimple_debug (gsi_stmt (bsi)))
+		  continue;
+		if (walk_stmt_load_store_addr_ops
+		    (gsi_stmt (bsi), non_ssa_vars, test_nonssa_use,
+		     test_nonssa_use, test_nonssa_use))
+		  {
+		    if (dump_file)
+		      fprintf (dump_file,
+			       "  Refused: split part has non-ssa uses\n");
+		    return;
+		  }
+	      }
+	    FOR_EACH_EDGE (e, ei, bb->succs)
+	      if (e->dest == return_bb)
+		{
+		  for (bsi = gsi_start_phis (return_bb); !gsi_end_p (bsi);
+		       gsi_next (&bsi))
+		    {
+		      gimple stmt = gsi_stmt (bsi);
+		      tree op = gimple_phi_arg_def (stmt, e->dest_idx);
+
+		      if (!is_gimple_reg (gimple_phi_result (stmt)))
+			continue;
+		      STRIP_NOPS (op);
+		      if (TREE_CODE (op) != SSA_NAME
+			  && test_nonssa_use (stmt, op, non_ssa_vars))
+			{
+			  if (dump_file)
+			    fprintf (dump_file,
+				     "  Refused: split part has non-ssa uses\n");
+			  return;
+			}
+		    }
+		}
+	  }
+      return;
+    }
+  if (dump_file)
+    fprintf (dump_file, "  Accepted!\n");
+
+  /* At the moment chose split point with lowest frequency and that leaves
+     out smallest size of header.
+     In future we might re-consider this heuristics.  */
+  if (!best_split_point.split_bbs
+      || best_split_point.entry_bb->frequency > current->entry_bb->frequency
+      || (best_split_point.entry_bb->frequency == current->entry_bb->frequency
+	  && best_split_point.split_size < current->split_size))
+	
+    {
+      if (dump_file)
+	fprintf (dump_file, "  New best split point!\n");
+      if (best_split_point.ssa_names_to_pass)
+	{
+	  BITMAP_FREE (best_split_point.ssa_names_to_pass);
+	  BITMAP_FREE (best_split_point.split_bbs);
+	}
+      best_split_point = *current;
+      best_split_point.ssa_names_to_pass = BITMAP_ALLOC (NULL);
+      bitmap_copy (best_split_point.ssa_names_to_pass,
+		   current->ssa_names_to_pass);
+      best_split_point.split_bbs = BITMAP_ALLOC (NULL);
+      bitmap_copy (best_split_point.split_bbs, current->split_bbs);
+    }
+}
+
+/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if none
+   found.  */
+
+static basic_block
+find_return_bb (void)
+{
+  edge e;
+  edge_iterator ei;
+  bool found_return = false;
+
+  if (EDGE_COUNT (EXIT_BLOCK_PTR->preds) == 1)
+    FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+      {
+	gimple_stmt_iterator bsi;
+	for (bsi = gsi_start_bb (e->src); !gsi_end_p (bsi); gsi_next (&bsi))
+	  if (gimple_code (gsi_stmt (bsi)) != GIMPLE_RETURN
+	      && gimple_code (gsi_stmt (bsi)) != GIMPLE_LABEL
+	      && !is_gimple_debug (gsi_stmt (bsi)))
+	    break;
+	  else if (gimple_code (gsi_stmt (bsi)) == GIMPLE_RETURN)
+	    found_return = true;
+	if (gsi_end_p (bsi) && found_return)
+	  return e->src;
+      }
+  return EXIT_BLOCK_PTR;
+}
+
+/* Callback for walk_stmt_load_store_addr_ops.  If T is non-ssa automatic
+   variable, mark it as used in bitmap passed via DATA. 
+   Return true when access to T prevents splitting the function.  */
+
+static bool
+mark_nonssa_use (gimple stmt ATTRIBUTE_UNUSED, tree t,
+	         void *data ATTRIBUTE_UNUSED)
+{
+  t = get_base_address (t);
+
+  if (!t || is_gimple_reg (t))
+    return false;
+
+  /* At present we can't pass non-SSA arguments to split function.
+     FIXME: this can be relaxed by passing references to arguments.  */
+  if (TREE_CODE (t) == PARM_DECL)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Can not split use of non-ssa function parameter.\n");
+      return true;
+    }
+
+  if (TREE_CODE (t) == VAR_DECL && auto_var_in_fn_p (t, current_function_decl))
+    bitmap_set_bit ((bitmap)data, DECL_UID (t));
+  return false;
+}
+
+/* Compute local properties of basic block BB we collect when looking for split points
+   point.
+   We look for ssa defs and store them in SET_SSA_NAMES, for ssa uses and store them
+   in USED_SSA_NAMES and for any non-SSA automatic vars stored in NON_SSA_VARS.
+
+   When BB has edge to RETURN_BB, collect uses in RETURN_BB too.  
+
+   Return false when BB contains something that prevents it from being put into
+   split function.  */
+
+static bool
+visit_bb (basic_block bb, basic_block return_bb,
+	  bitmap set_ssa_names, bitmap used_ssa_names,
+	  bitmap non_ssa_vars)
+{
+  gimple_stmt_iterator bsi;
+  edge e;
+  edge_iterator ei;
+  bool can_split = true;
+
+  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+    {
+      gimple stmt = gsi_stmt (bsi);
+      tree op;
+      ssa_op_iter iter;
+      tree decl;
+
+      if (is_gimple_debug (stmt))
+	continue;
+
+      /* FIXME: We can split regions containing EH.  We can not however
+	 split RESX, EH_DISPATCH and EH_POINTER referring to same region
+	 into different partitions.  This would require tracking of
+	 EH regions and checking in consider_split_point if they 
+	 are not used elsewhere.  */
+      if (gimple_code (stmt) == GIMPLE_RESX
+	  && stmt_can_throw_external (stmt))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "Can not split external resx.\n");
+	  can_split = false;
+	}
+      if (gimple_code (stmt) == GIMPLE_EH_DISPATCH)
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "Can not split eh dispatch.\n");
+	  can_split = false;
+	}
+
+      /* Check builtins that prevent splitting.  */
+      if (gimple_code (stmt) == GIMPLE_CALL
+	  && (decl = gimple_call_fndecl (stmt)) != NULL_TREE
+	  && DECL_BUILT_IN (decl))
+	switch (DECL_FUNCTION_CODE (decl))
+	  {
+	  /* FIXME: once we will allow passing non-parm values to split part,
+	     we need to be sure to handle correct builtin_stack_save and
+	     builtin_stack_restore.  At the moment we are safe; there is no
+	     way to store builtin_stack_save result in non-SSA variable
+	     since all calls to those are compiler generated.  */
+	  case BUILT_IN_APPLY:
+	  case BUILT_IN_VA_START:
+	    if (dump_file)
+	      fprintf (dump_file, "Can not split builtin_apply and va_start.\n");
+	    can_split = false;
+	    break;
+	  case BUILT_IN_EH_POINTER:
+	    if (dump_file)
+	      fprintf (dump_file, "Can not split builtin_eh_pointer.\n");
+	    can_split = false;
+	    break;
+	  default:
+	    break;
+	  }
+
+      FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF)
+	if (TREE_CODE (op) == SSA_NAME)
+	  bitmap_set_bit (set_ssa_names, SSA_NAME_VERSION (op));
+      FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
+	if (TREE_CODE (op) == SSA_NAME)
+	  bitmap_set_bit (used_ssa_names, SSA_NAME_VERSION (op));
+      can_split &= !walk_stmt_load_store_addr_ops (stmt, non_ssa_vars,
+						   mark_nonssa_use,
+						   mark_nonssa_use,
+						   mark_nonssa_use);
+    }
+  for (bsi = gsi_start_phis (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+    {
+      gimple stmt = gsi_stmt (bsi);
+      tree op;
+      ssa_op_iter iter;
+
+      if (is_gimple_debug (stmt))
+	continue;
+      if (!is_gimple_reg (gimple_phi_result (stmt)))
+	continue;
+      FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF)
+	if (TREE_CODE (op) == SSA_NAME)
+	  bitmap_set_bit (set_ssa_names, SSA_NAME_VERSION (op));
+      FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
+	if (TREE_CODE (op) == SSA_NAME)
+	  bitmap_set_bit (used_ssa_names, SSA_NAME_VERSION (op));
+      can_split &= !walk_stmt_load_store_addr_ops (stmt, non_ssa_vars,
+						   mark_nonssa_use,
+						   mark_nonssa_use,
+						   mark_nonssa_use);
+    }
+  /* Record also uses comming from PHI operand in return BB.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    if (e->dest == return_bb)
+      {
+	bool found_phi = false;
+	for (bsi = gsi_start_phis (return_bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	  {
+	    gimple stmt = gsi_stmt (bsi);
+	    tree op = gimple_phi_arg_def (stmt, e->dest_idx);
+
+	    if (is_gimple_debug (stmt))
+	      continue;
+	    if (!is_gimple_reg (gimple_phi_result (stmt)))
+	      continue;
+	    found_phi = true;
+	    STRIP_NOPS (op);
+	    if (TREE_CODE (op) == SSA_NAME)
+	      bitmap_set_bit (used_ssa_names, SSA_NAME_VERSION (op));
+	    else
+	      can_split &= !mark_nonssa_use (stmt, op, non_ssa_vars);
+	  }
+	if (!gsi_end_p (gsi_last_bb (return_bb)))
+	  {
+	    ssa_op_iter iter;
+	    gimple stmt = gsi_stmt (gsi_last_bb (return_bb));
+	    tree op;
+	    if (!found_phi)
+	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
+		if (TREE_CODE (op) == SSA_NAME)
+		  bitmap_set_bit (used_ssa_names, SSA_NAME_VERSION (op));
+	    can_split &= !walk_stmt_load_store_addr_ops (stmt, non_ssa_vars,
+							 mark_nonssa_use,
+							 mark_nonssa_use,
+							 mark_nonssa_use);
+	  }
+      }
+  return can_split;
+}
+
+/* Stack entry for recursive DFS walk in find_split_point.  */
+
+typedef struct
+{
+  /* Basic block we are examining.  */
+  basic_block bb;
+
+  /* SSA names set and used by the BB and all BBs reachable
+     from it via DFS walk.  */
+  bitmap set_ssa_names, used_ssa_names;
+  bitmap non_ssa_vars;
+
+  /* All BBS visited from this BB via DFS walk.  */
+  bitmap bbs_visited;
+
+  /* Last examined edge in DFS walk.  Since we walk unoriented graph,
+     the value is up to sum of incomming and outgoing edges of BB.  */
+  unsigned int edge_num;
+
+  /* Stack entry index of earliest BB reachable from current BB
+     or any BB visited later in DFS valk.  */
+  int earliest;
+
+  /* Overall time and size of all BBs reached from this BB in DFS walk.  */
+  int overall_time, overall_size;
+
+  /* When false we can not split on this BB.  */
+  bool can_split;
+} stack_entry;
+DEF_VEC_O(stack_entry);
+DEF_VEC_ALLOC_O(stack_entry,heap);
+
+
+/* Find all articulations and call consider_split on them.
+   OVERALL_TIME and OVERALL_SIZE is time and size of the function.
+
+   We perform basic algorithm for finding an articulation in a graph
+   created from CFG by considering it to be an unoriented graph.
+
+   The articulation is discovered via DFS walk. We collect earliest
+   basic block on stack that is reachable via backward edge.  Articulation
+   is any basic block such that there is no backward edge bypassing it.
+   To reduce stack usage we maintain heap allocated stack in STACK vector.
+   AUX pointer of BB is set to index it appears in the stack or -1 once
+   it is visited and popped off the stack.
+
+   The algorithm finds articulation after visiting the whole component
+   reachable by it.  This makes it convenient to collect information about
+   the component used by consider_split.  */
+
+static void
+find_split_points (int overall_time, int overall_size)
+{
+  stack_entry first;
+  VEC(stack_entry, heap) *stack = NULL;
+  basic_block bb;
+  basic_block return_bb = find_return_bb ();
+  struct split_point current;
+
+  current.header_time = overall_time;
+  current.header_size = overall_size;
+  current.split_time = 0;
+  current.split_size = 0;
+  current.ssa_names_to_pass = BITMAP_ALLOC (NULL);
+
+  first.bb = ENTRY_BLOCK_PTR;
+  first.edge_num = 0;
+  first.overall_time = 0;
+  first.overall_size = 0;
+  first.earliest = INT_MAX;
+  first.set_ssa_names = 0;
+  first.used_ssa_names = 0;
+  first.bbs_visited = 0;
+  VEC_safe_push (stack_entry, heap, stack, &first);
+  ENTRY_BLOCK_PTR->aux = (void *)(ptrdiff_t)-1;
+
+  while (!VEC_empty (stack_entry, stack))
+    {
+      stack_entry *entry = VEC_last (stack_entry, stack);
+
+      /* We are wlaking acyclic graph, so edge_num counts
+	 succ and pred edges together.  However when considering
+         articulation, we want to have processed everything reachable
+	 from articulation but nothing that reaches into it.  */
+      if (entry->edge_num == EDGE_COUNT (entry->bb->succs)
+	  && entry->bb != ENTRY_BLOCK_PTR)
+	{
+	  int pos = VEC_length (stack_entry, stack);
+	  entry->can_split &= visit_bb (entry->bb, return_bb,
+					entry->set_ssa_names,
+					entry->used_ssa_names,
+					entry->non_ssa_vars);
+	  if (pos <= entry->earliest && !entry->can_split && dump_file)
+	    fprintf (dump_file,
+		     "found articulation at bb %i but can not split\n",
+		     entry->bb->index);
+	  if (pos <= entry->earliest && entry->can_split)
+	     {
+	       if (dump_file)
+		 fprintf (dump_file, "found articulation at bb %i\n",
+			  entry->bb->index);
+	       current.entry_bb = entry->bb;
+	       current.ssa_names_to_pass = BITMAP_ALLOC (NULL);
+	       bitmap_and_compl (current.ssa_names_to_pass,
+				 entry->used_ssa_names, entry->set_ssa_names);
+	       current.header_time = overall_time - entry->overall_time;
+	       current.header_size = overall_size - entry->overall_size;
+	       current.split_time = entry->overall_time;
+	       current.split_size = entry->overall_size;
+	       current.split_bbs = entry->bbs_visited;
+	       consider_split (&current, entry->non_ssa_vars, return_bb);
+	       BITMAP_FREE (current.ssa_names_to_pass);
+	     }
+	}
+      /* Do actual DFS walk.  */
+      if (entry->edge_num
+	  < (EDGE_COUNT (entry->bb->succs)
+	     + EDGE_COUNT (entry->bb->preds)))
+	{
+          edge e;
+	  basic_block dest;
+	  if (entry->edge_num < EDGE_COUNT (entry->bb->succs))
+	    {
+	      e = EDGE_SUCC (entry->bb, entry->edge_num);
+	      dest = e->dest;
+	    }
+	  else
+	    {
+	      e = EDGE_PRED (entry->bb, entry->edge_num - EDGE_COUNT (entry->bb->succs));
+	      dest = e->src;
+	    }
+
+	  entry->edge_num++;
+
+	  /* New BB to visit, push it to the stack.  */
+	  if (dest != return_bb && dest != EXIT_BLOCK_PTR
+	      && !dest->aux)
+	    {
+	      stack_entry new_entry;
+
+	      new_entry.bb = dest;
+	      new_entry.edge_num = 0;
+	      new_entry.overall_time
+		 = VEC_index (bb_info, bb_info_vec, dest->index)->time;
+	      new_entry.overall_size
+		 = VEC_index (bb_info, bb_info_vec, dest->index)->size;
+	      new_entry.earliest = INT_MAX;
+	      new_entry.set_ssa_names = BITMAP_ALLOC (NULL);
+	      new_entry.used_ssa_names = BITMAP_ALLOC (NULL);
+	      new_entry.bbs_visited = BITMAP_ALLOC (NULL);
+	      new_entry.non_ssa_vars = BITMAP_ALLOC (NULL);
+	      new_entry.can_split = true;
+	      bitmap_set_bit (new_entry.bbs_visited, dest->index);
+	      VEC_safe_push (stack_entry, heap, stack, &new_entry);
+	      dest->aux = (void *)(ptrdiff_t)VEC_length (stack_entry, stack);
+	    }
+	  /* Back edge found, record the earliest point.  */
+	  else if ((ptrdiff_t)dest->aux > 0
+		   && (ptrdiff_t)dest->aux < entry->earliest)
+	    entry->earliest = (ptrdiff_t)dest->aux;
+	}
+      /* We are done with examing the edges. pop off the value from stack and
+	 merge stuff we cummulate during the walk.  */
+      else if (entry->bb != ENTRY_BLOCK_PTR)
+	{
+	  stack_entry *prev = VEC_index (stack_entry, stack,
+					 VEC_length (stack_entry, stack) - 2);
+
+	  entry->bb->aux = (void *)(ptrdiff_t)-1;
+	  prev->can_split &= entry->can_split;
+	  if (prev->set_ssa_names)
+	    {
+	      bitmap_ior_into (prev->set_ssa_names, entry->set_ssa_names);
+	      bitmap_ior_into (prev->used_ssa_names, entry->used_ssa_names);
+	      bitmap_ior_into (prev->bbs_visited, entry->bbs_visited);
+	      bitmap_ior_into (prev->non_ssa_vars, entry->non_ssa_vars);
+	    }
+	  if (prev->earliest > entry->earliest)
+	    prev->earliest = entry->earliest;
+	  prev->overall_time += entry->overall_time;
+	  prev->overall_size += entry->overall_size;
+	  BITMAP_FREE (entry->set_ssa_names);
+	  BITMAP_FREE (entry->used_ssa_names);
+	  BITMAP_FREE (entry->bbs_visited);
+	  BITMAP_FREE (entry->non_ssa_vars);
+	  VEC_pop (stack_entry, stack);
+	}
+      else
+        VEC_pop (stack_entry, stack);
+    }
+  ENTRY_BLOCK_PTR->aux = NULL;
+  FOR_EACH_BB (bb)
+    bb->aux = NULL;
+  BITMAP_FREE (current.ssa_names_to_pass);
+}
+
+/* Split function at SPLIT_POINT.  */
+
+static void
+split_function (struct split_point *split_point)
+{
+  VEC (tree, heap) *args_to_pass = NULL;
+  bitmap args_to_skip = BITMAP_ALLOC (NULL);
+  tree parm;
+  int num = 0;
+  struct cgraph_node *node;
+  basic_block return_bb = find_return_bb ();
+  basic_block call_bb;
+  gimple_stmt_iterator gsi;
+  gimple call;
+  edge e;
+  edge_iterator ei;
+  tree retval = NULL, real_retval = NULL;
+  bool split_part_return_p = false;
+  gimple last_stmt = NULL;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "\n\nSplitting function at:\n");
+      dump_split_point (dump_file, split_point);
+    }
+
+  /* Collect the parameters of new function and args_to_skip bitmap.  */
+  for (parm = DECL_ARGUMENTS (current_function_decl);
+       parm; parm = TREE_CHAIN (parm), num++)
+    if (!is_gimple_reg (parm)
+	|| !gimple_default_def (cfun, parm)
+	|| !bitmap_bit_p (split_point->ssa_names_to_pass,
+			  SSA_NAME_VERSION (gimple_default_def (cfun, parm))))
+      bitmap_set_bit (args_to_skip, num);
+    else
+      VEC_safe_push (tree, heap, args_to_pass, gimple_default_def (cfun, parm));
+
+  /* See if the split functionwill return.  */
+  FOR_EACH_EDGE (e, ei, return_bb->preds)
+    if (bitmap_bit_p (split_point->split_bbs, e->src->index))
+      break;
+  if (e)
+    split_part_return_p = true;
+
+  /* If we return, we will need the return block.  */
+  if (return_bb != EXIT_BLOCK_PTR && split_part_return_p)
+    bitmap_set_bit (split_point->split_bbs, return_bb->index);
+
+  /* Now create the actual clone.  */
+  rebuild_cgraph_edges ();
+  node = cgraph_function_versioning (cgraph_node (current_function_decl),
+				     NULL, NULL,
+				     args_to_skip,
+				     split_point->split_bbs,
+				     split_point->entry_bb, "_part");
+  cgraph_node_remove_callees (cgraph_node (current_function_decl));
+  if (!split_part_return_p)
+    TREE_THIS_VOLATILE (node->decl) = 1;
+  if (dump_file)
+    dump_function_to_file (node->decl, dump_file, dump_flags);
+
+  /* Create the basic block we place call into.  It is the entry basic block
+     split after last label.  */
+  call_bb = split_point->entry_bb;
+  for (gsi = gsi_start_bb (call_bb); !gsi_end_p (gsi);)
+    if (gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL)
+      {
+	last_stmt = gsi_stmt (gsi);
+	gsi_next (&gsi);
+      }
+    else
+      break;
+  e = split_block (split_point->entry_bb, last_stmt);
+  remove_edge (e);
+
+  /* Produce the call statement.  */
+  gsi = gsi_last_bb (call_bb);
+  call = gimple_build_call_vec (node->decl, args_to_pass);
+  gimple_set_block (call, DECL_INITIAL (current_function_decl));
+
+  /* Update return value.  This is bit tricky.  When we do not return,
+     do nothing.  When we return we might need to update return_bb
+     or produce new return statement.  */
+  if (!split_part_return_p)
+    gsi_insert_after (&gsi, call, GSI_NEW_STMT);
+  else
+    {
+      e = make_edge (call_bb, return_bb,
+		     return_bb == EXIT_BLOCK_PTR ? 0 : EDGE_FALLTHRU);
+      e->count = call_bb->count;
+      e->probability = REG_BR_PROB_BASE;
+      if (return_bb != EXIT_BLOCK_PTR)
+	{
+	  gimple return_stmt = gsi_stmt (gsi_last_bb (return_bb));
+	  gcc_assert (gimple_code (return_stmt) == GIMPLE_RETURN);
+
+	  if ((real_retval = retval = gimple_return_retval (return_stmt))
+	      && !is_gimple_min_invariant (retval)
+	      && (TREE_CODE (retval) != SSA_NAME
+		  || !SSA_NAME_IS_DEFAULT_DEF (retval)))
+	    {
+	      gimple_stmt_iterator psi;
+
+	      /* See if there is PHI definind return value.  */
+	      for (psi = gsi_start_phis (return_bb);
+		   !gsi_end_p (psi); gsi_next (&psi))
+		if (is_gimple_reg (gimple_phi_result (gsi_stmt (psi))))
+		  break;
+
+	      /* When we have PHI, update PHI.  When there is no PHI,
+		 update the return statement itself.  */
+	      if (TREE_CODE (retval) == SSA_NAME)
+		{
+		  retval = make_ssa_name (SSA_NAME_VAR (retval), call);
+		  if (TREE_CODE (retval) == SSA_NAME
+		      && !gsi_end_p (psi))
+		    add_phi_arg (gsi_stmt (psi), retval, e, UNKNOWN_LOCATION);
+		  else if (TREE_CODE (retval) == SSA_NAME)
+		    {
+		      gimple_return_set_retval (return_stmt, retval);
+		      update_stmt (return_stmt);
+		    }
+		}
+	      gimple_call_set_lhs (call, retval);
+	    }
+          gsi_insert_after (&gsi, call, GSI_NEW_STMT);
+	}
+      else
+	{
+	  gimple ret;
+	  if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (current_function_decl))))
+	    {
+	      retval
+	        = create_tmp_var (TREE_TYPE (TREE_TYPE (current_function_decl)),
+				  "RET");
+	      if (is_gimple_reg (retval))
+		retval = make_ssa_name (retval, call);
+	      gimple_call_set_lhs (call, retval);
+	    }
+          gsi_insert_after (&gsi, call, GSI_NEW_STMT);
+	  ret = gimple_build_return (retval);
+	  gsi_insert_after (&gsi, ret, GSI_NEW_STMT);
+	}
+    }
+  free_dominance_info (CDI_DOMINATORS);
+  free_dominance_info (CDI_POST_DOMINATORS);
+  compute_inline_parameters (node);
+}
+
+/* Execute function splitting pass.  */
+
+static unsigned int
+execute_split_functions (void)
+{
+  gimple_stmt_iterator bsi;
+  basic_block bb;
+  int overall_time = 0, overall_size = 0;
+  int todo = 0;
+  edge e;
+  edge_iterator ei;
+  int nreturns = 0;
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+
+  if (flags_from_decl_or_type (current_function_decl) & ECF_NORETURN)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: noreturn function.\n");
+      return 0;
+    }
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: main function.\n");
+      return 0;
+    }
+  /* This can be relaxed; function might become inlinable after splitting
+     away the uninlinable part.  */
+  if (!node->local.inlinable)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: not inlinable.\n");
+      return 0;
+    }
+  if (node->local.disregard_inline_limits)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: disregading inline limits.\n");
+      return 0;
+    }
+  /* This can be relaxed; most of versioning tests actually prevents
+     a duplication.  */
+  if (!tree_versionable_function_p (current_function_decl))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: not versionable.\n");
+      return 0;
+    }
+  /* FIXME: we could support this.  */
+  if (DECL_STRUCT_FUNCTION (current_function_decl)->static_chain_decl)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: nested function.\n");
+      return 0;
+    }
+  /* FIXME: Should be easy to support.  */
+  if (DECL_BY_REFERENCE (DECL_RESULT (current_function_decl)))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: returns value by reference.\n");
+      return 0;
+    }
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+    if (gimple_code (gsi_stmt (gsi_last_bb (e->src))) == GIMPLE_RETURN)
+      nreturns++;
+  /* FIXME: This happens only when we have one return with a value and other
+     return without.  We can handle this easily by making find_return_bb to find
+     version with a return value.  */
+  if (nreturns > 1)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: multiple return statements.\n");
+      return 0;
+    }
+
+  /* See if it makes sense to try to split.
+     It makes sense to split if we inline, that is if we have direct calls to
+     handle or direct calls are possibly going to appear as result of indirect
+     inlining or LTO.
+     Note that we are not completely conservative about disqualifying functions
+     called once.  It is possible that the caller is called more then once and
+     then inlining would still benefit.  */
+  if ((!node->callers || !node->callers->next_caller)
+      && !node->address_taken
+      && ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: not called directly or called once.\n");
+      return 0;
+    }
+
+  /* FIXME: We can actually split if splitting reduces call overhead.  */
+  if (!flag_inline_small_functions
+      && !DECL_DECLARED_INLINE_P (current_function_decl))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not splitting: not autoinlining and function is not inline.\n");
+      return 0;
+    }
+
+  /* Compute local info about basic blocks and determine function size/time.  */
+  VEC_safe_grow_cleared (bb_info, heap, bb_info_vec, last_basic_block + 1);
+  memset (&best_split_point, 0, sizeof (best_split_point));
+  FOR_EACH_BB (bb)
+    {
+      int time = 0;
+      int size = 0;
+      int freq = compute_call_stmt_bb_frequency (current_function_decl, bb);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Basic block %i\n", bb->index);
+
+      for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	{
+	  int this_time, this_size;
+	  gimple stmt = gsi_stmt (bsi);
+
+	  this_size = estimate_num_insns (stmt, &eni_size_weights);
+	  this_time = estimate_num_insns (stmt, &eni_time_weights) * freq;
+	  size += this_size;
+	  time += this_time;
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "  freq:%6i size:%3i time:%3i ",
+		       freq, this_size, this_time);
+	      print_gimple_stmt (dump_file, stmt, 0, 0);
+	    }
+	}
+      overall_time += time;
+      overall_size += size;
+      VEC_index (bb_info, bb_info_vec, bb->index)->time = time;
+      VEC_index (bb_info, bb_info_vec, bb->index)->size = size;
+    }
+  find_split_points (overall_time, overall_size);
+  if (best_split_point.split_bbs)
+    {
+      split_function (&best_split_point);
+      BITMAP_FREE (best_split_point.ssa_names_to_pass);
+      BITMAP_FREE (best_split_point.split_bbs);
+      todo = TODO_update_ssa | TODO_cleanup_cfg;
+    }
+  VEC_free (bb_info, heap, bb_info_vec);
+  bb_info_vec = NULL;
+  return todo;
+}
+
+static bool
+gate_split_functions (void)
+{
+  return flag_partial_inlining;
+}
+
+struct gimple_opt_pass pass_split_functions =
+{
+ {
+  GIMPLE_PASS,
+  "fnsplit",				/* name */
+  gate_split_functions,			/* gate */
+  execute_split_functions,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_IPA_FNSPLIT,			/* tv_id */
+  PROP_cfg,				/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func			/* todo_flags_finish */
+ }
+};
Index: common.opt
===================================================================
--- common.opt	(revision 160109)
+++ common.opt	(working copy)
@@ -876,6 +876,10 @@ foptimize-sibling-calls
 Common Report Var(flag_optimize_sibling_calls) Optimization
 Optimize sibling and tail recursive calls
 
+fpartial-inlining
+Common Report Var(flag_partial_inlining)
+Perform partial inlining
+
 fpre-ipa-mem-report
 Common Report Var(pre_ipa_mem_report)
 Report on memory allocation before interprocedural optimization
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 160111)
+++ Makefile.in	(working copy)
@@ -1429,6 +1429,7 @@ OBJS-archive = \
 	cppdefault.o \
 	incpath.o \
 	ipa-cp.o \
+        ipa-split.o \
 	ipa-inline.o \
 	ipa-prop.o \
 	ipa-pure-const.o \
@@ -2966,6 +2967,10 @@ ipa-cp.o : ipa-cp.c $(CONFIG_H) $(SYSTEM
    $(TREE_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
    $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H) tree-pretty-print.h
+ipa-split.o : ipa-split.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
+   $(TREE_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
+   $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
+   $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H)
 matrix-reorg.o : matrix-reorg.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
    $(TM_H) $(TREE_H) $(RTL_H) $(TREE_INLINE_H) $(TREE_FLOW_H) \
    tree-flow-inline.h langhooks.h $(HASHTAB_H) $(TOPLEV_H) $(FLAGS_H) $(GGC_H) \
Index: passes.c
===================================================================
--- passes.c	(revision 160109)
+++ passes.c	(working copy)
@@ -795,6 +795,7 @@ init_optimization_passes (void)
           NEXT_PASS (pass_cleanup_eh);
           NEXT_PASS (pass_profile);
           NEXT_PASS (pass_local_pure_const);
+          NEXT_PASS (pass_split_functions);
 	}
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
Index: params.def
===================================================================
--- params.def	(revision 160109)
+++ params.def	(working copy)
@@ -78,11 +78,11 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
    that is applied to functions marked inlined (or defined in the
    class declaration in C++) given by the "max-inline-insns-single"
    parameter.
-   The default value is 90.  */
+   The default value is 40.  */
 DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
 	  "max-inline-insns-auto",
 	  "The maximum number of instructions when automatically inlining",
-	  50, 0, 0)
+	  40, 0, 0)
 
 DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE,
 	  "max-inline-insns-recursive",
@@ -117,6 +117,12 @@ DEFPARAM (PARAM_EARLY_INLINER_MAX_ITERAT
 	  "The maximum number of nested indirect inlining performed by early inliner",
 	  10, 0, 0)
 
+/* Limit on probability of entry BB.  */
+DEFPARAM (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY,
+	  "partial-inlining-entry-probability",
+	  "Maximum probability of the entry BB of split region (in percent relative to entry BB of the function) to make partial inlining happen",
+	  70, 0, 0)
+
 /* Limit the number of expansions created by the variable expansion
    optimization to avoid register pressure.  */
 DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi	(revision 159621)
--- doc/invoke.texi	(working copy)
*************** Objective-C and Objective-C++ Dialects}.
*** 361,367 ****
  -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
  -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
! -fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays @gol
  -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
  -fprofile-generate=@var{path} @gol
  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
--- 361,367 ----
  -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
  -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
! -fpartial-inlining -fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays @gol
  -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
  -fprofile-generate=@var{path} @gol
  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
*************** also turns on the following optimization
*** 5851,5856 ****
--- 5851,5857 ----
  -findirect-inlining @gol
  -fipa-sra @gol
  -foptimize-sibling-calls @gol
+ -fpartial-inlining @gol
  -fpeephole2 @gol
  -fregmove @gol
  -freorder-blocks  -freorder-functions @gol
*************** This optimization is enabled by default.
*** 7001,7006 ****
--- 7002,7015 ----
  With this option, the compiler will create multiple copies of some
  local variables when unrolling a loop which can result in superior code.
  
+ @item -fpartial-inlining
+ @opindex fpartial-inlining
+ Inline parts of functions.  This option has any effect only
+ when inlining itself is turned on by the @option{-finline-functions}
+ or @option{-finline-small-functions} options.
+ 
+ Enabled at level @option{-O2}.
+ 
  @item -fpredictive-commoning
  @opindex fpredictive-commoning
  Perform predictive commoning optimization, i.e., reusing computations


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]