This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA} Patch to implement "Switch statement fdo reordering"


Here is the updated patch / Changelog with the corrections / suggestions so far.

Thanks
Edmar

2008-04-22  Edmar Wienskoski  <edmar@freescale.com>

	* common.opt (fvpt-case-rebalance): New option.
	* opts.c (flag_value_profile_case_rebalance_set): New
	variable.
	(common_handle_option): New option fvpt_case_rebalance.
	* params.def (PARAM_SWITCH_HISTOGRAM_SIZE,
	PARAM_MIN_CASE_PROMOTION_RATIO): New parameters.
	* target.h (profitable_case_rebalance_p): New target hook.
	* target-def.h (TARGET_PROFITABLE_CASE_REBALANCE_P): New
	define.
	* value-prof.c: Include params.h
	(tree_case_values_to_profile): New function
	(tree_values_to_profile): Call tree_case_values_to_profile as
	appropriate.
	* stmt.c: Include value-prof.h and params.h.
	(case_node): Add fdo_cost field.
	(fdo_switch_use_histogram_p): New function.
	(fdo_balance_case_nodes): Likewise.
	(expand_case): Add switch statement case re-balance
	optimization.
	* Makefile.in (stmt.o, value-prof.o): Added dependencies.
	* config/rs6000/rs6000.c (rs6000_profitable_case_rebalance_p):
	New function.
	(TARGET_PROFITABLE_CASE_REBALANCE_P): Target macro defined.
	* doc/invoke.texi (Optimize Options): Document new option and
	new parameters.

2008-04-22  Edmar Wienskoski  <edmar@freescale.com>

	* gcc.dg/tree-prof/val-prof-7.c: New test case for switch case
	re-balance optimization.

Index: gcc/gcc/doc/invoke.texi
===================================================================
--- gcc/gcc/doc/invoke.texi	(revision 134561)
+++ gcc/gcc/doc/invoke.texi	(working copy)
@@ -6348,10 +6348,11 @@
 optimization.  You must use @option{-fprofile-generate} both when
 compiling and when linking your program.
 
-The following options are enabled: @code{-fprofile-arcs}, @code{-fprofile-values}, @code{-fvpt}.
+The following options are enabled: @code{-fprofile-arcs}, @code{-fprofile-values},
+@code{-fvpt}, @code{-fvpt-case-rebalance}.
 
 If @var{path} is specified, GCC will look at the @var{path} to find
-the profile feeedback data files. See @option{-fprofile-dir}.
+the profile feedback data files. See @option{-fprofile-dir}.
 
 @item -fprofile-use
 @itemx -fprofile-use=@var{path}
@@ -6359,8 +6360,9 @@
 Enable profile feedback directed optimizations, and optimizations
 generally profitable only with profile feedback available.
 
-The following options are enabled: @code{-fbranch-probabilities}, @code{-fvpt},
-@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}
+The following options are enabled: @code{-fbranch-probabilities},
+@code{-fvpt}, @code{-fvpt-case-rebalance}, @code{-funroll-loops},
+@code{-fpeel-loops}, @code{-ftracer}
 
 By default, GCC emits an error message if the feedback profiles do not
 match the source code.  This error can be turned into a warning by using
@@ -6626,6 +6628,26 @@
 Currently the optimizations include specialization of division operation
 using the knowledge about the value of the denominator.
 
+@item -fvpt-case-rebalance
+@opindex fvpt-case-rebalance
+If combined with @option{-fvpt}, it instructs the compiler to add a code
+to gather information about the condition expression of switch statements.
+
+It reads back the occurrence frequency of each case value, and
+generate code for a switch statement from a binary tree weighted with
+those occurrence frequencies.
+
+This optimization is controlled by two parameters, which may be
+specified individually by using @option{--param @var{name}=@var{value}}.
+
+The range of case values that are profiled is controlled by
+@option{--param fdo-switch-histogram-size}.
+
+The minimum frequency of a default value to cause its promotion to a explicit
+case label is controlled by @option{--param fdo-min-case-promotion-rate}.
+
+Enabled with @option{-fprofile-generate} and @option{-fprofile-use}.
+
 @item -frename-registers
 @opindex frename-registers
 Attempt to avoid false dependencies in scheduled code by making use
@@ -7008,6 +7030,22 @@
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item fdo-switch-histogram-size
+Defines the range of switch statement case values for which
+frequency execution will be collected. Default value is 256.
+
+@item fdo-min-case-promotion-rate
+Defines the minimum frequency for a switch statement default value
+be promoted to a explicit case label.
+
+Normally, a binary search has to be exhausted before the code in the
+default label can be executed. If a particular default value is found
+to have a significant execution frequency, a new node is created in
+the binary search tree with the frequency of that particular value
+used as weight.
+
+The default value is 10 percent.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables below that
 all candidates are considered for each use in induction variable
Index: gcc/gcc/value-prof.c
===================================================================
--- gcc/gcc/value-prof.c	(revision 134561)
+++ gcc/gcc/value-prof.c	(working copy)
@@ -44,6 +44,7 @@
 #include "tree-pass.h"
 #include "toplev.h"
 #include "pointer-set.h"
+#include "params.h"
 
 static struct value_prof_hooks *value_prof_hooks;
 
@@ -1609,6 +1610,32 @@
 						 stmt, dest));
 }
 
+/* Find values inside INSN for that we want to measure histograms for
+   switch re-ordering optimization.  */
+static void
+tree_case_values_to_profile (tree stmt, histogram_values *values)
+{
+  tree op, type;
+  histogram_value hist;
+
+  if (TREE_CODE (stmt) != SWITCH_EXPR)
+    return;
+  op = SWITCH_COND (stmt);
+  type = TREE_TYPE (op);
+  if (!INTEGRAL_TYPE_P (type))
+    return;
+
+  if (TREE_CODE (op) != INTEGER_CST)
+    {
+      hist = gimple_alloc_histogram_value (cfun, HIST_TYPE_INTERVAL,
+					   stmt, op);
+      hist->hdata.intvl.int_start = 0;
+      hist->hdata.intvl.steps = PARAM_VALUE (PARAM_SWITCH_HISTOGRAM_SIZE);
+      VEC_safe_push (histogram_value, heap, *values, hist);
+    }
+  return;
+}
+
 /* Find values inside STMT for that we want to measure histograms and adds
    them to list VALUES.  */
 
@@ -1620,6 +1647,8 @@
       tree_divmod_values_to_profile (stmt, values);
       tree_stringops_values_to_profile (stmt, values);
       tree_indirect_call_to_profile (stmt, values);
+      if (flag_value_profile_case_rebalance)
+	tree_case_values_to_profile (stmt, values);
     }
 }
 
Index: gcc/gcc/target.h
===================================================================
--- gcc/gcc/target.h	(revision 134561)
+++ gcc/gcc/target.h	(working copy)
@@ -768,6 +768,10 @@
      checks to  handle_dll_attribute ().  */
   bool (* valid_dllimport_attribute_p) (const_tree decl);
 
+  /* This target can execute re-balanced binary search trees faster than
+     indirect jump tables */
+  bool (* profitable_case_rebalance_p) (void);
+
   /* Functions relating to calls - argument passing, returns, etc.  */
   struct calls {
     bool (*promote_function_args) (const_tree fntype);
Index: gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
===================================================================
--- gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c	(revision 0)
+++ gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c	(revision 0)
@@ -0,0 +1,52 @@
+/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-tree-expand -mcpu=power4" { target { powerpc*-*-* && powerpc_altivec_ok } } } */
+
+enum {NEVER=1, HARDLY=2, FOO=3, OFTEN=5, BAR=6};
+
+int seq[] = {
+  OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+  OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+  OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+  OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+  300, -1, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+};
+int result;
+
+int
+test_it (int k)
+{
+  int i, j;
+
+  j = k;
+  for (i = 0; i < sizeof(seq)/sizeof(seq[0]); i++) {
+    switch (seq[i]) {
+    case NEVER:
+      j = j * i;
+      break;
+    case HARDLY:
+      j = j + i;
+      break;
+    case FOO:
+      j = j + k;
+      break;
+    case BAR:
+      j = j + i + k;
+    default:
+      j = j * k;
+      break;
+    }
+  }
+  return j;
+}
+
+int
+main (int argc, char **argv)
+{
+  result = test_it (1);
+  return 0;
+}
+/* For now, only power[456] has a target hook definition that will
+   cause the optimization to happen. Although no altivec code is
+   used, any machine that is altivec capable can be used to run this
+   test case */
+/* { dg-final-use { scan-tree-dump "default case 5 with cost 774 promoted" "expand" { target { powerpc*-*-* && powerpc_altivec_ok } } } } */
Index: gcc/gcc/opts.c
===================================================================
--- gcc/gcc/opts.c	(revision 134561)
+++ gcc/gcc/opts.c	(working copy)
@@ -353,6 +353,7 @@
 static bool profile_arc_flag_set, flag_profile_values_set;
 static bool flag_unroll_loops_set, flag_tracer_set;
 static bool flag_value_profile_transformations_set;
+static bool flag_value_profile_case_rebalance_set;
 static bool flag_peel_loops_set, flag_branch_probabilities_set;
 static bool flag_inline_functions_set;
 
@@ -1738,6 +1739,8 @@
         flag_tracer = value;
       if (!flag_value_profile_transformations_set)
         flag_value_profile_transformations = value;
+      if (!flag_value_profile_case_rebalance_set)
+        flag_value_profile_case_rebalance = value;
       if (!flag_inline_functions_set)
         flag_inline_functions = value;
       break;
@@ -1753,6 +1756,8 @@
         flag_profile_values = value;
       if (!flag_value_profile_transformations_set)
         flag_value_profile_transformations = value;
+      if (!flag_value_profile_case_rebalance_set)
+        flag_value_profile_case_rebalance = value;
       if (!flag_inline_functions_set)
         flag_inline_functions = value;
       break;
@@ -1780,6 +1785,10 @@
       flag_value_profile_transformations_set = true;
       break;
 
+    case OPT_fvpt_case_rebalance:
+      flag_value_profile_case_rebalance_set = true;
+      break;
+
     case OPT_frandom_seed:
       /* The real switch is -fno-random-seed.  */
       if (value)
Index: gcc/gcc/common.opt
===================================================================
--- gcc/gcc/common.opt	(revision 134561)
+++ gcc/gcc/common.opt	(working copy)
@@ -1262,6 +1262,10 @@
 Common Report Var(flag_value_profile_transformations) Optimization
 Use expression value profiles in optimizations
 
+fvpt-case-rebalance
+Common Report Var(flag_value_profile_case_rebalance) Optimization
+Use expression value profiles to rebalance the comparison tree in a switch statement
+
 fweb
 Common Report Var(flag_web) Init(2) Optimization
 Construct webs and split unrelated uses of single variable
Index: gcc/gcc/target-def.h
===================================================================
--- gcc/gcc/target-def.h	(revision 134561)
+++ gcc/gcc/target-def.h	(working copy)
@@ -540,6 +540,8 @@
 
 #define TARGET_ARM_EABI_UNWINDER false
 
+#define TARGET_PROFITABLE_CASE_REBALANCE_P NULL
+
 #define TARGET_PROMOTE_FUNCTION_ARGS hook_bool_const_tree_false
 #define TARGET_PROMOTE_FUNCTION_RETURN hook_bool_const_tree_false
 #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_false
@@ -774,6 +776,7 @@
   TARGET_STACK_PROTECT_FAIL,			\
   TARGET_INVALID_WITHIN_DOLOOP,			\
   TARGET_VALID_DLLIMPORT_ATTRIBUTE_P,		\
+  TARGET_PROFITABLE_CASE_REBALANCE_P,		\
   TARGET_CALLS,					\
   TARGET_INVALID_CONVERSION,			\
   TARGET_INVALID_UNARY_OP,			\
Index: gcc/gcc/Makefile.in
===================================================================
--- gcc/gcc/Makefile.in	(revision 134561)
+++ gcc/gcc/Makefile.in	(working copy)
@@ -2395,8 +2395,8 @@
 stmt.o : stmt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(TREE_H) $(FLAGS_H) $(FUNCTION_H) insn-config.h hard-reg-set.h $(EXPR_H) \
    libfuncs.h except.h $(RECOG_H) toplev.h output.h $(GGC_H) $(TM_P_H) \
-   langhooks.h $(PREDICT_H) $(OPTABS_H) $(TARGET_H) $(MACHMODE_H) \
-   $(REGS_H) alloc-pool.h
+   langhooks.h $(COVERAGE_H) value-prof.h $(PARAMS_H) $(PREDICT_H) \
+   $(OPTABS_H) $(TARGET_H) $(MACHMODE_H) $(REGS_H) alloc-pool.h
 except.o : except.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(TREE_H) $(FLAGS_H) except.h $(FUNCTION_H) $(EXPR_H) libfuncs.h \
    langhooks.h insn-config.h hard-reg-set.h $(BASIC_BLOCK_H) output.h \
@@ -2638,7 +2638,7 @@
 value-prof.o : value-prof.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(BASIC_BLOCK_H) hard-reg-set.h value-prof.h $(EXPR_H) output.h $(FLAGS_H) \
    $(RECOG_H) insn-config.h $(OPTABS_H) $(REGS_H) $(GGC_H) $(DIAGNOSTIC_H) \
-   $(TREE_H) $(COVERAGE_H) $(RTL_H) $(GCOV_IO_H) $(TREE_FLOW_H) \
+   $(TREE_H) $(COVERAGE_H) $(RTL_H) $(GCOV_IO_H) $(TREE_FLOW_H) $(PARAMS_H) \
    tree-flow-inline.h $(TIMEVAR_H) tree-pass.h
 loop-doloop.o : loop-doloop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(FLAGS_H) $(EXPR_H) hard-reg-set.h $(BASIC_BLOCK_H) $(TM_P_H) \
Index: gcc/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/gcc/config/rs6000/rs6000.c	(revision 134561)
+++ gcc/gcc/config/rs6000/rs6000.c	(working copy)
@@ -887,6 +887,7 @@
 static bool rs6000_is_opaque_type (const_tree);
 static rtx rs6000_dwarf_register_span (rtx);
 static void rs6000_init_dwarf_reg_sizes_extra (tree);
+static bool rs6000_profitable_case_rebalance_p (void);
 static rtx rs6000_legitimize_tls_address (rtx, enum tls_model);
 static void rs6000_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
 static rtx rs6000_tls_get_addr (void);
@@ -1157,6 +1158,9 @@
 #undef TARGET_INIT_DWARF_REG_SIZES_EXTRA
 #define TARGET_INIT_DWARF_REG_SIZES_EXTRA rs6000_init_dwarf_reg_sizes_extra
 
+#undef TARGET_PROFITABLE_CASE_REBALANCE_P
+#define TARGET_PROFITABLE_CASE_REBALANCE_P rs6000_profitable_case_rebalance_p
+
 /* On rs6000, function arguments are promoted, as are function return
    values.  */
 #undef TARGET_PROMOTE_FUNCTION_ARGS
@@ -21937,6 +21941,16 @@
     }
 }
 
+static bool
+rs6000_profitable_case_rebalance_p (void)
+{
+  if (rs6000_cpu == PROCESSOR_POWER4
+      || rs6000_cpu == PROCESSOR_POWER5
+      || rs6000_cpu == PROCESSOR_POWER6)
+    return true;
+  return false;
+}
+
 /* Map internal gcc register numbers to DWARF2 register numbers.  */
 
 unsigned int
Index: gcc/gcc/stmt.c
===================================================================
--- gcc/gcc/stmt.c	(revision 134561)
+++ gcc/gcc/stmt.c	(working copy)
@@ -50,6 +50,8 @@
 #include "target.h"
 #include "regs.h"
 #include "alloc-pool.h"
+#include "value-prof.h"
+#include "params.h"
 
 /* Functions and data structures for expanding case statements.  */
 
@@ -88,6 +90,7 @@
   tree			low;	/* Lowest index value for this label */
   tree			high;	/* Highest index value for this label */
   tree			code_label; /* Label to jump to when node matches */
+  int                   fdo_cost; /* Normalized profiling frequency */
 };
 
 typedef struct case_node case_node;
@@ -116,6 +119,8 @@
 static bool lshift_cheap_p (void);
 static int case_bit_test_cmp (const void *, const void *);
 static void emit_case_bit_tests (tree, tree, tree, tree, case_node_ptr, rtx);
+static bool fdo_switch_use_histogram_p (histogram_value);
+static void fdo_balance_case_nodes (case_node_ptr *, case_node_ptr);
 static void balance_case_nodes (case_node_ptr *, case_node_ptr);
 static int node_has_low_bound (case_node_ptr, tree);
 static int node_has_high_bound (case_node_ptr, tree);
@@ -2306,6 +2311,7 @@
   rtx *labelvec;
   int i;
   rtx before_case, end, lab;
+  histogram_value hist;
 
   tree vec = SWITCH_LABELS (exp);
   tree orig_type = TREE_TYPE (exp);
@@ -2453,6 +2459,222 @@
 			       case_list, default_label);
 	}
 
+      /* If a histogram for the condition is present, give preference
+	 to generate a sequence of conditional branches. That sequence
+	 will have the form of a balanced binary tree, where each node
+	 is weighted with the frequency found in the histogram.
+
+	 Since default cases must exhaust the binary search tree, this
+	 optimization also checks for frequently occurring default
+	 values, and artificially add nodes in the balanced tree to
+	 represent those values. That way default cases can also be
+	 optimized */
+
+      else if ((hist = gimple_histogram_value_of_type
+		(cfun, exp, HIST_TYPE_INTERVAL)) != NULL
+	       && fdo_switch_use_histogram_p(hist))
+	{
+	  gcov_type all;
+	  int toobig, negative;
+	  int i, k;
+	  struct case_node *cn = 0;
+	  int count_negs, count_bigs;
+	  int cost_negs, cost_bigs;
+	  int range_start, range_end;
+	  int *hcounters;
+
+	  index = expand_normal (index_expr);
+
+	  /* If the index is a short or char that we do not have
+	     an insn to handle comparisons directly, convert it to
+	     a full integer now, rather than letting each comparison
+	     generate the conversion.  */
+
+	  if (GET_MODE_CLASS (GET_MODE (index)) == MODE_INT
+	      && ! have_insn_for (COMPARE, GET_MODE (index)))
+	    {
+	      enum machine_mode wider_mode;
+	      for (wider_mode = GET_MODE (index); wider_mode != VOIDmode;
+		   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
+		if (have_insn_for (COMPARE, wider_mode))
+		  {
+		    index = convert_to_mode (wider_mode, index, unsignedp);
+		    break;
+		  }
+	    }
+
+	  do_pending_stack_adjust ();
+
+	  if (MEM_P (index))
+	    index = copy_to_reg (index);
+
+	  /* Normalize the counters in the histogram */
+	  hcounters = (int *) xcalloc (1, ((int)hist->hdata.intvl.steps + 2)
+				       * sizeof(int));
+	  all = 0;
+	  for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+	    all += hist->hvalue.counters[i];
+	  for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+	    hcounters[i] = (hist->hvalue.counters[i] * 1000) / all;
+	  toobig = hcounters[hist->hdata.intvl.steps];
+	  negative = hcounters[hist->hdata.intvl.steps + 1];
+
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Switch transform based on this histogram:\n");
+	      for (i = 0, k = 0; i < (int)hist->hdata.intvl.steps; i++, k++)
+		{
+		  k &= 0x0F;
+		  if (i == 0)
+		    fprintf (dump_file, "%4d", i);
+		  else if (k == 0)
+		    fprintf (dump_file, "\n%4d", i);
+		  fprintf (dump_file, " %4d", hcounters[i]);
+		}
+	      fprintf (dump_file, "\n   steps=%d   toobig=%d   negative=%d\n",
+		       hist->hdata.intvl.steps, toobig, negative);
+	    }
+	  
+	  /* Our histogram is limited by a fixed range,
+	     exam the case list for cases out of this range */
+	  count_negs = 0;
+	  count_bigs = 0;
+	  for (cn = case_list; cn; cn = cn->right)
+	    {
+	      cn->fdo_cost = 0;
+	      if (tree_int_cst_equal (cn->low, cn->high))
+		{
+		  if (TREE_INT_CST_HIGH(cn->low) < 0)
+		    count_negs++;
+		  else if (TREE_INT_CST_HIGH(cn->low) > 0
+			   || (TREE_INT_CST_HIGH(cn->low) == 0
+			       && (TREE_INT_CST_LOW(cn->low)
+				   >= hist->hdata.intvl.steps)))
+		    count_bigs++;
+		  /* single value, within range, transfer cost */
+		  else
+		    {
+		      cn->fdo_cost = hcounters[TREE_INT_CST_LOW(cn->low)];
+		      hcounters[TREE_INT_CST_LOW(cn->low)] = 0;
+		    }
+		}
+	      /* It is a range */
+	      else
+		{
+		  if (TREE_INT_CST_HIGH(cn->low) < 0)
+		    count_negs += 2;
+		  if (TREE_INT_CST_HIGH(cn->high) > 0
+		      || (TREE_INT_CST_HIGH(cn->high) == 0
+			  && (TREE_INT_CST_LOW(cn->high)
+			      >= hist->hdata.intvl.steps)))
+		    count_bigs += 2;
+		  /* Transfer all costs within the range */
+		  if (TREE_INT_CST_HIGH(cn->low) < 0)
+		    range_start = 0;
+		  else
+		    range_start = TREE_INT_CST_LOW(cn->low);
+		  if (TREE_INT_CST_HIGH(cn->high) > 0
+		      || (TREE_INT_CST_HIGH(cn->high) == 0
+			  && (TREE_INT_CST_LOW(cn->high)
+			      >= hist->hdata.intvl.steps)))
+		    range_end = (int)hist->hdata.intvl.steps - 1;
+		  else if (TREE_INT_CST_HIGH(cn->high) < 0)
+		    range_end = 0;
+		  else
+		    range_end = TREE_INT_CST_LOW(cn->high);
+		  for (i = range_start; i <= range_end; i++)
+		    {
+		      cn->fdo_cost = hcounters[i];
+		      hcounters[i] = 0;
+		    }
+		}
+	    }
+
+	  /* Calculate the cost for each indiviual big/negative case */
+	  if (count_bigs)
+	    cost_bigs = toobig / count_bigs;
+	  else
+	    cost_bigs = 0;
+	  if (count_negs)
+	    cost_negs = negative / count_negs;
+	  else
+	    cost_negs = 0;
+
+	  /* Distribute the cost of bigs and negatives uniformly */
+	  for (cn = case_list; cn; cn = cn->right)
+	    {
+	      if (tree_int_cst_equal (cn->low, cn->high))
+		{
+		  /* case is a single value */
+		  if (TREE_INT_CST_HIGH(cn->low) < 0)
+		    cn->fdo_cost = cost_negs;
+		  else if (TREE_INT_CST_HIGH(cn->low) > 0
+			   || (TREE_INT_CST_HIGH(cn->low) == 0
+			       && (TREE_INT_CST_LOW(cn->low)
+				   >= hist->hdata.intvl.steps)))
+		    cn->fdo_cost = cost_bigs;
+		}
+	      else
+		{
+		  /* case is a range */
+		  if (TREE_INT_CST_HIGH(cn->low) < 0)
+		    cn->fdo_cost += 2 * cost_negs;
+		  if (TREE_INT_CST_HIGH(cn->high) > 0
+		      || (TREE_INT_CST_HIGH(cn->high) == 0
+			  && (TREE_INT_CST_LOW(cn->high)
+			      >= hist->hdata.intvl.steps)))
+		    cn->fdo_cost += 2 * cost_bigs;
+		}
+	    }
+	  /* All left overs goes to "default" case node,
+	     if we have a significant case, create an entry in
+	     the case_node list for it */
+	  for (i = 0; i < (int)hist->hdata.intvl.steps; i++)
+	    if ((hcounters[i]) > 10
+		* PARAM_VALUE (PARAM_MIN_CASE_PROMOTION_RATIO))
+	      {
+		tree cn_new = build_int_cst (NULL, i);
+
+		if (TREE_INT_CST_HIGH(case_list->high) == 0
+		    && TREE_INT_CST_LOW(case_list->high) > (unsigned)i)
+		  {
+		    case_list = add_case_node (case_list, index_type, cn_new,
+					       cn_new, default_label_decl,
+					       case_node_pool);
+		    case_list->fdo_cost = hcounters[i];
+		  }
+		else
+		  {
+		    for (cn = case_list; cn; cn = cn->right)
+		      if ((TREE_INT_CST_HIGH(cn->high) < 0
+			   || (TREE_INT_CST_HIGH(cn->high) == 0
+			       && TREE_INT_CST_LOW(cn->high) < (unsigned)i))
+			  && (cn->right == NULL
+			      || (TREE_INT_CST_HIGH(cn->right->low) == 0
+				  && TREE_INT_CST_LOW(cn->right->low) > (unsigned)i)))
+			break;
+		    gcc_assert (cn != NULL);
+		    cn->right = add_case_node (cn->right, index_type, cn_new,
+					       cn_new, default_label_decl,
+					       case_node_pool);
+		    cn->right->fdo_cost = hcounters[i];
+		  }
+		if (dump_file)
+		  fprintf (dump_file, "default case %d with cost %d promoted\n",
+			   i, hcounters[i]);
+		hcounters[i] = 0;
+	      }
+	  free (hcounters);
+	  if (dump_file)
+	    for (cn = case_list; cn; cn = cn->right)
+	      fprintf (dump_file, "case %d, cost %d\n",
+		       (int)TREE_INT_CST_LOW(cn->low), cn->fdo_cost);
+	  fdo_balance_case_nodes (&case_list, NULL);
+
+	  emit_case_nodes (index, case_list, default_label, index_type);
+	  emit_jump (default_label);	  
+	}
+
       /* If range of values is much bigger than number of values,
 	 make a sequence of conditional branches instead of a dispatch.
 	 If the switch-index is a constant, do it this way
@@ -2692,6 +2914,23 @@
   return 1;
 }
 
+static bool
+fdo_switch_use_histogram_p (histogram_value hist)
+{
+  gcov_type all;
+  int i;
+
+  if (targetm.profitable_case_rebalance_p == NULL
+      || (targetm.profitable_case_rebalance_p() == false))
+    return false;
+
+  all = 0;
+  for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+    all += hist->hvalue.counters[i];
+
+  return (all != 0);
+}
+
 /* Take an ordered list of case nodes
    and transform them into a near optimal binary tree,
    on the assumption that any target code selection value is as
@@ -2703,6 +2942,85 @@
    branch is then transformed recursively.  */
 
 static void
+fdo_balance_case_nodes (case_node_ptr *head, case_node_ptr parent)
+{
+  case_node_ptr np;
+
+  np = *head;
+  if (np)
+    {
+      int i = 0;
+      int cost = 0;
+      case_node_ptr *npp;
+      case_node_ptr left;
+      
+      /* Count the number of entries on branch. */
+      
+      while (np)
+	{
+	  i++;
+          cost += np->fdo_cost;
+	  np = np->right;
+	}
+      
+      if (i > 2)
+	{
+	  /* Split this list if it is long enough for that to help.  */
+	  npp = head;
+	  left = *npp;
+	  if (i == 3)
+	    npp = &(*npp)->right;
+	  else
+	    {
+              if (cost == 0)
+                {
+                  /* Ensure some progress in the recursive function
+                     in case we got stuck with a large string of cost zero */
+                  i = (i + 1) / 2;
+                  while (1)
+                    {
+                      i--;
+                      if (i <= 0)
+                        break;
+                      npp = &(*npp)->right;
+                    }
+                }
+              else
+                {
+                  /* Find the place in the list that bisects the list's
+                     total cost. */
+                  cost /= 2;
+                  while (cost >= 0 && (*npp)->right != NULL)
+                    {
+                      cost -= (*npp)->fdo_cost;
+                      npp = &(*npp)->right;
+                      if ((cost - (*npp)->fdo_cost) < 0)
+                        break;
+                    }
+                }
+	    }
+	  *head = np = *npp;
+	  *npp = 0;
+	  np->parent = parent;
+	  np->left = left;
+	  
+	  /* Optimize each of the two split parts.  */
+	  fdo_balance_case_nodes (&np->left, np);
+	  fdo_balance_case_nodes (&np->right, np);
+	}
+      else
+	{
+	  /* Else leave this branch as one level,
+	     but fill in `parent' fields.  */
+	  np = *head;
+	  np->parent = parent;
+	  for (; np->right; np = np->right)
+	    np->right->parent = np;
+	}
+    }
+}
+
+static void
 balance_case_nodes (case_node_ptr *head, case_node_ptr parent)
 {
   case_node_ptr np;
Index: gcc/gcc/params.def
===================================================================
--- gcc/gcc/params.def	(revision 134561)
+++ gcc/gcc/params.def	(working copy)
@@ -435,6 +435,19 @@
 	 "The minimum cost of an expensive expression in the loop invariant motion",
 	 20, 0, 0)
 
+/* The number steps in the interval histogram used to profile switch statement condition */
+DEFPARAM(PARAM_SWITCH_HISTOGRAM_SIZE,
+	 "fdo-switch-histogram-size",
+	 "The number steps in the interval histogram used to profile switch statement condition",
+	 256, 32, 0)
+
+/* The minimum frequency for which a default value in a switch 
+   statement is promoted to a labeled case (percentage) */
+DEFPARAM(PARAM_MIN_CASE_PROMOTION_RATIO,
+	 "fdo-min-case-promotion-rate",
+	 "The minimum frequency for which a default value in a switch statement is promoted to a labeled case (percentage)",
+	 10, 0, 100)
+
 /* Bound on number of candidates for induction variables below that
    all candidates are considered for each use in induction variable
    optimizations.  */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]