This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFA} Patch to implement "Switch statement fdo reordering"
Here is the updated patch / Changelog with the corrections / suggestions
so far.
Thanks
Edmar
2008-04-22 Edmar Wienskoski <edmar@freescale.com>
* common.opt (fvpt-case-rebalance): New option.
* opts.c (flag_value_profile_case_rebalance_set): New
variable.
(common_handle_option): New option fvpt_case_rebalance.
* params.def (PARAM_SWITCH_HISTOGRAM_SIZE,
PARAM_MIN_CASE_PROMOTION_RATIO): New parameters.
* target.h (profitable_case_rebalance_p): New target hook.
* target-def.h (TARGET_PROFITABLE_CASE_REBALANCE_P): New
define.
* value-prof.c: Include params.h
(tree_case_values_to_profile): New function
(tree_values_to_profile): Call tree_case_values_to_profile as
appropriate.
* stmt.c: Include value-prof.h and params.h.
(case_node): Add fdo_cost field.
(fdo_switch_use_histogram_p): New function.
(fdo_balance_case_nodes): Likewise.
(expand_case): Add switch statement case re-balance
optimization.
* Makefile.in (stmt.o, value-prof.o): Added dependencies.
* config/rs6000/rs6000.c (rs6000_profitable_case_rebalance_p):
New function.
(TARGET_PROFITABLE_CASE_REBALANCE_P): Target macro defined.
* doc/invoke.texi (Optimize Options): Document new option and
new parameters.
2008-04-22 Edmar Wienskoski <edmar@freescale.com>
* gcc.dg/tree-prof/val-prof-7.c: New test case for switch case
re-balance optimization.
Index: gcc/gcc/doc/invoke.texi
===================================================================
--- gcc/gcc/doc/invoke.texi (revision 134561)
+++ gcc/gcc/doc/invoke.texi (working copy)
@@ -6348,10 +6348,11 @@
optimization. You must use @option{-fprofile-generate} both when
compiling and when linking your program.
-The following options are enabled: @code{-fprofile-arcs}, @code{-fprofile-values}, @code{-fvpt}.
+The following options are enabled: @code{-fprofile-arcs}, @code{-fprofile-values},
+@code{-fvpt}, @code{-fvpt-case-rebalance}.
If @var{path} is specified, GCC will look at the @var{path} to find
-the profile feeedback data files. See @option{-fprofile-dir}.
+the profile feedback data files. See @option{-fprofile-dir}.
@item -fprofile-use
@itemx -fprofile-use=@var{path}
@@ -6359,8 +6360,9 @@
Enable profile feedback directed optimizations, and optimizations
generally profitable only with profile feedback available.
-The following options are enabled: @code{-fbranch-probabilities}, @code{-fvpt},
-@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}
+The following options are enabled: @code{-fbranch-probabilities},
+@code{-fvpt}, @code{-fvpt-case-rebalance}, @code{-funroll-loops},
+@code{-fpeel-loops}, @code{-ftracer}
By default, GCC emits an error message if the feedback profiles do not
match the source code. This error can be turned into a warning by using
@@ -6626,6 +6628,26 @@
Currently the optimizations include specialization of division operation
using the knowledge about the value of the denominator.
+@item -fvpt-case-rebalance
+@opindex fvpt-case-rebalance
+If combined with @option{-fvpt}, it instructs the compiler to add a code
+to gather information about the condition expression of switch statements.
+
+It reads back the occurrence frequency of each case value, and
+generate code for a switch statement from a binary tree weighted with
+those occurrence frequencies.
+
+This optimization is controlled by two parameters, which may be
+specified individually by using @option{--param @var{name}=@var{value}}.
+
+The range of case values that are profiled is controlled by
+@option{--param fdo-switch-histogram-size}.
+
+The minimum frequency of a default value to cause its promotion to a explicit
+case label is controlled by @option{--param fdo-min-case-promotion-rate}.
+
+Enabled with @option{-fprofile-generate} and @option{-fprofile-use}.
+
@item -frename-registers
@opindex frename-registers
Attempt to avoid false dependencies in scheduled code by making use
@@ -7008,6 +7030,22 @@
@item lim-expensive
The minimum cost of an expensive expression in the loop invariant motion.
+@item fdo-switch-histogram-size
+Defines the range of switch statement case values for which
+frequency execution will be collected. Default value is 256.
+
+@item fdo-min-case-promotion-rate
+Defines the minimum frequency for a switch statement default value
+be promoted to a explicit case label.
+
+Normally, a binary search has to be exhausted before the code in the
+default label can be executed. If a particular default value is found
+to have a significant execution frequency, a new node is created in
+the binary search tree with the frequency of that particular value
+used as weight.
+
+The default value is 10 percent.
+
@item iv-consider-all-candidates-bound
Bound on number of candidates for induction variables below that
all candidates are considered for each use in induction variable
Index: gcc/gcc/value-prof.c
===================================================================
--- gcc/gcc/value-prof.c (revision 134561)
+++ gcc/gcc/value-prof.c (working copy)
@@ -44,6 +44,7 @@
#include "tree-pass.h"
#include "toplev.h"
#include "pointer-set.h"
+#include "params.h"
static struct value_prof_hooks *value_prof_hooks;
@@ -1609,6 +1610,32 @@
stmt, dest));
}
+/* Find values inside INSN for that we want to measure histograms for
+ switch re-ordering optimization. */
+static void
+tree_case_values_to_profile (tree stmt, histogram_values *values)
+{
+ tree op, type;
+ histogram_value hist;
+
+ if (TREE_CODE (stmt) != SWITCH_EXPR)
+ return;
+ op = SWITCH_COND (stmt);
+ type = TREE_TYPE (op);
+ if (!INTEGRAL_TYPE_P (type))
+ return;
+
+ if (TREE_CODE (op) != INTEGER_CST)
+ {
+ hist = gimple_alloc_histogram_value (cfun, HIST_TYPE_INTERVAL,
+ stmt, op);
+ hist->hdata.intvl.int_start = 0;
+ hist->hdata.intvl.steps = PARAM_VALUE (PARAM_SWITCH_HISTOGRAM_SIZE);
+ VEC_safe_push (histogram_value, heap, *values, hist);
+ }
+ return;
+}
+
/* Find values inside STMT for that we want to measure histograms and adds
them to list VALUES. */
@@ -1620,6 +1647,8 @@
tree_divmod_values_to_profile (stmt, values);
tree_stringops_values_to_profile (stmt, values);
tree_indirect_call_to_profile (stmt, values);
+ if (flag_value_profile_case_rebalance)
+ tree_case_values_to_profile (stmt, values);
}
}
Index: gcc/gcc/target.h
===================================================================
--- gcc/gcc/target.h (revision 134561)
+++ gcc/gcc/target.h (working copy)
@@ -768,6 +768,10 @@
checks to handle_dll_attribute (). */
bool (* valid_dllimport_attribute_p) (const_tree decl);
+ /* This target can execute re-balanced binary search trees faster than
+ indirect jump tables */
+ bool (* profitable_case_rebalance_p) (void);
+
/* Functions relating to calls - argument passing, returns, etc. */
struct calls {
bool (*promote_function_args) (const_tree fntype);
Index: gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
===================================================================
--- gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 0)
+++ gcc/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 0)
@@ -0,0 +1,52 @@
+/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-tree-expand -mcpu=power4" { target { powerpc*-*-* && powerpc_altivec_ok } } } */
+
+enum {NEVER=1, HARDLY=2, FOO=3, OFTEN=5, BAR=6};
+
+int seq[] = {
+ OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+ OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+ OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+ OFTEN, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+ 300, -1, OFTEN, OFTEN, OFTEN, OFTEN, HARDLY,
+};
+int result;
+
+int
+test_it (int k)
+{
+ int i, j;
+
+ j = k;
+ for (i = 0; i < sizeof(seq)/sizeof(seq[0]); i++) {
+ switch (seq[i]) {
+ case NEVER:
+ j = j * i;
+ break;
+ case HARDLY:
+ j = j + i;
+ break;
+ case FOO:
+ j = j + k;
+ break;
+ case BAR:
+ j = j + i + k;
+ default:
+ j = j * k;
+ break;
+ }
+ }
+ return j;
+}
+
+int
+main (int argc, char **argv)
+{
+ result = test_it (1);
+ return 0;
+}
+/* For now, only power[456] has a target hook definition that will
+ cause the optimization to happen. Although no altivec code is
+ used, any machine that is altivec capable can be used to run this
+ test case */
+/* { dg-final-use { scan-tree-dump "default case 5 with cost 774 promoted" "expand" { target { powerpc*-*-* && powerpc_altivec_ok } } } } */
Index: gcc/gcc/opts.c
===================================================================
--- gcc/gcc/opts.c (revision 134561)
+++ gcc/gcc/opts.c (working copy)
@@ -353,6 +353,7 @@
static bool profile_arc_flag_set, flag_profile_values_set;
static bool flag_unroll_loops_set, flag_tracer_set;
static bool flag_value_profile_transformations_set;
+static bool flag_value_profile_case_rebalance_set;
static bool flag_peel_loops_set, flag_branch_probabilities_set;
static bool flag_inline_functions_set;
@@ -1738,6 +1739,8 @@
flag_tracer = value;
if (!flag_value_profile_transformations_set)
flag_value_profile_transformations = value;
+ if (!flag_value_profile_case_rebalance_set)
+ flag_value_profile_case_rebalance = value;
if (!flag_inline_functions_set)
flag_inline_functions = value;
break;
@@ -1753,6 +1756,8 @@
flag_profile_values = value;
if (!flag_value_profile_transformations_set)
flag_value_profile_transformations = value;
+ if (!flag_value_profile_case_rebalance_set)
+ flag_value_profile_case_rebalance = value;
if (!flag_inline_functions_set)
flag_inline_functions = value;
break;
@@ -1780,6 +1785,10 @@
flag_value_profile_transformations_set = true;
break;
+ case OPT_fvpt_case_rebalance:
+ flag_value_profile_case_rebalance_set = true;
+ break;
+
case OPT_frandom_seed:
/* The real switch is -fno-random-seed. */
if (value)
Index: gcc/gcc/common.opt
===================================================================
--- gcc/gcc/common.opt (revision 134561)
+++ gcc/gcc/common.opt (working copy)
@@ -1262,6 +1262,10 @@
Common Report Var(flag_value_profile_transformations) Optimization
Use expression value profiles in optimizations
+fvpt-case-rebalance
+Common Report Var(flag_value_profile_case_rebalance) Optimization
+Use expression value profiles to rebalance the comparison tree in a switch statement
+
fweb
Common Report Var(flag_web) Init(2) Optimization
Construct webs and split unrelated uses of single variable
Index: gcc/gcc/target-def.h
===================================================================
--- gcc/gcc/target-def.h (revision 134561)
+++ gcc/gcc/target-def.h (working copy)
@@ -540,6 +540,8 @@
#define TARGET_ARM_EABI_UNWINDER false
+#define TARGET_PROFITABLE_CASE_REBALANCE_P NULL
+
#define TARGET_PROMOTE_FUNCTION_ARGS hook_bool_const_tree_false
#define TARGET_PROMOTE_FUNCTION_RETURN hook_bool_const_tree_false
#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_false
@@ -774,6 +776,7 @@
TARGET_STACK_PROTECT_FAIL, \
TARGET_INVALID_WITHIN_DOLOOP, \
TARGET_VALID_DLLIMPORT_ATTRIBUTE_P, \
+ TARGET_PROFITABLE_CASE_REBALANCE_P, \
TARGET_CALLS, \
TARGET_INVALID_CONVERSION, \
TARGET_INVALID_UNARY_OP, \
Index: gcc/gcc/Makefile.in
===================================================================
--- gcc/gcc/Makefile.in (revision 134561)
+++ gcc/gcc/Makefile.in (working copy)
@@ -2395,8 +2395,8 @@
stmt.o : stmt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
$(TREE_H) $(FLAGS_H) $(FUNCTION_H) insn-config.h hard-reg-set.h $(EXPR_H) \
libfuncs.h except.h $(RECOG_H) toplev.h output.h $(GGC_H) $(TM_P_H) \
- langhooks.h $(PREDICT_H) $(OPTABS_H) $(TARGET_H) $(MACHMODE_H) \
- $(REGS_H) alloc-pool.h
+ langhooks.h $(COVERAGE_H) value-prof.h $(PARAMS_H) $(PREDICT_H) \
+ $(OPTABS_H) $(TARGET_H) $(MACHMODE_H) $(REGS_H) alloc-pool.h
except.o : except.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
$(TREE_H) $(FLAGS_H) except.h $(FUNCTION_H) $(EXPR_H) libfuncs.h \
langhooks.h insn-config.h hard-reg-set.h $(BASIC_BLOCK_H) output.h \
@@ -2638,7 +2638,7 @@
value-prof.o : value-prof.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(BASIC_BLOCK_H) hard-reg-set.h value-prof.h $(EXPR_H) output.h $(FLAGS_H) \
$(RECOG_H) insn-config.h $(OPTABS_H) $(REGS_H) $(GGC_H) $(DIAGNOSTIC_H) \
- $(TREE_H) $(COVERAGE_H) $(RTL_H) $(GCOV_IO_H) $(TREE_FLOW_H) \
+ $(TREE_H) $(COVERAGE_H) $(RTL_H) $(GCOV_IO_H) $(TREE_FLOW_H) $(PARAMS_H) \
tree-flow-inline.h $(TIMEVAR_H) tree-pass.h
loop-doloop.o : loop-doloop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(RTL_H) $(FLAGS_H) $(EXPR_H) hard-reg-set.h $(BASIC_BLOCK_H) $(TM_P_H) \
Index: gcc/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/gcc/config/rs6000/rs6000.c (revision 134561)
+++ gcc/gcc/config/rs6000/rs6000.c (working copy)
@@ -887,6 +887,7 @@
static bool rs6000_is_opaque_type (const_tree);
static rtx rs6000_dwarf_register_span (rtx);
static void rs6000_init_dwarf_reg_sizes_extra (tree);
+static bool rs6000_profitable_case_rebalance_p (void);
static rtx rs6000_legitimize_tls_address (rtx, enum tls_model);
static void rs6000_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
static rtx rs6000_tls_get_addr (void);
@@ -1157,6 +1158,9 @@
#undef TARGET_INIT_DWARF_REG_SIZES_EXTRA
#define TARGET_INIT_DWARF_REG_SIZES_EXTRA rs6000_init_dwarf_reg_sizes_extra
+#undef TARGET_PROFITABLE_CASE_REBALANCE_P
+#define TARGET_PROFITABLE_CASE_REBALANCE_P rs6000_profitable_case_rebalance_p
+
/* On rs6000, function arguments are promoted, as are function return
values. */
#undef TARGET_PROMOTE_FUNCTION_ARGS
@@ -21937,6 +21941,16 @@
}
}
+static bool
+rs6000_profitable_case_rebalance_p (void)
+{
+ if (rs6000_cpu == PROCESSOR_POWER4
+ || rs6000_cpu == PROCESSOR_POWER5
+ || rs6000_cpu == PROCESSOR_POWER6)
+ return true;
+ return false;
+}
+
/* Map internal gcc register numbers to DWARF2 register numbers. */
unsigned int
Index: gcc/gcc/stmt.c
===================================================================
--- gcc/gcc/stmt.c (revision 134561)
+++ gcc/gcc/stmt.c (working copy)
@@ -50,6 +50,8 @@
#include "target.h"
#include "regs.h"
#include "alloc-pool.h"
+#include "value-prof.h"
+#include "params.h"
/* Functions and data structures for expanding case statements. */
@@ -88,6 +90,7 @@
tree low; /* Lowest index value for this label */
tree high; /* Highest index value for this label */
tree code_label; /* Label to jump to when node matches */
+ int fdo_cost; /* Normalized profiling frequency */
};
typedef struct case_node case_node;
@@ -116,6 +119,8 @@
static bool lshift_cheap_p (void);
static int case_bit_test_cmp (const void *, const void *);
static void emit_case_bit_tests (tree, tree, tree, tree, case_node_ptr, rtx);
+static bool fdo_switch_use_histogram_p (histogram_value);
+static void fdo_balance_case_nodes (case_node_ptr *, case_node_ptr);
static void balance_case_nodes (case_node_ptr *, case_node_ptr);
static int node_has_low_bound (case_node_ptr, tree);
static int node_has_high_bound (case_node_ptr, tree);
@@ -2306,6 +2311,7 @@
rtx *labelvec;
int i;
rtx before_case, end, lab;
+ histogram_value hist;
tree vec = SWITCH_LABELS (exp);
tree orig_type = TREE_TYPE (exp);
@@ -2453,6 +2459,222 @@
case_list, default_label);
}
+ /* If a histogram for the condition is present, give preference
+ to generate a sequence of conditional branches. That sequence
+ will have the form of a balanced binary tree, where each node
+ is weighted with the frequency found in the histogram.
+
+ Since default cases must exhaust the binary search tree, this
+ optimization also checks for frequently occurring default
+ values, and artificially add nodes in the balanced tree to
+ represent those values. That way default cases can also be
+ optimized */
+
+ else if ((hist = gimple_histogram_value_of_type
+ (cfun, exp, HIST_TYPE_INTERVAL)) != NULL
+ && fdo_switch_use_histogram_p(hist))
+ {
+ gcov_type all;
+ int toobig, negative;
+ int i, k;
+ struct case_node *cn = 0;
+ int count_negs, count_bigs;
+ int cost_negs, cost_bigs;
+ int range_start, range_end;
+ int *hcounters;
+
+ index = expand_normal (index_expr);
+
+ /* If the index is a short or char that we do not have
+ an insn to handle comparisons directly, convert it to
+ a full integer now, rather than letting each comparison
+ generate the conversion. */
+
+ if (GET_MODE_CLASS (GET_MODE (index)) == MODE_INT
+ && ! have_insn_for (COMPARE, GET_MODE (index)))
+ {
+ enum machine_mode wider_mode;
+ for (wider_mode = GET_MODE (index); wider_mode != VOIDmode;
+ wider_mode = GET_MODE_WIDER_MODE (wider_mode))
+ if (have_insn_for (COMPARE, wider_mode))
+ {
+ index = convert_to_mode (wider_mode, index, unsignedp);
+ break;
+ }
+ }
+
+ do_pending_stack_adjust ();
+
+ if (MEM_P (index))
+ index = copy_to_reg (index);
+
+ /* Normalize the counters in the histogram */
+ hcounters = (int *) xcalloc (1, ((int)hist->hdata.intvl.steps + 2)
+ * sizeof(int));
+ all = 0;
+ for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+ all += hist->hvalue.counters[i];
+ for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+ hcounters[i] = (hist->hvalue.counters[i] * 1000) / all;
+ toobig = hcounters[hist->hdata.intvl.steps];
+ negative = hcounters[hist->hdata.intvl.steps + 1];
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "Switch transform based on this histogram:\n");
+ for (i = 0, k = 0; i < (int)hist->hdata.intvl.steps; i++, k++)
+ {
+ k &= 0x0F;
+ if (i == 0)
+ fprintf (dump_file, "%4d", i);
+ else if (k == 0)
+ fprintf (dump_file, "\n%4d", i);
+ fprintf (dump_file, " %4d", hcounters[i]);
+ }
+ fprintf (dump_file, "\n steps=%d toobig=%d negative=%d\n",
+ hist->hdata.intvl.steps, toobig, negative);
+ }
+
+ /* Our histogram is limited by a fixed range,
+ exam the case list for cases out of this range */
+ count_negs = 0;
+ count_bigs = 0;
+ for (cn = case_list; cn; cn = cn->right)
+ {
+ cn->fdo_cost = 0;
+ if (tree_int_cst_equal (cn->low, cn->high))
+ {
+ if (TREE_INT_CST_HIGH(cn->low) < 0)
+ count_negs++;
+ else if (TREE_INT_CST_HIGH(cn->low) > 0
+ || (TREE_INT_CST_HIGH(cn->low) == 0
+ && (TREE_INT_CST_LOW(cn->low)
+ >= hist->hdata.intvl.steps)))
+ count_bigs++;
+ /* single value, within range, transfer cost */
+ else
+ {
+ cn->fdo_cost = hcounters[TREE_INT_CST_LOW(cn->low)];
+ hcounters[TREE_INT_CST_LOW(cn->low)] = 0;
+ }
+ }
+ /* It is a range */
+ else
+ {
+ if (TREE_INT_CST_HIGH(cn->low) < 0)
+ count_negs += 2;
+ if (TREE_INT_CST_HIGH(cn->high) > 0
+ || (TREE_INT_CST_HIGH(cn->high) == 0
+ && (TREE_INT_CST_LOW(cn->high)
+ >= hist->hdata.intvl.steps)))
+ count_bigs += 2;
+ /* Transfer all costs within the range */
+ if (TREE_INT_CST_HIGH(cn->low) < 0)
+ range_start = 0;
+ else
+ range_start = TREE_INT_CST_LOW(cn->low);
+ if (TREE_INT_CST_HIGH(cn->high) > 0
+ || (TREE_INT_CST_HIGH(cn->high) == 0
+ && (TREE_INT_CST_LOW(cn->high)
+ >= hist->hdata.intvl.steps)))
+ range_end = (int)hist->hdata.intvl.steps - 1;
+ else if (TREE_INT_CST_HIGH(cn->high) < 0)
+ range_end = 0;
+ else
+ range_end = TREE_INT_CST_LOW(cn->high);
+ for (i = range_start; i <= range_end; i++)
+ {
+ cn->fdo_cost = hcounters[i];
+ hcounters[i] = 0;
+ }
+ }
+ }
+
+ /* Calculate the cost for each indiviual big/negative case */
+ if (count_bigs)
+ cost_bigs = toobig / count_bigs;
+ else
+ cost_bigs = 0;
+ if (count_negs)
+ cost_negs = negative / count_negs;
+ else
+ cost_negs = 0;
+
+ /* Distribute the cost of bigs and negatives uniformly */
+ for (cn = case_list; cn; cn = cn->right)
+ {
+ if (tree_int_cst_equal (cn->low, cn->high))
+ {
+ /* case is a single value */
+ if (TREE_INT_CST_HIGH(cn->low) < 0)
+ cn->fdo_cost = cost_negs;
+ else if (TREE_INT_CST_HIGH(cn->low) > 0
+ || (TREE_INT_CST_HIGH(cn->low) == 0
+ && (TREE_INT_CST_LOW(cn->low)
+ >= hist->hdata.intvl.steps)))
+ cn->fdo_cost = cost_bigs;
+ }
+ else
+ {
+ /* case is a range */
+ if (TREE_INT_CST_HIGH(cn->low) < 0)
+ cn->fdo_cost += 2 * cost_negs;
+ if (TREE_INT_CST_HIGH(cn->high) > 0
+ || (TREE_INT_CST_HIGH(cn->high) == 0
+ && (TREE_INT_CST_LOW(cn->high)
+ >= hist->hdata.intvl.steps)))
+ cn->fdo_cost += 2 * cost_bigs;
+ }
+ }
+ /* All left overs goes to "default" case node,
+ if we have a significant case, create an entry in
+ the case_node list for it */
+ for (i = 0; i < (int)hist->hdata.intvl.steps; i++)
+ if ((hcounters[i]) > 10
+ * PARAM_VALUE (PARAM_MIN_CASE_PROMOTION_RATIO))
+ {
+ tree cn_new = build_int_cst (NULL, i);
+
+ if (TREE_INT_CST_HIGH(case_list->high) == 0
+ && TREE_INT_CST_LOW(case_list->high) > (unsigned)i)
+ {
+ case_list = add_case_node (case_list, index_type, cn_new,
+ cn_new, default_label_decl,
+ case_node_pool);
+ case_list->fdo_cost = hcounters[i];
+ }
+ else
+ {
+ for (cn = case_list; cn; cn = cn->right)
+ if ((TREE_INT_CST_HIGH(cn->high) < 0
+ || (TREE_INT_CST_HIGH(cn->high) == 0
+ && TREE_INT_CST_LOW(cn->high) < (unsigned)i))
+ && (cn->right == NULL
+ || (TREE_INT_CST_HIGH(cn->right->low) == 0
+ && TREE_INT_CST_LOW(cn->right->low) > (unsigned)i)))
+ break;
+ gcc_assert (cn != NULL);
+ cn->right = add_case_node (cn->right, index_type, cn_new,
+ cn_new, default_label_decl,
+ case_node_pool);
+ cn->right->fdo_cost = hcounters[i];
+ }
+ if (dump_file)
+ fprintf (dump_file, "default case %d with cost %d promoted\n",
+ i, hcounters[i]);
+ hcounters[i] = 0;
+ }
+ free (hcounters);
+ if (dump_file)
+ for (cn = case_list; cn; cn = cn->right)
+ fprintf (dump_file, "case %d, cost %d\n",
+ (int)TREE_INT_CST_LOW(cn->low), cn->fdo_cost);
+ fdo_balance_case_nodes (&case_list, NULL);
+
+ emit_case_nodes (index, case_list, default_label, index_type);
+ emit_jump (default_label);
+ }
+
/* If range of values is much bigger than number of values,
make a sequence of conditional branches instead of a dispatch.
If the switch-index is a constant, do it this way
@@ -2692,6 +2914,23 @@
return 1;
}
+static bool
+fdo_switch_use_histogram_p (histogram_value hist)
+{
+ gcov_type all;
+ int i;
+
+ if (targetm.profitable_case_rebalance_p == NULL
+ || (targetm.profitable_case_rebalance_p() == false))
+ return false;
+
+ all = 0;
+ for (i = 0; i < (int)hist->hdata.intvl.steps + 2; i++)
+ all += hist->hvalue.counters[i];
+
+ return (all != 0);
+}
+
/* Take an ordered list of case nodes
and transform them into a near optimal binary tree,
on the assumption that any target code selection value is as
@@ -2703,6 +2942,85 @@
branch is then transformed recursively. */
static void
+fdo_balance_case_nodes (case_node_ptr *head, case_node_ptr parent)
+{
+ case_node_ptr np;
+
+ np = *head;
+ if (np)
+ {
+ int i = 0;
+ int cost = 0;
+ case_node_ptr *npp;
+ case_node_ptr left;
+
+ /* Count the number of entries on branch. */
+
+ while (np)
+ {
+ i++;
+ cost += np->fdo_cost;
+ np = np->right;
+ }
+
+ if (i > 2)
+ {
+ /* Split this list if it is long enough for that to help. */
+ npp = head;
+ left = *npp;
+ if (i == 3)
+ npp = &(*npp)->right;
+ else
+ {
+ if (cost == 0)
+ {
+ /* Ensure some progress in the recursive function
+ in case we got stuck with a large string of cost zero */
+ i = (i + 1) / 2;
+ while (1)
+ {
+ i--;
+ if (i <= 0)
+ break;
+ npp = &(*npp)->right;
+ }
+ }
+ else
+ {
+ /* Find the place in the list that bisects the list's
+ total cost. */
+ cost /= 2;
+ while (cost >= 0 && (*npp)->right != NULL)
+ {
+ cost -= (*npp)->fdo_cost;
+ npp = &(*npp)->right;
+ if ((cost - (*npp)->fdo_cost) < 0)
+ break;
+ }
+ }
+ }
+ *head = np = *npp;
+ *npp = 0;
+ np->parent = parent;
+ np->left = left;
+
+ /* Optimize each of the two split parts. */
+ fdo_balance_case_nodes (&np->left, np);
+ fdo_balance_case_nodes (&np->right, np);
+ }
+ else
+ {
+ /* Else leave this branch as one level,
+ but fill in `parent' fields. */
+ np = *head;
+ np->parent = parent;
+ for (; np->right; np = np->right)
+ np->right->parent = np;
+ }
+ }
+}
+
+static void
balance_case_nodes (case_node_ptr *head, case_node_ptr parent)
{
case_node_ptr np;
Index: gcc/gcc/params.def
===================================================================
--- gcc/gcc/params.def (revision 134561)
+++ gcc/gcc/params.def (working copy)
@@ -435,6 +435,19 @@
"The minimum cost of an expensive expression in the loop invariant motion",
20, 0, 0)
+/* The number steps in the interval histogram used to profile switch statement condition */
+DEFPARAM(PARAM_SWITCH_HISTOGRAM_SIZE,
+ "fdo-switch-histogram-size",
+ "The number steps in the interval histogram used to profile switch statement condition",
+ 256, 32, 0)
+
+/* The minimum frequency for which a default value in a switch
+ statement is promoted to a labeled case (percentage) */
+DEFPARAM(PARAM_MIN_CASE_PROMOTION_RATIO,
+ "fdo-min-case-promotion-rate",
+ "The minimum frequency for which a default value in a switch statement is promoted to a labeled case (percentage)",
+ 10, 0, 100)
+
/* Bound on number of candidates for induction variables below that
all candidates are considered for each use in induction variable
optimizations. */