This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [RFA] optimizing predictable branches on x86
> But I can also hide the cfun->function_frequency trick in
> DEFAULT_BRANCH_COST macro if it seems to help. (in longer term I hope
> they will all go away as expansion needs to be aware of hotness info
> anyway)
Well, it definitely helps. I originally hoped there would be fewer places
querying BRANCH_COST without profile info. I am testing the updated patch.
Honza
* optabs.c (expand_abs_nojump): Update BRANCH_COST call.
* fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT, fold_truthop): Likewise.
* dojump.c (do_jump): Likewise.
* ifcvt.c (MAX_CONDITIONAL_EXECUTE): Likewise.
(struct noce_if_info): Add branch_cost field.
(noce_try_store_flag_constants, noce_try_addcc, noce_try_store_flag_mask,
noce_try_cmove_arith, noce_find_if_block, find_if_case_1,
find_if_case_2): Use computed branch cost.
* expr.h (BRANCH_COST): Update default.
(DEFAULT_BRANCH_COST): Define.
* predict.c (predictable_edge_p): New function.
* expmed.c (expand_smod_pow2, expand_sdiv_pow2, emit_store_flag):
Update BRANCH_COST call.
* basic-block.h (predictable_edge_p): Declare.
* config/alpha/alpha.h (BRANCH_COST): Update.
* config/frv/frv.h (BRANCH_COST): Update.
* config/s390/s390.h (BRANCH_COST): Update.
* config/spu/spu.h (BRANCH_COST): Update.
* config/sparc/sparc.h (BRANCH_COST): Update.
* config/m32r/m32r.h (BRANCH_COST): Update.
* config/i386/i386.h (BRANCH_COST): Update.
* config/i386/i386.c (ix86_expand_int_movcc): Update use of BRANCH_COST.
* config/sh/sh.h (BRANCH_COST): Update.
* config/pdp11/pdp11.h (BRANCH_COST): Update.
* config/avr/avr.h (BRANCH_COST): Update.
* config/crx/crx.h (BRANCH_COST): Update.
* config/xtensa/xtensa.h (BRANCH_COST): Update.
* config/stormy16/stormy16.h (BRANCH_COST): Update.
* config/m68hc11/m68hc11.h (BRANCH_COST): Update.
* config/iq2000/iq2000.h (BRANCH_COST): Update.
* config/ia64/ia64.h (BRANCH_COST): Update.
* config/rs6000/rs6000.h (BRANCH_COST): Update.
* config/arc/arc.h (BRANCH_COST): Update.
* config/score/score.h (BRANCH_COST): Update.
* config/arm/arm.h (BRANCH_COST): Update.
* config/pa/pa.h (BRANCH_COST): Update.
* config/mips/mips.h (BRANCH_COST): Update.
* config/vax/vax.h (BRANCH_COST): Update.
* config/h8300/h8300.h (BRANCH_COST): Update.
* params.def (PARAM_PREDICTABLE_BRANCH_OUTCOME): New.
* doc/invoke.texi (predictable-branch-cost-outcome): Document.
* doc/tm.texi (BRANCH_COST): Update.
Index: doc/tm.texi
===================================================================
*** doc/tm.texi (revision 132800)
--- doc/tm.texi (working copy)
*************** value to the result of that function. T
*** 5828,5836 ****
are the same as to this macro.
@end defmac
! @defmac BRANCH_COST
! A C expression for the cost of a branch instruction. A value of 1 is
! the default; other values are interpreted relative to that.
@end defmac
Here are additional macros which do not specify precise relative costs,
--- 5828,5841 ----
are the same as to this macro.
@end defmac
! @defmac BRANCH_COST (@var{hot_p}, @var{predictable_p})
! A C expression for the cost of a branch instruction.  A value of 1 is the
! default; other values are interpreted relative to that.  Parameter @var{hot_p}
! is true when the branch in question might be hot in the compiled program.
! When it is false, @code{BRANCH_COST} should return a value optimal for code
! size rather than for performance.  @var{predictable_p} is true for
! well-predicted branches.  On many architectures @code{BRANCH_COST} can be
! reduced then.
@end defmac
Here are additional macros which do not specify precise relative costs,
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi (revision 132800)
--- doc/invoke.texi (working copy)
*************** to the hottest structure frequency in th
*** 6807,6812 ****
--- 6807,6816 ----
parameter, then structure reorganization is not applied to this structure.
The default is 10.
+ @item predictable-branch-cost-outcome
+ When a branch is predicted to be taken with probability lower than this
+ threshold (in percent), it is considered well predictable.  The default is 10.
+
@item max-crossjump-edges
The maximum number of incoming edges to consider for crossjumping.
The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
Index: optabs.c
===================================================================
*** optabs.c (revision 132800)
--- optabs.c (working copy)
*************** expand_abs_nojump (enum machine_mode mod
*** 3425,3431 ****
value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
where W is the width of MODE. */
! if (GET_MODE_CLASS (mode) == MODE_INT && BRANCH_COST >= 2)
{
rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
size_int (GET_MODE_BITSIZE (mode) - 1),
--- 3425,3432 ----
value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
where W is the width of MODE. */
! if (GET_MODE_CLASS (mode) == MODE_INT
! && DEFAULT_BRANCH_COST >= 2)
{
rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
size_int (GET_MODE_BITSIZE (mode) - 1),
Index: fold-const.c
===================================================================
*** fold-const.c (revision 132800)
--- fold-const.c (working copy)
*************** fold_cond_expr_with_comparison (tree typ
*** 5317,5323 ****
#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT (BRANCH_COST >= 2)
#endif
/* EXP is some logical combination of boolean tests. See if we can
--- 5317,5323 ----
#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT (DEFAULT_BRANCH_COST >= 2)
#endif
/* EXP is some logical combination of boolean tests. See if we can
*************** fold_truthop (enum tree_code code, tree
*** 5565,5571 ****
that can be merged. Avoid doing this if the RHS is a floating-point
comparison since those can trap. */
! if (BRANCH_COST >= 2
&& ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
&& simple_operand_p (rl_arg)
&& simple_operand_p (rr_arg))
--- 5565,5571 ----
that can be merged. Avoid doing this if the RHS is a floating-point
comparison since those can trap. */
! if (DEFAULT_BRANCH_COST >= 2
&& ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
&& simple_operand_p (rl_arg)
&& simple_operand_p (rr_arg))
Index: dojump.c
===================================================================
*** dojump.c (revision 132800)
--- dojump.c (working copy)
*************** do_jump (tree exp, rtx if_false_label, r
*** 515,521 ****
/* High branch cost, expand as the bitwise AND of the conditions.
Do the same if the RHS has side effects, because we're effectively
turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR. */
! if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
goto normal;
if (if_false_label == NULL_RTX)
--- 515,522 ----
/* High branch cost, expand as the bitwise AND of the conditions.
Do the same if the RHS has side effects, because we're effectively
turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR. */
! if (DEFAULT_BRANCH_COST >= 4
! || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
goto normal;
if (if_false_label == NULL_RTX)
*************** do_jump (tree exp, rtx if_false_label, r
*** 535,541 ****
/* High branch cost, expand as the bitwise OR of the conditions.
Do the same if the RHS has side effects, because we're effectively
turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */
! if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
goto normal;
if (if_true_label == NULL_RTX)
--- 536,543 ----
/* High branch cost, expand as the bitwise OR of the conditions.
Do the same if the RHS has side effects, because we're effectively
turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */
! if (DEFAULT_BRANCH_COST >= 4
! || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
goto normal;
if (if_true_label == NULL_RTX)
Index: ipa-inline.c
===================================================================
*** ipa-inline.c (revision 132800)
--- ipa-inline.c (working copy)
*************** cgraph_decide_inlining_of_small_function
*** 925,931 ****
not_good = N_("function not declared inline and code size would grow");
if (optimize_size)
not_good = N_("optimizing for size and code size would grow");
! if (not_good && growth > 0)
{
if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
&edge->inline_failed))
--- 925,931 ----
not_good = N_("function not declared inline and code size would grow");
if (optimize_size)
not_good = N_("optimizing for size and code size would grow");
! if (not_good && growth > 0 && cgraph_estimate_growth (edge->callee))
{
if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
&edge->inline_failed))
Index: ifcvt.c
===================================================================
*** ifcvt.c (revision 132800)
--- ifcvt.c (working copy)
***************
*** 67,73 ****
#endif
#ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE (BRANCH_COST + 1)
#endif
#define IFCVT_MULTIPLE_DUMPS 1
--- 67,73 ----
#endif
#ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE (DEFAULT_BRANCH_COST + 1)
#endif
#define IFCVT_MULTIPLE_DUMPS 1
*************** struct noce_if_info
*** 626,631 ****
--- 626,634 ----
from TEST_BB. For the noce transformations, we allow the symmetric
form as well. */
bool then_else_reversed;
+
+ /* Estimated cost of the particular branch instruction. */
+ int branch_cost;
};
static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
*************** noce_try_store_flag_constants (struct no
*** 963,982 ****
normalize = 0;
else if (ifalse == 0 && exact_log2 (itrue) >= 0
&& (STORE_FLAG_VALUE == 1
! || BRANCH_COST >= 2))
normalize = 1;
else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! && (STORE_FLAG_VALUE == 1 || BRANCH_COST >= 2))
normalize = 1, reversep = 1;
else if (itrue == -1
&& (STORE_FLAG_VALUE == -1
! || BRANCH_COST >= 2))
normalize = -1;
else if (ifalse == -1 && can_reverse
! && (STORE_FLAG_VALUE == -1 || BRANCH_COST >= 2))
normalize = -1, reversep = 1;
! else if ((BRANCH_COST >= 2 && STORE_FLAG_VALUE == -1)
! || BRANCH_COST >= 3)
normalize = -1;
else
return FALSE;
--- 966,985 ----
normalize = 0;
else if (ifalse == 0 && exact_log2 (itrue) >= 0
&& (STORE_FLAG_VALUE == 1
! || if_info->branch_cost >= 2))
normalize = 1;
else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
normalize = 1, reversep = 1;
else if (itrue == -1
&& (STORE_FLAG_VALUE == -1
! || if_info->branch_cost >= 2))
normalize = -1;
else if (ifalse == -1 && can_reverse
! && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
normalize = -1, reversep = 1;
! else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
! || if_info->branch_cost >= 3)
normalize = -1;
else
return FALSE;
*************** noce_try_addcc (struct noce_if_info *if_
*** 1107,1113 ****
/* If that fails, construct conditional increment or decrement using
setcc. */
! if (BRANCH_COST >= 2
&& (XEXP (if_info->a, 1) == const1_rtx
|| XEXP (if_info->a, 1) == constm1_rtx))
{
--- 1110,1116 ----
/* If that fails, construct conditional increment or decrement using
setcc. */
! if (if_info->branch_cost >= 2
&& (XEXP (if_info->a, 1) == const1_rtx
|| XEXP (if_info->a, 1) == constm1_rtx))
{
*************** noce_try_store_flag_mask (struct noce_if
*** 1158,1164 ****
int reversep;
reversep = 0;
! if ((BRANCH_COST >= 2
|| STORE_FLAG_VALUE == -1)
&& ((if_info->a == const0_rtx
&& rtx_equal_p (if_info->b, if_info->x))
--- 1161,1167 ----
int reversep;
reversep = 0;
! if ((if_info->branch_cost >= 2
|| STORE_FLAG_VALUE == -1)
&& ((if_info->a == const0_rtx
&& rtx_equal_p (if_info->b, if_info->x))
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1317,1323 ****
/* ??? FIXME: Magic number 5. */
if (cse_not_expected
&& MEM_P (a) && MEM_P (b)
! && BRANCH_COST >= 5)
{
a = XEXP (a, 0);
b = XEXP (b, 0);
--- 1320,1326 ----
/* ??? FIXME: Magic number 5. */
if (cse_not_expected
&& MEM_P (a) && MEM_P (b)
! && if_info->branch_cost >= 5)
{
a = XEXP (a, 0);
b = XEXP (b, 0);
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1347,1353 ****
if (insn_a)
{
insn_cost = insn_rtx_cost (PATTERN (insn_a));
! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
return FALSE;
}
else
--- 1350,1356 ----
if (insn_a)
{
insn_cost = insn_rtx_cost (PATTERN (insn_a));
! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
return FALSE;
}
else
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1356,1362 ****
if (insn_b)
{
insn_cost += insn_rtx_cost (PATTERN (insn_b));
! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
return FALSE;
}
--- 1359,1365 ----
if (insn_b)
{
insn_cost += insn_rtx_cost (PATTERN (insn_b));
! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
return FALSE;
}
*************** noce_find_if_block (basic_block test_bb,
*** 2803,2808 ****
--- 2806,2813 ----
if_info.cond_earliest = cond_earliest;
if_info.jump = jump;
if_info.then_else_reversed = then_else_reversed;
+ if_info.branch_cost = BRANCH_COST (maybe_hot_bb_p (test_bb),
+ predictable_edge_p (then_edge));
/* Do the real work. */
*************** find_if_case_1 (basic_block test_bb, edg
*** 3569,3575 ****
test_bb->index, then_bb->index);
/* THEN is small. */
! if (! cheap_bb_rtx_cost_p (then_bb, COSTS_N_INSNS (BRANCH_COST)))
return FALSE;
/* Registers set are dead, or are predicable. */
--- 3574,3582 ----
test_bb->index, then_bb->index);
/* THEN is small. */
! if (! cheap_bb_rtx_cost_p (then_bb,
! COSTS_N_INSNS (BRANCH_COST (maybe_hot_bb_p (then_edge->src),
! predictable_edge_p (then_edge)))))
return FALSE;
/* Registers set are dead, or are predicable. */
*************** find_if_case_2 (basic_block test_bb, edg
*** 3683,3689 ****
test_bb->index, else_bb->index);
/* ELSE is small. */
! if (! cheap_bb_rtx_cost_p (else_bb, COSTS_N_INSNS (BRANCH_COST)))
return FALSE;
/* Registers set are dead, or are predicable. */
--- 3690,3698 ----
test_bb->index, else_bb->index);
/* ELSE is small. */
! if (! cheap_bb_rtx_cost_p (else_bb,
! COSTS_N_INSNS (BRANCH_COST (maybe_hot_bb_p (else_edge->src),
! predictable_edge_p (else_edge)))))
return FALSE;
/* Registers set are dead, or are predicable. */
Index: expr.h
===================================================================
*** expr.h (revision 132800)
--- expr.h (working copy)
*************** along with GCC; see the file COPYING3.
*** 36,44 ****
/* The default branch cost is 1. */
#ifndef BRANCH_COST
! #define BRANCH_COST 1
#endif
/* This is the 4th arg to `expand_expr'.
EXPAND_STACK_PARM means we are possibly expanding a call param onto
the stack.
--- 36,52 ----
/* The default branch cost is 1. */
#ifndef BRANCH_COST
! #define BRANCH_COST(hot_p, predictable_p) 1
#endif
+ /* When profile information is not known, make conservative assumptions. Use
+ of this macro should be avoided in favour of BRANCH_COST. */
+ #define DEFAULT_BRANCH_COST \
+ BRANCH_COST (optimize_size \
+ ? 0 \
+ : !cfun || cfun->function_frequency > FUNCTION_FREQUENCY_NORMAL,\
+ false)
+
/* This is the 4th arg to `expand_expr'.
EXPAND_STACK_PARM means we are possibly expanding a call param onto
the stack.
Index: predict.c
===================================================================
*** predict.c (revision 132800)
--- predict.c (working copy)
*************** gate_estimate_probability (void)
*** 1915,1920 ****
--- 1923,1944 ----
return flag_guess_branch_prob;
}
+ /* Return true when edge E is likely to be predicted well by the branch
+ predictor.  */
+
+ bool
+ predictable_edge_p (edge e)
+ {
+ if (profile_status == PROFILE_ABSENT)
+ return false;
+ if ((e->probability
+ <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100)
+ || (REG_BR_PROB_BASE - e->probability
+ <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100))
+ return true;
+ return false;
+ }
+
struct tree_opt_pass pass_profile =
{
"profile", /* name */
Index: expmed.c
===================================================================
*** expmed.c (revision 132800)
--- expmed.c (working copy)
*************** expand_smod_pow2 (enum machine_mode mode
*** 3560,3566 ****
result = gen_reg_rtx (mode);
/* Avoid conditional branches when they're expensive. */
! if (BRANCH_COST >= 2
&& !optimize_size)
{
rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
--- 3560,3566 ----
result = gen_reg_rtx (mode);
/* Avoid conditional branches when they're expensive. */
! if (DEFAULT_BRANCH_COST >= 2
&& !optimize_size)
{
rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3660,3666 ****
logd = floor_log2 (d);
shift = build_int_cst (NULL_TREE, logd);
! if (d == 2 && BRANCH_COST >= 1)
{
temp = gen_reg_rtx (mode);
temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
--- 3660,3667 ----
logd = floor_log2 (d);
shift = build_int_cst (NULL_TREE, logd);
! if (d == 2
! && DEFAULT_BRANCH_COST >= 1)
{
temp = gen_reg_rtx (mode);
temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3670,3676 ****
}
#ifdef HAVE_conditional_move
! if (BRANCH_COST >= 2)
{
rtx temp2;
--- 3671,3677 ----
}
#ifdef HAVE_conditional_move
! if (DEFAULT_BRANCH_COST >= 2)
{
rtx temp2;
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3699,3705 ****
}
#endif
! if (BRANCH_COST >= 2)
{
int ushift = GET_MODE_BITSIZE (mode) - logd;
--- 3700,3706 ----
}
#endif
! if (DEFAULT_BRANCH_COST >= 2)
{
int ushift = GET_MODE_BITSIZE (mode) - logd;
*************** emit_store_flag (rtx target, enum rtx_co
*** 5413,5419 ****
comparison with zero. Don't do any of these cases if branches are
very cheap. */
! if (BRANCH_COST > 0
&& GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
&& op1 != const0_rtx)
{
--- 5414,5420 ----
comparison with zero. Don't do any of these cases if branches are
very cheap. */
! if (DEFAULT_BRANCH_COST > 0
&& GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
&& op1 != const0_rtx)
{
*************** emit_store_flag (rtx target, enum rtx_co
*** 5436,5445 ****
do LE and GT if branches are expensive since they are expensive on
2-operand machines. */
! if (BRANCH_COST == 0
|| GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
|| (code != EQ && code != NE
! && (BRANCH_COST <= 1 || (code != LE && code != GT))))
return 0;
/* See what we need to return. We can only return a 1, -1, or the
--- 5437,5446 ----
do LE and GT if branches are expensive since they are expensive on
2-operand machines. */
! if (DEFAULT_BRANCH_COST == 0
|| GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
|| (code != EQ && code != NE
! && (DEFAULT_BRANCH_COST <= 1 || (code != LE && code != GT))))
return 0;
/* See what we need to return. We can only return a 1, -1, or the
*************** emit_store_flag (rtx target, enum rtx_co
*** 5535,5541 ****
that "or", which is an extra insn, so we only handle EQ if branches
are expensive. */
! if (tem == 0 && (code == NE || BRANCH_COST > 1))
{
if (rtx_equal_p (subtarget, op0))
subtarget = 0;
--- 5536,5544 ----
that "or", which is an extra insn, so we only handle EQ if branches
are expensive. */
! if (tem == 0
! && (code == NE
! || DEFAULT_BRANCH_COST > 1))
{
if (rtx_equal_p (subtarget, op0))
subtarget = 0;
Index: basic-block.h
===================================================================
*** basic-block.h (revision 132800)
--- basic-block.h (working copy)
*************** extern void guess_outgoing_edge_probabil
*** 839,844 ****
--- 839,845 ----
extern void remove_predictions_associated_with_edge (edge);
extern bool edge_probability_reliable_p (const_edge);
extern bool br_prob_note_reliable_p (const_rtx);
+ extern bool predictable_edge_p (edge);
/* In cfg.c */
extern void dump_regset (regset, FILE *);
Index: config/alpha/alpha.h
===================================================================
*** config/alpha/alpha.h (revision 132800)
--- config/alpha/alpha.h (working copy)
*************** extern int alpha_memory_latency;
*** 631,637 ****
#define MEMORY_MOVE_COST(MODE,CLASS,IN) (2*alpha_memory_latency)
/* Provide the cost of a branch. Exact meaning under development. */
! #define BRANCH_COST 5
/* Stack layout; function entry, exit and calling. */
--- 631,637 ----
#define MEMORY_MOVE_COST(MODE,CLASS,IN) (2*alpha_memory_latency)
/* Provide the cost of a branch. Exact meaning under development. */
! #define BRANCH_COST(hot_p, predictable_p) 5
/* Stack layout; function entry, exit and calling. */
Index: config/frv/frv.h
===================================================================
*** config/frv/frv.h (revision 132800)
--- config/frv/frv.h (working copy)
*************** do { \
*** 2193,2199 ****
/* A C expression for the cost of a branch instruction. A value of 1 is the
default; other values are interpreted relative to that. */
! #define BRANCH_COST frv_branch_cost_int
/* Define this macro as a C expression which is nonzero if accessing less than
a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 2193,2199 ----
/* A C expression for the cost of a branch instruction. A value of 1 is the
default; other values are interpreted relative to that. */
! #define BRANCH_COST(hot_p, predictable_p) frv_branch_cost_int
/* Define this macro as a C expression which is nonzero if accessing less than
a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/s390/s390.h
===================================================================
*** config/s390/s390.h (revision 132800)
--- config/s390/s390.h (working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 780,786 ****
/* A C expression for the cost of a branch instruction. A value of 1
is the default; other values are interpreted relative to that. */
! #define BRANCH_COST 1
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 1
--- 780,786 ----
/* A C expression for the cost of a branch instruction. A value of 1
is the default; other values are interpreted relative to that. */
! #define BRANCH_COST(hot_p, predictable_p) 1
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 1
Index: config/spu/spu.h
===================================================================
*** config/spu/spu.h (revision 132800)
--- config/spu/spu.h (working copy)
*************** targetm.resolve_overloaded_builtin = spu
*** 456,462 ****
/* Costs */
! #define BRANCH_COST spu_branch_cost
#define SLOW_BYTE_ACCESS 0
--- 456,462 ----
/* Costs */
! #define BRANCH_COST(hot_p, predictable_p) spu_branch_cost
#define SLOW_BYTE_ACCESS 0
Index: config/sparc/sparc.h
===================================================================
*** config/sparc/sparc.h (revision 132800)
--- config/sparc/sparc.h (working copy)
*************** do {
*** 2180,2186 ****
On Niagara-2, a not-taken branch costs 1 cycle whereas a taken
branch costs 6 cycles. */
! #define BRANCH_COST \
((sparc_cpu == PROCESSOR_V9 \
|| sparc_cpu == PROCESSOR_ULTRASPARC) \
? 7 \
--- 2180,2186 ----
On Niagara-2, a not-taken branch costs 1 cycle whereas a taken
branch costs 6 cycles. */
! #define BRANCH_COST(hot_p, predictable_p) \
((sparc_cpu == PROCESSOR_V9 \
|| sparc_cpu == PROCESSOR_ULTRASPARC) \
? 7 \
Index: config/m32r/m32r.h
===================================================================
*** config/m32r/m32r.h (revision 132800)
--- config/m32r/m32r.h (working copy)
*************** L2: .word STATIC
*** 1219,1225 ****
/* A value of 2 here causes GCC to avoid using branches in comparisons like
while (a < N && a). Branches aren't that expensive on the M32R so
we define this as 1. Defining it as 2 had a heavy hit in fp-bit.c. */
! #define BRANCH_COST ((TARGET_BRANCH_COST) ? 2 : 1)
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
--- 1219,1225 ----
/* A value of 2 here causes GCC to avoid using branches in comparisons like
while (a < N && a). Branches aren't that expensive on the M32R so
we define this as 1. Defining it as 2 had a heavy hit in fp-bit.c. */
! #define BRANCH_COST(hot_p, predictable_p) ((TARGET_BRANCH_COST) ? 2 : 1)
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
Index: config/i386/i386.h
===================================================================
*** config/i386/i386.h (revision 132800)
--- config/i386/i386.h (working copy)
*************** do { \
*** 2052,2058 ****
/* A C expression for the cost of a branch instruction. A value of 1
is the default; other values are interpreted relative to that. */
! #define BRANCH_COST ix86_branch_cost
/* Define this macro as a C expression which is nonzero if accessing
less than a word of memory (i.e. a `char' or a `short') is no
--- 2052,2059 ----
/* A C expression for the cost of a branch instruction. A value of 1
is the default; other values are interpreted relative to that. */
! #define BRANCH_COST(hot_p, predictable_p) \
! (!(hot_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)
/* Define this macro as a C expression which is nonzero if accessing
less than a word of memory (i.e. a `char' or a `short') is no
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c (revision 132800)
--- config/i386/i386.c (working copy)
*************** ix86_expand_int_movcc (rtx operands[])
*** 12819,12825 ****
*/
if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! && BRANCH_COST >= 2)
{
if (cf == 0)
{
--- 12819,12826 ----
*/
if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! && BRANCH_COST (cfun->function_frequency >= FUNCTION_FREQUENCY_NORMAL,
! false) >= 2)
{
if (cf == 0)
{
*************** ix86_expand_int_movcc (rtx operands[])
*** 12904,12910 ****
optab op;
rtx var, orig_out, out, tmp;
! if (BRANCH_COST <= 2)
return 0; /* FAIL */
/* If one of the two operands is an interesting constant, load a
--- 12905,12912 ----
optab op;
rtx var, orig_out, out, tmp;
! if (BRANCH_COST (cfun->function_frequency >= FUNCTION_FREQUENCY_NORMAL,
! false) <= 2)
return 0; /* FAIL */
/* If one of the two operands is an interesting constant, load a
Index: config/sh/sh.h
===================================================================
*** config/sh/sh.h (revision 132800)
--- config/sh/sh.h (working copy)
*************** struct sh_args {
*** 2822,2828 ****
The SH1 does not have delay slots, hence we get a pipeline stall
at every branch. The SH4 is superscalar, so the single delay slot
is not sufficient to keep both pipelines filled. */
! #define BRANCH_COST (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
/* Assembler output control. */
--- 2822,2829 ----
The SH1 does not have delay slots, hence we get a pipeline stall
at every branch. The SH4 is superscalar, so the single delay slot
is not sufficient to keep both pipelines filled. */
! #define BRANCH_COST(hot_p, predictable_p) \
! (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
/* Assembler output control. */
Index: config/pdp11/pdp11.h
===================================================================
*** config/pdp11/pdp11.h (revision 132800)
--- config/pdp11/pdp11.h (working copy)
*************** JMP FUNCTION 0x0058 0x0000 <- FUNCTION
*** 1059,1065 ****
/* there is no point in avoiding branches on a pdp,
since branches are really cheap - I just want to find out
how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST (TARGET_BRANCH_CHEAP ? 0 : 1)
#define COMPARE_FLAG_MODE HImode
--- 1059,1065 ----
/* there is no point in avoiding branches on a pdp,
since branches are really cheap - I just want to find out
how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST(hot_p, predictable_p) (TARGET_BRANCH_CHEAP ? 0 : 1)
#define COMPARE_FLAG_MODE HImode
Index: config/avr/avr.h
===================================================================
*** config/avr/avr.h (revision 132800)
--- config/avr/avr.h (working copy)
*************** do { \
*** 481,487 ****
(MODE)==SImode ? 8 : \
(MODE)==SFmode ? 8 : 16)
! #define BRANCH_COST 0
#define SLOW_BYTE_ACCESS 0
--- 481,487 ----
(MODE)==SImode ? 8 : \
(MODE)==SFmode ? 8 : 16)
! #define BRANCH_COST(hot_p, predictable_p) 0
#define SLOW_BYTE_ACCESS 0
Index: config/crx/crx.h
===================================================================
*** config/crx/crx.h (revision 132800)
--- config/crx/crx.h (working copy)
*************** struct cumulative_args
*** 420,426 ****
/* Moving to processor register flushes pipeline - thus asymmetric */
#define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
/* Assume best case (branch predicted) */
! #define BRANCH_COST 2
#define SLOW_BYTE_ACCESS 1
--- 420,426 ----
/* Moving to processor register flushes pipeline - thus asymmetric */
#define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
/* Assume best case (branch predicted) */
! #define BRANCH_COST(hot_p, predictable_p) 2
#define SLOW_BYTE_ACCESS 1
Index: config/xtensa/xtensa.h
===================================================================
*** config/xtensa/xtensa.h (revision 132800)
--- config/xtensa/xtensa.h (working copy)
*************** typedef struct xtensa_args
*** 898,904 ****
#define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
! #define BRANCH_COST 3
/* How to refer to registers in assembler output.
This sequence is indexed by compiler's hard-register-number (see above). */
--- 898,904 ----
#define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
! #define BRANCH_COST(hot_p, predictable_p) 3
/* How to refer to registers in assembler output.
This sequence is indexed by compiler's hard-register-number (see above). */
Index: config/stormy16/stormy16.h
===================================================================
*** config/stormy16/stormy16.h (revision 132800)
--- config/stormy16/stormy16.h (working copy)
*************** do { \
*** 582,588 ****
#define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
! #define BRANCH_COST 5
#define SLOW_BYTE_ACCESS 0
--- 582,588 ----
#define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
! #define BRANCH_COST(hot_p, predictable_p) 5
#define SLOW_BYTE_ACCESS 0
Index: config/m68hc11/m68hc11.h
===================================================================
*** config/m68hc11/m68hc11.h (revision 132800)
--- config/m68hc11/m68hc11.h (working copy)
*************** extern unsigned char m68hc11_reg_valid_f
*** 1266,1272 ****
Pretend branches are cheap because GCC generates sub-optimal code
for the default value. */
! #define BRANCH_COST 0
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 0
--- 1266,1272 ----
Pretend branches are cheap because GCC generates sub-optimal code
for the default value. */
! #define BRANCH_COST(hot_p, predictable_p) 0
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 0
Index: config/iq2000/iq2000.h
===================================================================
*** config/iq2000/iq2000.h (revision 132800)
--- config/iq2000/iq2000.h (working copy)
*************** typedef struct iq2000_args
*** 620,626 ****
#define MEMORY_MOVE_COST(MODE,CLASS,TO_P) \
(TO_P ? 2 : 16)
! #define BRANCH_COST 2
#define SLOW_BYTE_ACCESS 1
--- 620,626 ----
#define MEMORY_MOVE_COST(MODE,CLASS,TO_P) \
(TO_P ? 2 : 16)
! #define BRANCH_COST(hot_p, predictable_p) 2
#define SLOW_BYTE_ACCESS 1
Index: config/ia64/ia64.h
===================================================================
*** config/ia64/ia64.h (revision 132800)
--- config/ia64/ia64.h (working copy)
*************** do { \
*** 1371,1377 ****
many additional insn groups we run into, vs how good the dynamic
branch predictor is. */
! #define BRANCH_COST 6
/* Define this macro as a C expression which is nonzero if accessing less than
a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 1371,1377 ----
many additional insn groups we run into, vs how good the dynamic
branch predictor is. */
! #define BRANCH_COST(hot_p, predictable_p) 6
/* Define this macro as a C expression which is nonzero if accessing less than
a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/rs6000/rs6000.h
===================================================================
*** config/rs6000/rs6000.h (revision 132800)
--- config/rs6000/rs6000.h (working copy)
*************** extern enum rs6000_nop_insertion rs6000_
*** 950,956 ****
Set this to 3 on the RS/6000 since that is roughly the average cost of an
unscheduled conditional branch. */
! #define BRANCH_COST 3
/* Override BRANCH_COST heuristic which empirically produces worse
performance for removing short circuiting from the logical ops. */
--- 950,956 ----
Set this to 3 on the RS/6000 since that is roughly the average cost of an
unscheduled conditional branch. */
! #define BRANCH_COST(hot_p, predictable_p) 3
/* Override BRANCH_COST heuristic which empirically produces worse
performance for removing short circuiting from the logical ops. */
Index: config/arc/arc.h
===================================================================
*** config/arc/arc.h (revision 132800)
--- config/arc/arc.h (working copy)
*************** arc_select_cc_mode (OP, X, Y)
*** 824,830 ****
/* The cost of a branch insn. */
/* ??? What's the right value here? Branches are certainly more
expensive than reg->reg moves. */
! #define BRANCH_COST 2
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
--- 824,830 ----
/* The cost of a branch insn. */
/* ??? What's the right value here? Branches are certainly more
expensive than reg->reg moves. */
! #define BRANCH_COST(hot_p, predictable_p) 2
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
Index: config/score/score.h
===================================================================
*** config/score/score.h (revision 132800)
--- config/score/score.h (working copy)
*************** typedef struct score_args
*** 795,801 ****
(4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
/* Try to generate sequences that don't involve branches. */
! #define BRANCH_COST 2
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 1
--- 795,801 ----
(4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
/* Try to generate sequences that don't involve branches. */
! #define BRANCH_COST(hot_p, predictable_p) 2
/* Nonzero if access to memory by bytes is slow and undesirable. */
#define SLOW_BYTE_ACCESS 1
Index: config/arm/arm.h
===================================================================
*** config/arm/arm.h (revision 132800)
--- config/arm/arm.h (working copy)
*************** do { \
*** 2271,2277 ****
/* Try to generate sequences that don't involve branches, we can then use
conditional instructions */
! #define BRANCH_COST \
(TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
/* Position Independent Code. */
--- 2271,2277 ----
/* Try to generate sequences that don't involve branches, we can then use
conditional instructions */
! #define BRANCH_COST(hot_p, predictable_p) \
(TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
/* Position Independent Code. */
Index: config/pa/pa.h
===================================================================
*** config/pa/pa.h (revision 132800)
--- config/pa/pa.h (working copy)
*************** do { \
*** 1569,1575 ****
: 2)
/* Adjust the cost of branches. */
! #define BRANCH_COST (pa_cpu == PROCESSOR_8000 ? 2 : 1)
/* Handling the special cases is going to get too complicated for a macro,
just call `pa_adjust_insn_length' to do the real work. */
--- 1569,1575 ----
: 2)
/* Adjust the cost of branches. */
! #define BRANCH_COST(hot_p, predictable_p) (pa_cpu == PROCESSOR_8000 ? 2 : 1)
/* Handling the special cases is going to get too complicated for a macro,
just call `pa_adjust_insn_length' to do the real work. */
Index: config/mips/mips.h
===================================================================
*** config/mips/mips.h (revision 132800)
--- config/mips/mips.h (working copy)
*************** typedef struct mips_args {
*** 2415,2421 ****
/* A C expression for the cost of a branch instruction. A value of
1 is the default; other values are interpreted relative to that. */
! #define BRANCH_COST mips_branch_cost
#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
/* If defined, modifies the length assigned to instruction INSN as a
--- 2415,2421 ----
/* A C expression for the cost of a branch instruction. A value of
1 is the default; other values are interpreted relative to that. */
! #define BRANCH_COST(hot_p, predictable_p) mips_branch_cost
#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
/* If defined, modifies the length assigned to instruction INSN as a
Index: config/vax/vax.h
===================================================================
*** config/vax/vax.h (revision 132800)
--- config/vax/vax.h (working copy)
*************** enum reg_class { NO_REGS, ALL_REGS, LIM_
*** 652,658 ****
Branches are extremely cheap on the VAX while the shift insns often
used to replace branches can be expensive. */
! #define BRANCH_COST 0
/* Tell final.c how to eliminate redundant test instructions. */
--- 652,658 ----
Branches are extremely cheap on the VAX while the shift insns often
used to replace branches can be expensive. */
! #define BRANCH_COST(hot_p, predictable_p) 0
/* Tell final.c how to eliminate redundant test instructions. */
Index: config/h8300/h8300.h
===================================================================
*** config/h8300/h8300.h (revision 132800)
--- config/h8300/h8300.h (working copy)
*************** struct cum_arg
*** 1004,1010 ****
#define DELAY_SLOT_LENGTH(JUMP) \
(NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
! #define BRANCH_COST 0
/* Tell final.c how to eliminate redundant test instructions. */
--- 1004,1010 ----
#define DELAY_SLOT_LENGTH(JUMP) \
(NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
! #define BRANCH_COST(hot_p, predictable_p) 0
/* Tell final.c how to eliminate redundant test instructions. */
Index: params.def
===================================================================
*** params.def (revision 132800)
--- params.def (working copy)
*************** DEFPARAM (PARAM_STRUCT_REORG_COLD_STRUCT
*** 93,98 ****
--- 93,105 ----
"The threshold ratio between current and hottest structure counts",
10, 0, 100)
+ /* When a branch is predicted to be taken with a probability lower than this
+ threshold (in percent), then it is considered well predictable. */
+ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
+ "predictable-branch-outcome",
+ "Maximal estimated outcome of branch considered predictable",
+ 2, 0, 50)
+
/* The single function inlining limit. This is the maximum size
of a function counted in internal gcc instructions (not in
real machine instructions) that is eligible for inlining