This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Propagate profile counts during switch expansion
Ping.
On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman <eraman@google.com> wrote:
> I have attached a revised patch. The updated ChangeLog is given below
> and I have responded to your comments inline:
>
> 2012-10-08 Easwaran Raman <eraman@google.com>
> * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to
> specificy the probability of taking the jump.
> (emit_cmp_and_jump_insns): Likewise.
> (expand_compare_and_swap_loop): Make the jump predicted not taken.
> * dojump.c (do_compare_rtx_and_jump): Remove the code attaching
> REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns.
> * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing
> probabilities for branches with more than two successors.
> * expr.c (emit_block_move_via_loop): Predict the loop backedge loop
> to be highly taken.
> (try_casesi): Pass the probability of jumping to the default label.
> (try_tablejump): Likewise.
> (do_tablejump): Likewise.
> * expr.h (try_tablejump): Add a new parameter.
> (try_casesi): Likewise.
> (emit_cmp_and_jump_insns): Add probability as default parameter with a
> default value of -1.
> * except.c (sjlj_emit_function_enter): Pass probability to
> emit_cmp_and_jump_insns.
> * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB.
> (do_jump_if_equal): Pass probability for REG_BR_PROB note.
> (add_case_node): Pass estimated probability of jumping to the case
> label.
> (emit_case_decision_tree): Pass default_prob to emit_case_nodes.
> (get_outgoing_edge_probs): New function.
> (conditional_probability): Likewise.
> (reset_out_edges_aux): Likewise.
> (compute_cases_per_edge): Likewise.
> (emit_case_dispatch_table): Update probabilities of edges coming out
> of the switch statement.
> (expand_case): Compute and propagate default edge probability to
> emit_case_dispatch_table.
> (expand_sjlj_dispatch_table): Update calls to add_case_node and
> emit_case_dispatch_table.
> (balance_case_nodes): Update subtree_prob values.
> (emit_case_nodes): Compute edge probabilities and add pass them to
> emit_cmp_and_jump_insns.
>
> gcc/testsuite/ChangeLog:
> 2012-10-02 Easwaran Raman <eraman@google.com>
> * gcc.dg/tree-prof/switch-case-1.c: New test case.
> * gcc.dg/tree-prof/switch-case-2.c: New test case.
>
>
>
>
> On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Hi,
>>> This patch propagates the profile counts during RTL expansion. In
>>> many cases, there is no way to determine the exact count of an edge
>>> generated during the expansion. So this patch uses some simple
>>> heuristics to estimate the edge counts but ensures that the counts of
>>> the basic blocks corresponding to the cases are (nearly the) same as
>>> at the gimple level.
>>>
>>> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk?
>>> Index: gcc/expr.c
>>> ===================================================================
>>> --- gcc/expr.c (revision 191879)
>>> +++ gcc/expr.c (working copy)
>>> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
>>> #ifdef PUSH_ROUNDING
>>> static void emit_single_push_insn (enum machine_mode, rtx, tree);
>>> #endif
>>> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
>>> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
>>> static rtx const_vector_from_tree (tree);
>>> static void write_complex_part (rtx, rtx, bool);
>>>
>>> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree
>>>
>>> static void
>>> do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
>>> - rtx default_label)
>>> + rtx default_label, int default_probability)
>>
>> Please document default_probability.
> Done.
>
>>> {
>>> rtx temp, vector;
>>>
>>> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
>>> the maximum value of the range. */
>>>
>>> if (default_label)
>>> - emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
>>> - default_label);
>>> + {
>>> + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
>>> + default_label);
>>> + if (default_probability != -1)
>>> + {
>>> + rtx jump_insn = get_last_insn();
>>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
>>> + }
>>> + }
>>
>> dojump already does this kind of logic, but it is bit more cureful:
>> emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp,
>> if_true_label);
>> if (prob != -1 && profile_status != PROFILE_ABSENT)
>> {
>> for (last = NEXT_INSN (last);
>> last && NEXT_INSN (last);
>> last = NEXT_INSN (last))
>> if (JUMP_P (last))
>> break;
>> if (last
>> && JUMP_P (last)
>> && ! NEXT_INSN (last)
>> && any_condjump_p (last))
>> {
>> gcc_assert (!find_reg_note (last, REG_BR_PROB, 0));
>> add_reg_note (last, REG_BR_PROB, GEN_INT (prob));
>> }
>> }
>>
>> What about making emit_cmp_and_jump_insns taking the probability argument
>> and moving the code above inside? Most of other places need updating to
>> propagate probabilities.
>
> Done. Made probability a default parameter with a default parameter of
> -1 and so didn't update all callsites.
>
>> (compare_and_jump_seq in loop-unswitch probably also can be updated)
>>> @@ -10954,7 +10962,7 @@ do_tablejump (rtx index, enum machine_mode mode, r
>>>
>>> int
>>> try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
>>> - rtx table_label, rtx default_label)
>>> + rtx table_label, rtx default_label, int default_probability)
>>
>> Simiarly here.
>>> Index: gcc/cfgbuild.c
>>> ===================================================================
>>> --- gcc/cfgbuild.c (revision 191879)
>>> +++ gcc/cfgbuild.c (working copy)
>>> @@ -533,6 +533,23 @@ find_bb_boundaries (basic_block bb)
>>> purge_dead_tablejump_edges (bb, table);
>>> }
>>>
>>> +/* If there is at least one edge in EDGES with a non-zero count, then
>>> + compute probabilities based on the existing counts. */
>>> +
>>> +static bool
>>> +gen_probabilities_from_existing_counts ( VEC(edge,gc) *edges) {
>>> + edge e;
>>> + edge_iterator ei;
>>> + gcov_type count_sum = 0;
>>> + FOR_EACH_EDGE(e, ei, edges)
>>> + count_sum += e->count;
>>> + if (count_sum == 0)
>>> + return false;
>>> + FOR_EACH_EDGE(e, ei, edges)
>>> + e->probability = e->count * REG_BR_PROB_BASE / count_sum;
>>> + return true;
>>> +}
>>> +
>>> /* Assume that frequency of basic block B is known. Compute frequencies
>>> and probabilities of outgoing edges. */
>>>
>>> @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b)
>>> return;
>>> }
>>> }
>>> -
>>> if (single_succ_p (b))
>>> {
>>> e = single_succ_edge (b);
>>> @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b)
>>> e->count = b->count;
>>> return;
>>> }
>>> - guess_outgoing_edge_probabilities (b);
>>> + else if (!gen_probabilities_from_existing_counts (b->succs)){
>>> + /* All outgoing edges of B have zero count. Guess probabilities. */
>>> + guess_outgoing_edge_probabilities (b);
>>> + }
>>
>> Hmm, I do not quite follow logic here.
>> basic block B is one of many basic blocks that the original BB was split from.
>> It is possible that B may have some of original edges, but there may be new ones.
>> How you can guess the outgoing probabilitie shere. Do you have an example?
> I have changed this code. If the bb has >2 edges, I just use the
> probability as it is without guessing. This should work for switch
> statements since the edge probabilities are adjusted in stmt.c. I am
> not sure if there are other cases of multi-way branches that I need to
> worry about here.
>
>> Also gen_probabilities_from_existing_counts could probably also work based
>> on original edge frequencies.
>>> @@ -1664,7 +1668,7 @@ do_jump_if_equal (enum machine_mode mode, rtx op0,
>>>
>>> static struct case_node *
>>> add_case_node (struct case_node *head, tree low, tree high,
>>> - tree label, alloc_pool case_node_pool)
>>> + tree label, gcov_type count, alloc_pool case_node_pool)
>>
>> Please document COUNT here.
> Done.
>
>>> {
>>> struct case_node *r;
>>>
>>> @@ -1677,6 +1681,8 @@ add_case_node (struct case_node *head, tree low, t
>>> r->high = high;
>>> r->code_label = label;
>>> r->parent = r->left = NULL;
>>> + r->count = count;
>>> + r->subtree_count = count;
>>> r->right = head;
>>> return r;
>>> }
>>
>> .. and here
> Done.
>>> @@ -1803,7 +1809,8 @@ expand_switch_as_decision_tree_p (tree range,
>>>
>>> static void
>>> emit_case_decision_tree (tree index_expr, tree index_type,
>>> - struct case_node *case_list, rtx default_label)
>>> + struct case_node *case_list, rtx default_label,
>>> + gcov_type default_count)
>>> {
>>> rtx index = expand_normal (index_expr);
>>>
>>> @@ -1839,15 +1846,29 @@ emit_case_decision_tree (tree index_expr, tree ind
>>> dump_case_nodes (dump_file, case_list, indent_step, 0);
>>> }
>>>
>>> - emit_case_nodes (index, case_list, default_label, index_type);
>>> + emit_case_nodes (index, case_list, default_label, default_count, index_type);
>>> if (default_label)
>>> emit_jump (default_label);
>>> }
>>>
>>
>> And document functio nhere.
> Done.
>>
>>> +static gcov_type
>>> +get_outgoing_edge_counts (basic_block bb)
>>> +{
>>> + edge e;
>>> + edge_iterator ei;
>>> + gcov_type count_sum = 0;
>>> + FOR_EACH_EDGE(e, ei, bb->succs)
>>> + count_sum += e->count;
>>> + return count_sum;
>>> +}
>>> +
>>> +#define case_probability(x, y) \
>>> + ((y) ? (gcc_assert (x <= y), (x) * REG_BR_PROB_BASE / (y)) : -1)
>>
>> You want to use RDIV here for better rounding. I would make this an inline
>> functions these days.
> Done.
>
>>> +
>>> /* Generate a dispatch tabler, switching on INDEX_EXPR and jumping to
>>> one of the labels in CASE_LIST or to the DEFAULT_LABEL.
>>> MINVAL, MAXVAL, and RANGE are the extrema and range of the case
>>> - labels in CASE_LIST.
>>> + labels in CASE_LIST. STMT_BB is the basic block containing the statement.
>>>
>>> First, a jump insn is emitted. First we try "casesi". If that
>>> fails, try "tablejump". A target *must* have one of them (or both).
>>> @@ -1860,19 +1881,23 @@ emit_case_decision_tree (tree index_expr, tree ind
>>> static void
>>> emit_case_dispatch_table (tree index_expr, tree index_type,
>>> struct case_node *case_list, rtx default_label,
>>> - tree minval, tree maxval, tree range)
>>> + tree minval, tree maxval, tree range,
>>> + basic_block stmt_bb)
>>> {
>>> int i, ncases;
>>> struct case_node *n;
>>> rtx *labelvec;
>>> rtx fallback_label = label_rtx (case_list->code_label);
>>> rtx table_label = gen_label_rtx ();
>>> + bool has_gaps = false;
>>> + edge default_edge = EDGE_SUCC(stmt_bb, 0);
>>> + gcov_type default_count = default_edge->count;
>>> + gcov_type count = get_outgoing_edge_counts (stmt_bb);
>>> + bool try_with_tablejump = false;
>>
>> Here, I think you want to first decide whether you base expansion on counts
>> (i.e. any counts involved is non-0) or probabilities. We really want to maintain
>> the profile up-to-date even with guessed profile. The inconsistencies
>> propagate across CFG and easilly confuse later optimizers.
>> Everywhere you are using count comming from the original statement, you can also use
>> the probability.
>
> I have changed the code to use probabilities.
>
>> The newly produced control flow will be optimized and the edges frequencies will
>> be propagated along.
>>
>>> +
>>> + default_edge->count = default_count;
>>> + if (count)
>>> + {
>>> + edge e;
>>> + edge_iterator ei;
>>> + FOR_EACH_EDGE (e, ei, stmt_bb->succs)
>>> + e->probability = e->count * REG_BR_PROB_BASE / count;
>>> + }
>>
>> Hmm, this updates origina BB containing the switch statement? Of course,
>> modulo roundoff errors, this should hold. I wonder where did you got the
>> diferences and why do you need this? You are going to output new control flow
>> and find_many_sub_basic_blocks will recompute all the counts/frequencies inside
>> anyway?
>> Also you want to use RDIV here.
> I need to do this to account for the change in the count of default
> edge. Changed to use RDIV.
>
>>> @@ -2358,6 +2433,20 @@ node_is_bounded (case_node_ptr node, tree index_ty
>>> && node_has_high_bound (node, index_type));
>>> }
>>>
>>> +
>>> +/* Attach a REG_BR_PROB note to the last created RTX instruction if
>>> + PROBABILITY is not -1. */
>>> +
>>> +static void
>>> +add_prob_note_to_last_insn(int probability)
>>> +{
>>> + if (probability != -1)
>>> + {
>>> + rtx jump_insn = get_last_insn();
>>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (probability));
>>> + }
>>> +}
>>
>> Well, I am not sure this is safe thing to do since you do not really
>> know what the last instruction is. In any case you want to test
>> it is an conditional jump instruction as in the code I quoted
>> above.
>>
>>> @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt
>>> unsignedp),
>>> GT, NULL_RTX, mode, unsignedp,
>>> label_rtx (node->right->code_label));
>>> - emit_case_nodes (index, node->left, default_label, index_type);
>>> + probability = case_probability (node->right->count,
>>> + subtree_count + default_count);
>>> + add_prob_note_to_last_insn (probability);
>>> + emit_case_nodes (index, node->left, default_label, default_count,
>>> + index_type);
>>
>> Hmm, here you seem to be distributing the probabilities of the counts reached.
>> What happens in the case when the edge probability needs to be distributed across
>> nodes of the decision tree. I.e. in
>> t(int a)
>> {
>> switch (a)
>> {
>> case 100:
>> case 200:
>> case 300:
>> case 400:
>> case 500:
>> case 600:
>> case 700:
>> case 800:
>> case 900:
>> t();
>> case 101:
>> case 202:
>> case 303:
>> case 404:
>> case 505:
>> case 606:
>> case 707:
>> case 808:
>> case 909:
>> q();
>> }
>> }
>
> I have fixed this.
>>
>>> }
>>>
>>> else if (node_is_bounded (node->left, index_type))
>>> @@ -2442,7 +2539,10 @@ emit_case_nodes (rtx index, case_node_ptr node, rt
>>> unsignedp),
>>> LT, NULL_RTX, mode, unsignedp,
>>> label_rtx (node->left->code_label));
>>> - emit_case_nodes (index, node->right, default_label, index_type);
>>> + probability = case_probability (node->left->count,
>>> + subtree_count + default_count);
>>> + add_prob_note_to_last_insn (probability);
>>
>> Here, and apparently in all uses of add_prob_note_to_last_insn, you want to use
>> the new parameter of emit_cmp_and_jump_insn.
>>
>> Otherwise the patch seems to make sense. Thanks for looking into this issue.
>>
>> Honza