[PATCH] ipa-inline: Improve growth accumulation for recursive calls
Tamar Christina
Tamar.Christina@arm.com
Fri Aug 21 09:17:11 GMT 2020
Hi Martin,
> Hi,
>
> On Thu, Aug 20 2020, Richard Sandiford wrote:
> >>
> >>
> >> Really appreciate for your detailed explanation. BTW, My previous
> >> patch for PGO build on exchange2 takes this similar method by setting
> >> each cloned node to 1/10th of the frequency several month agao :)
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
> >
> > Does it seem likely that we'll reach a resolution on this soon?
> > I take the point that the patch that introduced the exchange
> > regression
> > [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> > was just uncovering a latent issue, but still, this is a large
> > regression in an important benchmark to be carrying around. For those
> > of us doing regular benchmark CI, the longer the performance trough
> > gets, the harder it is to spot other unrelated regressions in the “properly
> optimised” code.
> >
> > So if we don't have a specific plan for fixing the regression soon, I
> > think we should consider reverting the patch until we have something
> > that avoids the exchange regression, even though the patch wasn't
> > technically wrong.
>
> Honza's changes have been motivated to big extent as an enabler for IPA-CP
> heuristics changes to actually speed up 548.exchange2_r.
>
> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
> below it is just 276 seconds - faster than one built with GCC 8 which needs
> 283 seconds.
>
> I'll be interested in knowing if it also works this well on other architectures.
>
Many thanks for working on this!
I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
Do I need any flags for it to work? The patch was applied on top of 656218ab982cc22b826227045826c92743143af1
And I tried 3 runs
1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once
2) -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once
3) -mcpu=native -Ofast -fomit-frame-pointer -flto
First one used to give us the best result, with this patch there's no difference between 1 and 2 (11% regression) and the 3rd one is about 15% on top of that.
If there's anything I can do to help just let me know.
Cheers,
Tamar
> The patch still needs a bit of a cleanup. The change of the default value of
> ipa-cp-unit-growth needs to be done only for small compilation units (like
> inlining does). I should experiment if the value of
> param_ipa_cp_loop_hint_bonus should be changed or not. And last but not
> least, I also want to clean-up the interfaces between ipa-fnsummary.c and
> ipa-cp.c a bit. I am working on all of this and hope to finish the patch set in a
> few (working) days.
>
> The bottom line is that there is a plan to address this regression.
>
> Martin
>
>
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index e4910a04ffa..0d44310503a
> 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3190,11 +3190,23 @@ devirtualization_time_bonus (struct
> cgraph_node *node,
> /* Return time bonus incurred because of HINTS. */
>
> static int
> -hint_time_bonus (cgraph_node *node, ipa_hints hints)
> +hint_time_bonus (cgraph_node *node, ipa_hints hints, sreal
> known_iter_freq,
> + sreal known_strides_freq)
> {
> int result = 0;
> - if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
> - result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
> + sreal bonus_for_one = opt_for_fn (node->decl,
> + param_ipa_cp_loop_hint_bonus);
> +
> + if (hints & INLINE_HINT_loop_iterations)
> + {
> + /* FIXME: This should probably be way more nuanced. */
> + result += (known_iter_freq * bonus_for_one).to_int ();
> + }
> + if (hints & INLINE_HINT_loop_stride)
> + {
> + /* FIXME: And this as well. */
> + result += (known_strides_freq * bonus_for_one).to_int ();
> + }
> +
> return result;
> }
>
> @@ -3395,12 +3407,13 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
> int est_move_cost, ipcp_value_base *val) {
> int size, time_benefit;
> - sreal time, base_time;
> + sreal time, base_time, known_iter_freq, known_strides_freq;
> ipa_hints hints;
>
> estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
> known_aggs, &size, &time,
> - &base_time, &hints);
> + &base_time, &hints, &known_iter_freq,
> + &known_strides_freq);
> base_time -= time;
> if (base_time > 65535)
> base_time = 65535;
> @@ -3414,7 +3427,7 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
> time_benefit = base_time.to_int ()
> + devirtualization_time_bonus (node, known_csts, known_contexts,
> known_aggs)
> - + hint_time_bonus (node, hints)
> + + hint_time_bonus (node, hints, known_iter_freq,
> + known_strides_freq)
> + removable_params_cost + est_move_cost;
>
> gcc_checking_assert (size >=0);
> @@ -3476,7 +3489,7 @@ estimate_local_effects (struct cgraph_node *node)
> {
> struct caller_statistics stats;
> ipa_hints hints;
> - sreal time, base_time;
> + sreal time, base_time, known_iter_freq, known_strides_freq;
> int size;
>
> init_caller_stats (&stats);
> @@ -3484,9 +3497,11 @@ estimate_local_effects (struct cgraph_node
> *node)
> false);
> estimate_ipcp_clone_size_and_time (node, known_csts,
> known_contexts,
> known_aggs, &size, &time,
> - &base_time, &hints);
> + &base_time, &hints,
> &known_iter_freq,
> + &known_strides_freq);
> time -= devirt_bonus;
> - time -= hint_time_bonus (node, hints);
> + time -= hint_time_bonus (node, hints, known_iter_freq,
> + known_strides_freq);
> time -= removable_params_cost;
> size -= stats.n_calls * removable_params_cost;
>
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index
> 2cfab40156e..29fac5db19f 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -310,6 +310,35 @@ set_hint_predicate (predicate **p, predicate
> new_predicate)
> }
> }
>
> +/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
> + Otherwise add a new item to the vector with this predicate and frerq
> equal
> + to add_freq. */
> +
> +static void
> +add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
> + const predicate &new_predicate, sreal add_freq) {
> + if (new_predicate == false || new_predicate == true)
> + return;
> + ipa_freqcounting_predicate *f;
> + for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
> + if (new_predicate == f->predicate)
> + {
> + f->freq += add_freq;
> + return;
> + }
> + /* FIXME: Make this a parameter. */
> + if (vec_safe_length (*v) >= 32)
> + /* Too many different predicates to account for. */
> + return;
> +
> + ipa_freqcounting_predicate fcp;
> + fcp.predicate = NULL;
> + set_hint_predicate (&fcp.predicate, new_predicate);
> + fcp.freq = add_freq;
> + vec_safe_push (*v, fcp);
> + return;
> +}
>
> /* Compute what conditions may or may not hold given information about
> parameters. RET_CLAUSE returns truths that may hold in a specialized copy,
> @@ -722,10 +751,12 @@ ipa_call_summary::~ipa_call_summary ()
>
> ipa_fn_summary::~ipa_fn_summary ()
> {
> - if (loop_iterations)
> - edge_predicate_pool.remove (loop_iterations);
> - if (loop_stride)
> - edge_predicate_pool.remove (loop_stride);
> + unsigned len = vec_safe_length (loop_iterations); for (unsigned i =
> + 0; i < len; i++)
> + edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
> + len = vec_safe_length (loop_strides); for (unsigned i = 0; i < len;
> + i++)
> + edge_predicate_pool.remove ((*loop_strides)[i].predicate);
> vec_free (conds);
> vec_free (size_time_table);
> vec_free (call_size_time_table);
> @@ -741,24 +772,33 @@ ipa_fn_summary_t::remove_callees (cgraph_node
> *node)
> ipa_call_summaries->remove (e);
> }
>
> -/* Same as remap_predicate_after_duplication but handle hint predicate *P.
> - Additionally care about allocating new memory slot for updated predicate
> - and set it to NULL when it becomes true or false (and thus uninteresting).
> - */
> +/* Duplicate predicates in loop hint vector, allocating memory for them and
> + remove and deallocate any uninteresting (true or false) ones. Return the
> + result. */
>
> -static void
> -remap_hint_predicate_after_duplication (predicate **p,
> - clause_t possible_truths)
> +static vec<ipa_freqcounting_predicate, va_gc> *
> +remap_freqcounting_preds_after_dup (vec<ipa_freqcounting_predicate,
> va_gc> *v,
> + clause_t possible_truths)
> {
> - predicate new_predicate;
> + if (vec_safe_length (v) == 0)
> + return NULL;
>
> - if (!*p)
> - return;
> + vec<ipa_freqcounting_predicate, va_gc> *res = v->copy ();
> + int len = res->length();
> + for (int i = len - 1; i >= 0; i--)
> + {
> + predicate new_predicate
> + = (*res)[i].predicate->remap_after_duplication (possible_truths);
> + /* We do not want to free previous predicate; it is used by node
> + origin. */
> + (*res)[i].predicate = NULL;
> + set_hint_predicate (&(*res)[i].predicate, new_predicate);
> +
> + if (!(*res)[i].predicate)
> + res->unordered_remove (i);
> + }
>
> - new_predicate = (*p)->remap_after_duplication (possible_truths);
> - /* We do not want to free previous predicate; it is used by node origin. */
> - *p = NULL;
> - set_hint_predicate (p, new_predicate);
> + return res;
> }
>
>
> @@ -874,9 +914,11 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> optimized_out_size += es->call_stmt_size *
> ipa_fn_summary::size_scale;
> edge_set_predicate (edge, &new_predicate);
> }
> - remap_hint_predicate_after_duplication (&info->loop_iterations,
> + info->loop_iterations
> + = remap_freqcounting_preds_after_dup (info->loop_iterations,
> possible_truths);
> - remap_hint_predicate_after_duplication (&info->loop_stride,
> + info->loop_strides
> + = remap_freqcounting_preds_after_dup (info->loop_strides,
> possible_truths);
>
> /* If inliner or someone after inliner will ever start producing @@ -888,17
> +930,21 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> else
> {
> info->size_time_table = vec_safe_copy (info->size_time_table);
> - if (info->loop_iterations)
> + info->loop_iterations = vec_safe_copy (info->loop_iterations);
> + info->loop_strides = vec_safe_copy (info->loop_strides);
> +
> + ipa_freqcounting_predicate *f;
> + for (int i = 0; vec_safe_iterate (info->loop_iterations, i, &f);
> + i++)
> {
> - predicate p = *info->loop_iterations;
> - info->loop_iterations = NULL;
> - set_hint_predicate (&info->loop_iterations, p);
> + predicate p = *f->predicate;
> + f->predicate = NULL;
> + set_hint_predicate (&f->predicate, p);
> }
> - if (info->loop_stride)
> + for (int i = 0; vec_safe_iterate (info->loop_strides, i, &f);
> + i++)
> {
> - predicate p = *info->loop_stride;
> - info->loop_stride = NULL;
> - set_hint_predicate (&info->loop_stride, p);
> + predicate p = *f->predicate;
> + f->predicate = NULL;
> + set_hint_predicate (&f->predicate, p);
> }
> }
> if (!dst->inlined_to)
> @@ -1057,15 +1103,28 @@ ipa_dump_fn_summary (FILE *f, struct
> cgraph_node *node)
> }
> fprintf (f, "\n");
> }
> - if (s->loop_iterations)
> + ipa_freqcounting_predicate *fcp;
> + bool first_fcp = true;
> + for (int i = 0; vec_safe_iterate (s->loop_iterations, i, &fcp); i++)
> {
> - fprintf (f, " loop iterations:");
> - s->loop_iterations->dump (f, s->conds);
> + if (first_fcp)
> + {
> + fprintf (f, " loop iterations:");
> + first_fcp = false;
> + }
> + fprintf (f, " %3.2f for ", fcp->freq.to_double ());
> + fcp->predicate->dump (f, s->conds);
> }
> - if (s->loop_stride)
> + first_fcp = true;
> + for (int i = 0; vec_safe_iterate (s->loop_strides, i, &fcp); i++)
> {
> - fprintf (f, " loop stride:");
> - s->loop_stride->dump (f, s->conds);
> + if (first_fcp)
> + {
> + fprintf (f, " loop strides:");
> + first_fcp = false;
> + }
> + fprintf (f, " %3.2f for :", fcp->freq.to_double ());
> + fcp->predicate->dump (f, s->conds);
> }
> fprintf (f, " calls:\n");
> dump_ipa_call_summary (f, 4, node, s); @@ -2514,12 +2573,13 @@
> analyze_function_body (struct cgraph_node *node, bool early)
>
> if (fbi.info)
> compute_bb_predicates (&fbi, node, info, params_summary);
> + const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN
> + (cfun)->count;
> order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> for (n = 0; n < nblocks; n++)
> {
> bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> - freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (cfun)-
> >count);
> + freq = bb->count.to_sreal_scale (entry_count);
> if (clobber_only_eh_bb_p (bb))
> {
> if (dump_file && (dump_flags & TDF_DETAILS)) @@ -2758,23
> +2818,27 @@ analyze_function_body (struct cgraph_node *node, bool early)
>
> if (nonconstant_names.exists () && !early)
> {
> + ipa_fn_summary *s = ipa_fn_summaries->get (node);
> class loop *loop;
> - predicate loop_iterations = true;
> - predicate loop_stride = true;
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> flow_loops_dump (dump_file, NULL, 0);
> scev_initialize ();
> FOR_EACH_LOOP (loop, 0)
> {
> + predicate loop_iterations = true;
> + sreal header_freq;
> vec<edge> exits;
> edge ex;
> unsigned int j;
> class tree_niter_desc niter_desc;
> if (loop->header->aux)
> - bb_predicate = *(predicate *) loop->header->aux;
> + {
> + bb_predicate = *(predicate *) loop->header->aux;
> + header_freq = loop->header->count.to_sreal_scale
> (entry_count);
> + }
> else
> - bb_predicate = false;
> + continue;
>
> exits = get_loop_exit_edges (loop);
> FOR_EACH_VEC_ELT (exits, j, ex)
> @@ -2790,10 +2854,10 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
> will_be_nonconstant = bb_predicate & will_be_nonconstant;
> if (will_be_nonconstant != true
> && will_be_nonconstant != false)
> - /* This is slightly inprecise. We may want to represent each
> - loop with independent predicate. */
> loop_iterations &= will_be_nonconstant;
> }
> + add_freqcounting_predicate (&s->loop_iterations, loop_iterations,
> + header_freq);
> exits.release ();
> }
>
> @@ -2803,14 +2867,20 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
> for (loop = loops_for_fn (cfun)->tree_root->inner;
> loop != NULL; loop = loop->next)
> {
> + predicate loop_stride = true;
> + sreal bb_freq;
> basic_block *body = get_loop_body (loop);
> for (unsigned i = 0; i < loop->num_nodes; i++)
> {
> gimple_stmt_iterator gsi;
> if (body[i]->aux)
> - bb_predicate = *(predicate *) body[i]->aux;
> + {
> + bb_predicate = *(predicate *) body[i]->aux;
> + bb_freq = body[i]->count.to_sreal_scale (entry_count);
> + }
> else
> - bb_predicate = false;
> + continue;
> +
> for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
> gsi_next (&gsi))
> {
> @@ -2839,16 +2909,13 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
> will_be_nonconstant = bb_predicate &
> will_be_nonconstant;
> if (will_be_nonconstant != true
> && will_be_nonconstant != false)
> - /* This is slightly inprecise. We may want to represent
> - each loop with independent predicate. */
> loop_stride = loop_stride & will_be_nonconstant;
> }
> }
> + add_freqcounting_predicate (&s->loop_strides, loop_stride,
> + bb_freq);
> free (body);
> }
> - ipa_fn_summary *s = ipa_fn_summaries->get (node);
> - set_hint_predicate (&s->loop_iterations, loop_iterations);
> - set_hint_predicate (&s->loop_stride, loop_stride);
> scev_finalize ();
> }
> FOR_ALL_BB_FN (bb, my_function)
> @@ -3640,14 +3707,32 @@ ipa_call_context::estimate_size_and_time (int
> *ret_size,
> if (time > nonspecialized_time)
> time = nonspecialized_time;
>
> + m_loops_with_known_iterations = 0;
> + ipa_freqcounting_predicate *fcp;
> + for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> + {
> + gcc_assert (fcp->predicate);
> + if (!fcp->predicate->evaluate (m_possible_truths))
> + {
> + if (ret_hints)
> + hints |= INLINE_HINT_loop_iterations;
> + m_loops_with_known_iterations += fcp->freq;
> + }
> + }
> + m_loops_with_known_strides = 0;
> + for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> + {
> + gcc_assert (fcp->predicate);
> + if (!fcp->predicate->evaluate (m_possible_truths))
> + {
> + if (ret_hints)
> + hints |= INLINE_HINT_loop_stride;
> + m_loops_with_known_strides += fcp->freq;
> + }
> + }
> +
> if (ret_hints)
> {
> - if (info->loop_iterations
> - && !info->loop_iterations->evaluate (m_possible_truths))
> - hints |= INLINE_HINT_loop_iterations;
> - if (info->loop_stride
> - && !info->loop_stride->evaluate (m_possible_truths))
> - hints |= INLINE_HINT_loop_stride;
> if (info->scc_no)
> hints |= INLINE_HINT_in_scc;
> if (DECL_DECLARED_INLINE_P (m_node->decl)) @@ -3687,7 +3772,9 @@
> estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
> vec<ipa_agg_value_set> known_aggs,
> int *ret_size, sreal *ret_time,
> sreal *ret_nonspec_time,
> - ipa_hints *hints)
> + ipa_hints *hints,
> + sreal *loops_with_known_iterations,
> + sreal *loops_with_known_strides)
> {
> clause_t clause, nonspec_clause;
>
> @@ -3699,6 +3786,8 @@ estimate_ipcp_clone_size_and_time (struct
> cgraph_node *node,
> known_aggs, vNULL);
> ctx.estimate_size_and_time (ret_size, NULL, ret_time,
> ret_nonspec_time, hints);
> + *loops_with_known_iterations = ctx.m_loops_with_known_iterations;
> + *loops_with_known_strides = ctx.m_loops_with_known_strides;
> }
>
> /* Return stack frame offset where frame of NODE is supposed to start
> inside @@ -3857,32 +3946,31 @@ remap_edge_summaries (struct
> cgraph_edge *inlined_edge,
> }
> }
>
> -/* Same as remap_predicate, but set result into hint *HINT. */
> +/* Run remap_after_inlining on each predicate in V. */
>
> static void
> -remap_hint_predicate (class ipa_fn_summary *info,
> - class ipa_node_params *params_summary,
> - class ipa_fn_summary *callee_info,
> - predicate **hint,
> - vec<int> operand_map,
> - vec<int> offset_map,
> - clause_t possible_truths,
> - predicate *toplev_predicate)
> -{
> - predicate p;
> +remap_freqcounting_predicate (class ipa_fn_summary *info,
> + class ipa_node_params *params_summary,
> + class ipa_fn_summary *callee_info,
> + vec<ipa_freqcounting_predicate, va_gc> *v,
> + vec<int> operand_map,
> + vec<int> offset_map,
> + clause_t possible_truths,
> + predicate *toplev_predicate)
>
> - if (!*hint)
> - return;
> - p = (*hint)->remap_after_inlining
> - (info, params_summary, callee_info,
> - operand_map, offset_map,
> - possible_truths, *toplev_predicate);
> - if (p != false && p != true)
> +{
> + ipa_freqcounting_predicate *fcp;
> + for (int i = 0; vec_safe_iterate (v, i, &fcp); i++)
> {
> - if (!*hint)
> - set_hint_predicate (hint, p);
> - else
> - **hint &= p;
> + predicate p
> + = fcp->predicate->remap_after_inlining (info, params_summary,
> + callee_info, operand_map,
> + offset_map, possible_truths,
> + *toplev_predicate);
> + if (p != false && p != true)
> + /* FIXME: Is this really supposed to be &= and not a plain
> + assignment? */
> + *fcp->predicate &= p;
> }
> }
>
> @@ -3992,12 +4080,12 @@ ipa_merge_fn_summary_after_inlining (struct
> cgraph_edge *edge)
> remap_edge_summaries (edge, edge->callee, info, params_summary,
> callee_info, operand_map,
> offset_map, clause, &toplev_predicate);
> - remap_hint_predicate (info, params_summary, callee_info,
> - &callee_info->loop_iterations,
> - operand_map, offset_map, clause,
> &toplev_predicate);
> - remap_hint_predicate (info, params_summary, callee_info,
> - &callee_info->loop_stride,
> - operand_map, offset_map, clause,
> &toplev_predicate);
> + remap_freqcounting_predicate (info, params_summary, callee_info,
> + info->loop_iterations, operand_map,
> + offset_map, clause, &toplev_predicate);
> + remap_freqcounting_predicate (info, params_summary, callee_info,
> + info->loop_strides, operand_map,
> + offset_map, clause, &toplev_predicate);
>
> HOST_WIDE_INT stack_frame_offset = ipa_get_stack_frame_offset (edge-
> >callee);
> HOST_WIDE_INT peak = stack_frame_offset + callee_info-
> >estimated_stack_size;
> @@ -4322,12 +4410,34 @@ inline_read_section (struct lto_file_decl_data
> *file_data, const char *data,
> info->size_time_table->quick_push (e);
> }
>
> - p.stream_in (&ib);
> - if (info)
> - set_hint_predicate (&info->loop_iterations, p);
> - p.stream_in (&ib);
> - if (info)
> - set_hint_predicate (&info->loop_stride, p);
> + count2 = streamer_read_uhwi (&ib);
> + for (j = 0; j < count2; j++)
> + {
> + p.stream_in (&ib);
> + sreal fcp_freq = sreal::stream_in (&ib);
> + if (info)
> + {
> + ipa_freqcounting_predicate fcp;
> + fcp.predicate = NULL;
> + set_hint_predicate (&fcp.predicate, p);
> + fcp.freq = fcp_freq;
> + vec_safe_push (info->loop_iterations, fcp);
> + }
> + }
> + count2 = streamer_read_uhwi (&ib);
> + for (j = 0; j < count2; j++)
> + {
> + p.stream_in (&ib);
> + sreal fcp_freq = sreal::stream_in (&ib);
> + if (info)
> + {
> + ipa_freqcounting_predicate fcp;
> + fcp.predicate = NULL;
> + set_hint_predicate (&fcp.predicate, p);
> + fcp.freq = fcp_freq;
> + vec_safe_push (info->loop_strides, fcp);
> + }
> + }
> for (e = node->callees; e; e = e->next_callee)
> read_ipa_call_summary (&ib, e, info != NULL);
> for (e = node->indirect_calls; e; e = e->next_callee) @@ -4487,14
> +4597,19 @@ ipa_fn_summary_write (void)
> e->exec_predicate.stream_out (ob);
> e->nonconst_predicate.stream_out (ob);
> }
> - if (info->loop_iterations)
> - info->loop_iterations->stream_out (ob);
> - else
> - streamer_write_uhwi (ob, 0);
> - if (info->loop_stride)
> - info->loop_stride->stream_out (ob);
> - else
> - streamer_write_uhwi (ob, 0);
> + ipa_freqcounting_predicate *fcp;
> + streamer_write_uhwi (ob, vec_safe_length (info->loop_iterations));
> + for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> + {
> + fcp->predicate->stream_out (ob);
> + fcp->freq.stream_out (ob);
> + }
> + streamer_write_uhwi (ob, vec_safe_length (info->loop_strides));
> + for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> + {
> + fcp->predicate->stream_out (ob);
> + fcp->freq.stream_out (ob);
> + }
> for (edge = cnode->callees; edge; edge = edge->next_callee)
> write_ipa_call_summary (ob, edge);
> for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h index
> c6ddc9f3199..d8429afdbef 100644
> --- a/gcc/ipa-fnsummary.h
> +++ b/gcc/ipa-fnsummary.h
> @@ -101,6 +101,19 @@ public:
> }
> };
>
> +/* Structure to capture how frequently some interesting events occur given
> a
> + particular predicate. The structure is used to estimate how often we
> + encounter loops with known iteration count or stride in various
> + contexts. */
> +
> +struct GTY(()) ipa_freqcounting_predicate {
> + /* The described event happens with this frequency... */
> + sreal freq;
> + /* ...when this predicate evaluates to false. */
> + class predicate * GTY((skip)) predicate; };
> +
> /* Function inlining information. */
> class GTY(()) ipa_fn_summary
> {
> @@ -112,8 +125,9 @@ public:
> inlinable (false), single_caller (false),
> fp_expressions (false), estimated_stack_size (false),
> time (0), conds (NULL),
> - size_time_table (NULL), call_size_time_table (NULL), loop_iterations
> (NULL),
> - loop_stride (NULL), growth (0), scc_no (0)
> + size_time_table (NULL), call_size_time_table (NULL),
> + loop_iterations (NULL), loop_strides (NULL),
> + growth (0), scc_no (0)
> {
> }
>
> @@ -125,13 +139,12 @@ public:
> estimated_stack_size (s.estimated_stack_size),
> time (s.time), conds (s.conds), size_time_table (s.size_time_table),
> call_size_time_table (NULL),
> - loop_iterations (s.loop_iterations), loop_stride (s.loop_stride),
> + loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
> growth (s.growth), scc_no (s.scc_no)
> {}
>
> /* Default constructor. */
> ~ipa_fn_summary ();
> -
> /* Information about the function body itself. */
>
> /* Minimal size increase after inlining. */ @@ -164,12 +177,10 @@ public:
> vec<size_time_entry, va_gc> *size_time_table;
> vec<size_time_entry, va_gc> *call_size_time_table;
>
> - /* Predicate on when some loop in the function becomes to have known
> - bounds. */
> - predicate * GTY((skip)) loop_iterations;
> - /* Predicate on when some loop in the function becomes to have known
> - stride. */
> - predicate * GTY((skip)) loop_stride;
> + /* Predicates on when some loops in the function can have known
> + bounds. */ vec<ipa_freqcounting_predicate, va_gc> *loop_iterations;
> + /* Predicates on when some loops in the function can have known
> + strides. */ vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
> /* Estimated growth for inlining all copies of the function before start
> of small functions inlining.
> This value will get out of date as the callers are duplicated, but @@ -316,6
> +327,15 @@ public:
> {
> return m_node != NULL;
> }
> +
> +
> + /* How often loops will have known iterations. Calculated in
> + estimate_size_and_time. */
> + sreal m_loops_with_known_iterations;
> + /* How often loops will have known strides. Calculated in
> + estimate_size_and_time. */
> + sreal m_loops_with_known_strides;
> +
> private:
> /* Called function. */
> cgraph_node *m_node;
> @@ -353,7 +373,7 @@ void estimate_ipcp_clone_size_and_time (struct
> cgraph_node *,
> vec<ipa_polymorphic_call_context>,
> vec<ipa_agg_value_set>,
> int *, sreal *, sreal *,
> - ipa_hints *);
> + ipa_hints *, sreal *, sreal *);
> void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
> void ipa_update_overall_fn_summary (struct cgraph_node *node, bool
> reset = true); void compute_fn_summary (struct cgraph_node *, bool); diff -
> -git a/gcc/params.opt b/gcc/params.opt index f39e5d1a012..2a5f3d61727
> 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -211,7 +211,7 @@ Common Joined UInteger
> Var(param_ipa_cp_single_call_penalty) Init(15) IntegerRan Percentage
> penalty functions containing a single call to another function will receive
> when they are evaluated for cloning.
>
> -param=ipa-cp-unit-growth=
> -Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param
> Optimization
> +Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(80) Param
> +Optimization
> How much can given compilation unit grow because of the interprocedural
> constant propagation (in percent).
>
> -param=ipa-cp-value-list-size=
More information about the Gcc-patches
mailing list