ipa-cp heuristics fixes
Martin Jambor
mjambor@suse.cz
Thu Dec 10 10:47:00 GMT 2015
Hi,
thanks for looking into this, I only have one question:
On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
> First of all, ipcp_cloning_candidate_p calls
> optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
> around. I replaced it by node->optimize_for_size_p ().
>
> Second we perform incredible number of clones because we do obtain some sort of
> polymorphic call context for them. In wast majority of cases this is useless
> effort, because the functions in question do not contain virtual calls and do
> not pass the parameter further. For firefox about 40k out of 50k clones
> created are created just because we found some context.
>
> I changed the code to only clone if this immediately leads to devirtualization.
> This do not cause any noticeable drop in number of devirtualized calls on
> Firefox. I suppose we will miss the case where cloning a caller may allow
> devirtualization in a clone of callee, but I do not think the heuristics for
> context independent values can handle this as implemented right now and it
> simply have way to many false positives.
>
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
>
> Third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function is doing some analysis for inliner where it, for example, analyses
> if a comdat which is going to be inlined consistently in the whole program
> will be removed.
>
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function. I think you
> basically want to assume that the function will be removed if it has no
> address taken and it is not externally visibible. This is what local flag
> is for.
>
> I gathered some stats:
>
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
>
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high. For example, inliner does about 300k
> inlines at Firefox.
>
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
>
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
> to: Unit growth for small function inlining: 7965010->9138577
>
> So it seems that except for large drop in number of clones there is no significant difference.
>
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
>
> Honza
>
> * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> (good_cloning_opportunity_p): Likewise.
> (gather_context_independent_values): Do not return true when
> polymorphic call context is known or when we have known aggregate
> value of unused parameter.
> (estimate_local_effects): Try to create clone for all context
> when either some params are substituted or devirtualization is possible
> or some params can be removed; use local flag instead of
> node->will_be_removed_from_program_if_no_direct_calls_p.
> (identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c (revision 231477)
> +++ ipa-cp.c (working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
> return false;
> }
>
> - if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> + if (node->optimize_for_size_p ())
> {
> if (dump_file)
> fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
> {
> if (time_benefit == 0
> || !opt_for_fn (node->decl, flag_ipa_cp_clone)
> - || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> + || node->optimize_for_size_p ())
> return false;
>
> gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
> *removable_params_cost
> += ipa_get_param_move_cost (info, i);
>
> + if (!ipa_is_param_used (info, i))
> + continue;
> +
Is this really necessary, is it not enough to remove the assignment to
ret below? If the parameter is not used, devirtualization time bonus,
which you then rely on estimate_local_effects, should be zero for it.
It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-independent information for it as possible too, because that
can then be used in a callee (see the second call of
gather_context_independent_values).
Other than that, all the changes seem like a clear improvement.
Thanks,
Martin
> ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
> + /* Do not account known context as reason for cloning. We can see
> + if it permits devirtualization. */
> if (ctxlat->is_single_const ())
> - {
> - (*known_contexts)[i] = ctxlat->values->value;
> - ret = true;
> - }
> + (*known_contexts)[i] = ctxlat->values->value;
>
> if (known_aggs)
> {
> @@ -2494,7 +2496,9 @@ estimate_local_effects (struct cgraph_no
> &known_contexts, &known_aggs,
> &removable_params_cost);
> known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
> - if (always_const)
> + int devirt_bonus = devirtualization_time_bonus (node, known_csts,
> + known_contexts, known_aggs_ptrs);
> + if (always_const || devirt_bonus || removable_params_cost)
> {
> struct caller_statistics stats;
> inline_hints hints;
> @@ -2505,8 +2509,7 @@ estimate_local_effects (struct cgraph_no
> false);
> estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
> known_aggs_ptrs, &size, &time, &hints);
> - time -= devirtualization_time_bonus (node, known_csts, known_contexts,
> - known_aggs_ptrs);
> + time -= devirt_bonus;
> time -= hint_time_bonus (hints);
> time -= removable_params_cost;
> size -= stats.n_calls * removable_params_cost;
> @@ -2515,8 +2518,7 @@ estimate_local_effects (struct cgraph_no
> fprintf (dump_file, " - context independent values, size: %i, "
> "time_benefit: %i\n", size, base_time - time);
>
> - if (size <= 0
> - || node->will_be_removed_from_program_if_no_direct_calls_p ())
> + if (size <= 0 || node->local.local)
> {
> info->do_clone_for_all_contexts = true;
> base_time = time;
> @@ -2544,6 +2546,10 @@ estimate_local_effects (struct cgraph_no
> "max_new_size would be reached with %li.\n",
> size + overall_size);
> }
> + else if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, " Not cloning for all contexts because "
> + "!good_cloning_opportunity_p.\n");
> +
> }
>
> for (i = 0; i < count ; i++)
> @@ -4419,7 +4425,7 @@ identify_dead_nodes (struct cgraph_node
> {
> struct cgraph_node *v;
> for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
> - if (v->will_be_removed_from_program_if_no_direct_calls_p ()
> + if (v->local.local
> && !v->call_for_symbol_thunks_and_aliases
> (has_undead_caller_from_outside_scc_p, NULL, true))
> IPA_NODE_REF (v)->node_dead = 1;
More information about the Gcc-patches
mailing list