ipa-cp heuristics fixes

Martin Jambor mjambor@suse.cz
Thu Dec 10 10:47:00 GMT 2015


Hi,

thanks for looking into this, I only have one question:

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
> First of all, ipcp_cloning_candidate_p calls
>  optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
> around.  I replaced it by node->optimize_for_size_p ().
> 
> Second we perform incredible number of clones because we do obtain some sort of
> polymorphic call context for them.  In wast majority of cases this is useless
> effort, because the functions in question do not contain virtual calls and do
> not pass the parameter further.  For firefox about 40k out of 50k clones
> created are created just because we found some context.
> 
> I changed the code to only clone if this immediately leads to devirtualization.
> This do not cause any noticeable drop in number of devirtualized calls on
> Firefox. I suppose we will miss the case where cloning a caller may allow
> devirtualization in a clone of callee, but I do not think the heuristics for
> context independent values can handle this as implemented right now and it
> simply have way to many false positives.
> 
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
> 
> Third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function is doing some analysis for inliner where it, for example, analyses
> if a comdat which is going to be inlined consistently in the whole program
> will be removed.
> 
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function.  I think you
> basically want to assume that the  function will be removed if it has no
> address taken and it is not externally visibible. This is what local flag
> is for.
> 
> I gathered some stats:
> 
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
> 
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high.  For example, inliner does about 300k
> inlines at Firefox.
> 
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
> 
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
> to: Unit growth for small function inlining: 7965010->9138577
> 
> So it seems that except for large drop in number of clones there is no significant difference.
> 
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> Honza
> 
> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 231477)
> +++ ipa-cp.c	(working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>        return false;
>      }
>  
> -  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  if (node->optimize_for_size_p ())
>      {
>        if (dump_file)
>          fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
>  {
>    if (time_benefit == 0
>        || !opt_for_fn (node->decl, flag_ipa_cp_clone)
> -      || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +      || node->optimize_for_size_p ())
>      return false;
>  
>    gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
>  	*removable_params_cost
>  	  += ipa_get_param_move_cost (info, i);
>  
> +      if (!ipa_is_param_used (info, i))
> +	continue;
> +

Is this really necessary, is it not enough to remove the assignment to
ret below?  If the parameter is not used, devirtualization time bonus,
which you then rely on estimate_local_effects, should be zero for it.

It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-independent information for it as possible too, because that
can then be used in a callee (see the second call of
gather_context_independent_values).

Other than that, all the changes seem like a clear improvement.

Thanks,

Martin

>        ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
> +      /* Do not account known context as reason for cloning.  We can see
> +	 if it permits devirtualization.  */
>        if (ctxlat->is_single_const ())
> -	{
> -	  (*known_contexts)[i] = ctxlat->values->value;
> -	  ret = true;
> -	}
> +	(*known_contexts)[i] = ctxlat->values->value;
>  
>        if (known_aggs)
>  	{
> @@ -2494,7 +2496,9 @@ estimate_local_effects (struct cgraph_no
>  						    &known_contexts, &known_aggs,
>  						    &removable_params_cost);
>    known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
> -  if (always_const)
> +  int devirt_bonus = devirtualization_time_bonus (node, known_csts,
> +					   known_contexts, known_aggs_ptrs);
> +  if (always_const || devirt_bonus || removable_params_cost)
>      {
>        struct caller_statistics stats;
>        inline_hints hints;
> @@ -2505,8 +2509,7 @@ estimate_local_effects (struct cgraph_no
>  					      false);
>        estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>  					 known_aggs_ptrs, &size, &time, &hints);
> -      time -= devirtualization_time_bonus (node, known_csts, known_contexts,
> -					   known_aggs_ptrs);
> +      time -= devirt_bonus;
>        time -= hint_time_bonus (hints);
>        time -= removable_params_cost;
>        size -= stats.n_calls * removable_params_cost;
> @@ -2515,8 +2518,7 @@ estimate_local_effects (struct cgraph_no
>  	fprintf (dump_file, " - context independent values, size: %i, "
>  		 "time_benefit: %i\n", size, base_time - time);
>  
> -      if (size <= 0
> -	  || node->will_be_removed_from_program_if_no_direct_calls_p ())
> +      if (size <= 0 || node->local.local)
>  	{
>  	  info->do_clone_for_all_contexts = true;
>  	  base_time = time;
> @@ -2544,6 +2546,10 @@ estimate_local_effects (struct cgraph_no
>  		     "max_new_size would be reached with %li.\n",
>  		     size + overall_size);
>  	}
> +      else if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "   Not cloning for all contexts because "
> +		 "!good_cloning_opportunity_p.\n");
> +	
>      }
>  
>    for (i = 0; i < count ; i++)
> @@ -4419,7 +4425,7 @@ identify_dead_nodes (struct cgraph_node
>  {
>    struct cgraph_node *v;
>    for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
> -    if (v->will_be_removed_from_program_if_no_direct_calls_p ()
> +    if (v->local.local
>  	&& !v->call_for_symbol_thunks_and_aliases
>  	     (has_undead_caller_from_outside_scc_p, NULL, true))
>        IPA_NODE_REF (v)->node_dead = 1;



More information about the Gcc-patches mailing list