[PATCH 3/4] ipa-cp: Fix updating of profile counts and self-gen value evaluation

Martin Jambor mjambor@suse.cz
Wed Oct 27 13:18:12 GMT 2021


On Mon, Oct 18 2021, Martin Jambor wrote:
>
[...]
>
> IPA-CP does not do a reasonable job when it is updating profile counts
> after it has created clones of recursive functions.  This patch
> addresses that by:
>
> 1. Only updating counts for special-context clones.  When a clone is
> created for all contexts, the original is going to be dead and the
> cgraph machinery has copied counts to the new node which is the right
> thing to do.  Therefore updating counts has been moved from
> create_specialized_node to decide_about_value and
> decide_whether_version_node.
>
> 2. The current profile updating code artificially increased the assumed
> old count when the sum of counts of incoming edges to both the
> original and new node was bigger than the count of the original
> node.  This always happened when a self-recursive edge from the clone
> was also redirected to the clone because both the original edge and
> its clone had original high counts.  This crutch was removed and
> replaced by the next point.
>
> 3. When cloning also redirects a self-recursive edge to the clone
> itself, new logic has been added to divide the counts brought by such
> recursive edges between the original node and the clone.  This is
> impossible to do well without special knowledge about the function and
> which non-recursive entry calls are responsible for what portion of
> recursion depth, so the approach taken is rather crude.
>
> For local nodes, we detect the case when the original node is never
> called (in the training run at least) with another value and if so,
> steal all its counts as if it were dead.  If that is not the case, we
> try to divide the count brought by recursive edges (or rather not
> brought by direct edges) proportionally to the counts brought by
> non-recursive edges - but with artificial limits in place so that we
> do not take too many or too few, because that was happening with
> detrimental effect in mcf_r.
>
> 4. When cloning creates extra clones for values brought by a formerly
> self-recursive edge with an arithmetic pass-through jump function on
> it, such as it does in exchange2_r, all such clones are processed at
> once rather than one after another.  The counts of all such nodes are
> distributed evenly (modulo even-formerly-non-recursive-edges) and the
> whole situation is then fixed up so that the edge counts fit.  This is
> what the new function update_counts_for_self_gen_clones does.
>
> 5. Values brought by a formerly self-recursive edge with an
> arithmetic pass-through jump function on it are evaluated by a
> heuristic which assumes that the vast majority of node counts are the
> result of recursive calls, and so we simply divide those by the number
> of clones there would be if we created another one.
>
> 6. The mechanisms in init_caller_stats, gather_caller_stats and
> get_info_about_necessary_edges were enhanced to gather data required
> for the above and a missing check not to count dead incoming edges was
> also added.
>
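The proportional division with limits described in point 3 can be
sketched roughly as follows.  This is only an illustration under stated
assumptions, not the code in ipa-cp.c: the type count_t, the function
split_recursive_count and the limit parameters min_num, max_num and den
are hypothetical names, plain integers stand in for GCC's profile
counts, and the concrete limits are left to the caller because the
description above does not state them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a profile count (an assumption of this sketch).
using count_t = std::uint64_t;

struct split_result
{
  count_t to_clone;  // part of the unexplained count given to the clone
  count_t to_orig;   // part left with the original node
};

// Divide UNEXPLAINED, the count not brought by direct (non-recursive) edges,
// between the clone and the original node in proportion to the counts their
// direct edges bring, but keep the clone's share within
// [UNEXPLAINED * MIN_NUM / DEN, UNEXPLAINED * MAX_NUM / DEN].
// Note: products are computed in 64 bits and may overflow for huge counts.
inline split_result
split_recursive_count (count_t unexplained, count_t clone_direct,
                       count_t orig_direct, count_t min_num,
                       count_t max_num, count_t den)
{
  assert (den > 0 && min_num <= max_num && max_num <= den);

  count_t direct_sum = clone_direct + orig_direct;
  // With no direct counts at all there is nothing to be proportional to,
  // so this sketch falls back to an even split.
  count_t proportional = direct_sum == 0
                         ? unexplained / 2
                         : unexplained * clone_direct / direct_sum;

  count_t lower = unexplained * min_num / den;
  count_t upper = unexplained * max_num / den;
  count_t to_clone = std::max (lower, std::min (proportional, upper));

  return { to_clone, unexplained - to_clone };
}
```

With limits of 1/10 and 9/10, for instance, a clone whose direct edges
bring only 1% of the direct counts would still receive a tenth of the
unexplained count, and one whose direct edges bring 99.9% would receive
no more than nine tenths.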
> gcc/ChangeLog:
>
> 2021-10-15  Martin Jambor  <mjambor@suse.cz>
>
> 	* ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
> 	n_nonrec_calls and itself, document all fields.
> 	(init_caller_stats): Initialize the above new fields.
> 	(gather_caller_stats): Gather self-recursive counts and calls number.
> 	(get_info_about_necessary_edges): Gather counts of self-recursive and
> 	other edges bringing in the requested value separately.
> 	(dump_profile_updates): Rework to dump info about a single node only.
> 	(lenient_count_portion_handling): New function.
> 	(struct gather_other_count_struct): New type.
> 	(gather_count_of_non_rec_edges): New function.
> 	(struct desc_incoming_count_struct): New type.
> 	(analyze_clone_icoming_counts): New function.
> 	(adjust_clone_incoming_counts): Likewise.
> 	(update_counts_for_self_gen_clones): Likewise.
> 	(update_profiling_info): Rewritten.
> 	(update_specialized_profile): Adjust call to dump_profile_updates.
> 	(create_specialized_node): Do not update profiling info.
> 	(decide_about_value): New parameter self_gen_clones, either push new
> 	clones into it or update their profile counts.  For self-recursively
> 	generated values, use a portion of the node count instead of count
> 	from self-recursive edges to estimate goodness.
> 	(decide_whether_version_node): Gather clones for self-generated values
> 	in a new vector, update their profiles at once at the end.
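
Points 4 and 5 of the quoted description can be illustrated with a
small sketch.  It is a simplified model, not the actual implementation:
count_t, estimate_self_gen_count and distribute_evenly are hypothetical
names, plain integers replace GCC's profile counts, and the final
fix-up of edge counts is left out.  The divisor in the first function
assumes that "the number of clones there would be if we created another
one" means the clones created so far plus one, and the second function
assumes that counts from edges which were never recursive stay with
their clone while the remainder is shared equally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profile count (an assumption of this sketch).
using count_t = std::uint64_t;

// Point 5: estimate the count attributable to one more self-generated
// clone by dividing the node count by the number of clones there would
// be afterwards (existing clones plus the candidate).
inline count_t
estimate_self_gen_count (count_t node_count, unsigned existing_clones)
{
  return node_count / (static_cast<count_t> (existing_clones) + 1);
}

// Point 4: process all self-generated clones at once.  Each clone keeps
// the count brought by its never-recursive edges (DIRECT[i]) and gets an
// equal share of whatever remains of TOTAL.
inline std::vector<count_t>
distribute_evenly (count_t total, const std::vector<count_t> &direct)
{
  assert (!direct.empty ());

  count_t direct_sum = 0;
  for (count_t d : direct)
    direct_sum += d;

  // Guard against inconsistent profiles where direct counts exceed TOTAL.
  count_t shared = total > direct_sum ? total - direct_sum : 0;
  count_t share = shared / direct.size ();

  std::vector<count_t> result;
  for (count_t d : direct)
    result.push_back (d + share);
  return result;
}
```

For example, with a node count of 1200 and two clones already created,
the estimate used to evaluate a third value would be 400 in this model.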


Honza approved the patch in a private conversation and I have pushed it
to master as commit d1e2e4f9ce4df50564f1244dcea9befc3066faa8.

Thanks,

Martin

