[PATCH 3/4] ipa-cp: Fix updating of profile counts and self-gen value evaluation
Martin Jambor
mjambor@suse.cz
Wed Oct 27 13:18:12 GMT 2021
On Mon, Oct 18 2021, Martin Jambor wrote:
>
[...]
>
> IPA-CP does not do a reasonable job when it is updating profile counts
> after it has created clones of recursive functions. This patch
> addresses that by:
>
> 1. Only updating counts for special-context clones. When a clone is
> created for all contexts, the original is going to be dead and the
> cgraph machinery has copied counts to the new node which is the right
> thing to do. Therefore updating counts has been moved from
> create_specialized_node to decide_about_value and
> decide_whether_version_node.
>
> 2. The current profile updating code artificially increased the assumed
> old count when the sum of counts of incoming edges to both the
> original and new node was bigger than the count of the original
> node. This always happened when a self-recursive edge from the clone
> was also redirected to the clone because both the original edge and
> its clone had the original high counts. This kludge was removed and
> replaced by the next point.
>
> 3. When cloning also redirects a self-recursive edge to the clone
> itself, new logic has been added to divide the counts brought by such
> recursive edges between the original node and the clone. This is
> impossible to do well without special knowledge about the function and
> which non-recursive entry calls are responsible for what portion of
> recursion depth, so the approach taken is rather crude.
>
> For local nodes, we detect the case when the original node is never
> called (in the training run at least) with another value and if so,
> steal all its counts as if it were dead. If that is not the case, we
> try to divide the count brought by recursive edges (or rather not
> brought by direct edges) proportionally to the counts brought by
> non-recursive edges - but with artificial limits in place so that we
> do not take too many or too few, because that was happening with
> detrimental effect in mcf_r.
>
> 4. When cloning creates extra clones for values brought by a formerly
> self-recursive edge with an arithmetic pass-through jump function on
> it, such as it does in exchange2_r, all such clones are processed at
> once rather than one after another. The counts of all such nodes are
> distributed evenly (modulo even-formerly-non-recursive-edges) and the
> whole situation is then fixed up so that the edge counts fit. This is
> what new function update_counts_for_self_gen_clones does.
>
> 5. Values brought by a formerly self-recursive edge with an
> arithmetic pass-through jump function on it are now evaluated by a
> heuristic which assumes that the vast majority of node counts are the
> result of recursive calls, and so we simply divide the node count by
> the number of clones there would be if we created another one.
>
> 6. The mechanisms in init_caller_stats, gather_caller_stats and
> get_info_about_necessary_edges were enhanced to gather data required
> for the above, and a missing check not to count dead incoming edges was
> also added.
>
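To illustrate points 3 and 5 above, here is a minimal standalone sketch.
It is not the code from the patch. The type count_t, both function
names and the percentage limits are hypothetical; the patch itself works
on GCC's profile_count type, and its actual limits are not reproduced
here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a profile count.  GCC's profile_count also
// tracks count quality, which this sketch ignores.
using count_t = std::int64_t;

// Point 3 (sketch): decide how much of the "unexplained" count, i.e. the
// part of the original node count not brought by direct non-recursive
// edges, is moved to the clone.  min_percent and max_percent are assumed
// limits, not the values used by the patch.  Overflow of the
// multiplications is ignored here.
count_t
clone_share_of_recursive_count (count_t unexplained, count_t clone_nonrec,
                                count_t orig_nonrec, bool orig_is_local,
                                count_t min_percent, count_t max_percent)
{
  assert (unexplained >= 0 && clone_nonrec >= 0 && orig_nonrec >= 0);
  assert (0 <= min_percent && min_percent <= max_percent
          && max_percent <= 100);

  // A local original that was never entered with another value is
  // treated as if it were dead: the clone takes everything.
  if (orig_is_local && orig_nonrec == 0)
    return unexplained;

  // Assumption of this sketch: some non-recursive count must exist to
  // guide the split.
  assert (clone_nonrec + orig_nonrec > 0);

  // Split proportionally to the counts brought by non-recursive edges,
  // then keep the result within the limits.
  count_t proportional
    = unexplained * clone_nonrec / (clone_nonrec + orig_nonrec);
  count_t lo = unexplained * min_percent / 100;
  count_t hi = unexplained * max_percent / 100;
  return std::max (lo, std::min (proportional, hi));
}

// Point 5 (sketch): estimate the count for a self-generated value by
// dividing the node count by the number of clones there would be if one
// more were created.
count_t
self_gen_value_count_estimate (count_t node_count, int existing_clones)
{
  assert (node_count >= 0 && existing_clones >= 0);
  return node_count / (existing_clones + 1);
}
```

For example, with 1000 unexplained counts, 300 non-recursive counts
entering the clone, 100 entering the original, and assumed limits of 10%
and 90%, the clone would receive 750 counts. With 990 and 10 instead,
the proportional share of 990 would be capped at 900.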
> gcc/ChangeLog:
>
> 2021-10-15 Martin Jambor <mjambor@suse.cz>
>
> * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
> n_nonrec_calls and itself, document all fields.
> (init_caller_stats): Initialize the above new fields.
> (gather_caller_stats): Gather self-recursive counts and calls number.
> (get_info_about_necessary_edges): Gather counts of self-recursive and
> other edges bringing in the requested value separately.
> (dump_profile_updates): Rework to dump info about a single node only.
> (lenient_count_portion_handling): New function.
> (struct gather_other_count_struct): New type.
> (gather_count_of_non_rec_edges): New function.
> (struct desc_incoming_count_struct): New type.
> (analyze_clone_icoming_counts): New function.
> (adjust_clone_incoming_counts): Likewise.
> (update_counts_for_self_gen_clones): Likewise.
> (update_profiling_info): Rewritten.
> (update_specialized_profile): Adjust call to dump_profile_updates.
> (create_specialized_node): Do not update profiling info.
> (decide_about_value): New parameter self_gen_clones, either push new
> clones into it or update their profile counts. For self-recursively
> generated values, use a portion of the node count instead of count
> from self-recursive edges to estimate goodness.
> (decide_whether_version_node): Gather clones for self-generated values
> in a new vector, update their profiles at once at the end.
Honza approved the patch in a private conversation and I have pushed it
to master as commit d1e2e4f9ce4df50564f1244dcea9befc3066faa8.
Thanks,
Martin