This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] minor FDO profile related fixes
- From: Martin Liška <mliska at suse dot cz>
- To: Indu Bhagat <indu dot bhagat at oracle dot com>, gcc-patches at gcc dot gnu dot org
- Cc: Jan Hubicka <hubicka at ucw dot cz>
- Date: Mon, 12 Nov 2018 10:48:52 +0100
- Subject: Re: [PATCH] minor FDO profile related fixes
- References: <3391e0e6-4acb-cd88-985d-4788d650c878@oracle.com>
On 11/8/18 1:49 AM, Indu Bhagat wrote:
> I have been looking at -fdump-ipa-profile dump with an intention to sanitize
> bits of information so that one may use it to judge the "quality of a profile"
> in FDO.
>
> The overall question I want to address is - are there ways to know which
> functions were not run in the training run, i.e. have ZERO profile ?
> (This patch corrects some dumped info; in a subsequent patch I would like to add
> some more explicit information/diagnostics.)
>
> Towards that end, I noticed that there are a couple of misleading bits of
> information (so I think) in the symbol table dump listing all functions in the
> compilation unit :
> --- "globally 0" appears even when profile data has not been fed by feedback
> profile (not the intent as the documentation of profile_guessed_global0
> in profile-count.h suggests).
> --- "unlikely_executed" should appear only when there is profile feedback or
> a function attribute is specified (as per documentation of
> node_frequency in coretypes.h). "unlikely_executed" in case of STALE or
> NO profile is misleading in my opinion.
>
> Summary of changes :
>
> 1. This patch makes some adjustments around how x_profile_status of a function
> is set - x_profile_status should be set to PROFILE_READ only when there is a
> profile for a function read from the .gcda file. So, instead of relying on
> profile_info (set whenever the gcda feedback file is present, even if the
> function does not have a profile available in the file), use exec_counts
> (non null when function has a profile (the latter may or may not be zero)). In
> essence, x_profile_status and profile_count::m_quality
> are set consistent to the stated intent (in code comments.)
>
> 2. A minor change in coverage.c is for more precise location of the message
>
> Following -fdump-ipa-profile dump excerpts show the effect :
>
> ------------------------------------------------
> -O1, -O2, -O3
> ------------------------------------------------
>
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
>
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
> between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
>
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
> available in the .gcda file)
> Trunk (missing .gcda file) : Function flags: count:1073741824 (estimated locally) body
> Trunk (missing function) : Function flags: count: 1073741824 (estimated locally, globally 0) body unlikely_executed
> After Patch (missing .gcda file) : Function flags: count:1073741824 (estimated locally) body
> After Patch (missing function) : Function flags: count:1073741824 (estimated locally) body
>
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count: 1073741824 (estimated locally, globally 0) body unlikely_executed
> After Patch (remains the same) : count: 1073741824 (estimated locally, globally 0) body unlikely_executed
>
> --------------------------------------------------
> O0
> --------------------------------------------------
> In O0, flag_guess_branch_prob is not set. This makes the profile_quality set to
> (precise) for most of the above cases.
>
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
>
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
> between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
>
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
> available in the .gcda file)
> Trunk (missing file) : Function flags: body
> Trunk (missing function) : Function flags: count:0 body unlikely_executed
> After Patch (missing file) : Function flags: body
> *** After Patch (missing function) : Function flags: count:0 (precise) body
> (*** This remains misleading, and I do not have a solution for this; as use of heuristics
> to guess branch probability is not allowed in O0)
>
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count:0 body unlikely_executed
> After Patch : Function flags: count:0 (precise) body
>
> --------------------------------------------------
>
> make check-gcc on x86_64 shows no new failures.
>
> (A related PR was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86957 where we added diagnostics for the NO PROFILE case.)
Hi.
Thanks for the patch. I'm not a maintainer, but the idea of the patch looks correct to me.
One question about adding "(precise)", have you verified that the patch can survive regression
tests?
Thanks,
Martin
>
>
> precise-ipa-dump-optinfo.patch.ver1
>
> diff --git a/gcc/coverage.c b/gcc/coverage.c
> index 599a3bb..7595e6c 100644
> --- a/gcc/coverage.c
> +++ b/gcc/coverage.c
> @@ -358,7 +358,7 @@ get_coverage_counts (unsigned counter, unsigned cfg_checksum,
> if (warning_printed && dump_enabled_p ())
> {
> dump_user_location_t loc
> - = dump_user_location_t::from_location_t (input_location);
> + = dump_user_location_t::from_function_decl (current_function_decl);
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
> "use -Wno-error=coverage-mismatch to tolerate "
> "the mismatch but performance may drop if the "
> diff --git a/gcc/profile-count.c b/gcc/profile-count.c
> index f4ab244..90f4feb 100644
> --- a/gcc/profile-count.c
> +++ b/gcc/profile-count.c
> @@ -83,6 +83,8 @@ profile_count::dump (FILE *f) const
> fprintf (f, " (auto FDO)");
> else if (m_quality == profile_guessed)
> fprintf (f, " (guessed)");
> + else if (m_quality == profile_precise)
> + fprintf (f, " (precise)");
> }
> }
>
> diff --git a/gcc/profile.c b/gcc/profile.c
> index 2130319..57e3f3c 100644
> --- a/gcc/profile.c
> +++ b/gcc/profile.c
> @@ -698,6 +698,11 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
> }
> }
>
> + if (exec_counts)
> + {
> + profile_status_for_fn (cfun) = PROFILE_READ;
> + }
> +
> /* If we have real data, use them! */
> if (bb_gcov_count (ENTRY_BLOCK_PTR_FOR_FN (cfun))
> || !flag_guess_branch_prob)
> @@ -705,7 +710,7 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
> bb->count = profile_count::from_gcov_type (bb_gcov_count (bb));
> /* If function was not trained, preserve local estimates including statically
> determined zero counts. */
> - else
> + else if (profile_status_for_fn (cfun) == PROFILE_READ)
> FOR_ALL_BB_FN (bb, cfun)
> if (!(bb->count == profile_count::zero ()))
> bb->count = bb->count.global0 ();
> @@ -718,6 +723,11 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
>
> if (dump_file)
> {
> + fprintf (dump_file, " Profile feedback for function");
> + fprintf (dump_file, ((profile_status_for_fn (cfun) == PROFILE_READ)
> + ? " is available \n"
> + : " is not available \n"));
> +
> fprintf (dump_file, "%d branches\n", num_branches);
> if (num_branches)
> for (i = 0; i < 10; i++)
> @@ -1317,12 +1327,12 @@ branch_prob (void)
> values.release ();
> free_edge_list (el);
> coverage_end_function (lineno_checksum, cfg_checksum);
> - if (flag_branch_probabilities && profile_info)
> + if (flag_branch_probabilities
> + && (profile_status_for_fn (cfun) == PROFILE_READ))
> {
> struct loop *loop;
> if (dump_file && (dump_flags & TDF_DETAILS))
> report_predictor_hitrates ();
> - profile_status_for_fn (cfun) = PROFILE_READ;
>
> /* At this moment we have precise loop iteration count estimates.
> Record them to loop structure before the profile gets out of date. */
>