This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
- From: "marxin at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 12 Apr 2017 13:25:49 +0000
- Subject: [Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
- Auto-submitted: auto-generated
- References: <bug-77698-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698
Martin Liška <marxin at gcc dot gnu.org> changed:
What            |Removed                       |Added
----------------------------------------------------------------------------
Status          |UNCONFIRMED                   |NEW
Last reconfirmed|                              |2017-04-12
CC              |                              |marxin at gcc dot gnu.org
Assignee        |unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
Ever confirmed  |0                             |1
--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed. Actually, I talked with Honza last week about the use of working sets
and the problems it has. Your sample nicely illustrates one of them:
when there is one really dominant edge and the rest of the program contributes
only a very small sum, the hotness threshold ends up equal to the execution
count of that maximal edge. That is obviously very wrong, even in the case where
loop unrolling asks for a split edge with a fraction of the frequency.
Just for your information, before the current state (r193747) we used to
compute the hotness threshold as follows:
profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)
where the param used to have the value 10000, which results in a value of 100.
By the way, I believe the value should also be divided by profile_info->runs
(the number of runs), as sum_max grows with the number of times the binary is
executed.
The second issue I see is the quite large performance overhead of the
instrumented run. For programs that are executed repeatedly, one can see in
perf top:
 11.23%  git           [.] gcov_do_dump
  9.14%  git           [.] __gcov_write_summary
  5.60%  libc-2.25.so  [.] __memset_sse2_unaligned_erms
  3.82%  git           [.] __gcov_read_summary
and of course the summary data occupies space both in the profile and in the
instrumented binary.
That said, I'm planning to test the original mechanism and compare it to the
current one. We can then add an option to switch between the two methods.