This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
- From: "marxin at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 12 Apr 2017 13:25:49 +0000
- Subject: [Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
- Auto-submitted: auto-generated
- References: <bug-77698-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698
Martin Liška <marxin at gcc dot gnu.org> changed:
What            |Removed                       |Added
----------------------------------------------------------------------------
Status          |UNCONFIRMED                   |NEW
Last reconfirmed|                              |2017-04-12
CC              |                              |marxin at gcc dot gnu.org
Assignee        |unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
Ever confirmed  |0                             |1
--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed. Actually, I talked with Honza last week about the use of working sets
and the problems it has. Your sample nicely illustrates one of them:
when there is one really dominant edge and the rest of the program contributes
only a very small sum, the hotness threshold ends up equal to the execution
count of that maximal edge. That is obviously very wrong, even in the case where
loop unrolling asks for a split edge with a fraction of the frequency.
Just for your information, before the current state (r193747) we used to
compute the hotness threshold as follows:
profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)
where the param used to have the value 10000, which results in a value of 100.
By the way, I believe the value should also be divided by profile_info->runs
(the number of runs), as sum_max grows with the number of times the binary is
executed.
The second issue I see is the quite large performance overhead of the
instrumented run. For programs that are executed repeatedly, one can see in
perf top:
 11.23%  git           [.] gcov_do_dump
  9.14%  git           [.] __gcov_write_summary
  5.60%  libc-2.25.so  [.] __memset_sse2_unaligned_erms
  3.82%  git           [.] __gcov_read_summary
and of course the summary data occupies space both in the profile and in the
instrumented binary.
That said, I'm planning to test the original mechanism and compare it to the
current one. We can then add an option to switch between the two methods.