This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
- From: Teresa Johnson <tejohnson at google dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Bernhard Reutner-Fischer <rep dot dot dot nop at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Steven Bosscher <stevenb dot gcc at gmail dot com>, Jeff Law <law at redhat dot com>, "marxin.liska" <marxin dot liska at gmail dot com>, Sriraman Tallam <tmsriram at google dot com>, Rong Xu <xur at google dot com>
- Date: Wed, 28 Aug 2013 11:20:08 -0700
- Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
- Authentication-results: sourceware.org; auth=none
- References: <20130809095843 dot GC31755 at kam dot mff dot cuni dot cz> <CAAe5K+XXT6t5CXBDXPWMNSrLWwqfw8F_J2fNUAN2afqb5qPhzQ at mail dot gmail dot com> <20130809152804 dot GA6579 at kam dot mff dot cuni dot cz> <CAAe5K+UcMevsDcXOeq5tu0+u-FrLVVWR6Wd20ZhBHNWdNsQ4Zw at mail dot gmail dot com> <CAAe5K+W+VEERrsaCCjCD=n40_MO2EPashDp5qnKoS8SLaSjBjQ at mail dot gmail dot com> <20130817204408 dot GA16557 at kam dot mff dot cuni dot cz> <CAAe5K+XGKvd+_8Ukp0kpOWvc2k165F=4fdemf-iDz+QkirLPmg at mail dot gmail dot com> <20130819150942 dot GA28264 at kam dot mff dot cuni dot cz> <CAAe5K+UnqBfxYrZxSkjRudq8NYni_9ih+t=us+2hH+UPsrwDLA at mail dot gmail dot com> <CAAe5K+UhCzfwm_WomE1yk+ET1tBiNT5svfn_LAc57MUSqJbnaQ at mail dot gmail dot com> <20130828165830 dot GC12167 at kam dot mff dot cuni dot cz>
On Wed, Aug 28, 2013 at 9:58 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> with Martin we did bit of progress on analyzing the problems. We now have
> COMDAT profile merging for FDO
Great! Is this the LTO merging you were talking about in an earlier
message, or the gcov runtime fix (that would presumably not be
lto-specific)?
> and we also noticed that forks can make your
> basic blocks appear never executed even though they are executed every run:
> the fork is accounted as 3 independnet runs of the program. First run
> is until fork, the second 2 runs are both variant.
>
> I have patch to track this. Moreover vforks seems to produce repeated
> merging of results.
Aha, does this explain the gimp issue as well?
>
> These two factors seems to help to riddle enought the firefox profiles
> so it took us weeks to understand what happens.
>> + if (changed)
>> + {
>> + /* Edge forwarding in particular can cause hot blocks previously
>> + reached by both hot and cold blocks to become dominated only
>> + by cold blocks. This will cause the verification
>> below to fail,
>> + and lead to now cold code in the hot section. This is not easy
>> + to detect and fix during edge forwarding, and in some cases
>> + is only visible after newly unreachable blocks are deleted,
>> + which will be done in fixup_partitions. */
>> + fixup_partitions ();
>
> Is it really necessary to run this from internal loop of the cfgcleanup?
The reason I added it here is that just below there is a call to
verify_flow_info, and that will fail with the new verification.
> It seems
> you will play back and forth game where edge forwarding will remove your fallthru
> and you will be re-adding it?
fixup_partitions will not add new fall through edges. (It may invoke
force_nonfallthru to do the opposite.) So there shouldn't be any
ping-ponging effect.
>
> I would wait for cfgcleanup to finish its job (I don't really think it needs to be
> iterative) and then do the fixup possibly cleaning up when the blocks was repoisitoined
> (I suppose it is only case where the code above introduces new cfgcleanup oppurtunities).
As noted above, I can't do this due to the call to verify_flow_info
for each iteration. One option is to move both down outside the loop.
>
>> + /* Walk the preds/succs and check if there is at least one already
>> + marked hot. Keep track of the most frequent pred/succ so that we
>> + can mark it hot if we don't find one. */
>> + FOR_EACH_EDGE (e, ei, edges)
>> + {
>> + basic_block reach_bb = walk_up ? e->src : e->dest;
>> +
>> + if (e->flags & EDGE_DFS_BACK)
>> + continue;
>> +
>> + if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>> + {
>> + found = true;
>> + break;
>> + }
>> + if (e->probability > highest_probability)
>> + highest_probability = e->probability;
>
> When doing predecestor walk if you have two predecestors, one executing 100000
> times and getting to the block with probability 1%, you want to chose it over
> block executing once and getting to you with probability 100%.
>
> You probably want to look for most likely predecestor here. You need to look
> for highest e->count and if they are all 0 for highest EDGE_FREQUENCY and for
> maximal probability only if all fails?
Yes, thanks, let me do that.
>
>> + }
>> +
>> + /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
>> + block (or unpartitioned, e.g. the entry block) then it is ok. If not,
>> + then the most frequent pred (or succ) needs to be adjusted. In the
>> + case where multiple preds/succs have the same probability (e.g. a
>> + 50-50 branch), then both will be adjusted. */
>> + if (found)
>> + continue;
>> +
>> + FOR_EACH_EDGE (e, ei, edges)
>> + {
>> + if (e->flags & EDGE_DFS_BACK)
>> + continue;
>> + if (e->probability < highest_probability)
>> + continue;
>
> Again for predecestor walk you need to wind down the bit crazy logic described above.
>> Index: predict.c
>> ===================================================================
>> --- predict.c (revision 202021)
>> +++ predict.c (working copy)
>> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>> return false;
>> }
>>
>> +
>> +/* Return true in case edge E is probably never executed. */
>> +
>> +bool
>> +probably_never_executed_edge_p (struct function *fun, edge e)
>> +{
>> + gcc_checking_assert (fun);
>> + if (profile_info && flag_branch_probabilities)
>> + return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
>> + if ((!profile_info || !flag_branch_probabilities)
>> + && (cgraph_get_node (fun->decl)->frequency
>> + == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>> + return true;
>> + return false;
> Instead of duplicating the conditional, break out the tests into
> probably_never_executed_count_p, like we have for maybe_hot_count_p.
ok
>
> It would be nice to extend this to work w/o profile: probably_never_executed_edge_p
> can return true for EH edges, setjmp edges and we can then walk bodies ignoring
> EH/setjmp flagging blocks that are probably never executed enabling the partitioning
> to do a job w/o profile.
agreed. although I would prefer to leave that for a follow-on patch so
it can be tuned a bit.
>
> Otherwise the patch look OK to me. Thanks for working on this. Do we have agreement on C++ way of mixing
> declarations and code?
According to http://gcc.gnu.org/wiki/CppConventions:
"In new code variables which are used in a small scope should be
defined at the point of first use, rather than at the top of the
function. Variables which are used throughout the function may be
defined at the top of the function, as in C."
I think I am following that, but let me know if you see something that
needs to be fixed.
Thanks,
Teresa
>
> Honza
--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413