This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
- From: Martin LiÅka <marxin dot liska at gmail dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Teresa Johnson <tejohnson at google dot com>, Bernhard Reutner-Fischer <rep dot dot dot nop at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Steven Bosscher <stevenb dot gcc at gmail dot com>, Jeff Law <law at redhat dot com>, Sriraman Tallam <tmsriram at google dot com>
- Date: Fri, 9 Aug 2013 17:54:36 +0200
- Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
- References: <CAAe5K+U6+xyy95KSeA7+SZ0tUdFt2dmF-vSxNsBsqg53NSyU3Q at mail dot gmail dot com> <CAC1BbcSnwiAfzKqngLvLBGnmPDLNhYdwbyWDMJ++PzsMd=M-3Q at mail dot gmail dot com> <CAAe5K+XM_RCJBmtKSyQCfD86hi2Ls_BCyuZkd2WKSoGUFm84DA at mail dot gmail dot com> <20130802150529 dot GC15776 at kam dot mff dot cuni dot cz> <CAAe5K+WiR02Rs1jYMFRF2F8ey60UO7LwRa8WWq7coQ5Pq8HhiQ at mail dot gmail dot com> <CAAe5K+WToUznwFFfm5beapXAOOrOgxHR8LXmYBTL70C4VVsT+w at mail dot gmail dot com> <20130808222332 dot GA31755 at kam dot mff dot cuni dot cz> <CAAe5K+W+8borbPkt4BB1ayRgDbFBtd6oyZsGuUiC854o9t0Rjg at mail dot gmail dot com> <20130809095843 dot GC31755 at kam dot mff dot cuni dot cz> <CAAe5K+XXT6t5CXBDXPWMNSrLWwqfw8F_J2fNUAN2afqb5qPhzQ at mail dot gmail dot com> <20130809152804 dot GA6579 at kam dot mff dot cuni dot cz>
On 9 August 2013 17:28, Jan Hubicka <email@example.com> wrote:
>> > Do we sanity check that the cold partition does not contain any blocks of
>> > count 0? It may be that the profile is broken enough to make partitioning
>> > not work.
>> Do you mean sanity check that the cold partition does not contain any
>> blocks of count > 0? (they should all be zero) I don't think that
>> sanity check is there, but I can try adding that.
> Thanks, lets start with this - I suppose we need to figure out if
> 1) the reachable blocks goes to cold section because partitioning decides
> so even if they have non-0 count.
> 2) the reachable blocks goes to cold section because they have incorrectly
> updated count to 0 by someone
> 3) profiling gets some blocks wrong.
>> The issue with such a sanity check may be due to the later fixup I
>> have in this patch (fixup_partitions). It is invoked after certain
>> optimizations on the cfg that may make hot blocks previously reached
>> by both hot and cold edges only reachable by cold blocks. These blocks
>> are remarked cold. If the profile data hasn't been updated correctly
>> it is possible that they would still have a non-0 count, although they
>> are essentially cold after the cfg transformation.
> Well, or the other posibility is that the edges was updated wrong
> and the blocks are really cold. We need to figure out if that happens
> commonly enough.
> I will try to think of some artificial testcases.
>> But certainly such a sanity check should always succeed after the
>> original partitioning.
>> > I can think of inlining where the count gets scaled all way down to 0. Perhaps
>> > count scaling code can be modified to never round towards 0 for block executing
>> > non-0 times...
>> This reminds me of why this situation could happen. When I have been
>> testing this on the google branch I found situations where COMDAT
>> routines have 0 profile counts (if I remember correctly, this happens
>> when profile-gen binary has call to out-of-line copy of COMDAT in
>> module A, linker chooses the out-of-line copy from module B, therefore
>> the profile data for COMDAT in module A is 0). When the COMDAT gets
>> inlined, the 0 counts on its bbs are scaled to 0, even though the
>> callsite is non-zero. I have a patch that I was planning to send as a
>> follow-up that handles this case by propagating the callsite bb's
>> count to the inlined code when it has 0 counts, scaling by the edge
>> frequencies. I can either include that patch in this one, or send it
>> for review separately right now. Do you want to give it a try with
>> this one to see if it addresses the issue?
> This scenario should not happen with LTO setup: the LTO symbol tables contains
> code before early optimization and should be identical with profiling or
> without (modulo the new references and call from profile code).
> But this patch seems useful as a backup solution for non-LTO, so yes, please
> send it separately and I can try to double check that it really do not happen
> with LTO.
> (acutally LTO symtab may just chose COMDAT from module that has counts with it.
> It has all the info for it. I was thinkin about it few weeks back. It is
> bit hard to do - you need to verify that all references from the function are
> the same or linking might fail if you overwrite linker's decisiosns).
>> Also, can you send me reproduction instructions for gimp? I don't
>> think I need Martin's patch, but which version of gimp and what is the
>> equivalent way for me to train it? I have some scripts to generate a
>> similar type of instruction heat map graph that I have been using to
>> tune partitioning and function reordering. Essentially it uses linux
>> perf to sample on instructions_retired and then munge the data in
>> several ways to produce various stats and graphs. One thing that has
>> been useful has been to combine the perf data with nm output to
>> determine which cold functions are being executed at runtime.
I use gimp from git repository, commit:
88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Fet 5 2013)
>> However, for this to tell me which split cold bbs are being executed I
>> need to use a patch that Sri sent for review several months back that
>> gives the split cold section its own name:
>> Steven had some follow up comments that Sri hasn't had a chance to address yet:
>> (cc'ing Sri as we should probably revive this patch soon to address
>> gdb and other issues with detecting split functions properly)
> Intresting, I used linker script for this purposes, but that his GNU ld only...
>> > Honza
>> >> Thanks,
>> >> Teresa
>> >> > I think we are really looking primarily for dead parts of the functions (sanity checks/error handling)
>> >> > that should not be visited by train run. We can then see how to make the heuristic more aggressive?
>> >> >
>> >> > Honza
>> >> --
>> >> Teresa Johnson | Software Engineer | firstname.lastname@example.org | 408-460-2413
>> Teresa Johnson | Software Engineer | email@example.com | 408-460-2413