This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition


On Fri, Aug 9, 2013 at 8:28 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > Do we sanity check that the cold partition does not contain any blocks of
>> > count 0?  It may be that the profile is broken enough to make partitioning
>> > not work.
>>
>> Do you mean sanity check that the cold partition does not contain any
>> blocks of count > 0? (they should all be zero) I don't think that
>> sanity check is there, but I can try adding that.
>
> Thanks, lets start with this - I suppose we need to figure out if
>  1) the reachable blocks goes to cold section because partitioning decides
>     so even if they have non-0 count.

Right, this should be easy enough to check and should hopefully never happen.

>  2) the reachable blocks goes to cold section because they have incorrectly
>     updated count to 0 by someone

A sanity check should find this too. But it can happen now for various
reasons like the comdat issue I described below. But it will be good
to flag these and fix them.

>  3) profiling gets some blocks wrong.

This is the one that will be tough to fix, if the training run isn't
representative.

>>
>> The issue with such a sanity check may be due to the later fixup I
>> have in this patch (fixup_partitions). It is invoked after certain
>> optimizations on the cfg that may make hot blocks previously reached
>> by both hot and cold edges only reachable by cold blocks. These blocks
>> are remarked cold. If the profile data hasn't been updated correctly
>> it is possible that they would still have a non-0 count, although they
>> are essentially cold after the cfg transformation.
>
> Well, or the other posibility is that the edges was updated wrong
> and the blocks are really cold.  We need to figure out if that happens
> commonly enough.
>
> I will try to think of some artificial testcases.
>
>>
>> But certainly such a sanity check should always succeed after the
>> original partitioning.
>>
>> > I can think of inlining where the count gets scaled all way down to 0.  Perhaps
>> > count scaling code can be modified to never round towards 0 for block executing
>> > non-0 times...
>>
>> This reminds me of why this situation could happen. When I have been
>> testing this on the google branch I found situations where COMDAT
>> routines have 0 profile counts (if I remember correctly, this happens
>> when profile-gen binary has call to out-of-line copy of COMDAT in
>> module A, linker chooses the out-of-line copy from module B, therefore
>> the profile data for COMDAT in module A is 0). When the COMDAT gets
>> inlined, the 0 counts on its bbs are scaled to 0, even though the
>> callsite is non-zero. I have a patch that I was planning to send as a
>> follow-up that handles this case by propagating the callsite bb's
>> count to the inlined code when it has 0 counts, scaling by the edge
>> frequencies. I can either include that patch in this one, or send it
>> for review separately right now. Do you want to give it a try with
>> this one to see if it addresses the issue?
>
> This scenario should not happen with LTO setup: the LTO symbol tables contains
> code before early optimization and should be identical with profiling or
> without (modulo the new references and call from profile code).
>
> But this patch seems useful as a backup solution for non-LTO, so yes, please
> send it separately and I can try to double check that it really do not happen
> with LTO.
> (acutally LTO symtab may just chose COMDAT from module that has counts with it.
> It has all the info for it.  I was thinkin about it few weeks back.  It is
> bit hard to do - you need to verify that all references from the function are
> the same or linking might fail if you overwrite linker's decisiosns).

I see, yes LTO can deal with this better since it has global
information. In non-LTO mode (including LIPO) we have the issue.

I take it gimp is built with LTO and therefore shouldn't be hitting
this comdat issue?

Let me do a couple things:
- port over my comdat inlining fix from the google branch to trunk and
send it for review. If you or Martin could try it to see if it helps
with function splitting to avoid the hits from the cold code that
would be great
- I'll add some new sanity checking to try to detect non-zero blocks
in the cold section, or 0 blocks reached by non-zero edges and see if
I can flush out any problems with my tests or a profiledbootstrap or
gimp.
- I'll try building and profiling gimp myself to see if I can
reproduce the issue with code executing out of the cold section.

Thanks,
Teresa

>>
>> Also, can you send me reproduction instructions for gimp? I don't
>> think I need Martin's patch, but which version of gimp and what is the
>> equivalent way for me to train it? I have some scripts to generate a
>> similar type of instruction heat map graph that I have been using to
>> tune partitioning and function reordering. Essentially it uses linux
>> perf to sample on instructions_retired and then munge the data in
>> several ways to produce various stats and graphs. One thing that has
>> been useful has been to combine the perf data with nm output to
>> determine which cold functions are being executed at runtime.
>
> Martin?
>
>>
>> However, for this to tell me which split cold bbs are being executed I
>> need to use a patch that Sri sent for review several months back that
>> gives the split cold section its own name:
>>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
>> Steven had some follow up comments that Sri hasn't had a chance to address yet:
>>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
>> (cc'ing Sri as we should probably revive this patch soon to address
>> gdb and other issues with detecting split functions properly)
>
> Intresting, I used linker script for this purposes, but that his GNU ld only...
>
> Honza
>>
>> Thanks!
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Teresa
>> >>
>> >> > I think we are really looking primarily for dead parts of the functions (sanity checks/error handling)
>> >> > that should not be visited by train run.  We can then see how to make the heuristic more aggressive?
>> >> >
>> >> > Honza
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]