This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64


On Fri, Apr 30, 2010 at 1:37 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> In theory, LIPO should not generate better results than LTO+FDO. What
>> makes LIPO attractive is that it allows distributed builds from the
>> beginning. Its integration with large distributed build systems is also
>> easy. Another point is that LIPO can be decoupled from FDO as well.

>
> The integration should be pretty much the same as with current FDO, right?
> Just arrange to get everything built twice and trained in between.


Right. LIPO behaves similarly to plain FDO (no IPO).

>
>> The reason is that cross module call clusters do not change that much
>> and can be determined statically or determined once using sample
>> profiling information. The grouping info can then be used for regular
>> O2 builds. This will remove the need for people to move functions into
>
> This means, build once to gather callgraph and instead of deciding grouping
> at runtime with profile info in it, just do it via some tool statically?
>

One could run the LIPO-instrumented binary once, get the grouping, and
reuse it. It is also possible to determine this grouping using a
regular binary with hardware profiling (we have an internal tool
to do that). Or, if the user knows the module affinity (but does not
want to rearrange the source structure for SW engineering reasons), he
can choose to specify the grouping statically (not supported
yet). For instance, we can invent some directives for that:

#include_aux_module "../a/b/c.cpp"

The real scenario can be more complicated than this in order to
support different options, include search paths, etc.
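
As a rough illustration of how a reusable grouping could be derived from one profiling run, here is a minimal sketch in C. It is not LIPO's actual heuristic, which this message does not describe. The call-count matrix, the min_calls threshold, and the name pick_aux_modules are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_MODULES 3   /* hypothetical: number of modules in the program */

/*
 * calls[i][j] is the number of profiled calls from functions in module i
 * to functions in module j.  For a given primary module, collect every
 * other module that it calls at least min_calls times.  Those modules
 * would be the candidates to list as auxiliary modules.
 * Returns the number of entries written to aux_out.
 */
static size_t pick_aux_modules(const unsigned long calls[NUM_MODULES][NUM_MODULES],
                               size_t primary, unsigned long min_calls,
                               size_t aux_out[NUM_MODULES])
{
    size_t n = 0;

    assert(primary < NUM_MODULES);
    for (size_t m = 0; m < NUM_MODULES; m++) {
        /* A module is never its own auxiliary module. */
        if (m != primary && calls[primary][m] >= min_calls)
            aux_out[n++] = m;
    }
    return n;
}
```

The result could be computed once and stored next to the sources, so that later builds reuse it without another instrumented run.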


>> header files which tend to penalize compile time unnecessarily.
>>
>> If there is a performance difference, the following unique things in
>> LIPO may contribute to it (I have not validated them):
>>
>> 1) LIPO supports tracking indirect call targets across modules. This
>> is not feasible for plain FDO as there will be cgraph pid conflicts.
>> LIPO uses unique function id == (module_id << 32) + func_def_no, which
>> makes it possible.
>
> Interesting. My plan for profiling with LTO is to ultimately make it a link-time
> transform. This will be more difficult with WHOPR (i.e. instrumentation needs
> function bodies that are not available at WPA time), but I believe it is
> solvable: just assign uids to the edges and do instrumentation at ltrans. Then
> we will save the cgraph profile in some easier way so WHOPR can read it in and read
> the rest of the stuff in ltrans. This would involve shipping the correct profiles for
> a given function etc., so it will be a bit of an implementation challenge.

This can be tricky -- to maximize FDO benefit, the
profile-use/annotation needs to happen early which means
instrumentation also needs to happen early (to avoid cfg mismatches).
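
For reference, the cross-module function id from point 1 above, (module_id << 32) + func_def_no, can be sketched in C as follows. Only the formula comes from this thread. The helper names and the assumption that both fields fit in 32 bits are illustrative and not taken from the LIPO sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; only the packing formula is from the message. */

/* Pack a module id and a per-module function number into one 64-bit id. */
static uint64_t make_global_func_id(uint32_t module_id, uint32_t func_def_no)
{
    /* Widen before shifting so the module id lands in the upper 32 bits. */
    return ((uint64_t)module_id << 32) + func_def_no;
}

/* Recover the module id from the upper 32 bits. */
static uint32_t global_id_module(uint64_t id)
{
    return (uint32_t)(id >> 32);
}

/* Recover the per-module function number from the lower 32 bits. */
static uint32_t global_id_func_def_no(uint64_t id)
{
    return (uint32_t)(id & 0xffffffffu);
}
```

Two functions with the same func_def_no in different modules get different ids, which is what lets an indirect-call profile name a target in another module without a conflict.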


>
>> 2) comdat function resolution -- since LIPO uses aux module functions
>> for inlining purposes only, it has the freedom to choose which copy to
>> use. The current scheme gives priority to the copy in the current module
>> for better profile data context sensitivity (see below)
>
> This is interesting. How do you solve the problem when a given comdat function
> "loses"? I.e. it is replaced at link time by another function that may or may
> not be profiled from another unit?

Whichever function is selected will have profile data (assuming it
is called at runtime) -- but the profile data are merged from different
contexts, including calls from different modules. For instance, if
both a.C and b.C define foo, b.C:foo is selected at runtime, and
a.C:foo is not inlined (after instrumentation) anywhere in a.C, then
a.C:foo won't have any profile data, and b.C:foo has merged profile
data resulting from calls in both a.C and b.C.
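
The a.C/b.C case can be modelled with a toy sketch in C. This is not how the gcov runtime stores counters. The struct, the enum, and the function name are hypothetical and only mirror the behaviour described above: every out-of-line call reaches the copy that was kept, whichever module the caller lives in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module indices standing in for a.C and b.C. */
enum { MOD_A = 0, MOD_B = 1, NUM_MODULES = 2 };

/* One entry counter per module-local copy of the comdat function foo. */
struct comdat_profile {
    unsigned long entry_count[NUM_MODULES];
    int selected;   /* index of the copy that survived comdat resolution */
};

/* Record one out-of-line call to foo made from caller_module. */
static void record_out_of_line_call(struct comdat_profile *p, int caller_module)
{
    assert(caller_module >= 0 && caller_module < NUM_MODULES);
    /*
     * The caller's module does not pick the counter: the call always
     * lands in the surviving copy, so that copy's counter merges the
     * contexts of all modules.
     */
    p->entry_count[p->selected]++;
}
```

With selected set to MOD_B, three calls from a.C and two from b.C leave entry_count[MOD_B] at 5 and entry_count[MOD_A] at 0, matching the example: a.C:foo has no profile data and b.C:foo has the merged data.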


>
> I am aware that current FDO gets this wrong (it assumes that comdat functions
> are never replaced from another unit). I guess the situation can be improved a bit
> by doing some localization even without -fwhole-program, or by teaching the runtime
> to merge profiles into each individual copy of the comdat...

Yes, the current FDO assumption is wrong.

Thanks,

David

>
> Honza
>
>> 3) in profile-gen phase, allow more inlining for comdat functions (in
>> einline2 and ipa-inline) -- this will cause profile data to be tracked
>> with module sensitivity (note that counters are not in comdat group)
>>
>> Thanks,
>>
>> David
>>
>>
>>
>> > Honza
>> >>
>> >> Ciao!
>> >> Steven
>> >
>

