This is the mail archive of the
mailing list for the GCC project.
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
> > Interesting.  My plan for profiling with LTO is to ultimately make it a linktime
> > transform.  This will be more difficult with WHOPR (i.e. instrumenting needs
> > function bodies that are not available at WPA time), but I believe it is
> > solvable: just assign uids to the edges and do instrumentation at ltrans.  Then
> > we will save the cgraph profile in some easier way so WHOPR can read it in and read
> > the rest of the stuff in ltrans.  This would involve shipping the correct profiles for
> > a given function etc., so it will be a bit of an implementation challenge.
> This can be tricky -- to maximize FDO benefit, the
> profile-use/annotation needs to happen early which means
> instrumentation also needs to happen early (to avoid cfg mismatches).
I don't see much problem in this particular area.
The GCC optimization queue is organized so that we first do early
optimizations, which are all intended to be simple cleanups without size/speed
tradeoffs. Then we do IPA and late optimizations, which are both driven by the
profile (estimated or read).
Profile reading happens early because we use the same infrastructure for gcov and
profile feedback. This does not make profile feedback more beneficial; quite the
converse, since early passes may not be able to update the profile precisely and we
also get higher profiling overhead.
So I think decoupling gcov and profile feedback and pushing profile feedback
later in the queue is going to be a win.
Yes, the optimizations must match, but with LTO this is not a problem, and in general
the early optimizations should be stable wrt memory layout (nothing else
changes). This used to be exercised before profiling was updated to the tree
level in 4.x.
I would be very interested in the low-overhead support - there is a lot to gain,
especially because the profiling results are less dependent on setup and can be
better reused. I know part of the code was contributed (the support for reading
profiles that are not 100% valid). Is there any extra info available on this?
The main problem IMO is how to get the profile into WHOPR without having function bodies.
I guess we will end up summarizing the info in a WHOPR-friendly way and
letting it stream the other counters to LTRANS, which will annotate the function
body once it is read in from the file.
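
To make the idea concrete, here is a rough sketch of that split. It is written in Python for brevity and does not reflect GCC's actual data structures or streaming format; every name in it (FunctionProfile, summarize_for_wpa, counters_for_ltrans) is hypothetical. The assumption is that each CFG edge gets a stable uid before instrumentation, so counters can be matched to the body later.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class FunctionProfile:
    """Hypothetical per-function profile as read from the profile data file."""
    name: str
    entry_count: int
    # One counter per CFG edge, keyed by an edge uid assumed to be stable.
    edge_counts: Dict[int, int] = field(default_factory=dict)


def summarize_for_wpa(profiles: Iterable[FunctionProfile]) -> Dict[str, int]:
    """What WPA could see: per-function summaries only, no edge counters."""
    return {p.name: p.entry_count for p in profiles}


def counters_for_ltrans(profiles: Iterable[FunctionProfile],
                        partition: Set[str]) -> Dict[str, Dict[int, int]]:
    """What one LTRANS unit could receive: edge counters for its own functions."""
    return {p.name: dict(p.edge_counts) for p in profiles if p.name in partition}


profiles = [
    FunctionProfile("foo", 1000, {0: 900, 1: 100}),
    FunctionProfile("bar", 10, {0: 10}),
]
wpa_summary = summarize_for_wpa(profiles)
ltrans_counters = counters_for_ltrans(profiles, partition={"foo"})
```

The point of the split is that WPA decisions only need the small summary, while the bulky per-edge counters travel with the function body to whichever LTRANS unit ends up compiling it.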
> >> 2) comdat function resolution -- since LIPO uses aux module functions
> >> for inlining purpose only, it has the freedom to choose which copy to
> >> use. The current scheme chooses copy in current module with priority
> >> for better profile data context sensitivity (see below)
> > This is interesting.  How do you solve the problem when a given comdat function
> > "loses"?  I.e. it is replaced at linktime by another function that may or may
> > not be profiled from another unit?
> Whatever function is selected will have profile data (assuming it is
> called at runtime) -- but the profile data are merged from different
> contexts, including from calls in different modules. For instance, if
> both a.C and b.C define foo, b.C:foo is selected at runtime, and
> a.C:foo is not inlined (after instrumentation) anywhere in a.C, then
> a.C:foo won't have any profile data, and b.C:foo has merged profile
> data resulting from calls in both a.C and b.C.
Yes, but this is what I am concerned about. Without LTO, at least when
compiling a.C with profile feedback, we will have foo with 0 counts.
We might, however, work out that calls of foo are frequent and decide to
inline foo. We will take the counts and rescale them, resulting in the inlined
foo being optimized for size.
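
A small illustration of the arithmetic, as a sketch only: it assumes the inliner scales each callee block count by call-site count over callee entry count, which is a simplification and not GCC's actual code. The function name and the numbers are made up.

```python
def scale_inlined_counts(block_counts, entry_count, call_site_count):
    """Scale callee block counts to one call site (simplified, hypothetical)."""
    if entry_count == 0:
        # No ratio can be formed and every block count is 0 anyway,
        # so the inlined body ends up with all-zero counts.
        return [0] * len(block_counts)
    return [c * call_site_count // entry_count for c in block_counts]


# b.C:foo was the copy selected at runtime, so it carries the merged profile.
hot_copy = scale_inlined_counts([8000, 6000, 2000], 8000, 4000)

# a.C:foo lost the comdat resolution, so all of its counters are 0,
# even though the call site in a.C executed 4000 times.
unprofiled_copy = scale_inlined_counts([0, 0, 0], 0, 4000)
```

With these numbers the hot copy scales to [4000, 3000, 1000], while the unprofiled copy stays at [0, 0, 0] and so looks completely cold despite sitting at a frequently executed call site.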
When comdats are resolved within LTO, this will not be an issue, but LTO
still produces comdats that are later resolved against libraries etc., so we don't
solve the problem this way.
At the very least we should be able to figure out that we have a function
that has no profile and do something more sane.
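
One possible reading of "more sane", sketched under assumptions: treat an all-zero body reached from an executed call site as "profile missing" rather than "never executed", and fall back to statically estimated block frequencies. This is an illustration only, not what GCC does; the function names and the estimated frequencies are hypothetical.

```python
def profile_looks_missing(entry_count, block_counts, call_site_count):
    """Heuristic: all-zero body, yet the call site itself was executed."""
    return entry_count == 0 and not any(block_counts) and call_site_count > 0


def counts_for_inlined_copy(block_counts, entry_count,
                            estimated_freqs, call_site_count):
    """Pick counts for an inlined body, falling back to static estimates."""
    if profile_looks_missing(entry_count, block_counts, call_site_count):
        # estimated_freqs are per-block frequencies relative to function
        # entry (1.0 = executed once per call), as a static guess.
        return [round(f * call_site_count) for f in estimated_freqs]
    if entry_count == 0:
        return [0] * len(block_counts)
    return [c * call_site_count // entry_count for c in block_counts]


fallback = counts_for_inlined_copy([0, 0, 0], 0, [1.0, 0.75, 0.25], 4000)
really_cold = counts_for_inlined_copy([0, 0, 0], 0, [1.0, 0.75, 0.25], 0)
```

Here the copy with a missing profile gets estimated counts of [4000, 3000, 1000] instead of zeros, while a body whose call site never executed is still treated as cold.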
Do you have any idea how common these scenarios are?