This is the mail archive of the mailing list for the GCC project.


Re: LTO inliner -- sensitivity to increasing register pressure

On Fri, Apr 18, 2014 at 2:16 PM, Jan Hubicka <> wrote:
>> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <> wrote:
>> >> What I've observed on power is that LTO alone reduces performance and
>> >> LTO+FDO is not significantly different than FDO alone.
>> > On SPEC2k6?
>> >
>> > This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64) LTO seems
>> > to be an off-noise win on SPEC2k6
>> >
>> >
>> >
>> > I do not see why PPC should be significantly more constrained by register
>> > pressure.
>> >
>> > I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC
>> >
>> > shows noticeable drop in calculix and gamess.
>> > Martin profiled calculix and tracked it down to a loop that is not trained
>> > but hot in the reference run.  That makes it optimized for size.
>> >
>> >,219672,219965,219877
>> > compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
>> > Here the benefits of LTO and FDO seem to add up nicely.
>> >>
>> >> I agree that an exact estimate of the register pressure would be a
>> >> difficult problem. I'm hoping that something that approximates potential
>> >> register pressure downstream will be sufficient to help inlining
>> >> decisions.
>> >
>> > Yep, register pressure and I-cache overhead estimates are used for inline
>> > decisions by some compilers.
>> >
>> > I am mostly concerned about the metric suffering from the GIGO principle if we mix
>> > together too many estimates that are somewhat wrong by their nature. This is
>> > why I mostly tried to focus on size/time estimates and not add too many other
>> > metrics. But perhaps it is time to experiment with these, since obviously we
>> > have pushed the current infrastructure mostly to its limits.
>> >
>> I like the word GIGO here. Getting inline signals right requires deep
>> analysis (including interprocedural analysis). Different signals/hints
>> may also come with different quality and thus need different weights.
>> Another challenge is how to quantify cycle savings/overhead more
>> precisely. With that, we can abandon the threshold based scheme -- any
>> callsite with a net saving will be considered.
> Inline hints are intended to do this - at the moment we bump the limits up
> when we estimate big speedups for the inlining, and with today's patch and FDO
> we bypass the thresholds when we know from FDO that the call matters.
> Concerning your other email, indeed we should consider heavy callees (in Open64
> terminology) that consume a lot of time and do not skip the call sites.  An easy
> way would be to replace the maybe_hot_edge predicate with maybe_hot_call that simply
> multiplies the count and estimated time.  (We probably ought to get rid of the
> time capping and use wider arithmetic too).

That's what we did in Google branches. We had two heuristics -- hot
caller and hot callee heuristics.

1) For the hot caller heuristic, other simple analyses are checked: a)
global working set size; b) callsite argument check -- a very simple
check to guess if inlining this callsite would sharpen analysis.

2) We had not tuned the hot callee heuristic by doing more analysis --
simply turning it on using hotness does not make a noticeable difference.
Other hints are needed.
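
As a rough illustration of the count-times-time weighting quoted above, here
is a minimal C sketch. It is not the code from GCC trunk or from the Google
branches. The struct, the names hot_edge_p and hot_call_p, the thresholds, and
the saturating multiply (standing in for "wider arithmetic") are all
hypothetical and chosen only for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a call-graph edge; not GCC's cgraph_edge. */
struct call_site {
    uint64_t count;        /* profile count: how often the call executed */
    uint32_t callee_time;  /* estimated callee time per invocation */
};

/* Saturating 64-bit multiply, so that large counts cannot wrap around. */
static uint64_t mul_sat_u64(uint64_t a, uint64_t b)
{
    if (a != 0 && b > UINT64_MAX / a)
        return UINT64_MAX;
    return a * b;
}

/* Edge-style hotness: looks only at how often the call is executed. */
static bool hot_edge_p(const struct call_site *cs, uint64_t min_count)
{
    return cs->count >= min_count;
}

/* Call-style hotness: weights the count by the estimated callee time,
   so a rarely executed call to a heavy callee can still qualify. */
static bool hot_call_p(const struct call_site *cs, uint64_t min_weight)
{
    return mul_sat_u64(cs->count, cs->callee_time) >= min_weight;
}
```

With these assumed definitions, a call executed 10 times into a callee costing
5000 time units fails a count-only threshold of 1000 but passes a weight
threshold of 40000, while a call executed 2000 times into a callee costing 1
unit does the opposite.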


> I wonder if that is not too local and if we should not try to estimate cumulative time
> of the function and get more aggressive on inlining over the whole path leading
> to hot code.
> Honza
