This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: LTO inliner -- sensitivity to increasing register pressure


> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> What I've observed on power is that LTO alone reduces performance and
> >> LTO+FDO is not significantly different than FDO alone.
> > On SPEC2k6?
> >
> > This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64) LTO seems
> > to be an off-noise win on SPEC2k6:
> > http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
> >
> > I do not see why PPC should be significantly more constrained by register
> > pressure.
> >
> > I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC;
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
> > shows noticeable drop in calculix and gamess.
> > Martin profiled calculix and tracked it down to a loop that is not trained
> > but hot in the reference run.  That makes it optimized for size.
> >
> > http://dromaeo.com/?id=219677,219672,219965,219877
> > compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
> > Here the benefits of LTO and FDO seem to add up nicely.
> >>
> >> I agree that an exact estimate of the register pressure would be a
> >> difficult problem. I'm hoping that something that approximates potential
> >> register pressure downstream will be sufficient to help inlining
> >> decisions.
> >
> > Yep, register pressure and I-cache overhead estimates are used for inline
> > decisions by some compilers.
> >
> > I am mostly concerned about the metric suffering from the GIGO principle if we mix
> > together too many estimates that are somewhat wrong by their nature. This is
> > why I mostly tried to focus on size/time estimates and not add too many other
> > metrics. But perhaps it is time to experiment with these, since obviously we
> > have mostly pushed the current infrastructure to its limits.
> >
> 
> I like the word GIGO here. Getting inline signals right requires deep
> analysis (including interprocedural analysis). Different signals/hints
> may also come with different quality and thus different weights.
> 
> Another challenge is how to quantify cycle savings/overhead more
> precisely. With that, we can abandon the threshold based scheme -- any
> callsite with a net saving will be considered.

Inline hints are intended to do this: at the moment we bump the limits up
when we estimate big speedups from the inlining, and with today's patch and FDO
we bypass the thresholds when we know from FDO that the call matters.

Concerning your other email, indeed we should consider heavy callees (in Open64
terminology) that consume a lot of time, and not skip those call sites.  An easy
way would be to replace the maybe_hot_edge predicate with a maybe_hot_call that
simply multiplies the count and the estimated time.  (We probably ought to get
rid of the time capping and use wider arithmetic too.)

I wonder if that is not too local, and whether we should instead try to estimate
the cumulative time of the function and get more aggressive about inlining over
the whole path leading to hot code.

Honza

