This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: LTO inliner -- sensitivity to increasing register pressure
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, Aaron Sawdey <acsawdey at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 18 Apr 2014 23:16:31 +0200
- Subject: Re: LTO inliner -- sensitivity to increasing register pressure
- References: <53515624 dot 5060509 at linux dot vnet dot ibm dot com> <CAAkRFZJ4S+tT2+DtYzptvyb8BOOAatFj-dJzABd8UiVtn2KUug at mail dot gmail dot com> <1397847967 dot 19063 dot 10 dot camel at ragesh3 dot rchland dot ibm dot com> <20140418192744 dot GC10795 at kam dot mff dot cuni dot cz> <CAAkRFZ+cf1ju=+cY9Z9Nd2cDRvMx7K0pC8zuU4h5-OkXby2k5w at mail dot gmail dot com>
> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> What I've observed on power is that LTO alone reduces performance and
> >> LTO+FDO is not significantly different from FDO alone.
> > On SPEC2k6?
> >
> > This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64), LTO
> > seems to be a win beyond the noise on SPEC2k6:
> > http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
> >
> > I do not see why PPC should be significantly more constrained by register
> > pressure.
> >
> > I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC, but
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
> > shows a noticeable drop in calculix and gamess.
> > Martin profiled calculix and tracked it down to a loop that is not exercised
> > by the training run but is hot in the reference run, so it gets optimized for
> > size.
> >
> > http://dromaeo.com/?id=219677,219672,219965,219877
> > compares Firefox's dromaeo runs with the default build, LTO, FDO and LTO+FDO.
> > Here the benefits of LTO and FDO seem to add up nicely.
> >>
> >> I agree that an exact estimate of the register pressure would be a
> >> difficult problem. I'm hoping that something that approximates potential
> >> register pressure downstream will be sufficient to help inlining
> >> decisions.
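As an illustration of what a coarse approximation might look like, here is a minimal sketch, assuming a hypothetical per-function summary that records only the peak number of simultaneously live values (`max_live`; GCC's inline summaries have no such field). It assumes that inlining at a call site stacks the callee's peak on top of the caller values live across the call, and treats anything above the available register count as a spill. All names are illustrative.

```c
#include <assert.h>

/* Hypothetical per-function summary (not a GCC data structure): MAX_LIVE
   approximates the peak number of simultaneously live values in the body.  */
struct pressure_summary {
  int max_live;
};

/* Approximate peak pressure in the caller after inlining the callee at a
   call site where LIVE_ACROSS_CALL caller values stay live across the call:
   the callee body now runs while those values are still live.  */
static int
pressure_after_inline (struct pressure_summary caller,
                       struct pressure_summary callee, int live_across_call)
{
  assert (live_across_call >= 0 && live_across_call <= caller.max_live);
  int at_call_site = live_across_call + callee.max_live;
  return at_call_site > caller.max_live ? at_call_site : caller.max_live;
}

/* Values beyond AVAILABLE_REGS: a crude stand-in for the number of spills.  */
static int
excess_over (int pressure, int available_regs)
{
  return pressure > available_regs ? pressure - available_regs : 0;
}

/* Extra spills attributable to this one inlining decision.  */
static int
extra_spills (struct pressure_summary caller, struct pressure_summary callee,
              int live_across_call, int available_regs)
{
  int before = excess_over (caller.max_live, available_regs);
  int after = excess_over (pressure_after_inline (caller, callee,
                                                  live_across_call),
                           available_regs);
  return after - before;
}
```

For example, a caller peaking at 10 live values that inlines a callee peaking at 8, with 6 values live across the call, reaches 14; that costs nothing with 16 registers but about 2 spills with 12.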
> >
> > Yep, register pressure and I-cache overhead estimates are used for inline
> > decisions by some compilers.
> >
> > I am mostly concerned about the metric suffering from the GIGO principle if we
> > mix together too many estimates that are somewhat wrong by their nature. This
> > is why I mostly tried to focus on size/time estimates and not add too many
> > other metrics. But perhaps it is time to experiment with these, since we have
> > obviously pushed the current infrastructure mostly to its limits.
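To make the GIGO concern concrete, here is a worked bound under an illustrative assumption the thread does not make: the combined metric is a product of n positive estimates, each off by a relative error of at most epsilon.

```latex
M = \prod_{i=1}^{n} e_i, \qquad
\hat{M} = \prod_{i=1}^{n} e_i \,(1 + \varepsilon_i), \quad |\varepsilon_i| \le \varepsilon
\;\Longrightarrow\;
(1 - \varepsilon)^{n} \le \frac{\hat{M}}{M} \le (1 + \varepsilon)^{n}
```

With n = 3 and epsilon = 0.2, (0.8)^3 = 0.512 and (1.2)^3 = 1.728, so the combined metric can be anywhere from 48.8% too low to 72.8% too high even though each input is within 20%.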
> >
>
> I like the word GIGO here. Getting inline signals right requires deep
> analysis (including interprocedural analysis). Different signals/hints
> may also be of different quality and thus deserve different weights.
>
> Another challenge is how to quantify cycle savings/overhead more
> precisely. With that, we can abandon the threshold-based scheme: any
> callsite with a net saving will be considered.
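A threshold-free decision of the kind described here could look roughly like the following minimal sketch. It assumes hypothetical per-site estimates of cycles saved and cycles lost per call, each scaled by its own confidence weight (reflecting the point that signals of different quality deserve different weights), and multiplied by the execution count. None of these names are GCC data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call-site estimates; names are illustrative only.  */
struct site_estimate {
  int64_t count;             /* profile or estimated execution count */
  double saved_per_call;     /* cycles saved per call (call overhead, simplification) */
  double overhead_per_call;  /* cycles lost per call (e.g. spills, I-cache misses) */
  double w_saved;            /* confidence weight for the savings estimate */
  double w_overhead;         /* confidence weight for the overhead estimate */
};

/* Net cycles saved over the whole run if this call site is inlined.  */
static double
net_saving (struct site_estimate s)
{
  assert (s.count >= 0);
  return (double) s.count
         * (s.w_saved * s.saved_per_call - s.w_overhead * s.overhead_per_call);
}

/* Threshold-free rule: consider every call site with a positive net saving.  */
static int
worth_inlining (struct site_estimate s)
{
  return net_saving (s) > 0.0;
}
```

Halving the trust in the savings estimate (`w_saved = 0.5`) is enough to flip a site that saves 5 cycles and loses 3 per call from "inline" to "skip", which shows how much the outcome depends on the quality of each signal.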
Inline hints are intended to do this: at the moment we bump the limits up
when we estimate big speedups from inlining, and with today's patch and FDO
we bypass the thresholds when we know from the profile that the call matters.
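The control flow described above can be sketched as follows. This is a simplified illustration, not GCC's actual code: the limit value, the bonus percentage, and the two boolean flags are hypothetical stand-ins for GCC's real parameters and hint bits. In this sketch a profile that marks the call as important skips the size threshold entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs for illustration; not GCC --param names or defaults.  */
enum { BASE_GROWTH_LIMIT = 40, SPEEDUP_BONUS_PERCENT = 100 };

struct call_site {
  int growth;             /* estimated code growth from inlining */
  bool big_speedup_hint;  /* an inline hint predicts a big speedup */
  bool profile_says_hot;  /* FDO shows the call matters */
};

/* Growth limit for this site, bumped up when a hint predicts a big speedup.  */
static int
growth_limit (const struct call_site *cs)
{
  assert (cs);
  int limit = BASE_GROWTH_LIMIT;
  if (cs->big_speedup_hint)
    limit += limit * SPEEDUP_BONUS_PERCENT / 100;
  return limit;
}

static bool
should_inline (const struct call_site *cs)
{
  /* With profile feedback showing the call matters, skip the threshold.  */
  if (cs->profile_says_hot)
    return true;
  return cs->growth <= growth_limit (cs);
}
```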
Concerning your other email, we should indeed consider heavy callees (in Open64
terminology) that consume a lot of time, and not skip their call sites. An easy
way would be to replace the maybe_hot_edge predicate by a maybe_hot_call that
simply multiplies the count and the estimated time. (We probably ought to get rid
of the time capping and use wider arithmetic too.)
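maybe_hot_edge is the existing predicate named in the mail, while maybe_hot_call is only proposed there, so the version below is a hypothetical sketch with an invented cut-off `HOT_CALL_CYCLES`. It multiplies the count and the uncapped estimated time in `unsigned __int128` (a GCC/Clang extension) to illustrate the "wider arithmetic" point: the product cannot overflow, so a heavy callee called only a few times still registers as hot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-off: total cycles (count * time) above which a call is hot.  */
static const uint64_t HOT_CALL_CYCLES = 1000000;

/* Sketch of the proposed predicate: a call site matters if the callee runs
   often OR each run is expensive, i.e. if count * estimated_time is large.
   The product is computed in 128 bits so large counts and uncapped times
   do not wrap around.  */
static bool
maybe_hot_call (uint64_t count, uint64_t estimated_time)
{
  unsigned __int128 total = (unsigned __int128) count * estimated_time;
  return total >= HOT_CALL_CYCLES;
}
```

With plain 64-bit multiplication, `maybe_hot_call (UINT64_MAX, UINT64_MAX)` would wrap to 1 and report a cold call; the 128-bit product avoids that.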
I wonder if that is not too local, and whether we should instead try to
estimate the cumulative time of the function and get more aggressive about
inlining along the whole path leading to hot code.
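One way to read "cumulative time" is the time of one invocation including everything it calls, so a cheap wrapper sitting on the path to hot code still looks expensive. The following is a minimal sketch under loudly stated assumptions: a hypothetical, acyclic call graph (no recursion), a fixed small fan-out, and per-edge call frequencies known per invocation. The `struct fn` type and its fields are illustrative, not GCC's cgraph.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node; the graph is assumed to be acyclic.  */
struct fn {
  double self_time;                          /* time of the function's own body */
  size_t n_callees;
  struct fn *callees[MAX_CALLEES];
  double calls_per_invocation[MAX_CALLEES];  /* calls to each callee per invocation */
  double cumulative;                         /* memoized result; < 0 if not yet computed */
};

/* Cumulative time of one invocation: own time plus, for each callee, how often
   it is called times its own cumulative time.  */
static double
cumulative_time (struct fn *f)
{
  assert (f->n_callees <= MAX_CALLEES);
  if (f->cumulative >= 0.0)
    return f->cumulative;
  double t = f->self_time;
  for (size_t i = 0; i < f->n_callees; i++)
    t += f->calls_per_invocation[i] * cumulative_time (f->callees[i]);
  f->cumulative = t;
  return t;
}
```

For a leaf costing 100 that is called 10 times from a wrapper whose own body costs 1, the wrapper's cumulative time is 1001; an inliner could use that figure to treat calls to the wrapper as hot even though its own body is trivial.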
Honza