This is the mail archive of the
mailing list for the GCC project.
Re: LTO inliner -- sensitivity to increasing register pressure
- From: Xinliang David Li <davidxl at google dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Aaron Sawdey <acsawdey at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 18 Apr 2014 14:32:17 -0700
- Subject: Re: LTO inliner -- sensitivity to increasing register pressure
- Authentication-results: sourceware.org; auth=none
- References: <53515624 dot 5060509 at linux dot vnet dot ibm dot com> <CAAkRFZJ4S+tT2+DtYzptvyb8BOOAatFj-dJzABd8UiVtn2KUug at mail dot gmail dot com> <1397847967 dot 19063 dot 10 dot camel at ragesh3 dot rchland dot ibm dot com> <20140418192744 dot GC10795 at kam dot mff dot cuni dot cz> <CAAkRFZ+cf1ju=+cY9Z9Nd2cDRvMx7K0pC8zuU4h5-OkXby2k5w at mail dot gmail dot com> <20140418211630 dot GA18840 at kam dot mff dot cuni dot cz>
On Fri, Apr 18, 2014 at 2:16 PM, Jan Hubicka <firstname.lastname@example.org> wrote:
>> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <email@example.com> wrote:
>> >> What I've observed on power is that LTO alone reduces performance and
>> >> LTO+FDO is not significantly different than FDO alone.
>> > On SPEC2k6?
>> > This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64), LTO seems
>> > to be an off-noise win on SPEC2k6
>> > http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
>> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
>> > I do not see why PPC should be significantly more constrained by register
>> > pressure.
>> > I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC
>> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
>> > shows a noticeable drop in calculix and gamess.
>> > Martin profiled calculix and tracked it down to a loop that is not trained
>> > but hot in the reference run. That makes it optimized for size.
>> > http://dromaeo.com/?id=219677,219672,219965,219877
>> > compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
>> > Here the benefits of LTO and FDO seem to add up nicely.
>> >> I agree that an exact estimate of the register pressure would be a
>> >> difficult problem. I'm hoping that something that approximates potential
>> >> register pressure downstream will be sufficient to help inlining
>> >> decisions.
>> > Yep, register pressure and I-cache overhead estimates are used for inline
>> > decisions by some compilers.
>> > I am mostly concerned about the metric suffering from the GIGO principle if we mix
>> > together too many estimates that are somewhat wrong by their nature. This is
>> > why I mostly tried to focus on size/time estimates and not add too many other
>> > metrics. But perhaps it is time to experiment with these, since obviously we
>> > have pushed the current infrastructure mostly to its limits.
>> I like the word GIGO here. Getting inline signals right requires deep
>> analysis (including interprocedural analysis). Different signals/hints
>> may also come with different quality thus different weights.
>> Another challenge is how to quantify cycle savings/overhead more
>> precisely. With that, we can abandon the threshold based scheme -- any
>> callsite with a net saving will be considered.
> Inline hints are intended to do this - at the moment we bump the limits up
> when we estimate big speedups for the inlining, and with today's patch and FDO
> we bypass the thresholds when we know from FDO that the call matters.
> Concerning your other email, indeed we should consider heavy callees (in Open64
> terminology) that consume a lot of time and not skip their call sites. An easy
> way would be to replace the maybe_hot_edge predicate with maybe_hot_call, which simply
> multiplies the count and estimated time. (We probably ought to get rid of the
> time capping and use wider arithmetic too.)
That's what we did in the Google branches. We had two heuristics -- a hot
caller heuristic and a hot callee heuristic.
1) For the hot caller heuristic, other simple analyses are checked: a)
global working set size; b) a callsite argument check -- a very simple
check to guess whether inlining this callsite would sharpen analysis.
2) We had not tuned the hot callee heuristic by doing more analysis --
simply turning it on based on hotness does not make a noticeable difference.
Other hints are needed.
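
For illustration only, the maybe_hot_call idea quoted above (multiply the call count by the callee's estimated time, using wide arithmetic instead of capping the time) could be sketched as below. Everything here is an assumption made for the example: struct call_edge, sat_mul_u64, call_weight, maybe_hot_call_sketch and the threshold parameter are hypothetical names, not GCC's cgraph data structures and not the code from the Google branches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a call-graph edge; not GCC's cgraph_edge. */
struct call_edge {
  uint64_t count;       /* profile count of the call site (assumed) */
  uint32_t callee_time; /* estimated time of one callee execution (assumed) */
};

/* 64-bit multiply that saturates instead of wrapping, so that very hot
   calls to very heavy callees still compare as "largest".  */
static uint64_t sat_mul_u64(uint64_t a, uint64_t b)
{
  if (a != 0 && b > UINT64_MAX / a)
    return UINT64_MAX;
  return a * b;
}

/* Weight of a call: execution count times estimated callee time.  */
static uint64_t call_weight(const struct call_edge *e)
{
  return sat_mul_u64(e->count, (uint64_t) e->callee_time);
}

/* A call is considered hot when its weight reaches the threshold.
   The threshold is a tuning knob assumed for this sketch.  */
static bool maybe_hot_call_sketch(const struct call_edge *e,
                                  uint64_t hot_weight_threshold)
{
  return call_weight(e) >= hot_weight_threshold;
}
```

With such a predicate, a call site executed only 10 times to a callee estimated at 100000 time units outweighs a call site executed 1000 times to a callee estimated at 5 units, which is the "heavy callee" case a count-only check would skip. As noted above, hotness alone did not make a noticeable difference in practice, so this would be only one hint among several.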
> I wonder if that is not too local and if we should not try to estimate the
> cumulative time of the function and get more aggressive on inlining over the
> whole path leading to hot code.