This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Inliner heuristics updated


On 5/25/05, Jan Hubicka <jh@suse.cz> wrote:
> > On 5/25/05, Jan Hubicka <jh@suse.cz> wrote:
> > > > On 5/25/05, Jan Hubicka <jh@suse.cz> wrote:
> > > > > Hi,
> > > > > I am about to commit the patch in the attached form (with testcase using
> > > > > Janis' new tree-prof scripts, thanks!).
> > > > > I did some extra testing on SPEC, Gerald testcase and tramp3d.  For SPEC
> > > > > the new inliner always brings slightly better results at slightly
> > > > > smaller compile time, important differences are only in the profile
> > > > > driven runs I sent last time.
> > > >
> > > > Note that for testcases like tramp3d with loads of small functions being
> > > > inlined anyway, profiling now is several orders of magnitude slower because
> > > > even very small functions are instrumented.  This results in 90% of the
> > > > generated assembly being long long increments of (redundant) counters.
> > > >
> > > > I know you are aware of this problem, I just want to remind you that a fix
> > > > for this (running some inlining before instrumentation) is necessary before 4.1.
> > >
> > > Well, I must admit that I don't consider it a must for 4.1 (adding one
> > > counter per function call in source file seems reasonable), but I do have
> > > a patch for this (it basically makes the local inlining pass inline all
> > > functions that are small enough to be inlined unconditionally) and I am
> > > just going to resurrect it and re-benchmark.  Last time I tried it, it
> > > helped your testcase and also reduced memory usage for your testcase as
> > > well as Gerald's testcase, as we didn't perform that much inlining.
> > >
> > > On the other hand it slowed down bootstrap a bit as inliner was run
> > > twice over many functions and inliner is still linear in the size of
> > > source function.  This might've changed as inliner is much cheaper now
> > > than it used to be on tree-profiling previously (not sure in what
> > > direction though), but the linearity in size of function being inlined
> > > into is quite unnecessary.  If we had a way to point into call statement
> > > from cgraph node and start inlining without walking the instruction
> > > chain, we would be happy (the time complexity would depend on size of
> > > code actually inlined).
> >
> > Would it be possible to enable this first inlining pass only for the
> > instrumented compile?  Or would this screw the profile using compile?
> > Maybe we can condition it on both, -fprofile-generate and -fprofile-use.
> 
> We can enable it for -fprofile-generate/-fprofile-use only I guess.  It
> would make it more difficult to track down differences between profiled
> and unprofiled code but that is not too critical I guess...

Yes.  In principle two stages of inlining could even make a better
code-size estimate if we're doing some simple optimization between
them.  So adding an extra switch for this early-inlining-of-small-functions
which would be automatically enabled for profile-based optimization
could be useful in general, especially at -O3 to get better size estimates.
Definitely I'd want to do some experiments if you can dig out the code
to do multiple inlining passes.
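
To make the idea concrete, here is a toy Python model of an early
inlining pass that runs before instrumentation.  It is only a sketch
under loud assumptions, not GCC code: the Function class, early_inline,
call_site_counters and the size threshold are all hypothetical, function
"size" is an abstract number, and instrumentation cost is modelled as one
counter per remaining call site (the cost mentioned earlier in the
thread).

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical stand-in for a call-graph node (not a GCC type)."""
    name: str
    size: int                                   # abstract body size
    calls: list = field(default_factory=list)   # one callee name per call site


def early_inline(functions, max_size):
    """Inline every callee whose (already expanded) body is <= max_size.

    Returns a new dict; the input is left untouched.  Calls that would
    re-enter a function currently being expanded are kept as calls, so
    recursion terminates.
    """
    done = {}

    def expand(name, active):
        if name in done:
            return done[name]
        fn = functions[name]
        chain = active | {name}
        size, calls = fn.size, []
        for callee in fn.calls:
            if callee in chain:
                calls.append(callee)        # recursive call: never inlined
                continue
            body = expand(callee, chain)
            if body.size <= max_size:
                size += body.size           # callee body is copied in
                calls.extend(body.calls)    # its remaining calls come along
            else:
                calls.append(callee)        # too big: keep the call
        done[name] = Function(name, size, calls)
        return done[name]

    for name in functions:
        expand(name, frozenset())
    return done


def call_site_counters(functions, root):
    """Toy instrumentation cost: one counter per call site reachable from root."""
    seen, stack, total = set(), [root], 0
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        total += len(functions[name].calls)
        stack.extend(functions[name].calls)
    return total


# Made-up program with many tiny accessors, loosely in the spirit of tramp3d.
program = {
    "get":   Function("get", 2),
    "add":   Function("add", 3, ["get", "get"]),
    "solve": Function("solve", 50, ["get"] * 4),
    "main":  Function("main", 20, ["get", "get", "get", "add", "add", "solve"]),
}

before = call_site_counters(program, "main")
inlined = early_inline(program, max_size=10)
after = call_site_counters(inlined, "main")
print(before, after, inlined["main"].size)
```

In this model the only call left to instrument is main -> solve, and the
sizes seen by a later inlining pass are the post-inlining ones, which is
the "better size estimate" effect.  Real numbers would of course depend
on the actual heuristics and on what optimization runs between the passes.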

Richard.

