This is the mail archive of the
mailing list for the GCC project.
Re: Inliner heuristics updated
On 5/25/05, Jan Hubicka <email@example.com> wrote:
> > On 5/25/05, Jan Hubicka <firstname.lastname@example.org> wrote:
> > > > On 5/25/05, Jan Hubicka <email@example.com> wrote:
> > > > > Hi,
> > > > > I am about to commit the patch in the attached form (with testcase using
> > > > > Janis' new tree-prof scripts, thanks!).
> > > > > I did some extra testing on SPEC, Gerald's testcase and tramp3d. For SPEC
> > > > > the new inliner always brings slightly better results at slightly
> > > > > smaller compile time; the important differences are only in the profile
> > > > > driven runs I sent last time.
> > > >
> > > > Note that for testcases like tramp3d with loads of small functions being
> > > > inlined anyway, profiling now is several orders of magnitude slower because
> > > > even very small functions are instrumented. This results in 90% of the
> > > > generated assembly being long long increments of (redundant) counters.
> > > >
> > > > I know you are aware of this problem, I just want to remind you that a fix
> > > > for this (running some inlining before instrumentation) is necessary before 4.1.
> > >
> > > Well, I must admit that I don't consider it a must for 4.1 (adding one
> > > counter per function call in the source file seems reasonable), but I do
> > > have a patch for this (it basically makes the local inlining pass inline
> > > all functions that are small enough to be inlined unconditionally) and I
> > > am just going to resurrect it and re-benchmark. Last time I tried it, it
> > > helped your testcase and also reduced memory usage for your testcase as
> > > well as Gerald's testcase, as we didn't perform that much inlining.
> > >
> > > On the other hand, it slowed down bootstrap a bit, as the inliner was run
> > > twice over many functions and the inliner is still linear in the size of
> > > the source function. This might've changed, as the inliner is much cheaper
> > > now than it used to be on tree-profiling previously (not sure in what
> > > direction though), but the linearity in the size of the function being
> > > inlined into is quite unnecessary. If we had a way to point to the call
> > > statement from the cgraph node and start inlining without walking the
> > > instruction chain, we would be happy (the time complexity would depend on
> > > the size of the code actually inlined).
> > Would it be possible to enable this first inlining pass only for the
> > instrumented compile? Or would this screw the profile using compile?
> > Maybe we can condition it on both, -fprofile-generate and -fprofile-use.
> We can enable it for -fprofile-generate/-fprofile-use only I guess. It
> would make it more difficult to track down differences between profiled
> and unprofiled code but that is not too critical I guess...
Yes. In principle two stages of inlining could even make a better
code-size estimate if we're doing some simple optimization between
them. So adding an extra switch for this early-inlining-of-small-functions
which would be automatically enabled for profile-based optimization
could be useful in general, especially at -O3 to get better size estimates.
Definitely I'd want to do some experiments if you can dig out the code
to do multiple inlining passes.
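
To make the instrumentation overhead discussed above concrete, here is a
hand-written sketch in C. It is only a simplified model: the counters are
ordinary globals incremented by hand, not what GCC's -fprofile-generate
actually emits, and all names (vec3, get_instrumented, sum3_instrumented,
sum3_early_inlined) are hypothetical. It contrasts a caller whose tiny
accessor carries its own counter with the same caller after the accessor has
been inlined before instrumentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for profile counters; real instrumentation
   is generated by the compiler and looks different. */
static uint64_t get_counter;   /* counts calls to the tiny accessor */
static uint64_t sum3_counter;  /* counts calls to the caller */

struct vec3 { double v[3]; };

/* Tiny accessor as it might look when instrumented on its own:
   the 64-bit counter update costs about as much as the real work. */
static double get_instrumented(const struct vec3 *p, size_t i)
{
    get_counter++;             /* hand-written "instrumentation" */
    return p->v[i];
}

/* Caller without early inlining: one counter for itself plus one
   counter update per accessor call (three here). */
static double sum3_instrumented(const struct vec3 *p)
{
    sum3_counter++;
    return get_instrumented(p, 0)
         + get_instrumented(p, 1)
         + get_instrumented(p, 2);
}

/* Caller as it might look if the accessor were inlined before
   instrumentation: only the caller's own counter remains. */
static double sum3_early_inlined(const struct vec3 *p)
{
    sum3_counter++;
    return p->v[0] + p->v[1] + p->v[2];
}
```

In this model one call to sum3_instrumented performs four counter updates,
while one call to sum3_early_inlined performs a single update for the same
result, which is the kind of saving an early pass over small functions is
hoped to give.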