This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: cgraph based inlining heuristics
> On Wed, 2 Jul 2003, Jan Hubicka wrote:
>
> > > Ok, here they go. I tested five different setups:
> > >
> > > (0) gcc3.4
> > > (1) gcc3.4 with __attribute__((leafify)) patch
> > > (2) gcc3.4 with your patch
> > > (3) gcc3.4 with your patch and -funit-at-a-time
> > > (4) gcc3.4 with your patch and -funit-at-a-time --param max-inline-insns-auto=200 --param max-inline-insns-single=200 --param inline-unit-growth=1000 --param large-function-growth=1000
> > >
> > > flags otherwise used are
> > > -O2 -g -march=athlon -fno-math-errno -fno-trapping-math -ffinite-math-only
> > > -funroll-loops
> > >
> > >                      (0)        (1)        (2)        (3)        (4)
> > > binary size          10166017   10681144   8405237    8405237    9874056
> > > compile time         2m57.503s  3m40.638s  1m19.553s  1m20.742s  2m55.031s
> > > runtime performance  3.97s      1.66s      2.65s      2.64s      1.74s
> >
> > You may also try the -fno-unit-at-a-time flag. Then you will get the old
> > heuristics with the new code size estimates, so you can see how much of
> > the benefit comes from each.
>
> (5) gcc3.4 with your patch and -fno-unit-at-a-time
> (6) gcc3.4 with -O3 and -funit-at-a-time
>
>                      (5)        (6)
> binary size          10337759   8566972
> compile time         1m19.553s  1m20.691s
> runtime performance  6.41s      2.64s
>
> so it seems callgraph based inlining cuts it, not the new code size
> estimate? Can I use old code size estimates with new heuristics somehow?
I can send you the patch, but it really does not work. The callgraph
inlining is much more sensitive to the output of the code size estimate.
Your results seem to be consistent with what I saw on Gerald's testcase:
the old inlining heuristics do not benefit from the new code estimate in
many cases. What is new is that you get a noticeable slowdown with the new
counting. I got it on some testcases too, but never so high, so I didn't
worry much. I guess it is because the parameters of the old inlining
heuristics are set too high (especially --param max-inline-insns).
It now defaults to 300, but it used to default to 600. What used to be
300 in the old counting is approximately 90 in the new counting, so a
rough estimate is that --param max-inline-insns=200 should work.
If that does not work, it may be interesting to try throttling down the
inlining limits more. I use 100, which appears to give slightly more
inlining than the old setting on average. Perhaps
--param max-inline-insns-auto=80 --param max-inline-insns-single=80
should get about the original amount of inlining.
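
For reference, the arithmetic behind that estimate can be sketched as
follows. This is only an illustration: it assumes the old and new counts
scale linearly, based on the single data point quoted above (300 old is
about 90 new), and the names old_to_new, OLD_REFERENCE and NEW_REFERENCE
are made up for the example, not anything in GCC.

```python
# Rough conversion of an inlining limit from the old instruction counting
# to the new one. Assumption: the relationship is linear and fixed by the
# one data point from the message (300 in old counting ~ 90 in new counting).
OLD_REFERENCE = 300
NEW_REFERENCE = 90


def old_to_new(old_limit):
    """Scale a limit given in the old counting to the new counting."""
    return old_limit * NEW_REFERENCE / OLD_REFERENCE


if __name__ == "__main__":
    # 300 is the current default, 600 the former default of max-inline-insns.
    for old_limit in (300, 600):
        print(f"old {old_limit} -> new ~{old_to_new(old_limit):.0f}")
```

Under that assumption the former default of 600 maps to about 180 in the
new counting, which is in the neighbourhood of the 200 suggested above.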
Honza
>
> It seems for my tests the only performance critical part is inlining into
> the loops and then the loop optimizer. So I just tested
>
> (7) gcc3.4 with your patch and -O3 -funit-at-a-time -fold-unroll-loops
>
> this gives 8564708 1m21.861s 2.68s which is a small improvement for the
> new loop optimizer.
>
> Richard.
>