This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: cgraph based inlining heuristics


> On Wed, 2 Jul 2003, Jan Hubicka wrote:
> 
> > > Ok, here they go. I tested the following setups:
> > >
> > > (0) gcc3.4
> > > (1) gcc3.4 with __attribute__((leafify)) patch
> > > (2) gcc3.4 with your patch
> > > (3) gcc3.4 with your patch and -funit-at-a-time
> > > (4) gcc3.4 with your patch and -funit-at-a-time --param max-inline-insns-auto=200 --param max-inline-insns-single=200 --param inline-unit-growth=1000 --param large-function-growth=1000
> > >
> > > flags otherwise used are
> > > -O2 -g -march=athlon -fno-math-errno -fno-trapping-math -ffinite-math-only
> > > -funroll-loops
> > >
> > >                       (0)         (1)        (2)       (3)       (4)
> > > binary size        10166017    10681144    8405237   8405237   9874056
> > > compile time      2m57.503s   3m40.638s  1m19.553s 1m20.742s 2m55.031s
> > > runtime performance   3.97s       1.66s      2.65s     2.64s     1.74s
> >
> > You may also try the -fno-unit-at-a-time flag.  Then you will get the old
> > heuristics with the new code size estimates, so you can see how much of
> > the benefit comes from each.
> 
> (5) gcc3.4 with your patch and -fno-unit-at-a-time
> (6) gcc3.4 with -O3 and -funit-at-a-time
> 
>                          (5)           (6)
> binary size           10337759       8566972
> compile time         1m19.553s     1m20.691s
> runtime performance      6.41s         2.64s
> 
> so it seems it is the callgraph based inlining that cuts it, not the new
> code size estimate?  Can I use the old code size estimates with the new
> heuristics somehow?

I can send you the patch, but it really does not work.  The callgraph
inlining is much more sensitive to the output of the code size estimate.
Your results seem to be consistent with what I saw on Gerald's testcase -
the old inlining heuristics do not benefit from the new code size estimate
in many cases.  What is new is that you get a noticeable slowdown with the
new counting.  I got that on some testcases too, but never this large, so
I didn't worry much about it.  I guess it is because the parameters of the
old inlining heuristics are set too high (especially --param max-inline-insns).

It now defaults to 300, but it used to default to 600.  What used to be
300 in the old counting is approximately 90 in the new counting, so a
rough estimate is to try --param max-inline-insns=200.
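(Back-of-the-envelope, assuming the two countings scale roughly linearly:
90/300 is a factor of about 0.3, and the old default of 600 times 0.3 is
about 180, which is where the suggestion of 200 comes from.)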

If that does not work, it may be interesting to throttle down the
inlining limits more.  I use 100, which appears to be slightly more than
the old setting on average.  Perhaps --param max-inline-insns-auto=80
--param max-inline-insns-single=80 should get about the original amount of
inlining.
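
For concreteness, something like the following should do it (hot.c and the
helper are made up for illustration; whether the callee actually stays under
the lowered limit is only a guess, but it is the kind of small hot-loop
callee these parameters govern):

    gcc -O2 -funit-at-a-time \
        --param max-inline-insns-auto=80 \
        --param max-inline-insns-single=80 \
        -S hot.c

    /* hot.c - a small static helper that the automatic inliner should
       still inline into the loop even with the limits lowered to 80.  */
    static double dot3 (const double *a, const double *b)
    {
      return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    double
    sum_dots (const double *a, const double *b, int n)
    {
      double s = 0.0;
      int i;

      for (i = 0; i < n; i++)
        s += dot3 (a + 3 * i, b + 3 * i);
      return s;
    }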

Honza
> 
> It seems that for my tests the only performance-critical part is inlining
> into the loops and then the loop optimizer.  So I just tested
> 
> (7) gcc3.4 with your patch and -O3 -funit-at-a-time -fold-unroll-loops
> 
> this gives 8564708 / 1m21.861s / 2.68s (size / compile time / runtime),
> which is a small improvement for the new loop optimizer.
> 
> Richard.
> 

