This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression


------- Additional Comments From hubicka at ucw dot cz  2004-12-07 14:52 -------
Subject: Re:  [4.0 Regression] Inlining limits cause 340% performance regression

> 
> ------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-07 14:35 -------
> Subject: Re:  [4.0 Regression] Inlining limits
>  cause 340% performance regression
> 
> On 6 Dec 2004, hubicka at ucw dot cz wrote:
> 
> > Looks like I get 4fold speedup on tree profiling with profiling compared
> > to tree profiling on mainline that is equivalent to speedup you are
> > seeing for leafify patch. That sounds pretty promising (so the new
> > heuristics can get the leafify idea without the hint from user and
> > hitting the code growth problems).
> 
> Yes, it seems so.  Really nice improvement.  Though profiling is
> sloooooow.  I guess you avoid doing any CFG changing transformation
> for the profiling stage?  I.e. not even inline the simplest functions?
> That would be the reason the Intel compiler is unusable with profiling
> for me.  -fprofile-generate comes with a 50fold increase in runtime!

Also it might be possible to change
  NEXT_PASS (pass_tree_profile);
  NEXT_PASS (pass_cleanup_cfg);
into
  NEXT_PASS (pass_cleanup_cfg);
  NEXT_PASS (pass_tree_profile);
  NEXT_PASS (pass_cleanup_cfg);
in tree-optimize.c to get the CFG cleaned up.  In theory it should not have
much effect, since the profiling code is already smart enough not to
instrument edges that are redundant control-flow-wise, but perhaps it is
not doing so all the time.  The cleanup is prevented there to avoid
problems with inexact coverage info, but it is not unthinkable to extend
cfgcleanup to be coverage-info safe, or to execute it when
-fprofile-generate is used without -ftest-coverage, if it makes any
difference.
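
A minimal sketch of that last option, assuming the decision is made in a
gate-style predicate.  The struct, its fields, and the function name are
hypothetical stand-ins for illustration, not the compiler's actual option
variables or pass hooks:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant command-line state. */
struct options
{
  bool profile_generate;  /* -fprofile-generate was given */
  bool test_coverage;     /* -ftest-coverage was given */
};

/* Allow the extra CFG cleanup before instrumentation only when we are
   instrumenting and exact coverage info was not requested.  */
static bool
gate_pre_profile_cleanup (const struct options *opts)
{
  return opts->profile_generate && !opts->test_coverage;
}
```

With -fprofile-generate alone the predicate is true; adding -ftest-coverage
turns it off, so the coverage data still matches the uncleaned CFG.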

Honza
> 
> > It would be nice to experiment with this a little - in general the
> > heuristics can be viewed as having three players.  There are the limits
> > (specified via --param) that it must obey, there is the cost model
> > (estimated growth for inlining into all callers without profiling, and
> > the ratio of execute_count to estimated growth for inlining into one
> > call with profiling) and the bin packing algorithm optimizing the gains while
> > obeying the limits.
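
The three players can be sketched roughly as follows.  This is a simple
greedy packer, not the actual heuristics in the compiler: the struct, its
fields, and max_total_growth (standing in for a --param limit) are all
assumptions made for illustration, and growth is assumed to be positive.

```c
#include <stddef.h>
#include <stdlib.h>

/* One call site that could be inlined.  All fields are hypothetical.  */
struct candidate
{
  long long execute_count;  /* profile count of the call */
  int growth;               /* estimated size growth, assumed > 0 */
  int inlined;              /* set to 1 by the packer when accepted */
};

/* Cost model: order by execute_count / growth, best first.  The ratios
   are compared by cross-multiplication to avoid division.  */
static int
by_benefit (const void *pa, const void *pb)
{
  const struct candidate *a = pa, *b = pb;
  long long lhs = a->execute_count * b->growth;
  long long rhs = b->execute_count * a->growth;
  return (lhs < rhs) - (lhs > rhs);
}

/* Packing: take candidates in benefit order while the total growth
   stays within the limit.  Returns the growth actually spent.  */
static int
pack_inlines (struct candidate *c, size_t n, int max_total_growth)
{
  int used = 0;
  qsort (c, n, sizeof *c, by_benefit);
  for (size_t i = 0; i < n; i++)
    {
      c[i].inlined = 0;
      if (c[i].growth <= max_total_growth - used)
        {
          c[i].inlined = 1;
          used += c[i].growth;
        }
    }
  return used;
}
```

Changing max_total_growth and watching which candidates flip is the kind of
experiment meant here: the cost model stays fixed and only the limit moves.
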
> >
> > With profiling, the cost model is pretty much realistic and it would
> > be nice to figure out how the performance behaves when the individual
> > limits are changed and why.  If you have some time for experimentation,
> > it would be very useful.  I am trying to do the same with SPEC and GCC
> > but I have difficulty playing with pooma or Gerald's application as I
> > have little understanding of what is going on there.  I will try it myself
> > next but any feedback can be very useful here.
> 
> I can produce some numbers for the tramp testcase.
> 
> > My plan is to try to understand the limits first and then try to get the
> > cost model better without profiling, as it is a bit too clumsy to do both
> > at once.
> 
> Do you have some written overview of the cost model?
> 
> Richard.
> 
> --
> Richard Guenther <richard dot guenther at uni-tuebingen dot de>
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
> 
> 
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
> 
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704

