This is the mail archive of the mailing list for the GCC project.
Re: Speedup CSE by 5%
On Mon, 2005-01-17 at 16:59 +0100, Arend Bayer wrote:
> On Mon, 17 Jan 2005, Jeffrey A Law wrote:
> > > > > This patch integrates approx_reg_cost() and approx_reg_cost_1() into one
> > > > > function by not using for_each_rtx(): The overhead of the additional
> > > > > function calls and some additional branches of the for_each_rtx()
> > > > > construction turn out to be significant performance-wise. I don't think
> > > > > the resulting code is less clear.
> > > >
> > > > Why is this not optimized by gcc itself? Does marking approx_reg_cost_1
> > > > inline help?
> > >
> > > Apart from the fact that this would need intermodule optimization, the
> > > problem is:
> > > GCC would first need to inline for_each_rtx, a recursive function, into
> > > approx_reg_cost, and change the recursive calls to for_each_rtx into
> > > recursive calls to approx_reg_cost. I would be highly surprised if you
> > > told me that GCC is able to do that.
> > Presumably the real gain here is the inlining of for_each_rtx, not
> > the inlining of approx_reg_cost_1 into approx_reg_cost. Right?
> If by "inlining for_each_rtx" you include the constant propagation that avoids
> the indirect function call to approx_reg_cost_1, then probably yes.
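For readers following the quoted exchange, here is a minimal sketch of the two shapes being compared: a generic walker that calls a callback through a function pointer on every node, and a specialized routine that recurses into itself directly. It does not use GCC's actual code. The node type, `walk_tree`, `count_regs_generic`, and `count_regs_direct` are hypothetical stand-ins for rtx, for_each_rtx, and the two variants of approx_reg_cost, and the sketch simply counts register nodes rather than computing a real cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node; NOT GCC's rtx representation. */
enum node_kind { NODE_REG, NODE_CONST, NODE_PLUS };

struct node {
    enum node_kind kind;
    int regno;              /* meaningful only for NODE_REG */
    struct node *op[2];     /* operands, NULL when unused */
};

/* Generic walker in the style of a for_each-type traversal: it calls
   fn on every node through a function pointer.  A nonzero return
   from fn stops the walk early. */
typedef int (*walk_fn)(struct node *, void *);

int walk_tree(struct node *n, walk_fn fn, void *data)
{
    if (n == NULL)
        return 0;
    int r = fn(n, data);    /* one indirect call per node */
    if (r != 0)
        return r;
    for (size_t i = 0; i < 2; i++) {
        r = walk_tree(n->op[i], fn, data);
        if (r != 0)
            return r;
    }
    return 0;
}

/* Callback: bump the counter behind data for each register node. */
static int count_reg_cb(struct node *n, void *data)
{
    assert(data != NULL);
    if (n->kind == NODE_REG)
        ++*(int *)data;
    return 0;               /* never stop early */
}

/* Variant 1: generic walker plus callback. */
int count_regs_generic(struct node *n)
{
    int count = 0;
    walk_tree(n, count_reg_cb, &count);
    return count;
}

/* Variant 2: the walk and the per-node work are merged, so the
   recursion is a direct self-call with no function pointer. */
int count_regs_direct(const struct node *n)
{
    if (n == NULL)
        return 0;
    int count = (n->kind == NODE_REG);
    for (size_t i = 0; i < 2; i++)
        count += count_regs_direct(n->op[i]);
    return count;
}
```

To turn the first variant into the second automatically, a compiler would have to inline the recursive `walk_tree` into its caller, propagate the constant `fn`, and redirect the inner recursive calls to the specialized copy. That is the transformation the quoted text doubts GCC performs. Whether the direct form is measurably faster in a given build is something to benchmark, not assume.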
Well, if avoiding the indirection is a significant component (or the
major component), then another possibility we could look into would be
turning for_each_rtx into an inline function itself.
That wouldn't get all the specializations you're doing and could
potentially have bad icache effects. But it would have the positive
effect that everyone using for_each_rtx would avoid the indirect call