This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [tree-profiling-branch PATCH] Function cloning + IPCP extension (RESUBMISSION)
- From: Razya Ladelsky <RAZYA at il dot ibm dot com>
- To: Steven Bosscher <stevenb at suse dot de>
- Cc: Ayal Zaks <ZAKS at il dot ibm dot com>, gcc-patches at gcc dot gnu dot org, hubicka at ucw dot cz, Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- Date: Thu, 14 Oct 2004 17:58:11 +0200
- Subject: Re: [tree-profiling-branch PATCH] Function cloning + IPCP extension (RESUBMISSION)
Steven Bosscher <stevenb@suse.de> wrote on 11/10/2004 18:03:51:
> On Monday 11 October 2004 16:26, Razya Ladelsky wrote:
> > the existing infrastructure. This should prove to be very useful, and
> > easily portable to low-level Gimple/CFG/Tree-SSA etc. This already
> > raised critical issues such as "how to represent the outcome of IPA?"
> > - for CP we can insert a statement to indicate that a parameter is
> > known to hold a constant value, but for other analyses (value prop,
> > alias analysis) this does not work.
>
> I do agree it is useful, but I doubt it will be so easy to port to
> low level GIMPLE. But we'll see, perhaps it will indeed not be that
> hard.
>
>
> > > To quote myself from "http://gcc.gnu.org/projects/tree-profiling.html":
> > >
> > > ==============================================================
> > > (...) plans exist for reorganisation of the compiler passes as follows:
> > >
> > > 1. For each SCC in the call graph in reverse topological order
> > > do simple intraprocedural optimizations in SSA form
> > > (basically, run anything that is cheap and can not
> > > create overlapping live ranges, but will improve the
> > > results of the next step);
> > > collect intraprocedural information necessary for
> > > interprocedural data flow analyses;
> > > store the function's intermediate representation somewhere (*)
> >
> > Are you convinced that tree-ssa/flow-sensitive/global intraprocedural
> > analysis is necessary?
> > Most of the benefit we know of can be obtained using scalable
> > flow-insensitive analysis.
>
> My reasoning is: If we're going to have to traverse each function
> to gather function-local information, then why not clean it up first?
>
We are not sure how much there is to clean up at such an early stage,
and it does not seem to justify transforming to tree-ssa.
>
> > Most of the benefit we know of can be obtained using scalable
> > flow-insensitive analysis.
>
> It would still be flow-insensitive in the above scheme (except maybe
> that you get some flow-sensitive stuff for free if we can collect
> local information on the function in SSA form).
>
Again, it's a cost-performance issue; tree-ssa can improve performance
(and simplify the intraprocedural stage) but seems too costly for
whole-program compilation.
>
> > Your scheme implies that we hold all functions in Tree-ssa form, right?
>
> I hope so, but I don't know if that is necessary or beneficial (or cost
> effective, if you like). I don't know how expensive it would be to take
> each function out of SSA after step 1, and rewrite it a second time
> for step 3 (ie. post-IPA). We should definitely keep the function body
> in low-GIMPLE, i.e. even if we'd take the function out of SSA form, you
> wouldn't run TER during out-of-ssa.
>
>
> > > 2. Perform interprocedural analyses and optimizations (at least
> > > constant propagation, side-effects analysis, and function cloning
> > > and specialization).
> >
> > Where does the inliner kick in?
>
> I believe cloning and inliner should be the last IPO passes to run,
> cloning before inlining, or simultaneously (dunno what works, or if
> it makes a difference). You definitely want to be able to inline
> a clone, of course.
>
>
We agree; inlining does not affect IPA, but:
> > IPA can open-up new
> > opportunities/priorities
> > for inlining
>
> That is what Zadeck said too. I was surprised by that a bit, I had
> expected the only new inlining opportunities to come from cloning
> after IPCP, and simplifying the clone. That should be pretty easy
> to keep track of.
>
> Perhaps we need something completely different. You have more
> experience with this than I do - so what do you think the framework
> for IPA should look like?
>
Jan's framework seems reasonable to us for whole-program compilation:
- Gathering intraprocedural properties (on low gimple).
- Performing IPA (recording its decisions/results), and making
inlining decisions.
- Transformation function by function, starting with materializing the
clones and inlining.
>
> > (it already does so now!).
>
> How often? Do you have an example?
> You still haven't shown any numbers :-(
>
> > IPA (and the inliner) need access to all functions together; the
> > important point to decide is the representation in which all functions
> > will be held together.
>
> IPA would only need all functions together if you indeed would not just
> have a phase before IPA to collect global data. If you have a pre-pass
> that collects all global data, IPA would work with just that data, and
> you would not need to have all functions together. This does impose a
> few limitations, but it also removes the requirement that you have to be
> able to hold all functions together. You can just dump them somewhere
> and do IPA based only on the call graph and information you've gathered
> locally in each function before dumping it.
>
Agreed.
> At least, that was the idea.
>
> If you're going to run some kind of iterating IPA like Honza mentioned,
> then you of course do need all functions readily available. But would
> that be possible in general for *very* large applications, and what
> does it buy us compared to the scheme I sketched? For example, does
> IPA make new inlining opportunities available often enough to justify
> keeping all functions around at all times?
>
> To me, it appears to be a similar problem as that of classic GCSE: You
> can always do another iteration to find more redundancies, but in most
> cases the costs outweigh the benefit. For IPA, I would *hope* that a
> scheme where you collect local data once and do IPA once would catch
> the vast majority of IPO opportunities. If that is indeed the case,
> then those last few inlining opportunities and that extra bit of side
> effects info that you get from some kind of iterative IPA is maybe not
> worth the trouble.
> But since both Zadeck and you argue for keeping all functions around, I
> guess it *is* worth the trouble from your points of view.
>
We are not suggesting iterative IPA, only doing IPA before inlining.
> Zadeck mentioned that he could have several tens of thousands of Java
> classes in memory and do full inter-class inlining. That's mighty
> impressive, but not something GCC would be capable of anytime soon,
> unless we manage to radically change our data structures to be far more
> efficient.
>
> Oh well... I guess I really do not have enough experience with IPA in
> production compilers to make any sensible comments on all this. Hope
> you do ;-)
> If you have numbers or some references to papers on this subject, that
> would be much appreciated.
>
We'll send you pointers to relevant papers shortly.
> > AFAWCS, porting our IP/CP from trees to lowered Gimple + CFG seems
> > pretty straightforward because it's flow insensitive (and we'll
> > definitely do that as soon as it's available for us!).
>
> Great.
>
>
> > Cloning function CFGs should be done very similarly to the way they
> > are inlined; in fact, maybe the inliner should first clone the
> > callee, and then tailor it into the caller.
>
> Agreed. I've always considered inlining to be just a special case of
> cloning, conceptually.
>
> Gr.
> Steven