This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Loop optimizer issues
- From: Pop Sébastian <pop at gauvain dot u-strasbg dot fr>
- To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- Cc: gcc at gcc dot gnu dot org, rth at redhat dot com, jason at redhat dot com, dnovillo at redhat dot com, jh at suse dot cz
- Date: Sat, 31 May 2003 20:25:43 +0200
- Subject: Re: Loop optimizer issues
- References: <20030530183552.GA27110@atrey.karlin.mff.cuni.cz>
>
> The process of merging of the loop optimizer work I am currently doing
> on rtlopt branch. I am quite sure that I won't be able to get it to
> reasonable state for 3.4. I would however like to reuse some parts of
> this work on tree-ssa. Would it be feasible to merge this work from
> rtlopt branch to tree-ssa branch once it is in sufficiently stabilized
> state (in about a month or so)?
>
I'm also voting for this.
> What parts do we want to do on ast level and what on rtl level?
That's a difficult question. I think that experimenting with various
combinations of the optimizers will answer it, but for that we'll have to
write the optimizers to work at both the tree and the rtl levels.
> The optimizations I would like to see on rtl level:
>
> doloop optimization -- clearly machine specific and hard to express
> on ast level
> unrolling (at least partially) -- to do it efficiently we should take
> scheduling into account, which we cannot until late rtl stages
Maybe we'll need this optimization at the tree level, but a drawback is
that it will increase not only the generated code size, but also the time
spent translating to rtl...
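To make the code size trade-off concrete, here is a minimal sketch of unrolling by a factor of 4 written by hand at the source level. The function names `sum_rolled` and `sum_unrolled4` are hypothetical and only illustrate the shape of the transformation; this is not what any GCC pass actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one addition and one exit test per element. */
static long sum_rolled(const long *a, size_t n)
{
  long s = 0;
  assert(a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-unrolled by 4 (illustrative only): the body is replicated four
   times, so there is one exit test per four elements, at the cost of a
   larger body plus a remainder loop for the last n % 4 elements. */
static long sum_unrolled4(const long *a, size_t n)
{
  long s = 0;
  size_t i = 0;
  assert(a != NULL || n == 0);
  for (; n - i >= 4; i += 4) {
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  for (; i < n; i++)  /* remainder iterations */
    s += a[i];
  return s;
}
```

Both functions compute the same sum; the unrolled one simply has more statements for the later stages to translate, which is the drawback mentioned above.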
> (maybe) prefetching -- it is machine specific, but it should not
> be hard to do it in later stages of ast, where it could still benefit
> from a more accurate analyses
> (maybe) part of induction variable optimizations that require knowledge
> of addressing modes; again could also be done in late ast stages
> loop invariant motion -- it must take register pressure into account
>
IV detection is a prerequisite to almost all loop optimizations, since it
enables us to count the number of loop iterations (the lack of it is what
stopped my adaptation of loop unrolling to the tree level). The IV analysis
will be needed on both tree and rtl levels.
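As a rough illustration of why a detected IV gives the iteration count, here is a minimal sketch for the simplest case only: a loop of the form `for (i = base; i < bound; i += step)` with a positive constant step. The `struct simple_iv` type and the `iv_trip_count` function are hypothetical names made up for this example, not part of any GCC interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a simple induction variable: it starts at
   `base`, is incremented by `step` on every iteration, and the loop runs
   while the variable is strictly less than `bound`. */
struct simple_iv {
  long base;
  long step;
  long bound;
};

/* Number of iterations of `for (i = base; i < bound; i += step)`.
   Returns -1 when this simple analysis cannot determine the count
   (non-positive step). Overflow is ignored in this sketch. */
static long iv_trip_count(const struct simple_iv *iv)
{
  assert(iv != NULL);
  if (iv->step <= 0)
    return -1;
  if (iv->base >= iv->bound)
    return 0;
  /* Ceiling of (bound - base) / step. */
  return (iv->bound - iv->base + iv->step - 1) / iv->step;
}
```

For example, with base 0, step 3 and bound 10 the variable takes the values 0, 3, 6, 9, so the count is 4. A real detector also has to handle other comparison operators, negative steps and wrap-around, which this sketch leaves out.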
If we place the IV detector after the SSA stuff, we get invariant motion
done by SSAPRE. The only optimization we should then worry about
is the removal of scalar inter-iteration dependences created by recursive
definitions of secondary IVs. This will decrease the register pressure
and make some loops suitable for high level loop optimizations.
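A minimal sketch of what removing such a dependence can look like, assuming a secondary IV `j` that advances by a constant step: the recursive definition `j = j + step` is replaced by a closed form in the primary IV `i`. The function names are hypothetical and the rewrite is done by hand here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive form: j is a secondary IV defined from its own previous
   value, so every iteration depends on the one before it. */
static void fill_recursive(long *a, size_t n, long j0, long step)
{
  long j = j0;
  assert(a != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    a[i] = j;
    j = j + step;  /* scalar inter-iteration dependence on j */
  }
}

/* Closed form: j is recomputed from the primary IV i, so no scalar is
   carried from one iteration to the next. */
static void fill_closed_form(long *a, size_t n, long j0, long step)
{
  assert(a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    a[i] = j0 + (long)i * step;
}
```

Both versions store the same values; in the second one the iterations no longer depend on each other through `j`.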
> Most of the above mentioned optimizations that are suitable for rtl
> level would benefit from results of analyses done on ast level; we
> should also consider how to pass them there (perhaps some annotations on
> appropriate registers? It would be quite hard to keep them up-to-date).
>
Open64 keeps track of such information by making the translators aware
of what they should keep intact. Maybe there is a lot to
do on the translation to rtl side, now that things have stabilized on the
tree interface side...
I also had the impression that, when translating an expression to rtl, the
translator walks over it twice. I haven't investigated enough, but I think
that there is room for compile time improvements in the translator.
> And finally, who will do it? I of course volunteer for anything of the
> above and later implementation of additional optimizations, but on gcc
> summit I had a feeling that other people also have related plans. There
> is a lot of stuff to do, but we need to coordinate it somehow so that
> one thing is not unnecessarily done twice.
>
I'll contribute a tree level IV detector, as well as the monotonic evolution
framework. The problem with the MonEv pass is that it needs the loop's
iteration count, and thus the IV detector has to run before it. MonEv is
therefore suboptimal in terms of compile time, unless we find a way to get
the loop's count right without the help of the IV detector.