This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Loop optimizer issues


> 
> The process of merging of the loop optimizer work I am currently doing
> on rtlopt branch. I am quite sure that I won't be able to get it to
> reasonable state for 3.4. I would however like to reuse some parts of
> this work on tree-ssa. Would it be feasible to merge this work from
> rtlopt branch to tree-ssa branch once it is in sufficiently stabilized
> state (in about a month or so)?
> 
I'm also voting for this.

> What parts do we want to do on ast level and what on rtl level?

That's a difficult question.  I think that experimenting with various
combinations of the optimizers will answer it, but for that we'll have
to write the optimizers to work at both the tree and the rtl levels.

> The optimizations I would like to see on rtl level:
> 
> doloop optimization -- clearly machine specific and hard to express
>   on ast level
> unrolling (at least partially) -- to do it efficiently we should take
>   scheduling into account, which we cannot until late rtl stages

Maybe we'll need this optimization at the tree level, but a drawback is
that it will increase not only the generated code size, but also the
time spent translating to rtl...
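
To make the size drawback concrete, here is a minimal C sketch of
unrolling by a factor of 4 at the source level.  The function names are
hypothetical and chosen only for illustration; this is not how GCC
represents or performs the transformation.  The unrolled version has
four copies of the body plus a remainder loop, all of which would then
have to be translated to rtl.

```c
/* Original loop: one copy of the body. */
static long sum_rolled(const long *a, long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical hand-unrolled version (factor 4, an assumption made
   for the example).  The body appears four times, and a remainder
   loop handles the last n % 4 iterations. */
static long sum_unrolled4(const long *a, long n)
{
    long sum = 0;
    long i = 0;

    /* Main unrolled loop: runs while four elements remain. */
    for (; i + 3 < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Remainder loop: at most three iterations. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

Both functions compute the same sum; the second simply trades code size
for fewer loop-control instructions per element.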

> (maybe) prefetching -- it is machine specific, but it should not
>   be hard to do it in later stages of ast, where it could still benefit
>   from a more accurate analyses
> (maybe) part of induction variable optimizations that require knowledge
>   of addressing modes; again could also be done in late ast stages
> loop invariant motion -- it must take register pressure into account
> 
IV detection is a prerequisite to almost all loop optimizations, since it
enables us to count the number of loop iterations (this is where I got
stuck when adapting loop unrolling to the tree level).  The IV analysis
will be needed on both tree and rtl levels.
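
As an illustration of why knowing the IV gives the iteration count,
here is a small C sketch.  It assumes the simplest case, a loop of the
form "for (i = start; i < bound; i += step)" with a positive constant
step and no overflow; trip_count and run_loop are hypothetical helpers
written for this example, not GCC functions.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations of
       for (i = start; i < bound; i += step)
   computed in closed form from the IV's start, bound and step.
   Assumes step > 0 and that the arithmetic does not overflow. */
static long trip_count(long start, long bound, long step)
{
    assert(step > 0);
    if (start >= bound)
        return 0;
    /* Ceiling of (bound - start) / step. */
    return (bound - start + step - 1) / step;
}

/* Reference: obtain the same number by actually running the loop. */
static long run_loop(long start, long bound, long step)
{
    long n = 0;
    for (long i = start; i < bound; i += step)
        n++;
    return n;
}
```

Once start, bound and step are known, the count is available without
executing the loop, which is what an unroller needs in order to pick
an unroll factor and size the remainder loop.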

If we place the IV detector after the SSA stuff, we get the invariant motion
done by SSAPRE.  The only optimization we should then worry about
is the removal of scalar inter-iteration dependences created by recursive 
definitions of secondary IVs.  This will decrease the register pressure,
and make some loops suitable for high level loop optimizations.  
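
A small C sketch of what removing such a dependence could look like.
The names are hypothetical, and the rewrite is done by hand here only
to show the idea; it is not the pass itself.  In the first function
the secondary IV k is defined recursively from its value in the
previous iteration; in the second it is expressed as a function of the
primary IV i, so no value is carried from one iteration to the next.

```c
/* Secondary IV defined recursively: each iteration needs the value of
   k produced by the previous one (a scalar inter-iteration dependence). */
static long sum_recursive(const long *a, long n, long k0, long stride)
{
    long sum = 0;
    long k = k0;
    for (long i = 0; i < n; i++) {
        sum += a[k];
        k += stride;            /* k depends on the previous k */
    }
    return sum;
}

/* Same loop with k rewritten in closed form in terms of the primary
   IV i.  The only value still carried between iterations is i itself
   (and the reduction in sum). */
static long sum_closed_form(const long *a, long n, long k0, long stride)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += a[k0 + stride * i];
    return sum;
}
```

With the recurrence on k gone, the iterations differ only in the value
of i, which is the shape higher level loop transformations usually
expect.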

> Most of the above mentioned optimizations that are suitable for rtl
> level would benefit from results of analyses done on ast level; we
> should also consider how to pass them there (perhaps some annotations on
> appropriate registers? It would be quite hard to keep them up-to-date).
> 

Open64 keeps track of such information by making the translators aware
of what they must keep intact.  Maybe there is a lot to do on the
translation to rtl side, now that things have stabilized on the
tree interface side...

I also had the impression that, when translating an expression to rtl,
the translator walks over it twice.  I haven't investigated enough, but I
think that there is room for compile time improvements in the translator.

> And finally, who will do it? I of course volunteer for anything of the
> above and later implementation of additional optimizations, but on gcc
> summit I had a feeling that other people also have related plans.  There
> is a lot of stuff to do, but we need to coordinate it somehow so that
> one thing is not unnecessarily done twice.
> 

I'll contribute a tree level IV detector, as well as the monotonic evolution
framework.  The problem with the MonEv pass is that it needs the loop's
iteration count, and thus the IV detector has to run before it.  This makes
MonEv suboptimal in terms of compile time, unless we find a way to get the
loop's count without the help of the IV detector.

