cfg merge part 23 - simple loop analysis code

Thu Jun 20 10:42:00 GMT 2002

> > The code is used by unrolling/peeling to do runtime preconditioning and
> > full peeling especially.
> > I am not quite sure whether we can do it for 3.2.  What is missing is
> > about 800 lines of code to duplicate loops, remove edges and update
> > dominance and 600 lines of the unroller itself.
> 
> Hum.  It's a quandary.  On the one hand, it sounds like your
> code isn't quite ready for 3.2.

I am not completely sure about the unroller.  A two-day deadline looks
crazy; on the other hand, I think it has been quite ready for a few weeks.
You may take a look at the code.  The simple loop analysis is by far
the craziest part of the unroller.  All other parts are already handled by
other modules, so the loop unrolling itself is a very simple piece of code
(most of the size comes from decision heuristics).  I don't expect many
problems with that.

The loop duplication code is somewhat tricky, as it handles updating of the
loop tree even when duplicating non-innermost loops, but there have not
been major problems in it for a while and the duplication framework is
already well exercised by tracer/bb reorder (on the branch).
> 
> Someone who writes such paradoxical loops gets what they deserve.
> So long as we don't have the losing behaviour on the normal case,
> I think this is acceptable.

Perhaps we can add a switch for "nonparadoxical loops".  The cost for the
normal case exists, but it is not terribly high.  There are all kinds of
weirdness one needs to take care of, such as overflows/underflows of
iteration counters.
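
To make the overflow concern concrete, here is a minimal sketch in C.  It
is not the analysis code from the patch; the function name and the choice
of a 32-bit unsigned counter are assumptions made for illustration.  It
counts the iterations of a "less than" loop and refuses to answer when the
counter would wrap around before the exit test fails:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Models  for (i = start; i < end; i += step)  with a 32-bit unsigned
   counter.  Stores the iteration count in *niter and returns true when
   the simple count is valid; returns false when the counter would wrap
   around before the exit test fails (the "paradoxical" case).  */
bool
count_iterations_lt (uint32_t start, uint32_t end, uint32_t step,
                     uint32_t *niter)
{
  assert (step != 0);

  if (start >= end)
    {
      *niter = 0;
      return true;
    }

  /* ceil ((end - start) / step), written so that nothing overflows.  */
  uint32_t n = (end - start - 1) / step + 1;

  /* Counter value in the last iteration that still passes the test.  */
  uint32_t last = start + (n - 1) * step;

  /* If last + step wraps past UINT32_MAX, the counter jumps back below
     end and the simple count is wrong.  */
  if (last > UINT32_MAX - step)
    return false;

  *niter = n;
  return true;
}
```

For example, a loop from 0 to 10 with step 3 gives 4 iterations, while a
loop from 0 to UINT32_MAX with step 2 is rejected because the counter
wraps to 0 after reaching UINT32_MAX - 1.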

The current cost is one extra peeling where no peeling would suffice.
This is not a major problem: when the loop body is small enough for
unrolling, scheduling the first iteration of the loop may even be a win.
The other problem is that iteration counting is disabled for loops written
with an == test instead of < or >.
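
As a rough illustration of why an equality test is more awkward to count
than < or >, consider the sketch below.  It reflects the general problem
only, not what the patch does; the function name and the 32-bit unsigned
counter are assumptions.  With an equality test the counter can step over
the bound, so this simple version gives up unless the step divides the
distance exactly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Models  for (i = start; i != end; i += step)  with a wrapping 32-bit
   unsigned counter.  Returns true and stores the count in *niter only
   in the easy case where the counter lands exactly on end.  */
bool
count_iterations_ne (uint32_t start, uint32_t end, uint32_t step,
                     uint32_t *niter)
{
  assert (step != 0);

  /* Distance to travel, modulo 2^32 (unsigned wraparound is well
     defined in C).  */
  uint32_t dist = end - start;

  /* The counter would step over end; this sketch does not try to
     analyse what happens after the wraparound.  */
  if (dist % step != 0)
    return false;

  *niter = dist / step;
  return true;
}
```

With start 0 and end 10, a step of 2 yields 5 iterations, whereas a step
of 3 is rejected by this sketch.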
> 
> We do have a problem though.  It sounds like you're almost, but 
> not quite ready to merge this in, and yet the cutoff date is in
> two days.

Yes, I am aware of that.

We have three possibilities:

1) ignore this stuff for now
2) merge in at least the analysis, which is the trickiest part and valuable
per se; at least Andreas has benchmarked a 3% speedup on bootstrap
3) try to merge both of them, as the unroller can be disabled by a single
switch anyway.  We are not breaking the old one; they can live together.
> 
> Question: does your loop unrolling code fix the regression we have
> from 3.0 to 3.1 in e.g. opt/6405?
I am investigating...
> 
> How about unsharing the loop header for 
>   http://gcc.gnu.org/ml/gcc-patches/2002-06/msg01392.html

Unsharing loop headers is now done automatically by the loop discovery code,
so a simple loop discovery pass before GCSE may do the trick.
If GCSE creates an irreducible CFG, we will lose (there is no code to
handle those).  In the testcase this does not happen; all that happens is
that the LOOP notes get confused, which is not a problem for us.
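
For reference, the textbook test for irreducibility can be sketched as
follows.  This is a toy written for this message, not code from the
branch: the struct, the bitmask representation and the 32-block limit are
all assumptions, and every block is assumed reachable from block 0.  A CFG
is reducible when the target of every retreating edge found by a
depth-first walk dominates the source of that edge:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Toy CFG: succ[b] is a bitmask of successors of block b; block 0 is
   the entry.  Hypothetical structure, for illustration only.  */
struct cfg { int n; uint32_t succ[MAX_BLOCKS]; };

static uint32_t bit (int b) { return UINT32_C (1) << b; }

/* dom[b] becomes the bitmask of blocks that dominate b (iterative
   data-flow solution).  */
static void
compute_dominators (const struct cfg *g, uint32_t dom[])
{
  uint32_t all = g->n == 32 ? UINT32_MAX : bit (g->n) - 1;
  bool changed = true;

  dom[0] = bit (0);
  for (int b = 1; b < g->n; b++)
    dom[b] = all;

  while (changed)
    {
      changed = false;
      for (int b = 1; b < g->n; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < g->n; p++)
            if (g->succ[p] & bit (b))
              meet &= dom[p];
          meet |= bit (b);
          if (meet != dom[b])
            {
              dom[b] = meet;
              changed = true;
            }
        }
    }
}

/* Depth-first walk; fails on a retreating edge u->v where v does not
   dominate u.  */
static bool
walk (const struct cfg *g, const uint32_t dom[], int u,
      uint32_t *visited, uint32_t *on_stack)
{
  *visited |= bit (u);
  *on_stack |= bit (u);
  for (int v = 0; v < g->n; v++)
    {
      if (!(g->succ[u] & bit (v)))
        continue;
      if (*on_stack & bit (v))
        {
          if (!(dom[u] & bit (v)))
            return false;
        }
      else if (!(*visited & bit (v))
               && !walk (g, dom, v, visited, on_stack))
        return false;
    }
  *on_stack &= ~bit (u);
  return true;
}

bool
cfg_is_reducible (const struct cfg *g)
{
  uint32_t dom[MAX_BLOCKS], visited = 0, on_stack = 0;

  assert (g->n >= 1 && g->n <= MAX_BLOCKS);
  compute_dominators (g, dom);
  return walk (g, dom, 0, &visited, &on_stack);
}
```

A loop with a single entry (0->1, 1->2, 2->1, 2->3) passes this test; a
cycle entered at two different blocks (0->1, 0->2, 1->2, 2->1) does not.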
> 
> If the answer for both of these is positive (if with a bit extra
> code), I think we ought to consider merging this anyway.
> 
> Mark, what do you think?

Honza
> 
> 
> r~