Tweak loop peeling limits

Zdenek Dvorak rakdver@atrey.karlin.mff.cuni.cz
Sat Feb 21 08:35:00 GMT 2004


Hello,

> > If you have a 20-insn loop, you are better off peeling it 8 times (so the
> > loop itself likely won't execute) rather than unrolling it in a way that it
> > loops 8 times.
> 
> That would be an unroll factor of 1, which is nonsensical.  With an unroll
> factor of four, you have four loop body copies, and two unrolled iterations.
> 
> Peeling a loop which is known to execute eight times, eight times, is the
> same as completely unrolling it.
> 
> > Especially with iteration counters, the loops often have expensive
> > startup times after unrolling.
> 
> I have to agree that our preconditioning sucks.  I think we would be better
> off just using a plain non-unrolled loop copy for the preconditioning.
> Or we might just turn things around and have the unrolled loop first,
> and mop up any iterations that were left with the simple loop.
> The branching logic gets much simpler, and we save a lot of code which we
> could use to increase the unrolling factor, which is particularly
> beneficial for high-iteration count loops.

I considered this some time ago (not for the reasons you mention -- I
was more concerned with getting better data alignment for array
accesses).  I have not gotten to it so far because handling loops that
iterate a "negative" number of times is slightly more complicated
in this case, especially with the old simple loop analysis, which
was not able to detect them.  This should not be a problem any more,
so I will try it (once I find some time).
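
To make the "unrolled loop first, mop-up loop after" layout concrete, here
is a minimal source-level sketch in C.  It is only an illustration of the
shape of the transformed loop, not what the RTL unroller actually emits;
the function name sum_range, the unroll factor of 4, and the explicit
guard are assumptions chosen for the example.  The guard shows why the
"negative" iteration count matters: without it, hi - lo computed as an
unsigned value would become a huge trip count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a[lo..hi) with the unrolled loop first and a
   plain loop mopping up the leftover iterations afterwards.  */
long sum_range(const int *a, long lo, long hi)
{
    long sum = 0;
    long i = lo;

    assert(a != NULL);

    /* Guard against a "negative" iteration count: if hi <= lo the
       original loop runs zero times, so the unsigned difference below
       must not be computed at all.  */
    if (hi > lo) {
        unsigned long niter = (unsigned long)hi - (unsigned long)lo;

        /* Unrolled loop, factor 4 (an assumed factor), executed
           niter / 4 times with a single exit test per four bodies.  */
        for (unsigned long blocks = niter / 4; blocks > 0; blocks--) {
            sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
            i += 4;
        }
    }

    /* Mop-up: the original, non-unrolled loop handles at most 3
       remaining iterations.  */
    for (; i < hi; i++)
        sum += a[i];

    return sum;
}
```

Note that the mop-up loop is just the original loop, so no separate
branching logic is needed to decide how many leftover iterations to run.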

On the other hand, the way it is done now may be faster in some cases --
the copies created by preconditioning basically form straight-line code
without any branches, which might be a win (especially if we used more
efficient code for the initial switch statement than we do now -- it
currently just cancels the possible benefits).
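
For comparison, the current scheme (preconditioning copies entered through
a switch, followed by the unrolled loop) can be sketched at the source
level roughly as follows.  Again this is only an assumed illustration:
sum_precond and the unroll factor of 4 are made up for the example, and
the real code works on RTL rather than C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a[0..n) with preconditioning.  The switch
   jumps into a straight-line run of body copies so that the number of
   iterations left for the unrolled loop is a multiple of 4.  */
long sum_precond(const int *a, unsigned long n)
{
    long sum = 0;
    unsigned long i = 0;

    assert(a != NULL || n == 0);

    /* Preconditioning: execute n % 4 body copies.  After the entry
       dispatch there are no branches between the copies.  */
    switch (n % 4) {
    case 3: sum += a[i++]; /* fall through */
    case 2: sum += a[i++]; /* fall through */
    case 1: sum += a[i++]; /* fall through */
    case 0: break;
    }

    /* Unrolled loop, factor 4 (an assumed factor): n - i is now a
       multiple of 4, so no exit test is needed inside the body.  */
    for (; i < n; i += 4)
        sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    return sum;
}
```

The cost of the entry dispatch is what has to be cheap for the
straight-line copies to pay off.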

> With multiple loop optimizer
> passes, of course we would have to make sure not to unroll these
> preconditioning / mop up loops.
> 
> OTOH, if you peel a loop with unknown iteration count, you'll get
> branches and compares in every iteration, peeled or not.  I don't
> see what you win over the non-unrolled non-peeled loop then.

I don't know exactly why, but just peeling the loops turned out to
be almost as effective as unrolling them in some tests on x86_64.
I call it magic :-).
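
For reference, peeling a loop with an unknown iteration count looks
roughly like this at the source level.  This is a sketch under stated
assumptions (the name sum_peeled and the peel count of 2 are made up);
as noted in the quoted text, every peeled copy keeps its own compare and
branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a[0..n) with two iterations peeled off the
   front of the loop.  n is not known at compile time, so each peeled
   copy still needs its own exit test.  */
long sum_peeled(const int *a, unsigned long n)
{
    long sum = 0;
    unsigned long i = 0;

    assert(a != NULL || n == 0);

    /* First peeled iteration.  */
    if (i >= n)
        return sum;
    sum += a[i++];

    /* Second peeled iteration.  */
    if (i >= n)
        return sum;
    sum += a[i++];

    /* The original loop handles whatever is left; for short trip
       counts it is never entered.  */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```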

Zdenek


