[PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Segher Boessenkool segher@kernel.crashing.org
Tue May 26 17:35:54 GMT 2020


Hi!

On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > Yes, cunroll does not have its own option, and that is a problem.  But
> > that is easy to fix!  Either with an option, or just with params (the
> > option wouldn't do more than set params anyway?)
> 
> Well, given coming up with different names for essentially the same
> transform is going to be challenging how about sth like
> 
> -funroll-loops={early,late,static,dynamic}[insert better names here]

User interface is hard :-)  I think luckily we don't need to change
anything there yet, just have an internal flag?

But complete unrolling is something quite different, so it should have
its own flag anyway (at least internally).

> note there's also -fpeel-loops which may match the transform
> done on GIMPLE better?

Peeling and unrolling are the same thing, if doing complete unrolling
(or complete peeling), followed by DCE in both cases.  Peeling is a
nicer name here I think, yeah.

> I'm not sure which are the technically
> correct terms for unrollings that elide the loop (the backedge).

I don't know a better term than "complete", I don't remember ever seeing
something else either.

> We're doing such kind of unrolling even if we cannot statically
> decide which of a set of possible exits we take (and internally
> call that peeling, if we can statically decide we call it complete
> unrolling).

"Peeling" is placing some copies of the loop before the loop;
"unrolling" is placing a few copies of the loop inside the loop body.
Does that match usage here?
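
To make the two definitions concrete, here is a minimal hand-written sketch
in C.  The function names (sum_plain, sum_peeled, sum_unrolled) are made up
for illustration, and this is source-level code, not what any GCC pass
actually emits; the peel count of one and the unroll factor of two are
arbitrary choices.

```c
#include <stddef.h>

/* Reference loop: sum the first n elements of a.  */
int sum_plain(const int *a, size_t n)
{
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Peeled by one: a guarded copy of the body is placed before the loop,
   and the loop itself is left unchanged for the remaining iterations.  */
int sum_peeled(const int *a, size_t n)
{
  int s = 0;
  size_t i = 0;
  if (i < n)
    {
      s += a[i];
      i++;
    }
  for (; i < n; i++)
    s += a[i];
  return s;
}

/* Unrolled by two: two copies of the body sit inside the loop, so the
   backedge is taken about half as often.  One leftover iteration is
   handled after the loop when n is odd.  */
int sum_unrolled(const int *a, size_t n)
{
  int s = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      s += a[i];
      s += a[i + 1];
    }
  if (i < n)
    s += a[i];
  return s;
}
```

All three functions compute the same sum; only the placement of the extra
body copies (before the loop versus inside it) differs.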

> The RTL side OTOH only performs classical unrolling,
> preserving the backedge with various strategies for the
> remaining iterations.

And if you do complete unrolling that way, the backedge can be removed,
since it can be shown never to be taken.
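
A small sketch of that argument in C, assuming a loop whose trip count is the
constant 4.  The function names are hypothetical and this is hand-written
source, not the output of the RTL unroller.

```c
/* Original loop: the trip count is known to be 4.  */
int sum4_loop(const int *a)
{
  int s = 0;
  for (int i = 0; i < 4; i++)
    s += a[i];
  return s;
}

/* Classical unrolling by a factor of 4, with the backedge still present.
   After the single pass through the body i is 4, so the condition on the
   backedge is provably false.  */
int sum4_unrolled(const int *a)
{
  int s = 0;
  int i = 0;
  do
    {
      s += a[i];
      s += a[i + 1];
      s += a[i + 2];
      s += a[i + 3];
      i += 4;
    }
  while (i < 4);   /* never taken */
  return s;
}

/* With the dead backedge removed, only straight-line code remains:
   this is the completely unrolled form.  */
int sum4_complete(const int *a)
{
  return a[0] + a[1] + a[2] + a[3];
}
```

The second and third functions differ only by the dead branch, which is why
unrolling by the full trip count ends up at the same place as complete
unrolling.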

> As said, for the regression on the 10 branch with ppc I'd add
> [a hidden] flag that controls the RTL unroller, also set by
> -funroll-loops and triggered by the ppc specific heuristics.

But the problem is in cunroll?  This is so backwards...  Because some
other transform abuses the unroller flags, adding a second level flag
with the same meaning :-(  It will work for fixing the regression,
sure, and it is slightly less code as well.


Segher

