[PATCH 1/4] unroll: Add middle-end unroll factor estimation

Richard Biener rguenther@suse.de
Tue Jan 26 10:53:23 GMT 2021


On Tue, 26 Jan 2021, Kewen.Lin wrote:

> Hi Segher/Richard B./Richard S.,
> 
> Many thanks for your all helps and comments on this!
> 
> on 2021/1/25 下午3:56, Richard Biener wrote:
> > On Fri, 22 Jan 2021, Segher Boessenkool wrote:
> > 
> >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
> >>> On Thu, 21 Jan 2021, Segher Boessenkool wrote:
> >>>> What is holding up this patch still?  Ke Wen has pinged it every month
> >>>> since May, and there has still not been a review.
> >>
> >> Richard Sandiford wrote:
> >>> FAOD (since I'm on cc:), I don't feel qualified to review this.
> >>> Tree-level loop stuff isn't really my area.
> >>
> >> And Richard Biener wrote:
> >>> I don't like it, it feels wrong but I don't have a good suggestion
> >>> that had positive feedback.  Since a reviewer / approver is indirectly
> >>> responsible for at least the design I do not want to ack this patch.
> >>> Bin made forward progress on the other parts of the series but clearly
> >>> there's somebody missing with the appropriate privileges who feels
> >>> positive about the patch and its general direction.
> >>>
> >>> Sorry to be of no help here.
> >>
> >> How unfortunate :-(
> >>
> >> So, first off, this will then have to work for next stage 1 to make any
> >> progress.  Rats.
> >>
> >> But what could have been done differently that would have helped?  Of
> >> course Ke Wen could have written a better patch (aka one that is more
> >> acceptable); either of you could have made your current replies earlier,
> >> so that it is clear help needs to be sought elsewhere; and I could have
> >> pushed people earlier, too.  No one really did anything wrong, I'm not
> >> seeking who to blame, I'm just trying to find out how to prevent
> >> deadlocks like this in the future (where one party waits for replies
> >> that will never come).
> >>
> >> Is it just that we have a big gaping hole in reviewers with experience
> >> in such loop optimisations?
> > 
> > May be.  But what I think is the biggest problem is that we do not
> > have a good way to achieve what the patch tries (if you review the
> > communications you'll see many ideas tossed around) first and foremost
> > because IV selection is happening early on GIMPLE and unrolling
> > happens late on RTL.  Both need a quite accurate estimate of costs
> > but unrolling has an ever harder time than IV selection where we've
> > got along with throwing dummy RTL at costing functions.
> > 
> 
> Yeah, exactly.
> 
> > IMHO the patch is the wrong "start" to try fixing the issue and my
> > fear is that wiring this kind of "features" into the current
> > (fundamentally broken) state will make it much harder to rework
> > that state without introducing regressions on said features (I'm
> > there with trying to turn the vectorizer upside down - for three
> > years now, struggling to not regress any of the "features" we've
> > accumulated for various targets where most of them feel a
> > "bolted-on" rather than well-designed ;/).
> > 
> 
> OK, understandable.
> 
> > I think IV selection and unrolling (and scheduling FWIW) need to move
> > closer together.  I do not have a good idea how that can work out
> > though but I very much believe that this "most wanted" GIMPLE unroller
> > will not be a good way of progressing here.  Maybe taking the bullet
> > and moving IV selection back to RTL is the answer.
> > 
> 
> I haven't looked into loop-iv.c, but IVOPTS in gimple can leverage
> SCEV analysis for iv detection, if moving it to RTL, it could be
> very heavier to detect the full set there?
> 
> > For a "short term" solution I still think that trying to perform
> > unrolling and IV selection (for the D-form case you're targeting)
> > at the same time is a better design, even if it means complicating
> > the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
> > any good idea about scheduling).  There are currently 20+ GIMPLE
> > optimization passes and 10+ RTL optimization passes between
> > IV selection and unrolling, the idea that you can have transform
> > decision and transform apply this far apart looks scary.
> > 
> 
> I have some questions in mind for this part, for "perform unrolling
> and IV selection at the same time", it can be interpreted to two
> different implementation ways to me:
> 
> 1) Run one gimple unrolling pass just before IVOPTS, probably using
>    the same gate for IVOPTS.  The unrolling factor is computed by
>    the same method as that of RTL unrolling.  But this sounds very
>    like "most wanted gimple unrolling" which is what we want to avoid.
> 
>    The positive aspect here is what IVOPTS faces is already one unrolled
>    loop, it can see the whole picture and get the optimal IV set.  The
>    downside/question is how we position these gimple unrolling and RTL
>    unrolling passes, whether we still need RTL unrolling.  If no, it's
>    doubtable that one gimple unrolling can fully replace the RTL
>    unrolling probably lacking some actual target information/instructions.
>    If yes, it's still possible to have inconsistent unrolling factors
>    between what IVOPTS optimizes on and what late RTL unrolling pass
>    ends with.
> 
> 2) Make IVOPTS determine the unrolling factor by considering the
>    reg-offset addressing (D-form), unroll the loop and do the remainings.

Yes, that's what I meant.

>    I don't think you referred to this though.  Since comparing to
>    reg-reg addressing, reg-offset addressing can only save some bumps,
>    it's too weak to be one deciding factor of unrolling factor.  Unlike
>    vectorizer or modulo scheduling, it's more likely there are more
>    important factors for unrolling factor computation than this one.

But you _are_ using that bump cost to compute the unroll estimate
(and IIRC you are doing the D-form IV selection then).

>    For IVOPTS, it's more like that it doesn't care what the unrolling
>    factor should be, but it needs to know what the unrolling factor
>    would be, then do optimal IV selection based on that.  So it's not
>    good to get it to decide the unrolling factor.

Well, currently with the stupid RTL unroller you will know it will
be unrolled 8 times unless it is cold or doesn't iterate enough.
There's not much costing going on in the RTL unroller (unless you
go wild with target hooks).

Richard.


More information about the Gcc-patches mailing list