[PATCH 1/4] unroll: Add middle-end unroll factor estimation

Tue Jan 26 10:53:23 GMT 2021

On Tue, 26 Jan 2021, Kewen.Lin wrote:

> Hi Segher/Richard B./Richard S.,
> 
> Many thanks for your all helps and comments on this!
> 
> on 2021/1/25 下午3:56, Richard Biener wrote:
> > On Fri, 22 Jan 2021, Segher Boessenkool wrote:
> > 
> >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
> >>> On Thu, 21 Jan 2021, Segher Boessenkool wrote:
> >>>> What is holding up this patch still?  Ke Wen has pinged it every month
> >>>> since May, and there has still not been a review.
> >>
> >> Richard Sandiford wrote:
> >>> FAOD (since I'm on cc:), I don't feel qualified to review this.
> >>> Tree-level loop stuff isn't really my area.
> >>
> >> And Richard Biener wrote:
> >>> I don't like it, it feels wrong but I don't have a good suggestion
> >>> that had positive feedback.  Since a reviewer / approver is indirectly
> >>> responsible for at least the design I do not want to ack this patch.
> >>> Bin made forward progress on the other parts of the series but clearly
> >>> there's somebody missing with the appropriate privileges who feels
> >>> positive about the patch and its general direction.
> >>>
> >>> Sorry to be of no help here.
> >>
> >> How unfortunate :-(
> >>
> >> So, first off, this will then have to work for next stage 1 to make any
> >> progress.  Rats.
> >>
> >> But what could have been done differently that would have helped?  Of
> >> course Ke Wen could have written a better patch (aka one that is more
> >> acceptable); either of you could have made your current replies earlier,
> >> so that it is clear help needs to be sought elsewhere; and I could have
> >> pushed people earlier, too.  No one really did anything wrong, I'm not
> >> seeking who to blame, I'm just trying to find out how to prevent
> >> deadlocks like this in the future (where one party waits for replies
> >> that will never come).
> >>
> >> Is it just that we have a big gaping hole in reviewers with experience
> >> in such loop optimisations?
> > 
> > May be.  But what I think is the biggest problem is that we do not
> > have a good way to achieve what the patch tries (if you review the
> > communications you'll see many ideas tossed around) first and foremost
> > because IV selection is happening early on GIMPLE and unrolling
> > happens late on RTL.  Both need a quite accurate estimate of costs
> > but unrolling has an ever harder time than IV selection where we've
> > got along with throwing dummy RTL at costing functions.
> > 
> 
> Yeah, exactly.
> 
> > IMHO the patch is the wrong "start" to try fixing the issue and my
> > fear is that wiring this kind of "features" into the current
> > (fundamentally broken) state will make it much harder to rework
> > that state without introducing regressions on said features (I'm
> > there with trying to turn the vectorizer upside down - for three
> > years now, struggling to not regress any of the "features" we've
> > accumulated for various targets where most of them feel a
> > "bolted-on" rather than well-designed ;/).
> > 
> 
> OK, understandable.
> 
> > I think IV selection and unrolling (and scheduling FWIW) need to move
> > closer together.  I do not have a good idea how that can work out
> > though but I very much believe that this "most wanted" GIMPLE unroller
> > will not be a good way of progressing here.  Maybe taking the bullet
> > and moving IV selection back to RTL is the answer.
> > 
> 
> I haven't looked into loop-iv.c, but IVOPTS in gimple can leverage
> SCEV analysis for iv detection, if moving it to RTL, it could be
> very heavier to detect the full set there?
> 
> > For a "short term" solution I still think that trying to perform
> > unrolling and IV selection (for the D-form case you're targeting)
> > at the same time is a better design, even if it means complicating
> > the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
> > any good idea about scheduling).  There are currently 20+ GIMPLE
> > optimization passes and 10+ RTL optimization passes between
> > IV selection and unrolling, the idea that you can have transform
> > decision and transform apply this far apart looks scary.
> > 
> 
> I have some questions in mind for this part, for "perform unrolling
> and IV selection at the same time", it can be interpreted to two
> different implementation ways to me:
> 
> 1) Run one gimple unrolling pass just before IVOPTS, probably using
>    the same gate for IVOPTS.  The unrolling factor is computed by
>    the same method as that of RTL unrolling.  But this sounds very
>    like "most wanted gimple unrolling" which is what we want to avoid.
> 
>    The positive aspect here is what IVOPTS faces is already one unrolled
>    loop, it can see the whole picture and get the optimal IV set.  The
>    downside/question is how we position these gimple unrolling and RTL
>    unrolling passes, whether we still need RTL unrolling.  If no, it's
>    doubtable that one gimple unrolling can fully replace the RTL
>    unrolling probably lacking some actual target information/instructions.
>    If yes, it's still possible to have inconsistent unrolling factors
>    between what IVOPTS optimizes on and what late RTL unrolling pass
>    ends with.
> 
> 2) Make IVOPTS determine the unrolling factor by considering the
>    reg-offset addressing (D-form), unroll the loop and do the remainings.

Yes, that's what I meant.

>    I don't think you referred to this though.  Since comparing to
>    reg-reg addressing, reg-offset addressing can only save some bumps,
>    it's too weak to be one deciding factor of unrolling factor.  Unlike
>    vectorizer or modulo scheduling, it's more likely there are more
>    important factors for unrolling factor computation than this one.

But you _are_ using that bump cost to compute the unroll estimate
(and IIRC you are doing the D-form IV selection then).

>    For IVOPTS, it's more like that it doesn't care what the unrolling
>    factor should be, but it needs to know what the unrolling factor
>    would be, then do optimal IV selection based on that.  So it's not
>    good to get it to decide the unrolling factor.

Well, currently with the stupid RTL unroller you will know it will
be unrolled 8 times unless it is cold or doesn't iterate enough.
There's not much costing going on in the RTL unroller (unless you
go wild with target hooks).

Richard.